Storage
Storage provides data persistence across container deployments and comes in a few options on Anvil.
The Ceph software provides block, filesystem, and object storage on the Anvil composable cluster. File storage provides an interface to access data in a file and folder hierarchy, similar to NTFS or NFS. Block storage is a flexible type of storage that supports snapshotting and is well suited for database workloads and generic container storage. Object storage is ideal for large unstructured data and features a REST-based API providing an S3-compatible endpoint that can be used with the preexisting ecosystem of S3 client tools.
Storage Classes
Anvil Composable provides two different storage classes based on the access characteristics of workloads.
- anvil-block - Block storage based on SSDs that can be accessed by a single node (Single-Node Read/Write).
- anvil-filesystem - File storage based on SSDs that can be accessed by multiple nodes (Many-Node Read/Write or Many-Node Read-Only).
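If you are working from the command line, the storage classes available on the cluster can be listed with kubectl (assuming you have a kubeconfig for the cluster):
# list the storage classes and their provisioners
kubectl get storageclass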
Provisioning Block and Filesystem Storage for use in deployments
Block and Filesystem storage can both be provisioned in a similar way.
- While deploying a Workload, select the Volumes drop down and click Add Volume…
- Select “Add a new persistent volume (claim)”
- Set a unique volume name, e.g. “<username>-volume”
- Select a Storage Class. The default storage class is Ceph for this Kubernetes cluster.
- Request an amount of storage in Gigabytes
- Click Define
- Provide a Mount Point for the persistent volume, e.g. /data
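The same claim can also be created directly with kubectl instead of the Rancher UI. The following is a minimal sketch, assuming the anvil-block storage class and a 10 gigabyte request; adjust the name, namespace, and size to your needs.
# create an equivalent persistent volume claim from the command line
kubectl -n <namespace> apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <username>-volume
spec:
  accessModes:
    - ReadWriteOnce        # Single-Node Read/Write, matching anvil-block
  storageClassName: anvil-block
  resources:
    requests:
      storage: 10Gi
EOF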
Backup Strategies
Developers using the Anvil Composable Platform should have a backup strategy in place to ensure that their data is safe and can be recovered in case of a disaster. Below is a list of methods that can be used to back up data on Persistent Volume Claims.
Copying Files to and from a Container
The kubectl cp command can be used to copy files into or out of a running container.
# get pod id you want to copy to/from
kubectl -n <namespace> get pods
# copy a file from local filesystem to remote pod
kubectl cp /tmp/myfile <namespace>/<pod>:/tmp/myfile
# copy a file from remote pod to local filesystem
kubectl cp <namespace>/<pod>:/tmp/myfile /tmp/myfile
This method requires the tar executable to be present in your container, which is usually the case with Linux images. More info can be found in the kubectl docs.
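Because kubectl cp relies on tar, the equivalent transfer can also be run explicitly through kubectl exec; a sketch, again assuming tar exists in the container:
# stream a tar archive out of the pod and unpack it locally
kubectl -n <namespace> exec <pod> -- tar cf - /tmp/mydir | tar xf - -C /local/destination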
Copying Directories from a Container
The kubectl cp command can also be used to recursively copy entire directories to local storage or places like Data Depot.
# get pod id you want to copy to/from
kubectl -n <namespace> get pods
# copy a directory from remote pod to local filesystem
kubectl cp <namespace>/<pod>:/pvcdirectory /localstorage
Backing up a Database from a Container
The kubectl exec command can be used to create a backup or dump of a database and save it to a local directory. For instance, to back up a MySQL database with kubectl, run the following commands from a local workstation or cluster frontend.
# get pod id of your database pod
kubectl -n <namespace> get pods
# run mysqldump in the remote pod and redirect the output to local storage
kubectl -n <namespace> exec <pod> -- mysqldump --user=<username> --password=<password> my_database > my_database_dump.sql
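To restore from such a dump, the direction can be reversed by streaming the file into mysql inside the pod; a sketch assuming the same database name and credentials:
# restore the dump by streaming it back into the database pod
kubectl -n <namespace> exec -i <pod> -- mysql --user=<username> --password=<password> my_database < my_database_dump.sql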
Backups using common Linux tools
If your container has the OpenSSH client or rsync packages installed, you can use the kubectl exec command to copy or synchronize data to another storage location.
# get pod id of your pod
kubectl -n <namespace> get pods
# run scp to transfer data from the pod to a remote storage location
kubectl -n <namespace> exec <pod> -- scp -r /data username@anvil.rcac.purdue.edu:~/backup
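rsync can be used in the same way and only transfers changed files on repeated runs, which suits scheduled backups; a sketch, assuming rsync is installed in the container and non-interactive SSH authentication (e.g. a key) is configured:
# synchronize /data to remote storage; repeat runs copy only changes
kubectl -n <namespace> exec <pod> -- rsync -av /data username@anvil.rcac.purdue.edu:~/backup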
Automating Backups
Kubernetes CronJob resources can be used with the commands above to create an automated backup solution. For more information, refer to the Kubernetes documentation.
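As an illustration, a CronJob that runs the earlier mysqldump nightly and writes the result to a persistent volume might look like the sketch below; the image, schedule, host, and credentials are placeholders rather than a tested configuration.
# nightly database backup CronJob (placeholders throughout)
kubectl -n <namespace> apply -f - <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 2 * * *"              # every day at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: mysql:8           # placeholder image that provides mysqldump
            command: ["/bin/sh", "-c"]
            args:
            - mysqldump --host=<db-host> --user=<username> --password=<password> my_database > /backup/my_database_dump.sql
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: <username>-volume
EOF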
Accessing object storage externally from a local machine using Cyberduck
Cyberduck is a free server and cloud storage browser that can be used to access the public S3 endpoint provided by Anvil.
- Launch Cyberduck
- Click + Open Connection at the top of the UI.
- Select S3 from the dropdown menu
- Fill in Server, Access Key ID and Secret Access Key fields
- Click Connect
- You can now right click to bring up a menu of actions that can be performed against the storage endpoint
Further information about using Cyberduck can be found on the Cyberduck documentation site.
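The same endpoint also works with the command-line S3 clients mentioned earlier. As an example with the AWS CLI, where the endpoint URL and bucket name are placeholders to substitute with the values provided by Anvil:
# list buckets on the Anvil S3 endpoint (placeholder URL)
aws --endpoint-url https://<s3-endpoint> s3 ls
# upload a local file to a bucket
aws --endpoint-url https://<s3-endpoint> s3 cp my_database_dump.sql s3://<bucket>/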