Storage
Storage provides data persistence across container deployments and comes in a few options on Anvil.
The Ceph software provides block, filesystem, and object storage on the Anvil composable cluster. File storage provides an interface to access data in a file and folder hierarchy, similar to NTFS or NFS. Block storage is a flexible type of storage that supports snapshotting and is well suited for database workloads and generic container storage. Object storage is ideal for large unstructured data and features a REST-based API providing an S3-compatible endpoint that can be used with the preexisting ecosystem of S3 client tools.
Storage Classes
Anvil Composable provides two different storage classes based on the access characteristics of workloads.
- anvil-block - Block storage based on SSDs that can be accessed by a single node (Single-Node Read/Write).
- anvil-filesystem - File storage based on SSDs that can be accessed by multiple nodes (Many-Node Read/Write or Many-Node Read-Only).
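If you are working from the command line, the storage classes available on the cluster can be listed with kubectl (assuming you have a kubeconfig for the cluster):
# list the storage classes and their provisioners
kubectl get storageclass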
Provisioning Block and Filesystem Storage for use in deployments
Block and Filesystem storage can both be provisioned in a similar way.
- While deploying a Workload, select the Volumes drop down and click Add Volume…
- Select “Add a new persistent volume (claim)”
- Set a unique volume name, e.g. “<username>-volume”
- Select a Storage Class. The default storage class is Ceph for this Kubernetes cluster.
- Request an amount of storage in Gigabytes
- Click Define
- Provide a Mount Point for the persistent volume, e.g. /data
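The same claim can also be created directly with kubectl instead of the Rancher UI. The following is a minimal sketch, assuming the anvil-block storage class and a 10 gigabyte request; adjust the name, namespace, and size to your needs.
# create an equivalent persistent volume claim from the command line
kubectl -n <namespace> apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <username>-volume
spec:
  accessModes:
    - ReadWriteOnce        # Single-Node Read/Write, matching anvil-block
  storageClassName: anvil-block
  resources:
    requests:
      storage: 10Gi
EOF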
Backup Strategies
Developers using the Anvil Composable Platform should have a backup strategy in place to ensure that their data is safe and can be recovered in case of a disaster. Below is a list of methods that can be used to back up data on Persistent Volume Claims.
Copying Files to and from a Container
The kubectl cp command can be used to copy files into or out of a running container.
# get pod id you want to copy to/from
kubectl -n <namespace> get pods
# copy a file from local filesystem to remote pod
kubectl cp /tmp/myfile <namespace>/<pod>:/tmp/myfile
# copy a file from remote pod to local filesystem
kubectl cp <namespace>/<pod>:/tmp/myfile /tmp/myfile
This method requires the tar executable to be present in your container, which is usually the case with Linux images. More info can be found in the kubectl docs.
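Because kubectl cp relies on tar, the equivalent transfer can also be run explicitly through kubectl exec; a sketch, again assuming tar exists in the container:
# stream a tar archive out of the pod and unpack it locally
kubectl -n <namespace> exec <pod> -- tar cf - /tmp/mydir | tar xf - -C /local/destination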
Copying Directories from a Container
The kubectl cp command can also be used to recursively copy entire directories to local storage or places like Data Depot.
# get pod id you want to copy to/from
kubectl -n <namespace> get pods
# copy a directory from remote pod to local filesystem
kubectl cp <namespace>/<pod>:/pvcdirectory /localstorage
Backing up a Database from a Container
The kubectl exec command can be used to create a backup or dump of a database and save it to a local directory. For instance, to back up a MySQL database with kubectl, run the following commands from a local workstation or cluster frontend.
# get pod id of your database pod
kubectl -n <namespace> get pods
# run mysqldump in the remote pod and redirect the output to local storage
kubectl -n <namespace> exec <pod> -- mysqldump --user=<username> --password=<password> my_database > my_database_dump.sql
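To restore from such a dump, the direction can be reversed by streaming the file into mysql inside the pod; a sketch assuming the same database name and credentials:
# restore the dump by streaming it back into the database pod
kubectl -n <namespace> exec -i <pod> -- mysql --user=<username> --password=<password> my_database < my_database_dump.sql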
Backups using common Linux tools
If your container has the OpenSSH client or rsync packages installed, you can use the kubectl exec command to copy or synchronize data to another storage location.
# get pod id of your pod
kubectl -n <namespace> get pods
# run scp to transfer data from the pod to a remote storage location
kubectl -n <namespace> exec <pod> -- scp -r /data username@anvil.rcac.purdue.edu:~/backup
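rsync can be used in the same way and only transfers changed files on repeated runs, which suits scheduled backups; a sketch, assuming rsync is installed in the container and non-interactive SSH authentication (e.g. a key) is configured:
# synchronize /data to remote storage; repeat runs copy only changes
kubectl -n <namespace> exec <pod> -- rsync -av /data username@anvil.rcac.purdue.edu:~/backup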
Automating Backups
Kubernetes CronJob resources can be used with the commands above to create an automated backup solution. For more information, refer to the Kubernetes documentation.
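As an illustration, a CronJob that runs the earlier mysqldump nightly and writes the result to a persistent volume might look like the sketch below; the image, schedule, host, and credentials are placeholders rather than a tested configuration.
# nightly database backup CronJob (placeholders throughout)
kubectl -n <namespace> apply -f - <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 2 * * *"              # every day at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: mysql:8           # placeholder image that provides mysqldump
            command: ["/bin/sh", "-c"]
            args:
            - mysqldump --host=<db-host> --user=<username> --password=<password> my_database > /backup/my_database_dump.sql
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: <username>-volume
EOF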
Accessing object storage externally from a local machine using Cyberduck
Cyberduck is a free server and cloud storage browser that can be used to access the public S3 endpoint provided by Anvil.
- Launch Cyberduck
- Click + Open Connection at the top of the UI.
- Select S3 from the dropdown menu
- Fill in Server, Access Key ID and Secret Access Key fields
- Click Connect
- You can now right click to bring up a menu of actions that can be performed against the storage endpoint
Further information about using Cyberduck can be found on the Cyberduck documentation site.
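The same endpoint also works with the command-line S3 clients mentioned earlier. As an example with the AWS CLI, where the endpoint URL and bucket name are placeholders to substitute with the values provided by Anvil:
# list buckets on the Anvil S3 endpoint (placeholder URL)
aws --endpoint-url https://<s3-endpoint> s3 ls
# upload a local file to a bucket
aws --endpoint-url https://<s3-endpoint> s3 cp my_database_dump.sql s3://<bucket>/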