Transferring Data from a VM to External Cloud Storage
Introduction
This guide explains how to transfer data from a GPU virtual machine (VM) running Ubuntu or Debian to external storage, either a cloud storage service or your own hardware. It also shows how to export your Python and Conda environment definitions, so that the software setup behind your AI/ML work can be recreated alongside your backed-up data.
Prerequisites
- Access to a GPU VM running Ubuntu or Debian.
- Sufficient permissions to install software and execute commands on the VM.
- Access to the destination storage solution (cloud storage credentials or physical storage device).
Step 1: Preparing the VM
Ensure your VM is up to date:
sudo apt update && sudo apt upgrade -y
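Install the utilities this guide relies on, if they are not already present: rsync for copying to other machines or drives, plus curl and unzip, which the AWS CLI installation below requires:
sudo apt install -y rsync curl unzip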
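The cloud examples below do not use rsync, but it covers the personal-hardware case from the introduction: it can copy to a mounted external drive or, over SSH, to a machine you control. The following is a minimal sketch, assuming the destination is either a local mount point or a user@host:/path target reachable over SSH. The function name copy_with_rsync and the example paths are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a directory with rsync.
#   -a                 preserve permissions, timestamps and symlinks
#   -z                 compress data in transit (useful for remote targets)
#   --info=progress2   show overall transfer progress
# DEST may be a local path (e.g. a mounted drive) or user@host:/path.
# Any extra arguments (e.g. --dry-run) are passed through to rsync.
copy_with_rsync() {
  local src="$1" dest="$2"
  shift 2
  # A trailing slash on the source copies the directory's contents
  # into DEST rather than creating DEST/<dirname>.
  rsync -az --info=progress2 "$@" "${src%/}/" "$dest"
}

# Example usage (placeholders, adjust to your setup):
#   copy_with_rsync /path/to/your/data /media/backup/data --dry-run
#   copy_with_rsync /path/to/your/data user@your-host:/backups/data
```

Running with --dry-run first lists what would be copied without writing anything. Because rsync skips files that are already up to date, you can re-run the same command to resume an interrupted copy.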
Step 2: Backing Up Your Environment
To back up your Python or Conda environment, create an environment file:
For Conda environments:
conda env export > environment.yml
For Python virtual environments (run this with the environment activated, so that pip lists that environment's packages):
pip freeze > requirements.txt
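Include this file with your data backup. To recreate the environment later, run conda env create -f environment.yml for Conda, or pip install -r requirements.txt inside a new virtual environment.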
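If you back up regularly, the two export commands above can be wrapped in a function that writes whichever file applies to the currently active environment. This is a sketch that relies on two standard variables: conda activate sets CONDA_DEFAULT_ENV, and a virtual environment's activate script sets VIRTUAL_ENV. The function name export_env_files and the output directory are hypothetical. The --no-builds flag omits platform-specific build strings, which tends to make environment.yml easier to recreate on a different machine.

```bash
#!/usr/bin/env bash
# Hypothetical helper: export environment files for whatever is active.
# Usage: export_env_files [OUTPUT_DIR]   (defaults to the current directory)
export_env_files() {
  local out_dir="${1:-.}"

  # Nothing to do if neither a Conda env nor a venv is active.
  if [ -z "${CONDA_DEFAULT_ENV:-}" ] && [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "no active Conda or virtual environment found" >&2
    return 1
  fi
  mkdir -p "$out_dir"

  if [ -n "${CONDA_DEFAULT_ENV:-}" ] && command -v conda >/dev/null 2>&1; then
    # --no-builds drops platform-specific build strings.
    conda env export --no-builds > "$out_dir/environment.yml"
    echo "wrote $out_dir/environment.yml"
  fi

  if [ -n "${VIRTUAL_ENV:-}" ]; then
    # With the venv active, pip on PATH is the venv's own pip.
    pip freeze > "$out_dir/requirements.txt"
    echo "wrote $out_dir/requirements.txt"
  fi
}

# Example usage (placeholder directory):
#   export_env_files /path/to/your/data/env
```

Writing the files into the data directory you are about to transfer means they travel with the backup automatically.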
Step 3: Transferring Data to External Storage
To Cloud Storage
For cloud storage services such as AWS S3, Google Cloud Storage, or Azure Blob Storage, first install and configure the provider's CLI on your VM. The examples below cover AWS S3 and Google Cloud Storage.
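Before copying anything, it can save time to confirm that the CLI for your provider is both installed and authenticated. The sketch below checks for the aws and gcloud commands. Where a command is found, it makes a read-only identity call: aws sts get-caller-identity, or gcloud auth list filtered to the active account. The function name check_cloud_clis is hypothetical. A failed check usually means the CLI still needs aws configure or gcloud init, both shown in the examples that follow.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether each cloud CLI is installed and
# has usable credentials. Read-only; nothing is uploaded or changed.
check_cloud_clis() {
  if command -v aws >/dev/null 2>&1; then
    # Fails (non-zero exit) when no valid credentials are configured.
    if aws sts get-caller-identity >/dev/null 2>&1; then
      echo "aws: installed, credentials OK"
    else
      echo "aws: installed, run 'aws configure'"
    fi
  else
    echo "aws: not installed"
  fi

  if command -v gcloud >/dev/null 2>&1; then
    # Prints the active account, or nothing if none is logged in.
    local account
    account=$(gcloud auth list --filter=status:ACTIVE --format="value(account)" 2>/dev/null)
    if [ -n "$account" ]; then
      echo "gcloud: installed, active account $account"
    else
      echo "gcloud: installed, run 'gcloud init'"
    fi
  else
    echo "gcloud: not installed"
  fi
}
```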
AWS S3 Example
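- Install the AWS CLI (the commands below are for x86_64 VMs; on ARM64 VMs, download awscli-exe-linux-aarch64.zip instead):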
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
- Configure AWS CLI with your credentials:
aws configure
- Copy data to S3:
aws s3 cp /path/to/your/data s3://your-bucket-name/your-folder/ --recursive
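For large or repeated transfers, aws s3 sync is often more convenient than aws s3 cp --recursive. It copies only files that are new or have changed, so an interrupted upload can simply be re-run. The following is a minimal wrapper, assuming the AWS CLI is configured as above. The function name upload_to_s3 is hypothetical, and the bucket and folder names are placeholders. Passing --dryrun lists what would be copied without uploading anything.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mirror a local directory to s3://BUCKET/PREFIX/.
# Extra arguments (e.g. --dryrun) are passed through to `aws s3 sync`.
upload_to_s3() {
  local src="$1" bucket="$2" prefix="$3"
  shift 3
  if [ ! -d "$src" ]; then
    echo "source directory not found: $src" >&2
    return 1
  fi
  # sync skips files already uploaded unchanged, so re-running after
  # an interruption only sends what is still missing or modified.
  aws s3 sync "$src" "s3://${bucket}/${prefix%/}/" "$@"
}

# Example usage (placeholders):
#   upload_to_s3 /path/to/your/data your-bucket-name your-folder --dryrun
#   upload_to_s3 /path/to/your/data your-bucket-name your-folder
```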
Google Cloud Storage Example
- Install and initialize the Google Cloud SDK (the installer is interactive; exec -l $SHELL restarts your shell so that gcloud is on your PATH):
curl https://sdk.cloud.google.com | bash
exec -l $SHELL
gcloud init
- Copy data to GCS:
gsutil cp -r /path/to/your/data gs://your-bucket-name/your-folder/
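For repeated uploads to Cloud Storage, gsutil rsync -r is an alternative to gsutil cp -r that copies only new or changed files. The top-level -m option runs operations in parallel, which helps when there are many small files. The following is a sketch under the same assumptions as the example above: upload_to_gcs is a hypothetical name, and the bucket and folder are placeholders. The rsync option -n performs a dry run. Newer Cloud SDK releases also provide gcloud storage commands, which Google now recommends over gsutil.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mirror a local directory to gs://BUCKET/PREFIX.
# Extra arguments (e.g. -n for a dry run) are passed to `gsutil rsync`
# and must come before the source and destination, as done below.
upload_to_gcs() {
  local src="$1" bucket="$2" prefix="$3"
  shift 3
  if [ ! -d "$src" ]; then
    echo "source directory not found: $src" >&2
    return 1
  fi
  # -m: parallel operations; rsync -r: recurse and copy only new/changed files.
  gsutil -m rsync -r "$@" "$src" "gs://${bucket}/${prefix%/}"
}

# Example usage (placeholders):
#   upload_to_gcs /path/to/your/data your-bucket-name your-folder -n
#   upload_to_gcs /path/to/your/data your-bucket-name your-folder
```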
Step 4: Verifying the Transfer
After the transfer, verify that every file arrived intact. Comparing file counts and total sizes against the source catches missing or truncated files. Comparing checksums (for example SHA-256) also detects files whose size matches but whose content was corrupted.
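One provider-independent way to compare checksums is to record a SHA-256 manifest before uploading. You then check the manifest against a copy downloaded back from the destination, for example with aws s3 cp --recursive or gsutil cp -r in the reverse direction. The function names and the manifest file name below are hypothetical. sha256sum is part of GNU coreutils, which Ubuntu and Debian ship by default.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checksum-based transfer verification.

# Write one SHA-256 line per file under DIR, with paths relative to DIR.
# Keep MANIFEST outside DIR, otherwise it would list itself.
make_manifest() {
  local dir="$1" manifest="$2"
  ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest"
}

# Re-hash every file listed in MANIFEST inside DIR (e.g. a downloaded copy).
# --quiet prints only failures; the return code is non-zero on any
# mismatch or missing file.
verify_manifest() {
  local dir="$1" manifest="$2"
  ( cd "$dir" && sha256sum --check --quiet ) < "$manifest"
}

# Example usage (placeholders):
#   make_manifest /path/to/your/data ~/manifest.sha256      # before upload
#   verify_manifest /scratch/restored-data ~/manifest.sha256  # after download
```

Uploading the manifest alongside the data also lets you verify a restore months later, without needing the original VM.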