Although the cluster's default instance profile grants full S3 access, it did not seem to work as intended (probably because of rate limiting on the IMDS endpoint), so I ended up passing my local AWS credentials as environment variables.
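As a minimal sketch of passing local AWS credentials as environment variables, the snippet below collects the standard AWS SDK variables from the local shell and wraps them in a Ray-style `runtime_env` dictionary. The helper names `collect_aws_env` and `build_runtime_env` are hypothetical and are not part of this project; how the credentials were actually forwarded here is not shown in the source, so treat this as one possible approach.

```python
import os

# Standard environment variable names read by the AWS SDKs.
AWS_ENV_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def collect_aws_env(environ=None):
    """Return the AWS credential variables that are set in `environ`.

    Hypothetical helper: the access key ID and secret key are required,
    the session token is optional.
    """
    environ = os.environ if environ is None else environ
    found = {key: environ[key] for key in AWS_ENV_KEYS if environ.get(key)}
    missing = [key for key in AWS_ENV_KEYS[:2] if key not in found]
    if missing:
        raise RuntimeError(f"missing AWS credentials: {', '.join(missing)}")
    return found


def build_runtime_env(environ=None):
    """Hypothetical helper: wrap the credentials in a Ray-style runtime_env dict."""
    return {"env_vars": collect_aws_env(environ)}
```

Under these assumptions, the resulting dictionary could be handed to `ray.init(runtime_env=...)` or serialized to JSON for `ray job submit --runtime-env-json`. Avoid writing the values into a committed configuration file.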
Usage
Cluster creation
Job submission
Note
Image shards are saved to the datacomp-small AWS S3 bucket, specified with the --data_dir option.
Cluster deletion
Configuration
Sample cluster.yml
Obscure details
When --data_dir points to cloud storage such as S3, a local --metadata_dir must also be specified, because the downloader script doesn't support saving metadata to cloud storage.
The last pip install in the setup_commands section is needed for compatibility with AWS S3, because the required libraries aren't included in the conda environment file.
There is no need to provide additional AWS credentials if the destination bucket is in the same account as the cluster, because the cluster already has full S3 access through an instance profile.
The Python version in environment.yml must match the Python version of the Ray cluster; make sure that docker.image in cluster.yaml has exactly the same version as the environment.yml from this project.
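The Python version requirement above can be checked before launching the cluster. The sketch below is a hedged example, not part of this project: it assumes environment.yml pins Python with a line such as `- python=3.10`, and that the Docker image tag carries a `-py310`-style suffix as the rayproject/ray images do. The function names and the sample image tag are illustrative.

```python
import re


def conda_python_version(environment_yml_text):
    """Extract (major, minor) from a `- python=3.10`-style dependency line."""
    match = re.search(
        r"^\s*-\s*python\s*=+\s*(\d+)\.(\d+)", environment_yml_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no pinned python dependency found")
    return int(match.group(1)), int(match.group(2))


def image_python_version(docker_image):
    """Extract (major, minor) from a tag suffix such as `-py310`."""
    tag = docker_image.rsplit(":", 1)[-1]
    match = re.search(r"py(\d)(\d+)", tag)
    if match is None:
        raise ValueError("no pyXY suffix found in image tag")
    return int(match.group(1)), int(match.group(2))


def versions_match(environment_yml_text, docker_image):
    """True when both sources name the same major.minor Python version."""
    return conda_python_version(environment_yml_text) == image_python_version(docker_image)


# Example with made-up inputs:
env_text = "dependencies:\n  - python=3.10\n  - pip\n"
print(versions_match(env_text, "rayproject/ray:2.9.0-py310"))
```

This compares only the major and minor version, since the tag suffix does not carry a patch number. If an exact patch-level match is needed, check the interpreter inside the image directly.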