The gsutil cp command allows you to copy data between your local file system and the cloud, within the cloud, and between cloud storage providers. For example, to upload all text files from the local directory to a bucket, you can run:
gsutil cp *.txt gs://my-bucket
You can also download data from a bucket. The following command downloads all text files from the top-level of a bucket to your current directory:
gsutil cp gs://my-bucket/*.txt .
You can use the -n option to prevent overwriting the content of existing files. The following example downloads text files from a bucket without clobbering the data in your directory:
gsutil cp -n gs://my-bucket/*.txt .
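gsutil's -n option mirrors the no-clobber behavior of POSIX cp -n; a minimal local sketch of the same semantics, using hypothetical file names (note that recent GNU coreutils versions make cp -n exit nonzero when it skips):

```shell
echo "original" > notes.txt            # an existing file we do not want overwritten
echo "incoming" > download.txt         # stands in for a newly downloaded object
cp -n download.txt notes.txt || true   # -n skips the copy because notes.txt exists
cat notes.txt                          # notes.txt keeps its original content
```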
Use the -r option to copy an entire directory tree. For example, to upload the directory tree dir:
gsutil cp -r dir gs://my-bucket
If you have a large number of files to transfer, you can perform a parallel multi-threaded/multi-processing copy using the top-level gsutil -m option (see gsutil help options):
gsutil -m cp -r dir gs://my-bucket
You can use the -I option with stdin to specify a list of URLs to copy, one per line. This allows you to use gsutil in a pipeline to upload or download objects as generated by a program:
cat filelist | gsutil -m cp -I gs://my-bucket
or:
cat filelist | gsutil -m cp -I ./download_dir
where the output of cat filelist is a list of files, cloud URLs, and wildcards of files and cloud URLs.
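As a concrete sketch of such a pipeline (the directory and file names below are hypothetical), you could generate the list with find and, on a machine where gsutil is configured, pipe it straight into the parallel copy:

```shell
mkdir -p reports
touch reports/a.csv reports/b.csv
find reports -name '*.csv' > filelist   # one path per line, as cp -I expects
cat filelist
# With gsutil configured, the upload would then be:
#   cat filelist | gsutil -m cp -I gs://my-bucket
```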
Copying To/From Subdirectories; Distributing Transfers Across Machines
You can use gsutil to copy to and from subdirectories by using a command like this:
gsutil cp -r dir gs://my-bucket/data
This causes dir and all of its files and nested subdirectories to be copied under the specified destination, resulting in objects with names like gs://my-bucket/data/dir/a/b/c. Similarly, you can download from bucket subdirectories using the following command:
gsutil cp -r gs://my-bucket/data dir
This causes everything nested under gs://my-bucket/data to be downloaded into dir, resulting in files with names like dir/data/a/b/c.
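The resulting names follow the same nesting rule as POSIX cp -r, where the source directory itself lands under the destination; a local sketch of that naming behavior (paths are hypothetical):

```shell
mkdir -p data/a/b && echo payload > data/a/b/c   # stand-in for objects under gs://my-bucket/data
mkdir -p dir
cp -r data dir        # analogous to: gsutil cp -r gs://my-bucket/data dir
ls dir/data/a/b/c     # the tree is recreated under dir/data/...
```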
Copying subdirectories is useful if you want to add data to an existing bucket directory structure over time. It’s also useful if you want to parallelize uploads and downloads across multiple machines (potentially reducing overall transfer time compared with running gsutil -m cp on one machine). For example, if your bucket contains this structure:
gs://my-bucket/data/result_set_01/
gs://my-bucket/data/result_set_02/
...
gs://my-bucket/data/result_set_99/
you can perform concurrent downloads across 3 machines by running these commands on each machine, respectively:
gsutil -m cp -r gs://my-bucket/data/result_set_[0-3]* dir
gsutil -m cp -r gs://my-bucket/data/result_set_[4-6]* dir
gsutil -m cp -r gs://my-bucket/data/result_set_[7-9]* dir
Note that dir could be a local directory on each machine, or a directory mounted off of a shared file server. The performance of the latter depends on several factors, so we recommend experimenting to find out what works best for your computing environment.
Copying In The Cloud And Metadata Preservation
When copying in the cloud, if the destination bucket has Object Versioning enabled, by default gsutil cp copies only live versions of the source object. For example, the following command causes only the single live version of gs://bucket1/obj to be copied to gs://bucket2, even if there are noncurrent versions of gs://bucket1/obj:
gsutil cp gs://bucket1/obj gs://bucket2
To also copy noncurrent versions, use the -A flag:
gsutil cp -A gs://bucket1/obj gs://bucket2
The top-level gsutil -m flag is not allowed when using the cp -A flag.