The gsutil cp command allows you to copy data between your local file system and the cloud, within the cloud, and between cloud storage providers. For example, to upload all text files from the local directory to a bucket, you can run:
gsutil cp *.txt gs://my-bucket
You can also download data from a bucket. The following command downloads all text files from the top-level of a bucket to your current directory:
gsutil cp gs://my-bucket/*.txt .
You can use the -n option to prevent overwriting the content of existing files. The following example downloads text files from a bucket without clobbering the data in your directory:
gsutil cp -n gs://my-bucket/*.txt .
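gsutil's -n option mirrors the no-clobber behavior of POSIX cp -n; a minimal local sketch of the same semantics, using hypothetical file names (note that recent GNU coreutils versions make cp -n exit nonzero when it skips):

```shell
echo "original" > notes.txt            # an existing file we do not want overwritten
echo "incoming" > download.txt         # stands in for a newly downloaded object
cp -n download.txt notes.txt || true   # -n skips the copy because notes.txt exists
cat notes.txt                          # notes.txt keeps its original content
```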
Use the -r option to copy an entire directory tree. For example, to upload the directory tree dir:
gsutil cp -r dir gs://my-bucket
If you have a large number of files to transfer, you can perform a parallel multi-threaded/multi-processing copy using the top-level gsutil -m option (see gsutil help options):
gsutil -m cp -r dir gs://my-bucket
You can use the -I option with stdin to specify a list of URLs to copy, one per line. This allows you to use gsutil in a pipeline to upload or download objects as generated by a program:
cat filelist | gsutil -m cp -I gs://my-bucket
or:
cat filelist | gsutil -m cp -I ./download_dir
where the output of cat filelist is a list of files, cloud URLs, and wildcards of files and cloud URLs.
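As a concrete sketch of such a pipeline (the directory and file names below are hypothetical), you could generate the list with find and, on a machine where gsutil is configured, pipe it straight into the parallel copy:

```shell
mkdir -p reports
touch reports/a.csv reports/b.csv
find reports -name '*.csv' > filelist   # one path per line, as cp -I expects
cat filelist
# With gsutil configured, the upload would then be:
#   cat filelist | gsutil -m cp -I gs://my-bucket
```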
Copying To/From Subdirectories; Distributing Transfers Across Machines
You can use gsutil to copy to and from subdirectories by using a command like this:
gsutil cp -r dir gs://my-bucket/data
This causes dir and all of its files and nested subdirectories to be copied under the specified destination, resulting in objects with names like gs://my-bucket/data/dir/a/b/c. Similarly, you can download from bucket subdirectories using the following command:
gsutil cp -r gs://my-bucket/data dir
This causes everything nested under gs://my-bucket/data to be downloaded into dir, resulting in files with names like dir/data/a/b/c.
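The resulting names follow the same nesting rule as POSIX cp -r, where the source directory itself lands under the destination; a local sketch of that naming behavior (paths are hypothetical):

```shell
mkdir -p data/a/b && echo payload > data/a/b/c   # stand-in for objects under gs://my-bucket/data
mkdir -p dir
cp -r data dir        # analogous to: gsutil cp -r gs://my-bucket/data dir
ls dir/data/a/b/c     # the tree is recreated under dir/data/...
```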
Copying subdirectories is useful if you want to add data to an existing bucket directory structure over time. It’s also useful if you want to parallelize uploads and downloads across multiple machines (potentially reducing overall transfer time compared with running gsutil -m cp on one machine). For example, if your bucket contains this structure:
gs://my-bucket/data/result_set_01/
gs://my-bucket/data/result_set_02/
...
gs://my-bucket/data/result_set_99/
you can perform concurrent downloads across 3 machines by running these commands on each machine, respectively:
gsutil -m cp -r gs://my-bucket/data/result_set_[0-3]* dir
gsutil -m cp -r gs://my-bucket/data/result_set_[4-6]* dir
gsutil -m cp -r gs://my-bucket/data/result_set_[7-9]* dir
Note that dir could be a local directory on each machine, or a directory mounted off of a shared file server. The performance of the latter depends on several factors, so we recommend experimenting to find out what works best for your computing environment.
Copying In The Cloud And Metadata Preservation
When copying in the cloud, if the destination bucket has Object Versioning enabled, by default gsutil cp copies only live versions of the source object. For example, the following command causes only the single live version of gs://bucket1/obj to be copied to gs://bucket2, even if there are noncurrent versions of gs://bucket1/obj:
gsutil cp gs://bucket1/obj gs://bucket2
To also copy noncurrent versions, use the -A flag:
gsutil cp -A gs://bucket1/obj gs://bucket2
The top-level gsutil -m flag is not allowed when using the cp -A flag.