Restore & Download a Directory from S3 Glacier

Here's a set of steps to get your files restored and downloaded from AWS S3 Glacier.

First, which Glacier?

Amazon has a couple of Glaciers. One's the original standalone service, called Amazon Glacier. You store things there in Vaults, through its own separate API. It's legacy, and you shouldn't use it. The other Glacier is in S3: a set of storage classes sharing a unified API with the rest of S3. This is the one to use.

Whether something goes into the Glacier in S3 is determined by storage class, set per object. The archival classes are GLACIER and DEEP_ARCHIVE.

You'd put something into GLACIER or DEEP_ARCHIVE storage if you wanted AWS to store it on S3 but store it cheaper per GB, with the tradeoff that you won't access it often. Putting an object into one of these classes means it is not immediately available: it's archived, and requires a restoration before you can access it.
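
You can see this for yourself. As a minimal sketch using the same placeholders as the commands below, where <local_file> is wherever you want the download to land: a direct download of an archived object fails with an InvalidObjectState error until the object has been restored.

aws s3api get-object --bucket <bucket_name> \
  --key <path_to_object> <local_file> --profile <s3username>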

Upload to any Bucket

Because the Glacier is in S3, you can upload as you usually would with the AWS CLI. Your bucket doesn't need to be configured for this. Using the AWS CLI, you add an extra parameter for the storage class on the specific objects you upload:

aws s3 cp <source_dir> s3://<bucket_name>/<path> \
  --storage-class GLACIER --recursive --profile <s3username>

You can also sync, which diffs the source location against the destination and copies only what differs, using the same storage class parameter:

aws s3 sync <source_dir> s3://<bucket_name>/<path> \
  --storage-class GLACIER --profile <s3username>
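
To confirm an upload landed in the class you intended, you can inspect a single object's metadata. A quick sketch, where <path_to_object> is the object's full key; for archived objects, the JSON output includes a StorageClass field reading GLACIER or DEEP_ARCHIVE:

aws s3api head-object --bucket <bucket_name> \
  --key <path_to_object> --profile <s3username>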

Restore Before Use

If you want to access an object you've put into GLACIER-level storage, you'll need to restore it. Restoring the object creates a temporary copy of it in S3 in the STANDARD storage class. You're charged for this copy separately from the original GLACIER object, so restore it for the minimum time period you need in order to save on costs.

To restore a single object, run:

aws s3api restore-object --bucket <bucket_name> \
  --restore-request '{"Days":1,"GlacierJobParameters":{"Tier":"Standard"}}' \
  --profile <s3username> --key <path_to_object>
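
The Standard retrieval tier usually completes within hours. If you can afford to wait longer, the Bulk tier is cheaper per GB retrieved; as a sketch, the request is identical with only the Tier value changed:

aws s3api restore-object --bucket <bucket_name> \
  --restore-request '{"Days":1,"GlacierJobParameters":{"Tier":"Bulk"}}' \
  --profile <s3username> --key <path_to_object>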

To restore multiple objects, you'll have to list them and call restore on each in turn:

aws s3 ls s3://<bucket_name>/<path> --recursive --profile <s3username> | \
  awk '{print substr($0, index($0, $4))}' | \
  xargs -L 1 aws s3api restore-object --bucket <bucket_name> \
    --restore-request '{"Days":1,"GlacierJobParameters":{"Tier":"Standard"}}' \
    --profile <s3username> --key

This command first lists all objects matching the given path. Then awk splits the output, printing only each object's key and dropping the preceding date, time, and size columns. Finally, xargs restores the objects one by one. Note that xargs splits on whitespace, so keys containing spaces will break this pipeline; see the loop variant below.
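
If your keys may contain spaces, a shell while-read loop is a safer sketch of the same idea, using the same placeholders:

aws s3 ls s3://<bucket_name>/<path> --recursive --profile <s3username> | \
  awk '{print substr($0, index($0, $4))}' | \
  while IFS= read -r key; do
    aws s3api restore-object --bucket <bucket_name> --key "$key" \
      --restore-request '{"Days":1,"GlacierJobParameters":{"Tier":"Standard"}}' \
      --profile <s3username>
  done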

Once the command has completed, restoration has only been requested; the restore itself can take hours to days on the AWS side, depending on the storage class and retrieval tier. To check on the status of an object's restoration, run:

aws s3api head-object --bucket <bucket_name> \
  --key <path_to_object> --profile <s3username>

In the JSON output, you're looking for this line to report that the restoration is no longer ongoing (it has finished) and that the temporary copy's expiry date hasn't yet passed:

{
  "Restore": "ongoing-request=\"false\", expiry-date=\"Wed, 28 Jul 2021 00:00:00 GMT\""
}
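
If you'd rather not check by hand, here's a rough sketch that polls until the restore finishes, checking every five minutes. It assumes the head-object call itself succeeds, and simply waits until the ongoing-request flag flips to false:

while ! aws s3api head-object --bucket <bucket_name> \
    --key <path_to_object> --profile <s3username> | \
    grep -q 'ongoing-request=\\"false\\"'; do
  echo "Still restoring, checking again in 5 minutes..."
  sleep 300
done
echo "Restore complete."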

Download an Entire Directory

You can use the cp or sync commands in reverse to download the files from S3 to a local path. When done in bulk, as in a recursive copy of directories, the AWS CLI skips objects still in a Glacier storage class and the transfer fails with a warning. You need to restore first (or it will still fail), and then you also need to run your command with the flag --force-glacier-transfer, which tells the CLI to attempt the transfer regardless of the objects' storage class, like so:

aws s3 sync s3://<bucket_name>/<path> <destination_path> \
  --force-glacier-transfer --profile <s3username>
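
For a one-shot copy instead of a sync, cp with --recursive takes the same flag:

aws s3 cp s3://<bucket_name>/<path> <destination_path> \
  --recursive --force-glacier-transfer --profile <s3username>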

And there you have it. You have round-tripped your data -- to the glacier and back again, a cold object's tale.

How else do you handle the restoration and download of your AWS S3 Glacier data?