JetS3t Synchronize
Synchronize is a console (text mode) Java application for synchronizing directories on a computer with an Amazon S3 account.
It is freely available as part of the JetS3t project which provides applications and a Java toolkit for Amazon's S3.
Synchronize offers the following capabilities:
- Upload a directory and all its contents to S3
- Download a directory and all its contents from S3
- Automatically compress (gzip) and/or encrypt files sent to S3
- Sophisticated file comparisons used to determine whether files have changed, so only new or changed files are transferred
- Upload any number of files and/or directories at one time
- Access Control List permissions of uploaded files can be set to PRIVATE, PUBLIC_READ or PUBLIC_READ_WRITE
- When uploading files, specific file/directory paths can be ignored using .jets3t-ignore settings files
Getting Started
Synchronize can be run from the command line using the scripts included in the bin directory of the JetS3t distribution.
For Windows computers use the script synchronize.bat.
For Unixy computers, use the script synchronize.sh.
Files are copied to S3 with an UP(load) operation, and copied from S3 with a DOWN(load) operation. By default, files are transferred only if they are new or have changed relative to the destination.
Properties File
Synchronize requires a Java properties file named synchronize.properties to be available in the classpath. This properties file must include at least the properties accesskey and secretkey, and may also require the property password if you use the crypto option. This file is a plain text file that will look something like this:
####################################
# Synchronize application properties
#
# This file must be available on the
# classpath when Synchronize is run
####################################

# AWS Access Key (required)
accesskey=<YourAWSAccessKey>

# AWS Secret Key (required)
secretkey=<YourAWSSecretKey>

# Access Control List setting to apply to uploads, must be one of: PRIVATE, PUBLIC_READ, PUBLIC_READ_WRITE
# The ACL setting defaults to PRIVATE if this setting is missing.
acl=PRIVATE

# Password used when encrypting/decrypting files. Only required when the --crypto setting is used.
# password=
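The requirements described above can be checked before running Synchronize. The following is a minimal sketch for Unix-like shells and is not part of JetS3t: the function name check_properties is hypothetical, and it assumes each property is written as key=value on its own line, as in the sample file.

```shell
#!/bin/sh
# check_properties: hypothetical helper, not part of JetS3t.
# Usage: check_properties <properties-file> [crypto]
# Returns 0 if every required property has a non-empty value.
check_properties() {
    props_file="$1"
    required="accesskey secretkey"
    # The password is only needed when the crypto option will be used.
    if [ "${2:-}" = "crypto" ]; then
        required="$required password"
    fi

    missing=0
    for key in $required; do
        # Look for an uncommented "key=value" line with a non-empty value.
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*[^[:space:]]" "$props_file"; then
            echo "Missing or empty property: $key" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_properties synchronize.properties crypto would report a missing password while the password line is still commented out.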
We use a properties file to store this information as we believe it will help keep your passwords secret. Storing them in a properties file is slightly more secure than typing them on the command line, and depending on the operating system you use, it should be possible to set the access permissions on this file so that no one but you can read it.
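On Unix-like systems, restricting the file's permissions might look like the following sketch. The helper name restrict_properties_file is hypothetical. The chmod mode 600 gives the file's owner read and write access and removes all access for other users. Windows uses a different permissions mechanism that is not covered here.

```shell
#!/bin/sh
# restrict_properties_file: hypothetical helper, not part of JetS3t.
# Usage: restrict_properties_file <properties-file>
restrict_properties_file() {
    props_file="$1"
    if [ ! -f "$props_file" ]; then
        echo "Properties file not found: $props_file" >&2
        return 1
    fi
    # 600 = read/write for the owner, no access for group or others.
    chmod 600 "$props_file"
}
```

After running restrict_properties_file synchronize.properties, ls -l should show the file's mode as -rw-------.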
Usage Instructions
To view Synchronize's usage instructions, run synchronize.sh --help. These instructions describe the command-line parameters required by Synchronize and the options available. For example commands, see the Examples section below.
Usage: Synchronize [options] UP <S3Path> <File/Directory> (<File/Directory>...)
or: Synchronize [options] DOWN <S3Path> <DownloadDirectory>
UP : Synchronize the contents of the Local Directory with S3.
DOWN : Synchronize the contents of S3 with the Local Directory
S3Path : A path to the resource in S3. This must include at least the
bucket name, but may also specify a path inside the bucket.
E.g. /Backups/Documents/20060623
File/Directory : A file or directory on your computer to upload
DownloadDirectory : A directory on your computer where downloaded files
will be stored
A property file with the name 'synchronize.properties' must be available in the
classpath and contains the following properties:
accesskey : Your AWS Access Key (Required)
secretkey : Your AWS Secret Key (Required)
password : Encryption password (only required when using crypto)
acl : ACL permissions for uploads (optional)
For more help : Synchronize --help
Options
-------
-h | --help
Displays this help message.
-n | --noaction
No action taken. No files will be changed locally or on S3, instead
a report will be generated showing what will happen if the command
is run without the -n option.
-q | --quiet
Runs quietly, without reporting on each action performed or displaying
progress messages. The summary is still displayed.
-p | --noprogress
Runs somewhat quietly, without displaying progress messages.
The action report and overall summary are still displayed.
-f | --force
Force tool to perform synchronization even when files are up-to-date.
This may be useful if you need to update metadata or timestamps in S3.
-k | --keepfiles
Keep outdated files on destination instead of reverting/removing them.
This option cannot be used with --nodelete.
-d | --nodelete
Keep files on destination that have been removed from the source. This
option is similar to --keepfiles except that files may be reverted.
This option cannot be used with --keepfiles.
-g | --gzip
Compress (GZip) files when backing up and Decompress gzipped files
when restoring.
-c | --crypto
Encrypt files when backing up and decrypt encrypted files when restoring. If
this option is specified the properties must contain a password.
--properties <filename>
Load the synchronizer app properties from the given file instead of from
a synchronize.properties file in the classpath.
--acl <ACL string>
Specifies the Access Control List setting to apply. This value must be one
of: PRIVATE, PUBLIC_READ, PUBLIC_READ_WRITE. This setting will override any
acl property specified in the synchronize.properties file
Report
------
Report items are printed on a single line with an action flag followed by
the relative path of the file or S3 object. The flag meanings are...
N: A new file/object will be created
U: An existing file/object has changed and will be updated
D: A file/object existing on the target does not exist on the source and
will be deleted.
d: A file/object existing on the target does not exist on the source but
because the --keepfiles or --nodelete option was set it was not deleted.
R: An existing file/object has changed more recently on the target than on the
source. The target version will be reverted to the older source version
r: An existing file/object has changed more recently on the target than on the
source but because the --keepfiles option was set it was not reverted.
-: The file is identical locally and in S3, no action is necessary.
F: A file identical locally and in S3 was updated due to the Force option.
WARNING: Be very careful when restoring files from S3 to a directory that already contains files. By default Synchronize will delete any files in the target directory that are not present in S3, as it is helpfully synchronizing the contents of the directory on your computer with the contents of S3. Mind you, if you do this for the wrong directory you can lose all your files!
For this reason, always test your Synchronize commands with the --noaction option before you run them for real, just to make sure you are not about to do something you will later regret.
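One way to make the --noaction check a habit is to wrap it in a small shell function. This is only a sketch for Unix-like shells: the function name preview_then_sync and the SYNCHRONIZE variable are hypothetical and not part of JetS3t, and the path to synchronize.sh is an assumption you will need to adjust for your installation.

```shell
#!/bin/sh
# Path to the Synchronize run script (assumption: adjust for your installation).
SYNCHRONIZE="${SYNCHRONIZE:-./synchronize.sh}"

# preview_then_sync: hypothetical helper, not part of JetS3t.
# Runs the given Synchronize arguments with --noaction first, then asks
# for confirmation before running the same arguments for real.
preview_then_sync() {
    # Dry run. stdin is redirected so the confirmation answer below is
    # not consumed by the preview.
    "$SYNCHRONIZE" --noaction "$@" < /dev/null || return 1

    printf 'Run this command for real? [y/N] '
    read -r answer
    case "$answer" in
        y|Y) "$SYNCHRONIZE" "$@" ;;
        *)   echo "Aborted; nothing was changed." ;;
    esac
}
```

For example, preview_then_sync DOWN MyBackups NewDocumentsDirectory prints the report first and only performs the download if you answer y.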
Examples
The best way to get the hang of Synchronize is to experiment with the commands on some test files you don't care about, and use the JetS3t Cockpit application to see how uploads are stored in S3.
Before you start, modify the sample properties text file called synchronize.properties located in the configs directory to include your own S3 Access Key and Secret Key settings (see Getting Started above). Use the synchronize run scripts provided in the bin directory to run Synchronize.
Backing up files in S3
Let's say you have two directories containing important files (e.g. Documents and Reports) and you want to back them up to an S3 bucket called MyBackups (note that you should really use a more distinctive bucket name, like <MyAWSAccessKey>.MyBackups):
synchronize.sh UP MyBackups Documents Reports
After you have run this command once on your own computer and uploaded some files, try running the same command again. Synchronize will look at the contents of your S3 account and work out that it already contains your files, so it will not have any work to do.
Now add a file or two to your Documents directory, and perhaps change one of the files as well. This time when you run the command, Synchronize will upload the new and changed files. Alternatively, you can run the command with the --noaction option to make Synchronize tell you what it would do without actually doing the work:
synchronize.sh --noaction UP MyBackups Documents Reports
Restoring files from S3
There are a few cases where you might want to restore a directory from S3. The simplest cases are when none of the files exist on your computer, for example if you have deleted the whole directory by mistake (oops!), or you want to download a copy of this directory to a second computer.
Let's simulate this simple case by downloading your S3 Documents to a new directory:
mkdir NewDocumentsDirectory
synchronize.sh DOWN MyBackups NewDocumentsDirectory
Synchronize will download all the contents of the S3 directory to the new directory name. Like the UPLOAD example above, re-running this command will do nothing as Synchronize will detect that your NewDocumentsDirectory directory has the same contents as the S3 directory.
This same command can restore missing files, such as when you have accidentally deleted a single file in your Documents directory. Try it now: delete one of the files in NewDocumentsDirectory and re-run the command to restore it.
Changed files
Things can get more complicated when you already have files that are in S3, but the files' contents do not match. For example, let's say that you backed up your documents earlier but after doing this one of your documents somehow got corrupted. In this case the default Synchronize DOWNLOAD command above will restore the file by reverting it to the backed-up version from S3.
Warning! To repeat: by default, Synchronize will revert changed files when downloading. That is, it will replace changed files on your computer with older versions from S3. So be careful!
If you want to keep files you have updated and only download files not already on your computer, you can prevent Synchronize from reverting files with the --keepfiles option. Let's say that you have changed a number of files in your NewDocumentsDirectory directory but one of them has been corrupted. You want to revert the one corrupt file but keep the others. To do this, delete the corrupted file then run the following Synchronize command:
synchronize.sh --keepfiles DOWN MyBackups NewDocumentsDirectory
With the --keepfiles option, Synchronize will replace any missing files (i.e. the corrupted one) but leave the changed files alone.
Options, Options
This section describes the Synchronize options in more detail.
noaction: Synchronize will not perform any action, and will not upload or download any files, but it will print reports and summaries as if it was run normally. This option is very useful for checking what actions a Synchronize command will perform before running the command for real.
quiet: Synchronize will only print out a summary of its actions instead of a line-by-line report describing the action taken for each file.
force: Synchronize will upload/download files even when it thinks they have not been changed. You might want to do this if you're worried that Synchronize isn't correctly identifying which files have changed, and you want to force it to update every file.
keepfiles: This option tells Synchronize to keep files that it would otherwise revert or delete. With this option set, files that have been updated on the destination compared to the source directory, which would normally be reverted, will be left alone. Also, files on the destination that have been deleted in the source directory will be left in place, rather than deleted.
Note: The keepfiles option can sometimes be convenient but isn't intended for regular use. In effect it prevents Synchronize from doing its main job, which is to maintain an identical directory structure between your computer and S3. If you have to use it regularly, Synchronize probably isn't the right tool for what you're trying to do.
gzip: Files are compressed to gzip files prior to being uploaded, and are decompressed when being downloaded. Note: if gzipped files are downloaded without this option they will not be decompressed, and will not have any file extension (like .gz) to indicate that they are gzip files. It will be your responsibility to decompress these files.
crypto: Files are encrypted with the password specified in the Properties File's password setting prior to being uploaded, or are decrypted with this password when being downloaded. Note: if encrypted files are downloaded without this option they will not be decrypted, and will not have any file extension to indicate that they are encrypted files. It will be your responsibility to decrypt these files.
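If gzipped files have been downloaded without the gzip option, they can be decompressed by hand. The sketch below covers only the gzip case, not encrypted files. It is not part of JetS3t: the function name decompress_in_place is hypothetical, and it assumes the downloaded file is a standard gzip stream and that the gzip command-line tool is installed. Because the files carry no .gz extension, the data is read from standard input rather than relying on the file name.

```shell
#!/bin/sh
# decompress_in_place: hypothetical helper, not part of JetS3t.
# Usage: decompress_in_place <file>
# Replaces a gzip-compressed file with its decompressed contents.
decompress_in_place() {
    file="$1"
    # gzip -t succeeds only if the input is a valid gzip stream.
    if ! gzip -t < "$file" 2>/dev/null; then
        echo "Not gzip data, left unchanged: $file" >&2
        return 1
    fi
    # Decompress to a temporary file first so a failure cannot damage the original.
    if gzip -dc < "$file" > "$file.tmp"; then
        mv "$file.tmp" "$file"
    else
        rm -f "$file.tmp"
        return 1
    fi
}
```

Note that a file that was already a gzip archive before it was uploaded would also be decompressed by this function, so apply it only to files you know were compressed by Synchronize.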
Notes
Compressing/encrypting uploads: Synchronize will create temporary files when used with any upload options that change the contents of uploaded files, such as compressing or encrypting them. This means that you will need up to twice as much free space in your default temp directory as taken by the files you intend to upload.
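A rough pre-flight check of the space requirement described above might look like the following sketch for Unix-like shells. The function names are hypothetical and not part of JetS3t. Which directory Synchronize treats as the default temp directory is an assumption here: the example uses TMPDIR, falling back to /tmp, which may not match your Java configuration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of JetS3t.

# Total size in KB of the files/directories given as arguments.
upload_size_kb() {
    du -sk "$@" | awk '{ total += $1 } END { print total + 0 }'
}

# Free space in KB on the filesystem that holds the given directory.
free_space_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Usage: check_temp_space <temp-dir> <file-or-directory>...
# Succeeds if the temp directory has at least twice the upload size free,
# which is the worst case stated in the note above.
check_temp_space() {
    temp_dir="$1"
    shift
    needed=$(( $(upload_size_kb "$@") * 2 ))
    available=$(free_space_kb "$temp_dir")
    echo "Worst-case temp space needed: ${needed} KB, free in ${temp_dir}: ${available} KB"
    [ "$available" -ge "$needed" ]
}
```

For example, check_temp_space "${TMPDIR:-/tmp}" Documents Reports could be run before an upload that uses the gzip or crypto option.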