org.jets3t.service.utils
Class FileComparer

java.lang.Object
  extended by org.jets3t.service.utils.FileComparer

public class FileComparer
extends java.lang.Object

File comparison utility to compare files on the local computer with objects present in a service account and determine whether there are any differences. This utility contains methods to build maps of the contents of the local file system or service account for comparison, and methods to find differences in these maps.

File comparisons are based primarily on MD5 hashes of the files' contents. If a local file does not match an object in the service with the same name, this utility determines which of the items is newer by comparing the last modified dates.
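
The hash-first, date-second decision described above can be sketched as follows. This is a minimal illustration, not JetS3t code: the class name, the Outcome enum, and the tie-breaking rule are all assumptions made for the example.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not part of the JetS3t API. */
final class CompareDecisionSketch {
    enum Outcome { IDENTICAL, LOCAL_NEWER, REMOTE_NEWER }

    /**
     * Compares two items by content hash first, and falls back to the
     * last-modified timestamps only when the hashes differ.
     */
    static Outcome compare(byte[] localMd5, long localModifiedMillis,
                           byte[] remoteMd5, long remoteModifiedMillis) {
        if (Arrays.equals(localMd5, remoteMd5)) {
            return Outcome.IDENTICAL;
        }
        // Hashes differ: treat the item with the later timestamp as newer.
        // Reporting a tie as REMOTE_NEWER is an arbitrary choice for this sketch.
        return localModifiedMillis > remoteModifiedMillis
                ? Outcome.LOCAL_NEWER
                : Outcome.REMOTE_NEWER;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {9, 9, 9};
        System.out.println(compare(a, 1000L, a.clone(), 5000L));
        System.out.println(compare(a, 5000L, b, 1000L));
    }
}
```

Checking the hash first means timestamps are only consulted when the contents are already known to differ.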


Nested Class Summary
 class FileComparer.PartialObjectListing
           
 
Constructor Summary
FileComparer(Jets3tProperties jets3tProperties)
          Constructs the class.
 
Method Summary
 FileComparerResults buildDiscrepancyLists(java.util.Map<java.lang.String,java.lang.String> objectKeyToFilepathMap, java.util.Map<java.lang.String,StorageObject> objectsMap)
          Compares the contents of a directory on the local file system with the contents of a service resource.
 FileComparerResults buildDiscrepancyLists(java.util.Map<java.lang.String,java.lang.String> objectKeyToFilepathMap, java.util.Map<java.lang.String,StorageObject> objectsMap, BytesProgressWatcher progressWatcher)
          Compares the contents of a directory on the local file system with the contents of a service resource.
 FileComparerResults buildDiscrepancyLists(java.util.Map<java.lang.String,java.lang.String> objectKeyToFilepathMap, java.util.Map<java.lang.String,StorageObject> objectsMap, BytesProgressWatcher progressWatcher, boolean isForceUpload)
          Compares the contents of a directory on the local file system with the contents of a service resource.
protected  java.util.List<java.util.regex.Pattern> buildIgnoreRegexpList(java.io.File directory, java.util.List<java.util.regex.Pattern> parentIgnorePatternList)
          If a .jets3t-ignore file is present in the given directory, the file is read and all the paths contained in it are converted to regular expression Pattern objects.
 java.util.Map<java.lang.String,java.lang.String> buildObjectKeyToFilepathMap(java.io.File[] fileList, java.lang.String fileKeyPrefix, boolean includeDirectories)
          Builds a map of files and directories that exist on the local system, where the map keys are the object key names that will be used for the files in a remote storage service, and the map values are absolute paths (Strings) to that file in the local file system.
protected  void buildObjectKeyToFilepathMapForDirectory(java.io.File directory, java.lang.String fileKeyPrefix, java.util.Map<java.lang.String,java.lang.String> objectKeyToFilepathMap, boolean includeDirectories, java.util.List<java.util.regex.Pattern> parentIgnorePatternList)
          Recursively builds a map of object key names to file paths that contains all the files and directories inside the given directory.
 java.util.Map<java.lang.String,StorageObject> buildObjectMap(StorageService service, java.lang.String bucketName, java.lang.String targetPath, java.util.Map<java.lang.String,java.lang.String> objectKeyToFilepathMap, boolean forceMetadataDownload, boolean isForceUpload, BytesProgressWatcher progressWatcher, StorageServiceEventListener eventListener)
          Builds a service Object Map containing all the objects within the given target path, where the map's key for each object is the relative path to the object.
 FileComparer.PartialObjectListing buildObjectMapPartial(StorageService service, java.lang.String bucketName, java.lang.String targetPath, java.lang.String priorLastKey, java.util.Map<java.lang.String,java.lang.String> objectKeyToFilepathMap, boolean completeListing, boolean forceMetadataDownload, boolean isForceUpload, BytesProgressWatcher progressWatcher, StorageServiceEventListener eventListener)
          Builds a service Object Map containing a partial set of objects within the given target path, where the map's key for each object is the relative path to the object.
 byte[] generateFileMD5Hash(java.io.File file, java.lang.String relativeFilePath, BytesProgressWatcher progressWatcher)
           
static FileComparer getInstance()
           
static FileComparer getInstance(Jets3tProperties jets3tProperties)
           
 java.io.File getMd5FilesRootDirectoryFile()
           
 boolean isAssumeLocalLatestInMismatch()
           
 boolean isGenerateMd5Files()
           
protected  boolean isIgnored(java.util.List<java.util.regex.Pattern> ignorePatternList, java.io.File file)
          Determines whether a file should be ignored when building a file map.
 boolean isSkipMd5FileUpload()
           
 boolean isSkipSymlinks()
           
 boolean isUseMd5Files()
           
 StorageObject[] listObjectsThreaded(StorageService service, java.lang.String bucketName, java.lang.String targetPath)
          Lists the objects in a bucket using a partitioning technique to divide the object namespace into separate partitions that can be listed by multiple simultaneous threads.
 StorageObject[] listObjectsThreaded(StorageService service, java.lang.String bucketName, java.lang.String targetPath, java.lang.String delimiter, int toDepth)
          Lists the objects in a bucket using a partitioning technique to divide the object namespace into separate partitions that can be listed by multiple simultaneous threads.
 java.util.Map<java.lang.String,StorageObject> lookupObjectMetadataForPotentialClashes(StorageService service, java.lang.String bucketName, java.lang.String targetPath, StorageObject[] objectsWithoutMetadata, java.util.Map<java.lang.String,java.lang.String> objectKeyToFilepathMap, boolean forceMetadataDownload, boolean isForceUpload, BytesProgressWatcher progressWatcher, StorageServiceEventListener eventListener)
          Given a set of storage objects for which only minimal information is available, retrieve metadata information for any objects that potentially clash with local files.
protected  java.lang.String normalizeUnicode(java.lang.String str)
          Normalize string into "Normalization Form Canonical Decomposition" (NFD).
 java.util.Map<java.lang.String,StorageObject> populateObjectMap(java.lang.String targetPath, StorageObject[] objects)
          Builds a map of key/object pairs, where each object is associated with a key based on its location in the service target path.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FileComparer

public FileComparer(Jets3tProperties jets3tProperties)
Constructs the class.

Parameters:
jets3tProperties - the object containing the properties that will be applied in this class.
Method Detail

getInstance

public static FileComparer getInstance(Jets3tProperties jets3tProperties)
Parameters:
jets3tProperties - the object containing the properties that will be applied in the instance.
Returns:
a FileComparer instance.

getInstance

public static FileComparer getInstance()
Returns:
a FileComparer instance initialized with the default Jets3tProperties object.

buildIgnoreRegexpList

protected java.util.List<java.util.regex.Pattern> buildIgnoreRegexpList(java.io.File directory,
                                                                        java.util.List<java.util.regex.Pattern> parentIgnorePatternList)
If a .jets3t-ignore file is present in the given directory, the file is read and all the paths contained in it are converted to regular expression Pattern objects. If the parent directory's list of patterns is provided, any relevant patterns are also added to the ignore listing. Relevant parent patterns are those with a directory prefix that matches the current directory, or with the wildcard depth pattern (*.*./).

Parameters:
directory - a directory that may contain a .jets3t-ignore file. If this parameter is null or is actually a file and not a directory, an empty list will be returned.
parentIgnorePatternList - a list of Patterns that were applied to the parent directory of the given directory. If this parameter is null, no parent ignore patterns are applied.
Returns:
a list of Pattern objects representing the paths in the ignore file. If there is no ignore file, or if it has no contents, the list returned will be empty.
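
As a rough illustration of turning ignore-file paths into Pattern objects, the sketch below converts simple wildcard paths to regular expressions. The wildcard rules used here ('*' matches any run of characters, '?' matches one character) are assumptions for the example, as are the class and method names; the handling of parent patterns is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of the JetS3t API. */
final class IgnorePatternSketch {

    /** Converts one wildcard path to a regex. The wildcard rules are assumptions. */
    static Pattern toPattern(String ignorePath) {
        StringBuilder regex = new StringBuilder();
        for (char c : ignorePath.toCharArray()) {
            if (c == '*') {
                regex.append(".*");      // any run of characters
            } else if (c == '?') {
                regex.append('.');       // exactly one character
            } else {
                // Quote everything else so that '.', '(' and similar stay literal.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Converts the non-blank lines of an ignore file to patterns. */
    static List<Pattern> toPatterns(List<String> ignoreFileLines) {
        List<Pattern> patterns = new ArrayList<>();
        for (String line : ignoreFileLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(toPattern(trimmed));
            }
        }
        return patterns;
    }
}
```

An empty input list yields an empty pattern list, which mirrors the documented behaviour when no ignore file is present.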

isIgnored

protected boolean isIgnored(java.util.List<java.util.regex.Pattern> ignorePatternList,
                            java.io.File file)
Determines whether a file should be ignored when building a file map. A file may be ignored in two situations: 1) if it matches a regular expression pattern in the given list of ignore patterns, or 2) if it is a symlink/alias and the Jets3tProperties setting "filecomparer.skip-symlinks" is true.

Parameters:
ignorePatternList - a list of Pattern objects representing the file names to ignore.
file - a file that will either be ignored or not, depending on whether it matches an ignore Pattern or is a symlink/alias.
Returns:
true if the file should be ignored, false otherwise.
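
The two ignore conditions can be sketched like this. The example is an assumption-laden illustration: it takes the skip-symlinks setting as an explicit boolean argument instead of reading it from properties, and it matches patterns against the file name only.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of the JetS3t API. */
final class IsIgnoredSketch {

    /**
     * Returns true when the file name matches any ignore pattern, or when
     * symlinks are being skipped and the file is a symbolic link.
     */
    static boolean isIgnored(List<Pattern> ignorePatterns, File file, boolean skipSymlinks) {
        for (Pattern pattern : ignorePatterns) {
            if (pattern.matcher(file.getName()).matches()) {
                return true;
            }
        }
        // Files.isSymbolicLink returns false for paths that do not exist.
        return skipSymlinks && Files.isSymbolicLink(file.toPath());
    }
}
```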

normalizeUnicode

protected java.lang.String normalizeUnicode(java.lang.String str)
Normalize string into "Normalization Form Canonical Decomposition" (NFD). References: http://stackoverflow.com/questions/3610013 http://en.wikipedia.org/wiki/Unicode_equivalence

Parameters:
str -
Returns:
string normalized into NFD form.
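
For readers unfamiliar with canonical decomposition, the following sketch shows NFD normalization using the standard java.text.Normalizer class. The class and method names are hypothetical, and this may differ from how the library method is actually implemented.

```java
import java.text.Normalizer;

/** Hypothetical illustration only; not part of the JetS3t API. */
final class NfdSketch {

    /** Returns the string in NFD form, leaving already-normalized input untouched. */
    static String toNfd(String str) {
        if (Normalizer.isNormalized(str, Normalizer.Form.NFD)) {
            return str;
        }
        return Normalizer.normalize(str, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9";   // single code point: e with acute accent
        String decomposed = toNfd(precomposed);
        // The decomposed form is a plain 'e' followed by a combining acute accent.
        System.out.println(precomposed.length() + " -> " + decomposed.length());
    }
}
```

Normalizing both local file names and object keys to the same form lets names that look identical compare as equal, even when one side uses precomposed characters and the other uses combining marks.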

buildObjectKeyToFilepathMap

public java.util.Map<java.lang.String,java.lang.String> buildObjectKeyToFilepathMap(java.io.File[] fileList,
                                                                                    java.lang.String fileKeyPrefix,
                                                                                    boolean includeDirectories)
Builds a map of files and directories that exist on the local system, where the map keys are the object key names that will be used for the files in a remote storage service, and the map values are absolute paths (Strings) to that file in the local file system. The entire local file hierarchy within the given set of files and directories is traversed (i.e. sub-directories are included.)

A file/directory hierarchy is represented using '/' delimiter characters in object key names.

Any file or directory matching a path in a .jets3t-ignore file will be ignored.

Parameters:
fileList - the set of files and directories to include in the file map.
fileKeyPrefix - A prefix added to each file path key in the map, e.g. the name of the root directory the files belong to. If provided, a '/' suffix is always added to the end of the prefix. If null or empty, no prefix is used.
includeDirectories - If true all directories, including empty ones, will be included in the Map. These directories will be mere place-holder objects with a trailing slash (/) character in the name and the content type Mimetypes.MIMETYPE_BINARY_OCTET_STREAM. If this variable is false directory objects will not be included in the Map, and it will not be possible to store empty directories in the service.
Returns:
a Map of object key names to the absolute paths (Strings) of the corresponding local files.
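
The traversal described above can be sketched with plain java.io.File calls. This is a simplified illustration under stated assumptions: the class and method names are hypothetical, .jets3t-ignore handling and symlink checks are omitted, and results are kept in a TreeMap only to make the output order predictable.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of the JetS3t API. */
final class KeyMapSketch {

    /** Builds a map of '/'-delimited keys to absolute local paths. */
    static Map<String, String> build(File[] files, String fileKeyPrefix, boolean includeDirectories) {
        String prefix = "";
        if (fileKeyPrefix != null && !fileKeyPrefix.isEmpty()) {
            // A '/' suffix is always added to a non-empty prefix.
            prefix = fileKeyPrefix.endsWith("/") ? fileKeyPrefix : fileKeyPrefix + "/";
        }
        Map<String, String> map = new TreeMap<>();
        for (File file : files) {
            add(file, prefix, map, includeDirectories);
        }
        return map;
    }

    private static void add(File file, String prefix, Map<String, String> map,
                            boolean includeDirectories) {
        if (file.isDirectory()) {
            // Directory place-holders carry a trailing slash in their key.
            String dirKey = prefix + file.getName() + "/";
            if (includeDirectories) {
                map.put(dirKey, file.getAbsolutePath());
            }
            File[] children = file.listFiles();
            if (children != null) {
                for (File child : children) {
                    add(child, dirKey, map, includeDirectories);
                }
            }
        } else {
            map.put(prefix + file.getName(), file.getAbsolutePath());
        }
    }
}
```

With includeDirectories set to false, an empty directory contributes no entry at all, which is why empty directories cannot be stored in that mode.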

buildObjectKeyToFilepathMapForDirectory

protected void buildObjectKeyToFilepathMapForDirectory(java.io.File directory,
                                                       java.lang.String fileKeyPrefix,
                                                       java.util.Map<java.lang.String,java.lang.String> objectKeyToFilepathMap,
                                                       boolean includeDirectories,
                                                       java.util.List<java.util.regex.Pattern> parentIgnorePatternList)
Recursively builds a map of object key names to file paths that contains all the files and directories inside the given directory. The map keys are the object key names that will be used for the files in a remote storage service, and the map values are absolute paths (Strings) to that file in the local file system.

A file/directory hierarchy is represented using '/' delimiter characters in object key names.

Any file or directory matching a path in a .jets3t-ignore file will be ignored.

Parameters:
directory - The directory containing the files/directories of interest. The directory is not included in the result map.
fileKeyPrefix - A prefix added to each file path key in the map, e.g. the name of the root directory the files belong to. This prefix must end with a '/' character.
objectKeyToFilepathMap - map of '/'-delimited object key names to local file absolute paths, to which this method adds items.
includeDirectories - If true all directories, including empty ones, will be included in the Map. These directories will be mere place-holder objects with a trailing slash (/) character in the name and the content type Mimetypes.MIMETYPE_BINARY_OCTET_STREAM. If this variable is false directory objects will not be included in the Map, and it will not be possible to store empty directories in the service.
parentIgnorePatternList - a list of Patterns that were applied to the parent directory of the given directory. This list will be checked to see if any of the parent's patterns should apply to the current directory. See buildIgnoreRegexpList(File, List) for more information. If this parameter is null, no parent ignore patterns are applied.

listObjectsThreaded

public StorageObject[] listObjectsThreaded(StorageService service,
                                           java.lang.String bucketName,
                                           java.lang.String targetPath,
                                           java.lang.String delimiter,
                                           int toDepth)
                                    throws ServiceException
Lists the objects in a bucket using a partitioning technique to divide the object namespace into separate partitions that can be listed by multiple simultaneous threads. This method divides the object namespace using the given delimiter, traverses this space up to the specified depth to identify prefix names for multiple "partitions", and then lists the objects in each partition. It returns the complete list of objects in the bucket path.

This partitioning technique will work best for buckets with many objects that are divided into a number of virtual subdirectories of roughly equal size.

Parameters:
service - the service object that will be used to perform listing requests.
bucketName - the name of the bucket whose contents will be listed.
targetPath - a root path within the bucket to be listed. If this parameter is null, all the bucket's objects will be listed. Otherwise, only the objects below the virtual path specified will be listed.
delimiter - the delimiter string used to identify virtual subdirectory partitions in a bucket. If this parameter is null, or it has a value that is not present in your object names, no partitioning will take place.
toDepth - the number of delimiter levels this method will traverse to identify subdirectory partitions. If this value is zero, no partitioning will take place.
Returns:
the list of objects under the target path in the bucket.
Throws:
ServiceException
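
The partitioning idea can be illustrated without a storage service by treating a list of key names as the bucket. In this sketch, which is an assumption-based simulation and not the library's implementation, each key is assigned to the prefix found by walking toDepth delimiters into its name, and every partition is then "listed" by a separate task in a thread pool. All names and the pool size are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration only; not part of the JetS3t API. */
final class PartitionedListingSketch {

    /** Returns the part of the key that spans at most toDepth delimiters. */
    static String prefixOf(String key, String delimiter, int toDepth) {
        if (delimiter == null || delimiter.isEmpty()) {
            return "";   // no delimiter means no partitioning
        }
        int end = 0;
        for (int level = 0; level < toDepth; level++) {
            int idx = key.indexOf(delimiter, end);
            if (idx < 0) {
                break;
            }
            end = idx + delimiter.length();
        }
        return key.substring(0, end);
    }

    /** Lists all keys by handling each partition in its own task. */
    static List<String> listThreaded(List<String> bucketKeys, String delimiter, int toDepth)
            throws InterruptedException, ExecutionException {
        SortedSet<String> prefixes = new TreeSet<>();
        for (String key : bucketKeys) {
            prefixes.add(prefixOf(key, delimiter, toDepth));
        }
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String prefix : prefixes) {
                futures.add(pool.submit(() -> {
                    // Stand-in for a service listing request restricted to one partition.
                    List<String> partition = new ArrayList<>();
                    for (String key : bucketKeys) {
                        if (prefixOf(key, delimiter, toDepth).equals(prefix)) {
                            partition.add(key);
                        }
                    }
                    return partition;
                }));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                all.addAll(future.get());
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }
}
```

With a null delimiter or a depth of zero every key maps to the empty prefix, so a single partition results and the listing is effectively sequential.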

listObjectsThreaded

public StorageObject[] listObjectsThreaded(StorageService service,
                                           java.lang.String bucketName,
                                           java.lang.String targetPath)
                                    throws ServiceException
Lists the objects in a bucket using a partitioning technique to divide the object namespace into separate partitions that can be listed by multiple simultaneous threads. This method divides the object namespace using the configured delimiter, traverses this space up to the configured depth to identify prefix names for multiple "partitions", and then lists the objects in each partition. It returns the complete list of objects in the bucket path.

This partitioning technique will work best for buckets with many objects that are divided into a number of virtual subdirectories of roughly equal size.

The delimiter and depth properties that define how this method will partition the bucket's namespace are set in the jets3t.properties file with the setting: filecomparer.bucket-listing.<bucketname>=<delim>,<depth>
For example: filecomparer.bucket-listing.my-bucket=/,2

Parameters:
service - the service object that will be used to perform listing requests.
bucketName - the name of the bucket whose contents will be listed.
targetPath - a root path within the bucket to be listed. If this parameter is null, all the bucket's objects will be listed. Otherwise, only the objects below the virtual path specified will be listed.
Returns:
the list of objects under the target path in the bucket.
Throws:
ServiceException
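
To make the setting format concrete, the sketch below parses a filecomparer.bucket-listing.<bucketname> value of the form <delim>,<depth> from a java.util.Properties object. The class, the Setting holder, and the choice to split on the last comma are assumptions for illustration; the library's own parsing may differ.

```java
import java.util.Properties;

/** Hypothetical illustration only; not part of the JetS3t API. */
final class BucketListingSettingSketch {

    /** Simple holder for the two parts of the setting. */
    static final class Setting {
        final String delimiter;
        final int depth;

        Setting(String delimiter, int depth) {
            this.delimiter = delimiter;
            this.depth = depth;
        }
    }

    /** Returns the parsed setting for the bucket, or null when none is configured. */
    static Setting parse(Properties props, String bucketName) {
        String raw = props.getProperty("filecomparer.bucket-listing." + bucketName);
        if (raw == null) {
            return null;
        }
        // Split on the last comma so the depth is always the final field.
        int comma = raw.lastIndexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Expected <delim>,<depth> but got: " + raw);
        }
        String delimiter = raw.substring(0, comma);
        int depth = Integer.parseInt(raw.substring(comma + 1).trim());
        return new Setting(delimiter, depth);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("filecomparer.bucket-listing.my-bucket", "/,2");
        Setting setting = parse(props, "my-bucket");
        System.out.println(setting.delimiter + " " + setting.depth);
    }
}
```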

buildObjectMap

public java.util.Map<java.lang.String,StorageObject> buildObjectMap(StorageService service,
                                                                    java.lang.String bucketName,
                                                                    java.lang.String targetPath,
                                                                    java.util.Map<java.lang.String,java.lang.String> objectKeyToFilepathMap,
                                                                    boolean forceMetadataDownload,
                                                                    boolean isForceUpload,
                                                                    BytesProgressWatcher progressWatcher,
                                                                    StorageServiceEventListener eventListener)
                                                             throws ServiceException
Builds a service Object Map containing all the objects within the given target path, where the map's key for each object is the relative path to the object.

Parameters:
service -
bucketName -
targetPath -
objectKeyToFilepathMap - map of '/'-delimited object key names to local file absolute paths
forceMetadataDownload - if true, metadata is always downloaded for objects in the storage service. If false, metadata is only downloaded if deemed necessary. This flag should be set to true when data for any objects in the storage service has been transformed, such as by encryption or compression during upload.
isForceUpload - set to true if the calling tool will upload files regardless of the comparison, so this method will avoid any unnecessary and potentially expensive data/date comparison checks.
progressWatcher - watcher to monitor bytes read during comparison operations, may be null.
eventListener -
Returns:
mapping of keys to StorageObjects
Throws:
ServiceException
See Also:
lookupObjectMetadataForPotentialClashes(StorageService, String, String, StorageObject[], Map, boolean, boolean, BytesProgressWatcher, StorageServiceEventListener)

buildObjectMapPartial

public FileComparer.PartialObjectListing buildObjectMapPartial(StorageService service,
                                                               java.lang.String bucketName,
                                                               java.lang.String targetPath,
                                                               java.lang.String priorLastKey,
                                                               java.util.Map<java.lang.String,java.lang.String> objectKeyToFilepathMap,
                                                               boolean completeListing,
                                                               boolean forceMetadataDownload,
                                                               boolean isForceUpload,
                                                               BytesProgressWatcher progressWatcher,
                                                               StorageServiceEventListener eventListener)
                                                        throws ServiceException
Builds a service Object Map containing a partial set of objects within the given target path, where the map's key for each object is the relative path to the object.

If the method is asked to perform a complete listing, it will use the listObjectsThreaded(StorageService, String, String) method to list the objects in the bucket, potentially taking advantage of any bucket name partitioning settings you have applied.

If the method is asked to perform only a partial listing, no bucket name partitioning will be applied.

Parameters:
service -
bucketName -
targetPath -
priorLastKey - the prior last key value returned by a prior invocation of this method, if any.
objectKeyToFilepathMap - map of '/'-delimited object key names to local file absolute paths
forceMetadataDownload - if true, metadata is always downloaded for objects in the storage service. If false, metadata is only downloaded if deemed necessary. This flag should be set to true when data for any objects in the storage service has been transformed, such as by encryption or compression during upload.
isForceUpload - set to true if the calling tool will upload files regardless of the comparison, so this method will avoid any unnecessary and potentially expensive data/date comparison checks.
completeListing - if true, this method will perform a complete listing of a service target. If false, the method will list a partial set of objects commencing from the given prior last key.
progressWatcher - watcher to monitor bytes read during comparison operations, may be null.
eventListener -
Returns:
an object containing a mapping of key names to StorageObjects, and the prior last key (if any) that should be used to perform follow-up method calls.
Throws:
ServiceException
See Also:
lookupObjectMetadataForPotentialClashes(StorageService, String, String, StorageObject[], Map, boolean, boolean, BytesProgressWatcher, StorageServiceEventListener)
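
The prior-last-key protocol for partial listings can be sketched against an in-memory sorted list of keys. Everything here is hypothetical (the Page type, the page size, the helper names); the point is only the calling pattern: pass null on the first call, then feed each returned prior last key into the next call until none is returned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; not part of the JetS3t API. */
final class PartialListingSketch {

    /** One page of results plus the key to resume from (null when finished). */
    static final class Page {
        final List<String> keys;
        final String priorLastKey;

        Page(List<String> keys, String priorLastKey) {
            this.keys = keys;
            this.priorLastKey = priorLastKey;
        }
    }

    /** Lists up to pageSize keys that sort after priorLastKey. */
    static Page listPage(List<String> sortedKeys, String priorLastKey, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be at least 1");
        }
        int start = 0;
        if (priorLastKey != null) {
            int idx = Collections.binarySearch(sortedKeys, priorLastKey);
            start = idx >= 0 ? idx + 1 : -idx - 1;
        }
        int end = Math.min(start + pageSize, sortedKeys.size());
        List<String> keys = new ArrayList<>(sortedKeys.subList(start, end));
        String next = end < sortedKeys.size() ? sortedKeys.get(end - 1) : null;
        return new Page(keys, next);
    }

    /** Drives listPage until no prior last key is returned. */
    static List<String> listAll(List<String> sortedKeys, int pageSize) {
        List<String> all = new ArrayList<>();
        String prior = null;
        do {
            Page page = listPage(sortedKeys, prior, pageSize);
            all.addAll(page.keys);
            prior = page.priorLastKey;
        } while (prior != null);
        return all;
    }
}
```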

lookupObjectMetadataForPotentialClashes

public java.util.Map<java.lang.String,StorageObject> lookupObjectMetadataForPotentialClashes(StorageService service,
                                                                                             java.lang.String bucketName,
                                                                                             java.lang.String targetPath,
                                                                                             StorageObject[] objectsWithoutMetadata,
                                                                                             java.util.Map<java.lang.String,java.lang.String> objectKeyToFilepathMap,
                                                                                             boolean forceMetadataDownload,
                                                                                             boolean isForceUpload,
                                                                                             BytesProgressWatcher progressWatcher,
                                                                                             StorageServiceEventListener eventListener)
                                                                                      throws ServiceException
Given a set of storage objects for which only minimal information is available, retrieve metadata information for any objects that potentially clash with local files. An object is considered a potential clash when it has the same object key name as a local file pending upload/download, and when the hash value of the object data contents either differs from the local file's hash or the hash comparison cannot be performed without the metadata information.

Parameters:
service -
bucketName -
targetPath -
objectsWithoutMetadata -
objectKeyToFilepathMap -
forceMetadataDownload - if true, metadata is always downloaded for objects in the storage service. If false, metadata is only downloaded if deemed necessary. This flag should be set to true when data for any objects in the storage service has been transformed, such as by encryption or compression during upload.
isForceUpload - set to true if the calling tool will upload files regardless of the comparison, so this method will avoid any unnecessary and potentially expensive data/date comparison checks.
progressWatcher - watcher to monitor bytes read during comparison operations, may be null.
eventListener -
Returns:
mapping of keys to StorageObjects
Throws:
ServiceException
See Also:
populateObjectMap(String, StorageObject[])
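
The potential-clash rule can be sketched with two maps of hex MD5 strings keyed by object key name. This is a simplification made for illustration: the names are hypothetical, and a null hash on the remote side stands in for "the hash comparison cannot be performed without the metadata information".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of the JetS3t API. */
final class PotentialClashSketch {

    /**
     * Returns the keys of listed objects that share a key with a local file and
     * whose hash either differs from the local hash or is unknown (null).
     */
    static List<String> keysNeedingMetadata(Map<String, String> localMd5ByKey,
                                            Map<String, String> listedMd5ByKey) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> remote : listedMd5ByKey.entrySet()) {
            String localMd5 = localMd5ByKey.get(remote.getKey());
            if (localMd5 == null) {
                continue;   // no local file with this key, so nothing can clash
            }
            String remoteMd5 = remote.getValue();
            if (remoteMd5 == null || !remoteMd5.equals(localMd5)) {
                result.add(remote.getKey());
            }
        }
        return result;
    }
}
```

Restricting metadata lookups to these keys avoids a request per object when most objects either have no local counterpart or already match.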

populateObjectMap

public java.util.Map<java.lang.String,StorageObject> populateObjectMap(java.lang.String targetPath,
                                                                       StorageObject[] objects)
Builds a map of key/object pairs, where each object is associated with a key based on its location in the service target path.

Parameters:
targetPath -
objects -
Returns:
a map of keys to StorageObjects.
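
One plausible reading of "a key based on its location in the service target path" is that the target path prefix is stripped from each object key. The sketch below illustrates that interpretation only; it is an assumption, uses plain strings in place of StorageObject, and all names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of the JetS3t API. */
final class RelativeKeySketch {

    /** Maps each key's path relative to targetPath onto the full object key. */
    static Map<String, String> relativeKeys(String targetPath, List<String> objectKeys) {
        String prefix = "";
        if (targetPath != null && !targetPath.isEmpty()) {
            prefix = targetPath.endsWith("/") ? targetPath : targetPath + "/";
        }
        Map<String, String> map = new TreeMap<>();
        for (String key : objectKeys) {
            // Keys outside the target path are kept unchanged in this sketch.
            String relative = key.startsWith(prefix) ? key.substring(prefix.length()) : key;
            map.put(relative, key);
        }
        return map;
    }
}
```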

generateFileMD5Hash

public byte[] generateFileMD5Hash(java.io.File file,
                                  java.lang.String relativeFilePath,
                                  BytesProgressWatcher progressWatcher)
                           throws java.io.IOException,
                                  java.security.NoSuchAlgorithmException
Parameters:
file -
relativeFilePath -
progressWatcher - watcher to monitor bytes read during comparison operations, may be null.
Returns:
MD5 hash as bytes
Throws:
java.io.IOException
java.security.NoSuchAlgorithmException
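
For reference, computing an MD5 hash of a file with the standard library looks roughly like this. The sketch is independent of JetS3t: the class and method names are hypothetical, and progress reporting and any reuse of stored hash files are not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical illustration only; not part of the JetS3t API. */
final class Md5Sketch {

    /** Streams the file through an MD5 digest and returns the raw 16-byte hash. */
    static byte[] md5(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return digest.digest();
    }

    /** Formats a hash as lowercase hexadecimal. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Reading in fixed-size chunks keeps memory use constant regardless of file size.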

buildDiscrepancyLists

public FileComparerResults buildDiscrepancyLists(java.util.Map<java.lang.String,java.lang.String> objectKeyToFilepathMap,
                                                 java.util.Map<java.lang.String,StorageObject> objectsMap)
                                          throws java.security.NoSuchAlgorithmException,
                                                 java.io.FileNotFoundException,
                                                 java.io.IOException,
                                                 java.text.ParseException
Compares the contents of a directory on the local file system with the contents of a service resource. This comparison is performed on a map of files and a map of service objects previously generated using other methods in this class.

Parameters:
objectKeyToFilepathMap - map of '/'-delimited object key names to local file absolute paths
objectsMap - a map of keys to StorageObjects.
Returns:
an object containing the results of the file comparison.
Throws:
java.security.NoSuchAlgorithmException
java.io.FileNotFoundException
java.io.IOException
java.text.ParseException

buildDiscrepancyLists

public FileComparerResults buildDiscrepancyLists(java.util.Map<java.lang.String,java.lang.String> objectKeyToFilepathMap,
                                                 java.util.Map<java.lang.String,StorageObject> objectsMap,
                                                 BytesProgressWatcher progressWatcher)
                                          throws java.security.NoSuchAlgorithmException,
                                                 java.io.FileNotFoundException,
                                                 java.io.IOException,
                                                 java.text.ParseException
Compares the contents of a directory on the local file system with the contents of a service resource. This comparison is performed on a map of files and a map of service objects previously generated using other methods in this class.

Parameters:
objectKeyToFilepathMap - map of '/'-delimited object key names to local file absolute paths
objectsMap - a map of keys to StorageObjects.
progressWatcher - watcher to monitor bytes read during comparison operations, may be null.
Returns:
an object containing the results of the file comparison.
Throws:
java.security.NoSuchAlgorithmException
java.io.FileNotFoundException
java.io.IOException
java.text.ParseException

buildDiscrepancyLists

public FileComparerResults buildDiscrepancyLists(java.util.Map<java.lang.String,java.lang.String> objectKeyToFilepathMap,
                                                 java.util.Map<java.lang.String,StorageObject> objectsMap,
                                                 BytesProgressWatcher progressWatcher,
                                                 boolean isForceUpload)
                                          throws java.security.NoSuchAlgorithmException,
                                                 java.io.FileNotFoundException,
                                                 java.io.IOException,
                                                 java.text.ParseException
Compares the contents of a directory on the local file system with the contents of a service resource. This comparison is performed on a map of files and a map of service objects previously generated using other methods in this class.

Parameters:
objectKeyToFilepathMap - map of '/'-delimited object key names to local file absolute paths
objectsMap - a map of keys to StorageObjects.
progressWatcher - watcher to monitor bytes read during comparison operations, may be null.
isForceUpload - set to true if the calling tool will upload files regardless of the comparison, so this method will avoid any unnecessary and potentially expensive data/date comparison checks.
Returns:
an object containing the results of the file comparison.
Throws:
java.security.NoSuchAlgorithmException
java.io.FileNotFoundException
java.io.IOException
java.text.ParseException
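
The comparison of the two maps can be pictured with the simplified sketch below, which sorts keys into four groups. It is not the library's algorithm: both maps here hold hex MD5 strings instead of file paths and StorageObjects, the Result fields are hypothetical and do not reflect the structure of FileComparerResults, and the last-modified fallback is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of the JetS3t API. */
final class DiscrepancySketch {

    /** Hypothetical result holder for this sketch. */
    static final class Result {
        final List<String> onlyLocal = new ArrayList<>();
        final List<String> onlyRemote = new ArrayList<>();
        final List<String> matching = new ArrayList<>();
        final List<String> differing = new ArrayList<>();
    }

    /** Classifies every key by where it exists and whether the hashes agree. */
    static Result compare(Map<String, String> localMd5ByKey, Map<String, String> remoteMd5ByKey) {
        Result result = new Result();
        for (Map.Entry<String, String> local : localMd5ByKey.entrySet()) {
            String remoteMd5 = remoteMd5ByKey.get(local.getKey());
            if (remoteMd5 == null) {
                result.onlyLocal.add(local.getKey());
            } else if (remoteMd5.equals(local.getValue())) {
                result.matching.add(local.getKey());
            } else {
                result.differing.add(local.getKey());
            }
        }
        for (String key : remoteMd5ByKey.keySet()) {
            if (!localMd5ByKey.containsKey(key)) {
                result.onlyRemote.add(key);
            }
        }
        return result;
    }
}
```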

isSkipSymlinks

public boolean isSkipSymlinks()
Returns:
true if the "filecomparer.skip-symlinks" configuration option is set.

isUseMd5Files

public boolean isUseMd5Files()
Returns:
true if the "filecomparer.use-md5-files" configuration option is set.

isGenerateMd5Files

public boolean isGenerateMd5Files()
Returns:
true if the "filecomparer.generate-md5-files" configuration option is set.

isSkipMd5FileUpload

public boolean isSkipMd5FileUpload()
Returns:
true if the "filecomparer.skip-upload-of-md5-files" configuration option is set.

isAssumeLocalLatestInMismatch

public boolean isAssumeLocalLatestInMismatch()
Returns:
true if the "filecomparer.assume-local-latest-in-mismatch" configuration option is set.

getMd5FilesRootDirectoryFile

public java.io.File getMd5FilesRootDirectoryFile()
                                          throws java.io.FileNotFoundException
Returns:
the file represented by the configuration option "filecomparer.md5-files-root-dir" or null if this option is not specified.
Throws:
java.io.FileNotFoundException
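
The configuration options named in the accessor methods above could be read from properties text along these lines. This sketch uses only java.util.Properties; the class and helper names are hypothetical, and both the true/false value format and the default of false for a missing option are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical illustration only; not part of the JetS3t API. */
final class ComparerFlagsSketch {

    /** Parses properties-format text into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Reads a boolean option; treating a missing option as false is an assumption. */
    static boolean flag(Properties props, String name) {
        return Boolean.parseBoolean(props.getProperty(name, "false"));
    }

    public static void main(String[] args) {
        Properties props = parse(
                "filecomparer.skip-symlinks=true\n"
                + "filecomparer.use-md5-files=false\n");
        System.out.println(flag(props, "filecomparer.skip-symlinks"));
        System.out.println(flag(props, "filecomparer.use-md5-files"));
    }
}
```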