Tapestry Specialist Backup Tool

Tapestry is a specialized backup automation utility, written in Python 3.6 for use on Unix systems. It is currently tested on Ubuntu 18.04 and Mac OS X. Tapestry operates in a somewhat novel way: it backs up whole files, working from lists generated recursively from the contents of user-specified directories, and packages them using our Blockwise Packaging Algorithm before compressing, encrypting, and signing them for storage. This model allows the user to eliminate trust in the security of the storage solution completely.

Tapestry is currently at its v1.1 release. This release implemented the new FTP communication feature and a major security fix; see the changelog for details. At present only Unix machines are supported. Windows support is intended for the next feature release, expected in Q1 2019.

Tapestry is designed to ensure that:

  • The user need not be overly concerned with the security of the storage media (only its reliability);
  • Decrypting backup packages generated by Tapestry without the proper key is prohibitively computationally expensive; and
  • The user has a mechanism to verify that the backup has not been tampered with at rest or in transit.

Since the release of version 0.3.0, Tapestry has used Python's native multiprocessing module to take full advantage of modern operating systems' ability to perform multiple tasks at once, dramatically decreasing the time required to complete a full backup.

The Blockbuild Algorithm

For various reasons, it can be desirable to split the backup into smaller packages, referred to as "tapblocks". Depending on the user's configuration or use case, some backups can be very large. What's more, many storage solutions, such as removable media, can be space-limited relative to the total storage available to a system. Some network transfer protocols have size limits, and so do some filesystems, even on very large drives. Since Tapestry doesn't alter or rearrange the contents of individual files in any way, we need a way to set a maximum size for our tapblocks, and a way to ensure we don't create any more of them than we absolutely have to.

To achieve this, we created a process we like to call blockbuild, after the name of the function that performs it in the Tapestry source. Blockbuild is analogous to the idealized process for packing a cube van: start with the largest items and pack the smaller items in around them.
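The cube-van analogy corresponds closely to the classic first-fit decreasing bin-packing heuristic. The sketch below illustrates that heuristic; it is not Tapestry's blockbuild function. The name `pack_blocks` is hypothetical, file sizes are assumed to be known up front, and giving an oversized file a block of its own is an assumption about how files larger than the blocksize might be handled.

```python
from typing import Dict, List


def pack_blocks(file_sizes: Dict[str, int], block_size: int) -> List[List[str]]:
    """Group whole files into blocks of at most block_size bytes using
    first-fit decreasing. Files are never split. (Hypothetical helper.)"""
    blocks: List[List[str]] = []   # file paths assigned to each block
    remaining: List[int] = []      # free space left in each block
    # Largest files first, like loading the biggest items into the van first.
    for path, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        for i, free in enumerate(remaining):
            if size <= free:
                # First existing block with room takes the file.
                blocks[i].append(path)
                remaining[i] -= size
                break
        else:
            # No block has room, so open a new one. Assumption: a file larger
            # than block_size gets a block of its own rather than being split.
            blocks.append([path])
            remaining.append(max(block_size - size, 0))
    return blocks
```

With a block size of 100 bytes, files of 60, 50, 40, 30 and 20 bytes pack into two full blocks, {60, 40} and {50, 30, 20}. Sorting largest-first is what keeps the block count low: the big files claim blocks early and the small ones fill the leftover gaps.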

Users whose use cases involve resource-limited machines or network storage are advised to choose the smallest practical blocksize.

Why Whole-File Backups?

With Tapestry, the design philosophy favours simplicity and reliability. Many contemporary backup solutions allow for the creation of "Delta" backups: one complete backup record coupled with multiple subsequent backups, forming a sort of version control system in which each backup contains only the data that changed since the last one.

When creating Tapestry, we chose a different route: each backup "session" should be entirely sufficient to recover from on its own, and the easiest way to achieve that aim was to have the backup archives contain whole files. In fact, Tapestry doesn't even split files across blocks to minimize the block count; the files are stored whole and unmodified.

For use cases where some directories are updated more frequently than others, the '--inc' argument adds a secondary list of directories to the list of target locations.

Compression - Why BZ2?

Tapestry uses the BZ2 compression provided in Python's standard library because of the algorithm's space efficiency and its asymmetry: BZ2 decompression is considerably faster than BZ2 compression. The twin advantages of being a powerful compressor and decompressing quickly make BZ2 an ideal choice for this application.
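For reference, the standard-library module in question is `bz2`. The sketch below shows a compress/decompress round trip on an in-memory block and times each direction, so the asymmetry can be checked on your own data. The helper names are hypothetical, and the assumption that Tapestry uses the default compression level is ours, not something stated above.

```python
import bz2
import time
from typing import Tuple


def compress_block(data: bytes, level: int = 9) -> bytes:
    """BZ2-compress a packed block. Level 9 (the stdlib default) selects
    bzip2's largest 900 kB block size, trading speed for ratio."""
    return bz2.compress(data, compresslevel=level)


def decompress_block(blob: bytes) -> bytes:
    """Reverse compress_block."""
    return bz2.decompress(blob)


def time_round_trip(data: bytes) -> Tuple[float, float]:
    """Return (compress_seconds, decompress_seconds) for one block of data."""
    t0 = time.perf_counter()
    blob = compress_block(data)
    t1 = time.perf_counter()
    decompress_block(blob)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1
```

Actual timings depend heavily on the data and the machine, so no figures are claimed here; running `time_round_trip` on a real tapblock is the honest way to see the gap.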

Encryption: Why and How?

In this day of cloud computing, we have never had less control over the media our data physically resides on. Time and time again, cloud storage providers find themselves announcing major breaches. Why trust your storage at all, when you don't have to? For this reason, we needed a method of encryption that would prevent an attacker from parsing the contents of the backup, and a way to ensure the backup you eventually restore from was not swapped out for a replacement. In Tapestry, this problem is solved using PGP-encrypted-and-signed .tap files.

How does Tapestry use Encryption to Protect My Files?

As mentioned, Tapestry uses PGP. In practice it does this using two keys: a Disaster Recovery Key, which is by default a generic 2048-bit RSA key protected with a passphrase, and a Signing Key, which can be any key but is meant to belong to, and be specific to, the user who generated the backup.

All Tapestry-generated backups are encrypted using the preconfigured Disaster Recovery Key, which can be regenerated on demand using the argument "--genKey".
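Tapestry's "--genKey" handles key creation internally. As a rough illustration of what generating a comparable key involves, the hypothetical helper below builds a parameter block in GnuPG's unattended key-generation format for a 2048-bit RSA key, which could be saved to a file and passed to `gpg --batch --gen-key params.txt`. The name and email are placeholders. Omitting a `Passphrase` line is deliberate: GnuPG 2.1 and later should then ask for the passphrase through pinentry, so it never appears in the file.

```python
def dr_key_params(name: str, email: str, bits: int = 2048) -> str:
    """Build a GnuPG unattended key-generation parameter block for a
    Disaster Recovery style key. Hypothetical helper, not Tapestry code."""
    lines = [
        "Key-Type: RSA",
        f"Key-Length: {bits}",
        "Subkey-Type: RSA",          # separate RSA subkey for encryption
        f"Subkey-Length: {bits}",
        f"Name-Real: {name}",
        f"Name-Email: {email}",
        "Expire-Date: 0",            # 0 means the key never expires
        # No Passphrase line: gpg asks via pinentry instead.
        "%commit",
    ]
    return "\n".join(lines) + "\n"
```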

However, encrypting the files is only half the battle. Anyone with a copy of the public key can encrypt files, right? Couldn't an attacker then pass them off as legitimate? Not if those backups are also signed. Tapestry can do this automatically with a preconfigured key, which is the recommended method. During the signing stage of the process, the user's pinentry program of choice will pop up and request the PIN or passphrase of the signing key being used.
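The encrypt-and-sign step can be sketched with the `gpg` command line. Note that Tapestry itself goes through a Python module that wraps GnuPG; this sketch calls the `gpg` binary directly instead, purely for illustration. The function names and key IDs are hypothetical. Because gpg-agent handles the signing key, it is pinentry, not this script, that prompts for the PIN or passphrase.

```python
import subprocess
from typing import List


def gpg_encrypt_sign_cmd(src: str, dst: str, dr_key: str, signing_key: str) -> List[str]:
    """Build a gpg command that encrypts src to the Disaster Recovery key
    and signs it with the user's signing key, writing the result to dst."""
    return [
        "gpg", "--yes",                       # overwrite dst without asking
        "--output", dst,
        "--encrypt", "--recipient", dr_key,   # only the DR private key can decrypt
        "--sign", "--local-user", signing_key,
        src,
    ]


def encrypt_and_sign(src: str, dst: str, dr_key: str, signing_key: str) -> None:
    """Run the command; gpg-agent launches pinentry for the signing key."""
    subprocess.run(gpg_encrypt_sign_cmd(src, dst, dr_key, signing_key), check=True)
```

On the recovery side, `gpg --output block.tar.bz2 --decrypt block.tap` both decrypts the file and reports whether the embedded signature is good, which is what lets you reject a substituted backup.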

Isn't this all rather complicated?

For the user, not at all. The user simply provides the necessary passwords when prompted and presses go. Some management of the keys is necessary to keep them safe, but most users who are familiar with PGP already have a preferred key management policy. My recommendation, and the assumption Tapestry makes, is that a copy of each public key involved exists on the main keyring of the local GnuPG install. As for the private keys, I currently use the following method: the private half of the Disaster Recovery Key lives, oddly enough, as backups in a few different locations (one digital copy in a safe, one printed in case the safe is stolen, and another printed copy stored at a trusted offsite location in case my house burns to the ground). The private half of my signing key lives on a YubiKey NEO that has been configured to act as an OpenPGP smart card.

From the code side, it isn't that complicated either. I am not arrogant enough to attempt to implement my own crypto. Instead, I use known-reliable cryptographic packages wherever possible: a Python module that interfaces with GnuPG handles the PGP, and the user's native pinentry handles the rest. Tapestry never even sees the PIN or passphrase that secures either of the keys it relies on.

Minimum-Effort Recovery

Recovery is easy, and it gets easier with each successive generation of Tapestry. Note, however, that changes to Tapestry between versions may break recovery, so for the time being, always note which version you used for which backup. Since 0.3.0, the recovery-pkl file has been present in every block. The recovery process is as easy as putting the recovery files into the right directory and running Tapestry.

At present, recovery requires free disk space of as much as twice the size of the archives, if not more. Tapestry's current behaviour is to decrypt each disk into /tmp/, then spawn worker processes to unpack each tar and correctly rename the files. Recovery is category-wise: if the local tapestry.cfg says docs go somewhere different than where they did before, Tapestry will obey that, placing each file in the appropriate subdirectories under the new category directory.
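The staging step, together with the clean-up described at the end of this section, can be sketched with the standard library's `tarfile` and `tempfile` modules. This is a hypothetical illustration, not Tapestry's recovery code: it assumes each disk has already been decrypted to a `.tar.bz2` file, runs in a single process rather than spawning workers, and `handle_member` is a placeholder for the rename-and-place step. Staging everything in one working directory is also why recovery needs so much free space.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def unpack_block(block_path: str, workdir: str) -> List[str]:
    """Extract one decrypted .tar.bz2 block into workdir; return member names."""
    with tarfile.open(block_path, "r:bz2") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where this Python provides it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(path=workdir, **extra)
    return names


def recover_blocks(block_paths: Iterable[str],
                   handle_member: Callable[[Path], None]) -> None:
    """Unpack every block into one temporary working directory, hand each
    extracted path to handle_member (e.g. to move it to its category
    directory), then delete the working directory."""
    with tempfile.TemporaryDirectory() as workdir:
        for block in block_paths:
            for name in unpack_block(block, workdir):
                handle_member(Path(workdir) / name)
    # Leaving the with-block removes workdir and anything left in it.
```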

This means that if you reconfigure the docs category to point at /bar/ instead of /foo/, and you had once archived /foo/some/deep/path/to/file, the recovered file will correctly appear at /bar/some/deep/path/to/file.

If, during recovery, files are encountered for which no category entry is configured, that's not a problem either. Tapestry will create its standard output directory (by default, the current user's desktop) and generate a subdirectory named for the category inside it, before following the convention above.
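The category-wise placement rule, including the fallback for unconfigured categories, amounts to a small path computation. The sketch below assumes the recovery index records each file's category and its path relative to that category's root; Tapestry's actual recovery-pkl format isn't described here, and both function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict


def relative_to_category(archived_path: str, old_root: str) -> PurePosixPath:
    """Path of an archived file relative to the category root it came from."""
    return PurePosixPath(archived_path).relative_to(old_root)


def recovery_target(category: str, relative: PurePosixPath,
                    categories: Dict[str, str], output_dir: str) -> PurePosixPath:
    """Where a recovered file should go under the *current* configuration.

    categories maps category names to their configured root directories;
    unknown categories fall back to output_dir/<category>.
    """
    root = categories.get(category)
    if root is None:
        root = str(PurePosixPath(output_dir) / category)
    return PurePosixPath(root) / relative
```

Using the example above, `relative_to_category("/foo/some/deep/path/to/file", "/foo")` gives `some/deep/path/to/file`, and with docs now mapped to `/bar` the target becomes `/bar/some/deep/path/to/file`.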

At the end of the recovery process, Tapestry cleans up after itself by deleting its temporary working location. If all went well, your files are now back where they ought to be and you can safely repackage your disks.

What's Next?

The current development version of Tapestry will be released as version 2.0. Its planned features and changes are:

  • Support for Windows Operating System Machines;
  • Enhanced Recovery Index File Format standard;
  • Improved Test Coverage

Future features planned for implementation:

  • Windows Support
  • Improved functionality as an automated task.
  • Rewrite as a standalone executable, without dependencies
  • Mechanism for Secure, Automatic Updates
  • GUI Support for Configuration and Manual Use

How you can help!

At present, the best way to help with Tapestry is to submit a pull request to its official GitHub repository. Of course, Tapestry is by and large a simple tool that is winding down toward the end of its active development cycle and approaching its maximum utility. Since it's a niche tool, you may simply wish to share your thoughts on it with me through any of the means listed on the contact page.

You could also buy me a coffee at ko-fi.com if you really wanted to.