.. include:: global.rst.inc

backy2 data layout
==================

backy2 uses two separate data stores: the *data backend* and the *meta
backend*. The *data backend* stores the binary blocks, whereas the *meta
backend* stores which version each block belongs to, what its checksum is,
whether it contains data (or is sparse) and more.

For each backend there may be different implementations. Currently there is
only one implementation of the *meta backend*, but two of the *data backend*.

meta backend
------------

The *meta backend* is responsible for managing all metadata of all backups.

.. ATTENTION:: As the metadata usually lives on a dedicated backup server
    which runs the backy2 process, it is recommended to back this metadata
    up as well: without metadata, no restore is possible. Please refer to
    the section :ref:`administration-guide-meta-storage` for HA setups or
    the import/export feature of backy2.

sql meta backend
~~~~~~~~~~~~~~~~

The *sql meta backend* relies on SQLAlchemy, a Python ORM which supports many
DBMSs, e.g. MySQL, PostgreSQL, SQLite and Oracle. For backy2's purpose you may
use any of them, depending on how big your backups are and how many versions
you are storing. For a single workstation backup with 10-20 versions, SQLite
is perfectly suitable. However, you will benefit from PostgreSQL's performance
and stability when keeping hundreds of versions with terabytes of backup data.

To configure the *sql meta backend*, please refer to ``backy.cfg``'s section
``[MetaBackend]``::

    [MetaBackend]
    # Of which type is the Metadata Backend Engine?
    # Available types:
    #   backy2.meta_backends.sql
    type: backy2.meta_backends.sql

    # Which SQL Server?
    # Available servers:
    #   sqlite:////path/to/sqlitefile
    #   postgresql:///database
    #   postgresql://user:password@host:port/database
    engine: sqlite:////var/lib/backy2/backy.sqlite

data backend
------------

There are currently two data backend implementations. Which one is in use is
determined by the ``type`` value in the ``[DataBackend]`` section of
``backy.cfg``::

    [DataBackend]
    # Which data backend to use?
    # Available types:
    #   backy2.data_backends.file
    #   backy2.data_backends.s3

file data backend
~~~~~~~~~~~~~~~~~

The *file data backend* stores backy2's blocks in 4MB files [1]_ in a
two-level directory hierarchy::

    $ find /var/lib/backy2/data
    /var/lib/backy2/data
    /var/lib/backy2/data/20
    /var/lib/backy2/data/20/7d
    /var/lib/backy2/data/20/7d/207d51da01kibRnRHsfsjdPkwGi9qLVU.blob
    /var/lib/backy2/data/ea
    /var/lib/backy2/data/ea/b2
    /var/lib/backy2/data/ea/b2/eab2e98cee4yccDw2tf9j2HRkJUvDByG.blob
    …

There are several parameters in ``backy.cfg`` which configure the *file data
backend*::

    [DataBackend]
    type: backy2.data_backends.file

    # Store data to this path. A structure of 2 folders depth will be created
    # in this path (e.g. '0a/33'). Blocks of DEFAULTS.block_size will be stored
    # there. This is your backup storage!
    path: /var/lib/backy2/data

    # How many writes to perform in parallel. This is useful if your backup
    # space can perform parallel writes faster than serial ones.
    simultaneous_writes: 5

    # How many reads to perform in parallel. This is useful if your backup
    # space can perform parallel reads faster than serial ones.
    simultaneous_reads: 5

    # Bandwidth throttling (set to 0 to disable, i.e. use full bandwidth)
    # bytes per second
    #bandwidth_read: 78643200
    #bandwidth_write: 78643200
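As the listing above shows, a block's on-disk path is derived from its UID:
the first two and the next two characters of the UID become the two directory
levels. A minimal sketch of this mapping (the helper ``block_path`` is
hypothetical, not part of backy2's API)::

    import os

    def block_path(base, uid):
        # Hypothetical helper mirroring the layout above: the directory
        # levels are uid[0:2] and uid[2:4], the file is named '<uid>.blob'.
        return os.path.join(base, uid[0:2], uid[2:4], uid + '.blob')

    print(block_path('/var/lib/backy2/data',
                     '207d51da01kibRnRHsfsjdPkwGi9qLVU'))
    # -> /var/lib/backy2/data/20/7d/207d51da01kibRnRHsfsjdPkwGi9qLVU.blob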
s3 data backend
~~~~~~~~~~~~~~~

The *s3 data backend* stores backy2's blocks in 4MB objects [1]_ in an
S3-compatible storage (e.g. Amazon S3, Riak CS, Ceph Object Gateway).

These are the parameters in ``backy.cfg`` to configure the *s3 data
backend*::

    [DataBackend]
    type: backy2.data_backends.s3

    # Your s3 access key
    aws_access_key_id: key

    # Your s3 secret access key
    aws_secret_access_key: secretkey

    # Your aws host (IP or name)
    host: 127.0.0.1

    # The port to connect to (usually 80 if not secure or 443 if secure)
    port: 10001

    # Use HTTPS?
    is_secure: false

    # Store to this bucket name:
    bucket_name: backy2

    # How many s3 puts to perform in parallel
    simultaneous_writes: 5

    # How many reads to perform in parallel. This is useful if your backup
    # space can perform parallel reads faster than serial ones.
    simultaneous_reads: 5

    # Bandwidth throttling (set to 0 to disable, i.e. use full bandwidth)
    # bytes per second
    #bandwidth_read: 78643200
    #bandwidth_write: 78643200

.. [1] The size of the blobs can be configured in ``backy.cfg`` with the
   ``block_size: 4194304`` parameter (4194304 bytes = 4 MiB). However,
   changing the ``block_size`` on existing backup data will render all
   backups invalid.
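For illustration only, the connection parameters above map onto an
S3-compatible client roughly as follows. This sketch uses boto3 and a made-up
object key; backy2's actual client library and key naming may differ::

    import boto3

    # Sketch: credentials, host, port and is_secure from the example config
    # combine into a single endpoint URL (is_secure: false -> http).
    s3 = boto3.client(
        's3',
        aws_access_key_id='key',
        aws_secret_access_key='secretkey',
        endpoint_url='http://127.0.0.1:10001',
    )

    # Store one 4 MiB block under a made-up key in the configured bucket.
    s3.put_object(Bucket='backy2',
                  Key='207d51da01kibRnRHsfsjdPkwGi9qLVU',
                  Body=b'\x00' * 4194304)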