Mediatex 0.9
Abstract
Perennial archives means we provide the resources to overcome the technological obsolescence of supports and drives we use.
Objectives
The perennial archival comes with 3 main objectives: conserve the document,
make it accessible and preserve it’s understanding.
- Supports reliability:
On numeric ages, time life of hard drive and optical supports is 5 years.
The numerical archives on hard drives or tapes are sensitive to the earth’s magnetic field.
Burning WORM supports and store them in good conditions provides a compromise (NF-Z-42-013 standard).
- Identify:
Document understanding may be linked to other documents defining a context.
Consequently, it is important to provide a way to identify references from documents to others.
Grouping documents into collection makes indexes easier and gives the perimeter of data to preserve.
- Other datas:
The same way supports need a compatible drive, application data need a software to read them.
It is important to memorise which software and format versions are related to the data file, or best, to store software’s source and format specifications.
In order to do so, we need meta-data.
Geographical duplication
In order to mitigate natural and technological disasters, archives must be duplicated on several sites.
- Distribute meta-data as source code:
Idea is servers share meta-data using a revision control system as programmers use it to share their source codes.
This is not so obvious: meta-data files must be split into acceptable size so as to be merged in memory,
must be human readable as automatic merge may fails and need a human arbitrage, and consume memory to be loaded.
Nevertheless, revision control system does not only provides an elegant solution for geographical duplication,
but also provide change’s history on meta-data.
Although using a centralised deposit, this solution offer the advantage to de-synchronise updates, that is a way to anticipate crash recoveries.
- Pull data into a cache:
Servers by having up to date (or nevertheless converging) meta-data are able to populate their caches in order
to backup unsafe files or to expose files needed from a support.
They extract a file from containers (extraction meta-data) or retrieve it from an other server (connection meta-data).
Indeed, if a cache is deleted, server will rebuild it from local and remote supports.
- Collaborate for data access:
Each server build the same static HTML index (compiling the descriptive meta-data) so as any load-balancer is sufficient to prevent form crash disaster (connection meta-data are shared too).
URL on archives do not change as they point to a CGI script that will deal with all servers and return an HTML redirection to content into a server’s cache.
Reversibility
We shall consider that the ERMS becomes obsolete itself and we need to change it.
A significant effort will be require to return archived content:
compress, cut and send data using a media (support or network)
and export the related meta-data.
Archiving on supports make sense here as we just have to bring them back.
Parsers designed to load the meta-data after updates offer a native software library usable to export them into any format.
Table of Contents
Mediatex
This manual is for Mediatex (version 0.9, 7 October 2017).
Mediatex is an Electronic Record Management System
(ERMS),
focusing on the archival storage entity define by the OAIS draft,
and on the NF Z 42-013 requirements.
Copyright © 2014 2015 2016 2017 Nicolas Roche.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the section entitled
“GNU Free Documentation License”.