cfvers introduction

Iustin Pop

$Id: manual.html,v 1.4 2005/10/30 13:25:48 iusty Exp $

This document explains the concepts and usage of cfvers version 0.5.4, a system tool designed to help with versioning the configuration files on a system.

Table of Contents

1. About this document

2. Introduction

3. Installation

3.1. Database configuration

4. Quick start

5. Concepts

5.1. The repository

5.2. The areas

5.3. The items

5.3.1. Regular vs. virtual items

5.4. The entries

5.5. The revisions

6. Common operations

6.1. Repository initialization

6.2. Area related

6.3. File operations

6.3.1. Storing files
6.3.2. Searching for files
6.3.3. Retrieving files

6.4. Handling deletions

7. Limitations

7.1. POSIX VFS layer limitations

1. About this document

This is the user manual for the cfvers project; its homepage is at http://www.nongnu.org/cfvers/. You can also get new versions of this document there.

Revision: $Id: manual.html,v 1.4 2005/10/30 13:25:48 iusty Exp $

2. Introduction

Making backups is an important aspect of system administration. Backup techniques are explained in any good document about system administration, and they won't be explained here again.

However, text configuration files are better suited to versioning systems than to full/incremental backups, which are targeted at binary files and miscellaneous data. Unfortunately, versioning systems are not very good at working directly on a live system: the main reasons are that they create extra files, cannot cope with special files and do not keep permissions intact.

The working model of the classic versioning systems consists of a central repository (very precious) and a multitude of developers' workspaces, which hold semi-important data; by this I mean it's OK to delete or otherwise break a developer's workspace when no changes have been made to it, since all information can be restored from the central repository.

In contrast, a versioning system designed for system configuration has its priorities almost reversed: the critical issue is with the filesystem, and the repository is secondary to that. This means that such software must obey the following rules:

keep the system's integrity: the software must not do anything to the filesystem it hasn't been asked to do
treat the metadata of versioned items as being as important as the data
when in doubt about the success of an operation, abort rather than damage the workspace

cfvers has been designed with these objectives in mind[1].

3. Installation

There are three components which need installing:

the python library
the command line utilities, cfv and cfvadmin
the cfversd server and its configuration files

If you don't run the server, you can run the cfv/cfvadmin scripts from the install directory, since it contains the python library and it will be picked up from there. However, the recommended way is to install the python library in its proper place and the scripts to /usr/local/bin or /usr/bin.

The default ./configure invocation will install all these in their locations: scripts in bin, server in sbin and the library in lib/python2.3.

The configuration files needed by the server (in /etc/cfvers, if not overridden by command line arguments) are:

the logger configuration file, logging.cfg
the server configuration file, cfversd.conf

Note that all these are needed for proper functioning. Also, before running the server, you should set up a proper environment (the Pyro library, which is used for server/client communication, can customize some settings only through environment variables). The most important one is PYRO_STORAGE. This variable should point to a writable directory used for temporary files. If the variable is not set, Pyro will use the current directory (which could even be / for a daemon started from the init scripts). The other variables are not needed, but if you want to customize some parameters of the client-server communication, please see the Pyro documentation. Available settings include, for example, whether to use compression and how many connections to accept.
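As an illustration only, the environment set-up described above could be wrapped in a small launcher. This is a sketch, not part of cfvers: the helper names and the storage path /var/lib/cfvers/pyro are hypothetical, while the server command line is the one given in the quick start section.

```python
import os


def pyro_environment(storage_dir, base_env=None):
    """Return a copy of the environment with PYRO_STORAGE set.

    Raises ValueError if storage_dir is not an existing, writable
    directory, so the server is never started with an unusable location.
    """
    env = dict(os.environ if base_env is None else base_env)
    if not os.path.isdir(storage_dir) or not os.access(storage_dir, os.W_OK):
        raise ValueError("not a writable directory: %s" % storage_dir)
    env["PYRO_STORAGE"] = storage_dir
    return env


def start_server(storage_dir="/var/lib/cfvers/pyro"):
    """Replace the current process with cfversd (hypothetical storage path)."""
    cmd = ["/usr/sbin/cfversd", "-c", "/etc/cfvers/cfversd.conf"]
    os.execve(cmd[0], cmd, pyro_environment(storage_dir))
```

Calling start_server() from an init script would fail early with a clear error instead of letting Pyro fall back to the current directory.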

3.1. Database configuration

If you use the sqlite backend, no customization is necessary. Just choose a writable file in a writable directory; both must be writable by the user who will be accessing the database (the server in remote configurations and the tools in local configurations).

If you are using the postgresql backend, you need to create a database and (preferably) a separate user for the database. Remember the username and password as you will need to fill them in the configuration files.

Also, for the postgresql backend, the --name argument to cfv find works only if you install the plpythonu server-side language and create the following function in the database:

CREATE OR REPLACE FUNCTION fnmatch (text, text) RETURNS boolean
LANGUAGE plpythonu AS '
import fnmatch
return fnmatch.fnmatch(args[0], args[1])
';
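The database function above is a thin wrapper around Python's standard fnmatch module, taking the name first and the pattern second. The following sketch shows the matching rules on a few sample names; the wrapper name matches() is hypothetical, and whether cfv find passes full paths or base names to the function is not stated in this manual.

```python
import fnmatch


def matches(name, pattern):
    """Same argument order as the SQL wrapper: name first, then pattern."""
    return fnmatch.fnmatch(name, pattern)


# "*" matches any run of characters, "?" exactly one character,
# "[seq]" one character out of the given set.
for name, pattern in [
    ("passwd", "passwd"),
    ("passwd", "pass*"),
    ("passwd", "group"),
    ("hosts.allow", "hosts.[ad]*"),
]:
    print(name, pattern, matches(name, pattern))
```

Note that fnmatch does not treat "/" specially, so a leading "*" in a pattern also matches directory components.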

4. Quick start

How to create your first repository

1. decide whether to use a client-server setup or direct access to the repository (this can be also remote, in case of postgresql)
2. decide on which back-end to use (either sqlite or postgresql for now)

Based on the above answers, create the configuration files.

local repository, sqlite; just create the configuration file ~/.cfvers:

[server]
server_type=local
repo_meth=sqlite
repo_data=/path/to/file.db
area=default

local repository, postgresql (first create a postgresql database).

[server]
server_type=local
repo_meth=postgresql
repo_data=dbname=mydb user=myuser password=mypass
area=default

remote repository; create the server configuration file (e.g. /etc/cfvers/cfversd.conf):

for sqlite:

[server]
port = 9999
pidfile = /var/run/cfvers/cfversd.pid

[repository]
method=sqlite
connect=/var/lib/cfvers/database

[auth]
users=user1

[user_user1]
client_password=cpw
server_password=spw
valid_from=127.0.0.1,192.168.0.2
areas=default
admin=true

for postgresql:

[server]
port = 9999
pidfile = /var/run/cfvers/cfversd.pid

[repository]
method=postgresql
connect=dbname=mydb user=myuser password=mypass

[auth]
users=user1

[user_user1]
client_password=cpw
server_password=spw
valid_from=127.0.0.1,192.168.0.2
areas=default
admin=true

then create the client configuration file (~/.cfvers):

[server]
server_type=remote
host=192.168.0.1
port=9999
username=user1
client_password=cpw
server_password=spw
area=default

then start the server: /usr/sbin/cfversd -c /etc/cfvers/cfversd.conf

run cfvadmin init in order to create the initial repository.
run cfv add ITEMS... in order to register the items you want versioned.
run cfv store in order to store the first version.
after every change to the system's configuration, rerun the cfv store command in order to update the versioned items. New items you want stored must be given in a separate call (cfv add).
schedule a cron job to watch for differences or do automatic commits.
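The configuration files in this section use an INI-style layout. As a hedged sketch, Python's standard configparser module can be used to sanity-check a client configuration before first use. The function name is hypothetical, and it is an assumption that cfvers parses these files the same way; the sample text is the local postgresql configuration from above.

```python
import configparser

CLIENT_CONFIG = """\
[server]
server_type=local
repo_meth=postgresql
repo_data=dbname=mydb user=myuser password=mypass
area=default
"""


def load_server_section(text):
    """Parse the [server] section and check the server_type value."""
    # interpolation=None keeps "%" characters (e.g. in passwords) literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = dict(parser["server"])
    if section.get("server_type") not in ("local", "remote"):
        raise ValueError("server_type must be 'local' or 'remote'")
    return section


settings = load_server_section(CLIENT_CONFIG)
print(settings["repo_data"])
```

The parser splits each line at the first "=", so the connection string in repo_data keeps its embedded "=" signs.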

5. Concepts

I tried to keep cfvers as simple as possible. But I don't think I succeeded.

5.1. The repository

The repository is where the files are stored. The repository is manipulated using the cfvadmin command.

Right now, there are two backends implemented for the repository: postgresql-based and sqlite-based. The sqlite backend is very useful for small or standalone installations.

5.2. The areas

The repository contains areas in which files are stored; this allows files from different servers to be stored in the same repository. A repository must contain at least one area in order to be able to contain files. Areas are created with the cfvadmin create command and displayed with cfvadmin info.

An area has the following attributes:

name

The name of the area; you use this when referring to the area from the client, either in configuration files or with the -a option to the cfv command

root

The root path on the filesystem for the files contained in this area; this allows you to define for example areas for chroot jails and refer to the files in the area using the path in the chroot.

Default value: /

description

A text describing the area, anything you like

ctime

The creation time of the area

5.3. The items

The files to be versioned are represented by items. Note that an item doesn't contain actual file information; it only represents the intent to track a file.

The attributes of an item:

name

The filename which this item represents; this is what will be tracked by cfvers;

flags

The entries of an item are affected by the item's flag attribute. Currently, the flags can affect the following:

Amount of information to store. An entry can store for a file:
- metadata (name, type, size, access/creation/modification times, owner/group, etc.)
- checksum of the contents (for regular files, symbolic links and directories)
- file contents (for regular files, symbolic links and directories)
An entry can store only metadata, metadata and checksum, or all information about a file. This is selected at registration time using the cfv add --store=level command, where level is one of metadata, checksum, full.
The kind of the item:
- Regular file: if the flags value is one of metadata, checksum or contents, the file will be stored as a regular file.
- Virtual file: if the flags value is virtual, the file will be stored as a virtual file.

ctime

Creation time (=registration time) for this item.

area

The area to which this item belongs.

command

If the item is a virtual one, this is the command line used to generate the contents.

5.3.1. Regular vs. virtual items

Usually you will want to track regular files. This is accomplished by defining an item with a certain name; that name will be used as the name of the file to store in the repository.

However, there is another possibility: a virtual file. A virtual file is one whose contents are taken from the output of a command, not from a file in the filesystem. This can be useful for versioning system state, for example: partition tables, either as dd if=/dev/hda bs=512 count=1 or as sfdisk -d /dev/hda, system hardware configuration, as lspci -v, etc.

The command attribute of the item is used to generate the contents of the file. For the moment, both the standard output and the standard error are saved together. The exit code of the command is saved in the entry's exitcode attribute.
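The capture behaviour described above (standard output and standard error saved together, plus the exit code) can be sketched with Python's subprocess module. This is an illustration, not the cfvers implementation: the function name is hypothetical, and running the command line through the shell is an assumption.

```python
import subprocess


def run_virtual_command(command):
    """Run a command line and return (combined_output_bytes, exit_code).

    stderr is redirected into stdout, so both streams end up in a
    single buffer in the order the command wrote them.
    """
    proc = subprocess.run(
        command,
        shell=True,                # assumption: the command is a shell line
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
    )
    return proc.stdout, proc.returncode
```

For example, run_virtual_command("lspci -v") would return the hardware listing together with any error messages, and the second element of the tuple plays the role of the exitcode attribute.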

5.4. The entries

An entry represents the information about an item at a certain point in time.

The properties of an entry can be split into two groups: its own attributes and the attributes of the file it represents. Its own attributes are:

item

The item to which this entry belongs

revno

The revision number of the revision this entry belongs to

status

The status of this entry, meaning what kind of change to the file it represents. Currently, it can take one of the following values:

A - the entry represents the addition of an item to the area; it does not have any other contents (i.e. the file properties haven't been stored yet)
M - modified; this is a regular entry about a file being updated
D - deleted; this is an entry about a file which can no longer be found in the filesystem; see Section 6.4 for more details about deletions

If the entry has the status "M", the file properties will contain:

filetype, size, mode, atime, mtime, ctime, inode, device, nlink, uid/gid, uname/gname, rdev, blocks, blksize: metadata properties of the file
sha1sum: the checksum of the file contents; applicable to regular files, symbolic links and directories;
filecontents: the file contents; applicable to regular files, symbolic links and directories; for directories, the contents are the list of filenames separated by newlines
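To make the file properties more concrete, here is a minimal sketch of how such a snapshot could be collected with the Python standard library. It is not the cfvers code: the function names are hypothetical, and two details are assumptions made for the example, namely that the contents of a symbolic link are its target path and that directory listings are sorted.

```python
import hashlib
import os
import stat


def entry_contents(path):
    """Return the bytes standing for the contents of path, or None."""
    st = os.lstat(path)  # lstat: do not follow symbolic links
    if stat.S_ISREG(st.st_mode):
        with open(path, "rb") as handle:
            return handle.read()
    if stat.S_ISLNK(st.st_mode):
        # Assumption: the contents of a symlink are its target path.
        return os.fsencode(os.readlink(path))
    if stat.S_ISDIR(st.st_mode):
        # Filenames separated by newlines; sorting is an assumption made
        # here so that the checksum does not depend on listing order.
        return os.fsencode("\n".join(sorted(os.listdir(path))))
    return None  # devices, FIFOs, sockets: metadata only


def snapshot(path):
    """Collect a subset of the metadata properties plus the checksum."""
    st = os.lstat(path)
    data = entry_contents(path)
    return {
        "mode": stat.S_IMODE(st.st_mode),
        "size": st.st_size,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mtime": st.st_mtime,
        "ctime": st.st_ctime,
        "nlink": st.st_nlink,
        "sha1sum": None if data is None else hashlib.sha1(data).hexdigest(),
    }
```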

5.5. The revisions

A revision groups together entries which represent the state of the items tracked at a certain moment in time.

area: The area to which this revision belongs.
revno: The revision number of this revision.
server: The server on which this revision was made.
logmsg: The log message.
ctime: The creation time of this revision.
uid, uname, gid, gname: The numeric and textual representation of the credentials of the process which created this revision.
commiter: A textual description of the person or process that made this revision; useful when revisions are made as root but you need a more detailed description.
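The relationship between revisions and entries can be sketched with a few data classes. This is only a model for illustration: the class layout, the field types and the latest_entry() helper are hypothetical, and the sketch assumes that an item which did not change in a revision has no entry there (the "skipped (not changed)" counter in the store output suggests this, but the manual does not state it).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    item: str    # name of the item this entry belongs to
    revno: int   # revision this entry belongs to
    status: str  # "A", "M" or "D"


@dataclass
class Revision:
    area: str
    revno: int
    logmsg: str = ""
    entries: List[Entry] = field(default_factory=list)


def latest_entry(revisions, item, revno) -> Optional[Entry]:
    """Return the newest entry of `item` at or before revision `revno`."""
    best = None
    for rev in revisions:
        if rev.revno > revno:
            continue
        for entry in rev.entries:
            if entry.item == item and (best is None or entry.revno > best.revno):
                best = entry
    return best
```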

6. Common operations

6.1. Repository initialization

This should be done only once, otherwise it destroys your data.

Example 1. Repository initialization

$ cfvadmin init

Example 2. Forced repository initialization

$ cfvadmin init --force

6.2. Area related

Generally, you only work with areas at the initial setup of your repositories, or when adding new servers to the setup. There are only two operations possible on areas: creating a new area and displaying area information.

Example 3. Area creation

$ cfvadmin create -d "my area" -p / area45

Example 4. Displaying area information

$ cfvadmin info
Local repository has 1 area(s)
-------------------------
Name: default
Created at 2004-09-26 04:49:03+0300
Root path: /
Description: Default area
Revision number: 2
Number of items: 102529
$

6.3. File operations

The item/entry operations can be split roughly in three groups:

storing files

searching for files

retrieving files

6.3.1. Storing files

The first step in order to track a file is to register it with the system:

Example 5. Registering files

$ cfv add -m "Log message" /etc/passwd /etc/group /etc/hostname
Status: Added, revision 1
Time begin: 2004-09-26 15:35:02 EEST
Time end:   2004-09-26 15:35:03 EEST
Total skipped (error): 0
Total registered: 3
Total skipped (item already registered): 0
Total skipped (invalid name): 0
$

Then you need to actually order the system to store the contents of those files:

Example 6. Storing files

$ ./cfv store -m "Stored files"
Status: Stored revision 2
Time begin: 2004-09-26 15:37:01 EEST
Time end:   2004-09-26 15:37:02 EEST
Total stored: 3
Total skipped (not changed): 0
Total skipped (error): 0
Total skipped (not registered): 0
Total marked deleted: 0
$

This is all there is to storing files.

6.3.2. Searching for files

You can make two kinds of searches: for files with certain attributes, or for files for which the filesystem is not in sync with the repository.

Example 7. Search files by attribute

$ cfv find --name passwd -l
-rw-r--r--     2 root     root           92 2004-04-30 00:32:04 /etc/pam.d/passwd
-rw-r--r--     2 root     root         1594 2004-07-20 23:01:57 /etc/passwd
$ cfv find --regex '.*[a-k]nes[^/]'
/etc/X11/xkb/geometry/kinesis
/etc/gconf/schemas/glines.schemas
/etc/snmp/mib2c.column_defines.conf
/etc/xpdf/xpdfrc-japanese
$ cfv find --size '>' 950000 -d
-------------------------
Entry for /etc/gconf/schemas/gnome-terminal.schemas
File registerd at: 2004-09-26T15:45:18+0
Available revisions: 2

-------------------------
Entry for /etc/gconf/schemas/metacity.schemas
File registerd at: 2004-09-26T15:45:18+0
Available revisions: 2

$

Example 8. Searching for modified files

$ ./cfv diff -l
/tmp/a
$ ./cfv diff
===== Item /tmp/a (rev 2 -> current)
File contents:
--- /tmp/a Sun Sep 26 15:59:05 2004 (rev 2)
+++ /tmp/a Sun Sep 26 15:59:17 2004 (current)
@@ -1,1 +1,1 @@
-Sun Sep 26 15:59:05 EEST 2004
+Test


Attribute mtime:
- 2004-09-26 15:59:05 EEST
+ 2004-09-26 15:59:17 EEST

Attribute ctime:
- 2004-09-26 15:59:05 EEST
+ 2004-09-26 15:59:17 EEST

Attribute size:
- 30
+ 5

Attribute sha1sum:
- dc926ccb39a0c823680bdfeefe59057a6af727fc
+ 1c68ea370b40c06fcaf7f26c8b1dba9d9caf5dea

$ ./cfv diff -l -c mtime
/tmp/a
$
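A cron job that watches for differences, as suggested in the quick start, could be built on the cfv diff -l listing shown in Example 8. The following is a sketch under stated assumptions: the function names are hypothetical, cfv is assumed to be on the PATH, and the exit status of cfv diff is not documented here, so it is deliberately ignored.

```python
import subprocess


def changed_items(run=subprocess.run):
    """Return the paths printed by `cfv diff -l`, one per output line."""
    proc = run(
        ["cfv", "diff", "-l"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode the output as text
    )
    return [line for line in proc.stdout.splitlines() if line.strip()]


def report(paths):
    """Build a short message, or an empty string when nothing differs."""
    if not paths:
        return ""
    header = "cfvers: %d item(s) differ from the repository:" % len(paths)
    return header + "\n" + "\n".join(paths) + "\n"
```

A cron script would print report(changed_items()); since cron typically mails any output a job produces, staying silent when nothing changed keeps the mailbox quiet.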

6.3.3. Retrieving files

Once you have found the files you want to retrieve, there are several things you can do with them:

restore them to the filesystem
display their contents
display information about their metadata (like stat)
export them in a tar archive
create a checksum file (SHA1SUM) for external tools to check

Example 9. Retrieving files

$ cfv retrieve /tmp/a
Total retrieved (fully): 1
$ cfv cat /tmp/a
Sun Sep 26 15:59:05 EEST 2004
$ cfv export -Ftar -o /tmp/x.tar
$ cfv export -Fsha1sum
dc926ccb39a0c823680bdfeefe59057a6af727fc  tmp/a
$
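The checksum listing printed by cfv export -Fsha1sum has the familiar "digest, two spaces, relative path" layout, so an external check is easy to write. The sketch below is one hypothetical way to do it in Python; it handles regular files only, and resolving the relative paths against a root directory (by default /) is an assumption based on the output in Example 9.

```python
import hashlib
import os


def verify_sha1sums(listing, root="/"):
    """Check 'digest  relative/path' lines against files under root.

    Returns a list of (path, ok) tuples; a missing or unreadable
    file counts as a failed check.
    """
    results = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(None, 1)
        try:
            with open(os.path.join(root, name), "rb") as handle:
                actual = hashlib.sha1(handle.read()).hexdigest()
        except OSError:
            actual = None
        results.append((name, actual == digest.lower()))
    return results
```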

6.4. Handling deletions

When a file which is tracked has been removed from the filesystem, cfvers will notice this at the next store command and will register this deletion. The item in question will be displayed (by default) in the output of the command. Then, as long as the file hasn't been recreated, cfvers will ignore it. As soon as the file exists again, it will be tracked normally.

The deletion of a file is registered as an entry with status "D" in the repository. When it appears again, it will have a new status "M" entry.
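The rules above amount to a small decision table. The sketch below restates them as a Python function; it is an illustration of the described behaviour, not the actual cfvers logic, and the function name and arguments are hypothetical. Treating a missing file whose last entry is "A" as a deletion is an assumption.

```python
def next_status(last_status, exists, changed):
    """Return the status of the entry a store would record, or None.

    last_status: status of the item's most recent entry ("A", "M" or "D")
    exists:      whether the file is currently present in the filesystem
    changed:     whether it differs from the last stored entry
    """
    if not exists:
        # Register the deletion once, then ignore the item.
        return None if last_status == "D" else "D"
    if last_status in ("A", "D"):
        # Never stored yet, or recreated after a deletion.
        return "M"
    return "M" if changed else None
```

For instance, a file that was deleted and later recreated goes through the sequence "D", nothing, "M" over three successive store runs.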

7. Limitations

This section should be very big. It's small because I didn't have time to fill it, not because cfvers is complete :-)

7.1. POSIX VFS layer limitations

These are limitations or design decisions inherent to the POSIX specification or the GNU/Linux implementation. While developing cfvers, I found:

You can't change the ctime of an inode. This is by design in the POSIX filesystem layer: the ctime is for metadata modifications, and the mtime/atime pair is for data write/read accesses. Thus a ctime modification would itself trigger a ctime update, since the ctime is part of the metadata, rendering the modification useless :). A read-time attribute for the metadata would be inappropriate, I think, because such reads are very frequent.
utimes(2) and chmod(2) act on the destination of a symlink (when given an argument which is a symlink). I can't think why anyone would want this (you could always expand the symlink using readlink, but right now you can't act on the symlink itself!).
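The ctime behaviour can be observed with a few lines of Python. This sketch (the function name is hypothetical) sets the access and modification times of a file to a date in the past; the mtime takes the requested value, while on a POSIX system the kernel stamps the ctime with the time of the call, so a restored file cannot get its original ctime back.

```python
import os


def set_old_times(path, when):
    """Set atime and mtime of path to `when` and return the new stat result."""
    os.utime(path, (when, when))
    return os.stat(path)
```

Comparing st_mtime and st_ctime of the returned object shows the difference: the first equals the requested timestamp, the second does not.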

Notes

[1]	However, nobody said it attained these goals - after all, it's software!