WP-MIRROR


Abstract

WP-MIRROR is a utility for mirroring a set of wikipedias.

Purpose

The Wikimedia Foundation offers wikipedias in nearly 300 languages. In addition, the WMF has several other projects (e.g. wikibooks, wiktionary, etc.) for a total of around 1000 wikis.

WP-MIRROR is a free utility for mirroring any desired set of these wikis. That is, it builds a wiki farm that the user can browse locally. Many users need such off-line access, often for reasons of mobility, availability, and privacy.

WP-MIRROR builds a complete mirror with original size images. WP-MIRROR is robust, uses check-pointing to resume after interruption, and employs concurrency to accelerate mirroring of large wikipedias.

WP-MIRROR by default mirrors the simple wikipedia, the simple wiktionary (Simple English means shorter sentences), and the wikidata wiki (a centralized collection of facts usable by all other wikis). The default should work `out-of-the-box' with no user configuration. It should build in 200ks (two days), occupy 130G of disk space, be served locally by virtual hosts http://simple.wikipedia.site/, http://simple.wiktionary.site/, and http://www.wikidata.site/, and update automatically every week. The default should be suitable for anyone who learned English as a second language (ESL).

The top ten wikipedias are the en, de, nl, fr, it, es, ru, sv, pl, and ja wikipedias. Because WP-MIRROR uses original size image files, the top ten are too large to fit on a laptop with a single 500G disk, unless the user chooses to omit the images (this is configurable). The en wikipedia is the most demanding case. It should build in 1Ms (twelve days), occupy 3T of disk space, be served locally by a virtual host http://en.wikipedia.site/, and update automatically every month.
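The build times above are quoted in kiloseconds and megaseconds. As a quick check of the day figures, the conversion can be sketched in shell. The helper name ks_to_days is made up for this illustration and is not part of WP-MIRROR.

```shell
#!/bin/sh
# Convert a duration in kiloseconds to days (one day = 86.4 ks).
# ks_to_days is an illustrative helper, not a WP-MIRROR command.
ks_to_days() {
    LC_ALL=C awk -v ks="$1" 'BEGIN { printf "%.1f\n", ks * 1000 / 86400 }'
}

ks_to_days 200     # default mirror: 200 ks
ks_to_days 1000    # en wikipedia: 1 Ms = 1000 ks
```

This prints 2.3 and 11.6, which agrees with the rounded figures of two days and twelve days.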

Most features are configurable, either through command-line options, or via a configuration file (/etc/wp-mirror/local.conf).

Use Cases

By default, WP-MIRROR mirrors the simple wikipedia, the simple wiktionary, and the wikidata wiki, which at 130G should fit on most laptops. Users may edit a configuration file (/etc/wp-mirror/local.conf) to specify any desired set of languages to be mirrored. For example:

Students learning English as a Second Language (ESL) might want Simple English wikipedia or wiktionary side-by-side with that of their native language:

Someone interested in classical languages might configure:

A software developer might speed up the test cycle by choosing smaller languages:

Access

WP-MIRROR sets up virtual hosts (e.g. http://simple.wikipedia.site/, http://simple.wiktionary.site/, and http://www.wikidata.site/) so that the user may access the mirror locally using a web browser.
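The example URLs suggest a simple host naming pattern. The sketch below builds such URLs and, once a mirror exists, asks the local web server for a status code. The pattern is inferred only from the examples in this document, and the function names mirror_url and check_mirror are hypothetical helpers, not WP-MIRROR commands.

```shell
#!/bin/sh
# mirror_url PREFIX PROJECT: print the local virtual-host URL for one wiki.
# The pattern http://PREFIX.PROJECT.site/ is an assumption drawn from the
# example URLs; for wikidata the prefix is "www" rather than a language code.
mirror_url() {
    printf 'http://%s.%s.site/\n' "$1" "$2"
}

# check_mirror URL: print the HTTP status code returned for URL.
# Requires curl and a mirror that has already been built.
check_mirror() {
    curl --silent --output /dev/null --max-time 10 \
         --write-out '%{http_code}\n' "$1"
}

mirror_url simple wikipedia
mirror_url simple wiktionary
mirror_url www wikidata
```

After a build completes, a call such as check_mirror "$(mirror_url simple wikipedia)" reports whether the local web server answers for that host.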

Process

WP-MIRROR is non-interactive and normally runs in the background as a weekly cron job, updating the mirror whenever the Wikimedia Foundation posts new dump files.

WP-MIRROR maintains the state of the mirror in a transactional database (InnoDB, the ACID-compliant storage engine for MySQL). There are three advantages to this:

WP-MIRROR is designed for robustness. WP-MIRROR asserts hardware and software prerequisites, skips over unparsable pages and bad file names, waits for internet access when needed, and exits gracefully if disk space runs low.
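A user can run a similar disk space check by hand before starting a build. The following is a minimal sketch and not WP-MIRROR's own check. The function name has_free_space is hypothetical, /var is only an assumed location for the mirror's data, and the quoted 130G is treated as gibibytes.

```shell
#!/bin/sh
# has_free_space DIR GIB: succeed if the filesystem holding DIR has at
# least GIB gibibytes available.
has_free_space() {
    # df -P prints one POSIX-format line per filesystem; -k uses 1024-byte units.
    avail_kb=$(df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# 130 is the documented size of the default mirror; /var is an assumption.
if has_free_space /var 130; then
    echo "enough space for the default mirror"
else
    echo "not enough free space under /var for the default mirror"
fi
```

For the en wikipedia, the same check would use 3072 (3T) in place of 130.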

[Screenshots: wp-mirror in monitor mode using gui; wp-mirror in monitor mode using screen]

Downloading WP-MIRROR

WP-MIRROR can be found on the main GNU server: http://download.savannah.gnu.org/releases/wp-mirror/ (via HTTP).

Documentation

Documentation for WP-MIRROR is available online. The WP-MIRROR Reference Manual is available in PDF format. If you install from a package, the documentation will be registered automatically with `doc-base' and readily found using `dhelp' or `dwww'.

You may also find more information about WP-MIRROR by running info wp-mirror or man wp-mirror, or by looking at /usr/share/doc/wp-mirror/, /usr/local/doc/wp-mirror/, or similar directories on your system. A brief summary is available by running wp-mirror --help.

Dependencies

WP-MIRROR has numerous dependencies, including apache2, GraphicsMagick, MediaWiki, and MySQL. For this reason, it is easiest for the user to install WP-MIRROR from a package.

WP-MIRROR 0.7 is available as a DEB package. It works `out-of-the-box' on Debian GNU/Linux 7.4 (wheezy) with backports. Porting to other distributions may be considered for a future release.

WP-MIRROR 0.6 is available as a DEB package. It works `out-of-the-box' on Debian GNU/Linux 7.0 (wheezy) and Ubuntu 12.10 (quantal).

WP-MIRROR 0.5 is available as a DEB package. It works `out-of-the-box' on Debian GNU/Linux 7.0 (wheezy) and Ubuntu 12.10 (quantal).

WP-MIRROR 0.4 is available as a DEB package. It works `out-of-the-box' on Debian GNU/Linux 7.0 (wheezy).

WP-MIRROR 0.3 and earlier versions were developed on a PC with the Debian GNU/Linux 6.0 (squeeze) distribution installed. User configuration of dependencies is required.

There are no plans to backport WP-MIRROR to earlier distributions.

Installation

Debian GNU/Linux 7.4 (wheezy)

Method 1: Install from Debian package repository

1.1) Import the author's GPG public key into your root-shell's GPG keyring, and into your APT trusted keyring:

root-shell# gpg --keyserver pgpkeys.mit.edu --recv-key 382FBD0C
root-shell# gpg --armor --export 382FBD0C | apt-key add -

1.2) Edit /etc/apt/sources.list by appending the `wheezy-backports' and the `debian-wpmirror' package repositories, like so:

deb http://ftp.us.debian.org/debian/ wheezy           main
deb http://security.debian.org/      wheezy/updates   main
deb http://ftp.us.debian.org/debian/ wheezy-updates   main
deb http://ftp.us.debian.org/debian/ wheezy-backports main
deb http://download.savannah.gnu.org/releases/wp-mirror/debian-wpmirror/ wheezy main

If you are building your mirror on an IPv6 only network, then replace the last line of /etc/apt/sources.list with:

deb http://savannah.c3sl.ufpr.br/wp-mirror/debian-wpmirror/ wheezy main

1.3) Upgrade your Debian distribution:

root-shell# aptitude update
root-shell# aptitude safe-upgrade

1.4) Install WP-MIRROR and its dependencies:

root-shell# aptitude install wp-mirror

1.5) Run:

root-shell# wp-mirror --mirror

Method 2: Download and install DEB packages

Releases are found at http://download.savannah.gnu.org/releases/wp-mirror/. Select the most recent DEB packages, and install them in the following order:

root-shell# dpkg --install mediawiki-mwxml2sql_0.0.2-2_amd64.deb
root-shell# dpkg --install wp-mirror-mediawiki_1.23-1_all.deb
root-shell# dpkg --install wp-mirror-mediawiki-extensions-math-texvc_1.23-1_amd64.deb
root-shell# dpkg --install wp-mirror-mediawiki-extensions_1.23-1_all.deb
root-shell# dpkg --install wp-mirror_0.7.1-1_all.deb
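Because the order matters, it can help to stop at the first package that fails to install. The sketch below wraps the same dpkg --install calls in a loop. The function name install_in_order and the DPKG override (used here for a dry run) are hypothetical conveniences, not part of WP-MIRROR or dpkg.

```shell
#!/bin/sh
# install_in_order FILE...: install DEB files one at a time, in the order
# given, and stop at the first failure so that a later package is never
# attempted without its prerequisite.
# DPKG is a hypothetical override; it defaults to the real dpkg.
install_in_order() {
    for deb in "$@"; do
        ${DPKG:-dpkg} --install "$deb" || {
            echo "failed on $deb; stopping" >&2
            return 1
        }
    done
}

# Dry run: print the commands instead of executing them.
DPKG="echo dpkg" install_in_order \
    mediawiki-mwxml2sql_0.0.2-2_amd64.deb \
    wp-mirror-mediawiki_1.23-1_all.deb \
    wp-mirror-mediawiki-extensions-math-texvc_1.23-1_amd64.deb \
    wp-mirror-mediawiki-extensions_1.23-1_all.deb \
    wp-mirror_0.7.1-1_all.deb
```

Running the same call from a root shell without the DPKG override performs the actual installation.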

Run:

root-shell# wp-mirror --mirror

WP-MIRROR `just works'. Configuration is entirely automated, including configuration of dependencies such as `apache2', `MediaWiki', and `MySQL'.

Debian GNU/Linux 6.0 (squeeze)

Mailing lists

WP-MIRROR has the following mailing lists:

Getting involved

Development of WP-MIRROR, and GNU in general, is a volunteer effort, and you can contribute. For information, please read How to help GNU. If you'd like to get involved, it's a good idea to join the discussion mailing list (see above).

Test releases

Trying the latest test release (when available) is always appreciated. Test releases of WP-MIRROR can be found at http://download.savannah.gnu.org/releases/wp-mirror/ (via HTTP).

Development

For development sources, issue trackers, and other information, please see the WP-MIRROR project page at savannah.gnu.org.

Translating WP-MIRROR

To translate WP-MIRROR's messages into other languages, please see the Translation Project page for WP-MIRROR. If you have a new translation of the message strings, or updates to the existing strings, please have the changes made in this repository. Only translations from this site will be incorporated into WP-MIRROR. For more information, see the Translation Project.

Maintainer

WP-MIRROR is currently being maintained by Dr. Kent L. Miller. Please use the mailing lists for contact.

Licensing

WP-MIRROR is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
