
Dictionaries file format


Let's have a look at the prefixes dictionary, gpl/pierrick/brihaye/aramorph/dictionaries/dictPrefixes:

	  ; conjunctions
	  w	wa	Pref-Wa	and <pos>wa/CONJ+</pos>
	  f	fa	Pref-Wa	and;so <pos>fa/CONJ+</pos>

We can see that comments are introduced by ; and that each significant line is divided by tabs into fields, which are, in order:

  1. the prefix's consonantal skeleton (in Buckwalter's transliteration system)
  2. the prefix's vocalization (in the same system)
  3. the prefix's morphological category
  4. one or more translations (glosses) for the prefix, separated by ; and followed by one or more grammatical categories enclosed in <pos>...</pos>. Notice the trailing +, which indicates that a stem is expected after this prefix.
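As an illustration of the four fields listed above, here is a minimal Python sketch of a line parser. It is not AraMorph's own code: the names `AffixEntry` and `parse_affix_line` are hypothetical, and it assumes that glosses are separated by ; and that the grammatical category, when present, sits inside a single <pos>...</pos> element, as in the excerpt.

```python
import re
from typing import List, NamedTuple, Optional

# Assumption: at most one <pos>...</pos> element per line, as in the excerpt.
POS_RE = re.compile(r"<pos>(.*?)</pos>")


class AffixEntry(NamedTuple):
    skeleton: str        # field 1: consonantal skeleton
    vocalization: str    # field 2: vocalized form
    category: str        # field 3: morphological category
    glosses: List[str]   # field 4: translations, split on ';'
    pos: Optional[str]   # field 4: content of <pos>...</pos>, if any


def parse_affix_line(line: str) -> Optional[AffixEntry]:
    """Parse one dictionary line; return None for blank lines and ; comments."""
    if not line.strip() or line.lstrip().startswith(";"):
        return None
    # Do not strip the line before splitting: leading tabs mean empty fields.
    fields = line.rstrip("\n").split("\t")
    fields += [""] * (4 - len(fields))  # optional trailing fields may be absent
    skeleton, vocalization, category, rest = (f.strip() for f in fields[:4])
    match = POS_RE.search(rest)
    pos = match.group(1) if match else None
    gloss_text = POS_RE.sub("", rest).strip()
    glosses = [g.strip() for g in gloss_text.split(";") if g.strip()]
    return AffixEntry(skeleton, vocalization, category, glosses, pos)
```

Called on the second line of the excerpt, `parse_affix_line` would yield the skeleton `f`, the vocalization `fa`, the category `Pref-Wa`, the two glosses `and` and `so`, and the grammatical category string `fa/CONJ+`.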

Some fields are optional. A good example is the empty prefix:

; The first category is the null prefix (has a null gloss as well):

... where only the morphological category is given.


Let's now have a look at this snippet taken from the suffixes dictionary, gpl/pierrick/brihaye/aramorph/dictionaries/dictSuffixes:

; perfect verb, null suffix: banA-h, daEA-h
h	hu	PVSuff-0ah	he/it <verb> it/him        <pos>+(null)/PVSUFF_SUBJ:3MS+hu/PVSUFF_DO:3MS</pos>
hmA	humA	PVSuff-0ah	he/it <verb> them (both)   <pos>+(null)/PVSUFF_SUBJ:3MS+humA/PVSUFF_DO:3D</pos>
hm	hum	PVSuff-0ah	he/it <verb> them          <pos>+(null)/PVSUFF_SUBJ:3MS+hum/PVSUFF_DO:3MP</pos>
hA	hA	PVSuff-0ah	he/it <verb> it/them/her   <pos>+(null)/PVSUFF_SUBJ:3MS+hA/PVSUFF_DO:3FS</pos>
hn	hun~a	PVSuff-0ah	he/it <verb> them          <pos>+(null)/PVSUFF_SUBJ:3MS+hun~a/PVSUFF_DO:3FP</pos>
k	ka	PVSuff-0ah	he/it <verb> you           <pos>+(null)/PVSUFF_SUBJ:3MS+ka/PVSUFF_DO:2MS</pos>
k	ki	PVSuff-0ah	he/it <verb> you           <pos>+(null)/PVSUFF_SUBJ:3MS+ki/PVSUFF_DO:2FS</pos>
kmA	kumA	PVSuff-0ah	he/it <verb> you (both)    <pos>+(null)/PVSUFF_SUBJ:3MS+kumA/PVSUFF_DO:2D</pos>
km	kum	PVSuff-0ah	he/it <verb> you           <pos>+(null)/PVSUFF_SUBJ:3MS+kum/PVSUFF_DO:2MP</pos>
kn	kun~a	PVSuff-0ah	he/it <verb> you           <pos>+(null)/PVSUFF_SUBJ:3MS+kun~a/PVSUFF_DO:2FP</pos>
ny	niy	PVSuff-0ah	he/it <verb> me            <pos>+(null)/PVSUFF_SUBJ:3MS+niy/PVSUFF_DO:1S</pos>
nA	nA	PVSuff-0ah	he/it <verb> us            <pos>+(null)/PVSUFF_SUBJ:3MS+nA/PVSUFF_DO:1P</pos>

The principle is exactly the same, although the example is slightly more complex: each entry describes a sequence of two suffixes. The first is the Ø (null) suffix of the perfective third person masculine singular; the second is a pronominal direct object. Notice the leading +, which marks the junction with the stem, and the subsequent +, which marks the junction between the Ø suffix and the object pronoun.
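To make the role of the + signs concrete, the following Python sketch splits the content of a <pos>...</pos> element into its morphemes. The function names are hypothetical, and the sketch assumes that neither + nor / ever occurs inside a form or a tag, which holds for the entries shown here but has not been checked against the whole dictionary.

```python
from typing import List, Tuple


def split_pos(pos: str) -> List[Tuple[str, str]]:
    """Split e.g. '+(null)/PVSUFF_SUBJ:3MS+hu/PVSUFF_DO:3MS' into (form, tag) pairs."""
    segments = []
    for chunk in pos.split("+"):
        if not chunk:
            # An empty chunk comes from a leading or trailing '+', which only
            # marks a junction with the stem; it is not a morpheme.
            continue
        form, _, tag = chunk.partition("/")
        segments.append((form, tag))
    return segments


def attaches_to_preceding_stem(pos: str) -> bool:
    """A leading '+' (suffixes) means a stem is expected before the sequence."""
    return pos.startswith("+")


def expects_following_stem(pos: str) -> bool:
    """A trailing '+' (prefixes) means a stem is expected after the sequence."""
    return pos.endswith("+")
```

For the first suffix entry above, `split_pos` returns two pairs: the null subject suffix `(null)` tagged `PVSUFF_SUBJ:3MS`, then the object pronoun `hu` tagged `PVSUFF_DO:3MS`.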


Finally, let's have a look at this snippet taken from the stems dictionary, gpl/pierrick/brihaye/aramorph/dictionaries/dictStems:

;--- ktb
;; katab-u_1
ktb	katab	PV	write
ktb	kotub	IV	write
ktb	kutib	PV_Pass	be written;be fated;be destined
ktb	kotab	IV_Pass_yu	be written;be fated;be destined
;; kAtab_1
kAtb	kAtab	PV	correspond with
kAtb	kAtib	IV_yu	correspond with
;; >akotab_1
>ktb	>akotab	PV	dictate;make write
Aktb	>akotab	PV	dictate;make write
ktb	kotib	IV_yu	dictate;make write
ktb	kotab	IV_Pass_yu	be dictated
;; takAtab_1
tkAtb	takAtab	PV	correspond
tkAtb	takAtab	IV	correspond
;; {inokatab_1
<nktb	{inokatab	PV	subscribe
Anktb	{inokatab	PV	subscribe
nktb	nokatib	IV	subscribe
;; {ikotatab_1
<kttb	{ikotatab	PV	register;enroll
Akttb	{ikotatab	PV	register;enroll
kttb	kotatib	IV	register;enroll
;; {isotakotab_1
<stktb	{isotakotab	PV	make write;dictate
Astktb	{isotakotab	PV	make write;dictate
stktb	sotakotib	IV	make write;dictate
;; kitAb_1
ktAb	kitAb	Ndu	book
ktb	kutub	N	books
;; kitAboxAnap_1
ktAbxAn	kitAboxAn	NapAt	library;bookstore
ktbxAn	kutuboxAn	NapAt	library;bookstore
;; kutubiy~_1
ktby	kutubiy~	Ndu	book-related
;; kutubiy~_2
ktby	kutubiy~	Ndu	bookseller
ktby	kutubiy~	Nap	booksellers     <pos>kutubiy~/NOUN</pos>
;; kut~Ab_1
ktAb	kut~Ab	N	kuttab (village school);Quran school
ktAtyb	katAtiyb	Ndip	kuttab (village schools);Quran schools
;; kutay~ib_1
ktyb	kutay~ib	NduAt	booklet
;; kitAbap_1
ktAb	kitAb	Nap	writing
;; kitAbap_2
ktAb	kitAb	Napdu	essay;piece of writing
ktAb	kitAb	NAt	writings;essays
;; kitAbiy~_1
ktAby	kitAbiy~	N-ap	writing;written     <pos>kitAbiy~/ADJ</pos>
;; katiybap_1
ktyb	katiyb	Napdu	brigade;squadron;corps
ktA}b	katA}ib	Ndip	brigades;squadrons;corps
ktA}b	katA}ib	Ndip	Phalangists
;; katA}ibiy~_1
ktA}by	katA}ibiy~	Nall	brigade;corps     <pos>katA}ibiy~/NOUN</pos>
ktA}by	katA}ibiy~	Nall	brigade;corps     <pos>katA}ibiy~/ADJ</pos>
;; katA}ibiy~_2
ktA}by	katA}ibiy~	Nall	Phalangist     <pos>katA}ibiy~/NOUN</pos>
ktA}by	katA}ibiy~	Nall	Phalangist     <pos>katA}ibiy~/ADJ</pos>
;; makotab_1
mktb	makotab	Ndu	bureau;office;department
mkAtb	makAtib	Ndip	bureaus;offices
;; makotabiy~_1
mktby	makotabiy~	N-ap	office     <pos>makotabiy~/ADJ</pos>
;; makotabap_1
mktb	makotab	NapAt	library;bookstore
mkAtb	makAtib	Ndip	libraries;bookstores
;; mikotAb_1
mktAb	mikotAb	Ndu	printer
;; mukAtabap_1
mkAtb	mukAtab	NapAt	correspondence
;; {ikotitAb_1
<kttAb	{ikotitAb	N/At	enrollment;registration;subscription
AkttAb	{ikotitAb	N/At	enrollment;registration;subscription
;; {isotikotAb_1
<stktAb	{isotikotAb	N/At	dictation
AstktAb	{isotikotAb	N/At	dictation
<stktAby	{isotikotAbiy~	N-ap	dictation     <pos>{isotikotAbiy~/ADJ</pos>
AstktAby	{isotikotAbiy~	N-ap	dictation     <pos>{isotikotAbiy~/ADJ</pos>
;; kAtib_1
kAtb	kAtib	N/ap	writer;author
kAtb	kAtib	N/ap	clerk
ktAb	kut~Ab	N	authors;writers
ktb	katab	Nap	authors;writers
;; kAtib_2
kAtb	kAtib	Nall	writing     <pos>kAtib/ADJ</pos>
;; makotuwb_1
mktwb	makotuwb	N-ap	written     <pos>makotuwb/ADJ</pos>
;; makotuwb_2
mktwb	makotuwb	Ndu	letter;message
mkAtyb	makAtiyb	Ndip	letters;messages
;; mukAtib_1
mkAtb	mukAtib	Nall	correspondent;reporter
;; mukotatib_1
mkttb	mukotatib	Nall	subscriber

The format is slightly different here: each line beginning with ;; provides a lemma identifier for the entries that follow it. The rest is similar.
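A reader of the stems dictionary therefore has to remember the most recent ;; line. The Python sketch below shows one way to do this; `group_by_lemma` is a hypothetical name, and the choice to ignore entries that appear before any lemma identifier is an assumption made for the example.

```python
from typing import Dict, Iterable, List


def group_by_lemma(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Map each lemma identifier to the tab-separated fields of its entries."""
    lemmas: Dict[str, List[List[str]]] = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(";;"):
            # ';;' must be tested before ';', since it is also a comment prefix.
            current = line[2:].strip()
            lemmas.setdefault(current, [])
        elif not line.strip() or line.startswith(";"):
            continue  # blank line or ordinary comment such as ';--- ktb'
        elif current is not None:
            lemmas[current].append(line.split("\t"))
        # Assumption: entries seen before any ';;' line are skipped.
    return lemmas
```

Applied to the beginning of the excerpt, the key `katab-u_1` would collect the four entries from `katab` to `kotab`, and `kAtab_1` the next two.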

Notice that the grammatical category is often missing, since it can be inferred from the morphological category. In some cases, however, it has to be stated explicitly. For example, nisbas are morphologically nominal but may be used as adjectives, so their entries carry an explicit <pos> element.
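The fallback from an explicit grammatical category to an inferred one could look like the Python sketch below. The table `DEFAULT_TAGS` and the tag names `VERB_PERFECT` and `VERB_IMPERFECT` are assumptions made for illustration; they are not taken from the dictionaries shown here, and the actual inference rules may be richer.

```python
import re
from typing import Optional

POS_RE = re.compile(r"<pos>(.*?)</pos>")

# Hypothetical fallback table: prefix of the morphological category -> tag.
DEFAULT_TAGS = [
    ("PV", "VERB_PERFECT"),
    ("IV", "VERB_IMPERFECT"),
    ("N", "NOUN"),
]


def resolve_pos(vocalization: str, category: str, gloss_field: str) -> Optional[str]:
    """Return the explicit <pos> content if present, else an inferred one."""
    match = POS_RE.search(gloss_field)
    if match:
        return match.group(1)  # explicit category always wins
    for prefix, tag in DEFAULT_TAGS:
        if category.startswith(prefix):
            return f"{vocalization}/{tag}"
    return None  # no rule applies
```

With this table, the entry for `kitAb` (category `Ndu`) would be tagged as a noun by inference, while the nisba `kitAbiy~` keeps its explicit `kitAbiy~/ADJ`.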

Notice that Buckwalter's transliteration system is used throughout the dictionaries.
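For readers unfamiliar with the transliteration, here is a small Python sketch that maps Buckwalter characters to Arabic code points. The table reflects the standard Buckwalter scheme as commonly documented and should be checked against the exact variant used by the dictionaries; `to_arabic` is a hypothetical helper, and characters outside the table are passed through unchanged.

```python
# Buckwalter character -> Unicode Arabic character (standard scheme, to be
# verified against the dictionaries before relying on it).
BUCKWALTER_TO_UNICODE = {
    "'": "\u0621", "|": "\u0622", ">": "\u0623", "&": "\u0624", "<": "\u0625",
    "}": "\u0626", "A": "\u0627", "b": "\u0628", "p": "\u0629", "t": "\u062A",
    "v": "\u062B", "j": "\u062C", "H": "\u062D", "x": "\u062E", "d": "\u062F",
    "*": "\u0630", "r": "\u0631", "z": "\u0632", "s": "\u0633", "$": "\u0634",
    "S": "\u0635", "D": "\u0636", "T": "\u0637", "Z": "\u0638", "E": "\u0639",
    "g": "\u063A", "_": "\u0640", "f": "\u0641", "q": "\u0642", "k": "\u0643",
    "l": "\u0644", "m": "\u0645", "n": "\u0646", "h": "\u0647", "w": "\u0648",
    "Y": "\u0649", "y": "\u064A",
    # short vowels and other diacritics
    "F": "\u064B", "N": "\u064C", "K": "\u064D", "a": "\u064E", "u": "\u064F",
    "i": "\u0650", "~": "\u0651", "o": "\u0652", "`": "\u0670", "{": "\u0671",
}


def to_arabic(text: str) -> str:
    """Convert a Buckwalter-transliterated string to Arabic script."""
    return "".join(BUCKWALTER_TO_UNICODE.get(ch, ch) for ch in text)
```

Such a conversion only makes sense for the skeleton and vocalization fields; category names, glosses and markers such as `(null)` are not transliterated text and should be left alone.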
Fixme (you!)
I'm looking for Perl scripts that would convert this text format to XML and back (in UTF-8 if possible, so that any kind of gloss can be typed in, in particular ones with European accented characters, as in Védrine or Schröder).
This would make it possible to process the dictionaries directly in Arabic script rather than through Buckwalter's transliteration system, thus taking advantage of Java's native Unicode support.
Version 2.0 of the Perl version of AraMorph uses XML dictionaries, but it is unfortunately not compliant with the GPL.
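As a rough idea of what such a conversion might produce, here is a sketch in Python rather than Perl, using only the standard library. The element and attribute names (`dictionary`, `entry`, `skeleton`, `vocalization`, `gloss`, `category`) are invented for this example; they do not follow any existing AraMorph XML schema.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List


def entry_to_xml(skeleton: str, vocalization: str, category: str,
                 glosses: List[str]) -> ET.Element:
    """Build one hypothetical <entry> element from the fields of a line."""
    entry = ET.Element("entry", {"category": category})
    ET.SubElement(entry, "skeleton").text = skeleton
    ET.SubElement(entry, "vocalization").text = vocalization
    for gloss in glosses:
        ET.SubElement(entry, "gloss").text = gloss
    return entry


def to_utf8_xml(entries: Iterable[ET.Element]) -> bytes:
    """Serialize the entries under a <dictionary> root as UTF-8 bytes."""
    root = ET.Element("dictionary")
    root.extend(entries)
    return ET.tostring(root, encoding="utf-8")
```

Because the output is UTF-8, accented glosses and Arabic script can both be stored directly; the reverse conversion would walk the same elements and write the fields back out separated by tabs.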