--------------------------------------------------------------------------

Malayalam International HOWTO
=============================

This HOWTO has just been _STARTED_. I am planning to put all the details
about Malayalam computing on free operating systems here. The
introduction is written by Mahesh T Pai. If you can contribute to other
sections, please contact me at: baiju@freeshell.org

Contents :-

 1. Introduction
 2. Input Methods
 3. Font
 4. Renderer
 5. Locale
 6. Applications
 7. Translations
 8. Speech synthesizer
 9. Speech Recognizer
10. OCR
11. Standards
12. Miscellaneous
13. Conclusion
14. Appendix A
15. Appendix B

1. Introduction :-

Author : Mahesh T Pai
-----------------------------------------

The majority of the world's Malayalam-speaking population lives in
Kerala, a state at the south-western tip of the Indian peninsula.
Kerala is the most developed state in India, with close to 100%
literacy, one of the highest per capita incomes among Indian states,
and a Human Development Index comparable to that of most Western
countries.

Though more than 95% of its population of a little over 30 million
speaks Malayalam as their mother tongue, English is still one of the
official languages of Kerala, and a large number of schools in the
State continue to teach in the English medium. Most of the population
speaks English, and foreigners visiting the State will not encounter
any kind of language problem. The judicial system still uses English,
except for recording evidence. Endowed with lush greenery and a long
coastline, we prefer to call our State "God's Own Country".

The Malayalam language is spoken by more than 35 million people around
the world, who call themselves 'Malayalees'. The language has a
well-established script, with the alphabet divided into vowels,
consonants, and a special group of pure consonants called "chillus".
The phonetic characteristics of the Malayalam script are similar to
those of most Indic languages.
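To make the vowel/consonant/chillu split concrete in Unicode terms, here
is a minimal Python sketch. It assumes the Unicode 3.0 layout of the
Malayalam block (independent vowels at U+0D05 to U+0D14, consonants at
U+0D15 to U+0D39) and spells a chillu as the three-character sequence
consonant + VIRAMA (U+0D4D) + ZERO WIDTH JOINER (U+200D), the usual
pre-atomic representation. The helper names `classify` and `chillu` are
hypothetical, and the sketch ignores the vowel letters and signs found
elsewhere in the block.

```python
import unicodedata

# Code point ranges in the Malayalam block (U+0D00 to U+0D7F) assumed here.
# Independent vowels: U+0D05 to U+0D14 (U+0D0D and U+0D11 are unassigned).
# Consonants: U+0D15 to U+0D39. Python's unicodedata follows a much newer
# Unicode version than 3.0, so it also knows letters added later (e.g. U+0D29).
VOWELS = range(0x0D05, 0x0D15)
CONSONANTS = range(0x0D15, 0x0D3A)
VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER


def classify(ch: str) -> str:
    """Return 'vowel', 'consonant', 'other' or 'unassigned' for one character."""
    if unicodedata.name(ch, None) is None:
        return "unassigned"
    cp = ord(ch)
    if cp in VOWELS:
        return "vowel"
    if cp in CONSONANTS:
        return "consonant"
    return "other"


def chillu(consonant: str) -> str:
    """Spell a chillu as consonant + virama + ZWJ (three code points)."""
    if classify(consonant) != "consonant":
        raise ValueError(f"not a Malayalam consonant: {consonant!r}")
    return consonant + VIRAMA + ZWJ


if __name__ == "__main__":
    for ch in ("\u0D05", "\u0D15", "\u0D28"):  # A, KA, NA
        print(ch, unicodedata.name(ch), classify(ch))
    print([f"U+{ord(c):04X}" for c in chillu("\u0D28")])  # chillu N
```

Counting the `vowel` results over U+0D05 to U+0D14 gives 14, since two
code points in that range are unassigned.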
Code points 0D00 - 0D4F of the Malayalam code set have a one-to-one
phonetic correspondence, for vowels, consonants and vowel modifiers,
with code points 0900 - 094F of the Devanagari code set. Hence,
transliteration between Malayalam and other Indian (Indic) languages is
not very difficult. The Unicode Standard, Version 3.0, encodes 14 vowels
and 36 consonants for Malayalam, in the code point range U+0D00 to
U+0D7F. In addition to these, the Malayalam script has 5 characters
known as 'chillus'. At the time of writing, there is a proposal to
include the chillus in the Unicode Standard.

The traditional Malayalam script used to have close to 900 glyphs. (On
the difference between character and glyph, see .... ). Nowadays, a
standard Malayalam font file consists of nearly 120 glyphs. To map the
50 Unicode characters onto these glyphs, a computer needs a special kind
of software called a "rendering engine". More details about rendering
engines can be found below in section 4.

2. Input Methods :-

The standard input method that works properly is InScript based. The
Malayalam InScript keyboard layout is standardised by the Kerala
government. If any hardware company plans to make a Malayalam keyboard,
I think it will follow this standard. It is very easy to implement this
layout in the X Window System by using an XKB file and a special
Compose file for the 'chillu' characters. (In future, the Compose file
may not be required for this; see the Standards section below.)

All about this input method is described at:
http://www.nongnu.org/smc/docs/input-methods.html

A transliteration-based input method will be available very soon. It is
written only for GTK+ apps. To implement other keyboard layouts, like
Modular (Typewriter), it is necessary to hack on XIM or on the GTK input
method. Applications like Yudit and Varamozhi support
transliteration-based input methods.
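The core of a transliteration-based input method is a table mapping
Latin key sequences to Malayalam characters, plus a rule deciding when
to emit an independent vowel, a dependent vowel sign, or a virama. The
minimal Python sketch below illustrates that idea. The key spellings,
the greedy longest-match rule and the name `transliterate` are all
hypothetical; this is not the scheme used by Varamozhi, Yudit or the
GTK+ input method mentioned above.

```python
# Toy Latin-to-Malayalam table (hypothetical spellings, a few letters only).
CONSONANTS = {
    "k": "\u0D15", "kh": "\u0D16", "g": "\u0D17", "ch": "\u0D1A",
    "j": "\u0D1C", "t": "\u0D24", "n": "\u0D28", "p": "\u0D2A",
    "m": "\u0D2E", "y": "\u0D2F", "r": "\u0D30", "l": "\u0D32",
    "v": "\u0D35", "s": "\u0D38", "h": "\u0D39",
}
# Each vowel maps to (independent letter, dependent vowel sign).
# Short "a" has no sign: it is the consonant's inherent vowel.
VOWELS = {
    "a": ("\u0D05", ""), "aa": ("\u0D06", "\u0D3E"),
    "i": ("\u0D07", "\u0D3F"), "ii": ("\u0D08", "\u0D40"),
    "u": ("\u0D09", "\u0D41"), "uu": ("\u0D0A", "\u0D42"),
    "e": ("\u0D0E", "\u0D46"), "ee": ("\u0D0F", "\u0D47"),
    "o": ("\u0D12", "\u0D4A"), "oo": ("\u0D13", "\u0D4B"),
}
VIRAMA = "\u0D4D"
MAX_KEY = 2  # longest key in either table


def transliterate(text: str) -> str:
    out = []
    after_consonant = False  # did the previous token emit a bare consonant?
    i = 0
    while i < len(text):
        # Greedy longest match: try 2-letter keys before 1-letter keys.
        for size in range(MAX_KEY, 0, -1):
            key = text[i:i + size]
            if key in CONSONANTS or key in VOWELS:
                break
        else:
            key = text[i]  # not in either table; copied through unchanged
        if key in CONSONANTS:
            if after_consonant:
                out.append(VIRAMA)  # consonant cluster: C + virama + C
            out.append(CONSONANTS[key])
            after_consonant = True
        elif key in VOWELS:
            independent, sign = VOWELS[key]
            out.append(sign if after_consonant else independent)
            after_consonant = False
        else:
            if after_consonant:
                out.append(VIRAMA)  # consonant with no following vowel
            out.append(key)
            after_consonant = False
        i += len(key)
    if after_consonant:
        out.append(VIRAMA)  # word-final consonant at end of input
    return "".join(out)
```

For example, `transliterate("amma")` yields അമ്മ (U+0D05 U+0D2E U+0D4D
U+0D2E). A real input method would also need the full alphabet, chillu
handling and integration with XIM or a GTK+ input-method module; greedy
longest match is just one simple way to resolve keys such as "k" vs "kh".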
(Can anyone please give me more details about the transliteration
schemes used by Yudit and Varamozhi?)

(If anyone is going to hack on XIM or IIIMF for advanced Malayalam input
methods, please inform me.)

Is anyone interested in making a Table Input Method (TIM) for Malayalam?
Visit: http://wenju.sf.net

I am planning to make this section the biggest in this document, yes,
only for a few years ;-)

3. Font :-

TrueType is one of the most widely used font formats in the X Window
System. A TrueType font is a collection of TrueType vector outlines
along with some standard tables. OpenType fonts are going to become more
popular in the coming years. OpenType defines some advanced tables, like
GSUB and GPOS, for better support of international languages.

In Unicode TTF fonts, the basic characters of a language are placed in
the Unicode range allotted to that language. The Private/Corporate Use
Area can be used to place the other glyphs of a language. The Malayalam
Unicode range is from U+0D00 to U+0D7F; the other Malayalam glyphs are
placed in the Private/Corporate Use Area of a TTF font. In an OTF font,
the other glyphs are unencoded and are accessed through the OpenType
tables.

GPLed TTF and OTF fonts are available in the downloads section. The
font was created by Jeroen Hellingman using Metafont; later N.V.Shaji
converted it into a TTF font, and OpenType table support has now been
added. There are 136 glyphs in total, and the font is released under
the GNU General Public License. (I will expand this section soon.)

4. Renderer :-

For BDF and TTF fonts, a font renderer is required to display our glyphs
properly. Pango is the text layout and rendering library used by the
GTK+ toolkit, and a Pango module for Malayalam is now available. The
GTK+ toolkit is used by GNOME and lots of other applications, so
Malayalam text will render properly in all GTK+ applications. For
OpenType fonts, OpenType Layout support (or a shaper?) is required for
reordering some symbols.

5. Locale :-

For any language, a locale database is necessary to make *COMPUTING*
possible. A locale contains information about the country (calendar,
flag, etc.)
and language features (sorting order, dates, etc.). All this
information is stored as Unicode values. Almost all tables are now
ready for Malayalam, even LC_COLLATE (the sorting order table), though
of course it still needs testing.

6. Applications :-

All GNOME 2.x applications will support Malayalam properly (with proper
rendering).

7. Translations :-

Translation of the GNOME glossary has now started; all other
translations will be based on it.

8. Speech synthesizer :-

I have collected some links; if you are interested, go through them:
http://forum.gnu.org.in/Volunteers/Members/baiju/bookmarks

9. Speech Recognizer :-

http://forum.gnu.org.in/Volunteers/Members/baiju/bookmarks

10. OCR :-

If you are interested, continue this project:
http://sourceforge.net/projects/mocr

11. Standards :-

Recently the Ministry of IT (Govt. of India) proposed some changes to
the Malayalam Unicode character set; they suggested including the
'chillus' as basic building blocks. At present, each 'chillu' is
represented by a sequence of three Unicode characters.

12. Miscellaneous :-

Lots of miscellaneous things are there.... ...wait.

--------------------------------------------------------------------------
Copyright (C) 2002 SMC Project Team

Verbatim copying and distribution of this entire article is permitted
in any medium, provided this notice is preserved.

Updated: $Date: 2002/10/25 08:15:24 $
$Author: baijum81 $
--------------------------------------------------------------------------