The m17n Library 1.8.4
Charset

Charset objects and API for them.

Macros

#define MCHAR_INVALID_CODE
 Invalid code-point.
 

Functions

MSymbol mchar_define_charset (const char *name, MPlist *plist)
 
MSymbol mchar_resolve_charset (MSymbol symbol)
 Resolve charset name.
 
int mchar_list_charset (MSymbol **symbols)
 List symbols representing charsets.
 
int mchar_decode (MSymbol charset_name, unsigned code)
 Decode a code-point.
 
unsigned mchar_encode (MSymbol charset_name, int c)
 Encode a character code.
 
int mchar_map_charset (MSymbol charset_name, void(*func)(int from, int to, void *arg), void *func_arg)
 Call a function for all the characters in a specified charset.
 

Variables

MSymbol Mcharset
 

Variables: Symbols representing a charset.

Each of the following symbols represents a predefined charset.

MSymbol Mcharset_ascii
 Symbol representing the charset ASCII.
 
MSymbol Mcharset_iso_8859_1
 Symbol representing the charset ISO/IEC 8859-1.
 
MSymbol Mcharset_unicode
 Symbol representing the charset Unicode.
 
MSymbol Mcharset_m17n
 Symbol representing the largest charset.
 
MSymbol Mcharset_binary
 Symbol representing the charset for ill-decoded characters.
 

Variables: Parameter keys for mchar_define_charset().

These are the predefined symbols to use as parameter keys for the function mchar_define_charset(); see the documentation of that function for details.

MSymbol Mmethod
 
MSymbol Mdimension
 
MSymbol Mmin_range
 
MSymbol Mmax_range
 
MSymbol Mmin_code
 
MSymbol Mmax_code
 
MSymbol Mascii_compatible
 
MSymbol Mfinal_byte
 
MSymbol Mrevision
 
MSymbol Mmin_char
 
MSymbol Mmapfile
 
MSymbol Mparents
 
MSymbol Msubset_offset
 
MSymbol Mdefine_coding
 
MSymbol Maliases
 

Variables: Symbols representing charset methods.

These are the predefined symbols that can be used as the value of the Mmethod parameter in the property list given to the mchar_define_charset() function.

A method specifies how code-points and character codes are converted. See the documentation of the mchar_define_charset() function for the details.

MSymbol Moffset
 Symbol for the offset type method of charset.
 
MSymbol Mmap
 Symbol for the map type method of charset.
 
MSymbol Munify
 Symbol for the unify type method of charset.
 
MSymbol Msubset
 Symbol for the subset type method of charset.
 
MSymbol Msuperset
 Symbol for the superset type method of charset.
 

Detailed Description

Charset objects and API for them.

The symbol Mcharset.

The m17n library uses charset objects to represent coded character sets (CCSs). The m17n library supports many predefined coded character sets, and application programs can add other charsets. A character can belong to multiple charsets.

The m17n library distinguishes the following three concepts:

Each charset object defines how characters are converted between code-points and character codes. To encode means converting a character code to a code-point, and to decode means converting a code-point to a character code.



Any decoded M-text has a text property whose key is the predefined symbol Mcharset. The name of Mcharset is "charset".

Macro Definition Documentation

◆ MCHAR_INVALID_CODE

#define MCHAR_INVALID_CODE

Invalid code-point.

The macro MCHAR_INVALID_CODE gives the invalid code-point.

Function Documentation

◆ mchar_define_charset()

MSymbol mchar_define_charset (const char *name, MPlist *plist)

◆ mchar_resolve_charset()

MSymbol mchar_resolve_charset (MSymbol symbol)

Resolve charset name.

The mchar_resolve_charset() function returns symbol if it represents a charset. Otherwise, it canonicalizes symbol as a charset name and, if the canonicalized name represents a charset, returns that. Otherwise, it returns Mnil.

◆ mchar_list_charset()

int mchar_list_charset (MSymbol **symbols)

List symbols representing charsets.

The mchar_list_charset() function makes an array of the symbols representing charsets, stores the pointer to the array in the place pointed to by symbols, and returns the length of the array.

◆ mchar_decode()

int mchar_decode (MSymbol charset_name, unsigned code)

Decode a code-point.

The mchar_decode() function decodes code-point code in the charset represented by the symbol charset_name to get a character code.

Return value:
If decoding was successful, mchar_decode() returns the decoded character code. Otherwise it returns -1.
See Also:
mchar_encode()

◆ mchar_encode()

unsigned mchar_encode (MSymbol charset_name, int c)

Encode a character code.

The mchar_encode() function encodes character code c to get a code-point in the charset represented by the symbol charset_name.

Return value:
If encoding was successful, mchar_encode() returns the encoded code-point. Otherwise it returns MCHAR_INVALID_CODE.
See Also:
mchar_decode()

◆ mchar_map_charset()

int mchar_map_charset (MSymbol charset_name, void (*func)(int from, int to, void *arg), void *func_arg)

Call a function for all the characters in a specified charset.

The mchar_map_charset() function calls func for all the characters in the charset named charset_name. Rather than being called once per character, func is called once for each chunk of consecutive characters.

func receives three arguments: from, to, and arg. from and to specify a range of character codes in the charset, and arg is the value given as func_arg.

Return value:
If the operation was successful, mchar_map_charset() returns 0. Otherwise, it returns -1 and assigns an error code to the external variable merror_code.
Errors:
MERROR_CHARSET
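The callback contract above can be illustrated with a short sketch. It assumes that from and to are inclusive bounds, which the text does not state explicitly. So that it runs without the m17n library, the hypothetical simulate_map_charset() helper feeds a made-up list of chunks to the callback; with the library, the same callback would instead be passed to mchar_map_charset() along with a pointer to the counter as func_arg.

```c
#include <stddef.h>

/* Callback with the signature mchar_map_charset() expects:
   void (*func)(int from, int to, void *arg).
   ASSUMPTION: from and to are inclusive bounds of one chunk. */
void count_chars(int from, int to, void *arg)
{
    long *total = arg;              /* arg is the func_arg given by the caller */
    *total += (long) to - from + 1; /* number of characters in this chunk */
}

/* Hypothetical stand-in for the library (not an m17n function):
   makes one call per chunk of consecutive characters. */
struct char_range { int from, to; };

void simulate_map_charset(const struct char_range *ranges, size_t n,
                          void (*func)(int, int, void *), void *func_arg)
{
    for (size_t i = 0; i < n; i++)
        func(ranges[i].from, ranges[i].to, func_arg);
}
```

Because func receives whole chunks, a callback like count_chars() does constant work per chunk instead of per character.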

Variable Documentation

◆ Mcharset_ascii

MSymbol Mcharset_ascii

Symbol representing the charset ASCII.

The symbol Mcharset_ascii has the name "ascii" and represents the charset ISO 646, USA Version X3.4-1968 (ISO-IR-6).

◆ Mcharset_iso_8859_1

MSymbol Mcharset_iso_8859_1

Symbol representing the charset ISO/IEC 8859-1.

The symbol Mcharset_iso_8859_1 has the name "iso-8859-1" and represents the charset ISO/IEC 8859-1:1998.

◆ Mcharset_unicode

MSymbol Mcharset_unicode

Symbol representing the charset Unicode.

The symbol Mcharset_unicode has the name "unicode" and represents the charset Unicode.

◆ Mcharset_m17n

MSymbol Mcharset_m17n

Symbol representing the largest charset.

The symbol Mcharset_m17n has the name "m17n" and represents the charset that contains all the characters supported by the m17n library.

◆ Mcharset_binary

MSymbol Mcharset_binary

Symbol representing the charset for ill-decoded characters.

The symbol Mcharset_binary has the name "binary" and represents the pseudo charset that the decoding functions attach to an M-text as a text property when they encounter an invalid byte or byte sequence.

See Code Conversion for more details.

◆ Mmethod

MSymbol Mmethod

◆ Mdimension

MSymbol Mdimension

◆ Mmin_range

MSymbol Mmin_range

◆ Mmax_range

MSymbol Mmax_range

◆ Mmin_code

MSymbol Mmin_code

◆ Mmax_code

MSymbol Mmax_code

◆ Mascii_compatible

MSymbol Mascii_compatible

◆ Mfinal_byte

MSymbol Mfinal_byte

◆ Mrevision

MSymbol Mrevision

◆ Mmin_char

MSymbol Mmin_char

◆ Mmapfile

MSymbol Mmapfile

◆ Mparents

MSymbol Mparents

◆ Msubset_offset

MSymbol Msubset_offset

◆ Mdefine_coding

MSymbol Mdefine_coding

◆ Maliases

MSymbol Maliases

◆ Moffset

MSymbol Moffset

Symbol for the offset type method of charset.

The symbol Moffset has the name "offset" and, when used as a value of the Mmethod parameter of a charset, it means that the conversion of code-points and character codes of the charset is done by this calculation:

CHARACTER-CODE = CODE-POINT - MIN-CODE + MIN-CHAR

where MIN-CODE is the value of the Mmin_code parameter of the charset, and MIN-CHAR is the value of the Mmin_char parameter.
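A minimal sketch of the offset calculation and its inverse follows. It assumes that the valid code-points of the charset are exactly those between the Mmin_code and Mmax_code values. The struct, the function names, and TOY_INVALID_CODE are hypothetical and not part of the m17n API (the actual value of MCHAR_INVALID_CODE is not given in this document); the failure returns only imitate those documented for mchar_decode() and mchar_encode().

```c
#include <limits.h>

/* Hypothetical stand-in for an invalid-code value; not MCHAR_INVALID_CODE. */
#define TOY_INVALID_CODE UINT_MAX

/* Parameters of an offset-method charset; the field names mirror
   the Mmin_code, Mmax_code and Mmin_char parameter keys. */
struct offset_charset {
    unsigned min_code;
    unsigned max_code;
    int min_char;
};

/* Decode: CHARACTER-CODE = CODE-POINT - MIN-CODE + MIN-CHAR.
   ASSUMPTION: code-points outside [min_code, max_code] are invalid (-1). */
int offset_decode(const struct offset_charset *cs, unsigned code)
{
    if (code < cs->min_code || code > cs->max_code)
        return -1;
    return (int) (code - cs->min_code) + cs->min_char;
}

/* Encode: the inverse, CODE-POINT = CHARACTER-CODE - MIN-CHAR + MIN-CODE.
   Returns TOY_INVALID_CODE if c falls outside the charset's range. */
unsigned offset_encode(const struct offset_charset *cs, int c)
{
    if (c < cs->min_char
        || (unsigned) (c - cs->min_char) > cs->max_code - cs->min_code)
        return TOY_INVALID_CODE;
    return (unsigned) (c - cs->min_char) + cs->min_code;
}
```

For a made-up charset with MIN-CODE 0x21, MAX-CODE 0x7E and MIN-CHAR 0x1000, code-point 0x21 decodes to 0x1000 and code-point 0x7E to 0x105D.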

◆ Mmap

MSymbol Mmap

Symbol for the map type method of charset.

The symbol Mmap has the name "map" and, when used as a value of the Mmethod parameter of a charset, it means that the conversion of code-points and character codes of the charset is done by looking up a map. The map must be given by the Mmapfile parameter.
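As an illustration of map lookup, the sketch below uses a small in-memory table in place of the file named by the Mmapfile parameter, whose format is not described here. The table, struct, and function names are hypothetical, and the sample entries are chosen only for illustration.

```c
#include <stddef.h>

/* One entry of a hypothetical in-memory map (stands in for the Mmapfile map). */
struct map_entry {
    unsigned code; /* code-point in the charset */
    int c;         /* corresponding character code */
};

/* Decode by looking up the code-point; returns -1 if it is not mapped. */
int map_decode(const struct map_entry *map, size_t n, unsigned code)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].code == code)
            return map[i].c;
    return -1;
}

/* Encode by reverse lookup; returns fallback if c is not mapped. */
unsigned map_encode(const struct map_entry *map, size_t n, int c,
                    unsigned fallback)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].c == c)
            return map[i].code;
    return fallback;
}
```

A linear scan keeps the sketch short; for large maps an implementation would more likely use an indexed or sorted structure.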

◆ Munify

MSymbol Munify

Symbol for the unify type method of charset.

The symbol Munify has the name "unify" and, when used as a value of the Mmethod parameter of a charset, it means that the conversion of code-points and character codes of the charset is done by map lookup and offsetting. The map must be given by the Mmapfile parameter. A charset of this kind is assigned a unique, contiguous space of character codes for all of its characters.

If the map has an entry for a code-point, the conversion is done by looking up the map. Otherwise, the conversion is done by this calculation:

CHARACTER-CODE = CODE-POINT - MIN-CODE + LOWEST-CHAR-CODE

where MIN-CODE is the value of the Mmin_code parameter of the charset, and LOWEST-CHAR-CODE is the lowest character code of the assigned code space.
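The decoding side of the unify method can be sketched as a map lookup with the offset calculation as the fallback. The struct, the function, the sample values, and the use of a max_code upper bound are assumptions made for illustration only; none of them is part of the m17n API.

```c
#include <stddef.h>

struct unify_entry { unsigned code; int c; };

/* Hypothetical description of a unify-method charset. */
struct unify_charset {
    const struct unify_entry *map; /* stands in for the Mmapfile map */
    size_t map_len;
    unsigned min_code;             /* value of Mmin_code */
    unsigned max_code;             /* ASSUMPTION: upper bound, like Mmax_code */
    int lowest_char_code;          /* start of the assigned code space */
};

/* Decode: use the map entry if one exists; otherwise
   CHARACTER-CODE = CODE-POINT - MIN-CODE + LOWEST-CHAR-CODE. */
int unify_decode(const struct unify_charset *cs, unsigned code)
{
    if (code < cs->min_code || code > cs->max_code)
        return -1;
    for (size_t i = 0; i < cs->map_len; i++)
        if (cs->map[i].code == code)
            return cs->map[i].c;
    return (int) (code - cs->min_code) + cs->lowest_char_code;
}
```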

◆ Msubset

MSymbol Msubset

Symbol for the subset type method of charset.

The symbol Msubset has the name "subset" and, when used as a value of the Mmethod parameter of a charset, it means that the charset is a subset of a parent charset. The parent charset must be given by the Mparents parameter. The conversion of code-points and character codes of the charset is done conceptually by this calculation:

CHARACTER-CODE = PARENT-CODE (CODE-POINT) + SUBSET-OFFSET

where PARENT-CODE is a pseudo function that returns the character code of CODE-POINT in the parent charset, and SUBSET-OFFSET is the value given by the Msubset_offset parameter.
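The conceptual subset calculation can be sketched by passing the parent's decoder as a function pointer, which plays the role of PARENT-CODE. Both parent_decode() (a made-up parent charset) and the restriction of the subset to a [min_code, max_code] range are hypothetical.

```c
/* Hypothetical parent charset: an offset charset with MIN-CODE 0 and
   MIN-CHAR 0x1000 covering code-points 0x00..0xFF; -1 if not contained. */
int parent_decode(unsigned code)
{
    if (code > 0xFF)
        return -1;
    return (int) code + 0x1000;
}

/* CHARACTER-CODE = PARENT-CODE(CODE-POINT) + SUBSET-OFFSET.
   ASSUMPTION: the subset accepts only code-points in [min_code, max_code]. */
int subset_decode(int (*parent)(unsigned), unsigned code,
                  unsigned min_code, unsigned max_code, int subset_offset)
{
    if (code < min_code || code > max_code)
        return -1;
    int c = parent(code);
    return c < 0 ? -1 : c + subset_offset;
}
```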

◆ Msuperset

MSymbol Msuperset

Symbol for the superset type method of charset.

The symbol Msuperset has the name "superset" and, when used as a value of the Mmethod parameter of a charset, it means that the charset is a superset of its parent charsets. The parent charsets must be given by the Mparents parameter.

◆ Mcharset

MSymbol Mcharset
