This module handles Unicode characters and their code points. It is intended for anyone who is not satisfied with the built-in ord and unichr.
codepoint.codepoint(c)
Return the code point of the Unicode character c.
Note that some Unicode characters may be represented by a pair of
other code points (a "surrogate pair"). This function treats a
surrogate pair as a representation of the original code point; e.g.
codepoint.codepoint(u'\ud842\udf9f') returns 134047 (0x20b9f),
because u'\ud842\udf9f' is the surrogate pair representation of
u'\U00020b9f'.
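
The arithmetic behind that example can be sketched as follows. This is only an illustration written for Python 3, not the code in codepoint.py; the helper name combine_surrogates is made up for this sketch.

```python
def combine_surrogates(high, low):
    """Combine a high and a low surrogate into one code point.

    Hypothetical helper for illustration; not part of codepoint.py.
    """
    if not (0xD800 <= high <= 0xDBFF):
        raise ValueError("not a high surrogate: %#x" % high)
    if not (0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a low surrogate: %#x" % low)
    # Each surrogate carries 10 bits of the value above 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print(combine_surrogates(0xD842, 0xDF9F))       # 134047
print(hex(combine_surrogates(0xD842, 0xDF9F)))  # 0x20b9f
```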
codepoint.unichr(cp)
Return the Unicode character for the integer code point cp.
Note that some Unicode characters may be represented by a pair of
other code points (a "surrogate pair"). This function may
return a unicode object whose length is greater than one; e.g.
codepoint.unichr(0x20b9f) returns u'\U00020b9f', whereas the built-in
unichr() may raise ValueError (as it does on a narrow Python build).
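
The reverse direction, splitting a code point above 0xFFFF into a surrogate pair, might look like the sketch below. It is written for Python 3 and is an assumption about the general technique, not the module's actual implementation; split_to_surrogates is a hypothetical name.

```python
def split_to_surrogates(cp):
    """Split a supplementary code point into (high, low) surrogates.

    Hypothetical helper for illustration; not part of codepoint.py.
    """
    if not (0x10000 <= cp <= 0x10FFFF):
        raise ValueError("not a supplementary code point: %#x" % cp)
    offset = cp - 0x10000               # a 20-bit value
    high = 0xD800 + (offset >> 10)      # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # lower 10 bits
    return high, low


high, low = split_to_surrogates(0x20B9F)
print(hex(high), hex(low))  # 0xd842 0xdf9f
```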
codepoint.characters(s)
Return the list of Unicode characters in the unicode string s.
The number of items may differ from len(s), because some
characters may be represented by a pair of other code points
(a "surrogate pair").
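
To show why the count can differ from len(s), here is a rough Python 3 sketch that groups surrogate pairs while walking a string. It is not the module's code; group_characters is a hypothetical name, and the sample string deliberately holds two separate surrogates to imitate a narrow-build unicode string.

```python
def group_characters(s):
    """Return a list of characters, keeping each surrogate pair together.

    Hypothetical helper for illustration; not part of codepoint.py.
    """
    result = []
    i = 0
    while i < len(s):
        ch = s[i]
        is_high = '\ud800' <= ch <= '\udbff'
        if is_high and i + 1 < len(s) and '\udc00' <= s[i + 1] <= '\udfff':
            # A high surrogate followed by a low surrogate is one character.
            result.append(s[i:i + 2])
            i += 2
        else:
            # Ordinary characters and unpaired surrogates stand alone.
            result.append(ch)
            i += 1
    return result


sample = 'a\ud842\udf9fb'
print(len(sample))                    # 4
print(len(group_characters(sample)))  # 3
```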
codepoint.py is in the public domain.
Comments are always welcome. You can reach me through my guestbook page or by e-mail <mshibata at emptypage.jp>.