This module handles Unicode characters and their code points. It is for anyone who is not satisfied with the built-in ord and unichr.
codepoint.codepoint(c)
Return the code point of the Unicode character c.
Note that some Unicode characters may be represented by a pair of surrogate code points (a "surrogate pair"). This function treats a surrogate pair as a representation of the original code point; e.g. codepoint.codepoint(u'\ud842\udf9f') returns 134047 (0x20b9f), because u'\ud842\udf9f' is the surrogate pair that represents u'\U00020b9f'.
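The arithmetic behind that example can be sketched as follows. This is only an illustration of the standard UTF-16 surrogate formula, not the code of codepoint.py itself; the helper name combine_surrogates is hypothetical, and it works on plain integers so that it does not depend on how a given Python build stores strings.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair (given as integers) into one code point.

    Hypothetical helper for illustration; not part of codepoint.py.
    """
    if not (0xD800 <= high <= 0xDBFF):
        raise ValueError("not a high surrogate: %#x" % high)
    if not (0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a low surrogate: %#x" % low)
    # Each surrogate carries 10 bits of the offset above 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print("%#x" % combine_surrogates(0xD842, 0xDF9F))  # 0x20b9f
```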
codepoint.unichr(cp)
Return the Unicode character for the integer code point cp.
Note that some Unicode characters may be represented by a pair of surrogate code points (a "surrogate pair"), so this function may return a unicode object whose length is greater than one; e.g. codepoint.unichr(0x20b9f) returns u'\U00020b9f', while the built-in unichr() may raise ValueError (as it does on narrow Python builds).
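The reverse direction, splitting a code point above 0xFFFF into its surrogate pair, can be sketched like this. Again it is a minimal illustration of the standard UTF-16 formula using integers only; split_to_surrogates is a hypothetical name and is not claimed to be how codepoint.unichr is implemented.

```python
def split_to_surrogates(cp):
    """Split a supplementary code point into (high, low) surrogate integers.

    Hypothetical helper for illustration; not part of codepoint.py.
    """
    if not (0x10000 <= cp <= 0x10FFFF):
        raise ValueError("code point outside the supplementary range: %#x" % cp)
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # upper 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # lower 10 bits of the offset
    return high, low


high, low = split_to_surrogates(0x20B9F)
print("%#x %#x" % (high, low))  # 0xd842 0xdf9f
```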
codepoint.characters(s)
Return the list of Unicode characters in the unicode string s.
The number of characters returned may differ from len(s), because some characters may be represented by a pair of surrogate code points (a "surrogate pair").
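To see why the count can differ from len(s), here is a rough sketch of grouping surrogate pairs while walking a string. It assumes Python 3, where a literal such as 'a\ud842\udf9fb' keeps the two surrogates as separate items; the function name split_characters is hypothetical and this is not the module's own implementation.

```python
def split_characters(s):
    """Return the items of s, with each surrogate pair kept together as one item.

    Hypothetical helper for illustration; not part of codepoint.py.
    """
    result = []
    i = 0
    while i < len(s):
        unit = ord(s[i])
        # A high surrogate immediately followed by a low surrogate is one character.
        if (0xD800 <= unit <= 0xDBFF and i + 1 < len(s)
                and 0xDC00 <= ord(s[i + 1]) <= 0xDFFF):
            result.append(s[i:i + 2])
            i += 2
        else:
            result.append(s[i])
            i += 1
    return result


s = 'a\ud842\udf9fb'
chars = split_characters(s)
# Only the lengths are printed, since lone surrogates cannot be encoded for output.
print("%d %d" % (len(s), len(chars)))  # 4 3
```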
codepoint.py is in the public domain.
Comments are always welcome. You can reach me through my guestbook page or by e-mail <mshibata at emptypage.jp>.