codepoint.py

Unicode code point utility for Python

Published:
2012-05-01

This module handles Unicode characters and their code points. It is for anyone who is not satisfied with the built-in ord and unichr.

Download

Requirements

Functions

codepoint.codepoint(c)

Return the code point of the Unicode character c

Note that some Unicode characters may be represented by a pair of surrogate code points (a "surrogate pair"). This function treats a surrogate pair as a representation of the original code point; e.g. codepoint.codepoint(u'\ud842\udf9f') returns 134047 (0x20b9f). u'\ud842\udf9f' is the surrogate pair representation of u'\U00020b9f'.
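The arithmetic behind that example can be sketched as follows. This is only an illustration of the standard UTF-16 surrogate formula, not the module's actual source; the function name combine_surrogates is hypothetical, and it takes the two surrogates as integers.

```python
def combine_surrogates(high, low):
    """Combine a high and a low surrogate (given as integers) into one code point.

    Hypothetical helper for illustration; not part of codepoint.py.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits of the value (code point - 0x10000).
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = combine_surrogates(0xD842, 0xDF9F)
print(cp, hex(cp))  # 134047 0x20b9f
```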

codepoint.unichr(cp)

Return the Unicode character for the integer code point cp

Note that some Unicode characters may be represented by a pair of surrogate code points (a "surrogate pair"). This function may return a unicode object whose length is greater than one; e.g. codepoint.unichr(0x20b9f) returns u'\U00020b9f', while the built-in unichr() may raise ValueError.
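The reverse direction, splitting a code point above 0xFFFF into its surrogate pair, might look like the sketch below. It again uses the standard UTF-16 formula and is not taken from the module; the name split_to_surrogates is an assumption made for this example.

```python
def split_to_surrogates(cp):
    """Split a code point above 0xFFFF into (high, low) surrogate integers.

    Hypothetical helper for illustration; not part of codepoint.py.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # lower 10 bits
    return high, low


high, low = split_to_surrogates(0x20B9F)
print(hex(high), hex(low))  # 0xd842 0xdf9f
```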

codepoint.characters(s)

Return the list of Unicode characters in the unicode string s

The number of characters returned may differ from len(s), because some characters may be represented by a pair of surrogate code points (a "surrogate pair").
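A minimal sketch of this kind of iteration is shown below. It is written for Python 3, where a string literal with explicit surrogate escapes stands in for how a narrow Python 2 build stores such a character. The name iter_characters is hypothetical and the code is not the module's implementation.

```python
def iter_characters(s):
    """Yield the characters of s, keeping each surrogate pair together.

    Hypothetical helper for illustration; not part of codepoint.py.
    """
    i = 0
    while i < len(s):
        ch = s[i]
        # A high surrogate followed by a low surrogate forms one character.
        if ('\ud800' <= ch <= '\udbff' and i + 1 < len(s)
                and '\udc00' <= s[i + 1] <= '\udfff'):
            ch = s[i:i + 2]
        yield ch
        i += len(ch)


s = 'a\ud842\udf9fb'          # 'a', a surrogate pair, 'b'
chars = list(iter_characters(s))
print(len(s), len(chars))     # 4 3
```

An unpaired surrogate is yielded on its own in this sketch; how codepoint.py treats that case is not stated here.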

License

codepoint.py is in the public domain.

History

2012-05-04 (r2093)
Dedicated to the public domain.
2012-05-01 (r2085)
The first release.

Comments are always welcome. You can get in touch with me via my guestbook page or by e-mail <mshibata at emptypage.jp>.