Using Lexical tools to convert Unicode characters to ASCII
Unicode is an industry standard that allows computers to consistently represent and manipulate text expressed in most of the world's writing systems. It is widely used in multilingual NLP (natural language processing) projects. However, some NLP projects still handle only ASCII characters. This paper describes methods of using lexical tools to convert Unicode characters (UTF-8) to 7-bit ASCII characters.
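To make the goal concrete, here is a minimal sketch of a Unicode-to-ASCII conversion. It is not the lexical tools' method: it assumes only Python's standard `unicodedata` module, and the function name `to_ascii` and its `placeholder` parameter are hypothetical, chosen for illustration. It decomposes each character, drops diacritics, and replaces whatever still falls outside 7-bit ASCII with a placeholder.

```python
import unicodedata


def to_ascii(text: str, placeholder: str = "?") -> str:
    """Best-effort conversion of a Unicode string to 7-bit ASCII.

    Illustrative only; `to_ascii` and `placeholder` are hypothetical names.
    """
    # NFKD splits accented letters into a base letter plus combining marks
    # and expands compatibility characters such as ligatures.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop diacritics (combining marks)
        # Keep 7-bit ASCII as-is; substitute anything that has no ASCII form.
        out.append(ch if ord(ch) < 128 else placeholder)
    return "".join(out)
```

Characters with no decomposition to ASCII, such as Greek letters, come out as the placeholder, which is one reason a real conversion may need additional mapping rules beyond normalization.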