ASCII-ize!

Both a transliterator and a diacritic remover. For Latin letters, it removes diacritics. For other alphabets, it transliterates, optionally using a couple of very useful non-ASCII letters and symbols that are a bit unwieldy otherwise. For symbols, it finds either a semantic approximation or a graphical one. For CJK characters, it just prints the code point.
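As an illustration of the diacritic-removal part only, here is a minimal Python sketch. It assumes the common approach of Unicode canonical decomposition (NFD) followed by dropping combining marks; the function name strip_diacritics is hypothetical, and nothing here claims to be how this tool does it internally.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks after canonical decomposition (an assumed approach)."""
    # NFD splits e.g. "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Čeština café"))  # Cestina cafe
```

Letters such as ø or ł have no canonical decomposition, so this approach alone leaves them untouched; a real tool would need an explicit table for those.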

Methodology:

For letters in scripts that have a pre-existing transliteration scheme into basic ASCII, the tool follows that scheme. If there isn't one, an ad-hoc transliteration scheme is improvised, typically following examples set by Unicode character names. In any case, transliteration attempts to be by meaning or sound, not graphical appearance.

For symbols: if a symbol has a clear meaning that's easily representable in ASCII, it's turned into that: ≠ becomes =/=, ± becomes +/-, ÷ becomes /, © becomes (c) and so on. For some math symbols, their HTML entity name or LaTeX name is used: ∈ becomes isin and ∴ becomes there4. For other symbols, an attempt is made at a rendition in plain ASCII, but there are clear limits: ← and → become <- and ->, but ↑ and ↓ become /|\ and \|/, and ░▒▓ become %X#.
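The symbol examples above amount to a lookup table. The sketch below is a hypothetical Python rendering of just those examples; the names SYMBOL_MAP and asciiize_symbols are made up for illustration, and the one-to-one mapping of ░▒▓ to %, X and # (in that order) is an assumption read off the example.

```python
# Hypothetical table containing only the examples quoted in the text.
SYMBOL_MAP = {
    "≠": "=/=",     # by meaning
    "±": "+/-",
    "÷": "/",
    "©": "(c)",
    "∈": "isin",    # HTML entity name
    "∴": "there4",  # HTML entity name
    "←": "<-",      # graphical rendition
    "→": "->",
    "↑": "/|\\",
    "↓": "\\|/",
    "░": "%",       # assumed per-character mapping of the shade blocks
    "▒": "X",
    "▓": "#",
}


def asciiize_symbols(text: str) -> str:
    """Replace each known symbol; leave every other character as it is."""
    return "".join(SYMBOL_MAP.get(ch, ch) for ch in text)


print(asciiize_symbols("a ≠ b ± 1"))  # a =/= b +/- 1
```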

For Chinese and Japanese characters, Egyptian hieroglyphs and Sumerian characters, transliteration breaks down and this system only prints out the Unicode code point of each character.
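A code-point fallback of this kind might look like the following Python sketch. The output format (U+ followed by at least four uppercase hex digits) is an assumption, since the text does not say how the code point is printed; the function names and the table parameter are hypothetical.

```python
def code_point_fallback(ch: str) -> str:
    """Render one character as its code point, e.g. U+4E2D (assumed format)."""
    return f"U+{ord(ch):04X}"


def asciiize_with_fallback(text: str, table: dict[str, str]) -> str:
    """Use the table where possible; print the code point for anything else."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)             # already ASCII
        elif ch in table:
            out.append(table[ch])      # known transliteration
        else:
            out.append(code_point_fallback(ch))  # no transliteration available
    return "".join(out)


print(asciiize_with_fallback("a中", {}))  # aU+4E2D
```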

If the "Allow a very limited set of non-ASCII symbols" checkbox is checked, a small set of non-ASCII characters is enabled. These have been carefully selected and are used only when transliterating other scripts and graphical symbols. They were chosen specifically to provide further context for ambiguous transliterations: for example, ê and ô are enabled to transliterate Greek η and ω, distinguishing them from ε and ο. If this is turned off, sensible pure-ASCII transliterations are provided anyway, but they might be uglier and/or more ambiguous.
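The effect of the checkbox can be sketched in Python with the Greek example. Only the η to ê and ω to ô mappings come from the text; the pure-ASCII fallbacks used here (plain e and o) are an assumption chosen to show the ambiguity, and the actual fallbacks may differ. All names are hypothetical.

```python
# With the limited non-ASCII set enabled (mappings for η and ω are from the text).
GREEK_EXTENDED = {"η": "ê", "ω": "ô", "ε": "e", "ο": "o"}

# Pure-ASCII fallback: ASSUMED values, picked to illustrate the ambiguity.
GREEK_ASCII = {"η": "e", "ω": "o", "ε": "e", "ο": "o"}


def transliterate_greek(text: str, allow_extended: bool = True) -> str:
    """Transliterate the four example vowels; pass other characters through."""
    table = GREEK_EXTENDED if allow_extended else GREEK_ASCII
    return "".join(table.get(ch, ch) for ch in text)


print(transliterate_greek("ηεωο"))                        # êeôo
print(transliterate_greek("ηεωο", allow_extended=False))  # eeoo
```

With the option off, η and ε collapse to the same output in this sketch, which is the kind of ambiguity the extra characters are meant to avoid.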

The extra characters:
· ‘ ’ ‾ ■ □ ∧ ∨ ÂâÊêÎîÔôÛû ÖöÜü Č芚Žž Əə