Both a transliterator and a diacritic remover. For Latin letters, it removes diacritics. For other alphabets, it transliterates into ASCII, optionally using a few non-ASCII letters and symbols where a pure-ASCII rendering would be unwieldy. For symbols, it finds either a semantic approximation or a graphical one. For CJK characters, it just prints the code point.
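As a rough illustration of the diacritic-removal step for Latin letters, here is a minimal Python sketch. It assumes the common approach of canonical decomposition (NFD) followed by dropping combining marks; the function name strip_diacritics is hypothetical, and this tool may well do it differently.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (assumed approach)."""
    # NFD splits a precomposed letter such as "é" into "e" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks, which we skip.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("café naïve Ångström"))  # cafe naive Angstrom
```

Letters such as ø or ł have no canonical decomposition, so a real implementation would need an explicit table for them on top of this.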
For letters in scripts that have a pre-existing transliteration scheme into basic ASCII, follows that. If there isn't one, an ad-hoc transliteration scheme is improvised, typically following examples set by Unicode character names. In either case, letters are transliterated by sound or meaning, not by graphical appearance.
For symbols: if a symbol has a clear meaning that's easily representable in ASCII, it's turned into that: ≠ becomes =/=, ± becomes +/-, ÷ becomes /, © becomes (c) and so on. For some math symbols, their HTML entity name or LaTeX name is used: ∈ becomes isin and ∴ becomes there4. For other symbols, an attempt is made at a rendition in plain ASCII, but there are clear limits: ← and → become <- and ->, but ↑ and ↓ become /|\ and \|/, and ░▒▓ become %X#.
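The symbol examples above amount to a lookup table. A minimal Python sketch, using only the mappings listed in the text, might look like this. The names SYMBOLS and replace_symbols are hypothetical, and the per-character split of ░▒▓ into %, X and # is an assumption.

```python
# Mappings taken from the examples in the text; the real table is larger.
SYMBOLS = {
    "≠": "=/=", "±": "+/-", "÷": "/", "©": "(c)",   # semantic equivalents
    "∈": "isin", "∴": "there4",                      # HTML entity names
    "←": "<-", "→": "->", "↑": "/|\\", "↓": "\\|/",  # graphical renditions
    "░": "%", "▒": "X", "▓": "#",                    # assumed per-character split
}

def replace_symbols(text: str) -> str:
    """Replace each known symbol; leave every other character untouched."""
    return "".join(SYMBOLS.get(ch, ch) for ch in text)

print(replace_symbols("a ≠ b ± c"))  # a =/= b +/- c
```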
For Chinese and Japanese characters, Egyptian hieroglyphs and Sumerian characters, transliteration breaks down and this system only prints out the Unicode code point of the characters.
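The code-point fallback is simple to sketch in Python. The "U+XXXX" output format and the function name code_point_fallback are assumptions; the text only says that the code point is printed.

```python
def code_point_fallback(ch: str) -> str:
    """Render a character as its Unicode code point in U+XXXX notation."""
    # At least four hex digits, more for characters outside the BMP.
    return f"U+{ord(ch):04X}"

print(code_point_fallback("中"))  # U+4E2D
print(code_point_fallback("𓀀"))  # U+13000
```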
If the "Allow a very limited set of non-ASCII symbols" checkbox is checked, a small set of non-ASCII characters is enabled. These are used only when transliterating other scripts and graphical symbols, and were specifically chosen to disambiguate transliterations that would otherwise collide: for example, ê and ô are enabled to transliterate Greek η and ω, distinguishing them from ε and ο. If the checkbox is unchecked, sensible pure-ASCII transliterations are still provided, but they may be uglier or more ambiguous.
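The effect of the checkbox can be sketched in Python as a flag that selects between two tables. Only the ê and ô mappings come from the text. The pure-ASCII fallbacks shown here (plain e and o) are an assumption chosen to show the ambiguity, and all names are hypothetical.

```python
# With the option on: η and ω stay distinct from ε and ο (from the text).
GREEK_EXTENDED = {"ε": "e", "η": "ê", "ο": "o", "ω": "ô"}
# With the option off: assumed fallbacks, which collide with ε and ο.
GREEK_ASCII = {"ε": "e", "η": "e", "ο": "o", "ω": "o"}

def transliterate_greek(text: str, allow_non_ascii: bool = False) -> str:
    """Transliterate the four vowels above; pass other characters through."""
    table = GREEK_EXTENDED if allow_non_ascii else GREEK_ASCII
    return "".join(table.get(ch, ch) for ch in text)

print(transliterate_greek("εηοω", allow_non_ascii=True))   # eêoô
print(transliterate_greek("εηοω", allow_non_ascii=False))  # eeoo
```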