UnidecodeSharp
UnidecodeSharp provides US-ASCII transliterations of Unicode text. It supports almost all Unicode letters, including Chinese characters, Cyrillic, and letters with umlauts. For more details, see the
Perl description.
The general idea is:
("\u5317\u4EB0").Unidecode() == "Bei Jing "
Background
UnidecodeSharp is a port of
Python Unidecode, which is itself a port of
Perl unidecode.
(
PHP and
Ruby implementations are also available.)
The current implementation targets .NET 3.5 (because it uses generics and an extension method; feel free to change it) and also works on
Mono.
In Russian
For information in Russian, see my
home page
Solution Content
The Unidecoder class has only one extension method,
Unidecode. The method signature is:
public static string Unidecode(this string input)
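To illustrate the general shape of a table-driven extension method like this one, here is a minimal sketch. It is not the library's actual code: the class MiniUnidecoder, the method MiniUnidecode, and the four-entry table are hypothetical stand-ins for the generated replacement table, and dropping unmapped characters is a choice made only for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration only; NOT the library's Unidecoder class.
public static class MiniUnidecoder
{
    // Tiny stand-in for the generated replacement table (assumption:
    // a per-character lookup from a Unicode char to an ASCII string).
    private static readonly Dictionary<char, string> Table =
        new Dictionary<char, string>
        {
            { '\u00e2', "a" },      // a with circumflex
            { '\u00f1', "n" },      // n with tilde
            { '\u5317', "Bei " },
            { '\u4EB0', "Jing " },
        };

    public static string MiniUnidecode(this string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        var result = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c < 0x80)
            {
                // Plain ASCII passes through unchanged.
                result.Append(c);
                continue;
            }

            string replacement;
            if (Table.TryGetValue(c, out replacement))
            {
                result.Append(replacement);
            }
            // Characters missing from this tiny table are dropped
            // (a simplification made for this sketch).
        }
        return result.ToString();
    }
}
```

Because the method is declared with "this string input", it can be called directly on any string, for example "vi\u00f1edos".MiniUnidecode().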
The solution also contains some Python scripts (in the Items project):
- makeCS.py - generates a C# source file from the Python replacement table files.
- makeXml.py - generates an XML file from the Python replacement table files.
Generally you don't need them; they are kept only in case the replacement table needs to be updated.
The current replacement table is generated from Unidecode 0.04.1.
Usage
[Test]
public void PythonTest()
{
    // ASCII input is returned unchanged.
    Assert.AreEqual("Hello, World!", "Hello, World!".Unidecode());
    Assert.AreEqual("'\"\r\n", "'\"\r\n".Unidecode());
    // Latin letters with carons (C, Z, S in upper and lower case).
    Assert.AreEqual("CZSczs", "\u010c\u017d\u0160\u010d\u017e\u0161".Unidecode());
    Assert.AreEqual("a", "\u30a2".Unidecode()); // Katakana A
    Assert.AreEqual("a", "\u03b1".Unidecode()); // Greek alpha
    Assert.AreEqual("a", "\u0430".Unidecode()); // Cyrillic a
    Assert.AreEqual("chateau", "ch\u00e2teau".Unidecode());
    Assert.AreEqual("vinedos", "vi\u00f1edos".Unidecode());
}