UnidecodeSharp

UnidecodeSharp produces US-ASCII transliterations of Unicode text. It supports almost all Unicode letters, including Chinese characters, Cyrillic, and letters with umlauts. For more details, see the description of the Perl module.

The general idea:
("\u5317\u4EB0").Unidecode() == "Bei Jing "

Background

UnidecodeSharp is a port of Python Unidecode, which is itself a port of the Perl Text::Unidecode module.
(PHP and Ruby implementations are also available.)

The current implementation targets .NET 3.5 because it relies on generics and an extension method (feel free to change that), and it also works on Mono.

In Russian

For information in Russian, see my home page.

Solution Content

The Unidecoder class has a single extension method, Unidecode. Its signature is:
public static string Unidecode(this string input)

The solution also contains some Python scripts (in the Items project):
  • makeCS.py - generates a C# file from the Python replacement table files.
  • makeXml.py - generates an XML file from the Python replacement table files.
You generally don't need them. They are kept only in case the replacement table has to be updated.

The current replacement table was generated from Unidecode 0.04.1.
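
As a rough illustration of what a replacement table does, here is a minimal sketch of a table-driven transliterator. It is not the UnidecodeSharp source: the TinyTransliterator class, its Transliterate method, and its three-entry table are hypothetical, and the layout (blocks of 256 entries keyed by the high byte of the character, as Python Unidecode organizes its tables) is an assumption about how such a table might be stored.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration only; not the UnidecodeSharp implementation.
public static class TinyTransliterator
{
    // Replacement blocks keyed by the high byte of the UTF-16 code unit.
    // Each block holds up to 256 ASCII replacements; only samples are filled in.
    private static readonly Dictionary<int, string[]> Blocks = BuildBlocks();

    private static Dictionary<int, string[]> BuildBlocks()
    {
        var blocks = new Dictionary<int, string[]>();

        var latin1 = new string[256];
        latin1[0xE2] = "a";   // U+00E2, a with circumflex
        latin1[0xF1] = "n";   // U+00F1, n with tilde
        blocks[0x00] = latin1;

        var cyrillic = new string[256];
        cyrillic[0x30] = "a"; // U+0430, Cyrillic small letter a
        blocks[0x04] = cyrillic;

        return blocks;
    }

    public static string Transliterate(this string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        var result = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c < 0x80)
            {
                // ASCII characters pass through unchanged.
                result.Append(c);
                continue;
            }

            string[] block;
            if (Blocks.TryGetValue(c >> 8, out block) && block[c & 0xFF] != null)
            {
                result.Append(block[c & 0xFF]);
            }
            // Characters without a table entry are dropped in this sketch.
        }
        return result.ToString();
    }
}
```

With this sample table, "ch\u00e2teau".Transliterate() yields "chateau". A full table, such as the one the scripts above generate, covers far more characters than the three shown here.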

Usage

[Test]
public void PythonTest()
{
	Assert.AreEqual("Hello, World!", "Hello, World!".Unidecode());

	Assert.AreEqual("'\"\r\n", "'\"\r\n".Unidecode());
	Assert.AreEqual("CZSczs", "\u010C\u017D\u0160\u010D\u017E\u0161".Unidecode()); // C, Z, S with caron (upper and lower case)
	Assert.AreEqual("a", "\u30A2".Unidecode()); // Katakana letter A
	Assert.AreEqual("a", "\u03B1".Unidecode()); // Greek small letter alpha
	Assert.AreEqual("a", "а".Unidecode());
	Assert.AreEqual("chateau", "ch\u00e2teau".Unidecode());
	Assert.AreEqual("vinedos", "vi\u00f1edos".Unidecode());
}

Last edited May 7, 2010 at 5:09 PM by ikutsin, version 5