.NET regex and Cyrillic (unicode)

Some days ago I had to validate a Cyrillic text using regex class. Of course there is no way to do it using “[а-зА-З]{3}” (for example) . Good thing is that .net supports Unicode in it’s regex engine.
Let’s try to match that string “абв” in Cyrillic. To be able to do that, we have to know the Unicode mapping for each letter in the validated string. For the current example Unicode mapping is:

а - 0430
б - 0431
в – 0432

To create a regex pattern for Unicode validation is required to add \u to the pattern. So to match “а” we need that pattern \u0430. To match the whole string the pattern will be (obviously) \u0430\u0431\u0432
Let see:



Match match = Regex.Match("абв", "\u0430\u0431\u0432");

if(match.Success)
{
Console.WriteLine("We did it :D");
}



Output:
We did it :D

Of course using unicode in regex It’s not only limited to single characters. You can use ranges and so on…

0 коментара:

Публикуване на коментар

top