Emojis and String.Length in C#
Spencer Schneidenbach
—December 16, 2017
Are you using String.Length to compute the length of a string that might include emojis?
If you compute String.Length for such a string, you may not get back exactly what you expect:
var str = "👶👶👶👶🍼🍼";
Console.WriteLine(str.Length); //what do you think will be written?
This will write 12 to the screen. What were YOU expecting?
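This happens because .NET strings are sequences of UTF-16 code units (chars), and String.Length counts those code units. Unicode characters outside the Basic Multilingual Plane, meaning code points above U+FFFF such as most emojis, don't fit in a single 16-bit char. They are stored as surrogate pairs: two chars that together represent one code point. Each of the six emojis above takes two chars, hence 12.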
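To see the surrogate pairs for yourself, here is a minimal sketch that walks a two-emoji string one UTF-16 code unit at a time. It relies only on char.IsHighSurrogate, char.IsLowSurrogate and char.ConvertToUtf32 from the base class library. The SurrogateDemo class and DescribeCodeUnits helper are hypothetical names made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class SurrogateDemo
{
    // Hypothetical helper: returns one line per UTF-16 code unit in s,
    // labelling surrogate halves and the code point each pair encodes.
    public static List<string> DescribeCodeUnits(string s)
    {
        var lines = new List<string>();
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (char.IsHighSurrogate(c) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                // A high surrogate followed by a low surrogate encodes one code point above U+FFFF.
                int codePoint = char.ConvertToUtf32(c, s[i + 1]);
                lines.Add($"[{i}] 0x{(int)c:X4} high surrogate");
                lines.Add($"[{i + 1}] 0x{(int)s[i + 1]:X4} low surrogate => U+{codePoint:X}");
                i++; // skip the low surrogate we just consumed
            }
            else
            {
                // A BMP character (or an unpaired surrogate) occupies a single char.
                lines.Add($"[{i}] 0x{(int)c:X4}");
            }
        }
        return lines;
    }

    public static void Main()
    {
        foreach (var line in DescribeCodeUnits("👶🍼"))
            Console.WriteLine(line);
        // [0] 0xD83D high surrogate
        // [1] 0xDC76 low surrogate => U+1F476
        // [2] 0xD83C high surrogate
        // [3] 0xDF7C low surrogate => U+1F37C
    }
}
```

Two emojis, four chars: the string's Length is 4 even though a reader sees only two characters.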
However, you probably want the number of characters as a reader would see them (what .NET calls text elements), not the number of chars in the underlying array. In that case, use System.Globalization.StringInfo
to compute the length of your strings, like so:
var str = "👶👶👶👶🍼🍼";
var stringInfo = new System.Globalization.StringInfo(str);
Console.WriteLine(stringInfo.LengthInTextElements);
This will yield what you're looking for: 6
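There are really three different counts here. String.Length counts UTF-16 code units, a rune count counts Unicode code points, and StringInfo.LengthInTextElements counts text elements. The sketch below compares all three. It assumes .NET Core 3.0 or later, because string.EnumerateRunes and System.Text.Rune are not available on .NET Framework. The second sample string, "e\u0301" (a plain e followed by U+0301 COMBINING ACUTE ACCENT), is an added illustration of where code points and text elements differ. The LengthComparison class and Measure method are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class LengthComparison
{
    // Hypothetical helper: returns (UTF-16 code units, code points, text elements) for s.
    // Assumes .NET Core 3.0+ for string.EnumerateRunes().
    public static (int CodeUnits, int Runes, int TextElements) Measure(string s) =>
        (s.Length,
         s.EnumerateRunes().Count(),
         new StringInfo(s).LengthInTextElements);

    public static void Main()
    {
        // Six emojis, each outside the BMP and therefore stored as a surrogate pair.
        Console.WriteLine(Measure("👶👶👶👶🍼🍼")); // (12, 6, 6)

        // 'e' + combining acute accent: two chars, two code points, one text element.
        Console.WriteLine(Measure("e\u0301"));    // (2, 2, 1)
    }
}
```

Length is still the right number when you are indexing into the string or sizing a char buffer. Text elements are the better fit when you are counting what a reader would call characters.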
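Want more reading? Check out Joel Spolsky's excellent article on strings and encoding, "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)". Remember, there is NO such thing as plain text!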