Programming

Unicode

Unicode is a standard for encoding text. It assigns a unique code point (a number) to each character.

“U+” means the digits that follow are a Unicode code point written in hex, e.g. U+0041 for “A”.
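As a small illustration of the notation, the sketch below converts between a character and its “U+” label using Python's built-in `ord` and `chr`. The helper names `u_plus` and `from_u_plus` are made up for this note, and padding to at least four hex digits is the usual convention rather than something stated above.

```python
def u_plus(ch: str) -> str:
    """Format one character's code point as a U+ label (hypothetical helper)."""
    # ord() returns the code point as an integer; 04X pads to at least 4 hex digits.
    return f"U+{ord(ch):04X}"


def from_u_plus(label: str) -> str:
    """Turn a label such as 'U+00E9' back into the character (hypothetical helper)."""
    if not label.upper().startswith("U+"):
        raise ValueError("expected a label starting with 'U+'")
    # Drop the 'U+' prefix and parse the remainder as a hexadecimal number.
    return chr(int(label[2:], 16))


if __name__ == "__main__":
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        print(u_plus(ch))
```

Code points above U+FFFF simply get more digits, so the last character in the loop is labelled with five hex digits.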

UTF-8

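Byte 1    Byte 2    Byte 3    Byte 4    Free bits      Max value
0xxxxxxx                                7              2^7-1  = 127 (U+007F)
110xxxxx  10xxxxxx                      (5+6)=11       2^11-1 = 2 047 (U+07FF)
1110xxxx  10xxxxxx  10xxxxxx            (4+6+6)=16     2^16-1 = 65 535 (U+FFFF)
11110xxx  10xxxxxx  10xxxxxx  10xxxxxx  (3+6+6+6)=21   2^21-1 = 2 097 151

The x positions hold the bits of the code point. Four bytes could hold values up to 2^21-1, but Unicode stops at U+10FFFF = 1 114 111, so that is the largest value UTF-8 actually encodes.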
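To make the byte layouts in the table concrete, here is a minimal sketch of a hand-written encoder for a single code point. The function name `utf8_encode` is hypothetical; in real code Python's built-in `str.encode("utf-8")` does this job, and the sketch only mirrors the table. It also rejects the surrogate range U+D800..U+DFFF, which is a detail of the UTF-8 definition that the table does not show.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point by hand, following the table's byte layouts."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("outside the Unicode range U+0000..U+10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not valid in UTF-8")

    if code_point < 0x80:
        # 7 free bits: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 11 free bits: 110xxxxx 10xxxxxx
        return bytes([
            0xC0 | (code_point >> 6),
            0x80 | (code_point & 0x3F),
        ])
    if code_point < 0x10000:
        # 16 free bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (code_point >> 12),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])
    # 21 free bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (code_point >> 18),
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])


if __name__ == "__main__":
    # Compare the hand-written encoder with Python's built-in one.
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        assert utf8_encode(cp) == chr(cp).encode("utf-8")
        print(f"U+{cp:04X}", utf8_encode(cp).hex(" "))
```

Each branch picks the shortest layout that has enough free bits, then fills the x positions from the most significant bits down, six bits per continuation byte.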

https://www.fileformat.info/info/unicode/utf8.htm