Code units, points and graphemes — oh my!
This interactive tool lets you interrogate what lies beneath your favorite emoji and illustrates one of my favorite esoteric bug stories about invalid surrogate pairs.
Type or paste an emoji to see how JavaScript represents it internally.
| Character | |
| .length | |
| Code point(s) | |
| UTF-16 code units | |
| Grapheme count | |
| Surrogate pair? | |
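The rows in the table above can be approximated in a few lines of plain JavaScript. This is only a sketch, not the tool's actual source: `inspect` is a hypothetical helper name, and the grapheme count assumes a runtime that ships `Intl.Segmenter` (recent browsers and Node.js versions do).

```javascript
// Hypothetical helper (not the tool's real code): summarizes a string
// using the same rows as the table above.
function inspect(str) {
  // UTF-16 code units: one entry per index, formatted as 4 hex digits.
  const codeUnits = [];
  for (let i = 0; i < str.length; i++) {
    codeUnits.push(str.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0"));
  }

  // Code points: string iteration walks code points, not code units.
  const codePoints = Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );

  // Graphemes: what a reader perceives as one character.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const graphemeCount = [...segmenter.segment(str)].length;

  // A high surrogate immediately followed by a low surrogate.
  const hasSurrogatePair = /[\uD800-\uDBFF][\uDC00-\uDFFF]/.test(str);

  return { length: str.length, codePoints, codeUnits, graphemeCount, hasSurrogatePair };
}

console.log(inspect("\u{1F467}"));
```

The emoji is written as an escape (`\u{1F467}`) so the example survives copy and paste through tools that mangle non-ASCII text.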
.slice() Playground
Pick an emoji and drag the handles to see what .slice(start, end) produces. Watch what happens when you land mid-surrogate-pair.
Default: 👨‍👨‍👧‍👧 (.length = 11) sliced at (6, 8) pulls out 👧, a complete, valid emoji extracted from inside the family. One code unit in either direction and you'll orphan a surrogate or leak a ZWJ (zero-width joiner). Drag the handles and give it a try.
See what other "compound emoji" you can unpack!
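For readers without the interactive handles, the default slice can be reproduced in a console. This is a minimal sketch that assumes the family emoji is the sequence man, ZWJ, man, ZWJ, girl, ZWJ, girl; the variable names are illustrative only.

```javascript
// The family emoji spelled out as code points joined by ZWJ (U+200D).
const family = "\u{1F468}\u200D\u{1F468}\u200D\u{1F467}\u200D\u{1F467}";

// Code unit layout: 0-1 man, 2 ZWJ, 3-4 man, 5 ZWJ, 6-7 girl, 8 ZWJ, 9-10 girl.
const girl = family.slice(6, 8);     // both halves of the girl's surrogate pair
const orphaned = family.slice(6, 7); // only the high surrogate: an invalid string
const leaky = family.slice(5, 8);    // a stray ZWJ followed by the girl

// Splitting on the joiner "unpacks" a compound emoji into its members.
const parts = family.split("\u200D");

console.log(family.length);                   // 11
console.log(girl === "\u{1F467}");            // true
console.log(orphaned === "\uD83D");           // true
console.log(leaky === "\u200D\u{1F467}");     // true
console.log(parts.length);                    // 4
```

Splitting on U+200D is a convenient way to unpack ZWJ sequences, but it is only a rough tool: it does not account for skin-tone modifiers or variation selectors that may trail each member.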
.length Lies
JavaScript uses UTF-16 internally. Characters above U+FFFF need two 16-bit code units (a surrogate pair). .length counts code units, not characters.
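The arithmetic behind a surrogate pair is short enough to write out. The sketch below follows the standard UTF-16 encoding rule; `toSurrogatePair` is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: splits a code point above U+FFFF into the two
// UTF-16 code units that JavaScript stores for it.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("only code points U+10000 to U+10FFFF need a pair");
  }
  const offset = codePoint - 0x10000;     // 20 bits remain
  const high = 0xd800 + (offset >> 10);   // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return [high, low];
}

const girl = "\u{1F467}";
const [high, low] = toSurrogatePair(0x1f467);

console.log(high.toString(16), low.toString(16));      // d83d dc67
console.log(String.fromCharCode(high, low) === girl);  // true
console.log(girl.length);                              // 2 (code units)
console.log([...girl].length);                         // 1 (code points)
```

Spreading a string into an array counts code points rather than code units, which is often closer to what people expect from `.length`, though it still overcounts compound emoji built from several code points.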
A valid surrogate pair is a high surrogate (D800-DBFF) followed by a low surrogate (DC00-DFFF). Drag cards to reorder, or click the × to disable a code unit. Watch what breaks.
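The validity rule above can be checked by hand with a single pass over the code units. This is a sketch under that rule only; `findLoneSurrogates` is a hypothetical helper, not part of any library. Newer engines also provide the built-in `String.prototype.isWellFormed()`, which answers the yes/no version of the same question.

```javascript
// Hypothetical helper: returns the indices of code units that are
// surrogates but not part of a valid high + low pair.
function findLoneSurrogates(str) {
  const lone = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh) {
      const next = str.charCodeAt(i + 1); // NaN when i is the last index
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair: skip its low half
      } else {
        lone.push(i); // high surrogate with nothing valid after it
      }
    } else if (isLow) {
      lone.push(i); // low surrogate with no high surrogate before it
    }
  }
  return lone;
}

console.log(findLoneSurrogates("\uD83D\uDC67")); // [] (valid pair)
console.log(findLoneSurrogates("\uDC67\uD83D")); // [ 0, 1 ] (reordered halves)
console.log(findLoneSurrogates("\uD83Dx"));      // [ 0 ] (low half removed)
```

Reordering the two halves, as the draggable cards allow, leaves both code units stranded, while disabling one half strands the other.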