Explore Surrogate Pairs

Code units, points and graphemes — oh my!

This interactive tool lets you interrogate what lies beneath your favorite emoji and illustrates one of my favorite esoteric bug stories about invalid surrogate pairs.

Emoji Explorer

Type or paste an emoji to see how JavaScript represents it internally.

Character
.length
Code point(s)
UTF-16 code units
Grapheme count
Surrogate pair?
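The fields above can be approximated in plain JavaScript. This is a minimal sketch, not the explorer's actual code: the helper name inspectString and the shape of the object it returns are assumptions made for illustration, and the grapheme count assumes an engine that ships Intl.Segmenter.

```javascript
// Hypothetical helper mirroring the explorer's fields; not the page's own code.
function inspectString(str) {
  // One entry per UTF-16 code unit, as 4-digit hex.
  const codeUnits = Array.from({ length: str.length }, (_, i) =>
    str.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0")
  );

  // Iterating a string walks code points, so valid surrogate pairs stay together.
  const codePoints = Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );

  // Intl.Segmenter groups code points into user-perceived characters (graphemes).
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const graphemeCount = Array.from(segmenter.segment(str)).length;

  return {
    character: str,
    length: str.length,
    codePoints,
    codeUnits,
    graphemeCount,
    // Fewer code points than code units means at least one valid pair was merged.
    hasSurrogatePair: codePoints.length < codeUnits.length,
  };
}

// U+1F467 (girl), written as an escape so the example survives any file encoding.
const girlInfo = inspectString("\u{1F467}");
console.log(girlInfo.length, girlInfo.codePoints, girlInfo.codeUnits);
```

A lone surrogate is reported as its own code point during iteration, so hasSurrogatePair stays false for strings that only contain orphaned halves.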

The .slice() Playground

Pick an emoji and drag the handles to see what .slice(start, end) produces. Watch what happens when you land mid-surrogate-pair.

Default: 👨‍👨‍👧‍👧 sliced at (6, 8) pulls out 👧, a complete, valid emoji extracted from inside the family. Move either handle by one code unit and you'll orphan a surrogate or leak a ZWJ (zero-width joiner). Drag the handles and give it a try.

See what other "compound emoji" you can unpack!
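The default slice can be reproduced outside the playground. The sketch below writes the family emoji with escapes (man, ZWJ, man, ZWJ, girl, ZWJ, girl) so it does not depend on how this page's text is encoded; the variable names are only illustrative.

```javascript
// Indices: 0-1 man, 2 ZWJ, 3-4 man, 5 ZWJ, 6-7 girl, 8 ZWJ, 9-10 girl.
const family = "\u{1F468}\u200D\u{1F468}\u200D\u{1F467}\u200D\u{1F467}";

const girl = family.slice(6, 8);     // both halves of one surrogate pair
const orphaned = family.slice(6, 7); // high surrogate only, no low half
const leaked = family.slice(5, 8);   // the preceding ZWJ comes along

console.log(family.length);                 // 11
console.log(girl === "\u{1F467}");          // true
console.log(orphaned === "\uD83D");         // true
console.log(leaked === "\u200D\u{1F467}");  // true
```

.slice() works on code unit indices and never checks whether a boundary falls inside a surrogate pair, which is why moving a handle by one position is enough to produce an invalid string.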

.length = 11, slice handles at 6 and 8

Why .length Lies

JavaScript strings are sequences of UTF-16 code units. Code points above U+FFFF need two 16-bit code units (a surrogate pair), and .length counts code units, not code points or user-perceived characters.
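The arithmetic behind a surrogate pair is short enough to write out. The sketch below follows the standard UTF-16 encoding rule; the function name toSurrogatePair is an assumption for illustration, not a built-in.

```javascript
// Split a code point above U+FFFF into its two UTF-16 code units.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("expected a code point in U+10000..U+10FFFF");
  }
  const offset = codePoint - 0x10000;    // leaves a 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f467);          // U+1F467, the girl emoji
console.log(high.toString(16), low.toString(16));      // d83d dc67
console.log(String.fromCharCode(high, low) === "\u{1F467}"); // true

console.log("\u{1F467}".length);      // 2 (code units)
console.log([..."\u{1F467}"].length); // 1 (code points)
```

Spreading a string iterates by code point, so comparing it with .length is a quick way to see how many surrogate pairs a string contains.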

Break It: Invalid Surrogate Pairs

A valid surrogate pair is a high surrogate (U+D800 to U+DBFF) immediately followed by a low surrogate (U+DC00 to U+DFFF). Drag cards to reorder, or click the × to disable a code unit. Watch what breaks.
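The pairing rule above can be checked by hand, one code unit at a time. This is a minimal sketch under that rule only; findLoneSurrogates is a hypothetical helper, not something the page or the language provides.

```javascript
// Return the indices of code units that are surrogates without a valid partner.
function findLoneSurrogates(str) {
  const lone = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh) {
      const next = str.charCodeAt(i + 1); // NaN when i is the last index
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair: skip over the low half
      } else {
        lone.push(i); // high surrogate with nothing valid after it
      }
    } else if (isLow) {
      lone.push(i); // low surrogate with no high surrogate before it
    }
  }
  return lone;
}

const valid = findLoneSurrogates("\uD83D\uDC67");   // no lone surrogates
const swapped = findLoneSurrogates("\uDC67\uD83D"); // both halves are orphaned
const missing = findLoneSurrogates("a\uD83Db");     // index 1 is orphaned
console.log(valid.length, swapped.length, missing.length);
```

Reordering the two cards of a pair corresponds to the swapped case, and disabling one card corresponds to the missing case. Recent engines also expose String.prototype.isWellFormed() and toWellFormed() for the same check, though availability depends on the runtime version.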