
What Is a Missing Glyph?
Unicode defines over 150,000 characters. No single font on earth contains all of them. The moment your browser or app needs to draw a character that the current font doesn’t know about, you have a missing glyph.
It’s worth separating two concepts that people often confuse:
| Concept | What it is | Example |
|---|---|---|
| Code point | The abstract Unicode number assigned to a character | U+1F9F8 (Teddy Bear) |
| Glyph | The actual drawn shape that represents the character in a font file | The teddy bear image inside an emoji font |
A font is essentially a dictionary: it maps code points to glyph drawings. If a code point isn’t in the dictionary, the font has nothing to show you. What happens next depends on the rendering pipeline – and that’s where things get interesting.
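To make the split between code points and glyphs concrete, here is a minimal JavaScript sketch that models a font's character map as a `Map` from code points to glyph names. The two-entry map, the glyph names, and the helper names (`toyCmap`, `formatCodePoint`, `lookupGlyph`) are hypothetical. A real OpenType font stores this mapping in its `cmap` table and points to glyph outlines rather than names.

```javascript
// Toy model of a font's character map: code point -> glyph name.
// toyCmap and its glyph names are hypothetical, for illustration only.
const toyCmap = new Map([
  [0x0041, "A"],      // LATIN CAPITAL LETTER A
  [0x00e9, "eacute"], // LATIN SMALL LETTER E WITH ACUTE
]);

// Format a code point in the conventional U+XXXX notation (at least 4 hex digits).
function formatCodePoint(cp) {
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

// Look up the glyph for the first code point of `char`.
// A missing entry falls back to ".notdef", the font's missing-glyph box.
function lookupGlyph(cmap, char) {
  return cmap.get(char.codePointAt(0)) ?? ".notdef";
}

const teddy = "\u{1F9F8}"; // 🧸 TEDDY BEAR
console.log(formatCodePoint(teddy.codePointAt(0))); // U+1F9F8
console.log(lookupGlyph(toyCmap, "\u00E9"));        // eacute
console.log(lookupGlyph(toyCmap, teddy));           // .notdef
```

The teddy bear has a perfectly valid code point; it is only the toy font that has nothing to draw for it.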
The Unicode Consortium decides which characters exist. Font designers decide which characters their font draws. These are completely independent decisions. A brand new Unicode character may go years without widespread font support.
The Tofu Problem (□)
When a font encounters a code point it can’t draw, it usually renders a fallback shape in its place. The most common fallback is a plain empty rectangle: □. In typography circles, this is affectionately called tofu – because it’s bland, white, and square.
The term is closely associated with Google's Noto font family, whose name comes from "No Tofu": the project set out to eliminate tofu entirely. Noto now covers virtually every Unicode script and is one of the most comprehensive free font collections available.
Different fonts and renderers signal a missing glyph in different ways:
| Symbol | Name | Meaning |
|---|---|---|
| □ | Tofu / empty box | Font has no glyph for this code point |
| ▯ | Tall rectangle | Alternate tofu variant |
| ? | Question mark | Some older renderers substitute a literal ? |
| U+XXXX | Code point literal | Developer tools & terminals often print the raw code point |
| � | Replacement character | U+FFFD – officially means “I couldn’t decode this at all” |
The Replacement Character (U+FFFD – �) is a special case. It’s not a missing glyph in the font-rendering sense; it’s the character Unicode itself tells you to display when incoming bytes couldn’t be decoded – for example, a UTF-8 file with invalid byte sequences. Tofu means “the font doesn’t have it.” � means “I couldn’t even figure out what character this was supposed to be.”
| Property | Value |
|---|---|
| Character | � (REPLACEMENT CHARACTER) |
| Code point | U+FFFD |
| Appears when | Byte sequences are invalid or undecodable for the declared encoding |
| Common cause | Opening a Latin-1 file as UTF-8, or corrupted data |
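JavaScript's built-in `TextDecoder` shows this substitution directly. The sketch assumes a runtime with full encoding support (browsers, or Node.js built with ICU, which is the default). It takes "café" as a Latin-1 file would store it, then decodes the same bytes as UTF-8 and as Latin-1:

```javascript
// "café" as a Latin-1 file stores it: "é" is the single byte 0xE9.
const latin1Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);

// Decoded as UTF-8, 0xE9 starts a multi-byte sequence that never completes,
// so the decoder substitutes U+FFFD REPLACEMENT CHARACTER.
const asUtf8 = new TextDecoder("utf-8").decode(latin1Bytes);

// Decoded with the right label, the text comes back intact.
// (Per the WHATWG Encoding Standard, the "latin1" label maps to windows-1252.)
const asLatin1 = new TextDecoder("latin1").decode(latin1Bytes);

console.log(asUtf8 === "caf\uFFFD"); // true (displays as "caf�")
console.log(asLatin1);               // café

// With { fatal: true } the decoder throws instead of silently inserting U+FFFD.
try {
  new TextDecoder("utf-8", { fatal: true }).decode(latin1Bytes);
} catch (err) {
  console.log("strict UTF-8 decode rejected the bytes:", err.message);
}
```

The bytes themselves are fine; only the declared encoding is wrong. That is why the fix for � is to re-open or re-serve the file with the correct encoding, not to install another font.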
Fallback Fonts & the Font Stack
Modern browsers don’t give up after one font fails. They work through a font stack – an ordered list of fonts to try. The moment a glyph is missing from Font A, the browser silently moves to Font B, then Font C, and so on until it either finds the glyph or runs out of options and shows tofu.
In CSS, you declare this stack explicitly:
```css
/* typical body font stack */
body {
  font-family:
    "Helvetica Neue",  /* preferred - high-quality sans */
    Arial,             /* safe fallback on Windows */
    "Liberation Sans", /* open-source equivalent */
    sans-serif;        /* generic family - OS picks a default */
}
```
The final keyword – sans-serif, serif, monospace, emoji, etc. – is a generic family. It tells the browser "if nothing else works, pick whatever the system considers best for this category." Generic families are mapped to real installed fonts, and those defaults vary by platform (and sometimes by browser).
| Generic family | Windows default | macOS default | Android default |
|---|---|---|---|
| sans-serif | Arial | Helvetica Neue | Roboto |
| serif | Times New Roman | Times New Roman | Noto Serif |
| monospace | Courier New | Courier New | Droid Sans Mono |
| emoji | Segoe UI Emoji | Apple Color Emoji | Noto Color Emoji |
| system-ui | Segoe UI | San Francisco (SF Pro) | Roboto |
How the Browser Picks a Font for a Single Character
The font selection process for each character is more granular than most people realise. The browser doesn’t pick one font and apply it to the whole string – it picks a font per character:
What the browser does for "Hello 🌍 مرحبا":

```text
"H"  → try font 1 → found ✓ → use font 1
"e"  → try font 1 → found ✓ → use font 1
"l"  → try font 1 → found ✓ → use font 1
...
"🌍" → try font 1 → NOT found
     → try font 2 → NOT found
     → try system emoji font → found ✓ → use emoji font
"م"  → try font 1 → NOT found
     → try font 2 → NOT found
     → try system Arabic font → found ✓ → use Arabic font
```
This is why a single sentence can render in three or four different fonts simultaneously, and why mixing scripts in a design requires careful font selection.
If a font has a regular weight but not bold, some browsers will synthesize bold by algorithmically thickening the strokes. The result is usually inferior to a properly designed bold font. You can disable this with font-synthesis: none in CSS if quality matters.
The Unicode Font Fallback Algorithm
Browsers implement the CSS Fonts specification’s font matching algorithm. At a high level it works like this:
- Try each font in the font-family list in order.
- For each font, check whether a glyph exists for the target code point.
- If the glyph is found, use that font for this character.
- If no listed font has the glyph, fall through to the browser's built-in system fallback list.
- If nothing works, render the .notdef glyph (usually tofu or the font's own missing-glyph placeholder).
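The steps above can be sketched in a few lines of JavaScript. This is a simplified model, not how any browser is actually implemented: each "font" is a hypothetical name plus a coverage test, fonts are chosen per code point (real engines work on grapheme clusters and shaped text runs), and `.notdef` stands in for the tofu box.

```javascript
// Hypothetical fonts: a name plus a test for which code points each one covers.
const fontStack = [
  { name: "BrandSans", covers: (cp) => cp < 0x80 },   // ASCII only
  { name: "LatinExtra", covers: (cp) => cp < 0x250 }, // up to Latin Extended-B
];
const systemFallback = [
  { name: "SystemEmoji", covers: (cp) => cp >= 0x1f300 && cp <= 0x1faff },
  { name: "SystemArabic", covers: (cp) => cp >= 0x0600 && cp <= 0x06ff },
];

// Walk the listed fonts, then the system fallback list, in order.
// If nothing covers the code point, fall back to .notdef (tofu).
function pickFont(cp) {
  for (const font of [...fontStack, ...systemFallback]) {
    if (font.covers(cp)) return font.name;
  }
  return ".notdef";
}

// Spreading a string iterates by code point, so astral characters like 🌍 stay whole.
function assignFonts(text) {
  return [...text].map((ch) => [ch, pickFont(ch.codePointAt(0))]);
}

// U+E000 is a Private Use Area code point that none of the fonts cover.
for (const [ch, font] of assignFonts("Hi \u{1F30D} \u0645\u{E000}")) {
  console.log(JSON.stringify(ch), "->", font);
}
```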
“The font stack is less like a queue and more like a safety net – every character falls through until something catches it.”
Combining Characters
Some Unicode characters aren’t standalone shapes – they’re modifiers that attach to the character before them. These are called combining characters, and they’re a surprisingly common source of display problems.
The classic example is accented letters. The letter é can be encoded two completely different ways:
| Form | Code points | Description | Bytes (UTF-8) |
|---|---|---|---|
| NFC (composed) | U+00E9 | Single precomposed character “é” | 2 bytes |
| NFD (decomposed) | U+0065 + U+0301 | Letter “e” + combining acute accent | 3 bytes |
To a human eye – and usually to a browser – these look identical. But to a font renderer, they are fundamentally different challenges. The composed form (U+00E9) is a single glyph that the font either has or doesn’t. The decomposed form requires the renderer to draw e, then overlay a separate combining accent glyph on top, positioning it correctly relative to the base letter.
Not every font includes combining diacritic glyphs, and those that do may not position them correctly for all base characters. You can end up with an accent floating to the right of a letter, stacked too high, or clipping into text above. This is especially common with Arabic, Devanagari, and Southeast Asian scripts that rely heavily on combining marks for correct rendering.
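A quick way to confirm that the two forms of "é" in the table are different strings is to compare them in JavaScript. This sketch uses only built-ins (`String.prototype.normalize` and `TextEncoder`); the variable and helper names are illustrative:

```javascript
const composed = "\u00E9";    // NFC: one precomposed code point
const decomposed = "e\u0301"; // NFD: "e" + COMBINING ACUTE ACCENT

// Number of bytes each string occupies when encoded as UTF-8.
const utf8Length = (s) => new TextEncoder().encode(s).length;

console.log(composed === decomposed);                      // false
console.log([...composed].length, [...decomposed].length); // 1 2
console.log(utf8Length(composed), utf8Length(decomposed)); // 2 3
console.log(decomposed.normalize("NFC") === composed);     // true
console.log(composed.normalize("NFD") === decomposed);     // true
```

The strings only compare equal after both are brought to the same normalisation form.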
Stacked Combining Characters
You can stack multiple combining characters on a single base. This is occasionally used legitimately (Vietnamese: ộ = o + combining circumflex + combining dot below), but it’s also famously abused to create “Zalgo” text – characters with so many stacked combining marks they overflow their line box and collide with surrounding text:
Zalgo text (legitimate Unicode, chaotic rendering):

```text
H̷̡̦̋̍͘ę̶͇̹̒l̴͓͊̓l̴͚̻̔̊̓o̸̡̊   ← same "Hello", but with several combining marks stacked on each base letter
```
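One defensive measure for user-generated content is to cap how many combining marks may follow a single base character. The sketch below is one possible approach, not a standard algorithm. It treats anything in Unicode General Category M (matched with the `\p{M}` regex property escape) as a combining mark, and the default limit of 2 is an arbitrary assumption that keeps Vietnamese letters such as ộ intact but may need tuning for other scripts.

```javascript
// Keep at most `maxMarks` combining marks after each base character.
// "Combining mark" here means General Category M (Mn, Mc, Me).
function limitCombiningMarks(text, maxMarks = 2) {
  let result = "";
  let marksSinceBase = 0;
  for (const ch of text) {
    if (/\p{M}/u.test(ch)) {
      marksSinceBase += 1;
      if (marksSinceBase > maxMarks) continue; // drop the excess mark
    } else {
      marksSinceBase = 0; // a new base character resets the count
    }
    result += ch;
  }
  return result;
}

const zalgo = "H\u0337\u0321\u0326\u030B\u030D\u0358i"; // "H" with 6 marks, then "i"
console.log(limitCombiningMarks(zalgo));          // "H" with its first 2 marks, then "i"
console.log(limitCombiningMarks("o\u0323\u0302")); // Vietnamese ộ (decomposed) is unchanged
```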
Unicode defines four normalisation forms: NFC, NFD, NFKC, and NFKD. For most text processing, NFC (Canonical Decomposition, followed by Canonical Composition) is the right choice. Normalising to NFC collapses decomposed sequences into precomposed characters wherever possible, reducing rendering inconsistencies and making string comparison reliable. In JavaScript: str.normalize('NFC').
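The difference between the canonical forms (NFC/NFD) and the compatibility forms (NFKC/NFKD) is easiest to see by printing the code points each one produces. As this sketch shows, NFKC and NFKD also fold compatibility characters such as the "ﬁ" ligature and superscript digits into plain letters. That can help search and matching, but it discards distinctions you may want to keep in displayed text. The helper name `toCodePoints` is illustrative.

```javascript
// Render a string as space-separated U+XXXX code points.
const toCodePoints = (s) =>
  [...s].map((ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")).join(" ");

const samples = [
  ["\u1ED9", "ộ (precomposed)"],
  ["\uFB01", "ﬁ ligature"],
  ["x\u00B2", "x²"],
];

for (const [text, label] of samples) {
  console.log(label);
  for (const form of ["NFC", "NFD", "NFKC", "NFKD"]) {
    console.log(`  ${form}: ${toCodePoints(text.normalize(form))}`);
  }
}
// For ộ, NFD gives U+006F U+0323 U+0302 (o + dot below + circumflex);
// for ﬁ, NFKC gives U+0066 U+0069 ("fi"), while NFC leaves U+FB01 alone.
```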
Zero Width Characters
Closely related, and even harder to spot, are zero-width characters. Strictly speaking they are format characters rather than combining marks: they take up no horizontal space and usually don't render anything visible at all, yet they actively affect how surrounding text behaves:
| Character | Code Point | Effect |
|---|---|---|
| ZWJ | U+200D | Joins adjacent emoji into a single combined emoji (e.g. 👩‍💻) |
| ZWNJ | U+200C | Prevents joining – forces adjacent letters to stay separate in Arabic/Persian |
| ZWSP | U+200B | Zero-width space – allows line-breaking without a visible space |
| SHY | U+00AD | Soft hyphen – invisible unless the browser decides to break the line there |
| WJ | U+2060 | Word joiner – prevents a line break between two characters |
Zero-width characters are invisible but real. They can be inserted into text to watermark it, bypass keyword filters, or confuse text comparisons. If you ever copy-paste text from a web page and find string matching mysteriously failing, invisible characters are a common culprit.
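A small helper can surface these characters when string matching fails. This sketch assumes the set of "invisibles" is the five characters from the table plus U+FEFF (the byte order mark, which often arrives via copy-paste). The list is illustrative rather than exhaustive, and the names `findInvisibles` and `comparisonKey` are hypothetical.

```javascript
// Invisible characters to look for, keyed by code point (all are in the BMP).
const INVISIBLES = new Map([
  [0x200b, "ZWSP"],
  [0x200c, "ZWNJ"],
  [0x200d, "ZWJ"],
  [0x2060, "WJ"],
  [0x00ad, "SHY"],
  [0xfeff, "BOM/ZWNBSP"],
]);

// Report each invisible character with its UTF-16 index so a mismatch can be traced.
function findInvisibles(text) {
  const found = [];
  for (let i = 0; i < text.length; i++) {
    const name = INVISIBLES.get(text.charCodeAt(i));
    if (name) found.push({ index: i, name });
  }
  return found;
}

// Build a key for comparison only: drop invisibles, then normalise to NFC.
function comparisonKey(text) {
  let out = "";
  for (const ch of text) {
    if (!INVISIBLES.has(ch.codePointAt(0))) out += ch;
  }
  return out.normalize("NFC");
}

const typed = "tofu";
const pasted = "to\u200Bfu"; // looks identical on screen
console.log(typed === pasted);                               // false
console.log(findInvisibles(pasted));                         // [ { index: 2, name: 'ZWSP' } ]
console.log(comparisonKey(typed) === comparisonKey(pasted)); // true
```

Stripping is applied only to the comparison key here. Removing ZWJ or ZWNJ from stored text would break emoji sequences and change how Arabic or Persian words are shaped.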
Emoji Rendering Issues
Emoji are where Unicode’s abstract ideals collide hardest with messy reality. Unlike regular characters, emoji have no standardised appearance – only a standardised code point. The laughing-crying face 😂 is U+1F602 on every platform, but what it looks like is entirely up to whoever made the emoji font.
Apple, Google, Samsung, Microsoft, and Twitter/X all draw their own emoji from scratch. The same code point can look dramatically different – sometimes even conveying a different emotion – depending on the platform.
Skin Tone Modifiers
Unicode defines five skin tone modifiers (U+1F3FB through U+1F3FF), based on the Fitzpatrick dermatology scale. When a modifier immediately follows a supporting emoji, the two code points are combined by the renderer into a single skin-toned variant. When the renderer doesn’t support the modifier, you see both characters independently: the base emoji plus a coloured square.
Skin tone modifier encoding:

```text
👍  = U+1F44D            (thumbs up, no modifier)
👍🏽 = U+1F44D + U+1F3FD  (thumbs up + medium skin tone modifier)

If the font/OS doesn't support the modifier:
→ renders as: 👍 🟫 (two separate code points visible)
```
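Because the modifiers occupy a fixed range, code can spot them by code point. This sketch, with hypothetical helper names, lists the code points in an emoji and strips any modifiers. Stripping is one way to fall back to the default (yellow) emoji on platforms that don't support skin tones.

```javascript
// The five Fitzpatrick-based modifiers occupy U+1F3FB..U+1F3FF.
const isSkinToneModifier = (cp) => cp >= 0x1f3fb && cp <= 0x1f3ff;

// List the code points in an emoji, labelling any skin tone modifiers.
function describe(emoji) {
  return [...emoji].map((ch) => {
    const cp = ch.codePointAt(0);
    const hex = "U+" + cp.toString(16).toUpperCase();
    return isSkinToneModifier(cp) ? `${hex} (skin tone modifier)` : hex;
  });
}

// Drop modifiers, leaving the base emoji.
const stripSkinTones = (emoji) =>
  [...emoji].filter((ch) => !isSkinToneModifier(ch.codePointAt(0))).join("");

const thumbsMedium = "\u{1F44D}\u{1F3FD}";
console.log(describe(thumbsMedium));       // [ 'U+1F44D', 'U+1F3FD (skin tone modifier)' ]
console.log(stripSkinTones(thumbsMedium)); // 👍
```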
ZWJ Sequences
Some emojis are actually sequences of multiple emoji joined by a Zero Width Joiner (U+200D). If the rendering platform recognises the full sequence, it displays a single combined image. If it doesn’t, each component shows separately.
ZWJ sequence example (family emoji):

```text
👨‍👩‍👧‍👦 = U+1F468 + ZWJ + U+1F469 + ZWJ + U+1F467 + ZWJ + U+1F466
   = man + ZWJ + woman + ZWJ + girl + ZWJ + boy

Rendered on a supporting platform: 👨‍👩‍👧‍👦 (one family image)
Rendered on an older platform:     👨 👩 👧 👦 (four separate emoji)
```
The number of recognised ZWJ sequences has grown from a handful in 2015 to over 1,000 today. Older devices simply don’t know about sequences added after their OS shipped.
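Building and taking apart a ZWJ sequence needs nothing more than U+200D. This sketch assumes a JavaScript runtime with `Intl.Segmenter` (modern browsers and Node.js 16+). It shows that the family emoji is seven code points but a single grapheme cluster, regardless of whether the local emoji font has an image for it.

```javascript
const ZWJ = "\u200D";

// Build the family sequence by joining man, woman, girl, boy with ZWJ.
const family = ["\u{1F468}", "\u{1F469}", "\u{1F467}", "\u{1F466}"].join(ZWJ);

// Splitting on ZWJ recovers the components an older platform would show.
const components = family.split(ZWJ);

// Segmentation follows Unicode grapheme rules, not font support,
// so the whole sequence counts as one user-perceived character.
const graphemes = [...new Intl.Segmenter("en", { granularity: "grapheme" }).segment(family)];

console.log([...family].length); // 7 (4 emoji + 3 ZWJ)
console.log(components.length);  // 4
console.log(graphemes.length);   // 1
```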
Text vs. Emoji Presentation
Many Unicode characters have both a text (monochrome) form and an emoji (colourful) form. Which one you get depends on whether a variation selector follows the character:
| Sequence | Code points | Renders as |
|---|---|---|
| ☎ | U+260E alone | Black telephone (text style by default; some platforms still show emoji) |
| ☎️ | U+260E + U+FE0F | Emoji style (coloured) |
| ☎︎ | U+260E + U+FE0E | Text style (forced) |
U+FE0F is the emoji variation selector; U+FE0E is the text variation selector. Many emoji look different depending on whether this invisible character is attached. When variation selectors are missing or ignored, platforms make their own decisions – which is why the same character can appear as text on one device and emoji on another.
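Since the selectors are just trailing code points, you can check which presentation a sequence asks for. This is a minimal sketch with a hypothetical helper name. It reports what the text *requests*; the platform may still ignore the request.

```javascript
const VS15 = "\uFE0E"; // text presentation selector
const VS16 = "\uFE0F"; // emoji presentation selector
const phone = "\u260E"; // BLACK TELEPHONE

// Report which presentation, if any, a sequence explicitly requests.
function requestedPresentation(seq) {
  if (seq.endsWith(VS16)) return "emoji";
  if (seq.endsWith(VS15)) return "text";
  return "platform default";
}

console.log(requestedPresentation(phone));        // platform default
console.log(requestedPresentation(phone + VS16)); // emoji
console.log(requestedPresentation(phone + VS15)); // text
console.log(phone === phone + VS16);              // false, though both may look alike
```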
Flag Emoji
Country flag emoji are encoded as pairs of Regional Indicator letters. The letters A–Z each have a Regional Indicator equivalent (U+1F1E6–U+1F1FF). Pair them correctly – 🇬 + 🇧 – and a supporting platform renders a 🇬🇧 flag. On a platform that doesn’t support flags (notably, Windows doesn’t natively render most flags), you see the two raw regional letters side by side: GB.
Microsoft has historically not shipped flag emoji in Segoe UI Emoji, a choice widely attributed to the political sensitivity of country flags. As a result, Windows users often see two-letter codes instead of flag images unless their browser bundles its own emoji font, as Firefox does with Twemoji Mozilla.
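Because the Regional Indicators mirror A–Z in order, converting between a two-letter region code and a flag is simple arithmetic. This sketch, with hypothetical function names, does not check that the pair is an assigned region code; an unassigned pair simply renders as two letters everywhere.

```javascript
const REGIONAL_INDICATOR_A = 0x1f1e6;

// "GB" -> U+1F1EC U+1F1E7 (renders as 🇬🇧 where flags are supported).
function flagFromRegionCode(code) {
  if (!/^[A-Za-z]{2}$/.test(code)) {
    throw new RangeError(`expected two letters, got "${code}"`);
  }
  return [...code.toUpperCase()]
    .map((letter) => String.fromCodePoint(REGIONAL_INDICATOR_A + letter.charCodeAt(0) - 65))
    .join("");
}

// Reverse mapping, e.g. for a plain-text fallback; assumes the input
// contains only Regional Indicator characters.
function regionCodeFromFlag(flag) {
  return [...flag]
    .map((ch) => String.fromCharCode(ch.codePointAt(0) - REGIONAL_INDICATOR_A + 65))
    .join("");
}

const gb = flagFromRegionCode("gb");
console.log([...gb].map((c) => c.codePointAt(0).toString(16))); // [ '1f1ec', '1f1e7' ]
console.log(regionCodeFromFlag(gb));                            // GB
```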
OS & Version Fragmentation
New Unicode characters are approved regularly – Unicode 17.0 was released in 2025. But a character existing in the Unicode standard doesn’t mean every device can display it. The pipeline from “character approved” to “rendered on your screen” involves several slow-moving links:
The journey from Unicode approval to your screen:

```text
1. Unicode Consortium approves character (e.g. Unicode 17.0, 2025)
   ↓
2. Font designers add the glyph (Apple, Google, Microsoft...)
   ↓
3. Font ships in an OS update
   ↓
4. User installs OS update
   ↓
5. Character finally renders ✓
```
Each step can take months to years.
This means a perfectly valid Unicode character can display on a fully updated iPhone, turn to tofu on an Android that’s two OS versions behind, and show a question mark on an old Windows 10 install that skipped the relevant font update.
| Platform | Emoji font | Update mechanism |
|---|---|---|
| macOS / iOS | Apple Color Emoji | OS updates – tightly coupled to system |
| Windows | Segoe UI Emoji | Windows updates – often delayed or skipped |
| Android | Noto Color Emoji | OS updates or Google Play system updates |
| Firefox | Twemoji Mozilla (bundled on Windows and Linux) | Browser auto-updates – often faster than OS |
| Linux | Noto Color Emoji (if installed) | Package manager – manual or distro-dependent |
“A new emoji is only as universal as the oldest device in your audience’s pocket.”
This fragmentation is why this site exists: it documents what each emoji looks like across every platform and version, so you can check whether your chosen emoji will render consistently for your audience.
How to Fix It – For Users & Developers
Whether you’re a user seeing boxes everywhere or a developer shipping text to millions of people, there are concrete steps you can take.
For Users
Update your OS
Emoji fonts and system character support ship with OS updates. The most common reason for tofu is a device that hasn’t been updated in a year or more.
Install a comprehensive font
Install Noto fonts (free, by Google) or GNU Unifont. Noto covers virtually every Unicode block. Most Linux distros package Noto; on Debian and Ubuntu, for example, you can install fonts-noto via the package manager.
Use a modern browser
Chrome, Firefox, and Edge all ship with better fallback font logic than older browsers. Firefox in particular bundles its own emoji font (Twemoji Mozilla) on Windows and Linux, so it can display newer emoji and flags even when the OS font lacks them.
Check your encoding
If you’re seeing � or garbled text, the file was opened with the wrong encoding. In most text editors, you can re-open with UTF-8 specified explicitly.
For Web Developers
Always declare UTF-8 in every HTML document. Without this, browsers have to guess, and they sometimes guess wrong:
```html
<!-- declare encoding early in <head> -->
<meta charset="UTF-8">
```
Use a thoughtful font stack that covers the scripts your audience uses. If you support a multilingual audience, consider loading subset fonts per language and letting the browser handle the rest via system fallbacks:
```css
/* multilingual-aware font stack */
body {
  font-family:
    "Your Brand Font",
    /* Latin fallback */
    "Helvetica Neue", Arial,
    /* CJK fallback */
    "Noto Sans CJK SC", "PingFang SC", "Microsoft YaHei",
    /* Arabic fallback */
    "Noto Sans Arabic", "Segoe UI",
    /* Catch-all */
    sans-serif;
}
```
Use @font-face with unicode-range to load font subsets only when they’re needed. This avoids loading a massive pan-Unicode font for users who only need Latin characters:
```css
/* unicode-range subsetting */
@font-face {
  font-family: "MyFont";
  src: url("myfont-latin.woff2");
  unicode-range: U+0000-00FF; /* Basic Latin + Latin-1 Supplement */
}
@font-face {
  font-family: "MyFont";
  src: url("myfont-devanagari.woff2");
  unicode-range: U+0900-097F; /* Devanagari */
}
```
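To see which of these files a given piece of text would need, you can mirror the unicode-range check in script. The `subsets` table below copies the two rules above; the parser is a simplified assumption that handles single code points and comma-separated ranges but not wildcard forms such as `U+4??`. The function names are hypothetical.

```javascript
// Mirrors the two @font-face rules above.
const subsets = [
  { file: "myfont-latin.woff2", range: "U+0000-00FF" },
  { file: "myfont-devanagari.woff2", range: "U+0900-097F" },
];

// Parse "U+0000-00FF" or "U+20AC" (comma-separated lists allowed) into [start, end] pairs.
function parseUnicodeRange(descriptor) {
  return descriptor.split(",").map((part) => {
    const [start, end] = part.trim().replace(/^U\+/i, "").split("-");
    return [parseInt(start, 16), parseInt(end ?? start, 16)];
  });
}

// Which subset files does `text` touch? Order follows first appearance in the text.
function subsetsNeeded(text) {
  const needed = new Set();
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    for (const { file, range } of subsets) {
      if (parseUnicodeRange(range).some(([lo, hi]) => cp >= lo && cp <= hi)) needed.add(file);
    }
  }
  return [...needed];
}

console.log(subsetsNeeded("Hello"));        // [ 'myfont-latin.woff2' ]
console.log(subsetsNeeded("नमस्ते world")); // both files
console.log(subsetsNeeded("你好"));          // [] (no subset matches)
```

Browsers download a subset only when characters in its range actually appear in text styled with that family, so text like "你好" triggers neither file and is left to the rest of the font stack.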
Normalise strings server-side. If users input text that may come from different platforms (iOS vs Android vs desktop), normalise to NFC before storing or comparing:
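```javascript
// javascript - normalise to NFC
// Assumes an Express-style handler where `req.body` has already been parsed.
const userInput = req.body.text;
const normalised = userInput.normalize('NFC');
// Now safe to store, compare, and display
```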
Count grapheme clusters, not code points when you need to measure “user-visible length”. JavaScript’s Intl.Segmenter handles this correctly:
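```javascript
// javascript - correct character counting
// 👩‍💻 is U+1F469 + ZWJ (U+200D) + U+1F4BB; "é" is the precomposed U+00E9.
const text = "\u{1F469}\u200D\u{1F4BB} caf\u00E9";

// Wrong approaches:
console.log(text.length);      // → 10 (counts UTF-16 code units)
console.log([...text].length); // → 8 (counts code points, still wrong for ZWJ emoji)

// Correct approach (default granularity is "grapheme"):
const segmenter = new Intl.Segmenter();
const segments = [...segmenter.segment(text)];
console.log(segments.length);  // → 6 (counts grapheme clusters as a human would)
```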
When building text-heavy UIs, test with content that includes emoji, CJK characters, Arabic (right-to-left), combining diacritics, and long unbreakable strings. Edge cases in font fallback and layout almost always hide until you throw real diverse content at a design.
For Developers Targeting Older Devices
If your app needs to support users on older OS versions that don’t have recent emoji, consider one of these strategies:
| Strategy | How it works | Trade-off |
|---|---|---|
| Emoji polyfill | Libraries like Twemoji replace text emoji with inline SVG/PNG images from a CDN | ✓ Universal support ✗ Layout shifts, CDN dependency |
| Stick to old emoji | Only use emoji from Unicode 6.0 (2010) – these exist on virtually every device still in use | ✓ Zero render risk ⚠ Limited range |
| Emoji version check | Detect OS version server-side and serve different content | ✓ Precise ✗ Complex to maintain |
| Custom icon font | Map your required symbols to Private Use Area code points in a custom font you control | ✓ Full control ✗ Accessibility concerns, font file overhead |
TL;DR Cheat Sheet
| Problem | Cause | Fix |
|---|---|---|
| Tofu (□) | Font has no glyph for the code point | Install Noto fonts; use a font stack with good fallbacks |
| � everywhere | File decoded with wrong encoding | Force UTF-8; add <meta charset="UTF-8"> |
| Floating accents | Decomposed combining characters; font doesn’t position them | Normalise to NFC; use fonts with good diacritic support |
| Emoji showing as components | OS doesn’t recognise ZWJ sequence or skin tone modifier | Update OS; use Twemoji polyfill for old device support |
| Flags as two letters | Platform doesn’t support Regional Indicator pairs (especially Windows) | Use Twemoji; test cross-platform; consider text alternatives |
| New emoji not showing | Character approved by Unicode but OS font not updated | Check here for support matrix; stick to established emoji |
| "😀".length === 2 | JavaScript strings are UTF-16; emoji are surrogate pairs | Use Intl.Segmenter or spread operator [...str] for code points |
| Same char, different look | Missing or extra variation selector (U+FE0E / U+FE0F) | Check for invisible variation selectors in source text |
| String compare fails | Mixed NFC / NFD normalisation or invisible zero-width chars | Normalise inputs; strip zero-width characters if not needed |
Characters go missing because fonts are finite and Unicode is vast. Browsers try to bridge the gap with font stacks – falling through fonts until they find a glyph. When nothing works, you get tofu (□). Combining characters and emoji add more complexity: a single visible symbol may be several code points that only newer platforms know how to merge. The fix is layered: declare UTF-8, build smart font stacks, normalise text input, use Intl.Segmenter to count correctly, and test with diverse real-world content.

