Why Some Characters Don't Display in Your Browser or Font

Published on April 10, 2026 | By admin | Category: Characters

What Is a Missing Glyph?

Unicode defines over 150,000 characters. No single font on earth contains all of them. The moment your browser or app needs to draw a character that the current font doesn’t know about, you have a missing glyph.

It’s worth separating two concepts that people often confuse:

Concept	What it is	Example
Code point	The abstract Unicode number assigned to a character	`U+1F9F8` (Teddy Bear)
Glyph	The actual drawn shape that represents the character in a font file	The teddy bear image inside an emoji font

A font is essentially a dictionary: it maps code points to glyph drawings. If a code point isn’t in the dictionary, the font has nothing to show you. What happens next depends on the rendering pipeline – and that’s where things get interesting.
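To see the code point side of that dictionary, here is a minimal sketch that lists the code points in a string. The helper name `toCodePoints` is made up for this example; it relies only on standard string methods.

```javascript
// List the Unicode code points in a string, in U+XXXX notation.
// toCodePoints is a hypothetical helper name, not a built-in.
function toCodePoints(str) {
  // Spreading a string iterates by code point, not by UTF-16 code unit.
  return [...str].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(toCodePoints("\u{1F9F8}")); // the teddy bear from the table above
console.log(toCodePoints("A"));
```

Whether any of those code points can actually be drawn is a separate question that only the installed fonts can answer.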

💡 Unicode ≠ Font

The Unicode Consortium decides which characters exist. Font designers decide which characters their font draws. These are completely independent decisions. A brand new Unicode character may go years without widespread font support.

Section 02

The Tofu Problem (□)

When the renderer encounters a code point that no available font can draw, it renders a fallback shape in its place. The most common fallback is a plain empty rectangle: □. In typography circles, this is affectionately called tofu – because it’s bland, white, and square.

⚠️ Why “tofu”?

The term is closely tied to Google’s Noto font family, whose goal is to eliminate tofu entirely – hence the name: No Tofu → Noto. The Noto project now covers a very wide range of Unicode scripts and is among the most comprehensive free font collections available.

Different fonts and renderers signal a missing glyph in different ways:

Symbol	Name	Meaning
□	Tofu / empty box	Font has no glyph for this code point
▯	Tall rectangle	Alternate tofu variant
?	Question mark	Some older renderers substitute a literal `?`
U+XXXX	Code point literal	Developer tools & terminals often print the raw code point
�	Replacement character	`U+FFFD` – officially means “I couldn’t decode this at all”

The Replacement Character (U+FFFD – �) is a special case. It’s not a missing glyph in the font-rendering sense; it’s the character Unicode itself tells you to display when incoming bytes couldn’t be decoded – for example, a UTF-8 file with invalid byte sequences. Tofu means “the font doesn’t have it.” � means “I couldn’t even figure out what character this was supposed to be.”

Character: REPLACEMENT CHARACTER
Code Point: U+FFFD
Appears when: Byte sequences are invalid or undecodable for the declared encoding
Common cause: Opening a Latin-1 file as UTF-8, or corrupted data
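The Latin-1 case can be reproduced in a few lines. This is a sketch using the standard `TextDecoder` API; the byte values are simply "café" as a Latin-1 editor would save it, and `isValidUtf8` is a hypothetical helper name.

```javascript
// "café" saved as Latin-1: é is the single byte 0xE9, which is not valid UTF-8 here.
const latin1Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);

// Default (lossy) decoding substitutes U+FFFD for the undecodable byte.
const lossy = new TextDecoder("utf-8").decode(latin1Bytes);
console.log(lossy.endsWith("\uFFFD"));

// With { fatal: true } the decoder throws instead of substituting,
// which makes it usable as a validity check.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}
console.log(isValidUtf8(latin1Bytes));

// Decoding with the encoding the file was actually written in recovers the text.
const recovered = new TextDecoder("latin1").decode(latin1Bytes);
console.log(recovered);
```

Once text has been stored with � in it, the original bytes are gone, so it is usually worth validating at the point of ingestion rather than afterwards.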

Section 03

Fallback Fonts & the Font Stack

Modern browsers don’t give up after one font fails. They work through a font stack – an ordered list of fonts to try. The moment a glyph is missing from Font A, the browser silently moves to Font B, then Font C, and so on until it either finds the glyph or runs out of options and shows tofu.

In CSS, you declare this stack explicitly:

css - typical body font stack
body {
  font-family:
    "Helvetica Neue",   /* preferred - high-quality sans */
    Arial,              /* safe fallback on Windows */
    "Liberation Sans",  /* open-source equivalent */
    sans-serif;         /* generic family - OS picks a default */
}

The final keyword – sans-serif, serif, monospace, emoji, etc. – is a generic family. It tells the browser “if nothing else works, pick whatever you think is best for this category.” The browser maps generic families to real installed fonts, usually following the platform’s defaults, so the result varies by platform.

Generic family	Windows default	macOS default	Android default
`sans-serif`	Arial	Helvetica	Roboto
`serif`	Times New Roman	Times	Noto Serif
`monospace`	Courier New	Courier	Droid Sans Mono
`emoji`	Segoe UI Emoji	Apple Color Emoji	Noto Color Emoji
`system-ui`	Segoe UI	San Francisco	Roboto

How the Browser Picks a Font for a Single Character

The font selection process for each character is more granular than most people realise. The browser doesn’t pick one font and apply it to the whole string – it picks a font per character:

what the browser does for "Hello 🌍 مرحبا"
"H"  → try font 1 → found ✓ → use font 1
"e"  → try font 1 → found ✓ → use font 1
"l"  → try font 1 → found ✓ → use font 1
...
"🌍" → try font 1 → NOT found
     → try font 2 → NOT found
     → try system emoji font → found ✓ → use emoji font
"م"  → try font 1 → NOT found
     → try font 2 → NOT found
     → try system Arabic font → found ✓ → use Arabic font

This is why a single sentence can render in three or four different fonts simultaneously, and why mixing scripts in a design requires careful font selection.

💡 Font synthesis

If a font has a regular weight but not bold, some browsers will synthesize bold by algorithmically thickening the strokes. The result is usually inferior to a properly designed bold font. You can disable this with font-synthesis: none in CSS if quality matters.

The Unicode Font Fallback Algorithm

Browsers implement the CSS Fonts specification’s font matching algorithm. At a high level it works like this:

1. Try each font in the font-family list in order.
2. For each font, check if a glyph exists for the target code point.
3. If the glyph is found, use that font for this character.
4. If no listed font has the glyph, fall through to the browser’s built-in system fallback list.
5. If nothing works: render the .notdef glyph (usually tofu or the font’s own missing-glyph placeholder).
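The steps above can be sketched as a small simulation. Everything here is an assumption for illustration: the font names, the coverage ranges, and the helpers `hasGlyph`, `pickFont`, and `fontRuns` are invented, and real browsers also consider clusters, language, and style, which this sketch ignores.

```javascript
// Hypothetical fonts: each one declares which code point ranges it covers.
const brandFont = { name: "Brand Sans", ranges: [[0x0020, 0x007e]] };
const arabicFont = { name: "System Arabic", ranges: [[0x0600, 0x06ff]] };
const emojiFont = { name: "System Emoji", ranges: [[0x1f300, 0x1faff]] };

function hasGlyph(font, codePoint) {
  return font.ranges.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
}

// Walk the declared stack first, then the system fallback list.
// Returns the font name, or ".notdef" when nothing covers the code point.
function pickFont(codePoint, stack, systemFallbacks) {
  const match = [...stack, ...systemFallbacks].find((font) =>
    hasGlyph(font, codePoint)
  );
  return match ? match.name : ".notdef";
}

// Resolve a font per character, the way the walkthrough above does.
function fontRuns(text, stack, systemFallbacks) {
  return [...text].map((ch) => [
    ch,
    pickFont(ch.codePointAt(0), stack, systemFallbacks),
  ]);
}

console.log(fontRuns("Hi \u{1F30D} \u0645", [brandFont], [emojiFont, arabicFont]));
```

A character that falls through every list, such as a Devanagari letter in this toy setup, comes back as `.notdef`, which is the tofu case.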

“The font stack is less like a queue and more like a safety net – every character falls through until something catches it.”

Section 04

Combining Characters

Some Unicode characters aren’t standalone shapes – they’re modifiers that attach to the character before them. These are called combining characters, and they’re a surprisingly common source of display problems.

The classic example is accented letters. The letter é can be encoded two completely different ways:

Form	Code points	Description	Bytes (UTF-8)
NFC (composed)	`U+00E9`	Single precomposed character “é”	2 bytes
NFD (decomposed)	`U+0065` + `U+0301`	Letter “e” + combining acute accent	3 bytes

To a human eye – and usually to a browser – these look identical. But to a font renderer, they are fundamentally different challenges. The composed form (U+00E9) is a single glyph that the font either has or doesn’t. The decomposed form requires the renderer to draw e, then overlay a separate combining accent glyph on top, positioning it correctly relative to the base letter.
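The difference is easy to observe from code. A short sketch, using only the standard `String.prototype.normalize` and `TextEncoder`; `utf8Length` is a helper defined just for this example:

```javascript
const composed = "\u00E9";    // é as one precomposed code point (NFC)
const decomposed = "e\u0301"; // e + combining acute accent (NFD)

// UTF-8 byte length of a string.
const utf8Length = (s) => new TextEncoder().encode(s).length;

console.log(composed === decomposed);                      // false
console.log(utf8Length(composed), utf8Length(decomposed)); // 2 3
console.log(decomposed.normalize("NFC") === composed);     // true
console.log(composed.normalize("NFD") === decomposed);     // true
```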

⚠️ The combining diacritic problem

Not every font includes combining diacritic glyphs, and those that do may not position them correctly for all base characters. You can end up with an accent floating to the right of a letter, stacked too high, or clipping into text above. This is especially common with Arabic, Devanagari, and Southeast Asian scripts that rely heavily on combining marks for correct rendering.

Stacked Combining Characters

You can stack multiple combining characters on a single base. This is occasionally used legitimately (Vietnamese: ộ = o + combining circumflex + combining dot below), but it’s also famously abused to create “Zalgo” text – characters with so many stacked combining marks they overflow their line box and collide with surrounding text:

Zalgo text - legitimate Unicode, chaotic rendering
H̷̡̦̋̍͘ę̶͇̹̒l̴͓͊̓l̴͚̻̔̊̓o̸̡̊ ← same "Hello", but with dozens of combining marks
                  stacked on each base letter
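If you accept user-generated text, one possible defence is to cap how many combining marks may follow a base character. This is only a sketch: `capCombiningMarks` is a hypothetical helper, and the default limit of 3 is an arbitrary assumption that you would need to tune for the scripts you support.

```javascript
// capCombiningMarks is a hypothetical helper; the default limit is an assumption.
function capCombiningMarks(str, max = 3) {
  // \p{M} matches any combining mark. Keep the first `max` marks of a run
  // and drop whatever follows them.
  const run = new RegExp(`(\\p{M}{${max}})\\p{M}+`, "gu");
  return str.replace(run, "$1");
}

const zalgoH = "H\u0337\u0321\u0326\u030B\u030D\u0358"; // H + six combining marks
const vietnamese = "o\u0302\u0323";                     // ộ, decomposed: two marks

console.log([...capCombiningMarks(zalgoH)].length);         // 4: H + three marks
console.log(capCombiningMarks(vietnamese) === vietnamese);  // true, left untouched
```

Legitimate stacks like the Vietnamese example stay intact as long as the limit is not set too low.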

✅ Normalisation to the rescue

Unicode defines four normalisation forms: NFC, NFD, NFKC, and NFKD. For most text processing, NFC (Canonical Decomposition, followed by Canonical Composition) is the right choice. Normalising to NFC collapses decomposed sequences into precomposed characters wherever possible, reducing rendering inconsistencies and making string comparison reliable. In JavaScript: str.normalize('NFC').

Zero Width Characters

A particularly easy-to-miss category is zero-width characters. Strictly speaking, these are format characters rather than combining marks, but they cause similar trouble: they take up no horizontal space and usually render nothing visible at all, yet they actively affect how surrounding text behaves:

Character	Code Point	Effect
ZWJ	`U+200D`	Joins adjacent emoji into a single combined emoji (e.g. 👩‍💻)
ZWNJ	`U+200C`	Prevents joining – forces adjacent letters to stay separate in Arabic/Persian
ZWSP	`U+200B`	Zero-width space – allows line-breaking without a visible space
SHY	`U+00AD`	Soft hyphen – invisible unless the browser decides to break the line there
WJ	`U+2060`	Word joiner – prevents a line break between two characters

Zero-width characters are invisible but real. They can be inserted into text to watermark it, bypass keyword filters, or confuse text comparisons. If you ever copy-paste text from a web page and find string matching mysteriously failing, invisible characters are a common culprit.
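When string matching fails for no visible reason, it can help to list the invisible characters first and only then decide whether to strip them. A minimal sketch covering the five characters from the table above; `findInvisible` and `stripInvisible` are hypothetical helper names.

```javascript
// The five characters from the table above.
const INVISIBLE = /[\u00AD\u200B\u200C\u200D\u2060]/g;

// Report where invisible characters sit in a string.
function findInvisible(str) {
  return [...str.matchAll(INVISIBLE)].map((m) => ({
    index: m.index,
    codePoint: "U+" + m[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  }));
}

// Remove them. Caution: this also removes ZWJ and ZWNJ, which breaks
// emoji sequences and Arabic/Persian shaping, so only use it on text
// where those are not expected (identifiers, keywords, and so on).
function stripInvisible(str) {
  return str.replace(INVISIBLE, "");
}

const pasted = "pass\u200Bword";
console.log(pasted === "password");                 // false
console.log(findInvisible(pasted));                 // one hit at index 4
console.log(stripInvisible(pasted) === "password"); // true
```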

Section 05

Emoji Rendering Issues

Emoji are where Unicode’s abstract ideals collide hardest with messy reality. Unlike regular characters, emoji have no standardised appearance – only a standardised code point. The laughing-crying face 😂 is U+1F602 on every platform, but what it looks like is entirely up to whoever made the emoji font.

⚠️ Emoji are not standardised images

Apple, Google, Samsung, Microsoft, and Twitter/X all draw their own emoji from scratch. The same code point can look dramatically different – sometimes even conveying a different emotion – depending on the platform.

Skin Tone Modifiers

Unicode defines five skin tone modifiers (U+1F3FB through U+1F3FF), based on the Fitzpatrick dermatology scale. When a modifier immediately follows a supporting emoji, the two code points are combined by the renderer into a single skin-toned variant. When the renderer doesn’t support the modifier, you see both characters independently: the base emoji plus a coloured square.

skin tone modifier encoding
👍      = U+1F44D                 (thumbs up, no modifier)
👍🏽     = U+1F44D + U+1F3FD       (thumbs up + medium skin tone modifier)

If the font/OS doesn't support the modifier:
→ renders as: 👍 🟫  (two separate code points visible)
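In code, a skin-toned emoji is just the two code points concatenated. The sketch below builds one and strips the modifier again using the standard `\p{Emoji_Modifier}` property escape; `stripSkinTone` is a name invented for this example.

```javascript
const thumbsUp = String.fromCodePoint(0x1f44d);   // base emoji
const mediumTone = String.fromCodePoint(0x1f3fd); // medium skin tone modifier
const toned = thumbsUp + mediumTone;

// Emoji_Modifier covers the five skin tone modifiers U+1F3FB to U+1F3FF.
const stripSkinTone = (s) => s.replace(/\p{Emoji_Modifier}/gu, "");

console.log([...toned].length);                 // 2 code points
console.log(toned.length);                      // 4 UTF-16 code units
console.log(stripSkinTone(toned) === thumbsUp); // true
```

Stripping modifiers like this is one way to fall back to the neutral emoji when you know the target platform cannot combine them.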

ZWJ Sequences

Some emoji are actually sequences of multiple emoji joined by a Zero Width Joiner (U+200D). If the rendering platform recognises the full sequence, it displays a single combined image. If it doesn’t, each component shows separately.

ZWJ sequence example - family emoji
👨‍👩‍👧‍👦  = U+1F468 + ZWJ + U+1F469 + ZWJ + U+1F467 + ZWJ + U+1F466
       = man  +      + woman +      + girl  +      + boy

Rendered on a supporting platform: 👨‍👩‍👧‍👦  (one family image)
Rendered on an older platform:      👨 👩 👧 👦  (four separate emoji)

The number of recognised ZWJ sequences has grown from a handful in 2015 to over 1,000 today. Older devices simply don’t know about sequences added after their OS shipped.
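You can take a ZWJ sequence apart yourself, which is roughly what an older platform ends up showing. A small sketch that builds the family emoji from its parts and splits it again, using only standard string methods and `Intl.Segmenter`:

```javascript
const ZWJ = "\u200D";

// man, woman, girl, boy joined by zero width joiners
const family = [0x1f468, 0x1f469, 0x1f467, 0x1f466]
  .map((cp) => String.fromCodePoint(cp))
  .join(ZWJ);

console.log([...family].length);       // 7 code points: 4 emoji + 3 joiners
console.log(family.split(ZWJ).length); // 4 component emoji

// A grapheme segmenter treats the whole sequence as one user-visible character.
const graphemes = [...new Intl.Segmenter().segment(family)];
console.log(graphemes.length);         // 1
```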

Text vs. Emoji Presentation

Many Unicode characters have both a text (monochrome) form and an emoji (colourful) form. Which one you get depends on whether a variation selector follows the character:

Sequence	Code points	Renders as
☎	`U+260E` alone	Black telephone (text style)
☎️	`U+260E` + `U+FE0F`	🟠 Emoji style (coloured)
☎︎	`U+260E` + `U+FE0E`	Text style (forced)

U+FE0F is the emoji variation selector; U+FE0E is the text variation selector. Many emoji look different depending on whether this invisible character is attached. When variation selectors are missing or ignored, platforms make their own decisions – which is why the same character can appear as text on one device and emoji on another.
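If you want to request one presentation explicitly rather than leave it to the platform, you can append the selector yourself. A minimal sketch for a single character; `withPresentation` is a hypothetical helper, and platforms remain free to ignore the request.

```javascript
const VS15 = "\uFE0E"; // text presentation selector
const VS16 = "\uFE0F"; // emoji presentation selector

// Drop any existing variation selector, then append the requested one.
function withPresentation(ch, style) {
  const base = ch.replace(/[\uFE0E\uFE0F]/g, "");
  return base + (style === "emoji" ? VS16 : VS15);
}

const phone = "\u260E";
console.log(withPresentation(phone, "emoji") === "\u260E\uFE0F");        // true
console.log(withPresentation("\u260E\uFE0F", "text") === "\u260E\uFE0E"); // true
console.log(withPresentation(phone, "emoji").length);                    // 2
```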

Flag Emoji

Country flag emoji are encoded as pairs of Regional Indicator letters. The letters A–Z each have a Regional Indicator equivalent (U+1F1E6–U+1F1FF). Pair them correctly – 🇬 + 🇧 – and a supporting platform renders a 🇬🇧 flag. On a platform that doesn’t support flags (notably, Windows doesn’t natively render most flags), you see the two raw regional letters side by side: GB.
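The mapping from letters to Regional Indicators is plain arithmetic, so it is easy to sketch. The helper names `countryCodeToFlag` and `flagToCountryCode` are invented for this example, and the code does not check that the two letters are a real country code.

```javascript
const REGIONAL_A = 0x1f1e6; // Regional Indicator Symbol Letter A
const LATIN_A = 0x41;       // "A"

// "GB" -> the two Regional Indicator code points for G and B.
function countryCodeToFlag(code) {
  if (!/^[A-Za-z]{2}$/.test(code)) {
    throw new RangeError("expected a two-letter code");
  }
  return [...code.toUpperCase()]
    .map((ch) => String.fromCodePoint(REGIONAL_A + ch.charCodeAt(0) - LATIN_A))
    .join("");
}

// The reverse mapping, useful as a text alternative where flags don't render.
function flagToCountryCode(flag) {
  return [...flag]
    .map((ch) => String.fromCharCode(ch.codePointAt(0) - REGIONAL_A + LATIN_A))
    .join("");
}

console.log(countryCodeToFlag("GB"));                    // renders as a flag where supported
console.log(flagToCountryCode(countryCodeToFlag("gb"))); // "GB"
```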

💡 Windows and flag emoji

Microsoft has historically not shipped country flag emoji in Segoe UI Emoji; the reason usually suggested is that country flags are politically sensitive. As a result, Windows users often see two-letter codes instead of flag images, unless they’re using a browser (like Firefox) that bundles its own emoji font.

Section 06

OS & Version Fragmentation

New Unicode characters are approved regularly – Unicode 17.0 was released in 2025. But a character existing in the Unicode standard doesn’t mean every device can display it. The pipeline from “character approved” to “rendered on your screen” involves several slow-moving links:

the journey from Unicode approval to your screen
1. Unicode Consortium approves character (e.g. Unicode 17.0, 2025)
        ↓
2. Font designers add the glyph (Apple, Google, Microsoft...)
        ↓
3. Font ships in an OS update
        ↓
4. User installs OS update
        ↓
5. Character finally renders ✓

Each step can take months to years.

This means a perfectly valid Unicode character can display on a fully updated iPhone, turn to tofu on an Android that’s two OS versions behind, and show a question mark on an old Windows 10 install that skipped the relevant font update.

Platform	Emoji font	Update mechanism
macOS / iOS	Apple Color Emoji	OS updates – tightly coupled to system
Windows	Segoe UI Emoji	Windows updates – often delayed or skipped
Android	Noto Color Emoji	OS updates or Google Play system updates
Chrome / Edge	Bundled Noto (some versions)	Browser auto-updates – often faster than OS
Linux	Noto Color Emoji (if installed)	Package manager – manual or distro-dependent

“A new emoji is only as universal as the oldest device in your audience’s pocket.”

This fragmentation is why this site exists – it documents what each emoji looks like across platforms and versions, so you can check whether your chosen emoji will render consistently for your audience.

Section 07

How to Fix It – For Users & Developers

Whether you’re a user seeing boxes everywhere or a developer shipping text to millions of people, there are concrete steps you can take.

For Users

Fix 01

Update your OS

Emoji fonts and system character support ship with OS updates. The most common reason for tofu is a device that hasn’t been updated in a year or more.

Fix 02

Install a comprehensive font

Install Noto fonts (free, by Google) or GNU Unifont. Noto covers virtually every Unicode block. Most Linux distros package Noto; on Debian and Ubuntu, for example, the package is fonts-noto.

Fix 03

Use a modern browser

Chrome, Firefox, and Edge all ship with better fallback font logic than older browsers and handle complex ZWJ sequences well. Some also bundle their own emoji font (Firefox does on Windows and Linux), which keeps emoji current even when the OS font lags behind.

Fix 04

Check your encoding

If you’re seeing � or garbled text, the file was opened with the wrong encoding. In most text editors, you can re-open with UTF-8 specified explicitly.

For Web Developers

Always declare UTF-8 in every HTML document. Without this, browsers have to guess, and they sometimes guess wrong:

html - declare encoding early in <head>
<meta charset="UTF-8">

Use a thoughtful font stack that covers the scripts your audience uses. If you support a multilingual audience, consider loading subset fonts per language and letting the browser handle the rest via system fallbacks:

css - multilingual-aware font stack
body {
  font-family:
    "Your Brand Font",
    /* Latin fallback */
    "Helvetica Neue", Arial,
    /* CJK fallback */
    "Noto Sans CJK SC", "PingFang SC", "Microsoft YaHei",
    /* Arabic fallback */
    "Noto Sans Arabic", "Segoe UI",
    /* Catch-all */
    sans-serif;
}

Use @font-face with unicode-range to load font subsets only when they’re needed. This avoids loading a massive pan-Unicode font for users who only need Latin characters:

css - unicode-range subsetting
@font-face {
  font-family: "MyFont";
  src: url("myfont-latin.woff2");
  unicode-range: U+0000-00FF; /* Basic Latin + Latin-1 */
}

@font-face {
  font-family: "MyFont";
  src: url("myfont-devanagari.woff2");
  unicode-range: U+0900-097F; /* Devanagari */
}
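To make the effect of unicode-range concrete, here is a rough simulation of the decision the browser makes: a subset file is only needed when the text contains at least one code point inside its range. The `faces` array mirrors the two rules above; `facesNeeded` is a hypothetical helper, and the real matching in browsers involves more than this.

```javascript
// Mirrors the two @font-face rules above.
const faces = [
  { src: "myfont-latin.woff2", range: [0x0000, 0x00ff] },
  { src: "myfont-devanagari.woff2", range: [0x0900, 0x097f] },
];

// Return the subset files whose range contains at least one code point of the text.
function facesNeeded(text, fontFaces) {
  const codePoints = [...text].map((ch) => ch.codePointAt(0));
  return fontFaces
    .filter(({ range: [lo, hi] }) => codePoints.some((cp) => cp >= lo && cp <= hi))
    .map((face) => face.src);
}

console.log(facesNeeded("caf\u00E9", faces));                            // Latin subset only
console.log(facesNeeded("\u0928\u092E\u0938\u094D\u0924\u0947", faces)); // Devanagari subset only
console.log(facesNeeded("Hi \u0928", faces));                            // both subsets
```

A page with only Latin text never triggers the Devanagari download, which is the saving the technique is after.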

Normalise strings server-side. If users input text that may come from different platforms (iOS vs Android vs desktop), normalise to NFC before storing or comparing:

javascript - normalise to NFC
const userInput = req.body.text;
const normalised = userInput.normalize('NFC');

// Now safe to store, compare, and display

Count grapheme clusters, not code points when you need to measure “user-visible length”. JavaScript’s Intl.Segmenter handles this correctly:

javascript - correct character counting
const text = "👩‍💻 café"; // é here is the precomposed U+00E9

// Wrong approaches:
text.length;             // → 10  (counts UTF-16 code units)
[...text].length;        // → 8   (counts code points; the ZWJ emoji alone is 3)

// Correct approach:
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const segments = [...segmenter.segment(text)];
segments.length;         // → 6   (counts grapheme clusters as a human would)
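The same idea applies to truncation: cutting a string by code units can slice an emoji sequence in half. A minimal sketch of a grapheme-aware truncate; `truncateGraphemes` is a hypothetical helper built on the standard `Intl.Segmenter`.

```javascript
// Keep at most `max` user-visible characters, never splitting a cluster.
function truncateGraphemes(text, max) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let out = "";
  let count = 0;
  for (const { segment } of segmenter.segment(text)) {
    if (count === max) break;
    out += segment;
    count += 1;
  }
  return out;
}

const sample = "\u{1F469}\u200D\u{1F4BB} caf\u00E9"; // woman technologist + " café"

// Naive slicing keeps the woman and the joiner but loses the laptop.
console.log(sample.slice(0, 3) === "\u{1F469}\u200D");                     // true
// Grapheme-aware truncation keeps the emoji whole.
console.log(truncateGraphemes(sample, 3) === "\u{1F469}\u200D\u{1F4BB} c"); // true
```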

✅ Test with real content

When building text-heavy UIs, test with content that includes emoji, CJK characters, Arabic (right-to-left), combining diacritics, and long unbreakable strings. Edge cases in font fallback and layout almost always hide until you throw real diverse content at a design.

For Developers Targeting Older Devices

If your app needs to support users on older OS versions that don’t have recent emoji, consider one of these strategies:

Strategy	How it works	Trade-off
Emoji polyfill	Libraries like Twemoji replace text emoji with inline SVG/PNG images from a CDN	✓ Universal support ✗ Layout shifts, CDN dependency
Stick to old emoji	Only use emoji from Unicode 6.0 (2010) – these exist on virtually every device still in use	✓ Zero render risk ⚠ Limited range
Emoji version check	Detect OS version server-side and serve different content	✓ Precise ✗ Complex to maintain
Custom icon font	Map your required symbols to Private Use Area code points in a custom font you control	✓ Full control ✗ Accessibility concerns, font file overhead

Section 08

TL;DR Cheat Sheet

Problem	Cause	Fix
Tofu (□)	Font has no glyph for the code point	Install Noto fonts; use a font stack with good fallbacks
� everywhere	File decoded with wrong encoding	Force UTF-8; add `<meta charset="UTF-8">`
Floating accents	Decomposed combining characters; font doesn’t position them	Normalise to NFC; use fonts with good diacritic support
Emoji showing as components	OS doesn’t recognise ZWJ sequence or skin tone modifier	Update OS; use Twemoji polyfill for old device support
Flags as two letters	Platform doesn’t support Regional Indicator pairs (especially Windows)	Use Twemoji; test cross-platform; consider text alternatives
New emoji not showing	Character approved by Unicode but OS font not updated	Check this site’s support matrix; stick to established emoji
`"😀".length === 2`	JavaScript strings are UTF-16; emoji outside the BMP are surrogate pairs	Use the spread operator `[...str]` to count code points, or `Intl.Segmenter` to count grapheme clusters
Same char, different look	Missing or extra variation selector (`U+FE0E` / `U+FE0F`)	Check for invisible variation selectors in source text
String compare fails	Mixed NFC / NFD normalisation or invisible zero-width chars	Normalise inputs; strip zero-width characters if not needed

🎯 The one-paragraph summary

Characters go missing because fonts are finite and Unicode is vast. Browsers try to bridge the gap with font stacks – falling through fonts until they find a glyph. When nothing works, you get tofu (□). Combining characters and emoji add more complexity: a single visible symbol may be several code points that only newer platforms know how to merge. The fix is layered: declare UTF-8, build smart font stacks, normalise text input, use Intl.Segmenter to count correctly, and test with diverse real-world content.
