Things I Learned About Fonts While Making a Java Font Library (That You Didn’t Want to Know)
Join us as we dive into the nitty-gritty of implementing font libraries in Java. See how deeply font formats are nested, how they're optimized for the web, and more.
A few months ago, I was working a bit in my free time on Pdf2Dom, an open source Java PDF-to-HTML converter. At some point, I bumped into PDFs that used old and obscure Adobe font formats, which we didn’t support yet. So I started Googling old Adobe font formats and found out that if I wanted a pure Java implementation, I was on my own, kid.
I decided to break the new font features off into a project that goes by the catchy name of FontVerter, and along the way I learned a few ugly things about how fonts work.
So here are some things I learned, some ugly, some not. Among them is a small part of the real root cause of browser font rendering differences. (Apologies if the first few points are too basic for you; scroll down to the later ones, where I get deeper into the technical details.)
Font Formats Are Nested Up to Four Levels Deep Now
The WOFF font format is a web wrapper around formats like Microsoft’s OpenType, which was an extension of Apple’s TrueType spec and can wrap Adobe’s CFF fonts, which in turn wrap PostScript Type 1 fonts.
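To make the nesting concrete, here is a minimal Java sketch that peels off only the outermost layer. Both WOFF 1.0 and WOFF 2.0 files start with a 4-byte signature followed by a 4-byte `flavor` field holding the sfnt version of the wrapped font, so the first eight bytes already tell you which doll is inside. The class and method names (`WoffFlavor`, `describe`) are hypothetical, not FontVerter's API.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not FontVerter's API): reads the outer WOFF layer and the
// 'flavor' field that identifies the sfnt font wrapped inside it.
final class WoffFlavor {
    static final int WOFF1_SIGNATURE = 0x774F4646;  // "wOFF"
    static final int WOFF2_SIGNATURE = 0x774F4632;  // "wOF2"
    static final int TRUETYPE_FLAVOR = 0x00010000;  // sfnt with TrueType outlines
    static final int CFF_FLAVOR = 0x4F54544F;       // "OTTO": sfnt wrapping a CFF font

    static String describe(byte[] header) {
        if (header.length < 8) return "too short";
        ByteBuffer buf = ByteBuffer.wrap(header);    // big-endian, as both WOFF specs use
        int signature = buf.getInt(0);
        int flavor = buf.getInt(4);                  // sfnt version of the wrapped font

        String outer;
        if (signature == WOFF1_SIGNATURE) outer = "WOFF 1.0";
        else if (signature == WOFF2_SIGNATURE) outer = "WOFF 2.0";
        else return "not a WOFF file";

        String inner;
        if (flavor == TRUETYPE_FLAVOR) inner = "OpenType with TrueType outlines";
        else if (flavor == CFF_FLAVOR) inner = "OpenType wrapping a CFF font";
        else inner = String.format("unknown flavor 0x%08X", flavor);
        return outer + " -> " + inner;
    }
}
```

For example, `WoffFlavor.describe(new byte[] {'w', 'O', 'F', '2', 'O', 'T', 'T', 'O'})` returns `"WOFF 2.0 -> OpenType wrapping a CFF font"`. Going one level deeper would mean decompressing the tables and parsing the CFF data itself.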
Yep, fonts are just like Russian nesting dolls, except usually more confusing and less cute.
I find it rather interesting that, for so long, newer font specs have just piled onto existing formats originally created in the '80s by several large tech corporations. I can’t think of another non-trivial file format that’s done something like this (though someone will likely tell me I’m wrong and there is one).
What .ttf and .otf File Extensions Really Mean
OpenType is just an extension of the TrueType spec, and an OpenType font can use either TrueType outlines or an Adobe CFF font. When CFF is used, the extension is .otf, and when TrueType outlines are used, it’s usually .ttf. TrueType itself only supports TrueType outlines, so its files (almost) always end in .ttf. That said, I found the choice of .ttf or .otf to be irrelevant to all the major browsers, as they figure out the format just by looking at the file's contents.
TrueType is the older parent of OpenType, so if you have an old .ttf font lying around that conforms only to the original TrueType spec, you're more likely to run into trouble getting it to render in browsers, because they’ve stopped caring about it. Keep in mind the subtle difference between a TrueType font and an OpenType font that uses TrueType outlines.
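Since browsers go by the file's contents, here's a hedged sketch of that kind of sniffing for an uncompressed sfnt: it reads the `sfntVersion` field (0x00010000 for TrueType outlines, `OTTO` for CFF) and scans the table directory for a `CFF ` or `glyf` table, ignoring the extension entirely. Apple's TrueType spec also allows the version tag `true`, which the OpenType spec says OpenType fonts shouldn't use. `SfntSniffer.classify` is a hypothetical helper, and it assumes the whole font is already in memory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: classifies an uncompressed sfnt font by its contents,
// ignoring whether the file was named .ttf or .otf.
final class SfntSniffer {
    static String classify(byte[] font) {
        if (font.length < 12) return "unrecognized";
        ByteBuffer buf = ByteBuffer.wrap(font);            // sfnt data is big-endian
        int sfntVersion = buf.getInt(0);
        int numTables = buf.getShort(4) & 0xFFFF;

        // Table records start after the 12-byte header; each record is 16 bytes
        // (tag, checksum, offset, length). Only the 4-byte tag is needed here.
        boolean hasCff = false, hasGlyf = false;
        for (int i = 0; i < numTables && 12 + 16 * (i + 1) <= font.length; i++) {
            String tag = new String(font, 12 + 16 * i, 4, StandardCharsets.US_ASCII);
            if (tag.equals("CFF ")) hasCff = true;
            if (tag.equals("glyf")) hasGlyf = true;
        }

        if (sfntVersion == 0x74727565) return "Apple TrueType ('true' tag)";
        if (sfntVersion == 0x4F54544F && hasCff) return "OpenType with CFF outlines (.otf)";
        if (sfntVersion == 0x00010000 && hasGlyf) return "TrueType outlines (.ttf)";
        return "unrecognized";
    }
}
```

Checking both the version tag and the actual tables is a deliberate choice here: either one alone can be wrong in a sloppily built font.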
Why Is WOFF for the Web?
WOFF is just a compression wrapper around the old font formats. Before working on this library, all I knew about WOFF was that it was meant for the web. Occasionally, I wondered whether WOFF fonts were inferior to TrueType/OpenType, since "web" versions of things usually mean stripped down, with fewer features or lossier quality, but I never had time to Google it.
Nope, WOFF really is just a lossless compression wrapper around standard font formats. WOFF 2.0 uses Google’s newfangled Brotli compression, and WOFF 1.0 uses zlib's compress2. Brotli generally compresses web server resources better than the usual gzip, yet it was adopted much more quickly for fonts, via WOFF2, than for other web resources, like images. I have no idea why that is, or why they felt it warranted another level of font nesting.
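Here's a minimal sketch of the WOFF 1.0 side using `java.util.zip.Deflater`, which by default produces the same zlib-wrapped stream that `compress2` does. It follows the WOFF 1.0 rule that a table whose compressed form isn't smaller than the original is stored uncompressed, which is why a decoder can simply check whether `compLength` equals `origLength`. The class and method names are my own, and Brotli for WOFF2 isn't in the JDK, so that half is left out.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helpers for the per-table compression step of WOFF 1.0.
final class Woff1Tables {
    // Compress one sfnt table. Deflater's default (non-"nowrap") mode emits a zlib
    // stream, the same container zlib's compress2 produces.
    static byte[] encodeTable(byte[] table) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(table);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        byte[] compressed = out.toByteArray();
        // WOFF 1.0 keeps the original bytes when compression doesn't make the table smaller.
        return compressed.length < table.length ? compressed : table;
    }

    // Undo encodeTable. origLength comes from the WOFF table directory; when the stored
    // length equals it (compLength == origLength), the table was stored uncompressed.
    static byte[] decodeTable(byte[] stored, int origLength) {
        if (stored.length == origLength) return stored;
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stored);
            byte[] result = new byte[origLength];
            int filled = 0;
            while (filled < origLength && !inflater.finished()) {
                int n = inflater.inflate(result, filled, origLength - filled);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) break;
                filled += n;
            }
            if (filled != origLength) throw new IllegalArgumentException("wrong decompressed length");
            return result;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt zlib data", e);
        } finally {
            inflater.end();
        }
    }
}
```

Small tables often hit the uncompressed path: a zlib stream carries a header and checksum, so a 3-byte table can't get smaller.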
Shout out to the amazing WOFF 1 and 2 specs, though. Their clarity and simplicity are a far cry from the TrueType/OpenType and Adobe specs. Since Google’s sfntly isn’t on Maven Central (SHAME!), I decided to just write my own WOFF 1 and 2 code for FontVerter, as the specs seemed simple and clear, and implementing them was indeed pretty straightforward.
The Specs Are Full of ‘Neat’ Optimizations
Since the original font specs were written back when rendering fonts could still be a CPU-intensive task, some effort went into optimizing various parts of them to shave a cycle or two off the renderer's work or a byte off the file size. Modern specs like WOFF still have a few tricks to save a byte or two in the uncompressed header area, because for web delivery we still care about every last bit in the file.
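One such header trick from WOFF2, sketched in Java: each table directory entry starts with a single flags byte whose low 6 bits index a fixed list of common table tags, so most tables don't need a 4-byte tag at all; only index 63 means an explicit tag follows. The top two bits carry the transform version. This sketch includes only the first 14 entries of the spec's list (which runs to index 62), reads just the flags and tag (not the lengths that follow), and `Woff2TableTag.readTag` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads the flags byte (and optional tag) of one WOFF2
// table directory entry.
final class Woff2TableTag {
    // First 14 entries of the WOFF2 known-tag list; the spec's list continues to index 62.
    private static final String[] KNOWN_TAGS = {
        "cmap", "head", "hhea", "hmtx", "maxp", "name", "OS/2",
        "post", "cvt ", "fpgm", "glyf", "loca", "prep", "CFF "
    };
    private static final int ARBITRARY_TAG = 63;

    static String readTag(ByteBuffer buf) {
        int flags = buf.get() & 0xFF;
        int index = flags & 0x3F;          // bits 0-5: known-tag index; bits 6-7: transform version
        if (index == ARBITRARY_TAG) {
            byte[] tag = new byte[4];
            buf.get(tag);                  // only uncommon tables pay for a full 4-byte tag
            return new String(tag, StandardCharsets.US_ASCII);
        }
        if (index >= KNOWN_TAGS.length) {
            throw new UnsupportedOperationException("known-tag index " + index + " not in this sketch");
        }
        return KNOWN_TAGS[index];
    }
}
```

A `glyf` entry thus costs one byte instead of four for its tag, a saving repeated for every common table in the directory.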
Here’s how the spec describes an “obscure indexing trick” for a format 4 cmap subtable:
If the idRangeOffset value for the segment is not 0, the mapping of character codes relies on glyphIdArray. The character code offset from startCode is added to the idRangeOffset value. This sum is used as an offset from the current location within idRangeOffset itself to index out the correct glyphIdArray value. This obscure indexing trick works because glyphIdArray immediately follows idRangeOffset in the font file. The C expression that yields the glyph index is:
```c
*(idRangeOffset[i]/2
  + (c - startCode[i])
  + &idRangeOffset[i])
```
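Without pointer arithmetic over the raw file, the same trick becomes plain index math: because `glyphIdArray` starts `segCount` entries after `idRangeOffset[0]`, the pointer expression lands at `glyphIdArray[i + idRangeOffset[i]/2 + (c - startCode[i]) - segCount]`. Here's a minimal Java sketch, assuming the subtable's arrays have already been parsed into `int` arrays. The class and method names are hypothetical, and a linear scan stands in for the binary search a real renderer would do.

```java
// Hypothetical class: a format 4 cmap subtable whose arrays have already been
// parsed from the font into ints (idDelta keeps its int16 sign).
final class CmapFormat4 {
    private final int[] endCode, startCode, idDelta, idRangeOffset, glyphIdArray;

    CmapFormat4(int[] endCode, int[] startCode, int[] idDelta,
                int[] idRangeOffset, int[] glyphIdArray) {
        this.endCode = endCode;
        this.startCode = startCode;
        this.idDelta = idDelta;
        this.idRangeOffset = idRangeOffset;
        this.glyphIdArray = glyphIdArray;
    }

    // Returns the glyph index for character code c, or 0 (.notdef) if unmapped.
    int glyphFor(int c) {
        int segCount = endCode.length;
        for (int i = 0; i < segCount; i++) {      // segments are sorted by endCode
            if (endCode[i] < c) continue;
            if (startCode[i] > c) return 0;       // c falls between segments
            if (idRangeOffset[i] == 0) {
                return (c + idDelta[i]) & 0xFFFF; // idDelta arithmetic is modulo 65536
            }
            // The pointer trick as an index: start at idRangeOffset[i], move
            // idRangeOffset[i]/2 + (c - startCode[i]) entries, then subtract the
            // segCount idRangeOffset entries that sit before glyphIdArray.
            int index = i + idRangeOffset[i] / 2 + (c - startCode[i]) - segCount;
            int glyph = glyphIdArray[index];
            return glyph == 0 ? 0 : (glyph + idDelta[i]) & 0xFFFF;
        }
        return 0;
    }
}
```

For example, in a three-segment table where `idRangeOffset[1]` is 4, the first character of segment 1 reads `glyphIdArray[0]`, since 1 + 4/2 + 0 - 3 = 0.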
There are also a few obscure data types used to shave off a byte here and there, like the variable-length encoded integers in WOFF2:
UIntBase128 is a different variable length encoding of unsigned integers, suitable for values up to 2^32-1. A UIntBase128 encoded number is a sequence of bytes for which the most significant bit is set for all but the last byte, and clear for the last byte. The number itself is base 128 encoded in the lower 7 bits of each byte. Thus, a decoding procedure for a UIntBase128 is: start with value = 0. Consume a byte, setting value = old value times 128 + (byte bitwise-and 127). Repeat last step until the most significant bit of byte is false.
UIntBase128 encoding format allows a possibility of sub-optimal encoding…
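The spec pairs that prose with decoding pseudocode that also rejects the sub-optimal and invalid forms: a leading `0x80` byte (a leading zero), any value that would overflow 32 bits, and sequences longer than five bytes. Here's a Java sketch following those rules, with a `long` holding the unsigned 32-bit result. The class name is hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: decodes one WOFF2 UIntBase128 value from the buffer's position.
final class UIntBase128 {
    static long read(ByteBuffer buf) {
        long accum = 0;
        for (int i = 0; i < 5; i++) {
            int b = buf.get() & 0xFF;
            // A first byte of 0x80 would encode a leading zero: a sub-optimal encoding.
            if (i == 0 && b == 0x80) throw new IllegalArgumentException("leading zero byte");
            // If any of the top 7 bits are set, shifting left by 7 would exceed 32 bits.
            if ((accum & 0xFE000000L) != 0) throw new IllegalArgumentException("overflows 32 bits");
            accum = (accum << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) return accum;   // high bit clear: this was the last byte
        }
        throw new IllegalArgumentException("sequence longer than 5 bytes");
    }
}
```

The largest legal value, 2^32-1, takes the full five bytes: `8F FF FF FF 7F`.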
There are also a few places where a font spec tells you to compute a value from already-stored values and store it in a separate table entry, and my first thought was usually, "Why didn’t they just have the font renderer do that calculation at runtime so the file is slightly smaller?" The answer is that it slightly lowers the load on the end user’s already-strained CPU, which matters much more than the programmer having to write a few more lines of code and wait a bit longer to generate the font, especially in an era when processors like the Intel 8088 were still in use.
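A well-known instance of this pattern (my example, not one named above) is the sfnt table directory header: `searchRange`, `entrySelector`, and `rangeShift` can all be derived from `numTables`, but they're stored anyway so a renderer can start a binary search over the table records without computing powers of two itself. Here's a sketch of how a font writer might compute them; the class name is hypothetical.

```java
// Hypothetical helper: derives the binary-search fields of the sfnt table directory
// header from numTables, as a font writer must before saving the file.
final class TableDirectoryFields {
    // Returns {searchRange, entrySelector, rangeShift}.
    static int[] compute(int numTables) {
        if (numTables < 1) throw new IllegalArgumentException("numTables must be >= 1");
        int entrySelector = 31 - Integer.numberOfLeadingZeros(numTables); // floor(log2(numTables))
        int searchRange = (1 << entrySelector) * 16; // largest power of two <= numTables, times 16-byte records
        int rangeShift = numTables * 16 - searchRange;
        return new int[] {searchRange, entrySelector, rangeShift};
    }
}
```

For a font with 10 tables, `compute(10)` returns `{128, 3, 32}`: the renderer gets the power of two for free and just reads three numbers.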
The Font Specs Are Complex and Ambiguous
Anyone who has read parts of the major font specs (besides WOFF) will usually agree that they're horribly written and full of potential ambiguity at every turn. Some, but not all, of the apparent ambiguity can be resolved by rereading that part of the spec extremely carefully and literally.
It’s like a 6th-grade lesson on ambiguity in technical writing, where the teacher hands out a sheet of instructions so ambiguous that everyone in the class ends up with a different final answer.
A number of properties are also repeated across separate tables, and the spec is not always clear about which copy to read first or what to do if they don’t match.
Most people notice the font rendering differences between Windows and Linux, and if you’re really picky about fonts, you’ve likely noticed that rendering differs even between browsers. A small part of the reason is that, if you look at the font code in Chrome and Firefox, they resolve ambiguities in the spec differently. I partly enjoyed discovering some of the real reasons for font rendering differences through hands-on work, but I also partly hated it, as I just wanted my converted fonts to work already.
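The repeated-properties problem is easy to see with vertical metrics: a font's ascent appears as `ascender` in `hhea` and as both `sTypoAscender` and `usWinAscent` in `OS/2`, and bit 7 of the `OS/2` `fsSelection` field (`USE_TYPO_METRICS`) only says to prefer the typo values when it's set. Here's a hypothetical resolution policy, not taken from any browser's source, showing how two reasonable renderers can land on different numbers for the same font.

```java
// Hypothetical policy (not any browser's actual code) for picking a font's ascent
// from the copies stored in the 'hhea' and 'OS/2' tables. Values are in font units.
final class AscentPolicy {
    static final int USE_TYPO_METRICS = 1 << 7;   // OS/2 fsSelection bit 7

    static int ascent(int hheaAscender, int typoAscender, int winAscent,
                      int fsSelection, boolean windowsStyle) {
        if ((fsSelection & USE_TYPO_METRICS) != 0) {
            return typoAscender;                  // the font explicitly asks for typo metrics
        }
        // Otherwise, two plausible answers: a Windows-style renderer might use the
        // OS/2 win metrics, while another might trust the hhea table.
        return windowsStyle ? winAscent : hheaAscender;
    }
}
```

With `hhea` ascender 800, typo ascender 750, and win ascent 900, this policy returns 900 or 800 depending on platform style unless the font sets `USE_TYPO_METRICS`, in which case both return 750.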
They Have Their Own Instruction Set
TrueType fonts contain glyphs made up of glyph outline paths and a separate glyph program. That program is written in TrueType's own special instruction set, with the usual stack operations such as POP (0x21), PUSHB (0xB0), and ADD, along with instructions specific to font rendering. This means every TTF font renderer needs its own virtual machine/interpreter implementation.
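To give a feel for what "its own virtual machine" means, here is a toy stack interpreter for a handful of real TrueType opcodes: PUSHB[abc] (0xB0 to 0xB7, pushing one to eight bytes), DUP (0x20), POP (0x21), SWAP (0x23), and ADD (0x60). It's only a sketch: a real interpreter also tracks graphics state, implements the many other instructions, and mostly works with 26.6 fixed-point values. The class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy interpreter for a few real TrueType opcodes; not a hinting engine.
final class TinyTtInterpreter {
    static Deque<Integer> run(byte[] program) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc < program.length) {
            int op = program[pc++] & 0xFF;
            if (op >= 0xB0 && op <= 0xB7) {       // PUSHB[abc]: push abc + 1 unsigned bytes
                int count = op - 0xB0 + 1;
                for (int i = 0; i < count; i++) stack.push(program[pc++] & 0xFF);
            } else if (op == 0x20) {              // DUP: duplicate the top element
                stack.push(stack.peek());
            } else if (op == 0x21) {              // POP: discard the top element
                stack.pop();
            } else if (op == 0x23) {              // SWAP: exchange the top two elements
                int top = stack.pop(), next = stack.pop();
                stack.push(top);
                stack.push(next);
            } else if (op == 0x60) {              // ADD: pop two values, push their sum
                int n1 = stack.pop(), n2 = stack.pop();
                stack.push(n2 + n1);
            } else {
                throw new UnsupportedOperationException(String.format("opcode 0x%02X not in this sketch", op));
            }
        }
        return stack;
    }
}
```

For example, the program `B1 02 03 60` pushes 2 and 3 and then adds them, leaving a single 5 on the stack. Two engines that disagree on even one instruction's edge case will draw a glyph differently, or not at all.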
Firefox and Chrome also, very helpfully, differ in their font program interpreters. I had an old TTF font that rendered perfectly in Firefox but not at all in Chrome, which gave no errors in the console. I traced the issue down to Chrome processing one of the 100+ instructions in certain glyph programs in the font differently, and that’s when I got tired of fonts and decided to take a break from FontVerter.
Conclusion
If you’re like me, you probably fiddle constantly with which font to use for your text editor or IDE. Consolas, Inconsolata, or maybe Source Code Pro? I could never decide. But after working with fonts at such a low level, I no longer care. The default Consolas will do; just leave me alone, you fonts, you!
(However, I did recently write a blog article about the best programming fonts, because I’m on a challenge to write a blog post every day and had nothing else to write about that day.)
Thank you for reading!
Published at DZone with permission of Maddie Abboud.