Fontconfig is a library that many graphical programs use to figure out what font to use. A program can ask Fontconfig for a font matching a pattern, and Fontconfig will return a font that may or may not be "anything like the requested pattern". The important thing is that we can mess with this matching process using configuration files, for example to set default fonts in a way that works across many programs.
The fc-match(1) utility that ships with Fontconfig can be used to test what fonts are returned for a given pattern:
$ fc-match
Ubuntu-R.ttf: "Ubuntu" "Regular"
$ fc-match serif
RobotoSlab-Regular.ttf: "Roboto Slab" "Regular"
Since fc-match uses "the normal fontconfig matching rules", the above output implies that (with my configuration) a program that (reasonably) wants to use the most default font possible will use Ubuntu, and a program that wants the default serif font will use Roboto Slab.
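For scripting checks like these, here is a minimal Python sketch that wraps fc-match and splits its default output into file, family, and style. The names parse_fc_match_line and fc_match are made up for this example; it assumes fc-match is on the PATH and prints the default one-line format shown above.

```python
import re
import subprocess

# Default fc-match output is one line per font: the file's base name, then the
# quoted family and style, e.g.  Ubuntu-R.ttf: "Ubuntu" "Regular"
LINE_RE = re.compile(r'^(?P<file>[^:]+): "(?P<family>[^"]*)" "(?P<style>[^"]*)"$')


def parse_fc_match_line(line: str) -> tuple[str, str, str]:
    """Split one line of default fc-match output into (file, family, style)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unexpected fc-match output: {line!r}")
    return m["file"], m["family"], m["style"]


def fc_match(pattern: str = "", sort: bool = False) -> list[tuple[str, str, str]]:
    """Run fc-match (Fontconfig must be installed) and parse every output line."""
    args = ["fc-match"]
    if sort:
        args.append("-s")  # sorted list of matches instead of only the best one
    if pattern:
        args.append(pattern)
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return [parse_fc_match_line(line) for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    # Parsing works without Fontconfig; calling fc_match() needs it installed.
    print(parse_fc_match_line('RobotoSlab-Regular.ttf: "Roboto Slab" "Regular"'))
```

With the configuration described here, fc_match("serif") would be expected to return a single tuple for Roboto Slab, and sort=True returns the whole fallback list that -s prints.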
I use members of the Ubuntu family as the default sans-serif and monospace typefaces and Roboto Slab as the default serif one. Google's Noto font family is the fallback for missing characters and emoji.
There are some nuances that make this trickier than it sounds.
Major tangent: Han unification
If your system is configured well and has the necessary fonts installed, you will see two similar but non-identical Han characters here:
- 練
- 練
Unicode, however, does not distinguish them: it assigns the same code point (number) to both characters. The only reason they (hopefully) look distinct is that I added lang attributes to the list items. The two variants can only coexist when such additional metadata is provided. That is possible on a webpage, but try copying both characters into your browser's address bar, a text editor, or a terminal: I bet they'll look the same.[1]
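To make the code point argument concrete, this small Python sketch models the two list items as text plus a lang tag. The ja and zh-TW tags are assumptions chosen for illustration (the page markup isn't reproduced here); what matters is that the text of both items is the same single code point.

```python
import unicodedata

# Hypothetical model of the two list items: the lang tags are illustrative
# assumptions; only the metadata is meant to differ.
ja_item = {"text": "\u7df4", "lang": "ja"}  # 練
zh_tw_item = {"text": "\u7df4", "lang": "zh-TW"}  # 練

char = ja_item["text"]
same_text = ja_item["text"] == zh_tw_item["text"]

print(same_text)  # True: the strings are identical
print(f"U+{ord(char):04X}", unicodedata.name(char))  # one unified code point

# Copying text keeps the code point but drops the lang tag, so whatever
# renders the pasted string cannot tell which variant was meant.
pasted = char
print(pasted == zh_tw_item["text"])
```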
Relying on additional metadata for correct rendering of text seems like a weird choice in hindsight, but the consequence that's relevant here is this: choosing a fallback font also determines which variant of some Han characters will appear in contexts that lack language metadata. Plainly using Noto Serif or Noto Sans apparently means that the Japanese kanji forms are used. I suppose this is because the Noto fonts for Japanese are alphabetically first among their respective groups of language-specific Noto CJK fonts:
$ fc-match -a sans-serif | grep '"Noto Sans CJK .*" "Regular"'
NotoSansCJK-Regular.ttc: "Noto Sans CJK JP" "Regular"
NotoSansCJK-Regular.ttc: "Noto Sans CJK KR" "Regular"
NotoSansCJK-Regular.ttc: "Noto Sans CJK SC" "Regular"
NotoSansCJK-Regular.ttc: "Noto Sans CJK TC" "Regular"
Curiously, each font "does support all four languages and includes the complete set of glyphs". Notice how the four fonts above even resolve to the same file, NotoSansCJK-Regular.ttc. A special OpenType feature allows programs that support it to "access language-specific variants other than the default language". (I guess getting a glyph from an OpenType font file is much more complicated than just asking for a code point.)
Anyway. I want traditional Chinese characters when no metadata is available, so I'm using Noto Serif CJK TC and Noto Sans CJK TC.
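The guess that the Japanese families win because their names sort first can be restated as a tiny Python model: if the four language-specific families were otherwise tied, picking the one whose name sorts first yields JP. This only illustrates that guess; it is not how Fontconfig actually scores matches, and alphabetical_pick is a hypothetical helper.

```python
# The four language-specific families that `fc-match -a sans-serif` listed.
cjk_families = [
    "Noto Sans CJK TC",
    "Noto Sans CJK SC",
    "Noto Sans CJK KR",
    "Noto Sans CJK JP",
]


def alphabetical_pick(families: list[str]) -> str:
    """Hypothetical tie-break: the family whose name sorts first wins."""
    return min(families)


print(sorted(cjk_families))
print(alphabetical_pick(cjk_families))
```

Naming Noto Sans CJK TC explicitly in the preferred fallback list avoids depending on any such tie-break.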
It's good to know the order in which Fontconfig loads configuration files. There are usually lots of them in /etc/fonts/conf.d/, and they interfere with user-specific configuration. The only explanation I've found is in the Tuning Fontconfig section of Beyond Linux From Scratch: files in /etc/fonts/conf.d/ have names starting with a two-digit number followed by a hyphen, and smaller numbers are loaded first.
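fonts-conf(5) describes the rule behind this numbering: when <include> names a directory, only files that start with an ASCII digit and end in ".conf" are processed, in sorted order. The Python sketch below models that filter and sort. conf_d_order is a hypothetical helper, the listing is an illustrative mix of file names that ship with Fontconfig plus the infinality file and a README, and plain string sorting is assumed to match Fontconfig's ordering for ASCII names.

```python
# fonts-conf(5): for a directory <include>, process files that start with an
# ASCII digit and end in ".conf", in sorted order.
DIGITS = tuple("0123456789")


def conf_d_order(names: list[str]) -> list[str]:
    """Return the files a conf.d-style directory would contribute, in order."""
    picked = [n for n in names if n.startswith(DIGITS) and n.endswith(".conf")]
    # Python compares str by code point; for ASCII names this is assumed to
    # agree with Fontconfig's notion of "sorted order".
    return sorted(picked)


# Illustrative directory listing (not the author's actual one).
listing = [
    "60-latin.conf",
    "README",
    "50-user.conf",
    "10-hinting-slight.conf",
    "30-infinality-aliases.conf",
    "49-sansserif.conf",
    "90-synthetic.conf",
    "51-local.conf",
]

for name in conf_d_order(listing):
    print(name)
```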
Loading files from the configuration paths specified by fonts-conf(5) isn't intrinsic behavior of Fontconfig. Instead, the master /etc/fonts/fonts.conf file contains <include> directives. On my system,[2] it only includes files in /etc/fonts/conf.d/, but in there is 50-user.conf, which includes (among other things) ~/.config/fontconfig/fonts.conf. The takeaway is that the user-specific configuration is loaded in the middle of the system-wide configuration: after the files numbered below 50 and before those numbered above it.
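The include chain can be sketched as a depth-first walk. This Python model reduces each file to the list of files it includes; INCLUDES and load_order are hypothetical, the conf.d listing is illustrative, and 50-user.conf is given only the one include that matters here (the real file includes more).

```python
# Illustrative conf.d listing (file names that ship with Fontconfig, plus the
# infinality file); not the author's actual directory.
CONF_D = [
    "10-hinting-slight.conf",
    "30-infinality-aliases.conf",
    "49-sansserif.conf",
    "50-user.conf",
    "51-local.conf",
    "60-latin.conf",
]

# Hypothetical include graph: each file maps to the files it <include>s, in order.
# 50-user.conf really includes more than this; only the relevant entry is kept.
INCLUDES = {
    "/etc/fonts/fonts.conf": ["/etc/fonts/conf.d/" + name for name in sorted(CONF_D)],
    "/etc/fonts/conf.d/50-user.conf": ["~/.config/fontconfig/fonts.conf"],
}


def load_order(path: str, includes: dict[str, list[str]]) -> list[str]:
    """List files in the order they are opened, expanding includes depth-first."""
    order = [path]
    for child in includes.get(path, []):
        order.extend(load_order(child, includes))
    return order


for path in load_order("/etc/fonts/fonts.conf", INCLUDES):
    print(path)
```

In this model the user file is opened right after 50-user.conf, so 30-infinality-aliases.conf comes before it and 51-local.conf and 60-latin.conf come after it.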
My configuration file started off based on the one in this section of the Fonts ArchWiki article. The important parts are <alias> elements such as:
<alias>
  <family>sans-serif</family>
  <prefer>
    <family>Ubuntu</family>
    <family>Noto Sans CJK TC</family>
    <family>Noto Color Emoji</family>
    <family>Noto Sans</family>
  </prefer>
</alias>
The element says: when "sans-serif" is requested, prepend those four font families, in that order, to the list of best-matching fonts. My fonts.conf consists of such <alias> elements for "serif", "sans-serif", and "monospace".[3]
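fonts-conf(5) spells out what an <alias> does: <prefer>ed families are placed before the matching <family>, <accept>ed ones right after it, and <default> ones at the end of the family list. A rough Python model of that rule reproduces the intended order for the sans-serif alias above. apply_alias is a hypothetical helper that ignores binding strength and other details of real matching.

```python
from collections.abc import Sequence


def apply_alias(
    families: list[str],
    family: str,
    prefer: Sequence[str] = (),
    accept: Sequence[str] = (),
    default: Sequence[str] = (),
) -> list[str]:
    """Rough model of an <alias> element, following the wording of fonts-conf(5).

    <prefer> families go right before the matching <family>, <accept> families
    right after it, and <default> families at the end of the list. Only the
    first occurrence is edited; binding strength and other details are ignored.
    """
    if family not in families:
        return list(families)  # the alias does not apply
    i = families.index(family)
    return (
        families[:i]
        + list(prefer)
        + [family]
        + list(accept)
        + families[i + 1 :]
        + list(default)
    )


requested = ["sans-serif"]
result = apply_alias(
    requested,
    "sans-serif",
    prefer=["Ubuntu", "Noto Sans CJK TC", "Noto Color Emoji", "Noto Sans"],
)
print(result)
```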
This works, but I ran into one problem. Something else was also prepending "Noto Sans", with the effect that it ended up at the very top of the sans-serif font list. The same thing happened for serif and monospace fonts. I identified 30-infinality-aliases.conf, which I got from the fonts-meta-extended-lt package, as the culprit. It does this:
<alias>
  <family>sans-serif</family>
  <prefer><family>Noto Sans</family></prefer>
</alias>
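But wait! How can 30-infinality-aliases.conf override an alias that is ultimately included from 50-user.conf? Well, there are two ways in which one may prepend fonts with Fontconfig (the edit modes "prepend" and "prepend_first"), and <prefer>ing is syntactic sugar for the first one: it inserts before the matching <family>, not at the very top of the list. 30-infinality-aliases.conf does this before my own configuration. By the time my alias runs, "Noto Sans" already sits in front of "sans-serif", so my fonts get inserted between the two, and consequently "Noto Sans" wins.[4]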
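The interaction of the two files can be replayed with a simplified Python model of the two edit modes described in fonts-conf(5): "prepend" inserts before the matching value, "prepend_first" at the head of the list. The helper functions are hypothetical stand-ins, not Fontconfig's code, and the sketch only tracks the family list.

```python
def prepend(families: list[str], target: str, new: list[str]) -> list[str]:
    """Like <edit mode="prepend">: insert new right before the matching target."""
    i = families.index(target)
    return families[:i] + new + families[i:]


def prepend_first(families: list[str], new: list[str]) -> list[str]:
    """Like <edit mode="prepend_first">: insert new at the head of the list."""
    return new + families


mine = ["Ubuntu", "Noto Sans CJK TC", "Noto Color Emoji", "Noto Sans"]

# 30-infinality-aliases.conf runs first and puts "Noto Sans" before "sans-serif".
after_30 = prepend(["sans-serif"], "sans-serif", ["Noto Sans"])
# My <prefer> (reached via 50-user.conf) also inserts before "sans-serif",
# which is now behind the "Noto Sans" added earlier.
after_50 = prepend(after_30, "sans-serif", mine)
print(after_50[0])  # Noto Sans

# A prepend_first edit would put my list at the very top instead.
print(prepend_first(after_30, mine)[0])  # Ubuntu
```

In this model a prepend_first rule would also put Ubuntu on top; the fix actually used here, described next, was to fork the offending file.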
I forked 30-infinality-aliases.conf and removed the problematic lines.
We can test the results with the -s flag of fc-match:
$ fc-match -s serif | head -4
RobotoSlab-Regular.ttf: "Roboto Slab" "Regular"
NotoSerifCJK-Regular.ttc: "Noto Serif CJK TC" "Regular"
NotoColorEmoji.ttf: "Noto Color Emoji" "Regular"
NotoSerif-Regular.ttf: "Noto Serif" "Regular"
$ fc-match -s sans-serif | head -4
Ubuntu-R.ttf: "Ubuntu" "Regular"
NotoSansCJK-Regular.ttc: "Noto Sans CJK TC" "Regular"
NotoColorEmoji.ttf: "Noto Color Emoji" "Regular"
NotoSans-Regular.ttf: "Noto Sans" "Regular"
$ fc-match -s monospace | head -3
UbuntuMono-R.ttf: "Ubuntu Mono" "Regular"
NotoSansCJK-Regular.ttc: "Noto Sans Mono CJK TC" "Regular"
NotoSansMono-Regular.ttf: "Noto Sans Mono" "Regular"
🙂
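To re-run this check after package updates, a Python sketch could compare the top of each fc-match -s list against the orders shown above. EXPECTED, top_families, and sorted_matches are hypothetical names; the script assumes the default output format and only calls fc-match when it is installed.

```python
import shutil
import subprocess

# Orders shown above for this configuration (family names only).
EXPECTED = {
    "serif": ["Roboto Slab", "Noto Serif CJK TC", "Noto Color Emoji", "Noto Serif"],
    "sans-serif": ["Ubuntu", "Noto Sans CJK TC", "Noto Color Emoji", "Noto Sans"],
    "monospace": ["Ubuntu Mono", "Noto Sans Mono CJK TC", "Noto Sans Mono"],
}


def top_families(fc_match_output: str, n: int) -> list[str]:
    """Family names from the first n lines of `fc-match -s` output.

    Each default line looks like  File.ttf: "Family" "Style",
    so the family is the first double-quoted field.
    """
    families = []
    for line in fc_match_output.splitlines()[:n]:
        families.append(line.split('"')[1])
    return families


def sorted_matches(pattern: str) -> str:
    """Raw output of `fc-match -s PATTERN` (requires Fontconfig)."""
    result = subprocess.run(
        ["fc-match", "-s", pattern], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__" and shutil.which("fc-match"):
    for pattern, want in EXPECTED.items():
        got = top_families(sorted_matches(pattern), len(want))
        print(pattern, "OK" if got == want else f"unexpected: {got}")
```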
Here are most of the articles and other resources that I referenced, as well as some more that are relevant and interesting:
Fontconfig
- I stared into the fontconfig, and the fontconfig stared back at me by Eevee
- The Tuning Fontconfig section of Beyond Linux From Scratch
- The Fonts, Font configuration, and Font configuration/Examples ArchWiki articles
- The Fontconfig Wikipedia article
- fonts-conf(5) (this seems to be the primary documentation of Fontconfig)
- fc-match(1)
- Examples of Fontconfig configuration files you can probably find in /etc/fonts/conf.d/
Unicode
- Joel Spolsky's blog post about Unicode with a very long title
- Around the 🌎 with Unicode by Nora Sandler
- The Unicode HOWTO at docs.python.org
- The Secret Life of Unicode by Suzanne Topping
- Unicode Revisited by Steven J. Searle
- Jonathan New's article about the JavaScript length of emoji
- The Unicode, Han unification, and Variant Chinese character Wikipedia articles
Other
- The Noto fonts Wikipedia article
- This help page about Noto CJK fonts
Footnotes
[1] There may be another way: "Variation Selector format characters [...] are used to specify a specific glyph variant for a Unicode character, such as the Japanese, Chinese, Korean, or Taiwanese form of a particular CJK ideograph."
[2] Did I tell you I use Arch Linux?
[3] The semantics of Fontconfig's XML schema are documented in fonts-conf(5).
[4] I think 30-infinality-aliases.conf disregards the conventional naming scheme in doing so: "generic aliases" should appear in files with numbers 60 to 69 (see the various files section).