Julien Marcou edited this page Nov 22, 2020 · 12 revisions

Unicode Emoji Wiki

Data source

Component metadata (skin tones & hair styles):

  • code point
  • group
  • subgroup
  • version

And emoji metadata:

  • code points
  • group
  • subgroup
  • version
  • relationship between a base emoji and its variations
  • components used by a variation (skin tone & hair style)

All of this metadata is derived from the fully-qualified emojis and components listed in the emoji-test.txt file, which is available on Unicode's website (https://unicode.org/Public/emoji/).
Version 13.1 has been used. Because the file's structure has changed over time, previous versions cannot produce the complete data set.
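
To illustrate how such a file can be read, here is a minimal sketch that parses a few lines written in the emoji-test.txt layout and keeps only the fully-qualified and component entries. The sample text, the parseEmojiTest function name, and the shape of the returned objects are assumptions made for illustration; this is not the code used by this repository.

```javascript
// Hypothetical sample written in the layout of emoji-test.txt (version 13.1).
const sampleEmojiTest = [
  '# group: Smileys & Emotion',
  '# subgroup: face-smiling',
  '1F600 ; fully-qualified # \u{1F600} E1.0 grinning face',
  '# group: People & Body',
  '# subgroup: hand-fingers-closed',
  '1F44D ; fully-qualified # \u{1F44D} E0.6 thumbs up',
  '# subgroup: hand-fingers-partial',
  '270C FE0F ; fully-qualified # \u{270C}\u{FE0F} E0.6 victory hand',
  '270C ; unqualified # \u{270C} E0.6 victory hand',
].join('\n');

// Statuses kept by this sketch (assumption: other statuses are ignored).
const keptStatuses = new Set(['fully-qualified', 'component']);

// Layout assumed: "<code points> ; <status> # <emoji> E<version> <name>"
const entryPattern =
  /^([0-9A-F ]+?)\s*;\s*([a-z-]+)\s*#\s*(\S+)\s+E(\d+\.\d+)\s+(.+)$/;

function parseEmojiTest(text) {
  const entries = [];
  let group = null;
  let subgroup = null;
  for (const line of text.split('\n')) {
    if (line.startsWith('# group: ')) {
      group = line.slice('# group: '.length).trim();
    } else if (line.startsWith('# subgroup: ')) {
      subgroup = line.slice('# subgroup: '.length).trim();
    } else {
      const match = entryPattern.exec(line);
      if (match && keptStatuses.has(match[2])) {
        entries.push({
          codePoints: match[1].split(' '),
          emoji: match[3],
          version: match[4],
          description: match[5],
          group,
          subgroup,
        });
      }
    }
  }
  return entries;
}

const parsedEntries = parseEmojiTest(sampleEmojiTest);
console.log(parsedEntries.length); // 3 (the unqualified line is skipped)
```

The group and subgroup are carried as state because, in this layout, they appear as comment lines that apply to every entry that follows them.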

Component & emoji translations:

  • text to speech (description)
  • keywords

These translations are retrieved from the common/annotations/en.xml and common/annotationsDerived/en.xml files, which are available in Unicode's CLDR (Common Locale Data Repository) (https://github.com/unicode-org/cldr).
The release-38 tag has been used to ensure that the file structure doesn't change over time.
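
As a rough sketch of how the description and keywords can be read from such a file, the snippet below extracts both from a two-line excerpt written in the layout of the annotation files. The sample excerpt, the parseAnnotations function name, and the returned structure are assumptions for illustration. It relies on a regular expression for brevity, does not decode XML entities, and a real XML parser would be a safer choice for the full files.

```javascript
// Hypothetical excerpt written in the layout of common/annotations/en.xml.
const sampleAnnotations = [
  '<annotation cp="\u{1F600}">face | grin | grinning face</annotation>',
  '<annotation cp="\u{1F600}" type="tts">grinning face</annotation>',
].join('\n');

function parseAnnotations(xml) {
  const annotationsByEmoji = new Map();
  const pattern =
    /<annotation cp="([^"]+)"( type="tts")?>([^<]*)<\/annotation>/g;
  for (const [, emoji, isTextToSpeech, text] of xml.matchAll(pattern)) {
    const entry =
      annotationsByEmoji.get(emoji) || { description: null, keywords: [] };
    if (isTextToSpeech) {
      // The type="tts" element carries the text-to-speech description.
      entry.description = text.trim();
    } else {
      // The other element carries keywords separated by "|".
      entry.keywords = text.split('|').map(keyword => keyword.trim());
    }
    annotationsByEmoji.set(emoji, entry);
  }
  return annotationsByEmoji;
}

const grinningAnnotations = parseAnnotations(sampleAnnotations).get('\u{1F600}');
console.log(grinningAnnotations.description); // grinning face
console.log(grinningAnnotations.keywords.length); // 3
```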

Data consolidation

The data has been consolidated in a few ways to make it more useful.

The 🤝 handshake emoji's skin-tone variations (🤝🏻🤝🏼🤝🏽🤝🏾🤝🏿) are missing from Unicode's website and CLDR, and have been manually added to the results.

An additional metadata field named category has been added to each emoji to reflect more conventionally how emojis are grouped on mobile devices, which makes the groups more balanced. It differs slightly from Android's grouping but remains quite similar to it. If you don't like it, you can write your own grouping logic using the original group and subgroup metadata.
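
A custom grouping based on the original group metadata could look like the sketch below. The record shape, the category names, and the groupByCustomCategory helper are all hypothetical; only the group and subgroup fields mirror metadata named on this page.

```javascript
// Hypothetical emoji records; the object shape is an assumption.
const emojiRecords = [
  { emoji: '\u{1F600}', group: 'Smileys & Emotion', subgroup: 'face-smiling' },
  { emoji: '\u{1F44D}', group: 'People & Body', subgroup: 'hand-fingers-closed' },
  { emoji: '\u{1F3F4}\u{200D}\u{2620}\u{FE0F}', group: 'Flags', subgroup: 'flag' },
];

// Hypothetical mapping from original groups to custom categories.
const customCategoryByGroup = {
  'Smileys & Emotion': 'faces-and-people',
  'People & Body': 'faces-and-people',
  'Flags': 'flags',
};

function groupByCustomCategory(records, categoryByGroup, fallback = 'other') {
  const grouped = {};
  for (const record of records) {
    // Groups missing from the mapping fall back to a catch-all category.
    const category = categoryByGroup[record.group] || fallback;
    if (!grouped[category]) {
      grouped[category] = [];
    }
    grouped[category].push(record.emoji);
  }
  return grouped;
}

const groupedEmojis = groupByCustomCategory(emojiRecords, customCategoryByGroup);
console.log(Object.keys(groupedEmojis)); // ['faces-and-people', 'flags']
```

A finer split is possible by keying the mapping on subgroup instead of group.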

Translations

Only the en (American English) locale is currently provided, as translations are quite heavy (~500kB per locale).

Until a way is found for people to load only the locales they need, and so reduce the size of their projects, I recommend generating translations yourself using this repository:
You just need to change the unicodeCldrLocale variable inside the generate-unicode-emoji.cjs file to any locale available in Unicode's CLDR, and then run the node generate-unicode-emoji.cjs command.

Fitzpatrick scale

Skin tones are based on the Fitzpatrick scale (https://en.wikipedia.org/wiki/Fitzpatrick_scale):

Emoji | Description | Fitzpatrick scale
🏻 | Light skin tone | Type I and II
🏼 | Medium-light skin tone | Type III
🏽 | Medium skin tone | Type IV
🏾 | Medium-dark skin tone | Type V
🏿 | Dark skin tone | Type VI

Code points conversion

JavaScript natively supports code points in strings through Unicode escapes (\u).

Code points between U+0000 and U+FFFF do not need to be surrounded by {}:

const heartEmoji = '\u2764\uFE0F';
console.log(heartEmoji); // ❤️

Code points greater than U+FFFF (called astral code points) are internally represented as surrogate pairs, so they must either be broken down into two code units (the surrogate pair) or be surrounded by {}:

const grinningEmojiWithSurrogatePair = '\uD83D\uDE00';
console.log(grinningEmojiWithSurrogatePair); // 😀

const grinningEmojiWithAstralCodePoint = '\u{1F600}';
console.log(grinningEmojiWithAstralCodePoint); // 😀
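
The two forms above are related by the standard UTF-16 arithmetic, sketched below for U+1F600. The toSurrogatePair helper is a hypothetical name introduced only for this illustration.

```javascript
// Computes the UTF-16 surrogate pair of an astral code point.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('Not an astral code point');
  }
  const offset = codePoint - 0x10000; // 20-bit value
  const highSurrogate = 0xD800 + (offset >> 10); // top 10 bits
  const lowSurrogate = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [highSurrogate, lowSurrogate];
}

const [highSurrogate, lowSurrogate] = toSurrogatePair(0x1F600);
console.log(highSurrogate.toString(16).toUpperCase()); // D83D
console.log(lowSurrogate.toString(16).toUpperCase()); // DE00
console.log(String.fromCharCode(highSurrogate, lowSurrogate) === '\u{1F600}'); // true
```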

I recommend always surrounding code points with {} to avoid errors and improve readability.

const pirateFlagEmoji = '\u{1F3F4}\u{200D}\u{2620}\u{FE0F}';
console.log(pirateFlagEmoji); // 🏴‍☠️

If you prefer, you can also build an emoji programmatically from an array of code points like this:

const pirateFlagCodePoints = ['1F3F4', '200D', '2620', 'FE0F'];
const pirateFlagEmoji = String.fromCodePoint(
  ...pirateFlagCodePoints.map(codePoint => parseInt(codePoint, 16))
);
console.log(pirateFlagEmoji); // 🏴‍☠️

And retrieve the code points of an emoji like this:

const pirateFlagEmoji = '🏴‍☠️';
const pirateFlagCodePoints = Array.from(pirateFlagEmoji).map(character => {
  return character.codePointAt(0).toString(16).toUpperCase();
});
console.log(pirateFlagCodePoints); // ['1F3F4', '200D', '2620', 'FE0F']
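
The two conversions above can be wrapped into a pair of helpers, as in the sketch below. The function names are hypothetical. The padStart call is an addition over the snippet above: it pads short code points to four hexadecimal digits (for example 00A9 rather than A9), which is how code points are usually written. The last two lines show why Array.from is used: the length property counts UTF-16 code units, not code points.

```javascript
function emojiToCodePoints(emoji) {
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  return Array.from(emoji, character =>
    character.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

function codePointsToEmoji(codePoints) {
  return String.fromCodePoint(
    ...codePoints.map(codePoint => parseInt(codePoint, 16))
  );
}

const copyrightEmoji = '\u{A9}\u{FE0F}';
console.log(emojiToCodePoints(copyrightEmoji)); // ['00A9', 'FE0F']
console.log(codePointsToEmoji(['00A9', 'FE0F']) === copyrightEmoji); // true

const pirateFlag = '\u{1F3F4}\u{200D}\u{2620}\u{FE0F}';
console.log(pirateFlag.length); // 5 (UTF-16 code units)
console.log(Array.from(pirateFlag).length); // 4 (code points)
```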

Code points composition

Complex emojis and emoji variations often consist of one or more base emojis.

Unicode uses the 200D code point (zero-width joiner), referred to here as the ligature code point, between two base emojis to combine them:

const blackFlagEmoji = '\u{1F3F4}';
console.log(blackFlagEmoji); // 🏴

const skullAndCrossbonesEmoji = '\u{2620}\u{FE0F}';
console.log(skullAndCrossbonesEmoji); // ☠️

const ligatureCodePoint = '\u{200D}';
const pirateFlagEmoji =
  blackFlagEmoji +
  ligatureCodePoint +
  skullAndCrossbonesEmoji;
console.log(pirateFlagEmoji); // 🏴‍☠️

This even works for more complex compositions:

const womanEmoji = '\u{1F469}';
console.log(womanEmoji); // 👩

const heartEmoji = '\u{2764}\u{FE0F}';
console.log(heartEmoji); // ❤️

const kissEmoji = '\u{1F48B}';
console.log(kissEmoji); // 💋

const manEmoji = '\u{1F468}';
console.log(manEmoji); // 👨

const ligatureCodePoint = '\u{200D}';
const womanAndManKissingEmoji =
  womanEmoji +
  ligatureCodePoint +
  heartEmoji +
  ligatureCodePoint +
  kissEmoji +
  ligatureCodePoint +
  manEmoji;
console.log(womanAndManKissingEmoji); // 👩‍❤️‍💋‍👨
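
Since every part is separated by the same code point, the concatenation above can be expressed as a join. The joinWithZwj helper below is a hypothetical name used only for this sketch.

```javascript
const ZERO_WIDTH_JOINER = '\u{200D}';

// Joins any number of base emojis with the zero-width joiner.
function joinWithZwj(...parts) {
  return parts.join(ZERO_WIDTH_JOINER);
}

const kissSequence = joinWithZwj(
  '\u{1F469}', // woman
  '\u{2764}\u{FE0F}', // red heart
  '\u{1F48B}', // kiss mark
  '\u{1F468}' // man
);
console.log(
  kissSequence ===
    '\u{1F469}\u{200D}\u{2764}\u{FE0F}\u{200D}\u{1F48B}\u{200D}\u{1F468}'
); // true
console.log(kissSequence.split(ZERO_WIDTH_JOINER).length); // 4
```

Whether the result renders as a single glyph depends on the platform's fonts: only sequences that Unicode recommends and that the font supports are displayed as one emoji.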

Skin tone components must not be preceded by the ligature code point; they are placed directly after a base emoji that supports skin tone variations. However, if the base emoji ends with the FE0F code point (which serves as a presentation selector), you'll need to remove it first:

// Emoji without presentation selector
const thumbsUpBaseEmoji = '\u{1F44D}';
console.log(thumbsUpBaseEmoji); // 👍

const lightSkinToneComponent = '\u{1F3FB}';
console.log(lightSkinToneComponent); // 🏻

const thumbsUpWithLightSkinToneEmoji =
  thumbsUpBaseEmoji +
  lightSkinToneComponent;
console.log(thumbsUpWithLightSkinToneEmoji); // 👍🏻

// Emoji with presentation selector
const victoryHandBaseEmoji = '\u{270C}\u{FE0F}';
console.log(victoryHandBaseEmoji); // ✌️

const darkSkinToneComponent = '\u{1F3FF}';
console.log(darkSkinToneComponent); // 🏿

const presentationSelectorCodePoint = '\u{FE0F}';
const victoryHandWithDarkSkinToneEmoji =
  victoryHandBaseEmoji.replace(presentationSelectorCodePoint, '') +
  darkSkinToneComponent;
console.log(victoryHandWithDarkSkinToneEmoji); // ✌🏿
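
Both cases above can be handled by one helper, sketched below under the assumption that the base emoji is a single emoji (not a joined sequence) that supports skin tone variations; the helper does not verify that. The applySkinTone name is hypothetical.

```javascript
const PRESENTATION_SELECTOR = '\u{FE0F}';

// Appends a skin tone component to a base emoji, dropping a trailing
// presentation selector first when there is one.
function applySkinTone(baseEmoji, skinToneComponent) {
  const withoutSelector = baseEmoji.endsWith(PRESENTATION_SELECTOR)
    ? baseEmoji.slice(0, -PRESENTATION_SELECTOR.length)
    : baseEmoji;
  return withoutSelector + skinToneComponent;
}

const thumbsUpLight = applySkinTone('\u{1F44D}', '\u{1F3FB}');
console.log(thumbsUpLight === '\u{1F44D}\u{1F3FB}'); // true

const victoryHandDark = applySkinTone('\u{270C}\u{FE0F}', '\u{1F3FF}');
console.log(victoryHandDark === '\u{270C}\u{1F3FF}'); // true
```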

Now you can combine both skin tone variations and ligature code points to create even more complex emojis:

const personFacepalmingEmoji = '\u{1F926}'; // Note this is a genderless emoji
console.log(personFacepalmingEmoji); // 🤦

const mediumSkinToneComponent = '\u{1F3FD}';
console.log(mediumSkinToneComponent); // 🏽

const femaleSignEmoji = '\u{2640}\u{FE0F}';
console.log(femaleSignEmoji); // ♀️

const ligatureCodePoint = '\u{200D}';
const womanFacepalmingWithMediumSkinToneEmoji =
  personFacepalmingEmoji +
  mediumSkinToneComponent +
  ligatureCodePoint +
  femaleSignEmoji;
console.log(womanFacepalmingWithMediumSkinToneEmoji); // 🤦🏽‍♀️

const womanEmoji = '\u{1F469}';
console.log(womanEmoji); // 👩

const mediumLightSkinToneComponent = '\u{1F3FC}';
console.log(mediumLightSkinToneComponent); // 🏼

const handshakeEmoji = '\u{1F91D}';
console.log(handshakeEmoji); // 🤝

const manEmoji = '\u{1F468}';
console.log(manEmoji); // 👨

const mediumDarkSkinToneComponent = '\u{1F3FE}';
console.log(mediumDarkSkinToneComponent); // 🏾

const ligatureCodePoint = '\u{200D}';
const womanWithMediumLightSkinToneAndManWithMediumDarkSkinToneHoldingHandsEmoji =
  womanEmoji +
  mediumLightSkinToneComponent +
  ligatureCodePoint +
  handshakeEmoji +
  ligatureCodePoint +
  manEmoji +
  mediumDarkSkinToneComponent;
console.log(womanWithMediumLightSkinToneAndManWithMediumDarkSkinToneHoldingHandsEmoji); // 👩🏼‍🤝‍👨🏾
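
The composition can also be read in the other direction. The sketch below splits a sequence on the ligature code point and separates each part's skin tone component from its base. The decomposeEmoji name and the returned shape are hypothetical, and the sketch does not restore a presentation selector that was removed when the skin tone was applied.

```javascript
const skinTonePattern = /[\u{1F3FB}-\u{1F3FF}]/u;

function decomposeEmoji(emoji) {
  return emoji.split('\u{200D}').map(part => {
    const skinToneMatch = part.match(skinTonePattern);
    return {
      base: part.replace(skinTonePattern, ''),
      skinTone: skinToneMatch ? skinToneMatch[0] : null,
    };
  });
}

const holdingHands =
  '\u{1F469}\u{1F3FC}\u{200D}\u{1F91D}\u{200D}\u{1F468}\u{1F3FE}';
const holdingHandsParts = decomposeEmoji(holdingHands);

console.log(holdingHandsParts.length); // 3
console.log(holdingHandsParts[0].skinTone === '\u{1F3FC}'); // true
console.log(holdingHandsParts[1].skinTone); // null
console.log(
  holdingHandsParts.map(part => part.base).join('\u{200D}') ===
    '\u{1F469}\u{200D}\u{1F91D}\u{200D}\u{1F468}'
); // true
```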