Unicode is a standard that maps certain values (= numbers) to a meaning. Those mappings are called codepoints. Let’s take the turtle as an example: “U+1F422 TURTLE”. Here the number 128034 (1F422 is its hexadecimal representation) is a codepoint, and it has, for example, the name “TURTLE” associated with it. That’s all Unicode does at this point: it associates a meaning. 128034 should represent a turtle, 65 (U+0041 LATIN CAPITAL LETTER A; 41 is the hexadecimal representation of 65) represents an uppercase Latin ‘A’, and so on.
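To make the mapping concrete, here is a minimal Python sketch that looks up these numbers and names with the standard `unicodedata` module. It only illustrates the codepoint/name association and has nothing to do with diaspora*'s own code.

```python
import unicodedata

turtle = "\U0001F422"  # the TURTLE character, written as an escape

# A codepoint is just a number; ord() returns it in decimal.
print(ord(turtle))               # 128034
print(f"U+{ord(turtle):04X}")    # U+1F422
print(unicodedata.name(turtle))  # TURTLE

# The same lookup for the uppercase Latin letter A.
print(ord("A"), f"U+{ord('A'):04X}", unicodedata.name("A"))
# 65 U+0041 LATIN CAPITAL LETTER A
```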
A font maps codepoints to symbols or pictograms (glyphs). These days those are usually vector graphics on the technical level.
An encoding specifies how a sequence of codepoints is encoded to a sequence of bytes. A byte is a sequence of 8 bits. A bit can be one of two numbers, 0 and 1, thus 8 bits (= 1 byte) can represent two to the power of eight, that is 256, different numbers: 0-255. As you can see, Unicode defines a lot more numbers than 256, so you need to combine multiple bytes in a certain way to reach those. An encoding specifies how to do that. For Unicode the most common encoding is UTF-8.
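As a rough illustration of how UTF-8 combines several bytes to reach large codepoints, here is a hand-written encoder in Python. The function name `utf8_encode` is made up for this sketch, and it skips the validation a real encoder performs (surrogates, values above U+10FFFF); in practice you would simply call `str.encode("utf-8")`.

```python
def utf8_encode(codepoint: int) -> bytes:
    """Encode one codepoint as UTF-8 (illustrative only, no validation)."""
    if codepoint < 0x80:          # up to 7 bits: a single byte
        return bytes([codepoint])
    if codepoint < 0x800:         # up to 11 bits: 2 bytes
        return bytes([0xC0 | (codepoint >> 6),
                      0x80 | (codepoint & 0x3F)])
    if codepoint < 0x10000:       # up to 16 bits: 3 bytes
        return bytes([0xE0 | (codepoint >> 12),
                      0x80 | ((codepoint >> 6) & 0x3F),
                      0x80 | (codepoint & 0x3F)])
    # up to 21 bits: 4 bytes
    return bytes([0xF0 | (codepoint >> 18),
                  0x80 | ((codepoint >> 12) & 0x3F),
                  0x80 | ((codepoint >> 6) & 0x3F),
                  0x80 | (codepoint & 0x3F)])

print(utf8_encode(0x41))      # b'A'
print(utf8_encode(0x1F422))   # b'\xf0\x9f\x90\xa2'
print(utf8_encode(0x1F422) == "\U0001F422".encode("utf-8"))  # True
```

The first byte announces how many bytes follow, and each continuation byte carries 6 bits of the codepoint, which is why the turtle needs four bytes while ‘A’ needs one.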
So to get our turtle from a byte sequence stored somewhere in a database and then transmitted through the internet to your browser, we need to go through these three things: decode the byte sequence using UTF-8 to Unicode codepoints, and then use a font to map those codepoints to some vector graphic.
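The decoding half of that path can be sketched in a few lines of Python. The byte string here is an invented example of what a database might hand back; the final step, picking a glyph from a font, happens in the browser and is not shown.

```python
stored = b"I like \xf0\x9f\x90\xa2"   # example bytes, as if read from a database

text = stored.decode("utf-8")          # bytes -> sequence of codepoints
codepoints = [f"U+{ord(ch):04X}" for ch in text]

print(codepoints[-1])                  # U+1F422
print(len(stored), len(text))          # 11 8
```

The byte count and the character count differ because the turtle occupies four bytes but is a single codepoint.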
With post parsing I refer to the process that turns your Markdown into HTML, turns your hashtags into links you can click on, turns your mentions into links you can click on and so on. Things like Twemoji would hook into that process and search for the codepoint of the turtle, then replace it with an HTML image tag or something like that.
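A replacement step of that kind could look roughly like the following Python sketch. The helper name `emoji_to_img`, the `/emoji/` path and the `emoji` CSS class are all assumptions made for illustration; this is not how Twemoji or diaspora* actually implement it, and the pattern covers only two emoji blocks.

```python
import re

# Two Unicode blocks only (Miscellaneous Symbols and Pictographs, Emoticons);
# real libraries cover far more characters.
EMOJI = re.compile("[\U0001F300-\U0001F5FF\U0001F600-\U0001F64F]")

def emoji_to_img(text, base="/emoji/"):
    """Replace each matched emoji with an <img> tag (hypothetical helper)."""
    def to_tag(match):
        char = match.group(0)
        name = format(ord(char), "x")   # codepoint as lowercase hex, e.g. "1f422"
        return f'<img class="emoji" alt="{char}" src="{base}{name}.png">'
    return EMOJI.sub(to_tag, text)

print(emoji_to_img("slow \U0001F422"))
# slow <img class="emoji" alt="🐢" src="/emoji/1f422.png">
```

Keeping the original character in the `alt` attribute means copy and paste still yields the Unicode emoji rather than nothing.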
And for the last one I just repeat myself once more: I don’t like those images, they look too childish and flashy to me and they don’t integrate nicely into the text flow, since the font rendering system can’t influence them. I’m not saying that the fonts look amazingly beautiful, but they look better to me than the image sets I’ve seen.
@jhass I see, so there isn’t that much of a problem after all. I guess I can go on and close the proposal, assuming that everyone agrees on full Unicode support.
However, we’ll now have to decide whether we should merge @dumitruursu’s work into Diaspora*.
So, I guess someone can create a pull request to merge this fork by @dumitruursu into diaspora*?
Already done before you even opened your proposal: https://github.com/diaspora/diaspora/pull/5530
The technical change is about supporting all Unicode codepoints, something we most certainly want. It’s just rather unnecessarily argued with emoji support, since that’s what @dumitruursu was working on when he hit that issue. To add to the confusion, he accidentally pushed his twemoji changes to that branch too, though that has been undone in the meantime.
@jhass Oh! That’s nice.
@dumitruursu Can the Twemoji changes be found on another branch though?
@gp yes, there is another branch https://github.com/dimaursu/diaspora/tree/twemoji ; note - this does not work yet, I could not find the right hooks into the backbone views.
@jhass got it right: I wanted emoji, and it turned out that on a pod with Postgres as the database, Unicode emoji worked fine, but MySQL has an issue. utf8 in MySQL is misleading: it’s not “real” UTF-8, just a subset, more specifically at most 3 bytes per character, whereas the standard allows up to 4.
Now, the problem gets a bit more complicated. I did a conversion of the database, and it worked fine. In the meantime, I found an issue while creating the database from scratch: MySQL needs a ton of attention to make this feature work properly. I will solve that issue soon.
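To see which text is affected by the 3-byte limit, a small Python check like the one below can help. The function name is made up for this sketch; it flags any character above U+FFFF, since those are exactly the ones UTF-8 encodes with four bytes. On the MySQL side, the separate `utf8mb4` character set is the one that stores such sequences.

```python
def needs_four_byte_utf8(text: str) -> bool:
    """True if any character lies above U+FFFF, i.e. needs 4 bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)

print(needs_four_byte_utf8("caf\u00e9"))          # False: é takes 2 bytes
print(needs_four_byte_utf8("turtle \U0001F422"))  # True: the turtle takes 4
print(len("\U0001F422".encode("utf-8")))          # 4
```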
The “bird’s eye view” of the problem is like this:
All is fine when using Postgres, and soon it will be the same with MySQL too. That means, in the database, we will have UTF-8 strings, emojis included (the simplified ones, not the graphical fancy ones).
To make emojis pretty we can then either:
Patch the font we are using with something more appealing (that’s kinda hard, and the designers will be limited). It would look similar to FontAwesome, that kind of thing.
Use a library like twemoji, replacing the characters in the view with img tags. That’s what’s happening in the branch I posted above.
@dumitruursu That’s very good work, thank you.
I really want emojis to be cross-platform, and I would prefer them to be colored. However, I admit that it is a preference, and not a necessity. (Though it is a fact that emojis are popularly known to be colored and not monochrome.)
Can font-based emojis be colored? If not, I would support the use of Twemoji.
… but feel free to share all the possible libraries and fonts that we can choose from. For now, I only know Google Noto Emoji (which I think cannot be used), Twemoji (which is used by WordPress and Twitter) and Emoji One.
Apple’s emoji characters sure are not open source. I wonder why GitHub uses them…
Not yet. That comes in the next version of Unicode (we are at version 7 now; version 8 will have “skin tones”).
The only reasonable solution at this moment is to use a library like twemoji. It’s the most customizable too; it’s way harder to edit fonts.
I can think of another issue with a font-based solution: even if you never use an emoji in your life, you will have to download a huge font containing them.
@dumitruursu True! I know that using a lot of fonts on a website can affect page load. A colored emoji font would take a heck of a time to load…
Customization is an advantage too.
@dumitruursu Would Twemoji be hosted directly in diaspora* or would it be fetched from Twitter’s servers?
Are we ready for another Loomio proposal? I would think of something along the lines of…
Diaspora* should replace Unicode emoji characters with images from a library like Twemoji
What do you think? Too vague, too precise? Wrong terms? Too long a question? I think we’ve had enough repetitions in the comments to make the details quite clear to anyone.
On this line, if we specify a “base” url like so, we use our servers. If we don’t, then MaxCDN is used. That’s the default, MaxCDN.
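The idea of an optional base URL can be sketched in Python as follows. This is not Twemoji’s API: `emoji_image_url` is a hypothetical helper, `pod.example` is a placeholder host, and the default base is only a stand-in taken from the image URL quoted in the proposal in this thread, so the library’s real CDN default may differ.

```python
# Stand-in default, borrowed from the image URL quoted in this thread;
# the library's actual CDN default may be different.
DEFAULT_BASE = "https://abs.twimg.com/emoji/v1/"

def emoji_image_url(char, base=None, size="72x72"):
    """Build an image URL for one emoji (hypothetical helper)."""
    root = base if base is not None else DEFAULT_BASE
    return f"{root.rstrip('/')}/{size}/{ord(char):x}.png"

# No base given: fall back to the default host.
print(emoji_image_url("\U0001F42D"))
# https://abs.twimg.com/emoji/v1/72x72/1f42d.png

# Base given: images are served from our own server instead.
print(emoji_image_url("\U0001F42D", base="https://pod.example/emoji"))
# https://pod.example/emoji/72x72/1f42d.png
```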
@dumitruursu I see. Such a feature in such a short line, that’s wonderful
Alright, so here are a few samples from Twitter’s and WordPress’ Twemoji library.
According to WordPress’ article, there are 872 emojis converted to images so far.
They were designed by The Iconfactory.
Proposal: Diaspora* should replace Unicode emoji characters with images from a library like Twemoji
Agree: You want diaspora* by default to replace Unicode emoji characters such as “🐭” (mouse) with an equivalent image that displays consistently on all platforms (e.g. https://abs.twimg.com/emoji/v1/72x72/1f42d.png for the mouse).
Disagree: You want diaspora* by default to keep Unicode emoji characters as font characters, even though that might make some of these characters unintelligible on certain platforms (e.g. “🐢” would look like a square instead of a turtle on some platforms). Further discussion would decide whether a monochrome font should be implemented to support all emojis on all platforms.
Outcome: Diaspora will, by default, keep Unicode emoji characters as font characters. However, since almost 47.5% of votes favored an emoji image library, the feature can be offered as an opt-in on an individual basis once it is ready.
Further decisions may address emoji fonts, emoji image libraries, and the choice of such technologies.
Note: This proposal was imported from Loomio. Vote details, some comments and metadata were not imported. Click here to view the proposal with all details on Loomio.
I extended the voting time for this one to include two weekends, since it might get rather controversial.
Could Symbola be used as a web font?
I personally believe the previous decision should have had a longer voting time. Remember that the Diaspora community is global - with that voting time, a lot of people were excluded from voting due to time zone (me being one of them).
Please try to keep this in mind when creating decisions in the future, @gp