Can't post with non-BMP characters such as emoji

Found an issue with the phpBB system here at NESdev? Use this forum to report problems.

tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Can't post with non-BMP characters such as emoji

Post by tepples »

In this post, I was trying to make a distinction between SNROM with battery and SNROM without battery by including the Unicode character BATTERY (U+1F50B). I can preview the post just fine:
[Attachment: sgemoji.png. Caption: Preview works fine.]
But when I try to post I get this error:
General Error
SQL ERROR [ mysqli ]

Incorrect string value: '\xF0\x9F\x94\x8B\x0A\x0A...' for column 'post_text' at row 1 [1366]
It appears that MySQL by default does not support UTF-8 code unit sequences that correspond to Unicode code points outside the Basic Multilingual Plane (U+0000 through U+FFFF). In GTK+ applications, this character can be typed with Ctrl-Shift-U 1f50b Space. A related question on Stack Overflow, "“Incorrect string value” when trying to insert UTF-8 into MySQL via JDBC?", implies that certain settings will need to be flipped from utf8 (BMP-only UTF-8) to utf8mb4 (UTF-8 including extra planes, the NES 2.0 of Unicode), which was introduced in MySQL 5.5.
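To illustrate why the error message starts with \xF0\x9F\x94\x8B, here is a minimal Python sketch. It only shows that BATTERY (U+1F50B) takes four bytes in UTF-8, one more than a BMP-only utf8 column accepts per character. The helper names utf8_byte_count and is_outside_bmp are made up for this example and are not part of phpBB or MySQL.

```python
# Illustration only: the helper names below are hypothetical.
BATTERY = "\U0001F50B"  # BATTERY, the character from the failed post


def utf8_byte_count(ch: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def is_outside_bmp(ch: str) -> bool:
    """True for code points above U+FFFF, which need four UTF-8 bytes."""
    return ord(ch) > 0xFFFF


encoded = BATTERY.encode("utf-8")
print(encoded)                   # b'\xf0\x9f\x94\x8b', as in the SQL error
print(utf8_byte_count(BATTERY))  # 4
print(is_outside_bmp(BATTERY))   # True
print(is_outside_bmp("A"))       # False
```

Any character for which is_outside_bmp returns True would presumably trigger the same error against a column declared as utf8.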
Dwedit
Posts: 4922
Joined: Fri Nov 19, 2004 7:35 pm

Re: Can't post with non-BMP characters such as emoji

Post by Dwedit »

I would have mistaken the battery graphic for a logic gate or something.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Can't post with non-BMP characters such as emoji

Post by tepples »

It's not just the battery. Other characters, such as the emoji that inspired several emblems in the NES homebrew game Concentration Room, can't be posted because the column for the text of a post is set to CHARACTER SET utf8 instead of CHARACTER SET utf8mb4. Part of the reason for this is that forums.nesdev.com runs MySQL 5.1.62, not 5.5.3 or later.
I hereby request that the server administrator do it for me.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Can't post with non-BMP characters such as emoji

Post by tepples »

Today I discovered that MySQL had been upgraded to 5.5.53, but the tables still had not been upgraded from utf8 to utf8mb4.
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: Can't post with non-BMP characters such as emoji

Post by koitsu »

tepples, please remember: just because a Unicode glyph exists doesn't mean everyone can view it. Not everyone's devices have every version of Unicode on them. For example, my mobile phone is from 2013 and there are many present-day glyphs that show up as [X]. Battery symbol is Unicode 6.0, so my phone has it, but just because *my* device has it doesn't mean it's wise to use. It also can cause problems when such content is copy-pasted into non-Unicode mediums.

In other words: use of said glyph (vs. an actual word) brings nothing to the table (pun not intended) content-wise. You're just being pedantic / obsessed.

That said: upgrading of the MySQL tables from utf8 to utf8mb4 should probably happen anyways, but you should do some digging to see if there are negative ramifications of that. Examples include: bugs in MySQL client, bugs in MySQL server, performance implications in MySQL server, disk space concerns, incompatibility/breakage with future phpBB upgrades (these upgrades often do ALTER TABLE or other table mangling), and so on. It's always best to follow what the software (phpBB) recommends, even if that means giving up something you personally want. One must be practical.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Can't post with non-BMP characters such as emoji

Post by tepples »

Hopefully some of the performance problems have been worked out in the eight years since utf8mb4 was introduced. When is a full backup scheduled in preparation for an upgrade to MySQL 5.5.3 or later so that the tables can be converted to utf8mb4? Or is phpBB generally structured such that the only way to convert it from utf8 to utf8mb4 is by wiping all posts and starting over?
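For what it's worth, MySQL 5.5.3 and later can convert a table in place with ALTER TABLE ... CONVERT TO CHARACTER SET, so wiping posts should not be necessary in principle. Below is a rough Python sketch that only builds the statements as strings and does not touch a database. The table name phpbb_posts and the collation utf8mb4_unicode_ci are assumptions on my part, not something confirmed for this server, and whether phpBB itself tolerates the change is a separate question.

```python
# Sketch only: builds SQL text, runs nothing.
# The table names passed in and the default collation are assumptions.
def conversion_statements(tables, charset="utf8mb4",
                          collation="utf8mb4_unicode_ci"):
    """Return one ALTER TABLE ... CONVERT TO statement per table name."""
    return [
        f"ALTER TABLE `{name}` CONVERT TO CHARACTER SET {charset} "
        f"COLLATE {collation};"
        for name in tables
    ]


statements = conversion_statements(["phpbb_posts"])  # hypothetical table
for statement in statements:
    print(statement)
```

Statements like these would want a trial run on a restored backup first. One known snag is that InnoDB in MySQL 5.5 limits index keys to 767 bytes, so an indexed VARCHAR(255) column no longer fits at four bytes per character.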
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Can't post with non-BMP characters such as emoji

Post by tepples »

As of the latest phpBB upgrade, this pig 🐖 can be used in posts but not in word censors. So can the battery 🔋 that started the whole mess.