
This made me stop and think, because I've never really wondered about what does and doesn't belong in Unicode. Then I went looking at the strange corners Unicode has, and wow, the "Miscellaneous Technical" block is such a strange thing.

* It has APL symbols (checking the Wiki article on APL syntax and the APL standard, it seems that APL programs can be represented entirely in Unicode - which makes sense, but is still a little surprising).

* There's a benzene code point: ⌬

* "ERASE TO THE LEFT", better known as backspace, is a thing: ⌫

...and a little more bizarro: https://en.wikipedia.org/wiki/Miscellaneous_Technical
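For anyone who wants to poke at these themselves, here is a minimal sketch using Python's standard `unicodedata` module to look up the official names of the two symbols quoted above. The helper `describe` and the constant `MISC_TECHNICAL` are names made up for this example; the block range U+2300..U+23FF is the one given on the linked Wikipedia page.

```python
import unicodedata

# Miscellaneous Technical block, U+2300..U+23FF (hypothetical constant name).
MISC_TECHNICAL = range(0x2300, 0x2400)


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


# The benzene and backspace symbols, written as escapes so the
# source survives editors and fonts that lack the glyphs.
for ch in "\u232c\u232b":
    in_block = ord(ch) in MISC_TECHNICAL
    print(ch, describe(ch), "in Miscellaneous Technical:", in_block)
```

Looping over `MISC_TECHNICAL` with `chr()` instead of the two-character string would list the whole block.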



Unicode was designed to unify all existing character sets in use at the time. So there are lots of weird historical things which wouldn't necessarily be accepted in Unicode as new proposals.

If the "external link" symbol had occurred in some legacy character set, then it would have been included automatically.

But since it is a new proposal, it will be evaluated according to Unicode's policy for new symbols.


> what belongs and doesn't belong in Unicode

That's simple. The justification for most characters, including the exemplary one you mentioned, is that they were already encoded in some other repertoire. The consortium's task is to collect and unify them.


In particular there was a desire to make sure that a round trip from some legacy character set to Unicode and back to the legacy character set wouldn't lose information. So many characters that you would assume to be the same are given multiple code points to make that possible.
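A rough illustration of that round-trip point, as a sketch rather than anything authoritative: the examples here (the Ångström sign, fullwidth "A", and the Shift JIS codec) are my own picks and not from the thread, and `round_trips` is a hypothetical helper. It uses only Python's standard library.

```python
import unicodedata

ANGSTROM_SIGN = "\u212b"   # ANGSTROM SIGN
A_WITH_RING = "\u00c5"     # LATIN CAPITAL LETTER A WITH RING ABOVE
FULLWIDTH_A = "\uff21"     # FULLWIDTH LATIN CAPITAL LETTER A


def round_trips(text: str, codec: str) -> bool:
    """True if encoding to a legacy codec and decoding back loses nothing."""
    return text.encode(codec).decode(codec) == text


# Two code points that look the same but are distinct...
print(ANGSTROM_SIGN == A_WITH_RING)
# ...although Unicode treats them as canonically equivalent under NFC.
print(unicodedata.normalize("NFC", ANGSTROM_SIGN) == A_WITH_RING)

# Shift JIS has separate byte sequences for "A" and fullwidth "A", so
# Unicode keeps both code points to let the text survive a round trip.
print(FULLWIDTH_A.encode("shift_jis") != "A".encode("shift_jis"))
print(round_trips(FULLWIDTH_A + "A", "shift_jis"))
```

If Unicode had merged the fullwidth and ordinary letters into one code point, the last check could not hold, since the decoder would have no way to tell which legacy byte sequence to restore.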



APL is a means of communicating mathematical ideas between people, so it shouldn't be too surprising that it's in Unicode.


The backspace symbol makes perfect sense. You may very well want to say "Press ⌫ to delete the selected object".


But you may also say "Next to the link you will see [EXTERNAL LINK] which indicates that the link leads to another website."


That is a good point, but I guess the backspace symbol gets a bit of a free ride by literally being on your keyboard.


That reasoning would make every conceivable icon a valid target for inclusion.

To actually get the character included, you can't just reason that it might be used in a certain way, you will have to demonstrate it with actual and authentic use in the wild. The accepted proposal for the inclusion of the power symbol is a nice example of how this works.

If you can find a bunch of printed manuals or books or websites that actually use this icon in this manner you might be able to submit a successful proposal.


The web is full of those symbols, so I'm sure there are books (or online resources) explaining them. And if not, someone should do something about that.


I'm not opposed to the backspace symbol, but by that logic you could include anything in Unicode, as anything could be a symbol you'd like to refer to.

I mean, you might want to say "External links can be identified by their □ marker" or "The Windows key is marked □ and can be found to the left of the space bar"


Unicode is intended to replace all previous codes for text, so from the outset and almost by definition it has sought to identify things like APL and encode those because if it didn't you still need an encoding for your old APL programs.


I made this thing once upon a time to browse all the symbols http://tingletech.github.io/unicodetoy/


That page lists the NEXT PAGE symbol ⎘, and I wonder if that couldn't do as an "external link" symbol in a pinch, seeing as I've never seen it used as a next page symbol before.



