I find myself agreeing with the rejection. As much as I dislike the ridiculous amount of emojis in Unicode, and their increasingly widespread use, they do fit plain text communication, whereas an "external link" symbol does not. My heuristic is something I'll call SMS sniff test: would it make sense to type that symbol into a text message? If it wouldn't, then it doesn't belong in Unicode.
"External link" symbol tells you, "that last bit of differently-formatted text is an active element in this application, leading to an outside resource". It's not something that makes sense in plain text, because any external link in a plain text message is both visible and obviously a link.
Elsewhere in the thread someone mentioned the play/pause/stop symbols from cassette recorders/VCRs. But those have been used culturally as symbols denoting starting, pausing and stopping for decades now, so they're an idea communication tool that makes sense in plain text, and thus pass the SMS sniff test.
(Note that I'm not sure if all symbols in Unicode actually pass the SMS sniff test. I suppose the best option for those wanting "external link" symbol to be included would be to resubmit it as the DOUBLE ARROW POINTING TOP RIGHT OUT OF SQUARE symbol, or something like that.)
I don't really buy this. Do a Google image search for "external link symbol"; the symbol is just generically useful.
I can buy that the name in unicode should be something other than "external link". It could be something more generic like "external reference". Boom, now it can be accepted because it's not tied to <a>.
The problem with the rejection is that reading the symbol as "that last bit of differently-formatted text is an active element in this application, leading to an outside resource" is kind of a strawman: why be so specific just to reject it because you've decided to be so specific?
Though all of that justification is kind of unnecessary. Really, I just find it ridiculous that the unicode group is going to suddenly become prudish about a ubiquitous symbol yet have 20 different check mark symbols. Sure, if everyone made the "but there are 20 different check mark symbols so why can't I have <pet symbol>?" argument, unicode would be even more ridiculous. But that's an argument to reject <pet symbol>, not one of the most ubiquitous symbols on the internet.
"Unicode has a bunch of stupid symbols, so it should at least add the most useful symbols that we use on the internet" really is a good argument.
This made me stop and think because I've never really wondered about what belongs and doesn't belong in Unicode, and then I went looking at what strange corners Unicode has and wow the "Miscellaneous Technical" code block is such a strange thing.
* It has APL symbols (checking the Wiki article on APL syntax and the APL standard, it seems that APL programs can be represented entirely in Unicode - which makes sense, but is still a little surprising).
* There's a benzene code point: ⌬
* "ERASE TO THE LEFT", better known as backspace, is a thing: ⌫
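For anyone who wants to poke at these code points, here is a minimal sketch using Python's standard unicodedata module. The choice of U+2336 as the APL example is mine and is not taken from the comment above.

```python
import unicodedata

# Three code points from the Miscellaneous Technical block (U+2300..U+23FF):
# one APL symbol (picked arbitrarily), the benzene ring, and "backspace".
samples = ["\u2336", "\u232c", "\u232b"]


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in samples:
    print(ch, describe(ch))
```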
Unicode was designed to unify all existing character sets in use at the time. So there are lots of weird historical things which wouldn't necessarily be accepted in Unicode as new proposals.
If the "external link" symbol had occurred in some legacy character set, then it would have been included automatically.
But since it is a new proposal, it will be validated according to Unicode's policy for new symbols.
That's simple. The justification for most characters, including the exemplary one you mentioned, is that they were already encoded in some other repertoire. The consortium's task is to collect and unify them.
In particular there was a desire to make sure that a round trip from some legacy character set to Unicode and back to the legacy character set wouldn't lose information. So many characters that you would assume to be the same are given multiple code points to make that possible.
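A small sketch of what that round trip looks like in practice, assuming Shift-JIS as the legacy character set (my choice of example, not something named above). The fullwidth "A" looks like a plain "A" but has its own code point, so converting to Shift-JIS and back loses nothing:

```python
import unicodedata

ascii_a = "A"           # U+0041 LATIN CAPITAL LETTER A
fullwidth_a = "\uff21"  # U+FF21 FULLWIDTH LATIN CAPITAL LETTER A


def round_trip(text, codec="shift_jis"):
    """Encode to a legacy codec and decode back to a Unicode string."""
    return text.encode(codec).decode(codec)


# The legacy set stores the two letters as different byte sequences ...
print(ascii_a.encode("shift_jis"), fullwidth_a.encode("shift_jis"))
# ... so Unicode keeps two code points and the round trip is lossless.
print(round_trip(fullwidth_a) == fullwidth_a)
# Compatibility normalization is where the "same letter" relation is recorded.
print(unicodedata.normalize("NFKC", fullwidth_a) == ascii_a)
```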
That reasoning would make every conceivable icon a valid target for inclusion.
To actually get the character included, you can't just reason that it might be used in a certain way, you will have to demonstrate it with actual and authentic use in the wild. The accepted proposal for the inclusion of the power symbol is a nice example of how this works.
If you can find a bunch of printed manuals or books or websites that actually use this icon in this manner you might be able to submit a successful proposal.
The web is full of those symbols, so I'm sure there are books (or online resources) explaining them. And if not, someone should do something about that.
I'm not opposed to the backspace symbol, but by that logic you could include anything in unicode, as anything could be a symbol you'd like to refer to.
I mean, you might want to say "External links can be identified by their □ marker" or "The Windows key is marked □ and can be found to the left of the space bar"
Unicode is intended to replace all previous codes for text, so from the outset and almost by definition it has sought to identify things like APL and encode those because if it didn't you still need an encoding for your old APL programs.
that page lists the NEXT PAGE symbol ⎘ and I wonder if that couldn't do as an "external link" symbol in a pinch, seeing as I've never seen it used as a next page symbol before.
Oh also I just wanna note that if we go by a previous convention, there is a perfectly good symbol for "this is an external link": back in the days when Geocities was relevant, it was pretty common to see external links marked by a little Earth gif after them, because it was a link to somewhere else on the World! Wide! Web.
If your browser feels like displaying Wingdings codepoints, something like this: http://egypt.urnash.com
Scripts like Egyptian Hieroglyphs are exactly the purpose of Unicode. But a bunch of other symbols are mostly there because they come from some preexisting character set which Unicode had to include in order to be a viable replacement for legacy character sets.
Yeah a lot of arguments are of the form "when Unicode can include the pile-of-poop-emoji, surely it should include <my favorite symbol>".
But the question is not if the icon is practical or silly, the question is if it makes sense as a Unicode character or should be handled at some other level like a GUI control or widget.
Should GUI icons, widgets and controls be covered by Unicode? That is a can of worms.
I understand the usefulness of the "external link" icon, but should it really be something you arbitrarily type into a text? Shouldn't it be part of the rendering of a link?
As for the pile-of-poop emoji there is a very good reason for it to be included in Unicode: it is one of the original emoji, made by Softbank in 1997.
The reason: Softbank and Docomo both used proprietary emoji sets, widely used in Japan for text messages. Not including these in Unicode would have resulted in a choice: be compatible with the rest of the world, or keep using the emojis that became part of their culture. Not satisfactory. So basically, they had to put emoji in Unicode for the Japanese to use it, and excluding Japan is not really an option if you want a universal standard.
SoftBank actually put emoji into the Private Use Area of Unicode before they were codified (just as they were in an unused area of Shift-JIS before that). The original iPhone 3GS used the PUA method.
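To illustrate the difference between a Private Use Area code point and a standardized emoji, here is a rough sketch with Python's unicodedata module. U+E000 is just the first PUA code point, picked arbitrarily; I am not claiming it is one SoftBank used.

```python
import unicodedata


def is_private_use(ch):
    """True if the code point has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"


pua = "\ue000"      # arbitrary PUA code point, meaning is up to private agreement
poo = "\U0001F4A9"  # the standardized emoji

# A PUA code point has no standard name; the emoji does.
print(is_private_use(pua), unicodedata.name(pua, "<no name>"))
print(is_private_use(poo), unicodedata.name(poo, "<no name>"))
```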
The encoding of emoji was in order to unify Softbank/docomo/au under one set so that iPhones/Androids sold by the different carriers could send emoji among each other without relying on email translators (as had been the technical solution with feature phones until then)
It made total sense to include emojis in Unicode initially.
And a great idea to put them outside the 16-bit plane, so platforms would be forced to support characters beyond the BMP because of the public demand for emojis.
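To make "beyond the BMP" concrete, a quick sketch: the emoji's code point does not fit in 16 bits, so UTF-16 has to spend two code units (a surrogate pair) on it. The helper name is made up for illustration.

```python
poo = "\U0001F4A9"  # PILE OF POO


def utf16_units(text):
    """Split a string into its 16-bit UTF-16 code units."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


print(hex(ord(poo)))                       # larger than 0xFFFF, so outside the BMP
print([hex(u) for u in utf16_units(poo)])  # two units: a surrogate pair
print([hex(u) for u in utf16_units("A")])  # a BMP character needs only one unit
```

Software that assumed one 16-bit unit per character breaks on exactly this case, which is the pressure the comment above is talking about.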
But I think going forward, emojis should be handled outside of Unicode. They are really small illustrations rather than characters or symbols.
Thus the inherent contradiction: Unicode is for text, it must include all existing character sets which it aims to unify, and previous character sets weren’t only used for text.
The obvious hack is to make a custom character set called “Rejected by Unicode”.
It would probably be easier just to design a png or svg icon for the UI widgets you need. Then you don't have to lobby a consortium and wait for them to accept your icon.
Yeah, it was basically a hack to extend the character repertoire in the time of 8-bit character sets. The kind of shenanigans Unicode was designed to replace.
You still see this kind of mischief when Outlook users type a smiley and it is rendered as a "J" in the Wingdings font. Readers of the mail who have styling stripped, or who don't have the font installed, just see a confusing "J".
Unicode is full of whole blocks of weird sets of lines and borders. As far as I’m concerned adding the symbol just to make it easier to style links makes total sense.
Being pedantic, those box drawing characters are neither ASCII nor ANSI. They are part of the original IBM PC character sets. Windows refers to them as the OEM character sets.
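A small sketch of that point, assuming code page 437 (the original US IBM PC set) as the OEM code page. The bytes are above 0x7F, so they are not ASCII, and Unicode gives each one a code point in the Box Drawing block:

```python
import unicodedata

# Three CP437 bytes that draw the top edge of a double-line box.
raw = bytes([0xC9, 0xCD, 0xBB])
top_edge = raw.decode("cp437")


def is_ascii(text):
    """True if every character is in the 7-bit ASCII range."""
    return all(ord(c) < 128 for c in text)


print(top_edge, is_ascii(top_edge))
for ch in top_edge:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```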
> My heuristic is something I'll call SMS sniff test: would it make sense to type that symbol into a text message? If it wouldn't, then it doesn't belong in Unicode.
I accept that heuristic, and find myself agreeing with it for the purposes of the external link character. Let me try to convince you:
This is a real conversation:
A> Are you familiar with Homebrew? You can get it from brew.sh.
B> Ok. Where is brew.sh?
If we were not limited by plain text, we could use colour and underlining to distinguish that as a domain name. A lot of websites find this poor for accessibility reasons so they use the link character.
Another way to think about this is to imagine the link-character is pronounced "https://" as in:
* Are you familiar with Homebrew? You can get it from https://brew.sh.
* Are you familiar with Homebrew? You can get it from {link character} brew.sh.
Isn't that plain text? I'm not sure how to pronounce it, but there's a lot of emoji I don't know how to pronounce. Do you think this is important?
So if you get that far, how much further is this really?
* Are you familiar with Homebrew? You can get it from brew.sh {link character}.
There is no squiggle that I can not hypothesize some way of using in a text message. The question is more, is there already some existing demand for it, before you asked the question? To which the answer would seem to be "no".
That's a contingent answer. If you, say, set out to try to convince the world to use it that way, and created that demand, then by golly, that demand would exist at that point and the answer could change. But hypothesizing some possible text message that could use it isn't strong enough to argue for inclusion in the standard because every possible proposal passes that test.
That sounds like an issue that could be solved by updating the renderer's gtld list with the various new ones that have been coming out instead of adding invisible meta characters.
Phones already do it for .com, .gov, etc, but our cutesy .sh, .dev, .rocks, .xyz, etc will take a bit to catch up.
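Roughly what that would look like, as a toy sketch. The TLD sets, the regex, and the function name are all made up for illustration; real linkifiers are much more involved.

```python
import re

# Hypothetical, deliberately tiny TLD lists; real renderers ship far longer ones.
OLD_TLDS = {"com", "gov", "org", "net"}
NEW_TLDS = OLD_TLDS | {"sh", "dev", "rocks", "xyz"}

# A bare host name: dot-separated labels ending in an alphabetic label.
HOST_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+([a-z]{2,})\b", re.IGNORECASE)


def find_links(text, tlds):
    """Return host-like substrings whose final label is in the TLD list."""
    return [m.group(0) for m in HOST_RE.finditer(text)
            if m.group(1).lower() in tlds]


message = "Are you familiar with Homebrew? You can get it from brew.sh."
print(find_links(message, OLD_TLDS))  # brew.sh is missed with the old list
print(find_links(message, NEW_TLDS))  # and found once "sh" is added
```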
> That sounds like an issue that could be solved by updating the renderer's gtld list with the various new ones that have been coming out instead of adding invisible meta characters.
The request is for a code point to represent the visible external link character, not for an invisible control-code to decorate some structured data (which cannot "appear" on paper).
The use of symbols in text messages is one reason to include them in Unicode, but not the only one. Another is accessibility, particularly for blind people. The more commonly used symbols are in Unicode, the less we have to remind website authors to include alt text, or that the symbols in their custom icon font pose an accessibility problem. Also, co-opting an existing symbol, such as the degrees symbol (which I've observed some websites do for external links), is confusing when using a screen reader.
Wouldn't a screen-reader recognize a link even without the symbol? I can see the problem of the screen-reader telling the user that there's some garbage image at the end of the link, but the image really is garbage in that case, since I imagine the screen-reader would make it obvious that there's a link there.
Absolutely. But if a web author quite reasonably adds a symbol for the benefit of the sighted majority, using an unlabeled image, an icon font, or coopting an existing symbol (e.g. degrees), that leads to extra clutter or confusion for blind users. If the symbol is in Unicode, we can have a solution that works well for everyone.
Right. And I've come across websites where the alt text for their external-link image is something fairly verbose like "external link opens in new window". Now, imagine if instead, we had a standard symbol, and screen readers could map it to a short sound effect that users would come to recognize.
> My heuristic is something I'll call SMS sniff test...
That's certainly a useful sniff test, and I think you've made a good case with it. I wonder, though, whether one should also consider a "documentation sniff test", asking: would it make sense to type that symbol into some documentation one is writing about how to use, for example, some software? I think the answer to that question is a clear yes: it might make sense to put this symbol onto such a page, and although on inert paper it wouldn't be an active element, it would certainly be a useful symbol in that hypothetical text.
In an RTL language, you don't even need to mirror it.
Or, to put it another way: if you think it needs to point to the right, is that because the text you read flows that way?
This then makes me wonder if the large portion of the world's population that read right-to-left find the currently widely-used external link symbol (as discussed in the article) a bit jarring.
> makes me wonder if the large portion of the world's population that read right-to-left find the currently widely-used external link symbol (as discussed in the article) a bit jarring
I'm not a regular user of SMS, but it seems to me like adding links to SMS messages is either already available or not a bad idea. The "SMS sniff test" seems pretty weak to me. A better one would be a printed document.
You can put a URL into an SMS message, but you can't put an anchor tag (link) which is comprised of [a hidden URL and visible link text]. The latter is the only one of these that benefits from the symbol in question, because the external nature of the hidden URL is disguised. The former doesn't benefit because the external nature of the URL is obvious.
Modern SMS readers will linkify the URL, but will set the visible text equal to the URL, so again it wouldn't benefit from the symbol.
This explanation mirrors the rejection rationale: you don't need a symbol to be a Unicode character if the document is already rich text; the symbol can be an image instead. If you have anchor tags, you probably also have image tags or CSS.
> This explanation mirrors the rejection rationale: you don't need a symbol to be a Unicode character if the document is already rich text;
No, that's not the rejection rationale. The rejection rationale isn't about having access to image tags or CSS. It's about hypertext (click link, go to another text) vs having text (can't click link, no other text). If that was the rationale, they wouldn't allow emojis in unicode, because you could insert them as images.
Sorry, let me rephrase in the way that I did in another comment: it seems that to be a codepoint, it needs to be useful in plain text scenarios.
Emoji are useful (arguably I guess) in plain text scenarios because they exist to convey additional information about an author's emotion, and that author might use plain text. The external link symbol is not useful in plain text scenarios because it exists to convey additional information about an author's preceding hypertext, and that author isn't using hypertext.
I thought about SMS messages because they're plaintext. Printed documents are raster graphics; once the ink hits the paper, you don't have text anymore.
Unicode has a lot of old "markup" symbols in typography, like the right pointy finger ([1], example in [2]). I see the external link symbol as a widely-used markup symbol of the current age, and it could reasonably be in Unicode.
It serves the same function as footnote daggers. The Unicode consortium has odd priorities when they will happily add hundreds of emojis but more generally useful things are held in disregard.
On the other hand, I used to use both "play" and "link" symbols in past resumes, where I could have really used a proper link symbol. Ideally I wanted something universally known, but the box with an arrow that Wikipedia used wasn't as well known then, and was always a small image and not available as a character. I opted for the interlinked chain links symbol instead.
I would propose updating this SMS sniff test to a Wikipedia sniff test.
FWIW, the interlinked chain links symbol is in Unicode. And it's been the widely used symbol for a hyperlink. The box-with-arrow one is a symbol for an external link, which is a concept that matters only in a few contexts, and which was usually displayed with a globe icon (also in Unicode).
I explicitly don't want it to be "Wikipedia sniff test", because that's equivalent to a "Webpage sniff test", which is equivalent to "anything goes", because icon fonts exist now and are used.
That said, other comments made me realize that Unicode is better described as "whatever was there in all the codepages around the world at the time of bootstrapping the Unicode standard" plus what's typically used in a written language, plus SMS sniff test.
I don’t think it does. On HN you can’t have links with custom text so an external link symbol would be pointless but seeing a dagger or [1] is reasonable. Similarly I don’t think seeing it in an SMS is unimaginable so by the standard of the parent comment I don’t think a dagger is the same as an external link symbol.
Plain text environments never encounter the problem that this symbol solves, which is to reveal that an anchor tag (which is not plain text, and typically has visible text that isn't the URL) leads to some other authority.
But for a text mode interface that does have full blown anchor tags -- not very mainstream -- you have a point.
In Reddit, you [link via Markdown](https://en.wikipedia.org/wiki/Markdown). The site renders this its own way, so it can add an "external link" character if needed - and that character is a part of the user interface; a side channel, not a main signal. E.g. copying from a Reddit post, you wouldn't want to find that character in the pasted text.
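To sketch the Reddit point about the marker being a side channel: a renderer could tag external anchors with a class and leave the icon to CSS, so the marker never becomes part of the text. This is a hypothetical toy, not how Reddit actually does it; the function name, the "external" class, and the site host are assumptions, and it only handles inline Markdown links.

```python
import html
import re
from urllib.parse import urlparse

# Matches inline Markdown links of the form [text](url).
MD_LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def render_links(markdown, site_host):
    """Turn Markdown links into anchors, flagging off-site ones with a class."""
    def repl(match):
        text, url = match.group(1), match.group(2)
        # Relative URLs have no hostname and count as internal.
        external = urlparse(url).hostname not in (None, site_host)
        css = ' class="external"' if external else ""
        href = html.escape(url, quote=True)
        return f'<a href="{href}"{css}>{html.escape(text)}</a>'
    return MD_LINK_RE.sub(repl, markdown)


print(render_links(
    "you [link via Markdown](https://en.wikipedia.org/wiki/Markdown).",
    "www.reddit.com",
))
```

A stylesheet rule on the hypothetical "external" class could then draw the icon, and copying the paragraph would still yield only the link text.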