
> Of course, we can build tools that make... call it "unsound assumptions"... and I'll happily use them and encourage their use, because you can make the correct judgement call that those assumptions should hold in your context (and that the one causing the assumptions to be broken, if they ever are, is the one "at fault" rather than the tools.)

That's a pretty bad way to standardise a data format IMO. If readers, writers, and tools all want these representations to be equivalent, far better to make that equivalence part of the standard - the point of the standard is to support the use cases, and being able to sensibly reformat HTML is far more valuable than being able to preserve a distinction that doesn't show up in any browser and most writers would never intend anyway.



The need for equivalence, if any, is in the parsing and not the visual presentation. HTML does not consider itself, according to its maintainers, to be a presentation format.


Which makes it decidedly unfortunate that you cannot determine whether a sequence of spaces is collapsible without consulting the presentation layer, even though this should be a semantic/parsing question.


As a semantic/parsing consideration the white space is preserved.
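A quick way to see this is with Python's standard-library `html.parser`, used here only as a convenient stand-in (it is a simple tokenizer, not a full HTML5 tree builder). Text between tags reaches `handle_data` with its whitespace intact. The class and helper names are illustrative.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects every text chunk the parser reports, unmodified."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        # The parser passes text through verbatim; no space collapsing happens here.
        self.chunks.append(data)


def parsed_text(source: str) -> str:
    parser = TextCollector()
    parser.feed(source)
    parser.close()
    return "".join(parser.chunks)


print(repr(parsed_text("<p>Hello    world\n  again</p>")))
# 'Hello    world\n  again'
```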


The concept "collapsible spaces" is not a part of the HTML format, it is a decision that certain renderers apply, and others may not.


Are there actual renderers that don't, and are there real users who consider that reasonable behaviour? I mean maybe someone somewhere has a spacebar heating workflow that relies on it, but file formats and standards should not add ways to shoot yourself in the foot if they can help it.


As this article shows, browsers actually collapse spaces differently based on the specific CSS applied - and this is in fact intended behavior, not some corner case.
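To make the CSS dependence concrete, here is a deliberately simplified model of three `white-space` values from CSS Text Level 3. It assumes the text contains only spaces, tabs and line feeds, ignores line wrapping and the trimming of spaces at line starts and ends, and treats every line feed as a space under `normal` (the spec's segment-break rules are more nuanced for some scripts). The point is only that identical source text comes out differently depending on the CSS value; `apply_white_space` is a hypothetical helper, not a browser API.

```python
import re


def apply_white_space(text: str, mode: str) -> str:
    """Simplified model of CSS `white-space` collapsing (no wrapping, no line-edge trimming)."""
    if mode == "pre":
        # Everything is preserved as written.
        return text
    if mode == "pre-line":
        # Drop spaces/tabs around line feeds, keep the line feeds, collapse other runs.
        text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)
        return re.sub(r"[ \t]+", " ", text)
    if mode == "normal":
        # Line feeds become collapsible spaces; every run shrinks to one space.
        return re.sub(r"[ \t\n]+", " ", text)
    raise ValueError(f"unsupported white-space value: {mode!r}")


source = "Hello    world\n  again"
for mode in ("normal", "pre-line", "pre"):
    print(mode, repr(apply_white_space(source, mode)))
# normal 'Hello world again'
# pre-line 'Hello world\nagain'
# pre 'Hello    world\n  again'
```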

Also, the output of HTML parsers is the HTML structure, and changing that to collapse spaces would break numerous tools. So while probably all HTML renderers do some kind of space collapsing, there are many other uses of HTML parsing that don't. Most likely the syntax highlighting in your HTML editor of choice in fact relies on a space-preserving HTML parser, just for one example.
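As an illustration of why tools want the unnormalized text, here is a toy tokenizer of the kind a highlighter might build on, again assuming Python's `html.parser` as a stand-in. It records each token's kind, raw text and (line, column) position from `getpos()`. Whitespace-only text between tags shows up as its own token; if the parser collapsed those runs, the token texts would no longer line up with the characters in the editor buffer.

```python
from html.parser import HTMLParser


class Tokenizer(HTMLParser):
    """Records (kind, raw_text, (line, col)) for tags and text, as a highlighter might."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=False)
        self.tokens: list[tuple[str, str, tuple[int, int]]] = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("tag", self.get_starttag_text(), self.getpos()))

    def handle_endtag(self, tag):
        self.tokens.append(("tag", f"</{tag}>", self.getpos()))

    def handle_data(self, data):
        # Whitespace-only runs are reported too, exactly as they appear in the source.
        self.tokens.append(("text", data, self.getpos()))


t = Tokenizer()
t.feed("<ul>\n  <li>a   b</li>\n</ul>")
t.close()
for token in t.tokens:
    print(token)
```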


> browsers actually collapse spaces differently based on the specific CSS applied

Sure. But they do all collapse spaces. I don't think anyone wants their browser to always preserve all the spaces that are in the source.

> and this is in fact intended behavior, not some corner case.

Eh maybe. They collapse the spaces of block elements like block elements and the spaces of inline elements like inline elements; that seems like the obvious thing that your renderer would do if you didn't make any deliberate design decision.

> So while probably all HTML renderers do some kind of space collapsing, there are many other uses of HTML parsing that don't. Most likely the syntax highlighting in your HTML editor of choice in fact relies on a space-preserving HTML parser, just for one example.

I very much doubt it. And even if it did, that would be an incredibly backwards reason to keep that behaviour - "we've spent all this effort working around our bad standard, that would be wasted if we fixed the standard".


Creating parsers which entirely ignore parts of the input is generally a bad idea, because you lose the ability to round-trip. That is, it's often a desirable property to have a way to go text1 -> DOM -> text2, and have text2 be identical to text1, or at least very close to it. This is particularly true for markup languages, which intermix text and tags.
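A minimal sketch of the text1 -> parse -> text2 property, assuming Python's `html.parser` and working at the token level rather than through a real DOM. Because whitespace and the raw start-tag text are kept, re-emitting the tokens reproduces simple input exactly. End tag names come back lowercased, and doctypes or processing instructions are not handled, so for arbitrary HTML this is "very close to" rather than guaranteed identical. `Echo` and `round_trip` are hypothetical names.

```python
from html.parser import HTMLParser


class Echo(HTMLParser):
    """Re-emits each token as source text to check the round trip."""

    def __init__(self) -> None:
        # Keep entity references as separate events so they can be written back verbatim.
        super().__init__(convert_charrefs=False)
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # raw text, original quoting and case

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>, emitted once

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")  # tag name arrives lowercased

    def handle_data(self, data):
        self.out.append(data)  # whitespace untouched

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def round_trip(text1: str) -> str:
    parser = Echo()
    parser.feed(text1)
    parser.close()
    return "".join(parser.out)


text1 = '<p class="x">Hello,   <b>world</b>&amp;<br/> more\n</p><!-- note -->'
print(round_trip(text1) == text1)
```

A parser that dropped or collapsed the whitespace runs here could still build a usable tree, but it could no longer produce `text2 == text1`.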


But somehow almost every programming language and data format manages to define these equivalences and have it not ruin their editors. JSON is whitespace-insensitive but syntax highlighting it in my editor works fine; I don't know or care what the parser implementation that accomplishes that is, but it's never caused any problems I've heard of.


I really don't get what you mean. HTML and JSON behave essentially the same way in relation to spaces. It's you who seems to be asking for the HTML parsers to apply display logic in the parsing step. And sure, JSON parsers discard whitespace information outside of JSON strings, but that only works because JSON has an explicit string type. In HTML everything is a user-visible string unless it's a tag, so the same logic fundamentally can't be applied.
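Python's standard `json` module shows the distinction being drawn here: whitespace between tokens is discarded, so differently formatted documents parse to equal values, while whitespace inside a string literal is part of the value. The original formatting does not survive `loads` followed by `dumps`.

```python
import json

compact = '{"msg":"a   b","n":[1,2]}'
spaced = '{\n  "msg" : "a   b",\n  "n" : [ 1, 2 ]\n}'

# Whitespace outside strings is insignificant: both parse to the same value.
print(json.loads(compact) == json.loads(spaced))  # True

# Whitespace inside a string is data and is kept exactly.
print(repr(json.loads(spaced)["msg"]))  # 'a   b'

# Re-serializing does not reproduce the original layout.
print(json.dumps(json.loads(spaced)) == spaced)  # False
```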

In fact JSON is the perfect example - if you have multiple spaces or \n in a JSON string and load that into some DOM element with JS at runtime, those spaces will be eaten up just as much by the browser renderer as any spaces that were part of the original HTML. Because, again, HTML and even the DOM don't do any kind of space collapsing; only the browser render step does that, as instructed by CSS.


> JSON parsers discard whitespace information outside of JSON strings, but that only works because JSON has an explicit string type. In HTML everything is a user-visible string unless it's a tag, so the same logic fundamentally can't be applied.

Well, sure. The point is that's an unfortunate design.


But it's a core part of the concept, the whole idea behind a markup language. Basically the whole point of HTML, and even of SGML before it, is that you are adding annotations in-line in a text, not representing a text as a tree-like data structure, at least for much of it.


Huh?

Loads of HTML has to do with presentation.

b, colspan, map, etc


The b tag has been deprecated for about 20 years. https://developer.mozilla.org/en-US/docs/Web/HTML/Element/b

The colspan attribute is not even a tag and is an artifact of organization, not presentation.

I suggest you read more about what these things are in HTML, the history of HTML, accessibility, CSS, and on and on and on...


The b tag is not and never has been marked deprecated. [1]

Colspan is an HTML attribute, yes. And column-span is a CSS property. HTML has many such duplications.

Might I suggest a healthy dose of your recommendation.

[1] https://html.spec.whatwg.org/multipage/text-level-semantics....



