As much as i would like to see comments in json: if we start throwing around jso...

coldtea · on March 2, 2014

>Are we seriously going to keep insisting on json as a configuration format?

Yes. It has good universal support, often without needing any libraries, it's simple, succinct, and has good tooling.

>As Stormbrew already pointed out, we already have a format that is ideal for configurations (and sure, data exchange, why not), and it is called yaml.

Let's just not go there. YAML is a pain in the ass to parse, has different incompatible versions, the libraries are of widely varying quality, is not natively (without third party stuff) supported in most languages, and it's generally a mess.

clarkevans · on March 2, 2014

There are lots of problems with YAML; it does too much. If I had time, I'd definitely want to do a 2.0 that gives it a small haircut, removing the most bothersome of problems. I've been unable to find time for the work required: about a year of discussing, writing, coding, testing, packaging, and forging consensus.

What I'd keep in YAML is the information model. When we started YAML ~12 years ago, it was obvious that configuration should be in XML, and that XML's information model was the correct way to organize data structures. Part of YAML's work was explaining a different way of doing things to those who'd otherwise use XML. This isn't a concern these days...

That said, the productions look painful because the specification doesn't separate the scanner from the parser. Once you do that, the syntax is quite a bit more sane to grok (see PyYAML source). It's not nearly as bad as what you may think... IF you see it this way.

Besides a few unfortunate syntax structures, YAML's complexity and sharp edges come from its venture into typed objects, type-spaces, and implicit typing. Much of this, for configuration files, is unnecessary.

When we wrote YAML, we only had a few years experience with it; and well, it wasn't done as a full time endeavour. It was a guess as to how things should work. It wasn't easy to bootstrap YAML. It's now ten years later... and, well, lots of people have experience with it. It's probably time for the haircut.

jrochkind1 · on March 2, 2014

oh, please do this.

I love yaml -- except on the rare but very painful occasions where I get hugely bitten by incredibly weird problems arising from unexpected interplay of yaml features.

Some of the originators of yaml are probably the only people who have the social power to promulgate a revision with a haircut. That would be awesome.

lifeisstillgood · on March 2, 2014

So if there were people willing to take on the lion's share of the year's work of consensus forming, what would it take to get you on board with a YAML haircut? What forms of decision making would make it attractive to you?

clarkevans · on March 2, 2014

That's a great question; those who only participate at the periphery can hardly expect significant influence.

lifeisstillgood · on March 2, 2014

Perhaps a different question then - if such a group were to form, whose consensus would they need to pull in, and what rough roadmap would you suggest they follow? What, in short, is going to hurt and when should they duck?

coldtea · on March 2, 2014

Thanks for the response.

I like the promise of YAML, but having tried it on a couple of projects, I found all these issues, which prevent it from being the simple, turn-key solution that JSON can be (at least for simple needs).

How do you feel about TOML? I find it a sane compromise, at least for configuration file needs.

clarkevans · on March 3, 2014

YAML uses dash (-) and colon (:) and white-space for a reason, it's not a random selection of structural markers.

burntsushi · on March 2, 2014

How about TOML? There was an implementation for most languages under the sun within a week of its release.[1]

(Certainly I acknowledge that it is not as ubiquitous as JSON or YAML, but I do like TOML better than both of those for configuration.)

[1] - https://github.com/mojombo/toml#implementations

comex · on March 2, 2014

YAML looks nice, but it's very overcomplicated for what it's usually used for [1]. Nobody wants to figure out what %TAG directives or "|", "|-", and "|+" at the end of lines mean, the difference between the folded style and the literal style, etc. just to read a configuration file.

I don't like JSON either due to the comment issue; simple ad-hoc configuration formats like most C programs seem to have mostly work, but aren't as nice as a standard format. If anything, I like configuration files expressed as scripts in whatever language the program is written in, since they're very flexible (if I want 100 almost-identical entries for whatever reason, I can say so in the file rather than writing a separate generator), and while programming languages are complicated, people tend to already know them; but that does tie you to a specific language.

[1] http://www.yaml.org/spec/1.2/spec.html

stormbrew · on March 2, 2014

I think this is an entirely valid criticism of yaml. I'd absolutely support there being a simplified form of yaml (YAML The Good Parts?) that covers what people actually want to do and doesn't try to be a swiss army knife object serialization format.

Ygg2 · on March 2, 2014

Problem is people don't agree on Good Parts. People seem to think keeping YAML a superset of JSON is the good part. I'd think otherwise.

Anyway, http://ogdl.org/ is one candidate for YTGP (YAML The Good Parts), but its comments can carry metadata, which is a huge turn-off. Others think TOML (https://github.com/mojombo/toml) is a good replacement, but it has no support for alternate number types. You have to rewrite something like

   mask = 0xDEADBEEF

into something like

   mask = 3735928559
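
As a quick check of the arithmetic in that example, here is a minimal Python sketch. Python is used only for illustration and is not a TOML parser; the variable names are made up. It confirms that the two spellings denote the same integer, so the loss is readability rather than information:

```python
# The hexadecimal spelling and the decimal spelling are the same integer.
mask_hex = 0xDEADBEEF
mask_dec = 3735928559

print(mask_hex == mask_dec)  # True
print(hex(mask_dec))         # 0xdeadbeef
```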

philwelch · on March 2, 2014

Ideally you would have a strict subset of YAML rather than a totally different language for compatibility reasons. You can call it the Friendly, Readable, Declarative YAML standard so we can have a standards war that is FRDY vs. JSON.

Ygg2 · on March 2, 2014

That movie was horrible :P Would not like to watch again.

stormbrew · on March 2, 2014

Hah. I love it.

georgewfraser · on March 1, 2014

Json originally had comments, they were intentionally removed:

https://plus.google.com/app/basic/stream/z12ztpczbxrdglfgl04...

swift · on March 2, 2014

Crockford's explanation is pretty absurd, though.

nitrogen · on March 2, 2014

It makes perfect sense -- if you provide a freeform content section of an otherwise strictly formed document (for interoperability reasons), then people will abuse it to store arbitrary, uninteroperable data (as was seen with binary blobs being dumped into XML).

The point of a standardized serialization format is well-defined parsing semantics and universal interoperability.

coldtea · on March 2, 2014

>It makes perfect sense -- if you provide a freeform content section of an otherwise strictly formed document (for interoperability reasons), then people will abuse it to store arbitrary, uninteroperable data (as was seen with binary blobs being dumped into XML).

That will then be their own bloody problem, not Crockford's.

makomk · on March 2, 2014

Except that in practice, this has just meant that people have defined their own ad-hoc extensions to JSON that add support for comments, since it's so useful for stuff like JSON-formatted config files.

TazeTSchnitzel · on March 2, 2014

It's also meant that JSON documents don't have data encoded in comments. Working as intended.

einhverfr · on March 2, 2014

Indeed. I see SQL comments and Postgres COMMENT ON statements used to send information to applications. Really funny, that....

troels · on March 2, 2014

And you can't base64 encode it and store it in a string?

rat87 · on March 2, 2014

I thought xml was developed to be able to include binary parts on purpose since it can be useful.

balls187 · on March 2, 2014

How so?

Also "Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser."

Seems like a perfectly fine way to have comments (if you absolutely need them) in a production environment.
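
To make the "strip the comments, then parse" idea concrete, here is a rough Python sketch. It is not JSMin and makes no claim about how JSMin behaves; `strip_json_comments` is a hypothetical helper written for this illustration. It assumes comments use `//` or `/* */` and leaves anything inside a string literal alone.

```python
import json


def strip_json_comments(text):
    """Remove // and /* */ comments that appear outside string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character verbatim so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Line comment: skip to the end of the line, keeping the newline.
            while i < n and text[i] != "\n":
                i += 1
        elif text.startswith("/*", i):
            # Block comment: skip past the closing */ (or to the end if unterminated).
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


annotated = """
{
    // where the service listens
    "url": "http://example.com/path",  /* the // inside the string is kept */
    "retries": 3
}
"""

config = json.loads(strip_json_comments(annotated))
print(config)
```

The output of the stripping step is plain JSON, so any standard parser can read it; the cost is that every tool touching the file needs the same preprocessing step.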

DougBTX · on March 2, 2014

Presumably JSMin removes optional quotes in object literals, so it probably doesn't output valid JSON.

spellboots · on March 2, 2014

That would be a pretty big mistake for the author of both the JSON spec and JSMin to make? Maybe it is but it seems unlikely.

reverius42 · on March 2, 2014

JSMin is not the right tool for this. I'm sure it conforms to the JavaScript (ECMAScript) spec but probably not the JSON spec. Here's a trivial JavaScript function to convert JSON5 to regular JSON with no comments and quoted identifiers and all that good stuff:

function JSON5_to_JSON(str) { return JSON.stringify(JSON5.parse(str)); }

This is exactly what is suggested in the Usage section of the linked article.

spellboots · on March 2, 2014

If the author of the JSON spec and JSMin says it's the right tool for the job, I am inclined to trust him on that barring further evidence that it is not...

reverius42 · on March 8, 2014

Since when does Crockford say that JSMin is the right tool for this particular job: translating JSON5 into regular JSON?

jrpt · on March 2, 2014

Yeah but lots of people disagree with his decision. Lack of comments makes it worse for config files.

X-Istence · on March 1, 2014

YAML is annoying, can't use tabs, you have to use spaces... and most yaml parsers if they see a tab don't warn you or anything.

Highly annoying for configuration files.
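
A warning of the kind being asked for is cheap to approximate outside the parser. The following is only a sketch in Python, not part of any YAML library: `find_tab_indented_lines` is a hypothetical helper that reports lines whose leading whitespace contains a tab, so the file can be checked before it is handed to a parser.

```python
def find_tab_indented_lines(text):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    offenders = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Slice off everything after the leading run of spaces and tabs.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            offenders.append(number)
    return offenders


sample = "server:\n  host: example.com\n\tport: 8080\n"
print(find_tab_indented_lines(sample))  # [3]
```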

coherentpony · on March 2, 2014

> YAML is annoying, can't use tabs, you have to use spaces

Are you serious? Would you put tabs in source code too?

> and most yaml parsers if they see a tab don't warn you or anything

If they warned you, you still couldn't use tabs; it's just that you'd be more aware that you can't use them.

It's not entirely clear to me what your point is. Perhaps it's that tabs are your preference. Unfortunately, spaces are the preferred whitespace marker for 99% of programmers.

Also, pulled directly from the YAML FAQ [1]:

    Why does YAML forbid tabs?

    Tabs have been outlawed since they are treated differently by different editors and tools.
    And since indentation is so critical to proper interpretation of YAML, this issue is just too
    tricky to even attempt.  Indeed Guido van Rossum of Python has acknowledged that allowing TABs
    in Python source is a headache for many people and that were he to design Python again, he
    would forbid them.

[1]: http://www.yaml.org/faq.html

coldtea · on March 2, 2014

>Are you serious? Would you put tabs in source code too?

Of course. Why the fuck wouldn't I put tabs in source code?

We INDENT the code pressing tab. We don't press 4 spaces. Why shouldn't the code reflect that? And tabs are symbolic (logical entities), so they are customizable.

You suggest we'd rather use the elaborate kludges to handle spaces as tabs in the editor? What year is this? 1978?

collyw · on March 2, 2014

Because tabs are a pain in the arse when you open them in a different editor and neat code suddenly becomes unreadable as the indentation gets messed up.

Most editors have a "tab button prints spaces" option that isn't too difficult to find. (Plenty of them have a format code option as well, so it isn't such a big deal, but overall I find it easier just to switch everything to spaces.)

coldtea · on March 3, 2014

>Because tabs are a pain in the arse when you open them in a different editor and neat code suddenly becomes unreadable as the indentation gets messed up.

I never had that experience, and I've used Vim, Eclipse, ST3, TextMate and BBEdit. How does that ever happen?

A tab is a tab, no matter the editor. One might be set to show it as 8 chars wide or 4 chars wide etc (since it's a logical unit), but no indentation gets "messed up".

If by indentation you mean: "variables arranged to start at the same point because the programmer has OCD", maybe. But no declarations or indentation that matters, like braces etc., ever changes.

>Most editors have a "tab button prints spaces" option

Yes. The back-to-the-seventies elaborate kludge I've already mentioned. It's 2014.

collyw · on March 3, 2014

It happens when you open up code written by others. That's when I usually see it.

In that case you would have to set your editor to display 8 spaces per tab, or 4 or whatever the code editor where the code was written was using. I don't see that being any less of a kludge than changing it to do spaces when a tab is pressed.

I have given you a reason why I prefer spaces over tabs. Your reasoning seems to be "because I can". Then you start harking back to the 70's despite that part of your comment being irrelevant.

Is there an actual reason you prefer tabs over spaces, given that you can't visually tell the difference most of the time?

collyw · on March 3, 2014

Also as I mainly use Python, it is the preferred way to indent.

http://legacy.python.org/dev/peps/pep-0008/#tabs-or-spaces

raverbashing · on March 2, 2014

"spaces are the preferred whitespace marker for 99% of programmers."

99% of which programmers?

The Linux Kernel uses tabs for indenting

eq- · on March 2, 2014

Well, I use spaces for marking whitespace in many cases. Just not for indenting, that's what tabs are for :).

stormbrew · on March 2, 2014

I think you might overstate the percentage here, but I think the overall sentiment is true. But I've always thought a better approach would be to ban combining leading tabs with leading spaces at the file level and leaving the choice up to the user beyond that. I believe python3 actually takes this approach.

But I am glad that YAML made any choice at all. Allowing mixing is absolutely the worst possible option.

GeneralMayhem · on March 2, 2014

ahem

http://www.emacswiki.org/SmartTabs

stormbrew · on March 2, 2014

It's a lovely theory, but put into practice it's a lot of work to maintain, especially with more than one editor. I was a proponent of this concept for a long time, but it's just not worth it [1]. If you want to be able to align the leading edge to something other than a tab stop you are better off from a practical perspective to just enforce the use of spaces.

[1] Think in particular of when code moves around. Sometimes it moves to a place where the indent level is different but the total number of expanded spaces is the same. This looks fine until you look in another editor.
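
The failure mode in that footnote can be reproduced with Python's `str.expandtabs`. This is a small sketch with made-up lines: one uses a tab for indentation plus four spaces for alignment, the other uses two tabs. They render identically at a tab width of 4 and diverge at a width of 8.

```python
line_with_alignment = "\t    call()"  # one tab plus four alignment spaces
line_with_two_tabs = "\t\tcall()"     # two tabs

for width in (4, 8):
    a = line_with_alignment.expandtabs(width)
    b = line_with_two_tabs.expandtabs(width)
    # Compare what each editor setting would actually display.
    print(width, a == b)
```

At width 4 both lines start at column 8, so the mismatch is invisible; at width 8 they start at columns 12 and 16.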

GeneralMayhem · on March 3, 2014

>Sometimes it moves to a place where the indent level is different but the total number of expanded spaces is the same.

This is exactly why Smart Tabs is the only intelligent way to do it - if you change the width of a tab, everything resizes instantly to match the current user's aesthetics, but things that need to be aligned stay aligned. The only case in which it breaks is if you switch to a non-monospaced font, and in that case, God help you.

eq- · on March 2, 2014

Just use a capable editor. It's 2014, for heaven's sake!

X-Istence · on March 2, 2014

The problem occurs when you are SSH'ed to a machine you are setting up, or one that is customer owned, you use whatever editor is around to make a quick edit, not realising that the file labeled .conf is actually a YAML file, you hit tab because it makes logical sense and then shit breaks.

No thanks, YAML is a terrible choice for configuration files.

GeneralMayhem · on March 3, 2014

If you're connecting to a machine over SSH and you're making enough changes that it's going to be difficult to maintain an indentation scheme, open it up on a local editor with SFTP. Emacs supports it out of the box, and Sublime has an excellent nagware SFTP package in the standard repos.

derefr · on March 2, 2014

All YAML files are required to begin with "---". If you're aware of what YAML is, and you've seen YAML before, and yet you somehow "aren't aware" that something is a YAML file, you're incompetent.

X-Istence · on March 2, 2014

There is no requirement for a YAML file to begin with "---".

Look at the database.yml files that come with Rails applications, or look at the +MANIFEST file that pkgng requires on FreeBSD.

Those are yaml files, no triple dashes.

Calling me incompetent isn't helping your case either.

cben · on March 7, 2014

Which is precisely why tabs-only is a problem: tabs cannot express character-granularity alignment, so you HAVE to mix them with spaces in this way. (Or give up pretty alignment in favor of fixed indentation for continued expressions, which I'm not willing to do.)

Spaces just get out of my way.

P.S. I'm undoubtedly biased by Python. The community's agreement on spaces and specifically 4 spaces has been a pure blessing.

X-Istence · on March 2, 2014

Because not every editor I use will automatically turn the tab into spaces. This is especially a problem when I am editing a YAML file over SSH using vi or nano, or pico or thousands of other editors.

They stick the damn tab in, then the program that is parsing said YAML file does the wrong thing and I am confused, annoyed, and most times really pissed off because I will spend hours trying to figure out what went wrong.

YAML is stupid for configuration files.

Dewie · on March 2, 2014

Tabs vs. spaces... in some respects programming is so incredibly primitive.

vithlani · on March 4, 2014

So are "seeing" and "hearing".

I for one am looking forward to the time when neural implants will translate my high level programming thoughts into microcode sent directly to the CPU for execution.

ballard · on March 2, 2014

Yeap, opposite of makefiles. I opt for TOML. YAML is great for fixtures.

wisty · on March 2, 2014

Or you can do this:

config = { "version": "1.0", "comment": "JSON has comments too!" }

jawngee · on March 2, 2014

I hope you're joking.

kgabis · on March 2, 2014

Why? It's a valid solution.

lowboy · on March 2, 2014

Mixing metadata in with data isn't ideal.

yxhuvud · on March 2, 2014

Mixing metadata with a serialization format isn't ideal. Want metadata? Use a markup language.

coldtea · on March 2, 2014

Actually it is. Then you can treat them the same way.

And since it's a configuration format, you know what keys are accepted and what keys are for the metadata, and thus won't clash.

zimbatm · on March 2, 2014

But metadata is also data.

arunoda · on March 2, 2014

this is what is recommended for package.json in node. Specifically "//" as the key.
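
For readers who want to see how a program might consume such a file, here is a small Python sketch. `load_without_comments` and the choice of reserved keys are assumptions made for this illustration, not part of any package.json tooling. It parses the JSON normally and then drops the top-level keys set aside for comments.

```python
import json

raw = '{"//": "JSON has comments too!", "name": "example", "version": "1.0"}'


def load_without_comments(text, comment_keys=("//", "comment")):
    """Parse JSON and drop top-level keys that are reserved for comments."""
    data = json.loads(text)
    return {key: value for key, value in data.items() if key not in comment_keys}


print(load_without_comments(raw))  # {'name': 'example', 'version': '1.0'}
```

The trade-off raised in the replies still applies: the comment lives in the data, so the application has to know which keys to ignore.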

mpyne · on March 3, 2014

That's what we did for some of our KDE build infrastructure metadata, since there was no way I was using YAML if I could avoid it for something this simple.

harshreality · on March 2, 2014

> As Stormbrew already pointed out, we already have a format that is ideal for configurations (and sure, data exchange, why not), and it is called yaml.

Unfortunately YAML for untrusted input and data exchange is unsafe by default, depending on the language and implementation. A flag might need to be set, or extra modules included like SafeYAML[1], to keep YAML from instantiating arbitrary objects.

[1] https://github.com/dtao/safe_yaml

rat87 · on March 2, 2014

I thought the problem wasn't with yaml itself but with allowing arbitrary objects to be deserialized, which is unsafe by default for a format used for both 'trusted' and 'untrusted' input. You'd have the same problem with a json library that tried to allow deserializing arbitrary objects by default (with a load rather than an unsafe_load method). Python's pickle serialization is unsafe, but it warns you that it's unsafe and is not widely used, so it doesn't end up being used as a serialization format for untrusted input.
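
The pickle point can be shown with the standard library alone. This is a deliberately harmless Python sketch: `Payload` is a made-up class whose `__reduce__` tells pickle to call `len("hello")` on load. A hostile payload could name a far less benign callable, which is why the pickle documentation warns against loading untrusted data.

```python
import pickle


class Payload:
    """Stand-in for attacker-controlled data; the callable here is harmless."""

    def __reduce__(self):
        # pickle will call len("hello") when this object is loaded.
        return (len, ("hello",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)

print(result)                       # 5
print(isinstance(result, Payload))  # False
```

Loading the blob runs a function chosen by whoever produced it and returns that function's result, not a `Payload` instance.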

crdoconnor · on March 2, 2014

>The problem with YAML is it's not safe by default

Why would that be an issue when using it as a configuration format?

coldtea · on March 2, 2014

Because the parsers for that configuration format are unsafe too? Duh!

alptrv · on March 1, 2014

I've seen this comment for many years in almost every discussion about using JSON for configuration. While YAML is certainly used by many projects, most people still continue to use JSON, and I think that's because YAML sometimes feels like the Scala of serialization markups when people just want something like Python. I personally think that TOML[0] is not only simpler but also as easy to read as YAML, and we use it in our projects without any issues.

0. https://github.com/mojombo/toml

stormbrew · on March 2, 2014

TOML is ok too, though I still like YAML better personally. I honestly didn't really like ini files much at any point, even when everything on windows used them, so TOML starts from an unhappy premise for me.

derefr · on March 1, 2014

If web browsers had native YAML parsing, we probably wouldn't need JSON5. Web browsers aren't going to get native YAML parsing.

stormbrew · on March 1, 2014

They probably aren't going to get native json5 parsing either (except in the sense that you can do something stupid like eval it). That said, I don't think there's any particular need for yaml in the browser. Browser code is usually dealing with machine-generated data, where even normal json is just fine.

derefr · on March 1, 2014

Rich client-side apps need configuration files too.

yeukhon · on March 1, 2014

If this was merely a configuration (say for node.js) which sits on the server, then you'd probably just import an npm package to read yaml (I don't write node.js so I don't know if using .yaml is feasible as a node.js config file or not). So the use case is limited.

I used to work on a project in which users could edit a configuration file through a web editor and we chose YAML because writing JSON by hand is painful (I hate the comma error!). But we processed this YAML file for the user on the server side, so having a native YAML parser in the browser and Javascript wouldn't really help me at all.
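
The "comma error" is easy to reproduce. Here is a minimal Python sketch using the standard `json` module; the config string is invented for the example. A single trailing comma, a very common slip when editing by hand, makes the whole document fail to parse.

```python
import json

hand_written = '{"name": "example", "debug": true,}'  # note the trailing comma

try:
    json.loads(hand_written)
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    # The exact message varies between Python versions, so it is only printed.
    print(f"rejected: {err.msg} at line {err.lineno}, column {err.colno}")

print(parsed_ok)  # False
```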

derefr · on March 1, 2014

No, I'm talking about configuration files that get interpreted by code that is executing within the browser.

For example, a configuration-data file format for specifying a "brush" in a JS-client paint program. That'd obviously be a schema on top of JSON, right? Well, now you've got all of JSON's inherent limitations.

j03w · on March 2, 2014

Then you don't need JSON, all you need is the plain old JS object.

derefr · on March 2, 2014

You're thinking of the live representation of the "model" of a brush in the program. I'm talking about the "definition" of a brush, from which the program loads that model. Another example would be, say, a "level" in an HTML5 game. These things ship alongside the game as blobs of data. Those blobs need a format that the browser can parse. Currently, JSON is that format, and it's inadequate for that.

novaleaf · on March 3, 2014

don't need native json5 parsing. just load the lib up. the problem with yaml is that there isn't any safe+browser version that I know of.

drdaeman · on March 1, 2014

> Web browsers aren't going to get native YAML parsing.

Because of what?

SimHacker · on March 2, 2014

YAML failed the intelligence test of getting its name right in the first place: "Yet Another Markup Language". YAML is NOT a markup language, in any way shape or form, yet the people who designed and named it earnestly thought they were designing a markup language, and that there was a need for yet another one.

Only when somebody pointed out that obvious fact to them, did they come up with a recursive retronym to paper over their initial stupidity: "YAML Ain't Markup Language". How clever by half.

I prefer to use formats that were designed by people who actually knew what they were doing and what it was called and how it was meant to be used.

daGrevis · on March 2, 2014

We tried to use YAML at first, but problems with data types (can't correctly remember what, but there was no way to force something to be something) made us rewrite our test fixtures to JSON. The only problem with JSON for us is that it doesn't support any comments, but JSON5 seems to fix it.

There's an interesting format for configuration called TOML[1], you should check it out!

[1] https://github.com/mojombo/toml

kazagistar · on March 2, 2014

JSON Schema has a ton of implementations and tools hanging off of it. It is possible to load and validate a config, and then display a reasonably nice UI for editing it (in such a way that the resulting state is also valid), all by creating a single declarative JSON Schema. Nothing comparable exists for YAML as far as I can tell, and by virtue of its complexity, it is unlikely to exist for quite some time.

malkia · on March 1, 2014

Whitespaces. These are the real killers.

SixSigma · on March 2, 2014

s-expressions