As much as I would like to see comments in json:
if we start throwing around json files that are not really json, but we call them json (at least in everyday talk), we will end up breaking more apps than we fix.
Maybe the question is instead: why the hell do we need comments (and loosening of the syntax, etc.) in the first place?
Are we seriously going to keep insisting on json as a configuration format?
As Stormbrew already pointed out, we already have a format that is ideal for configurations (and sure, data exchange, why not), and it is called yaml.
yaml has comments
yaml makes it easy to enter multiline strings
and most of all: yaml is very very easy to write!
tl;dr: Just use a format suited for your needs instead of trying to change something that isn't. Oh, and a couple of smiley faces thrown in there to ensure people don't read this in the wrong tone. People do that.. Like, all the time.. damn, now my tl;dr is too damn long! I have to add another.
>Are we seriously going to keep insisting on json as a configuration format?
Yes. It has good universal support, often without needing any libraries, it's simple, succinct, and has good tooling.
>As Stormbrew already pointed out, we already have a format that is ideal for configurations (and sure, data exchange, why not), and it is called yaml.
Let's just not go there. YAML is a pain in the ass to parse, has different incompatible versions, has libraries of widely varying quality, is not natively (without third-party stuff) supported in most languages, and is generally a mess.
There are lots of problems with YAML; it does too much. If I had time, I'd definitely want to do a 2.0 that gives it a small haircut, removing the most bothersome problems. I've been unable to find time for the work required: about a year of discussing, writing, coding, testing, packaging, and forging consensus.
What I'd keep in YAML is the information model. When we started YAML ~12 years ago, it was obvious that configuration should be in XML, and that XML's information model was the correct way to organize data structures. Part of YAML's work was explaining a different way of doing things to those who'd otherwise use XML. This isn't a concern these days...
That said, the productions look painful because the specification doesn't separate the scanner from the parser. Once you do that, the syntax is quite a bit more sane to grok (see PyYAML source). It's not nearly as bad as what you may think... IF you see it this way.
Besides a few unfortunate syntax structures, YAML's complexity and sharp edges come from its venture into typed objects, type-spaces, and implicit typing. Much of this, for configuration files, is unnecessary.
When we wrote YAML, we only had a few years' experience with it; and well, it wasn't done as a full time endeavour. It was a guess as to how things should work. It wasn't easy to bootstrap YAML. It's now ten years later... and, well, lots of people have experience with it. It's probably time for the haircut.
I love yaml -- except on the rare but very painful occasions where I get hugely bitten by incredibly weird problems arising from unexpected interplay of yaml features.
Some of the originators of yaml are probably the only people who have the social power to promulgate a revision with a haircut. That would be awesome.
So if there were people willing to take on the lion's share of the year's work of consensus forming, what would it take to get you on board with a YAML haircut? What forms of decision making would make it attractive to you?
Perhaps a different question, then: if such a group were to form, whose consensus would they need to pull in, and what rough roadmap would you suggest they follow? What, in short, is going to hurt, and when should they duck?
I like the promise of YAML, but having tried it on a couple of projects, I found all these issues which prevent it from being the simple, turn-key solution that JSON can be (at least for simple needs).
How do you feel about TOML? I find it a sane compromise, at least for configuration file needs.
YAML looks nice, but it's very overcomplicated for what it's usually used for [1]. Nobody wants to figure out what %TAG directives or "|", "|-", and "|+" at the end of lines mean, the difference between the folded style and the literal style, etc. just to read a configuration file.
I don't like JSON either due to the comment issue; simple ad-hoc configuration formats like most C programs seem to have mostly work, but aren't as nice as a standard format. If anything, I like configuration files expressed as scripts in whatever language the program is written in, since they're very flexible (if I want 100 almost-identical entries for whatever reason, I can say so in the file rather than writing a separate generator), and while programming languages are complicated, people tend to already know them; but that does tie you to a specific language.
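To make the "configuration as a script" idea concrete, here is a minimal sketch in JavaScript. The names makeConfig, logLevel, workers, id and port are all invented for illustration; the only point is that a loop replaces 100 hand-written entries, and comments come for free.

```javascript
// Sketch: a configuration written as code rather than as a data file.
// makeConfig and every field name below are hypothetical.
function makeConfig() {
  const basePort = 9000; // a script-based config can carry comments anywhere
  return {
    logLevel: "info",
    // 100 almost-identical entries, generated instead of typed out.
    workers: Array.from({ length: 100 }, (_, i) => ({
      id: `worker-${i}`,
      port: basePort + i,
    })),
  };
}

const generated = makeConfig();
console.log(generated.workers.length, generated.workers[99]);
```

The trade-off is the one already mentioned: the file is now only readable by something that can execute this language.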
I think this is an entirely valid criticism of yaml. I'd absolutely support there being a simplified form of yaml (YAML The Good Parts?) that covers what people actually want to do and doesn't try to be a swiss army knife object serialization format.
Problem is people don't agree on Good Parts. People seem to think keeping YAML a superset of JSON is the good part. I'd think otherwise.
Anyway, http://ogdl.org/ is one candidate for YTGP (YAML The Good Parts), but its comments can carry metadata, which is a huge turn-off. Others think TOML (https://github.com/mojombo/toml) is a good replacement, but it has no support for alternate number types. You write in something like
Ideally you would have a strict subset of YAML rather than a totally different language for compatibility reasons. You can call it the Friendly, Readable, Declarative YAML standard so we can have a standards war that is FRDY vs. JSON.
It makes perfect sense -- if you provide a freeform content section of an otherwise strictly formed document (for interoperability reasons), then people will abuse it to store arbitrary, uninteroperable data (as was seen with binary blobs being dumped into XML).
The point of a standardized serialization format is well-defined parsing semantics and universal interoperability.
>It makes perfect sense -- if you provide a freeform content section of an otherwise strictly formed document (for interoperability reasons), then people will abuse it to store arbitrary, uninteroperable data (as was seen with binary blobs being dumped into XML).
That will then be their own bloody problem, not Crockford's.
Except that in practice, this has just meant that people have defined their own ad-hoc extensions to JSON that add support for comments, since it's so useful for stuff like JSON-formatted config files.
Also "Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser."
Seems like a perfectly fine way to have comments (if you absolutely need them) in a production environment.
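For anyone wondering what "strip the comments, then parse" looks like in practice, here is a rough sketch. It is not JSMin and makes no claim to behave like it; stripJsonComments and parseCommentedJson are hypothetical helpers that only remove // and /* */ comments outside string literals before calling the built-in JSON.parse.

```javascript
// Sketch: remove // and /* */ comments from JSON-like text, leaving string
// literals untouched, then hand the result to the built-in JSON.parse.
// stripJsonComments and parseCommentedJson are hypothetical helper names.
function stripJsonComments(text) {
  let out = "";
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    const next = text[i + 1];
    if (ch === '"') {
      // Copy a string literal verbatim, honoring backslash escapes.
      let j = i + 1;
      while (j < text.length && text[j] !== '"') {
        j += text[j] === "\\" ? 2 : 1;
      }
      out += text.slice(i, j + 1);
      i = j + 1;
    } else if (ch === "/" && next === "/") {
      // Line comment: skip up to (but not including) the newline.
      while (i < text.length && text[i] !== "\n") i++;
    } else if (ch === "/" && next === "*") {
      // Block comment: skip past the closing */ (or to the end if unclosed).
      const end = text.indexOf("*/", i + 2);
      i = end === -1 ? text.length : end + 2;
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

function parseCommentedJson(text) {
  return JSON.parse(stripJsonComments(text));
}

const serverConfig = parseCommentedJson(`{
  // where to listen
  "url": "http://localhost:8080/", /* slashes inside strings survive */
  "retries": 3
}`);
console.log(serverConfig.url, serverConfig.retries);
```

Anything beyond comments (trailing commas, unquoted keys) still fails in JSON.parse, which is arguably the point: the file stays plain JSON apart from the annotations.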
JSMin is not the right tool for this. I'm sure it conforms to the JavaScript (ECMAScript) spec but probably not the JSON spec. Here's a trivial JavaScript function to convert JSON5 to regular JSON with no comments, with quoted identifiers, and all that good stuff:
function JSON5_to_JSON(str) {
  // JSON5.parse returns a plain value; the built-in JSON.stringify serializes it.
  return JSON.stringify(JSON5.parse(str));
}
This is exactly what is suggested in the Usage section of the linked article.
If the author of the JSON spec and JSMin says it's the right tool for the job, I am inclined to trust him on that barring further evidence that it is not...
> YAML is annoying, can't use tabs, you have to use spaces
Are you serious? Would you put tabs in source code too?
> and most yaml parses if they see a tab don't warn you or anything
If they warned you, you still couldn't use tabs; you'd just be more aware that you can't use them.
It's not entirely clear to me what your point is. Perhaps it's that tabs are your preference. Unfortunately, spaces are the preferred whitespace marker for 99% of programmers.
Also, pulled directly from the YAML FAQ [1]:
Why does YAML forbid tabs?
Tabs have been outlawed since they are treated differently by different editors and tools.
And since indentation is so critical to proper interpretation of YAML, this issue is just too
tricky to even attempt. Indeed Guido van Rossum of Python has acknowledged that allowing TABs
in Python source is a headache for many people and that were he to design Python again, he
would forbid them.
>Are you serious? Would you put tabs in source code too?
Of course. Why the fuck wouldn't I put tabs in source code?
We INDENT the code pressing tab. We don't press 4 spaces. Why shouldn't the code reflect that? And tabs are symbolic (logical entities), so they are customizable.
You suggest we'd rather use the elaborate kludges to handle spaces as tabs in the editor? What year is this? 1978?
Because tabs are a pain in the arse when you open them in a different editor and neat code suddenly becomes unreadable as the indentation gets messed up.
Most editors have a "tab button prints spaces" option that isn't too difficult to find. (Plenty of them have a format code option as well, so it isn't such a big deal, but overall I find it easier just to switch everything to spaces.)
>Because tabs are a pain in the arse when you open them in a different editor and neat code suddenly becomes unreadable as the indentation gets messed up.
I never had that experience, and I've used Vim, Eclipse, ST3, TextMate and BBEdit. How does that ever happen?
A tab is a tab, no matter the editor. One might be set to show it as 8 chars wide or 4 chars wide etc (since it's a logical unit), but no indentation gets "messed up".
If by indentation you mean: "variables arranged to start at the same point because the programmer has OCD", maybe. But no declarations or indentation that matters, like braces etc., ever changes.
>Most editors have a "tab button prints spaces" option
Yes. The back-to-the-seventies elaborate kludge I've already mentioned. It's 2014.
It happens when you open up code written by others. That's when I usually see it.
In that case you would have to set your editor to display 8 spaces per tab, or 4 or whatever the code editor where the code was written was using. I don't see that being any less of a kludge than changing it to do spaces when a tab is pressed.
I have given you a reason why I prefer spaces over tabs. Your reasoning seems to be "because I can". Then you start harking back to the '70s, despite that part of your comment being irrelevant.
Is there an actual reason you prefer tabs over spaces, given that you can't visually tell the difference most of the time?
I think you might overstate the percentage here, but I think the overall sentiment is true. But I've always thought a better approach would be to ban combining leading tabs with leading spaces at the file level and leave the choice up to the user beyond that. I believe Python 3 actually takes this approach.
But I am glad that YAML made any choice at all. Allowing mixing is absolutely the worst possible option.
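The "no mixing at the file level" rule is easy to sketch as a check. This is only an illustration of the policy described above, not how any particular parser or language implements it; checkIndentation is a made-up name.

```javascript
// Sketch of a file-level "no mixing" rule: every indented line must use the
// same leading whitespace character. checkIndentation is a hypothetical name.
function checkIndentation(source) {
  const problems = [];
  let fileStyle = null; // "tabs" or "spaces", set by the first indented line
  source.split("\n").forEach((line, index) => {
    const indent = line.match(/^[ \t]*/)[0];
    // Ignore unindented lines and lines that are only whitespace.
    if (indent.length === 0 || indent.length === line.length) return;
    const hasTab = indent.includes("\t");
    const hasSpace = indent.includes(" ");
    const lineNumber = index + 1;
    if (hasTab && hasSpace) {
      problems.push(`line ${lineNumber}: tabs and spaces mixed in one indent`);
      return;
    }
    const style = hasTab ? "tabs" : "spaces";
    if (fileStyle === null) {
      fileStyle = style;
    } else if (style !== fileStyle) {
      problems.push(`line ${lineNumber}: ${style} used in a ${fileStyle}-indented file`);
    }
  });
  return { style: fileStyle, problems };
}

const report = checkIndentation("if x:\n\tone()\n    two()\n\t  three()\n");
console.log(report);
```

Note that this deliberately flags tabs-then-spaces alignment too, so a codebase using that convention would need a looser rule.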
It's a lovely theory, but put into practice it's a lot of work to maintain, especially with more than one editor. I was a proponent of this concept for a long time, but it's just not worth it [1]. If you want to be able to align the leading edge to something other than a tab stop you are better off from a practical perspective to just enforce the use of spaces.
[1] Think in particular of when code moves around. Sometimes it moves to a place where the indent level is different but the total number of expanded spaces is the same. This looks fine until you look in another editor.
>Sometimes it moves to a place where the indent level is different but the total number of expanded spaces is the same.
This is exactly why Smart Tabs is the only intelligent way to do it - if you change the width of a tab, everything resizes instantly to match the current user's aesthetics, but things that need to be aligned stay aligned. The only case in which it breaks is if you switch to a non-monospaced font, and in that case, God help you.
The problem occurs when you are SSH'ed to a machine you are setting up, or one that is customer owned, and you use whatever editor is around to make a quick edit. Not realising that the file labeled .conf is actually a YAML file, you hit tab because it makes logical sense, and then shit breaks.
No thanks, YAML is a terrible choice for configuration files.
If you're connecting to a machine over SSH and you're making enough changes that it's going to be difficult to maintain an indentation scheme, open it up on a local editor with SFTP. Emacs supports it out of the box, and Sublime has an excellent nagware SFTP package in the standard repos.
YAML files commonly begin with "---" (the marker is optional, though). If you're aware of what YAML is, and you've seen YAML before, and yet you somehow "aren't aware" that something is a YAML file, you're incompetent.
Which is precisely why tabs-only is a problem: tabs cannot express character-granularity alignment, so you HAVE to mix them with spaces in this way. (Or give up pretty alignment in favor of fixed indentation for continued expressions, which I'm not willing to do.)
Spaces just get out of my way.
P.S. I'm undoubtedly biased by Python. The community's agreement on spaces and specifically 4 spaces has been a pure blessing.
Because not every editor I use will automatically turn the tab into spaces. This is especially a problem when I am editing a YAML file over SSH using vi or nano, or pico or thousands of other editors.
They stick the damn tab in, then the program that is parsing said YAML file does the wrong thing and I am confused, annoyed, and most times really pissed off because I will spend hours trying to figure out what went wrong.
I for one am looking forward to the time when neural implants will translate my high level programming thoughts into microcode sent directly to the CPU for execution.
That's what we did for some of our KDE build infrastructure metadata, since there was no way I was using YAML if I could avoid it for something this simple.
> As Stormbrew already pointed out, we already have a format that is ideal for configurations (and sure, data exchange, why not), and it is called yaml.
Unfortunately YAML for untrusted input and data exchange is unsafe by default, depending on the language and implementation. A flag might need to be set, or extra modules included like SafeYAML[1], to keep YAML from instantiating arbitrary objects.
I thought the problem wasn't with yaml itself but with allowing deserialization of arbitrary objects by default, which is unsafe for a format used for both 'trusted' and 'untrusted' input. You would have the same problem with a json library that allowed deserializing arbitrary objects by default (with a load method rather than an unsafe_load method). Python's pickle serialization is unsafe, but it warns you that it's unsafe, so it is not widely used as a serialization format for untrusted input.
I have seen this comment for many years in almost every discussion about using JSON for configuration. While YAML is certainly used by many projects, most people still continue to use JSON, and I think that's because YAML sometimes feels like the Scala of serialization markups when people just want something like Python. I personally think that TOML[0] is not only simpler but also as easy to read as YAML, and we use it in our projects without any issues.
TOML is ok too, though I still like YAML better personally. I honestly didn't really like ini files much at any point, even when everything on windows used them, so TOML starts from an unhappy premise for me.
They probably aren't going to get native json5 parsing either (except in the sense that you can do something stupid like eval it). That said, I don't think there's any particular need for yaml in the browser. Browser code is usually dealing with machine-generated data, where even normal json is just fine.
If this was merely a configuration (say for node.js) which sits on the server, then you would probably just pull in an npm package to read yaml (I don't write node.js so I don't know if using .yaml is feasible as a node.js config file or not). So the use case is limited.
I used to work on a project where users could edit a configuration file through a web editor, and we chose YAML because writing JSON by hand is painful (I hate the comma error!). But we processed this YAML file for the user on the server side, so having a native YAML parser in the browser and Javascript wouldn't really have helped me at all.
No, I'm talking about configuration files that get interpreted by code that is executing within the browser.
For example, a configuration-data file format for specifying a "brush" in a JS-client paint program. That'd obviously be a schema on top of JSON, right? Well, now you've got all of JSON's inherent limitations.
You're thinking of the live representation of the "model" of a brush in the program. I'm talking about the "definition" of a brush, from which the program loads that model. Another example would be, say, a "level" in an HTML5 game. These things ship alongside the game as blobs of data. Those blobs need a format that the browser can parse. Currently, JSON is that format, and it's inadequate for that.
YAML failed the intelligence test of getting its name right in the first place: "Yet Another Markup Language". YAML is NOT a markup language, in any way shape or form, yet the people who designed and named it earnestly thought they were designing a markup language, and that there was a need for yet another one.
Only when somebody pointed out that obvious fact to them, did they come up with a recursive retronym to paper over their initial stupidity: "YAML Ain't Markup Language". How clever by half.
I prefer to use formats that were designed by people who actually knew what they were doing and what it was called and how it was meant to be used.
We tried to use YAML at first, but problems with data types (I can't remember exactly what, but there was no way to force something to be something) made us rewrite our test fixtures in JSON. The only problem with JSON for us is that it doesn't support comments, but JSON5 seems to fix that.
There's an interesting format for configuration called TOML[1], you should check it out!
JSON Schema has a ton of implementations and tools hanging off of it. It is possible to load and validate a config, and then display a reasonably nice UI for editing it (in such a way that the resulting state is also valid), all by creating a single declarative JSON Schema. Nothing comparable exists for YAML as far as I can tell, and by virtue of its complexity, it is unlikely to exist for quite some time.
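As a rough illustration of what "a single declarative schema" buys you, here is a toy validator that understands only three keywords ("type", "required", "properties"). It is nowhere near a real JSON Schema implementation, and validate, typeOf, has and configSchema are names made up for this sketch.

```javascript
// Toy validator for a tiny subset of JSON Schema keywords ("type",
// "required", "properties"). An illustration only: real validators cover far
// more of the specification. All names here are hypothetical.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function typeOf(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "object", "string", "number", "boolean"
}

function validate(schema, value, path = "$") {
  const actual = typeOf(value);
  if (schema.type && actual !== schema.type) {
    // Wrong type: report it and skip the nested checks.
    return [`${path}: expected ${schema.type}, got ${actual}`];
  }
  const errors = [];
  if (actual === "object") {
    for (const key of schema.required || []) {
      if (!has(value, key)) errors.push(`${path}: missing required "${key}"`);
    }
    for (const [key, sub] of Object.entries(schema.properties || {})) {
      if (has(value, key)) errors.push(...validate(sub, value[key], `${path}.${key}`));
    }
  }
  return errors;
}

const configSchema = {
  type: "object",
  required: ["name", "port"],
  properties: { name: { type: "string" }, port: { type: "number" } },
};

const errors = validate(configSchema, JSON.parse('{"name": "api", "port": "8080"}'));
console.log(errors);
```

Here the port was written as a string, so the sketch reports one error for $.port; the same schema object could in principle also drive an editing UI, which is the part that needs real tooling.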
tl;dr;tl;dr YAML BITCHES! (╯°□°)╯︵ ┻━┻ (but also, a puppy: http://i.imgur.com/kuDsS0i.jpg )