This is quite an undertaking, and even if they do not achieve all their lofty goals, the experience and the code (if they open-source it) will be extremely valuable.
For many years I made a living working on static analysis tools, and that included a lot of compiler construction stuff. Unfortunately, not all languages are equally amenable to static analysis.
Take C++ for example. On one hand, C++ is statically typed, and that helps in static analysis and refactoring. On the other hand, its semantics are so horribly complicated that it is almost impossible to write static analyzers for it, and useful refactoring tools for C++ are almost nonexistent. Pointer aliasing and the preprocessor just add insult to injury.
Python, on the other hand, is a very clean language with no preprocessor, but it is dynamically typed, which makes reasoning about code very difficult.
Java (and, after it, C#) managed to strike a very nice balance; that is why the tooling ecosystems for these languages are so good. The road to good tools starts with a carefully designed language.
This is why I have high hopes for the Kotlin language. I think it makes a lot of sense to design a language with tooling in mind from the beginning to avoid nightmares like C++.
I think nobody really understands the pitfalls of language tooling better than the JetBrains guys.
I think there is an elephant in the room in Yegge's article: he is apparently leading the development of a toolchain for code analysis, and at the same time he is a self-declared "liberal" who advocates the use of metaprogramming, eval'd code, and all sorts of black magic.
But then almost all the fights he fights every day are precisely against these techniques, which make code analysis a nightmare, even when you are simply grepping for the place where something is set. (Just imagine you want to know why ab == 2, and it turns out to be because another line in another file contains eval("a" + x + " = 2"), and some other remote line of code happens to set x to "b".)
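A minimal Python sketch of that scenario. Python's eval() only accepts expressions, so the sketch uses exec() for the dynamically named assignment; the sample source and the grep_assignments helper are hypothetical illustrations. A textual search for an assignment to ab finds nothing, even though running the code creates ab = 2.

```python
import re

# Hypothetical sample: the name "ab" never appears literally in the source.
SOURCE = '''
x = "b"
exec("a" + x + " = 2")
'''


def grep_assignments(source: str, name: str) -> list[int]:
    """Return 1-based line numbers where `name = ...` appears literally."""
    pattern = re.compile(rf"\b{re.escape(name)}\s*=(?!=)")
    return [n for n, line in enumerate(source.splitlines(), 1) if pattern.search(line)]


namespace: dict = {}
exec(SOURCE, namespace)                 # running the code is what creates ab
print(namespace["ab"])                  # 2
print(grep_assignments(SOURCE, "ab"))   # [] (the text never assigns ab)
print(grep_assignments(SOURCE, "x"))    # [2]
```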
So, we can be optimistic and hope Grok will work. But another possibility is that Yegge's sweet-and-sour rant is a sign that he finds it unbearable to see his "software political belief" becoming the cause of his gigantic project's failure.
The fact that he is talking publicly about an internal project, apparently without authorisation, may mean he is looking for the emergency exit sign over the door...
"The fact that he is talking publicly about an internal project, apparently without authorisation, may mean he is looking for the emergency exit sign over the door..."
From everything said publicly, it seems that "grok" isn't a secret at Google; the team members just haven't talked about it much because, given how ambitious it is, it is so likely to fail or at least fall short of its goals.
That the project isn't explicitly secret is perfectly believable; just look at Go or Dart, which have been developed fully in the open. Like those projects, grok seems like the sort of thing that could potentially help Google and everyone else but isn't seen as a competitive advantage for advertising/social, so it is perfectly reasonable to believe they aren't trying to keep it a secret.
You can do stuff that's quite a bit more amazing than that (though I guess this goes against his point of "manage expectations"):
For any given C++ refactoring (rename method + swap args), propagate that refactoring automatically to every caller of the refactored function, so that code upgrading to a new version of the library is refactored automatically and correctly.
Parse this Python function and emit it as C, for optimization. Parse this C function and emit it as Python, for readability.
Cross-layer data-flow analysis, so you could verify that a lock taken out in Java code and released in C++ code could never produce a deadlock.
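The first idea (rename method + swap args, pushed out to every caller) can be sketched for Python with the standard ast module; C++ would need a full compiler front end, which is where most of the difficulty lies. The connect -> open_connection change is a hypothetical library upgrade, and this transformer is a toy that ignores keyword arguments, aliased imports, and method calls.

```python
import ast


class RenameAndSwap(ast.NodeTransformer):
    """Rewrite every call old_name(a, b) as new_name(b, a).

    Deliberately naive: it matches plain calls by bare name only and skips
    keyword arguments, aliased imports, and attribute calls like obj.f(a, b).
    """

    def __init__(self, old_name: str, new_name: str) -> None:
        self.old_name = old_name
        self.new_name = new_name

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls in the arguments first
        if (isinstance(node.func, ast.Name)
                and node.func.id == self.old_name
                and len(node.args) == 2
                and not node.keywords):
            node.func = ast.Name(id=self.new_name, ctx=ast.Load())
            node.args = [node.args[1], node.args[0]]  # swap the two arguments
        return node


def migrate(source: str, old_name: str, new_name: str) -> str:
    """Apply the rename + swap to one caller's source (needs Python 3.9+)."""
    tree = RenameAndSwap(old_name, new_name).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


caller = "conn = connect(host, 5432)\nlog(connect(a, b))"
print(migrate(caller, "connect", "open_connection"))
# conn = open_connection(5432, host)
# log(open_connection(b, a))
```

Running migrate over every file that calls the old function is the "propagate to every caller" step; the hard part in a real system is finding those callers reliably, which is what a shared index is for.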
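The deadlock check in the third idea is commonly approached with a lock-order graph: record an edge A -> B whenever some code path acquires B while holding A, and treat any cycle as a potential deadlock. The sketch below assumes the cross-layer analysis has already produced per-path acquire/release sequences (written by hand here, with hypothetical lock names). It only detects orderings that contradict each other; a cycle-free result is only as trustworthy as the coverage of those sequences.

```python
from collections import defaultdict

# Each event is ("acquire" | "release", lock_name); one list per code path.
Trace = list[tuple[str, str]]


def lock_order_edges(traces: list[Trace]) -> dict[str, set[str]]:
    """Edge a -> b means some path acquires b while still holding a."""
    edges: dict[str, set[str]] = defaultdict(set)
    for trace in traces:
        held: list[str] = []
        for op, lock in trace:
            if op == "acquire":
                for outer in held:
                    edges[outer].add(lock)
                held.append(lock)
            else:  # "release"
                held.remove(lock)
    return edges


def has_potential_deadlock(edges: dict[str, set[str]]) -> bool:
    """Depth-first search for a cycle in the lock-order graph."""
    state: dict[str, str] = {}  # missing = unvisited, "active" = on stack, "done"

    def visit(lock: str) -> bool:
        state[lock] = "active"
        for nxt in edges.get(lock, ()):
            if state.get(nxt) == "active":
                return True  # back edge: two paths disagree on lock order
            if nxt not in state and visit(nxt):
                return True
        state[lock] = "done"
        return False

    return any(lock not in state and visit(lock) for lock in list(edges))


java_path = [("acquire", "cacheLock"), ("acquire", "indexMutex"),
             ("release", "indexMutex"), ("release", "cacheLock")]
cpp_path = [("acquire", "indexMutex"), ("acquire", "cacheLock"),
            ("release", "cacheLock"), ("release", "indexMutex")]
print(has_potential_deadlock(lock_order_edges([java_path, cpp_path])))   # True
print(has_potential_deadlock(lock_order_edges([java_path, java_path])))  # False
```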
Idea: pry open compilers, run their “guts” on distributed clusters, output a language-neutral index. Serve the index via service APIs. Write client plugins, etc.
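A toy version of that pipeline, with compiler front ends emitting language-neutral records into a shared index that editors query, might look like the following. The Symbol record, index_python and SymbolIndex are hypothetical names, not Grok's actual schema or API; Python's own ast module stands in for a "pried open" compiler, and other languages would plug in front ends that emit the same records.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """One language-neutral index entry (hypothetical schema)."""
    language: str
    name: str
    kind: str  # "function" or "class"
    path: str
    line: int


def index_python(path: str, source: str) -> list[Symbol]:
    """Front end for Python: reuse the language's own parser."""
    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(Symbol("python", node.name, "function", path, node.lineno))
        elif isinstance(node, ast.ClassDef):
            symbols.append(Symbol("python", node.name, "class", path, node.lineno))
    return symbols


class SymbolIndex:
    """The shared service: editors query it without knowing the language."""

    def __init__(self) -> None:
        self._by_name: dict[str, list[Symbol]] = {}

    def add(self, symbols: list[Symbol]) -> None:
        for sym in symbols:
            self._by_name.setdefault(sym.name, []).append(sym)

    def find_definitions(self, name: str) -> list[Symbol]:
        return list(self._by_name.get(name, []))


index = SymbolIndex()
index.add(index_python("shapes.py", "class Circle:\n    def area(self):\n        return 0\n"))
print(index.find_definitions("area"))
# [Symbol(language='python', name='area', kind='function', path='shapes.py', line=2)]
```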
Isn't this similar to what Microsoft is trying to achieve with Project Roslyn [1]? Was someone at Microsoft inspired by Yegge's vision and decided to implement it for the .Net Common Language Runtime?
From Eric Lippert's blog posts about this, the Roslyn project most definitely doesn't seem like a one-person-getting-inspired kind of thing. I was talking to a guy on the Roslyn team last week, and the project scope, and the way they have decided to implement it, are quite frankly astonishing. I mean, a dynamic parser/lexer with all the underlying data structures immutable [1]? Impressive.
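Immutable syntax trees are usually paired with structural sharing: an edit rebuilds only the nodes on the path from the root to the change and reuses every other subtree, so old and new versions of the tree can coexist cheaply. A generic sketch with a hypothetical Node type, not Roslyn's actual data structures:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Node:
    """An immutable syntax-tree node (hypothetical, minimal)."""
    kind: str
    text: str = ""
    children: tuple["Node", ...] = ()


def replace_at(root: Node, path: tuple[int, ...], new_node: Node) -> Node:
    """Return a new tree with the node at `path` swapped for `new_node`.

    Only the nodes along `path` are copied; all other subtrees are shared
    with the old tree, which itself is never modified.
    """
    if not path:
        return new_node
    first, rest = path[0], path[1:]
    kids = list(root.children)
    kids[first] = replace_at(kids[first], rest, new_node)
    return replace(root, children=tuple(kids))


old = Node("call", children=(
    Node("name", "connect"),
    Node("args", children=(Node("name", "a"), Node("name", "b"))),
))
new = replace_at(old, (0,), Node("name", "open_connection"))
print(old.children[0].text, new.children[0].text)  # connect open_connection
print(new.children[1] is old.children[1])          # True: argument subtree is shared
```

Because nothing is mutated, readers holding the old tree (say, a background analysis) never see a half-applied edit.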
> Was someone at Microsoft inspired by Yegge's vision and decided to implement it for the .Net Common Language Runtime?
I feel a bit obligated to point out that coincidental invention/discovery is very possible, especially when trying to solve a problem that large numbers of smart people are exposed to.
The fact that he's such an Emacs fan makes me skeptical about project Grok, to be honest. There is only so far you can take Emacs, and for statically typed languages, IDEs leave Emacs in the dust.
I wish he were more knowledgeable about IDEs (IDEA, Eclipse, Visual Studio, Xcode) so that he had a better idea of what modern developers expect from their tools.
I'm hoping Grok will succeed, but I wouldn't be surprised if in the coming year, it gets completely canceled.
So, as Steve mentions in his post, Grok already exists. It's designed to work with all editors, including IDEs. IDEs exhibit ... how shall I say ... interesting failure modes ... on the Google codebase. Grok will benefit users of Eclipse and IntelliJ as much as it will benefit users of Emacs and Vim.
Also worth noting: he is pretty knowledgeable about IDEs too. Why do you think he isn't?
The whole point of Grok is that the phrase "IDEs leave Emacs in the dust" is a function of the poor design of today's tools, not anything inherent in IDEs or Emacs.
"IDE as a service" in the style of Grok (or Ensime for something currently available) puts Emacs on a level playing field with any IDE.
"Crucially, consistent tools make it easier to switch languages. Devs would then be more likely to use the best language for the job. IDE authors can focus on presentation and editing. This leads to more configurability, scriptability, accessibility in languages."
I am a (sometimes reluctant) polyglot programmer. Sticking with a single IDE product line (I use IntelliJ, RubyMine and PyCharm for some consistency across Clojure, Java, Ruby, and Python development) helps a lot. Eclipse is another great alternative for a unified development platform.
At SAIC in the late 1980s, we had a small team put together for a job; we were all Lisp hackers, and we had a great dev community built around sharing Emacs configurations. It was a lot of fun, but our ad hoc tooling would not scale up to a Google- or Facebook-size engineering infrastructure.
Am I the only one that really, really hates the word "grok"? I was hoping it would eventually lose favor among us geeks, but now with this project's name I fear I may just have to get used to it.
I get irritated by words too (for example, Twitter's use of "tweet" bugs me), so I sympathize in general, plus I suppose "grok" is rather unpleasant sounding. But the reason I like it, and why I suspect it (sort of) caught on, is that English doesn't have any other word for this. "Understand" is too general. "Grok" means to understand something intuitively, in its essence and as a whole. Perhaps a synonym would be "get" as in "to get it". But "get" is so overloaded that there's room for a more specific alternative.
(Actually the above isn't what "grok" originally meant, but I think it means that now.)
"Now take this one word: 'grok.' Its literal meaning, one which I suspect goes back to the origin of the Martian race as thinking, speaking creatures—and which throws light on their whole 'map'—is quite easy. 'Grok' means 'to drink.' "
"Huh?" said Jubal. "But Mike never says 'grok' when he's just talking about drinking. He—"
"Just a moment." Mahmoud spoke to Mike in Martian.
Mike looked faintly surprised and said, " 'Grok' is drink," and dropped the matter.
"But Mike would also have agreed," Mahmoud went on, "if I had named a hundred other English words, words which represent what we think of as different concepts, even pairs of antithetical concepts. And 'grok' means all of these, depending on how you use it. It means 'fear,' it means 'love,' it means 'hate'—proper hate, for by the Martian 'map' you cannot possibly hate anything unless you grok it completely, understand it so thoroughly that you merge with it and it merges with you—then and only then can you hate it. By hating yourself. But this also implies, by necessity, that you love it, too, and cherish it and would not have it otherwise. Then you can hate—and (I think) that Martian hate is an emotion so black that the nearest human equivalent could only be called a mild distaste."
Mahmoud screwed up his face. "It means 'identically equal' in the mathematical sense. The human cliché, 'This hurts me worse than it does you' has a Martian flavor to it, if only a trace. The Martians seem to know instinctively what we learned painfully from modern physics, that the observer interacts with the observed through the process of observation. 'Grok' means to understand so thoroughly that the observer becomes a part of the process being observed—to merge, to blend, to intermarry, to lose personal identity in group experience. It means almost everything that we mean by religion, philosophy, and science—and it means as little to us as color means to a blind man."
He takes it too far, though. A word so broad as to cover "almost everything that we mean by religion, philosophy, and science" isn't going to be very useful. Maybe that's why, to the extent the word survived the novel, its meaning narrowed.
It's broad in the human (as opposed to Heinlein's Martian) sense because we engage with religion, philosophy, and science in a largely intellectual way. Yes, even wingnut fundamentalists. They're all highly linguistic things; most of our more mystical traditions have fallen by the wayside for one reason or another.
In the Martian sense, that mystical perspective is far more useful: it unifies their physical reality of transcendent Old Ones and life that can be grown through singing and so on. The presence of a single word signals that they've collapsed all of these things down to a simple concept; the lack of a human analog simply says we haven't.
So to make a long thing short, the word cuts to the heart of a conceptual area that we can but dance around with all of our words and qualifications and half-understandings in the fields of religion, philosophy, and science. The desire to know divinity, to know oneself, to know the universe: these can be streamlined into a single concept, to grok.
That's why Heinlein goes on to have Mike explain his pantheistic "God is that which groks". The closest English word I know of is not "understand" but rather "suffuse" or "permeate". To me, the reason the word survives the novel as a mere stronger version of "understand" is that most people hand-waved the novel's walk-through of religion as a boisterous critique rather than as a considered and valid approach to life, the universe, and everything.
I played amateur philosopher as a teenager, and I actually spent a good 5 pages or so on a paper trying to describe this concept... and I think my result had less precision. This was before I had read Stranger.
The quotes at http://en.wikipedia.org/wiki/Grok imply that in Heinlein's novel, to grok something meant to merge one's being with it in a kind of total rapport. So, something more than a variant of cognitive understanding. I haven't read the novel though.
In Stranger in a Strange Land, grok was the Martian word for "water".
But water was so tied up with everything in their culture that what it really meant was virtually anything. In particular, it meant to really "get" what something meant, or what someone else was saying.
I'm curious: why? I have no problem with grok as long as anyone who uses the term has actually read 'Stranger in a Strange Land'. Honestly, if there ever was a word that needed to be tossed out with the bathwater, I believe it would be hacker.
Scaevolus's answer is exactly how I feel. It just sounds very harsh to my mind's ear. It literally grates on my psyche every time I have to say it in my head.
And I'm with you on "hacker". I don't have a problem with the word itself, but it's so overused in these circles that it's irritating every time I see someone try to stuff yet another everyday activity under the hacker term.
> And I'm with you on "hacker". I don't have a problem with the word itself, but its so overused in these circles to the point that its irritating every time I see someone try to stuff yet another everyday activity under the hacker term.
It's at that point that you regex it to something else. [0]
I won't stop using hacker, because it has history. But I do laugh when I see it applied to everything; it's sad, in a humorous sort of way.
At least the IT community hasn't ruined wizard yet.
I don't care for that word much either. But it's normalized in our culture, so I'm used to it, which makes it less abrasive. Another problem with "grok" is that I can't convince myself of how to pronounce it. This fact alone probably causes half of the internal discord I feel when I try to verbalize it internally (does it rhyme with rock or broke?). Also, its spelling doesn't have any precedent in English, so every time I see it my subconscious pattern-matching system goes haywire.
You know when you read or hear something that is contradictory, or something that is just painfully wrong, and you can almost feel your psyche twinge as it recognizes the error? I get that feeling every time I see this word. Perhaps that's just me.
Hmm, interesting. I just read another article on Grok yesterday, posted by a former intern on that team.
(Sadly it's in Chinese, but it covers the details of the Python indexer's implementation.)
http://blog.sina.com.cn/s/blog_5d90e82f010191rh.html
Thanks for sharing. The first article is really interesting. The author worked for 3 months as an intern at Google, and his much-underappreciated work saved Steve Yegge's Grok team, which had been struggling for years.
> Today I found that I forgot how to derive the definition of the Y combinator. I learned it several years ago from an online article, but now the search term “Y combinator” gives me all the news about startups (sigh).
Given N languages and M editors / IDEs, total toolchain effort is N x M… Any toolchain support for this number of systems is non-trivial. [SNIP] How do you solve matrix problems like this? Use a hub and spoke model.
Indeed, every problem in computer science can be solved by yet another layer of indirection.
(And no, not putting every imaginable layer of indirection in there already is not flunking CS 101, as the author suggests. I hope he never becomes an architect somewhere.)
The usual dangers of this extra layer of indirection are performance and configuration complexity. In fact, platforms like Eclipse have added many such layers already, and we can feel that (sluggish UI performance, conflicting plugins, random crashes).
I really hope Grok will work as well in practice as it could in theory.
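The arithmetic behind the hub-and-spoke argument quoted above, with illustrative counts (ten languages, six editors) rather than figures from the original slides: point-to-point support needs one integration per (language, editor) pair, while a hub needs one adapter per language plus one per editor.

```python
def point_to_point(languages: int, editors: int) -> int:
    # every editor needs its own plugin for every language
    return languages * editors


def hub_and_spoke(languages: int, editors: int) -> int:
    # each language feeds the hub once; each editor talks to the hub once
    return languages + editors


print(point_to_point(10, 6))  # 60
print(hub_and_spoke(10, 6))   # 16
```

The saving grows with both counts, which is the case for the extra layer; the costs mentioned above (performance, configuration complexity) are what that layer has to keep in check.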
This isn't a pipe dream; it's just dumb. Different languages are suitable for different tasks, and communities grow up with different goals and achievements to reflect that. Certain cultural movements are language agnostic (say, testing), but DSLs? HM type theory? Not so much.
We need only look at how hard it is to compare programming languages through common benchmarks to realize what a bad idea this is.
I don't know about you but getting real autocomplete and refactoring in Emacs for M languages is a dream come true for me. I don't want a different IDE for each language. I want one Editor to Rule Them All.
Making the infrastructure to understand a programming language shareable between all the ways to edit/refactor/analyze code in that language is far from dumb. It's something Eclipse/IntelliJ/Visual Studio should have done a long, long time ago. Heck, I once attempted to pick apart the tooling in Eclipse into something I could use headless without booting the entire thing, and the code was so tightly integrated into their GUI that I gave up in disgust. I mean, this is architecture 101. Clear APIs and division of labor should have been on everyone's mind, but instead the whole thing is a pile of spaghetti code.
Even something as simple as a CLI interface to your IDE's language support features so I could integrate them into editor of my choice would have been smart. But no one did that. I work with 4 different languages minimum every day at work. I'm not going to context switch my editor/IDE for each one of them. For this reason I'm cheering Yegge on like mad.
(Full disclosure: I work at Google, same as Yegge. I've had these opinions for longer than I've worked there, though.)
> getting real autocomplete and refactoring in Emacs for M languages is a dream come true for me
Even just one. The Python RefactoringBrowser takes tens of seconds just to make a mess of the whole thing. It's a nice effort, but it fails on both reliability and speed.
It is dumb the way it's phrased. The easiest way to achieve it is to define a super language that all other languages can be shallowly encoded into. Describing this as ambitious is dumb.
Very interesting project. I hope documentation (i.e., docstrings/documentation comments) will be part of it; it seems like a natural extension of indexing mere grammar. It would be a boon to editors/IDEs to be able to extract documentation alongside autocompleted symbols.
This is interesting; I was talking about doing this just the other day, only for Emacs Lisp on GitHub and in the package repositories. Because of that constraint, mine would be worse. So maybe better. Hmmmm.
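For Python, pulling documentation alongside the symbols is cheap because the standard ast module already exposes docstrings through ast.get_docstring. A minimal sketch; the function name completions_with_docs and the sample source are hypothetical:

```python
import ast
from typing import Optional


def completions_with_docs(source: str) -> dict[str, Optional[str]]:
    """Map each function/class name to its docstring, or None if it has none."""
    result: dict[str, Optional[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            result[node.name] = ast.get_docstring(node)
    return result


SAMPLE = '''
def area(r):
    """Return the area of a circle of radius r."""
    return 3.14159 * r * r

def helper():
    pass
'''
print(completions_with_docs(SAMPLE))
# {'area': 'Return the area of a circle of radius r.', 'helper': None}
```

Languages with documentation comments rather than docstrings would need their front end to attach those comments to the declaration they precede.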
I'll take a stab at it. (That is to say, "I think I understand how the phrase is being used, at least to a degree that it makes perfect sense to me. I may, of course, be wrong. But I'll try to translate my understanding.")
At any level above the trivial, the code we write (or work with) is very often not a direct set of instructions for the machine to carry out. Here, I don't simply mean a high-level language is not machine code or assembly. I think it's safe to assume that we all get that much at least.
Rather, it's a lot like everyday conversation, but with Lewis Carroll's Humpty Dumpty character. There will be common (and less-than-common) cultural references, along with a whole bunch of private definitions for things. Actually understanding the conversation means not only having those cultural references in common[1], but also having access to the private definitions[2] (such as "glory" meaning "a nice knock-down argument").
So the text on the screen (or in the file) is not the actual code, or even a complete description of the code, that the machine will run. It's only a sort of short-hand representation, a language we created along the way to talk about the instructions (even if, at the basic level of the language facilities, the operators and keywords are directly translatable). Simply working with the text content, then, as one would do with a simple parser, or as a human programmer would do with a single text file in isolation, is insufficient; the person or tool would have to have a much deeper understanding of the implications of the text in order to make any real sense of it.[3] For a tool to offer useful help (beyond the simple matter of altering text), it has to be able to "understand" what the text means, and that, in turn, means looking beyond the simple arrangement of text characters.
[1] It might help to think about the Darmok episode of Star Trek: TNG. Library calls are very much like the metaphorical language of the Tamarians, in that they make perfect sense if we know the "story" being referenced, but if we don't, the reference is either meaningless or even misleading.
[2] That applies as much to preprocessing directives like constant definitions as it does to things like function/method calls.
[3] We do this in natural languages as well. Every time you've run across a cant, an argot or a jargon specific to one field of endeavor, you either understand it because you are familiar with the displaced vocabulary, or you're left in that uncomfortable state of knowing (or thinking you know) what most of the words mean, but failing entirely to understand what's being said.
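A small illustration of the gap between text and program structure, in Python: a whole-word text search for the identifier total also matches a comment and a string literal, while walking the parsed tree finds only the two real references. The sample is hypothetical and only scratches the surface; real tools also have to resolve imports, scopes, and types.

```python
import ast
import re

SOURCE = '''
total = 0  # total of all items
def report():
    print("total:", total)
'''


def textual_hits(source: str, name: str) -> int:
    """Count whole-word matches in the raw text, as grep would."""
    return len(re.findall(rf"\b{re.escape(name)}\b", source))


def name_references(source: str, name: str) -> int:
    """Count identifiers in the syntax tree that actually refer to `name`."""
    return sum(
        1
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name) and node.id == name
    )


print(textual_hits(SOURCE, "total"))     # 4: includes the comment and the string
print(name_references(SOURCE, "total"))  # 2: the assignment and the read in report()
```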