"The more interesting/optimized ways to map this would be where a single object in the software internet somehow maps to multiple computers, either doing parallel computation or partitioned computation on each. I feel the semantics of mapping the object onto a physical computer would have to be encoded in the object itself."
You might be interested in Alan Kay's '97 OOPSLA presentation. He talked in a similar vein to what you're talking about: https://youtu.be/oKg1hTOQXoY?t=26m45s
Inspired by what he said there, I tried a little experiment in Squeak, which worked, as far as it went (scroll down the answer a bit, to see what I'm talking about, here): https://www.quora.com/What-features-of-Smalltalk-are-importa...
I only got that far with it, because I realized once I did it that I had more work to do in understanding what to do with what I got back (mainly translating it into something that would keep with the beauty of what I had started)...
"Maybe there is a feedback loop where the growth of Unix leads to hardware vendors thinking 'lets optimize for C', which then feeds the growth further? OTOH, even emulated machines are faster than hardware machines used to be."
There is a feedback loop to it, though as development platforms change, that feedback gets somewhat attenuated.
As I recall, what you describe with C happened, but it began in the late '90s and continued into the 2000s. That's when I started hearing about CPUs being optimized to run C faster.
I once got into an argument with someone on Quora about this, re. "If Lisp is so great, why aren't more people using it?" I used Kay's point that bad processor designs were partly to blame, since a large part of why programmers make their choices has to do with tradition (which gets translated into "familiarity"). Lisp and Smalltalk did not run well on the early microprocessors of the 1970s. As a consequence, programmers did not see them as viable for anything other than CS research and higher-end computing (minicomputers).
A counter to this was the invention of Lisp machines, with processors designed to run Lisp more optimally. A couple of companies got started in the '70s to produce them, and they lasted into the early '90s. One of these companies, Symbolics, found a niche in producing high-end computer graphics systems. The catch, as far as developer adoption went, was that these systems were more expensive than your typical microcomputer, and their system technology (the design of their processors, and their system software) was not "free as in beer."

Unix, by contrast, was distributed essentially for free by AT&T for about 12 years. Once AT&T's long-distance monopoly was broken up, it started charging a licensing fee for it. Unix eventually ran reasonably well on the more popular microprocessors, but I think it's safe to say this was because the processors got faster at what they already did, not because they were optimized for C. This effect eventually occurred for Lisp as well by the early '90s, which is one reason the Lisp machines died out. A second cause of their demise was the "AI winter" that started in the late '80s. However, by then, the "tradition" of using C, and later C++, for programming most commercial systems had been set in the marketplace.
The pattern that seems to repeat is that languages become popular because of the platforms they "rode in on," or at least that's the perception. C came in on the coattails of Unix. C++ seems to have done this as well. This is the reason Java looks the way it does; it came out of this mindset. It was marketed as "the language for the internet," and it piggybacked on C++ for its syntax and language features.

At the time the internet started becoming popular, Unix was seen as the OS platform on which it ran (which had a lot of truth to it). However, a factor that had to be considered when writing software for Unix was portability, since even though there were Unix standards, every Unix system had some differences. C was reasonably portable between them if you were careful in your implementation, basically sticking to POSIX-compliant libraries. C++ was less so, because different systems had compilers that implemented different subsets of the language specification well, and didn't implement some features at all. C++ was used for a time in building early internet services (combined with Perl, which also "rode in" on Unix).

Java was seen as a pragmatic improvement on C++ among software engineers, because, "It has one implementation, but it runs on multiple OSes. It has all of the familiarity, better portability, better security features, with none of the hassles." However, it completely gave up on the purpose of C++ (at the time), which was to be a macro language on top of C, in a similar way to how Simula was a macro language on top of Algol. Despite this, it kept C++'s overall architectural scheme, because that's what programmers thought you used for "serious work."
From a "power" perspective, one has to wonder why programmers, when looking at the prospect of putting services online, didn't look at the programming architecture, since they could see some problems with it pretty early, and say to themselves, "We need something a lot better"? Well, this is because most programmers don't think about what they're really dealing with, and modeling it in the most comprehensive way they can, because that's not a concept in their heads. Going back to my first point about hardware, for many years, the hardware they chose didn't give them the power so they could have the possibility to think about that. As a result, programmers mostly think about traits, and the community that binds them together. That gives them a sense of feeling supported in their endeavors, scaling out the pragmatic implementation details, because they at least know they can't deal with that on their own. Most didn't think to ask (including myself at the time), "Well, gee. We have these systems on the internet. They all have different implementation details, yet it all works the same between systems, even as the systems change... Why don't we model that, if for no other reason than we're targeting the internet, anyway? Why not try to make our software work like that?"
On one level, the way developers behave is tribal. Looked at another way, it's mercantilistic. If there's a feedback loop, that's it.
"OTOH, even emulated machines are faster than hardware machines used to be."
What Kay is talking about is that the Alto didn't implement a hard-coded processor. It was soft-microcoded: you could load the instruction set the processor itself would run, and then load your system software on top of that. This enabled them to make decisions like, "My program runs less efficiently when the processor executes my code this way. I can change the microcode to this, and make it run faster."
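Roughly, the property he's describing could be sketched like this (just a toy model in C, nothing like real Alto microcode; every name in it is made up): the routine that implements each instruction lives in a writable table, so the implementation of an instruction can be swapped out without touching the hardware.

    #include <stdio.h>

    /* Toy "soft-microcoded" machine: each opcode is implemented by a
       routine held in a writable table (the "microcode store"), so an
       instruction's implementation can be replaced at load time. */

    typedef struct { int stack[64]; int sp; } machine;
    typedef void (*micro_op)(machine *m);   /* one "microcoded" instruction */

    enum { OP_PUSH1 = 0, OP_ADD = 1, NUM_OPS = 2 };

    static void push1(machine *m)  { m->stack[m->sp++] = 1; }
    static void add_v1(machine *m) { int b = m->stack[--m->sp];
                                     int a = m->stack[--m->sp];
                                     m->stack[m->sp++] = a + b; }
    /* An alternative implementation of the same opcode, as if we had
       re-tuned the microcode for it. */
    static void add_v2(machine *m) { m->sp--;
                                     m->stack[m->sp - 1] += m->stack[m->sp]; }

    /* The loadable "microcode store". */
    static micro_op ucode[NUM_OPS] = { push1, add_v1 };

    static void run(machine *m, const unsigned char *code, int n) {
        for (int i = 0; i < n; i++)
            ucode[code[i]](m);              /* dispatch each byte code through the table */
    }

    int main(void) {
        machine m = { {0}, 0 };
        unsigned char prog[] = { OP_PUSH1, OP_PUSH1, OP_ADD };

        run(&m, prog, 3);
        printf("%d\n", m.stack[0]);         /* prints 2 */

        ucode[OP_ADD] = add_v2;             /* "reload the microcode" for ADD */
        m.sp = 0;
        run(&m, prog, 3);
        printf("%d\n", m.stack[0]);         /* still 2, via the new implementation */
        return 0;
    }

That last step is the part the commodity microprocessors didn't give you: the dispatch table was baked into the silicon.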
This explains Kay's use of the term "emulated." I didn't know this until a couple of years ago, but at first they programmed Smalltalk on a Data General Nova minicomputer. When they brought Smalltalk to the Alto, they microcoded the Alto so that it could run Nova machine code. So it sounds like they could just transfer the Smalltalk VM binary to the Alto and run it. Presumably, they could even transfer the BCPL compiler they were using to the Alto, and compile versions of Smalltalk with that. The point, though, is that they could optimize the performance of their software by tuning the Alto's processor to what they needed. That's what he said was missing from the early microprocessors: you couldn't add or change operators, and you couldn't change how they were implemented.
Actually ... only the first version of Smalltalk was done in terms of the NOVA (and not using BCPL). The subsequent versions (Smalltalk-76 on) were done by making a custom virtual machine in the Alto's microcode that could run Smalltalk's byte codes efficiently.
The basic idea is that you can win if the microcode cycles are enough faster than the main memory cycles so that the emulations are always waiting on main memory. This was generally the case on the Alto and Dorado. Intel could have made the "Harvard" 1st level caches large enough to accommodate an emulator -- that would have made a big difference. (This was a moot point in the 80s)
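To put rough numbers on that (purely made-up figures, not actual Alto or Dorado timings): if a main-memory reference costs about 800 ns and a microinstruction about 200 ns, then each memory reference leaves room for roughly

    800 ns / 200 ns = 4 microinstructions

of emulation work that hides behind the memory wait. As long as interpreting a byte code takes fewer micro-steps than that between memory references, the emulated machine runs about as fast as a hard-wired one would on the same memory; if the microcode isn't much faster than memory, the interpretation overhead shows up in full.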
I know this is getting nit-picky, but I think people might be interested in some of the details of how Smalltalk developed. Dan Ingalls said in "Smalltalk-80: Bits of History":
"The very first Smalltalk evaluator was a thousand-line BASIC program which first evaluated 3 + 4 in October 1972. It was followed in two months by a Nova assembly code implementation which became known as the Smalltalk-72 system."
The first Altos were produced, if I have this right, in 1973.
I was surprised when I first encountered Ingalls's implementation of an Alto on the web, running Smalltalk-72, because the first thing I was presented with was, "Lively Web Nova emulator", and I had to hit a button labeled "Show Smalltalk" to see the environment. He said what I saw was Nova machine code from a genuine ST-72 image, from an original disk platter.
I take it from your comment that you're saying by the time ST-76 was developed, the Alto hardware had become fast enough that you were able to significantly reduce your use of machine code, and run bytecode directly at the hardware level.
I could've sworn Ingalls said something about using BCPL for earlier versions of Smalltalk. Quoting from "Bits of History" again, though: writing about the Dorado and Smalltalk-80, Ingalls said of BCPL that the compiler you were using compiled to Alto code, but ...
"As it turned out, we only used Bcpl for initialization, since it could not generate our extended Alto instructions and since its subroutine calling sequence is less efficient than a hand-coded one by a factor of about 3."
The Alto didn't get any faster, and there was not a lot of superfast microcode RAM (if we'd had more it would have made a huge difference). In the beginning we just got Smalltalk-72 going in the NOVA emulator. Then we used the microcode for a variety of real-time graphics and music (2.5 D halftone animation, and several kinds of polytimbral timbre synthesis including 2 keyboards and pedal organ). These were separate demos because both wouldn't fit in the microcode. Then Dan did "Bitblt" in microcode which was a universal screen painting primitive (the ancestor of all others). Then we finally did the byte-code emulator for Smalltalk-76. The last two fit in microcode, but the music and the 2.5 D real-time graphics didn't.
The Notetaker Smalltalk (-78) was a kind of sweet spot in that it was completely in Smalltalk except for 6K bytes of machine code. This was the one we brought to life for the Ted Nelson tribute.
Thanks for the long write up. I found it very interesting.
> You might be interested in Alan Kay's '97 OOPSLA presentation
Oh yeah I have actually seen that - probably time to watch it again.
> Well, this is because most programmers don't think about what they're really dealing with
Agree with that. Most people work on the 'problem at hand' using the current frame of context and ideas, and focus on cleverness, optimization, or throughput within that framework, when changing the frame of context may in fact be much better.
> What Kay is talking about is that the Alto didn't implement a hard-coded processor. It was soft-microcoded.
Interesting. I wonder if FPGAs could be used for something similar - i.e. program the FPGAs to run your bytecode directly. But I'm speculating because I don't know too much about FPGAs.
Yes re: FPGAs -- they are definitely the modern placeholder of microcode (and better because you can organize how the computation and state are hooked together). The old culprit -- Intel -- is now offering hybrid chips with both an ARM and a good size patch of FPGA -- combine this with a decent memory architecture (in many ways the hidden barrier these days) and this is a pretty good basis for comprehensive new designs.