
I've had a few interesting operating system development experiences. Warning: rambling alert!

Circa 1984: I was working at Callan Data Systems, a small 68k workstation maker in the greater Los Angeles region (just outside Thousand Oaks, for those familiar with the region).

We had been using Unix ports from UniSoft, but for the new 68010 and 68020 based systems we were developing we were doing our own port from the AT&T sources. I don't recall if our base was System III or if it was System V Release 1 (I'm sure it was earlier than SVR2, for reasons that will become apparent).

This version of Unix did not support demand paged virtual memory. It was a classic swapping system. I was rewriting the process and memory subsystems to add demand paged virtual memory (that's why I know we were starting from something earlier than SVR2, because SVR2 was when AT&T added demand paged virtual memory support).

It was running quite well, except I had this one annoying bug where occasionally when a signal was delivered to a process that had a signal handler installed for that signal, the process would get some kind of error, like an illegal instruction trap. For instance, hitting control-C in the shell might hit the bug. There was no sign of memory corruption, and no illegal instructions where it would claim it had been executing.

I spent some long, late evenings with the in-circuit emulator and the logic analyzer, trying to figure out what the hell was going on. Eventually, I was able to determine that it only happened if the signal was delivered while the system was trying to return from handling a page fault for the process that was receiving the signal.

On the original 68000, virtual memory was not supported. When a bus error was generated by an invalid memory access, the exception stack frame that contained information about the error did not contain enough information to restart or resume the failed instruction. You had no choice really except to kill the process. Hence, almost all 68000 systems were pure swapping systems [1]. (It was possible to do on-demand stack space allocation even on the 68000, through a bit of a kludge [2]).

The 68010 added support for virtual memory. The way it did this was to make bus error push a special extended exception stack frame. This extended frame contained internal processor state information. When you did the return from exception, the processor recognized that the exception had the extended frame, and restored that state information. (This is called instruction continuation, because the processor continues after the interrupt by resuming processing in the middle of the interrupted instruction. The other major approach, which is what I believe most Intel processors use, is called instruction restart. With that approach, the processor does not save any special internal state information. If it was 90% of the way through a complicated instruction when the fault occurred, it will redo the entire instruction when resuming. Instruction continuation raises some annoying problems on multiprocessor systems [3]).

The way signals are delivered is that whenever the kernel is about to return to user mode, it does a check to see if the user process has any signals queued. If it does, and they are trappable and the process has a signal handler, the kernel fiddles with the stack, making it so that (1) when it does the return to user mode it will return to the start of the signal handler instead of to where it would have otherwise returned, and (2) there is a made-up stack frame on the stack after that so that when the signal handler returns, it returns to the right place in the program.

This was fine if the kernel was entered via a mechanism that generated a normal interrupt stack frame, such as a system call or a timer interrupt or a device interrupt. When the kernel was entered via a bus error due to a page fault, then the stack frame was that special extended frame, with all the internal processor state in it. When we fiddled with that to make it return to the signal handler, the result was the processor tried to resume the first instruction of the signal handler, but the internal state was for a different interrupted instruction, and if these did not match bad things happened.

The fix? I put a check in the signal delivery code to check for the extended frame. If one was present, I turned on the debug flag and returned from the page fault without trying to deliver the signal. The instruction that incurred the page fault would then resume and complete, the processor would see that the debug flag was on, and would generate a debug interrupt. That gave control back to the kernel, where I could then turn the debug flag off, and during the return from the debug interrupt do the stack manipulation to deliver the signal.
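As a rough illustration of the fix described above, here is a toy Python model of the check made on the way back to user mode. It is only a sketch: the `Process` fields, the frame labels, and the function names are made up for illustration, and nothing here is real 68010 kernel code. It defers delivery whenever the kernel was entered through the extended bus error frame, and delivers from the debug interrupt instead.

```python
from dataclasses import dataclass, field
from typing import Optional

NORMAL_FRAME = "normal"      # frame from a system call, timer, or device interrupt
EXTENDED_FRAME = "extended"  # bus error frame carrying internal processor state


@dataclass
class Process:
    pc: int                                       # where return-from-exception resumes
    pending: list = field(default_factory=list)   # queued signal numbers
    handlers: dict = field(default_factory=dict)  # signal number -> handler address
    debug_flag: bool = False                      # stands in for the processor's debug flag
    return_pc: Optional[int] = None               # where the handler will return to


def return_to_user(proc: Process, frame: str) -> str:
    """Signal check made just before the kernel returns to user mode."""
    if not proc.pending or proc.pending[0] not in proc.handlers:
        return "resume"  # default actions for unhandled signals are not modeled
    if frame == EXTENDED_FRAME:
        # Redirecting the PC now would pair the handler's first instruction
        # with saved state from a different, half-finished instruction.
        # Leave the frame alone and ask for a debug interrupt instead.
        proc.debug_flag = True
        return "resume_faulting_instruction"
    sig = proc.pending.pop(0)
    proc.return_pc = proc.pc          # the made-up frame the handler returns through
    proc.pc = proc.handlers[sig]      # return into the handler instead
    return "enter_handler"


def debug_interrupt(proc: Process, next_pc: int) -> str:
    """Fires once the resumed instruction has completed."""
    proc.debug_flag = False
    proc.pc = next_pc                 # the faulting instruction is now finished
    return return_to_user(proc, NORMAL_FRAME)
```

The model assumes the kernel itself was the only thing that set the debug flag; a process that was already being single-stepped by a debugger would need extra bookkeeping that is not shown.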

(continued in reply)



Circa 1986: I was working at Interactive Systems Corporation in Santa Monica. ISC had a contract from AT&T to produce the official port of System V Release 3 to Intel's new processor, the 80386. Unfortunately, the contract also called for porting it to the 80286, and the kernel for that port was what I was working on, with one other programmer. That was a kludge. We got it working, but there was a strange scheduling bug. If you loaded it down with around 10 processes, each using a lot of memory, so the system had to make heavy use of virtual memory, what you'd see is that 1 process would get about 90% of the available processing time, 1 process would get essentially no processing time, and the remaining 8 would split the remaining 10% of the processing time pretty much equally. It would stay this way for a few hours, and then it would thrash especially hard for a short while, and go back to the previous pattern, except the processes had shuffled, so it was a different process getting the 90% and a different one getting screwed with 0%, and the remaining 10% equally shared among the remaining processes. So, in a sense, the scheduler was actually quite fair--if you watched it for a week, every process ended up with about the same processor time.

We just could not figure out why the heck it was doing this. We never did solve this. AT&T came to their senses and realized no one wanted a 286 port of SVR3 and dropped that part of the project, and I got moved to the 386 port, where I added an interactive debugger to the kernel, and hacked up the device driver subsystem to allow dynamic loading of drivers at runtime instead of requiring them to be linked in at kernel boot time. (The kernel had grown too big for the real mode boot code, and no one wanted to deal with writing a new boot loader! Eventually, someone bit the bullet and wrote a new, protected mode, boot loader and so we didn't need my cool dynamic device loading system).

Another part of the project with AT&T was providing a way for 386 Unix to run binaries from 286 Unix (probably System III, but I don't recall for sure). Carl Hensler, the senior Unix guru at ISC, and I did that project. (Carl, after ISC was sold to Kodak and then to Sun, ended up at Sun where he became a Distinguished Engineer on Solaris. He now spends much of his time helping his mechanic maintain his Piper Comanche, which he flies around to visit craft breweries).

The 286 used a segmented memory model. So did the 386, but since segments could be 4 GB, 386 processes only used 3 segments (one code, one data, and one stack) which all actually pointed to the same 4 GB space. Fortunately, the segment numbers used for those 3 segments did not overlap the segments used in the 286 Unix process model, so we did not have to do any gross memory hacks to deal with 286 memory layout on the 386. We were able to do most of the 286 support via a user mode program, called 286emul. We modified the kernel to recognize attempts to exec a 286 binary, and to change that to an exec of 286emul, adding the path to the 286 program to the arguments. 286emul would then allocate memory (ordinary user process memory) and load the 286 process into it. We added a system call to the kernel that allowed a user mode process to ask the kernel to map segments for it. 286emul used that to set up the segments appropriately.
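The exec redirection described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the ISC kernel code: the emulator path, the header bytes used to spot a 286 binary, and the exact argument ordering are all assumptions, since the comment only says the path to the 286 program was added to the arguments.

```python
EMULATOR = "/bin/286emul"   # hypothetical install path for the emulator
MAGIC_286 = b"\x02\x86"     # made-up header bytes, NOT the real 286 magic number


def rewrite_exec(path, argv, header):
    """Return the (path, argv) pair the kernel should actually exec.

    path   -- the file the process asked to exec
    argv   -- the argument list it supplied
    header -- the first few bytes read from that file
    """
    if header[:len(MAGIC_286)] != MAGIC_286:
        return path, list(argv)          # native binary: exec unchanged
    # 286 binary: run the emulator and hand it the original program path,
    # followed by the original arguments (ordering is an assumption).
    return EMULATOR, [EMULATOR, path] + list(argv[1:])
```

For example, `rewrite_exec("/usr/bin/old", ["old", "-v"], header)` with a matching header yields an exec of the emulator with `["/bin/286emul", "/usr/bin/old", "-v"]`. The shape is similar to how Unix kernels handle `#!` scripts, where the script's path is handed to the interpreter as an argument.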

Another lucky break was that 286 Unix and 386 Unix used a different system call mechanism. The 286emul process was able to trap system call attempts from the 286 code and handle them itself.

Later, AT&T and Microsoft made some kind of deal, and as part of that they wanted something like 286emul, but for Xenix binaries instead of Unix binaries, and ISC got a contract to do that work. This was done by me and Darryl Richman. It was mostly similar to 286emul, as far as dealing with the kernel. Xenix was farther from 386 Unix than 286 Unix was, so we had quite a bit more work in the 286 Xenix emulator process to deal with system calls, but there was nothing too bad.

There was one crisis during development. Microsoft said that there was an issue that needed to be decided and that it could not be handled by email or by a conference call. We had to have a face to face meeting, and we had to send the whole team. So, Darryl and I had to fly to Redmond, which was annoying because I do not fly. I believe everyone is allowed, and should have, one stubbornly irrational fear, and I picked flying on aircraft that I am not piloting.

So we get to Microsoft, have a nice lunch, and then we gather with a bunch of Microsoft people to resolve the issue. The issue turned out to be dealing with a difference in signal handling between Xenix and Unix. To make this work, the kernel would have to know that a signal was for a Xenix process and treat it slightly differently. So...we needed some way for a process to tell the kernel "use Xenix signal handling for me". Microsoft wanted to know if we wanted this to be done as a new flag on an existing ioctl, or if we wanted to add a new "set signal mode" system call. We told them a flag was fine, and they said that was it, and we could go. WTF...this could not have been done by email or over the phone?

But wait...it gets even more annoying. After we got back, and finished the project, Microsoft was very happy with it. They praised it, and sent Darryl copies of all Microsoft's consumer software products as thanks for a job well done. They sent me nothing.

On the 286emul project, Carl was the lead engineer, and the most experienced Unix guy in the company. If AT&T had decided to give out presents for 286emul, I would have fully understood if they gave them only to Carl. On the Xenix emulator, on the other hand, neither Darryl nor myself was lead engineer, and we had about the same overall experience level (I was the more experienced kernel guy, whereas he was a compiler guru, and I had been on the 286emul project that served as the starting point for the Xenix emulator).

All I can come up with for this apparent snub is that in 1982, when I was a senior at Caltech, Microsoft came to recruit on campus. I wasn't very enthusiastic at the interview (I had already decided I did not want to move from Southern California at that time), and I got some kind of brainteaser question wrong (and when they tried to tell me I was wrong, I disagreed). I don't remember the problem for sure, but I think it might have been the Monty Hall problem. Maybe they recognized me at the face to face meeting as the idiot who couldn't solve their brainteaser in 1982, and so assumed Darryl had done all the work.

Three years later, Microsoft recruited Darryl away from ISC, so evidently they really liked him. (As with Carl, you cannot tell the Darryl story without beer playing a role. After Microsoft, Darryl ran his microbrewery for a while, and wrote a book on beer [4]. I don't know why, but a lot of my old friends from school, and my old coworkers from my prior jobs, brew beer as either a hobby or as a side business, or are seriously into drinking craft beers. I do not drink beer at all, so it seems kind of odd that I apparently tend to befriend people with unusual propensities toward beer).

[1] I've heard of one hack that supposedly was actually used by a couple of vendors to do demand paged virtual memory on 68000. They put two 68000s in their system. They were running in parallel, running the same code and processing the same data, except one was running one instruction behind. If the first got a bus error on an attempted memory access, the second was halted, and the bus error handler on the first could examine the second to figure out the necessary state information to restart the failed instruction (after dealing with the page fault). This is one hell of a kludge. (Some versions of the tale say that after the first fixed the page fault, the processors swapped roles. The one that had been behind resumed as the lead processor, and the one that had been in the lead became the follower. I'm not really much of a hardware guy, but I think the first approach, where one processor is always the lead and the other is always the follower, would be easier).

[2] There was not enough information on the 68000 to figure out how to restart after a bus error in the general case, but you could in special cases. Compilers would insert a special "stack probe" at the start of functions. This would attempt to access a location on the stack deep enough to span all the space the function needed for local variables, struct returns, and function calls. The kernel knew about these stack probes, and so when it saw a bus error for an address in the stack segment but below the current stack, it could look around the return address to see if there was a stack probe instruction, and it could figure out a safe place to resume after expanding the stack.
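The stack probe trick in [2] can be modeled roughly as follows. This is a sketch under stated assumptions, not any vendor's actual handler: the probe opcode value, its length, the page size, and how far back the kernel scans from the saved PC are all illustrative guesses, and `code` is just a dict standing in for instruction memory.

```python
PROBE_OPCODE = 0x4AAF  # assumed marker word for the compiler's stack probe
PROBE_LENGTH = 4       # assumed: opcode word plus a 16-bit displacement
PAGE_SIZE = 0x1000     # hypothetical page size
SCAN_WORDS = 5         # hypothetical: how many words back from the saved PC to look


def handle_stack_bus_error(code, saved_pc, fault_addr, stack_low, stack_limit):
    """Decide whether a bus error was a stack probe we can recover from.

    code        -- dict mapping even addresses to 16-bit instruction words
    saved_pc    -- PC from the exception frame (at or past the faulting instruction)
    fault_addr  -- the address whose access failed
    stack_low   -- lowest currently mapped stack address
    stack_limit -- lowest address the stack is ever allowed to reach

    Returns (new_stack_low, resume_pc), or None if the process must be killed.
    """
    # Only faults below the current stack, but still inside the stack
    # segment, are candidates for growing the stack.
    if not (stack_limit <= fault_addr < stack_low):
        return None
    # Look around the saved PC for the probe instruction.
    for back in range(SCAN_WORDS + 1):
        addr = saved_pc - 2 * back
        if code.get(addr) == PROBE_OPCODE:
            new_low = fault_addr - (fault_addr % PAGE_SIZE)  # round down to a page
            return new_low, addr + PROBE_LENGTH              # resume just after the probe
    return None
```

Resuming after the probe rather than re-running it is what makes this safe without restart information: the probe has no effect other than touching the stack, so skipping it loses nothing once the stack has been grown. A real scan would also need to guard against a data word that happens to equal the probe opcode, which this sketch ignores.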

[3] The extended exception frame contains internal processor state information. Different steppings of the same processor model might have different internal state information. After you deal with a page fault for a process, you'll have to resume that process on a processor that has a compatible extended exception frame.

[4] http://www.amazon.com/Bock-Classic-Style-Darryl-Richman/dp/0...


Two awesome stories.

We dealt with the 286 when I was at Mark Williams. They had done the compiler that Intel was using and reselling at the time. When the 286 came out, they were concerned about performance, what with the goofy segment registers and all the various memory models (compact, small, medium, large). They wanted us to guarantee that the performance of the compiled code would be equal to or better than the 8086. Naturally we resisted.

So you must know Ron Lachman.

Oh, also at Mark Williams, the year before I started with them they demonstrated Coherent (v7 unix-alike) on a vanilla IBM PC without any memory protection hardware. Later also done on the Atari ST.



