tryp · on Feb 27, 2015

I'm not sure which ones you're referring to, but the NVDIMMs I'm familiar with [0] are normal DRAMs with an additional hold-up supercap, controller, and flash. When power goes away, the controller streams the DRAM contents to flash. Linux block device drivers [1] exist, as does some filesystem support [2] for ext4.

[0] http://en.wikipedia.org/wiki/NVDIMM

[1] https://lkml.org/lkml/2014/8/27/674

[2] https://lkml.org/lkml/2014/3/23/121

tryp · on Feb 23, 2015

>Why can't the HDD vendors publish a md5/sha1 hash of the firmware so we know what the value should be?

Because the only way to actually verify the hash of the firmware is to connect to the drive's controller outside of the firmware's control, with something like JTAG or a direct dump of the flash. Otherwise, the PC would send a command to ask the HD firmware what its own hash is, and compromised HD firmware can simply respond with the published vendor hash.
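A toy model of that problem, as a minimal sketch. The class and function names are hypothetical stand-ins for whatever command a host would use to ask a drive for its hash; none of this is a real drive API.

```python
import hashlib

# Hypothetical firmware image and the hash a vendor might publish for it.
VENDOR_IMAGE = b"vendor firmware v1.0"
PUBLISHED_HASH = hashlib.sha1(VENDOR_IMAGE).hexdigest()


class HonestFirmware:
    image = VENDOR_IMAGE

    def report_hash(self):
        # Hashes the image that is really in flash.
        return hashlib.sha1(self.image).hexdigest()


class CompromisedFirmware:
    image = VENDOR_IMAGE + b" + implant"

    def report_hash(self):
        # Lies: returns the published value instead of hashing itself.
        return PUBLISHED_HASH


def host_check(firmware):
    # In-band check: the host can only see what the firmware chooses to say.
    return firmware.report_hash() == PUBLISHED_HASH


def out_of_band_check(firmware):
    # Models a JTAG read or direct flash dump that bypasses the firmware.
    return hashlib.sha1(firmware.image).hexdigest() == PUBLISHED_HASH
```

Both firmware objects pass host_check, so the in-band answer carries no information. Only out_of_band_check tells them apart.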

romaniv · on Feb 23, 2015

The hash can be computed in hardware or via ROM program.

tryp · on Feb 23, 2015

Of course the firmware could hash itself. The question is what value there is in trusting an untrusted component to tell you it's trustworthy.

tryp · on Feb 6, 2015

The most likely explanation is that 64Mb (8MB) was the highest density part available in the footprint (SOIC-8?) at the time of manufacture and was priced at a significant premium to the 32Mb part.

tryp · on Feb 6, 2015

By mundane coincidence, I discovered uefi-firmware-parser yesterday. Along with a radare2 session, it made it much easier to find a null-pointer dereference in an Intel FSP binary blob.

tryp · on Dec 21, 2014

(U)EFI is essentially a little OS that eventually loads the OS that runs the software you care about. Intel sponsors most of the core OS code (mirrored) at: https://github.com/tianocore/edk2

A BIOS vendor takes that code, drops in a bunch of hardware init code from Intel (or AMD), adds their own user interface, a "csm16" old-school BIOS implementation, and value-adds like debugging and automation for factory test and provisioning.

In order to comprehend anything in the codebase, the first step is probably to get acquainted with the local vernacular. https://github.com/tianocore/tianocore.github.io/wiki/Acrony...

tryp · on Dec 21, 2014

The best place to look is probably Coreboot. It's the boot firmware used by Google Chromebooks. (Github mirror links used here for politeness to the project's servers.) Taking Intel Haswell processors as an example (because I know this code path) we can sketch the general process.

Start at 0xfffffff0, the boot vector for x86, executing in 16-bit "I can run DOS 1.0" mode. https://github.com/coreboot/coreboot/blob/master/src/cpu/x86...

It just jumps to the entry to 32-bit mode https://github.com/coreboot/coreboot/blob/master/src/cpu/x86...

Turn on FPU https://github.com/coreboot/coreboot/blob/master/src/cpu/x86...

Turn on SSE https://github.com/coreboot/coreboot/blob/master/src/cpu/x86...

Configure the cache not to require a backing DRAM so that it can be used temporarily as RAM. https://github.com/coreboot/coreboot/blob/master/src/cpu/int...

Now that "RAM" is available for use as a stack, the next steps can be written in plain-ole C https://github.com/coreboot/coreboot/blob/master/src/cpu/int...

From there, mainboard-specific code sets up things like which SuperIO chip to configure, the i2c addresses to interrogate for information on RAM geometry and timing, and how the chipset is wired to connectors on the board. Common chipset (northbridge and southbridge) init code is run using that configuration data. https://github.com/coreboot/coreboot/blob/master/src/mainboa...

Then DRAM is initialized (the Haswell example is a bit lame in that currently a binary blob of compiled code from Intel does this job.) The Sandybridge DDR3 init was recently reverse-engineered and re-implemented and fully exemplifies the training processes required. https://github.com/coreboot/coreboot/blob/master/src/northbr...

Now that gigabytes of RAM are available, another boot stage is fetched from flash. When the generic framework gets back to CPU-specific stuff, power management is configured, Inter-Processor Interrupt handlers are installed, and the other cores go through a quick init sequence. https://github.com/coreboot/coreboot/blob/master/src/cpu/int...

Then essentially the PCI tree is walked to set up all the chipset devices. https://github.com/coreboot/coreboot/tree/master/src/southbr...

Once hardware is running, Coreboot loads a "payload" that is in turn responsible for loading the OS.
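The 0xfffffff0 starting address in the walk-through above falls out of the processor's reset state. A small sketch of the arithmetic, assuming the documented reset values for 386-and-later x86 parts (the constant and function names are my own):

```python
# Reset state on 386-and-later x86: the visible CS selector is 0xF000,
# but the hidden CS base is 0xFFFF0000, and IP is 0xFFF0.
CS_SELECTOR_AT_RESET = 0xF000
CS_BASE_AT_RESET = 0xFFFF0000
IP_AT_RESET = 0xFFF0


def reset_vector():
    # The first fetch uses the hidden base, not selector << 4.
    return CS_BASE_AT_RESET + IP_AT_RESET


def real_mode_address(selector, offset):
    # Ordinary real-mode translation, which applies once CS is reloaded.
    return (selector << 4) + offset


print(hex(reset_vector()))
print(hex(real_mode_address(CS_SELECTOR_AT_RESET, IP_AT_RESET)))
print(2**32 - reset_vector())
```

The first fetch lands 16 bytes below the 4 GiB mark, where the boot flash is mapped, which leaves room for little more than a jump. The same selector and offset would give 0xffff0 under normal real-mode translation.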

pgeorgi · on Dec 29, 2014

That's a really neat walk-through - mind if I put it on the coreboot wiki? And how would you like the attribution to look?

tryp · on Nov 24, 2014

An invalid or debug instruction causes the processor to throw an exception and jump to a specific address where "handler" code should exist. This is managed by the OS, so in this case the OS would look up the proper thing to do as configured by the earlier ptrace calls -- probably executing your debugger code to let you inspect variables and memory. When you're done inspecting, the proper instruction gets re-installed in the target process, the OS restores the state saved by the exception trap, and execution transfers back to the debugged process. Most processors have a single-step flag that makes the exception trigger again after the next instruction executes, and after each one that follows while the flag stays set.

tryp · on Nov 24, 2014

There is typically only one hardware instruction pointer breakpoint (or a small set of them). Every time the CPU is about to execute an instruction, it compares IP to the IP breakpoint register (or set of registers) and if they match, a debug exception is thrown. These are what you must use to debug the BIOS/bootloader executing in-place from ROM with a JTAG or XDP hardware debugger attached. Later, software can add the proper exception handler once enough of the hardware platform is configured. (Not all architectures exit power-on reset with exceptions enabled or even exception vectors mapped to a valid memory location.)

If the code you are debugging is executing out of write-able memory, then your debugger can support an effectively infinite list of breakpoints by writing some instruction that causes an exception to be thrown anywhere you want a break, then handling that exception by looking up the faulting instruction address in your breakpoint list.
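Here is a minimal sketch of that software-breakpoint scheme on a made-up three-instruction machine. The opcodes and the ToyDebugger class are hypothetical; on x86 the trap instruction would be int3.

```python
BREAK = "BRK"  # hypothetical trap instruction (int3 plays this role on x86)


class ToyDebugger:
    def __init__(self, program):
        self.mem = list(program)  # writable code memory
        self.saved = {}           # breakpoint address -> original instruction
        self.hits = []            # (address, accumulator) seen at each break

    def set_breakpoint(self, addr):
        # Remember the real instruction, then overwrite it with the trap.
        self.saved[addr] = self.mem[addr]
        self.mem[addr] = BREAK

    def run(self):
        acc = 0
        pc = 0
        while pc < len(self.mem):
            op = self.mem[pc]
            if op == BREAK:
                # "Exception handler": look up the faulting address.
                if pc not in self.saved:
                    raise RuntimeError(f"unexpected trap at {pc}")
                self.hits.append((pc, acc))
                # A real debugger restores the instruction, single-steps,
                # and re-inserts the trap. The toy just runs the saved one.
                op = self.saved[pc]
            if op == "INC":
                acc += 1
            elif op == "DBL":
                acc *= 2
            elif op != "NOP":
                raise ValueError(f"unknown instruction {op!r}")
            pc += 1
        return acc


dbg = ToyDebugger(["INC", "INC", "DBL", "INC"])
dbg.set_breakpoint(2)
dbg.set_breakpoint(3)
result = dbg.run()
```

The breakpoint list is just a dictionary, so its size is limited only by memory, unlike the handful of hardware comparator registers.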

Most processors also give you hardware data access breakpoints. These typically sit on the CPU's data bus interface and can fire when the address or data that's about to hit the bus matches the respective breakpoint register. There is usually an option to trap only reads, only writes, or both. Sometimes you get interesting things like a mask register that lets you trap on a whole block of memory.
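The address-plus-mask comparison can be sketched like this. The field names are hypothetical and real register layouts vary by architecture; this version only models the address match, not a data-value match.

```python
from dataclasses import dataclass


@dataclass
class Watchpoint:
    address: int
    mask: int = 0xFFFFFFFF  # bits of the address that must match
    on_read: bool = False
    on_write: bool = True

    def fires(self, addr, is_write):
        # Filter on access type first, then compare the masked addresses.
        if is_write and not self.on_write:
            return False
        if not is_write and not self.on_read:
            return False
        return (addr & self.mask) == (self.address & self.mask)


# Trap any write inside the 4 KiB block starting at 0x20001000.
block = Watchpoint(0x20001000, mask=0xFFFFF000)
# Trap reads of exactly one address.
exact = Watchpoint(0x1000, on_read=True, on_write=False)
```

Clearing low bits in the mask is what turns a single-address trap into a trap on a whole aligned block.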

One of the most interesting hardware debug features of modern processors is the branch trace which keeps a running list of the last N branch instructions that lets you reconstruct a "how did we get here" story.
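Conceptually the branch trace is a fixed-depth ring buffer of (source, target) pairs. A minimal sketch, with a hypothetical BranchTrace class standing in for the hardware:

```python
from collections import deque


class BranchTrace:
    def __init__(self, depth):
        # deque with maxlen drops the oldest record when full.
        self.records = deque(maxlen=depth)

    def record(self, source, target):
        self.records.append((source, target))

    def history(self):
        # Oldest surviving branch first, most recent last.
        return list(self.records)


trace = BranchTrace(depth=3)
for i in range(5):
    trace.record(i * 0x10, i * 0x10 + 0x100)
```

After five branches with a depth of three, only the last three remain, which is exactly the "last N" behavior: you get the tail of the story, not the whole thing.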

MrBuddyCasino · on Nov 24, 2014

Do you know if VMs like the JVM depend on hardware features also? I suspect that would be more an optimization, not a necessity though, since they should be able to de-optimize jitted native code on stack frames containing breakpoints?

tryp · on Nov 11, 2014

To be a bit more explicit, using a separate httpd and application server allows a division of labor between the resource-bound task of handling the request and building the response, and the network-bound task of dribbling bytes back to the original requestor.

Nginx (and the general class of highly concurrent servers) is good at handling lots of connections largely because it tries to minimize the resources (memory, process scheduler time, etc) required to manage each connection as it slowly feeds the result down the wire.

The application server generally wants an instance per CPU so that it can hurry up and crank through a memory-, cpu-, or database-hungry calculation in as few microseconds as possible, hand the resulting data back to the webserver and proceed to put the memory, DB, and CPU to the task of processing the next request.

This is in contrast to the (simplified here) old-school CGI way, where, say, an ancient Apache would receive a request, then fork off a copy of PHP or Perl for each one, letting the app get blocked writing to the stdio pipe to Apache while Apache in turn was blocked writing to the requesting socket, all the while maintaining a full OS process for each request in play.
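A back-of-the-envelope sketch of why this matters. The numbers are made up for illustration, and the model is simplified: it ignores the time to hand the response from the app to the front-end server.

```python
def transfer_seconds(response_bytes, client_bytes_per_second):
    # Time to push the response down the client's connection.
    return response_bytes / client_bytes_per_second


def worker_busy_seconds(compute_seconds, response_bytes,
                        client_bytes_per_second, buffered_by_proxy):
    # With a buffering front end, the worker is free once the response is built.
    if buffered_by_proxy:
        return compute_seconds
    # Otherwise it stays blocked until the client has received every byte.
    return compute_seconds + transfer_seconds(response_bytes,
                                              client_bytes_per_second)


# Hypothetical request: 10 ms of compute, 100 kB response, 50 kB/s client.
unbuffered = worker_busy_seconds(0.01, 100_000, 50_000, buffered_by_proxy=False)
buffered = worker_busy_seconds(0.01, 100_000, 50_000, buffered_by_proxy=True)
```

Under these assumed numbers the unbuffered worker is occupied for about two seconds per request instead of ten milliseconds, so the same pool of app processes serves far fewer requests.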

tryp · on Oct 17, 2014

dmesg output mentions Armada XP pinctrl and xor engine too.