Ask HN: Is big-endian dead?
202 points by api on Jan 19, 2018 | hide | past | favorite | 187 comments
Are there any big-endian chips still in production outside the embedded or specialty market?

Even MIPS and PPC now support a little-endian mode and are usually deployed that way. ARM is nearly always LE or deployed that way, and RISC-V is LE.

Edit: bonus factoid:

Little-endian is slightly more confusing for humans but may be objectively better for machines. On a little-endian machine integer size casts are free -- e.g. casting a uint64_t to uint32_t just means reading the first 4 bytes of it. On big-endian machines integer size casts require pointer math.



As long as we use the traditional network protocols and socket API it's not "dead" dead. The other name for big-endian is "network order", after all.

As a way to serialize data (wire / disk format) it's becoming more common. FlatBuffers and Cap'n'Proto are the popular ones. They reduce (completely eliminate?) byte shuffling when de-serializing.

In one instance I was reading a spec for an industry-specific protocol. At first I was looking at it and thinking "What the... they are padding stuff in a strange way, and using little-endian for the data". Then it suddenly dawned on me that they've designed the spec to be whatever GCC on an x86-64 Linux machine would do to lay out the C structures in memory.

So someone very lazy could just define a struct and cast into it as data comes from the wire. Someone on a big-endian machine would have to do a lot more legwork to get the thing working. But given that there aren't many of those around, it was deemed an acceptable tradeoff.


Using the x86-64 C struct layout as the serialization format is increasingly common in financial trading protocols. Skipping all the bit twiddling has a huge performance impact at high message rates. I once compared two feeds covering the same basic data, where the major difference was one had a complicated serialization and one was C struct layout. The former needed two 16 core servers just to process I/O and deserialize the messages; the latter used 1 core on a single server.


"The former needed two 16 core servers just to process I/O and deserialize the messages; the latter used 1 core on a single server."

I'm genuinely curious; what was the 'former' protocol? Was it encoded using FAST or zlib or something?


I would guess FIX/FAST. For a concrete example, have a read of the specifications for Eurex's market data protocols:

http://www.eurexchange.com/exchange-en/technology/t7/system-...

Specifically, compare the Enhanced Market Data Interface protocol (described in "T7 Market and Reference Data Interfaces") and the Enhanced Order Book Interface protocol. EMDI is FAST, a morass of tags, stop bits, presence bitmaps, and who knows what else (i don't). EOBI is structs.


I was consuming the feed in question via an API provided by the vendor and didn't have direct knowledge of the wire protocol, but based on how it was described to me during the sales process it was probably either FAST or a proprietary protocol with a similar design.


Regarding being lazy on the serialization... yeah, packed structures make sense. But excess padding is still wasteful, and endianness is only a bswap away, which is 1-2 cycles apiece and so won't exactly break the latency budget (per Agner Fog's wonderful pdf, looking at Haswell).


It might only take a cycle to swab, but the big hit is probably the data dependency you introduce.


Which also happens to be why we used binary document formats first then moved to XML too.


> they've designed the spec to be whatever GCC on an x86-64 Linux machine would do to lay out the C structures in memory.

In a previous job, the software I worked on had a "file format" that just consisted of dumping a raw array of structs to disk. It was obviously pretty fast, but it made me cringe a lot.

What's worse, the compiler had flags to control if and how much struct padding to use, so reading a data file with a binary compiled with different padding from the one that created that file caused it to crash. Fun times...


The ideal is probably no padding --- on x86, unaligned accesses basically need no extra cycles[1] and if it means structures shrink and reduce cache misses, could actually be better.

[1] Unless you happen to access a field spanning two cache lines and miss, in which case the padded version would require accessing that second cache line anyway.


Yes!

Unfortunately, the default for the compiler we used[1] was to align struct members on DWORD (32-bit) boundaries, for some technical reason I am sure was totally reasonable. And these guys all built their code with a special make-variable that appended the "don't-pad-structs-I-really-know-what-I-am-doing" flag to the compiler's command line.

I did read somewhere that x86 (at least Pentium III and later) likes to make memory reads from addresses that are a multiple of four. But the profiling data I was able to gather showed that that part of the application had a negligible impact on overall performance. Since the vague job description I had gotten said my job was to "make things faster", I decided not to look into this any further.

[1] OpenWatcom (http://www.openwatcom.org/)


The same was done "back in the day". On our systems, we had a reliable transport protocol (precursor to TCP/IP) that took advantage of the fact that the network and hardware was all BE. Everyone tried to squeeze every last bit of performance out of the machines and avoiding the byte shuffling was not an insignificant performance gain. It's not surprising that people are now doing the same thing with x86-64 given its dominance.


I trained myself to stop doing that (casting struct pointers over network data) after porting C code to an Alpha and getting bus errors from the unaligned accesses. And now I have to untrain myself?


> Someone on a big-endian machine would have to do a lot more legwork to get the thing working. But given that there aren't many of those around, it was deemed an acceptable tradeoff.

Even PowerPC64 has a little-endian mode - PPC64LE.

The machines are switching over because the expensive part is turning out to be the software, which is already working, and the costs associated with migrating it over to a different-endian system.


> The other name for big-endian is "network order", after all.

I wonder where this started... I always suspected it was Sun with their "The Network is the Computer" motto, and of course they would define "network byte order" as what they used.

Today, it drives me crazy that we're constantly swapping before sending across the network and then swapping again when it's received.


This may be the best ensemble of answers you'll get to that question:

https://retrocomputing.stackexchange.com/questions/2652/when...

The internet was big-endian from the start, and that was probably because DEC machines were big-endian.


DEC's PDP-11, VAX, and Alpha were all little-endian. Alpha was biendian but Cray Research was the only vendor that shipped a big endian configuration.


QUIC also uses little-endian


> As long as we use the traditional network protocols

I'm actually pissed the new LoRaWAN spec is big endian.

When I saw that, my only thought was 'dicks'.


Try upgrading your firmware.


Big endian is dead on the client. WebGL exposes endianness and arrived at a time when almost all systems running browsers were little-endian. This makes the big-endian provisions of the spec a dead letter. Web devs don't need to test on big-endian systems and would have a hard time finding big-endian systems to test with even if they wanted to. It's safe to assume, therefore, that there's a mass of Web content that only works if the browser exposes little-endian behavior to JS. It's not performance-wise competitive for a WebGL system to present little-endian behavior on big-endian hardware.

Therefore, it won't be feasible to try to re-introduce big endian to systems that need to be competitive at rendering the Web.

Big endian will stay alive for the time being on home routers and Sparc servers, and probably for a long time on IBM z systems.

The cost–benefit of software accommodating big-endian systems will look increasingly bad with legacy enterprise servers imposing a negative externality on everyone else. (E.g. refusal to consider bitcasts between SIMD vectors of the same bit width but different number of lanes as portable/safe operation in language design.)


You can not only assume it, but I can also say with certainty it exists. Emscripten hastened this by letting site owners compile little-endian code into little-endian asm.js. For example, the WhatsApp Web QR code generator won't run on a BE platform; it's an Emscriptenized blob of code backing it that assumes the typed array it stores into has LE orientation.

TenFourFox and Leopard Webkit both emulate little-endian typed arrays on BE Power Macs and transparently byteswap integers. This mostly works for sites that care and has almost no overhead in JITted code for 16 and 32-bit int, but Facebook managed to break some of our assumptions with floats (example: https://github.com/classilla/tenfourfox/issues/453 -- we don't byteswap floats because it's expensive and probably would break things expecting native endian on the DOM side). Fortunately, TenFourFox doesn't support WebGL, so we don't need to worry about that.

IMHO, this is a broken promise about how the Web was supposed to be platform-independent. I get the feeling the general attitude on this thread is that such platforms are unimportant, but that doesn't mean it's not a broken promise.


IMHO, this is a broken promise about how the Web was supposed to be platform-independent. I get the feeling the general attitude on this thread is that such platforms are unimportant, but that doesn't mean it's not a broken promise.

I'm pretty sure many other fundamental protocols of the Web don't work so well on a non-2's-complement, non-8-bit-byte machine either.


Ignoring whether asm.js is a fundamental protocol or not, the Web wasn't supposed to depend desperately on the byte orientation of the consumer. Surely you remember how it was supposed to lift information access above what type of computer you viewed it on.

WebGL's endian specification was the beginning of the end, after which folks started saying certain implementational details weren't worth caring about, and this just finishes it off. And that's not what was being sold in the beginning.


The "casting" advantage basically just hides bugs. Because it's often not okay to access just 16 or 8 bits of a 32 bit number; the value is being truncated and that could be a bug.

However, when the number's range is small enough, such code appears to work. Finding the bug is delayed until larger values occur, which is not until it's deployed in the field or someone thinks of writing better test cases.

Anyway, if it is okay to truncate, the right thing to do (and ISO C conforming) is to access the underlying type as that type, and then cast the result, not to cast the pointer. I.e. rather than:

    int x = *(int *) ptr64; // breaks on BE; not kosher with ISO C aliasing rules
this:

    int x = (int) *ptr64;  // works on LE and BE
We don't even need the casting operator in the second example. So it's less syntax and clearer code to do the more portable thing.

If we want the bits for that int from some other part of the wider word, we can use shifting and masking.


I ran into a bug in that area a while back. Our development environment has a binding to Winsock32.dll for socket stuff, and as it turns out the bindings were written so long ago that many of the return values are specced to be 16-bit ints. As a result, our application would work fine almost all of the time, but fail mysteriously with "socket operation on non-socket" after about 8000 sockets have been opened. As I understand it, 8000 is because file descriptors are a kind of bump-the-pointer scheme internally and pointers are 8 bytes wide on a 64-bit platform.


Little endian is also better for multi-word addition/subtraction. Your first operand will be the low word, which your pointer conveniently already indexes. This was a classic trick in the days of 8 and 16 bit systems, and assembly code for modern modular crypto systems still uses it.

Really, I think the consensus is that the industry focus on big endian systems was actually a mistake. It prioritized programmer comfort in a regime where it never really mattered.


Yes, LE is actually the more logical and elegant one, since bit n has weight 2^n, and byte n has weight 256^n. The BE equivalents introduce an extra length term into the equations (length - n).

I've always found LE vs BE to be somewhat like 0- vs 1-based array indices; one is more "conventional" to a human, but the other doesn't require a lot of contortions when expressed algebraically or algorithmically.

(With 0-based indices, element i is at address base + i * size; with 1-based indices, it's base + (i-1) * size, and it only gets worse from there as you add dimensions and other calculations.)


Or you could just decrement base.


This also conveniently allows you to store the length at the "real" base.

    0 array length <--- base pointer
    1 element 1    <--- array[1] == *(base + 1)
    2 element 2    <--- array[2] == *(base + 2)


0-based offsets


OTOH, big-endian is better for comparison; you can tell when one integer is larger than another by looking at as little as just one bit, from the most significant end.


That's only a scalar code optimization, though. You can still do comparison in constant time on little endian data, you just lack the ability to early-exit from the loop. In the modern world of SIMD data, that optimization often doesn't exist. Fair enough point though.


But comparison is done in registers, which don't have a byte order. Unless you mean doing something like 256bit_a > 256bit_b, which I would think is a very rare operation. Good point though.


To follow up on this, I recently implemented a packed vector format. As a LE advocate I surprised myself by ordering the packing in BE order — because that let me simply byte-compare vector encodings to compare the underlying vectors, which was a huge performance increase.

Data structures which are frequently compared, e.g. to be sorted, benefit from BE ordering. Just binary compare them as byte strings, usually an efficient inline operation.

If I were implementing a packed wire format for SQL data, for example, I’d probably do it in BE order.


In the worst case, comparison is still like subtraction.


Even if the whole datum is examined, comparison is more parallelizable; it doesn't require carry propagation.


Now that I think about it, most manual arithmetic would also be easier if we described numbers with the least-significant digits first. If you're doing sums in your head, you could just list out the answer's digits as you calculate them, rather than needing to remember the list of digits and reverse them at the end.


Alternate methods of manual arithmetic that calculate the most significant digits first eliminate the need to remember and reverse digits.

With practice it can be faster than entering on a calculator!


I used to use IETF's network order (big-endian) for everything on disk or over the wire, but I switched to little-endian around 2008. If I ever have to port to a big-endian machine I'll deal with it then. I haven't regretted it yet.


Big-endian is the canonical form for storage and the wire because back when the Internet was designed most "pro" machines were big-endian: Sparc, old MIPS, old PPC, DEC Alpha, etc. All these are dead or dying now.

It's easy enough to deal with BE files and protocols by just swapping bytes. I'm referring to hardware architectures. Are there still any big-endian chips out there? Does it still make sense to support big-endian chips in new code as long as that code is never intended for embedded or very old systems?


Swapping bytes is a huge pain in the butt! When reading large binary files, it's so convenient and efficient to be able to mmap and make struct pointers right into the file. You can even deliver those files to web browsers and use JS's TypedArray to get random access into them.

(That requires a bit more than simply little-endian. It requires struct alignment and floating point formats to be the same. But with only a tiny bit of care it can be.)

For new code running in-core in 2018, I think little-endian is quite safe.


> Swapping bytes is a huge pain in the butt! When reading large binary files, it's so convenient and efficient to be able to mmap and make struct pointers right into the file.

No no no don't do this don't do this don't do this. This is how horrors and abominations like .doc, .xls, .psd happen.

The correct way to handle binary data is to unpack it into the struct byte by byte. The reason for this is that when you define a struct in C(++), there are not just endianness issues, but implementation-dependent issues of padding and alignment you have to consider. Recently, most compilers on most architectures standardized on self-alignment rules for all primitive types except char, but this is not guaranteed by the standard, it will bite you in the ass when you least expect it, and it will be decades yet before all the C code in the world is displaced by a single-implementation language like Rust.

The best way to write portable code that will work as intended is to treat all data coming from disk or over the wire as a bag of bytes, and not attempt to alias it to a struct.


I have to disagree with your “No no no”. It all depends on requirements.

When you’re on a PC, and working with files that you can reasonably expect will be exchanged (like doc, xls or pdf) — then your “No no no” heuristic is absolutely correct. As you pointed out, differences between compilers and between e.g. x86/amd64 are very likely to render these formats incompatible.

But when you’re working with files that only your software will access, unpacking it byte by byte will slow down IO by a huge factor compared to both mmap, and read/write of large blocks (the latter will likely translate to DMA i.e. the CPU will be free to do something else). When that’s the case (like for most videogames, even for PC ones), you don’t want to pay that performance penalty, you want to get that data in the memory ASAP.


>> It all depends on requirements.

Requirements change.


Requirements can change, sometimes unexpectedly. But if you "know" for a fact that they won't, then doing the hypothetically more-portable thing at the expense of today's performance, is just plain overengineering.


You never know something like that “as a fact”.


> But when you’re working with files that only your software will access, unpacking it byte by byte will slow down IO by a huge factor compared to both mmap, and read/write of large blocks (the latter will likely translate to DMA i.e. the CPU will be free to do something else).

If:

* you're working with files that only your software will access AND

* you control, and understand, the CPU architecture of any machine that will touch that data structure and you KNOW that it will never change for the entire lifetime of that piece of data AND

* you always use the same version, or an ABI-compatible future version, of the same compiler that you KNOW will always lay out data the same way and you know what that way is AND

* you either don't need to touch this data structure in other programming languages, or you KNOW that this won't be an issue (for example because your programming language implementation was written in C, compiled against the same version of the compiler, and has an FFI that understands C structs)

THEN, you may proceed to mmap structs into memory. In practice these constraints are fairly commonplace; for example, NetBSD wscons device drivers report HID events which are specified in a struct, and because the HID events are likely never to leave the originating machine, only be passed from kernel space to user space, it makes sense to simply read(2) them straight into the struct.

And there's certainly nothing wrong with snarfing a large file into a char[], but when it comes time to extract meaning from it, unless you are absolutely sure that the underlying assumptions regarding data layout will never ever change, it will only cause marginal harm to do everything in byte offsets and shift and OR bytes to yield final values -- and this is the best portable way to access the data therein, agnostic of details about the compiler and CPU architecture.

In general it's a good idea to err on the side of caution by default, profile, and then optimize the hot paths as necessary bearing the constraints you're assuming in mind (and perhaps documenting them for good measure).

Maybe my initial statement was too strong, but I shudder whenever I see comments of the form "It's so convenient to just mmap() that sucker into a struct and access the fields!" Because they've never had to deal with the consequences of trying to access a struct from a Microsoft compiler serialized to disk, and finding out that GNU compilers have a quite different notion of how structure members are to be laid out in memory...


> In general it's a good idea to err on the side of caution by default, profile, and then optimize the hot paths as necessary bearing the constraints you're assuming in mind

In general, changing data formats to something incompatible is one of the most expensive changes you can possibly make to your software. If you’re not sure write a prototype and profile. Implementing knowingly inefficient data format for a performance-critical application isn’t a good idea.

> it will only cause marginal harm to do everything in byte offsets and shift and OR bytes to yield final values

Huge harm, in both runtime performance, and code size & complexity.

P.S. For applications where portability matters and you're OK paying the performance cost of that, the industry has moved towards XML-based formats. Not only does it fix byte-order issues, but also text encodings and globalization; it's human-readable, and it's fast enough for many practical applications. E.g. the doc/xls formats that you've mentioned are deprecated in favour of docx/xlsx; the latter are XML-based.


I totally agree with you and so does Rob Pike: https://commandcenter.blogspot.com/2012/04/byte-order-fallac...


This makes no sense in lots of applications. Today's computers have hardware acceleration for moving chunks of data. Big chunks of data; unpacking things byte by byte makes no sense for things that have to be efficient. It is something like 20,000 times more efficient to do things in blocks.

If you move text around you probably don't care about efficiency and can do it. If you care about efficiency and price (using cheap components) it is a very bad solution.

We use a very fine-grained, controlled C library in all our serialization/deserialization, and it has been working flawlessly for years. This library is interfaced with C++, Java, ObjC, Swift, Clojure, Rust...


Most compilers have directives for packing structs. What the OP suggests is something that can be done for high performance, but don't expect it to be portable outside of common architectures and compilers. I'd consider it something that shouldn't be done unless you really need the simplicity or performance.


No, dumping internal representations to disk is orthogonal to zero-copy data formats. You can have a well-defined format that doesn't require byte-by-byte parsing.

Look up FlatBuffers and CapnProto.


Both of those appear to work by deferring the parsing step to access time. You're still treating the thing as a bag of bytes and unpacking stuff out of it bytewise.


That's incorrect. When you load an int64 field from a Cap'n Proto message, you are doing a 64-bit load instruction directly from the source bytes. You are not doing byte-by-byte access nor any sort of translation or "parsing".

Cap'n Proto works by laying out data structures like a C compiler would, but following consistent, portable rules so that the layout is the same on all platforms. It then generates inline-able accessor functions to manipulate these structures which do pointer arithmetic similar to what a compiler would generate for accessing a struct. The end result is that accessing primitive fields from a Cap'n Proto struct is essentially identical in terms of machine instructions to accessing fields of a C struct.

To call that "deferring the parsing to access time" does not make sense.

(Note that for pointers, there are more differences: namely, because data will not always be loaded at the same address, pointers need to be relative rather than absolute. They also need to be bounds-checked for security. This adds a few instructions to pointer accesses, but those instructions look nothing like traditional "parsing" and certainly aren't byte-by-byte operations.)

Your statement earlier:

> The correct way to handle binary data is to unpack it into the struct byte by byte.

This is inaccurate. Byte-by-byte parsing is a valid way to do parsing but not the only way. Byte-by-byte parsers tend to be slow and -- arguably, more importantly -- overly complex and rigid. It is, for example, usually very hard to do "random access" with a byte-by-byte parser, because allowing out-of-order parsing tends to blow the code complexity through the roof.

On the other hand, with Cap'n Proto and similar approaches, you can trivially mmap() a very large data structure and traverse it randomly, and it "just works".

(Disclosure: I'm the author of Cap'n Proto, as well as the author of the first open source release of Google's Protocol Buffers, which does byte-by-byte binary parsing.)


> The end result is that accessing primitive fields from a Cap'n Proto struct is essentially identical in terms of machine instructions to accessing fields of a C struct.

As long as the Cap'n Proto or ProtoBuf data is properly aligned within your custom file format, or you might end up with unusual slowness: https://blogs.msdn.microsoft.com/oldnewthing/20150116-00/?p=...


Yes, Cap'n Proto is careful to require that the data is aligned.

(Protobuf, on the other hand, fundamentally doesn't allow for multi-byte loads in the first place since integers use variable-width encoding, so alignment is irrelevant there.)


Yes. For raw cap'n'proto messages. But when writing custom file formats you might end up having cap'n'proto data embedded at an unaligned offset.


Sure, when layering on top of a non-zero-copy serialization, you may be forced to do a memcpy() of your data upfront to get alignment.


Now that Cap'n Proto exists, why would you want to handle binary data any other way? Simply standardize on Cap'n Proto across your entire application, and problem solved :)


Baaad idea. Quassel standardized years ago on Qt’s data serialization everywhere, and it’s become a major issue now.


Story time, please.


I only started contributing in '14, so most of what I heard is second-hand information, this is all simplified, and some parts will likely be wrong. It's also all my own opinion, I'm not representing any project here.

Basically, a decade ago a student started to work with some friends on an IRC client that integrates with a custom bouncer. Being a prototype, they just used Qt's serialization protocol between them. Over the years the project grew; at some point Nokia funded development, it became Kubuntu's default IRC client (but only in the client+bouncer-in-one-binary version, so without all the advantages), and Nokia was bought and closed and the department sold to BMW.

Now, people tried writing third-party clients for this. And this became a minor issue, because the protocol was never documented. In fact, Qt's serialization was used for storing configs on disk, some blobs in the database, and over the network. It was later wrapped in TLS and deflate, and at some point array-of-struct was even turned into struct-of-array for the initialisation pattern.

Either way, someone tried writing a mobile client for it, decided the protocol was insane, and instead built his own, almost identical client/bouncer system with an Erlang backend and json as protocol. This grew, and became IRCCloud.

Now, other people again tried developing third party clients for quassel. An android client was developed, but development was messy, and over the years, it stalled, because they reverse engineered the protocol, semi-successfully, and at some point didn't have enough time left, and gave up.

People reversed the protocol partially over the years again for pyquassel and quasselc/quasselbots/quassel-irssi.

Around then, another person tried reversing the protocol, and reimplementing it in JS for a webclient, which after a while became quassel-webserver.

Back then there was a lot of talk about replacing the protocol, but it was never done.

I, a user of irccloud around then, was annoyed with the costs (still being in high school myself, I couldn't afford the 4$/month, and the free tier wasn't enough), so I looked for alternatives, and found quassel. But I hated the looks of Quasseldroid, so I forked it, and started working on the UI, and on features by reversing the protocol yet again. Not having actually programmed anything except for some Delphi and VB.NET stuff, a tiny Java project and one C# Windows Phone 7 app, my code was the worst, ever. Seriously, it was bad. After a while, discussion came up about turning this code into a PR, so I threw it all away, and rewrote it again; still bad, but it worked. This was merged, I became maintainer of Quasseldroid (because no one else was working on it anymore), and then, around 2015, I reversed the protocol, read the entire source of every implementation, wrote it all down on paper, studied every file format and every quirk, and then I rewrote Quasseldroid from scratch, in about 3 months, with every feature of the desktop version. I called this The Next Generation of Quasseldroid, jokingly Quasseldroid TNG or later quasseldroid-ng.

A few weeks later, a new Android version came out, introducing Doze, and breaking everything about quasseldroid-ng. And so I rewrote it again, and before release, a new Android version broke it all again.

And that basically repeated, until I decided that it can't continue like this, and if we'll do major changes to the protocol, we might actually get a working version for Android that lasts longer than a few weeks in Beta before Google breaks it. So I learnt C++, and started contributing.

And that is basically my view of the story. Reverse engineering a protocol again and again, seeing variations of variations of the same protocol used everywhere, never properly documented. The Qt documentation is entirely different from what Qt actually puts on the wire. Blobs in the database.

But it also means backwards compatibility in the desktop client/core for every version from the past 10 years, and backwards compatibility on Android for every version of the past 6 years.

And now, maybe, I'll be able to help replace this, bit by bit. After the improvements to the protocol that added major performance benefits, the next part is replacing the bouncer-side config format entirely, so I can properly containerise it.

TL;DR: no matter how good the support for a non-standard binary serialization format in your favourite language is, 10 years down the road people will reuse your protocol in a dozen more languages, and they'll have to reverse the protocol themselves, and will do a semi-good job at it, and because you never thought about backwards compatibility you now have a mess (we were lucky because 99% of what we transmitted were key/value maps, and when reading we always used default values if a key didn't exist and ignored unused keys. Sometimes we did serialize structs directly, though, and those places still cause me headaches today, and require workarounds to add features, e.g. the latest sendermode implementations)


This is interesting, and thanks for trying to clean up the protocol, but isn't this an orthogonal issue? The problems you are describing seem to be related to using an undocumented protocol, which is unrelated to using a custom serialization format vs building upon an existing one.

Building upon an existing, well-documented, and relatively sane serialization format (Protobuf, Cap'n Proto, MessagePack, JSON, heck even bencode for all I care) is usually a good thing, and so is decoupling the messages from the details of an implementation's internals. Language and framework internal serializers (such as Python's pickle or, apparently, Qt's serializer) tend to make it harder to achieve both goals.


> Building upon an existing, well-documented, and relatively sane serialization format

The problem with that is that whatever format seems well-documented and relatively sane today might become an obscure, unknown protocol 10 years down the road.


FWIW, Protobuf has now been open source for a decade and has been used for basically everything inside Google since about the turn of the century. Protobuf predates JSON, and I would wager that, worldwide, much more data is stored in Protobuf format and many more cycles are spent parsing Protobuf format than JSON. For Protobuf to die out, Google itself would have to die, as would quite a few other companies that heavily rely on it. It doesn't seem likely to happen any time soon.

I unfortunately am not in a position to make such strong statements about Cap'n Proto. However, implementations exist in C++, Java, JavaScript, Rust, Go, Python, and a bunch of other languages, so it should at least be much easier to deal with than Qt serialization.

(Disclosure: I'm the author of Cap'n Proto and of the first open source release of Protobuf.)


To be fair, DEC was once in the same position as Google; in fact, by employee count, it was twice as big (140k vs. 70k) and by market share of the whole computing market (you could speak of a "computing market" back then), it was significantly larger. In the mid-80s, the idea that a VAX might be supplanted by a massive worldwide computation network of billions of computing devices would've seemed like science fiction. (Note that at its peak, Digital had only sold 400,000 VAX.) You could be fairly confident that storing your data in the OpenVMS filesystem would be fairly future-proof.

When was the last time you saw a filename of the form NODE"accountname password"::device:[directory.subdirectory]filename.type;ver?


> Byte-by-byte parsing is a valid way to do parsing but not the only way. Byte-by-byte parsers tend to be slow and -- arguably, more importantly -- overly complex and rigid. It is, for example, usually very hard to do "random access" with a byte-by-byte parser, because allowing out-of-order parsing tends to blow the code complexity through the roof.

I have to agree here from past experience. If the format in question has a chance of being performance sensitive, don't use FSM-based encodings [1]. It is inordinately difficult to optimize parsing these encodings even if you only have to handle tiny subsets, and it still won't be fast. A format like msgpack which prides itself on being very fast may be fast compared to JSON and other ways to express essentially arbitrary structures, but is DEAD SLOW compared to any direct encoding (be it a dedicated encoding you developed in literally a few hours or something like capnproto).

[1] Obviously, considering an encoding more complex than FSM means that you're an idiot and your application will almost certainly have security vulnerabilities related to the format in the future.


kentonv introduced the term 'parsing' into the discussion, not me. Originally I wasn't talking about parsing as such, just being explicit about the byte-offset, length, and ordering of any piece of data you fetch or store by doing (ptr[n] << 24) | (ptr[n+1] << 16) | (ptr[n+2] << 8) | ptr[n+3], or the corresponding write operation, if you're working with a chunk of data that came from, or is destined for, a file or the network. And if for whatever reason you want or need to work with structs, don't try to alias them onto the disk or network-bound bits. FSMs don't even come into it. It's just a matter of being a little more careful than mmap()ing into a C struct and hoping for the best.


> When you load an int64 field from a Cap'n Proto field, you are doing a 64-bit load instruction directly from the source bytes. You are not doing byte-by-byte access nor any sort of translation or "parsing".

Assuming little-endian CPU arch. It's followed by a byte reorder on big-endian architectures. (And you assume all Windows instances are little-endian, which probably-is-but-may-not-be the case.) You made the decision to optimize for what you consider the common case, but it does not generalize without added translation code to all cases. You may have hidden the translation code behind CapnProto's generated accessors, but CapnProto's structs are translation-free the way AWS Lambda is "serverless".

> To call that "deferring the parsing to access time" does not make sense.

Except you are deferring translation work (like byte reordering) to access time. Either that or you're hiding it in the serialization APIs. Again, it's like "serverless" computing: just because you've hidden it doesn't mean it's gone away.

> Byte-by-byte parsing is a valid way to do parsing but not the only way. Byte-by-byte parsers tend to be slow and -- arguably, more importantly -- overly complex and rigid.

There's a performance cost, but hopefully you're only doing serialization/deserialization when you intend to hit the disk or wire to read/write into/out of your struct. All in-memory processing happens in whatever endianness and alignment makes your CPU and compiler happy.

There's nothing "overly complex and rigid" about understanding a binary format as a bag of bytes and fetching scalar values (including offsets into the data structure) from it accordingly. This is how shit gets done when it comes to portably handling arbitrary binary formats. CapnProto can score a few wins by assuming things about the target CPU/compiler, restricting the binary format to conform to some of those assumptions, and papering over the rest with code hidden behind some of its APIs. But it's not a general solution to the problem of extracting meaning from an arbitrary hunk of bytes that may or may not have come from a CapnProto-conformant application.


> Assuming little-endian CPU arch. It's followed by a byte reorder on big-endian architectures.

Basically all common CPUs are LE.

(And basically all BE CPUs have dedicated instructions for reading LE data. It's true I haven't yet added the inline assembly to use those instructions in Cap'n Proto's reference implementation, but that's only because no one actually cares about these architectures.)

So in basically all real use, there's no machine-instruction-level difference between accessing a field of a capnp struct and accessing a field of a C struct. If the instructions are identical, then how can you say one is "deferred parsing" and the other isn't? What meaning does any such distinction have?

> There's a performance cost, but hopefully you're only doing serialization/deserialization when you intend to hit the disk or wire to read/write into/out of your struct.

Disk is usually cached, meaning it's already in physical memory and you're wasting time making a copy rather than using the data in-place.

Over the network, within a datacenter, bandwidth is basically infinite (in that your CPU probably can't process bytes as fast as your network interface can). Time spent serializing and parsing is very real and wasteful. I've seen servers spending 30% or more of their CPU time parsing protobufs.

Over the long-haul internet, perhaps the CPU time spent parsing/serializing is not as relevant compared to the time spent transmitting. Still, I'd rather spend my CPU cycles elsewhere -- like in a dedicated compression algorithm -- rather than twiddling bytes needlessly in a parser.

> handling arbitrary binary formats

Yes, we all agree that some binary formats can't be handled any other way. But if you're in control of the format you use, then you can design it in a way that doesn't require a "bag of bytes" model, and your code can be much simpler and more adaptable as a result.


> but that's only because no one actually cares about these architectures.

This. Optimizing performance on BE is a waste of everyone's time and resources.


Except, of course, for those small number of us who primarily do run on a BE platform. I don't mind doing this work, just let us do it.


FWIW I'd be very happy to accept patches adding the appropriate platform-specific inline assembly around here: https://github.com/capnproto/capnproto/blob/master/c++/src/c...

(But the status quo on BE is that it does a load followed by a byte swap, which is probably pretty cheap anyway. The compiler might even already know how to optimize that into the appropriate LE-load instruction.)


Yes, and I think it follows that new protocols and new file formats should be little-endian if possible.


Small pedantic correction from a former Alpha kernel hacker: DEC Alpha was Big Endian only for some rare Cray systems. Everything else was Little Endian.


Many, many years ago I worked for Progeny, and we contracted with HP to help bring Debian to Itanium.

We quickly discovered that a fair bit of Linux software (or at least Debian packaging of same) only worked properly on 64-bit DEC Alpha because the architecture was Little Endian; 32-bit pointers would typically "just work(™)" because the code was grabbing the first 32 bits, and the other half of the pointer addresses was typically 0x0.

Since we were porting to Big Endian hardware (just now discovered Itanium was selectable, not sure why it was Big Endian in our case) we were getting a fair bit of null pointer exceptions.

Anyway, ancient history, but that was my only direct exposure to endian-madness, so it has stuck with me.


HP wanted to replace PA-RISC and that was big-endian. Intel of course wanted little-endian. That's why Itanium was selectable.


DECstations with MIPS CPU were also little-endian. Reason in both cases was probably VAX compatibility.


I'm interested to know what you worked on on Alpha.


I first did a Myrinet driver for DEC OSF/1.

I then worked on the FreeBSD port to DEC alpha. I helped with initial bringup and all the various issues around that. Then did a lot of the platform support, and the alpha-specific part of the Linux ABI compat layer, as well as the DEC OSF/1 binary compat layer. I ran a box on my desktop running FreeBSD/alpha as my primary workstation for years (API UP1000).


FWIW every one of the machines you cited (“Sparc, old MIPS, old PPC, DEC Alpha, etc”) postdated the formation of the Internet (the ARPANET transitioning to TCP); the Internet protocols just followed existing ARPANET practice. Which was due to big-endian processors being common, but the dominant networked machine of that era was the PDP-10.


The PDP successor, the VAX, was strangely little-endian.


PDP-11 had a little endian architecture while big endian machines like the PDP-6/PDP-10 predominated on the net.

To make life more interesting, the later PDP-10/20 mainframes used PDP-11 minicomputers as front end processors and often as network processors, so byte-swapping was the norm. Luckily the PDP-10 allowed bytes ranging from 1-36 bits wide - “byte” had not yet standardized on 8 bits.


Not that strange — it was really the successor of the PDP-11, which was a quite different architecture.


Could Dave Cutler have foreseen ... nahhh!


> back when the Internet was designed most "pro" machines were big-endian: Sparc, old MIPS, old PPC, DEC Alpha, etc

You kids sure know how to make someone feel really old... Much of the internet design predates the 68000, which in turn predates the RISC architectures you cite by half a decade and more.


Just a small nit: the CPUs are all still actively developed and produced. For how much longer is anyone's guess. You shouldn't think of them as "very old systems". I run modern code happily on 64 CPU 4.3GHz POWER8 machines :)


Big-endian means that an ultra-high-speed routing fabric can receive the most significant part of an address and possibly start making a routing decision before the other bytes of the frame have arrived.

Hands down no brainer.

Even the fact that headers come before payloads is a kind of "big endian", as is the fact that important information tends to occur earlier in headers. Look at a basic Ethernet frame. The destination address is first, so before even knowing the source address, or anything else, the switch can know where that frame will be sent.


Maybe that mattered on serial connections, but on a 10Gbit switch you're talking nonsense.


Yeah, it's extremely unlikely that a high-speed serial phy would even expose a primitive word size any smaller than 32 bits. Byte-by-byte decoding is extraordinarily difficult at these data rates.


No, cut-through mode for packet forwarding (including on layer 3 prefix matches!) is a feature in current use on L2/3 10/40/100Gb switches, primarily in HFT/HPC environments.


It's my understanding that this works with the complete address, not just the first byte, i.e. it begins processing before the whole frame is received.

On 100Gbit you're talking 0.01ns per bit, so 0.32ns for a full IPv4 address. This compared to, potentially, 15.24ns for a complete IPv4 frame.


Nevertheless; header before payload is a kind of "higher level big endian".


Err, no.


Only if all you care about is that 10Gbit is a lot of throughput, and latencies don't bother your application.

If a 10Gbit network is being used precisely because 1Gbit didn't have sufficiently low latency, then it matters.


But more than that, the encoding used in 10GBASE-T doesn't allow decoding individual bytes. For the usual copper standard, it's 4 pairs with 7 bits per symbol, and the skew between lanes means you can't count on corresponding symbols arriving at the same time. And then there's a reed-solomon block code to correct errors, which has to be decoded all at once. The optical standards use 64b/66b coding, which means you receive blocks of 66 bits and decode 64 out of them (which prevents having too many zeros in a row which can disrupt timing.)


No one who cares about latency even has 10GBASE-T hardware, because it introduces a vast amount of coding latency. It is also very power inefficient.


Right, it's probably even "chatty", too and "won't scale".


It absolutely matters for low-latency applications. Maybe not your use case, but it's not nonsense.


It's nonsense when talking about bytes. As others have pointed out, most switches/routers work with chunks of 16-64 bytes at a time, not singles.


> Look at a basic Ethernet frame. The destination address is first

Perhaps the biggest mistake of IPv6 was copying IPv4's header order of source followed by destination. It would have been better if the destination address came before the source address in the header.


Same. I still wrap all reads and writes of multi-byte values though, with functions like toNet32() etc., which are just no-ops currently. Just in case big endian gets revived by hipster CPU architects in a couple of years...


We had this same thread 997 days ago: https://hackertimes.com/item?id=9451284

If big endian was effectively dead back then, I guess it's really dead now.


Let’s hope so because one of these is not better than the other but one is much better than two.


I was annoyed reading the RISC-V docs and seeing that they allow implementations to go either way. Sometimes it's better to just make a choice and tell people what it is.


I was once on a contract that involved porting satellite simulation software from SPARC/Solaris to x86/RHEL. The software was in Ada95, and it didn't take much to get it to compile on the new hardware. However, ground-to-vehicle comms (which our simulator had to handle as it needed to work with the real ground control software) took place using a packed binary format (wasting as little space as possible). Data fields frequently crossed byte boundaries (the first 10 bits are field X, the next 6 are field Y, etc.). Fun times rearranging and repacking bits.

Of course, dealing with all of the Endian issues had not been planned for and was not in the project schedule :/


Sounds like VMF. It’s such a pain to debug when you can’t even rely on marker strings being recognizable in dump output.

One time I had to make little- and big-endian machines talk nicely was for a message engine that ran on Intel/Linux. All the messages in the ICD were glorified packed structs that were native on the IBM/AIX side.

I used the ICD wire format as my storage format. I defined containers for all the primitive types with correct sizes and that swapped values on assignment and read (except for strings, of course). I added the appropriate pragmata to pack the underlying structs and locked down their sizes with static_asserts that would fail at compilation time until everything fit perfectly.

These classes gave me what looked like native structs on the Linux side, e.g.,

    msg.foo = 3;
or

    msg1.date = msg2.date + 1;
and the byte swapping happened transparently all around, both on the way in and on the way out. Yes, if latency had been a concern, this design may have been problematic. The Linux box faked messages on behalf of three other systems so we could command the AIX software into different modes with complex prerequisites. Changing a few fields per cycle, I could then dump the struct onto the wire.

A cool aspect to the story was it all worked correctly the first time out of the box even though I did not have access to the AIX box during implementation. When it came time to integrate with the AIX side, all that was left to do was some low-level socket stuff to hook up to the message flow.

I’d moved to a different project full-time and worked on this message engine during evenings and weekends. We got the code unpacked, so I ran make check like I had been doing so frequently.

“Wow, you had time to write unit tests and everything?” asked one of the guys from behind my shoulder.

A moment of Zen hit.

“I didn’t have time not to.”


Little endian is genius. Not only because of size casts being "free" (because LE is compatible among "word" sizes: the lower bits are at a known location, while with BE you need to know the word size in order to locate the lower bits of the word), but also because of efficient bit I/O (using CPU registers as accumulators, and flushing with a word-size store), so variable-size data can be encoded efficiently.


Dealing with PCM audio streams (like WAV files) you sometimes find both. After all, there are thousands of codecs + media containers + multiple versions.

https://en.wikipedia.org/wiki/FFmpeg#Supported_formats

> When more than one byte is used to represent a PCM sample, the byte order (big endian vs. little endian) must be known. Due to the widespread use of little-endian Intel CPUs, little-endian PCM tends to be the most common byte orientation.

https://wiki.multimedia.cx/index.php/PCM


I’m presuming from your wording that you’re already aware, but for the benefit of others, ARM is bi-endian and can be used in either of LE or BE modes. However, most “in the field” configurations deploying real-world operating systems use it in LE mode as that makes porting everything so much easier.


> Little-endian is slightly more confusing for humans...

I think you meant to write “Little-endian is slightly more confusing for humans who use left-to-right languages” as all the R-L languages also put the least significant digit on the right.


Not true. In arabic languages you write text R-L, but numbers are L-R.


Arabic numbers are L-R if you have a big-endian mindset, but R-L if you switch to a little-endian mindset :)


No. Arabic numbers are always L-R...


Not exactly accurate: for the part of the number that is < 100, it will always be R-L, as in 25, which would be "five and twenty" in Arabic.

Also for the part that is L-R, that is not the rule, as some people still read the whole number as R-L (actually in a lot of historical documents that was the case), so they would read 1925 as five and twenty and nine hundred and a thousand. Whereas now most people would read it as a thousand, and nine hundred, and five and twenty.


Many R-L languages do the same such as German French, Hindi et al.


I guess you mean L-R. In Dutch and German 1925 is most commonly read as 'nineteen five and twenty', but 2025 as 'two thousand five and twenty'.


Wait, Hindi is neither a R-L language nor does it switch directions for text and numbers.

Source: native Hindi speaker.


Yes, sorry for the too-late-to-edit confusing typo: German, Hindi and French are all L-R, all big-endian (least significant digit on the right), and all have numbers which, when spoken, aren't said in strict digit order.


Do you mean that you read a text from the right to the left, and when you meet a number you jump to its leftmost digit, then read it L-R, then jump again to its left (where there is a space) and resume reading text?


If you think about it, in left-to-right languages reading numbers requires backtracking, because you have to see how many digits there are to know if you're reading units, thousands, millions, billions...

Take 12,345,567 for instance: if you want to read it aloud you first have to go all the way to the end to figure out that there are three groups of digits, so the first one is millions, then thousands, then units. If you were only given a truncated version of the number it would be impossible to read it aloud. So in a way the Arabic representation of numbers actually makes some sense, and it could be argued that we write them backwards.

Furthermore IIRC from a discussion with an arabic friend (from the middle east, not sure if it's different in NA) the most formal way of reading numbers aloud is actually backwards (or little endian I guess) compared to the way we do in english. So if I understood correctly for 1,234 you'd say "four and thirty and two hundred and a thousand" or something similar. Don't quote me on that though because I know almost nothing about the arabic language and this is from a discussion I've had many years ago.


Yes.


Exactly: you encounter the least significant digit first as you read.


Honestly I find little endian less confusing. The only confusing thing is how little endian values are written in hex, because within each byte they are written as big endian.

Other than that little endian makes more sense. The fact that Arabic numbers are written in big endian is weird. Nobody questions it because it's all anyone is taught.


They aren’t Arabic BTW, they are Hindu positional digits (look at the character forms and you’ll see the connection) which arrived in Europe via the Arabs, hence the name — in Arabic, different character forms are used.


Still doesn't make sense, because little endian keeps things reversed within bit groups. It doesn't make sense no matter if you read left-right, right-left, top to bottom or bottom to top.


Write out what happens bitwise with a shift on a little endian machine.

From a bytes-as-a-bitstring perspective, big endian makes sense, and little endian is a terrible scramble.


I believe the chips that IBM uses in their current Z series mainframes are still big-endian.


Eh, Haswell added the MOVBE instruction. And I wouldn't be surprised if someone in the RISC-V world added big-endian instructions (they're nice for network processing).

And nearly every powerpc I've seen is big endian, FWIW.


Actually, Silverthorne (the first Atom CPU) added MOVBE.


It seems that Atom has become - ironically - a staging ground for new features that make their way much later to the desktop/mobile x86_64 architecture. Here's another: https://neosmart.net/blog/2017/will-amds-ryzen-finally-bring...


Those little cores get a lot more benefit out of special case instructions.


It's less that, and more that in high-performance-at-low-power realms, density (and CISC) is good - it means big complex operations can be hidden behind single instructions instead of needing multiple instructions and powering all of that hardware for multiple cycles.

The Atom needs instructions that make bigger, more complex operations simpler and lower powered, and with transistors the size they are these days they have ample silicon space to burn on these types of accelerators.


Pretty much all new deployment of Power that IBM has sold since the release of POWER8 is little endian. Distro support is very quickly shifting to be little endian only as well.

(disclaimer: IBMer)


Those are for fast I/O, but they don't change the architecture.


I'm working for a space agency and we're using SPARC for all our sats' CPUs. Basically, Big-Endian.


I worked on a code base using the LEON (SPARC, big-endian) and also ARM in little-endian mode for space. The code was common for both endians. I followed the sage advice of Rob Pike on it: https://commandcenter.blogspot.com/2012/04/byte-order-fallac...


  Little-endian is slightly more confusing for humans
I've heard this before, but the reason is that you view hex data and list numbers left-to-right as if they were letters. They are not.

0x12345678 stored big-endian, numbering bytes left-to-right:

  12 34 56 78
Looks good, but I think that this is actually more confusing, because when you number the bytes and bits you will see that the bytes are written left-to-right, while at the same time the bits are written right-to-left.

The solution is to show dumps with bytes numbered right-to-left. This is coherent with how we number bits, and also how we relatively position digits in any other number.

0x12345678 in little-endian, but written right-to-left:

  12 34 56 78
Now the numbering of the bytes is consistent with the numbering of bits. You can easily see that bits 0-7 of the value, interpreted as a 32-bit word, belong in byte 0, while bits 16-23 belong in byte 2.


When 0x12345678 is stored in little endian, it looks like 78 56 34 12 in a byte dump, which is stupid because hex digits are grouped as pairs and then reversed.

The endianness at the bit level is irrelevant because the bits are chunked into bytes, and are usually not even addressable.

A storage format exhibits endianness only when it is addressable.

In the C language, bits are only "addressable" via the shift operators. These are rooted in pure arithmetic, so that 1<<1 is always 2, regardless of whether you're on a big or little endian platform. Whether you call the value 1 "bit 7", "bit 8", "bit 1" or "bit 0" is just, pardon the pun, word semantics.

The only time you deal with "bit endianness" is with certain compressed data formats, and with bitfield layout rules.


> Whether you call the value 1 "bit 7", "bit 8", "bit 1" or "bit 0" is just, pardon the pun, word semantics.

It's extremely important to your mental model to understand how the bits are arranged, or left shift (<<) is going to produce different results in your head. If you think the value 1 is bit 7/8, you get:

1 0 0 0 0 0 0 0

Which would left shift to:

0 0 0 0 0 0 0 0

Which would not be equal to 2.


> It's extremely important to your mental model to understand how the bits are arranged

And the shortcut for that is simply regard bytes as big-endian. On all platforms. Whether you're on a PPC or x86, the byte 0x80 is going to go out on the wire as 1 first, followed by 7 zeros.

So if you're writing a data compressor and the spec says that the variable-length bit strings (huffman or whatever) are stuffed into bytes in network order, that means you fill bytes from the left down. That is done with code that works the same way on BE or LE platforms.


Your scheme to byteswap little endian in hex displays falls over in the face of differing word sizes.


Not at all, you just list the whole dump right-to-left. Then it works out regardless of word size.


So the dump is ordered backwards, from top of memory to bottom? That seems harder for humans than little endian integers.


I don't understand what you mean. Line breaks can be inserted wherever suitable. Whether the bytes are listed left-to-right or right-to-left makes no difference.


A dump of memory (several kilobytes, megabytes or whatever) is inherently big endian: it proceeds from the base address and goes up.

It is counterintuitive to swap pieces of it into some locally opposite order.

Yet, that's what has to be done so that numbers are readable.

That's why "od" has modes for that.

  $ od -tx1  /bin/ls | head -1
  0000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  $ od -tx2  /bin/ls | head -1
  0000000 457f 464c 0102 0001 0000 0000 0000 0000
  $ od --endian=big -tx2  /bin/ls | head -1  # GNU extension, probably.
  0000000 7f45 4c46 0201 0100 0000 0000 0000 0000


  A dump of memory (several kilobytes, megabytes or whatever) is inherently big endian: it proceeds from the base address and goes up.
It only looks "inherently big endian" when you print bytes on each line starting from the left. This way of printing numbers makes little sense as you end up with some kind of mixed-endian where the bytes are ordered one way and the bits another way.

Start each line with bytes from the right instead to make the bytes and bits numbering consistent, and you can print large little-endian dumps with all bits and bytes are where you expect them to be.


Everything is consistent in the big (down to the nybble and bit) endian view.

The number 0x12345678 is actually the nybbles 1 2 3 4 ... which are the bits 0001 0010 0011 0100 and so on.

In the hex dump 12 34 56 78 we just understand the bytes to be big-endian also at the nybble level (and bit also).

That is to say the "1" can be understood to be at the lower "nybble address", relative to the "2".

If that buffer is sent over a serial communication channel or network, the bits actually go in that order 0001 0010 0011. The 1 nybble goes out first, as 0001 (three zeros out the door, then a one), then the 2 nybble and so on.


It depends. With RS-232 (the old serial port standard) bits were transmitted least-significant-bit first, with the most significant bit sent last [1]. Getting back to hex dumps, here's one of some data:

    00000000: 6C 6F 77 09 30 0A 66 72 65 65 09 31 32 32 0A 65 low.0.free.122.e
You can see it's ASCII. A little endian dump of that would be

    e.221.eerf.0.wol 65 0A 32 32 31 09 65 65 72 66 0A 30 09 77 6F 6C :00000000
It makes reading any text in binary data a bit difficult.

[1] My first computer was a Tandy Color Computer, which had a serial port driven directly by the CPU. I learned pretty quickly which bit goes first.


At least in Freescale's PowerPC documentation, it's convention to number the bits left-to-right in big-endian. So the most-significant bit is bit 0, which matches up with the most-significant big-endian byte being 0. See, for a random example, page 1101 of https://www.nxp.com/docs/en/reference-manual/MPC8379ERM.pdf.

Personally I prefer the little-endian representation.


The documentation might have that numbering, but that makes no difference in programming on the PowerPC. When we take the value 1 and shift left by 1 bit, we get 2.

That the documentation thinks this is bit 7 going to bit 6 is immaterial.

Calling the MSB "bit 0" is a tip of the hat to serial communications. In serial communication and networking, it is predominant to transmit the MSB first.

If the documentation is about a wire format, then using that numbering is correct down to the data link layer and (modulo framing considerations and such), physical.


Oh, I'm aware it makes no difference internally; I picked that documentation because I am somewhat intimately familiar with it as I worked on a product with that CPU and did a lot of driver work. My only point was to the poster I was replying to, who wrote "because when you number the bytes and bits you will see that the bytes are written left-to-right, while at the same time the bits are written right-to-left". That assumes that bits are universally numbered starting with 0 at the LSB, which isn't true.


MSB0 is convention in basically all IBM documentation (including PowerPC/Power Architecture stuff), hence all the Freescale/NXP PPC manuals follow.

The main practical problem with MSB0 is that you need to consider the width of whatever field or register you're looking at to work out the correct bit shift.


SPARC still exists, and is big-endian


SPARC 64 is bi-endian. It has big endian instruction format, but you can choose endianness both at the load-store level or at the memory page level.


It might exist, but if my experience is any indication it's moving steadily toward irrelevance and Oracle is pushing it there.


Barely.


Maybe in processors, but not in general.

Why? Because you can use lexicographic sorting with big-endian bytes.


Funny, I just asked a question about this on SO when I discovered that CBOR, a very recent protocol, uses network byte order (big-endian). A confounding decision, given that almost no machines today use it natively, and it just adds extra byte swapping at both ends.

I can't explain it other than that presumably there was some guy on the design committee who was like "but.... Unix... the 70s... network byte order... guys!" or something.


Having implemented CBOR [1], it's really only an issue for the x86, which can do unaligned reads. On most other architectures (even little-endian ones like RISC-V) it's less an issue because you will either have to copy the bytes to be aligned, or just read byte-by-byte (sorry, octet-by-octet) and shift.

[1] https://github.com/spc476/CBOR


Big-endian is the new 1’s complement.

Like 6-bit bytes, octal, BCD, segmented memory, and other such relics, it's one of those things it's simply a relief to have moved on from.


I find the terms big endian and little endian confusing; it feels like they should be switched around.

So the one that ends big (most significant byte last) should be big endian and the one that ends little should be little endian. But it's actually the opposite. It makes it difficult to remember which is which.


Here's an alliterative mnemonic:

    Little endian puts the
    Least significant part at the
    Lowest address(offset)


I usually think of it as: by which end do I read my number? but I agree it's confused me for a long time.


> On a little-endian machine integer size casts are free -- e.g. casting a uint64_t to uint32_t just means reading the first 4 bytes of it. On big-endian machines integer size casts require pointer math.

Is it really that different? It matters on x86 if the value is already in a register, since there's no subregister for the high part, but it shouldn't matter for memory. Most instructions support arithmetic in addressing modes, so whether `*(char *)x` gets compiled to `[eax]` or `[eax+3]` seems not very different.


The 24-bit virtual machine I'm working on is little endian. The 'processor' reads instructions and addresses in little endian, and instructions are stored in memory in little endian. But when programming it in assembly, values and addresses are written with the least significant byte furthest to the right, to make them easier for people to read. At least so far; I may change it if it ends up being confusing.


While it has mostly gone out of favour, it really isn't due to some cosmic plan. Intel just won in the microprocessor architecture race, and they happened to be LE due to their evolution.

I still have PowerPC and MIPS machines that are BE, and which are occasionally handy for seeing if pointer math is being done wrong.


Here is another point of view on the endianess issue: https://commandcenter.blogspot.fr/2012/04/byte-order-fallacy...


I think a couple of networking chips still use it.


I remember reading somewhere that BE was chosen for IP partly for routing efficiency. As you go from MSB to LSB in an IP address you narrow in on an increasingly specific network segment. That's probably not true anymore as smaller and smaller blocks of IPv4 addresses have been handed out (e.g. instead of a full class C).


> That's probably not true anymore

Right, and also because modern RAM is essentially a block device. You just can’t access individual bytes anymore. E.g. on a typical modern PC with a dual-channel DDR, the block size is 128 bits = 16 bytes.


Cavium is big-endian MIPS.


They're replacing MIPS with ARM as fast as they can but I wonder which endian they're running. Certainly ARM servers will run LE but I wonder if they will bother to produce a BE SDK for networking to ease porting from MIPS BE.


I've read the thread, and so far all the arguments for little endian I see boil down to: the CPUs adopted it, so it's more efficient to keep data the same way.

What are the other benefits of little endian, because in terms of readability big endian makes most sense.

I could understand that having things in reverse could somehow be more efficient, but why reverse in groups of 8 bits instead of reversing all the bits?


Big Endian is still supported natively on POWER8/9

Intel added MOVBE (mov big-endian) support as an instruction-set extension (it shipped broadly with Haswell).

Networks are still Big Endian.

SPARC chips (and now RISC-V) are BIG Endian by default.

MIPS just became an independent company a few days ago and they have native Big Endian support.

Edit 1: I was wrong about Power8/9 LE/BE


If you use Linux on POWER you're forced to use Big Endian mode, as the faster Little Endian mode (according to marketing) is reserved for IBM's operating systems.

That's not correct. Ubuntu is LE only and it looks like RHEL is also switching from BE to LE.

"The base RISC-V ISA has a little-endian memory system, but non-standard variants can provide a big-endian..."

SPARC and MIPS are dead.


MIPS last I looked was quite alive in the embedded space - you're right in that it is effectively dead everywhere else.


Recent Linux supports both endiannesses for PPC. ppc64le was added recently.

RISC-V is little endian.


I thought RISC-V supported both?


ARM also supports both.


little endian is only confusing for humans because we very inconsistently use RTL vs LTR semantics.


Do some bignum libraries perhaps still use big endian, for whatever reason?


Why not both? There was at least one chip that let you select your endian-ness. (The MIPS R4000).

See Wikipedia on Bi-endian machines: https://en.wikipedia.org/wiki/Endianness#Bi-endian_hardware


If you work with some industrial machines via serial protocols it is very much alive.


Does it matter? On x86 you can BSWAP if you need to convert.


Yes.



