*I have the source code for the entire Linux kernel as training data, so it's no...

tsimionescu · on May 26, 2020

> perhaps you should train two models: one to convert machine code to assembly, and the other to convert assembly to C.

Is the step of going from machine code to gcc-produced assembly not trivial? Is gcc actually producing assembly code that an assembler needs to do more with than convert to the corresponding opcodes?

p1esk · on May 27, 2020

There are two kinds of assembly: 1. assembly that corresponds to optimized machine code, and 2. assembly that closely corresponds to the original C code. As I said, these two assembly versions might look very different depending on optimizations performed by the compiler. You can reduce the difficulty of learning the conversion from machine code to assembly at the expense of increasing the difficulty of learning the conversion from assembly to C code (and vice versa).