Posted by Luc 3 days ago
From Table 7 you can get an idea of how many instruction variants we cover (~1500 covered, ~700 enumerated but not synthesized, 744 out of enumeration scope). Instruction variants correspond much more closely with the mnemonics listed in the reference manuals, and this is typically the number reported by related work.
> Memory Debugger
Valgrind > Memcheck, None: https://en.wikipedia.org/wiki/Valgrind ; TIL Valgrind's `none` tool provides a traceback where the shell would normally just print "Segmentation fault"
DynamoRio > Dr Memory: https://en.wikipedia.org/wiki/DynamoRIO#Dr._Memory
Intel Pin: https://en.wikipedia.org/wiki/Pin_(computer_program)
https://news.ycombinator.com/item?id=22095435 : SoftICE, EPT, Hypervisor, HyperDbg, PulseDbg, BugChecker, pyvmidbg (libVMI + GDB), libVMI Python, volatilityfoundation/volatility, Google/rekall -> yara, winpmem, Microsoft/linpmem, AVML
rr, Ghidra Trace Format: https://github.com/NationalSecurityAgency/ghidra/discussions... https://github.com/NationalSecurityAgency/ghidra/discussions... : applepie, orange_slice, cannoli
GDB can help with register introspection: https://web.stanford.edu/class/archive/cs/cs107/cs107.1202/l... :
> Auto-display and Printing Registers: The `info reg` command [and `info all-registers` (`i r a`)]
emu86 implements x86 instructions in Python, optionally in Jupyter notebooks; still w/o X86S, SIMD, AVX-512, x86-64-v4
What would it take to get `bash` to print a --traceback after "Segmentation fault\n", and then possibly also --gdb like pytest --pdb?
- [ ] ENH: bash: add valgrind `none`-style ~ --traceback after "Segmentation fault" given an env var or by default?
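For illustration only, here is a rough Python sketch of the hook point such a feature would need: a parent process that notices its child was killed by a signal, which is the moment a shell could print more than "Segmentation fault". This is not how bash is implemented, and `run_and_report` is a hypothetical helper name; it assumes a POSIX system, where `subprocess` reports death-by-signal as a negative return code.

```python
import signal
import subprocess
import sys


def run_and_report(argv):
    """Run argv and report whether the child was killed by a signal.

    Hypothetical helper: returns (returncode, message), where message is
    None unless the child died from a signal (POSIX only).
    """
    proc = subprocess.run(argv)
    if proc.returncode < 0:
        # On POSIX, a negative return code is the negated signal number.
        sig = signal.Signals(-proc.returncode)
        message = f"{argv[0]}: killed by {sig.name}"
        # This is where a shell could additionally dump a traceback
        # or offer to attach a debugger.
        print(message, file=sys.stderr)
        return proc.returncode, message
    return proc.returncode, None
```

A traceback itself would still have to come from something that can inspect the dead process (a core dump, or a tool such as Valgrind or GDB running it); the wrapper only shows where the shell learns that the crash happened.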
I'm happy to answer questions if there are any.
Documentation is definitely not one of x86's strengths. Other architectures do much better. For example, ARM provides formal models of their CPUs, and RISC-V is so simple you could implement all its semantics in a few thousand lines of code.
There are quite a few instructions with undefined behavior, but it is not that much of an issue if you can choose to avoid it -- for example in a compiler. Almost all UB is found in flags or when using invalid instruction prefixes. And although there is some unexpected UB, like `imul`'s zero flag being UB instead of being set according to the result of the multiplication [1], reading the manual and sticking to the parts that are clearly not UB gets you most of the way.
However, it becomes an issue if you need to analyze a binary that uses UB. Then you can't choose which instructions to use, so you need to have a complete model of all UB. That's much more difficult, and for example most decompilers currently fail at this. We have an example of this in Figure 1 of our paper.
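To make the `imul` example concrete, here is a minimal Python sketch of a 32-bit two-operand `imul` that models the zero flag as undefined (`None`) rather than deriving it from the result. It is an illustration under simplifying assumptions, not the model from the paper: the names `imul32` and `ImulResult` are made up, and only CF, OF and ZF are modeled; the other flags are omitted.

```python
from typing import NamedTuple, Optional

MASK32 = 0xFFFFFFFF


class ImulResult(NamedTuple):
    value: int           # truncated 32-bit result, as an unsigned integer
    cf: bool             # set when the product does not fit in 32 signed bits
    of: bool             # same condition as CF for imul
    zf: Optional[bool]   # None means "undefined": the CPU may leave any value


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def imul32(a: int, b: int) -> ImulResult:
    """Sketch of 32-bit signed multiply with ZF left undefined."""
    full = to_signed32(a) * to_signed32(b)
    truncated = full & MASK32
    # CF/OF are set when truncating to 32 bits changed the signed value.
    overflow = to_signed32(truncated) != full
    return ImulResult(truncated, overflow, overflow, None)
```

A compiler can simply never read `zf` after `imul`. An analysis of an existing binary that does read it cannot take that shortcut, and would need to know what each CPU actually does in place of the `None`.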