When disassembling code, it is very common for data areas interspersed with the code (like string literals) to disassemble to instructions, momentarily puzzling the human reader: what is this run of five "or" instructions doing here, referencing registers that would never be arguments?

The reason is that the opcode encoding is very dense: it has no redundancy that would let a bad encoding be detected, and an instruction word usually has no relationship to its neighboring words.

By that I mean that some four-byte chunk (say) treated as an opcode word is decoded that way regardless of what came before or what comes after. If it looks like an opcode with a four-byte immediate operand, then the disassembler will pull in that operand (which can be any bit combination) and skip another four bytes. Nothing in the operand will indicate "this is a bad instruction overall".
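To make that concrete, here is a minimal sketch of a linear sweep over a tiny subset of 32-bit x86 (only `nop`, `ret`, `push imm32` and `mov r32, imm32`). The function name `linear_sweep` and the sample blob are made up for illustration; a real disassembler covers far more opcodes. The point is that the ASCII string "hello" starts with 0x68, which is also the opcode for `push imm32`, so the next four characters get swallowed as the operand.

```python
import struct

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def linear_sweep(code: bytes):
    """Decode bytes front to back, with no idea what is code and what is data."""
    listing = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == 0x90:
            listing.append((pc, "nop"))
            pc += 1
        elif op == 0xC3:
            listing.append((pc, "ret"))
            pc += 1
        elif op == 0x68 and pc + 5 <= len(code):
            # push imm32: the next four bytes are taken as the operand, whatever they are
            (imm,) = struct.unpack_from("<I", code, pc + 1)
            listing.append((pc, f"push 0x{imm:08x}"))
            pc += 5
        elif 0xB8 <= op <= 0xBF and pc + 5 <= len(code):
            # mov r32, imm32: the register is encoded in the low three opcode bits
            (imm,) = struct.unpack_from("<I", code, pc + 1)
            listing.append((pc, f"mov {REG32[op - 0xB8]}, 0x{imm:08x}"))
            pc += 5
        else:
            # outside this toy subset: emit a raw byte and move on
            listing.append((pc, f"db 0x{op:02x}"))
            pc += 1
    return listing

# Real code (mov eax, 1; ret) followed by a string literal.
blob = b"\xb8\x01\x00\x00\x00" + b"\xc3" + b"hello"
for offset, text in linear_sweep(blob):
    print(f"{offset:04x}  {text}")
```

The last line of the listing is `push 0x6f6c6c65`, which is just "ello" read as a little-endian 32-bit immediate. Nothing in those bytes tells the decoder it has wandered into data.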

userbinator 35 minutes ago

This is why a dumb linear disassembler is not too useful unless you're pointing it at a specific region of data that you already know contains valid instructions; for best results, you need a disassembler that knows how to follow the control flow, starting at an entry point or some other location that is known to be an instruction.
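A rough sketch of the control-flow-following approach, again over a toy subset of 32-bit x86 (`nop`, `ret`, `push imm32`, `mov r32, imm32`, `jmp rel8`, `jz rel8`). The names `decode_one` and `recursive_traversal` are hypothetical, and a real tool would also have to deal with calls, indirect jumps and jump tables, which this ignores. It only decodes offsets that are reachable from the entry point.

```python
import struct

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_one(code: bytes, pc: int):
    """Return (text, successor offsets) for the instruction at pc, or None if unknown."""
    op = code[pc]
    if op == 0x90:
        return "nop", [pc + 1]
    if op == 0xC3:
        return "ret", []  # control flow ends here
    if op == 0x68 or 0xB8 <= op <= 0xBF:
        if pc + 5 > len(code):
            return None
        (imm,) = struct.unpack_from("<I", code, pc + 1)
        if op == 0x68:
            return f"push 0x{imm:08x}", [pc + 5]
        return f"mov {REG32[op - 0xB8]}, 0x{imm:08x}", [pc + 5]
    if op in (0xEB, 0x74):
        if pc + 2 > len(code):
            return None
        (rel,) = struct.unpack_from("<b", code, pc + 1)
        target = pc + 2 + rel  # rel8 is relative to the end of the instruction
        if op == 0xEB:
            return f"jmp 0x{target:x}", [target]       # unconditional: no fall-through
        return f"jz 0x{target:x}", [target, pc + 2]    # conditional: both paths
    return None

def recursive_traversal(code: bytes, entry: int = 0):
    """Decode only what is reachable from entry by following control flow."""
    listing = {}
    worklist = [entry]
    while worklist:
        pc = worklist.pop()
        if pc in listing or not (0 <= pc < len(code)):
            continue
        decoded = decode_one(code, pc)
        if decoded is None:
            continue
        text, successors = decoded
        listing[pc] = text
        worklist.extend(successors)
    return sorted(listing.items())

# mov eax, 1; jmp over the string; "hello"; ret
blob = b"\xb8\x01\x00\x00\x00" + b"\xeb\x05" + b"hello" + b"\xc3"
for offset, text in recursive_traversal(blob):
    print(f"{offset:04x}  {text}")
```

Here the five bytes of "hello" at offsets 7 through 11 are never decoded, because no control-flow path leads into them. A linear sweep over the same blob would have turned them into a bogus `push`.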

NobodyNada 6 hours ago

Every reverse engineer learns very quickly that "add [rax], al" has the machine code representation "00 00".

	userbinator 4 hours ago
		Or "add [bx+si], al" for those from an earlier era.
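For anyone wondering why the same two zero bytes read differently across eras, here is a small sketch of the ModRM decoding involved. Opcode 0x00 is `add r/m8, r8`, and a ModRM byte of 0x00 means mod=00, reg=000 (`al`), rm=000; what rm=000 names depends on the address size. The function name `decode_add_rm8_r8` is made up, it assumes no prefixes, and it only handles the plain register-indirect forms (no SIB byte, no displacement).

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

# Base expression for mod=00, indexed by rm. None marks forms that need
# extra bytes (disp16, SIB, disp32/RIP-relative), which this sketch skips.
RM_BASE = {
    16: ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", None, "bx"],
    32: ["eax", "ecx", "edx", "ebx", None, None, "esi", "edi"],
    64: ["rax", "rcx", "rdx", "rbx", None, None, "rsi", "rdi"],
}

def decode_add_rm8_r8(code: bytes, address_size: int) -> str:
    """Decode 'add r/m8, r8' (opcode 0x00) for simple memory operands only."""
    if len(code) < 2 or code[0] != 0x00:
        raise ValueError("expected opcode 0x00 followed by a ModRM byte")
    modrm = code[1]
    mod = modrm >> 6          # top two bits: addressing mode
    reg = (modrm >> 3) & 7    # middle three bits: source register
    rm = modrm & 7            # low three bits: memory operand selector
    if mod != 0:
        raise ValueError("only mod=00 is handled in this sketch")
    base = RM_BASE[address_size][rm]
    if base is None:
        raise ValueError("this form needs a SIB byte or displacement")
    return f"add [{base}], {REG8[reg]}"

for size in (16, 32, 64):
    print(size, decode_add_rm8_r8(b"\x00\x00", size))
```

So any zero-filled padding, fed to a linear disassembler, comes out as an endless column of this one instruction.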