Published on: November 18, 2011
This post contains mail written by me to the famous hacker Héctor Martín. The mail was regarding the basic steps in reverse engineering devices.
— MAIL BEGIN —
> 1) After opening a device, how do you understand which chip is the CPU ?
This is usually fairly obvious from the layout and the connectivity on the board. It really depends on the device, but it’s usually one of the larger chips, and may be connected to Flash memory, and/or to a quartz crystal. On larger devices it will have its own power supply, while on smaller ones the only telltale might be that it’s connected to most parts of the board. And of course, often you can just look up the part numbers and figure out what most chips are.
> 2) How are the firmwares extracted from the devices ? Is there a general principle ?
This depends heavily on the device. It can be as easy as connecting to a debug serial port and getting a text-based console into a bootloader that lets you dump the flash. Or it can be as hard as requiring a clock/power glitching setup in order to dump an internal mask ROM buried inside the CPU. Usually if the flash is external, you can remove it and dump it externally, or there might be a JTAG port through which you can read/write it. Microcontrollers with embedded flash usually have programming ports but the code is usually protected from readout; these are nearly impossible to dump unless you know of a specific vulnerability in the particular chip’s protection.
> 3) After getting a firmware dump how do you read it ?
If you know the CPU architecture in use, you run it through a disassembler and see if it makes sense. If you don’t know the architecture, you can try some educated guesses. After a while you learn to recognize some popular CPU architectures from a simple hex dump (e.g. ARM code sticks out like a sore thumb due to the condition code field, which means that every 32-bit word almost always starts with ‘E’). You can just use GNU binutils (objdump) to disassemble code (usually), but the IDA disassembler by Hex-Rays is quite popular in the reverse engineering community (albeit quite pricey). Sometimes the CPU architecture is unknown. I know some crazy people who can eventually make sense of an unknown binary and figure out what the opcodes mean, but I’m not one of them.
And sometimes if the firmware has very high entropy (it looks like “garbage” – no patterns, you learn to recognize this too) it usually means it’s either encrypted or compressed, so you might look to see whether you can find an offset after which there’s valid compressed data using a popular algorithm (zlib, LZMA, etc…). If it’s encrypted sometimes there are blockwise patterns (e.g. duplicated 16byte or 8byte blocks) that often mean it’s encrypted using a block cipher in ECB mode.
— MAIL END —