When loading less than 8 bytes into a register, the value is supposed to
be zero-extended. This is what the eBPF execution engine in the Linux
kernel does, in
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/core.c?h=v6.14#n2113
This is also what RFC 9669, which standardised the BPF ISA, specifies:
https://www.rfc-editor.org/rfc/rfc9669.html#name-regular-load-and-store-oper
Add the missing `zext` calls in the semantic section of instructions
LDXW, LDXH and LDXB. While at it, add them to other load instructions.
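The expected behaviour of these loads can be modelled in plain C. This is
only a sketch of the semantics described above, not the SLEIGH code; the
helper names `ldxw`, `ldxh` and `ldxb` are hypothetical and chosen to mirror
the instruction mnemonics.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of LDXW: read exactly 4 bytes, then zero-extend. */
static uint64_t ldxw(const uint8_t *addr)
{
    uint32_t tmp;
    memcpy(&tmp, addr, sizeof(tmp)); /* only 4 bytes are read */
    return (uint64_t)tmp;            /* upper 32 bits become 0 */
}

/* Hypothetical model of LDXH: read exactly 2 bytes, then zero-extend. */
static uint64_t ldxh(const uint8_t *addr)
{
    uint16_t tmp;
    memcpy(&tmp, addr, sizeof(tmp));
    return (uint64_t)tmp;
}

/* Hypothetical model of LDXB: read exactly 1 byte, then zero-extend. */
static uint64_t ldxb(const uint8_t *addr)
{
    return (uint64_t)*addr;
}
```

With these helpers, the bytes that follow the loaded value in memory never
reach the destination register, which is the property the missing `zext`
calls are meant to restore.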
For information, the issue can be seen when analyzing this C program:
unsigned int div_by_1000(unsigned int value) {
    return value / 1000;
}
Compiling it with clang gives:
$ clang -O0 -target bpf -c division.c -o division.ebpf
$ bpf-objdump -rd division.ebpf
division.ebpf: file format elf64-bpfle
Disassembly of section .text:
0000000000000000 <div_by_1000>:
0: 63 1a fc ff 00 00 00 00 stxw [%fp+-4],%r1
8: 61 a0 fc ff 00 00 00 00 ldxw %r0,[%fp+-4]
10: 37 00 00 00 e8 03 00 00 div %r0,0x3e8
18: 95 00 00 00 00 00 00 00 exit
Ghidra decompiles this program as:
ulonglong div_by_1000(uint param_1)
{
undefined4 in_stack_00000000;
return CONCAT44(in_stack_00000000,param_1) / 1000;
}
This `in_stack_00000000` comes from the way the parameter is loaded from
the stack. The listing shows the following disassembly and p-code
operations:
ram:00100008 61 a0 fc ff 00 LDXW R0,[R10 + -0x4=>Stack[-0x4]]
00 00 00
$U3e00:8 = INT_ADD R10, -4:8
R0 = LOAD ram($U3e00:8)
This shows that 8 bytes are loaded from the address in `$U3e00:8` instead
of 4.
After adding `zext` calls, Ghidra decodes the same instruction as:
ram:00100008 61 a0 fc ff 00 LDXW R0,[R10 + -0x4=>local_4]
00 00 00
$U4100:8 = INT_ADD R10, -4:8
$U4180:4 = LOAD ram($U4100:8)
R0 = INT_ZEXT $U4180:4
This only loads 4 bytes from the stack, as expected.
Moreover the decompilation view is now correct:
ulonglong div_by_1000(uint param_1)
{
return (ulonglong)param_1 / 1000;
}
The operand of the CALL instruction did not multiply the immediate
value by 8. Without this, call targets are not decoded correctly.
Such a CALL instruction can be emitted when compiling this simple
`single_call.c` program:
static int one(void) {
    return 1;
}
int call_one(void) {
    return one();
}
with:
clang -O0 -target bpf -c single_call.c -o single_call.ebpf
Disassembling with LLVM shows:
$ llvm-objdump -d single_call.ebpf
single_call.ebpf: file format elf64-bpf
Disassembly of section .text:
0000000000000000 <call_one>:
0: 85 10 00 00 01 00 00 00 call 1
1: 95 00 00 00 00 00 00 00 exit
0000000000000010 <one>:
2: b7 00 00 00 01 00 00 00 r0 = 1
3: 95 00 00 00 00 00 00 00 exit
The first instruction ("call 1") calls the function located at 0x10 (at
index `2:` in the listing). Ghidra considered the call to target
address 9 instead (as `inst_next = 8` and `imm = 1`). Fix this by
multiplying `imm` by 8 when encountering a `disp32` operand (which is
only used by instruction `CALL`).
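The target computation described above can be sketched in C as follows.
This is an illustration only: `bpf_call_target` and `BPF_INSN_SIZE` are
hypothetical names, and the sketch assumes a program-local call whose
immediate counts 8-byte instructions relative to the next instruction.

```c
#include <stdint.h>

/* Assumption: each instruction slot is 8 bytes long. */
#define BPF_INSN_SIZE 8

/* Hypothetical helper: byte address targeted by a CALL located at
 * insn_addr, given its signed 32-bit immediate. */
static uint64_t bpf_call_target(uint64_t insn_addr, int32_t imm)
{
    uint64_t inst_next = insn_addr + BPF_INSN_SIZE;
    /* imm is an instruction count, so scale it to bytes before adding. */
    int64_t byte_offset = (int64_t)imm * BPF_INSN_SIZE;
    return inst_next + (uint64_t)byte_offset;
}
```

For the listing above, `bpf_call_target(0, 1)` yields 0x10, whereas the
unscaled computation `inst_next + imm` yields 9.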
Adjust ELF relocation R_BPF_64_32 to take this multiplication by 8 into
account. It is documented to compute (S + A) / 8 - 1, so the division by 8
was missing.
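As a minimal sketch, the documented formula can be written in C as below.
The function name `r_bpf_64_32_value` is hypothetical, `s` and `a` stand for
the usual ELF symbol value S and addend A, and the sketch assumes S + A is a
non-negative multiple of 8; it is not the code of the relocation handler.

```c
#include <stdint.h>

/* Hypothetical helper: value of the 32-bit immediate field according to
 * the documented formula (S + A) / 8 - 1. */
static int32_t r_bpf_64_32_value(uint64_t s, int64_t a)
{
    int64_t byte_offset = (int64_t)(s + (uint64_t)a); /* S + A, in bytes */
    return (int32_t)(byte_offset / 8 - 1);            /* in instructions */
}
```

With S + A = 0x10 the formula gives 1, and multiplying that immediate back
by 8 is what makes the decoded operand consistent with it.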
eBPF byte swap operations (BE16, BE32, BE64, LE16, LE32, LE64) have
semantics that depend on the endianness of the host processor executing
the eBPF program. For example, on a Little-Endian CPU, BE16 swaps the two
least significant bytes of the given destination register.
The semantic section of LE16 contains:
{ dst=((dst) >> 8) | ((dst) << 8); }
This has several issues:
- It assumes the instruction always swaps the bytes. This should only
happen on a Big-Endian host CPU.
- If `dst` does not contain a 16-bit value (meaning `dst >> 16 != 0`),
the computed value is wrong. The value should be properly masked. For
example, the Linux kernel defines in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/swab.h?h=v6.14#L14
#define ___constant_swab16(x) ((__u16)( \
(((__u16)(x) & (__u16)0x00ffU) << 8) | \
(((__u16)(x) & (__u16)0xff00U) >> 8)))
As the endianness of the CPU has to be the same as that of the eBPF program
(defined in the ELF header), introduce a macro `ENDIAN` and use it to
implement the byte swap operations.
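The intended 16-bit behaviour can be sketched in C as follows. This is an
illustration under stated assumptions, not the SLEIGH implementation: the
names `swab16`, `bpf_end16` and `old_le16` are hypothetical, the `prog_big`
flag plays the role the `ENDIAN` macro has in the text, and the result is
assumed to be truncated to 16 bits and zero-extended.

```c
#include <stdbool.h>
#include <stdint.h>

/* Masked 16-bit swap, in the spirit of the kernel's ___constant_swab16. */
static uint64_t swab16(uint64_t x)
{
    return ((x & 0x00ffu) << 8) | ((x & 0xff00u) >> 8);
}

/* Hypothetical model of BE16 (to_big = true) and LE16 (to_big = false).
 * prog_big is the endianness of the eBPF program, hence of the CPU. */
static uint64_t bpf_end16(uint64_t dst, bool to_big, bool prog_big)
{
    if (to_big == prog_big)
        return dst & 0xffffu; /* already in the requested order: truncate */
    return swab16(dst);       /* otherwise swap the two low bytes */
}

/* The unmasked expression from the old LE16 semantics, for comparison. */
static uint64_t old_le16(uint64_t dst)
{
    return (dst >> 8) | (dst << 8);
}
```

For a Little-Endian program, `bpf_end16(0x11223344, true, false)` gives
0x4433 and `bpf_end16(0x11223344, false, false)` gives 0x3344, while
`old_le16(0x11223344)` keeps bits above bit 15 and matches neither.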
file npe
Apple Mach-O binaries truncate section names to 16 characters, and DWARF 5
introduced a section (debug_str_offsets) whose name is longer than 16
characters once the Mach-O "__" prefix is added.
Add support for ignoring atomic_type, and some checking for missing
source file names.