Clean up the expression evaluation logic, stub out resolution of register
values to a callback in case we want to use constant propagation to let
more calculations succeed, and add support for default static values so
that an arch's stack frame register (e.g. RBP) can be treated like the
static CFA value we already support.
Add option to decorate params and local vars with their DWARF storage
location info.
Handle arrays with unspecified element type.
When loading fewer than 8 bytes into a register, the value is supposed to
be zero-extended. This is what the eBPF execution engine in the Linux
kernel does, in
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/core.c?h=v6.14#n2113
This is also what is specified in RFC 9669, which standardised the BPF ISA:
https://www.rfc-editor.org/rfc/rfc9669.html#name-regular-load-and-store-oper
Add the missing `zext` calls in the semantic section of instructions
LDXW, LDXH and LDXB. While at it, add them to other load instructions.
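As a rough illustration of the intended load semantics, here is a minimal C
sketch, not the SLEIGH change itself. The helper names (`load_le`, `ldxw`,
`ldxh`, `ldxb`, `ldxw_without_zext`) are hypothetical, and memory is assumed
to be little-endian, as in an `elf64-bpfle` object.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read `size` bytes (1..8) at `p` as a little-endian
 * unsigned integer. Bytes past `size` never reach the result, so the upper
 * bits of the returned value are zero. */
uint64_t load_le(const uint8_t *p, size_t size)
{
    uint64_t value = 0;
    for (size_t i = 0; i < size; i++)
        value |= (uint64_t)p[i] << (8 * i);
    return value;
}

/* LDXW, LDXH and LDXB: load 4, 2 or 1 byte(s), zero-extended to 64 bits. */
uint64_t ldxw(const uint8_t *addr) { return load_le(addr, 4); }
uint64_t ldxh(const uint8_t *addr) { return load_le(addr, 2); }
uint64_t ldxb(const uint8_t *addr) { return load_le(addr, 1); }

/* For comparison only: a full 8-byte load, which also picks up the
 * 4 bytes stored after the 32-bit value. */
uint64_t ldxw_without_zext(const uint8_t *addr) { return load_le(addr, 8); }
```

With a 32-bit value followed by unrelated bytes in memory, `ldxw` returns
only the 32-bit value, whereas `ldxw_without_zext` also places the unrelated
bytes in the upper half of the result.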
For information, the issue can be seen when analyzing this C program:
unsigned int div_by_1000(unsigned int value) {
    return value / 1000;
}
Compiling it with clang gives:
$ clang -O0 -target bpf -c division.c -o division.ebpf
$ bpf-objdump -rd division.ebpf
division.ebpf: file format elf64-bpfle
Disassembly of section .text:
0000000000000000 <div_by_1000>:
0: 63 1a fc ff 00 00 00 00 stxw [%fp+-4],%r1
8: 61 a0 fc ff 00 00 00 00 ldxw %r0,[%fp+-4]
10: 37 00 00 00 e8 03 00 00 div %r0,0x3e8
18: 95 00 00 00 00 00 00 00 exit
Ghidra decompiles this program as:
ulonglong div_by_1000(uint param_1)
{
  undefined4 in_stack_00000000;
  return CONCAT44(in_stack_00000000,param_1) / 1000;
}
This `in_stack_00000000` comes from the way the parameter is loaded from
the stack. The listing shows the following disassembly and p-code
operations:
ram:00100008 61 a0 fc ff 00 LDXW R0,[R10 + -0x4=>Stack[-0x4]]
00 00 00
$U3e00:8 = INT_ADD R10, -4:8
R0 = LOAD ram($U3e00:8)
This shows that 8 bytes are loaded from the address in `$U3e00:8`, instead
of 4.
After adding `zext` calls, Ghidra decodes the same instruction as:
ram:00100008 61 a0 fc ff 00 LDXW R0,[R10 + -0x4=>local_4]
00 00 00
$U4100:8 = INT_ADD R10, -4:8
$U4180:4 = LOAD ram($U4100:8)
R0 = INT_ZEXT $U4180:4
This only loads 4 bytes from the stack, as expected.
Moreover the decompilation view is now correct:
ulonglong div_by_1000(uint param_1)
{
  return (ulonglong)param_1 / 1000;
}
The operand of the CALL instruction did not multiply the immediate value
by 8. Without this, calls are not decoded correctly.
Such a CALL instruction can be emitted when compiling this simple
`single_call.c` program:
static int one(void) {
    return 1;
}
int call_one(void) {
    return one();
}
with:
clang -O0 -target bpf -c single_call.c -o single_call.ebpf
Disassembling with LLVM shows:
$ llvm-objdump -d single_call.ebpf
single_call.ebpf: file format elf64-bpf
Disassembly of section .text:
0000000000000000 <call_one>:
0: 85 10 00 00 01 00 00 00 call 1
1: 95 00 00 00 00 00 00 00 exit
0000000000000010 <one>:
2: b7 00 00 00 01 00 00 00 r0 = 1
3: 95 00 00 00 00 00 00 00 exit
The first instruction ("call 1") calls the function located at 0x10 (at
index `2:` in the listing). Ghidra instead computed the call target as
address 9 (`inst_next = 8` plus `imm = 1`). Fix this by multiplying `imm`
by 8 when encountering a `disp32` operand (which is only used by
instruction `CALL`).
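The target computation can be sketched in C as follows. This is only an
illustration of the arithmetic; `bpf_call_target` and `BPF_INSN_SIZE` are
hypothetical names, not identifiers from the SLEIGH specification.

```c
#include <stdint.h>

/* Size in bytes of a basic eBPF instruction. */
#define BPF_INSN_SIZE 8

/* Address targeted by a pc-relative CALL located at `insn_addr`.
 * `imm` counts instructions, relative to the instruction following
 * the CALL, so it must be scaled by the instruction size. */
uint64_t bpf_call_target(uint64_t insn_addr, int32_t imm)
{
    uint64_t inst_next = insn_addr + BPF_INSN_SIZE;
    return inst_next + (uint64_t)((int64_t)imm * BPF_INSN_SIZE);
}
```

For the "call 1" at address 0 above, this gives 8 + 1 * 8 = 0x10, the
address of `one`.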
Adjust the ELF relocation R_BPF_64_32 to account for this multiplication
by 8. It is documented to compute (S + A) / 8 - 1, so the division by 8
was missing.
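A minimal C sketch of the documented formula is given here for reference.
The function name `r_bpf_64_32` is hypothetical, and S and A are assumed to
follow the usual ELF convention (symbol value and addend); this is not
Ghidra's relocation handler.

```c
#include <stdint.h>

/* Value stored in the 32-bit `imm` field: (S + A) / 8 - 1.
 * Assumption: S is the symbol value and A the addend, both in bytes. */
int32_t r_bpf_64_32(uint64_t symbol_value, int64_t addend)
{
    int64_t byte_offset = (int64_t)(symbol_value + (uint64_t)addend);
    return (int32_t)(byte_offset / 8 - 1);
}
```

With S = 0x10 and A = 0 this yields 1, which matches the `imm` of the
"call 1" instruction located at address 0.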
eBPF byte swap operations (BE16, BE32, BE64, LE16, LE32, LE64) have
semantics that depend on the endianness of the host processor executing
the eBPF program. For example, on a Little-Endian CPU, BE16 swaps the 2
least significant bytes of the given destination register.
The semantic section of LE16 contains:
{ dst=((dst) >> 8) | ((dst) << 8); }
This has several issues:
- It assumes the instruction always swaps the bytes. This should only
  happen on a Big-Endian host CPU.
- If `dst` does not contain a 16-bit value (meaning `dst >> 16 != 0`),
  the computed value is wrong. The value should be properly masked. For
  example the Linux kernel defines in
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/swab.h?h=v6.14#L14
  #define ___constant_swab16(x) ((__u16)( \
      (((__u16)(x) & (__u16)0x00ffU) << 8) | \
      (((__u16)(x) & (__u16)0xff00U) >> 8)))
As the endianness of the CPU has to be the same as that of the eBPF
program (defined in the ELF header), introduce a macro `ENDIAN` and use it
to implement the byte swap operations.
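The intended 16-bit behaviour can be sketched in C as follows. This is an
illustration only: the function names and the `host_is_big_endian`
parameter are hypothetical stand-ins for the `ENDIAN` macro, and truncating
the result to 16 bits is assumed to match what the Linux kernel interpreter
does.

```c
#include <stdint.h>

/* Byte-reverse the low 16 bits. Masking first keeps any higher bits of
 * `x` out of the result. */
uint64_t swab16(uint64_t x)
{
    return ((x & 0x00ffu) << 8) | ((x & 0xff00u) >> 8);
}

/* LE16: swap only on a Big-Endian host, otherwise just truncate. */
uint64_t bpf_le16(uint64_t dst, int host_is_big_endian)
{
    return host_is_big_endian ? swab16(dst) : (dst & 0xffffu);
}

/* BE16: swap only on a Little-Endian host, otherwise just truncate. */
uint64_t bpf_be16(uint64_t dst, int host_is_big_endian)
{
    return host_is_big_endian ? (dst & 0xffffu) : swab16(dst);
}
```

Unlike the unmasked `(dst >> 8) | (dst << 8)`, these helpers return a value
that fits in 16 bits even when `dst >> 16 != 0`.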