Commit graph

1776 commits

Author SHA1 Message Date
dev747368
e908ab6fbf DWARF expression handling refactor
Clean up the expression evaluation logic, and stub out resolution of
register values to a callback in case we want to use constant propagation
to allow calculations to succeed. Add support for default static values
so that an arch's stack frame register (e.g. RBP) can be treated like the
static CFA value we already support.

Add option to decorate params and local vars with their DWARF storage
location info.

Handle arrays with unspecified element type.
2025-08-11 11:21:28 -04:00
Ryan Kurtz
b76bbb843f Merge remote-tracking branch 'origin/GP-5853_Dan_ARM-VLD-and-VST--SQUASHED' 2025-07-29 10:35:14 -04:00
Dan
352fed0d95 GP-5853: Initial implementation of ARM Neon VLD/VSTn instructions. 2025-07-29 14:32:54 +00:00
Ryan Kurtz
6c85ba4563 Merge remote-tracking branch
'origin/GP-5759_ghidorahrex_PR-8192_p1pkin_sh4_fsca_fix' (Closes #8192)
2025-07-29 09:12:19 -04:00
Ryan Kurtz
391a052e55 Merge remote-tracking branch 'origin/patch' 2025-07-29 09:10:56 -04:00
ghidorahrex
4abf6d55ad GP-5766: Fixed instruction AVX512 disassembly errors 2025-07-29 08:56:43 -04:00
Nicolas Iooss
24d19f6e8c Add eBPF ISA v4 instructions
In 2023, the eBPF instruction set was modified to add several
instructions related to signed operations (load with sign-extension,
signed division, etc.), a 32-bit jump instruction and some byte-swap
instructions. This became version 4 of eBPF ISA.

Here are some references about this change:

- https://pchaigno.github.io/bpf/2021/10/20/ebpf-instruction-sets.html
  (a blog post about eBPF instruction set extensions)
- https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/
  (documentation sent to Linux Kernel mailing list)
- https://www.rfc-editor.org/rfc/rfc9669.html#name-sign-extension-load-operati
  (IETF's BPF Instruction Set Architecture standard defined the new
  instructions)
- https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/core.c?h=v6.14#n1859
  (implementation of signed division and remainder in Linux kernel.
  This shows that 32-bit signed DIV and signed MOD are zero-extending
  the result in DST)
- https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/core.c?h=v6.14#n2135
  (implementation of signed memory load in Linux kernel)
- https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f9a1ea821ff25353a0e80d971e7958cd55b47a3
  (commit which added signed memory load instructions in Linux kernel)

This can be tested with a recent enough version of clang and LLVM (this
works with clang 19.1.4 on Alpine 3.21).
For example for signed memory load instructions:

    signed int sext_8bit(signed char x) {
        return x;
    }

produces:

    $ clang -O0 -target bpf -mcpu=v4 -c test.c -o test.ebpf
    $ llvm-objdump -rd test.ebpf
    ...
    0000000000000000 <sext_8bit>:
           0:  73 1a ff ff 00 00 00 00  *(u8 *)(r10 - 0x1) = r1
           1:  91 a1 ff ff 00 00 00 00  r1 = *(s8 *)(r10 - 0x1)
           2:  bc 10 00 00 00 00 00 00  w0 = w1
           3:  95 00 00 00 00 00 00 00  exit

(The second instruction is a signed memory load)
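
As a rough illustration of what `r1 = *(s8 *)(r10 - 0x1)` computes, the
following C sketch models a sign-extending 1-byte load next to its
zero-extending counterpart. The helper names `load_s8_sext` and
`load_u8_zext` are hypothetical and are not part of this change or of
Ghidra:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: models `dst = *(s8 *)addr`, a sign-extending
 * 1-byte load into a 64-bit register. */
uint64_t load_s8_sext(const unsigned char *addr)
{
    int8_t tmp;
    memcpy(&tmp, addr, sizeof tmp);   /* read exactly one byte */
    return (uint64_t)(int64_t)tmp;    /* widen with sign extension */
}

/* Hypothetical helper: models `dst = *(u8 *)addr`, the zero-extending
 * load that existed before ISA v4. */
uint64_t load_u8_zext(const unsigned char *addr)
{
    return (uint64_t)*addr;           /* widen with zero extension */
}
```

For the byte 0xF0, the first helper yields 0xFFFFFFFFFFFFFFF0 and the
second yields 0xF0.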

The MOVS instruction (sign-extending register MOV) uses the offset field
to encode the conversion (whether the source register is to be treated as
a signed 8-bit, 16-bit or 32-bit integer). The mnemonic for these
instructions is inconsistent across sources:

- They are all named MOVS in the proposal
  https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/
- LLVM and Linux disassemblers only display pseudo-code (`r0 = (s8)r1`)
- RFC 9669 (https://datatracker.ietf.org/doc/rfc9669/) uses MOVSX for
  all instructions.
- GCC uses MOVS for all instructions:
  https://github.com/gcc-mirror/gcc/blob/releases/gcc-14.1.0/gcc/config/bpf/bpf.md?plain=1#L326-L365

To make the disassembled code clearer, decode such instructions with a
size suffix: MOVSB, MOVSH, MOVSW.
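
A minimal C sketch of this decoding, assuming the offset field holds 8,
16 or 32 as described in RFC 9669 and that an offset of 0 means a plain
MOV. The function names `sext`, `movsx64` and `movs_mnemonic` are
hypothetical, and only the 64-bit form is modelled:

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of v to 64 bits (bits is 8, 16 or 32).
 * Uses the xor/subtract trick to stay within unsigned arithmetic. */
static uint64_t sext(uint64_t v, unsigned bits)
{
    uint64_t sign = 1ULL << (bits - 1);
    v &= (sign << 1) - 1;          /* keep only the low `bits` bits */
    return (v ^ sign) - sign;      /* propagate the sign bit upward */
}

/* Hypothetical model of the 64-bit sign-extending register move. */
uint64_t movsx64(uint64_t src, int offset)
{
    switch (offset) {
    case 8:  return sext(src, 8);
    case 16: return sext(src, 16);
    case 32: return sext(src, 32);
    default: return src;           /* assumed: offset 0 is a plain MOV */
    }
}

/* Mnemonic with the size suffix chosen in this commit. */
const char *movs_mnemonic(int offset)
{
    switch (offset) {
    case 8:  return "MOVSB";
    case 16: return "MOVSH";
    case 32: return "MOVSW";
    default: return "MOV";
    }
}
```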

The decoding of instructions 32-bit JA, BSWAP16, BSWAP32 and BSWAP64 is
straightforward.
2025-07-29 12:45:06 +00:00
Ryan Kurtz
1929357e1d Merge remote-tracking branch 'origin/patch' 2025-07-29 08:33:22 -04:00
Ryan Kurtz
0d8a39a07a Merge remote-tracking branch
'origin/GP-5857_ghidorahrex_PR-7979_niooss-ledger_ebpf-fix-load-zext'
into patch (Closes #7979)
2025-07-29 08:24:03 -04:00
Ryan Kurtz
b4239911c9 Merge remote-tracking branch
'origin/GP-5858_ghidorahrex_PR-7929_niooss-ledger_fix-ebpf-call-operand'
into patch (Closes #7929)
2025-07-29 08:21:27 -04:00
Ryan Kurtz
179263a592 Merge remote-tracking branch
'origin/GP-5593_ghidorahrex_PR-7985_niooss-ledger_ebpf-fix-semantic-byte-swap-instructions'
into patch (Closes #7985)
2025-07-29 08:19:37 -04:00
Ryan Kurtz
28b46c5c93 Merge remote-tracking branch
'origin/GP-5336_ghidorahrex_PR-7065_philpem_6805_hcs08_xidx_fix' into
patch (Closes #7065, Closes #7064)
2025-07-29 08:16:11 -04:00
Ryan Kurtz
ce924f8ab5 Merge remote-tracking branch 'origin/GP-4977_DescriptorDecoderFix' 2025-07-29 10:14:27 +00:00
caheckman
c05acfed1d Fix for testGetReturnTypeOfMethodDescriptor 2025-07-28 22:06:06 +00:00
Ryan Kurtz
1b7fae31f9 Merge remote-tracking branch 'origin/patch' 2025-07-28 17:28:07 +00:00
Dan
39c0a83c0c GP-5877: Fix Patch Instruction action in some Harvard architectures. 2025-07-28 15:48:40 +00:00
ghidra1
4a0e95ecd3 GP-3091 ppc64 ELF improvements for 32-bit addressing. Fixed default ELF
GOT markup boundary condition.  Fixed improper EXTERNAL symbols with
.pltgot. prefix and duplication.
2025-07-25 14:19:18 -04:00
Ryan Kurtz
cc177afc8f Merge remote-tracking branch 'origin/patch' 2025-07-21 13:17:27 -04:00
Ryan Kurtz
3cfa867ac3 Merge remote-tracking branch 'origin/GP-5843_emteere_MIPS64FunctionStarts' into patch 2025-07-21 13:14:06 -04:00
Ryan Kurtz
9628d10220 Merge remote-tracking branch 'origin/patch' 2025-07-18 15:21:52 -04:00
Ryan Kurtz
edf42d82d9 Merge remote-tracking branch 'origin/GP-5846_ghidra1_PPC64_ELFRelocations' into patch 2025-07-18 15:17:45 -04:00
ghidra1
006bd8d423 GP-5846 Corrected ELF PowerPC 64-bit relocation processing bugs
affecting ELFv2 use and R_PPC64_JMP_SLOT relocation
2025-07-18 12:00:34 -04:00
Ryan Kurtz
fde33a5821 Merge remote-tracking branch 'origin/patch' 2025-07-18 06:19:25 -04:00
Ryan Kurtz
e69ce4104b Merge remote-tracking branch 'origin/GP-5804_emteere_FixDefaultSymbolicPropRecordState' into patch 2025-07-18 06:15:13 -04:00
emteere
3468c4b502 GP-5843 Added MIPS64 function start patterns 2025-07-17 22:42:00 +00:00
Ryan Kurtz
88bfdeb429 Merge remote-tracking branch 'origin/GP-4356_ghintern_avr8_cspec--SQUASHED' 2025-07-17 06:19:47 -04:00
ghintern
991a4b440c GP-4356: fixes to avr8 cspec and elf extension, and additions to decompiler model rules 2025-07-16 20:22:28 +00:00
Ryan Kurtz
2c10392a79 Merge remote-tracking branch 'origin/GP-5211_ghintern_riscv_cspec--SQUASHED' 2025-07-16 13:31:25 -04:00
ghintern
f26d36c6bb GP-5211: Fix RISCV 32- and 64-bit compiler specifications and relocation handler 2025-07-16 16:38:27 +00:00
Ryan Kurtz
bdfe4ba492 Merge remote-tracking branch
'origin/GP-5815_ghidra1_AARCH64_ElfGotRelocs' (Closes #8253)
2025-07-16 06:15:58 -04:00
ghidra1
17827592d4 Merge remote-tracking branch 'origin/patch' 2025-07-15 18:30:48 -04:00
ghidra1
130b365e7c GP-5827 Corrected ELF MIPS 64-bit relocation processing error 2025-07-15 18:27:41 -04:00
Ryan Kurtz
7d26a65e31 Merge remote-tracking branch 'origin/patch' 2025-07-14 16:11:33 -04:00
Ryan Kurtz
7d76ab5e9b Merge remote-tracking branch
'origin/GP-4989_ghintern_arm_fix_aapcs--SQUASHED' into patch
(Closes #6958)
2025-07-14 16:05:48 -04:00
caheckman
14870dc532 GP-4977 Properly decode <object> in Array 2025-07-14 18:49:05 +00:00
ghintern
3e11715778 GP-4989: Fix ARM AAPCS cspec, add soft float calling convention 2025-07-14 18:38:17 +00:00
ghidra1
438725bafd GP-5815 Added ELF Loader GOT allocation support for AARCH64 in support
of object module loading.
2025-07-11 16:17:19 -04:00
Ryan Kurtz
f97fd834fe Merge remote-tracking branch 'origin/patch' 2025-07-10 05:39:35 -04:00
caheckman
de842dbd32 GP-5816 Fix return recovery for AARCH64 and ARM 2025-07-09 21:19:07 +00:00
Nicolas Iooss
e2de11d5b2 Fix eBPF zero-extend load instructions
When loading fewer than 8 bytes into a register, the value is supposed to
be zero-extended. This is what the eBPF execution engine in the Linux
kernel does, in
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/core.c?h=v6.14#n2113
This is also what is specified in RFC 9669 which standardised BPF ISA:
https://www.rfc-editor.org/rfc/rfc9669.html#name-regular-load-and-store-oper

Add the missing `zext` calls in the semantic section of instructions
LDXW, LDXH and LDXB. While at it, add them to other load instructions.

For information, the issue can be seen when analyzing this C program:

    unsigned int div_by_1000(unsigned int value) {
        return value / 1000;
    }

Compiling it with clang gives:

    $ clang -O0 -target bpf -c division.c -o division.ebpf
    $ bpf-objdump -rd division.ebpf
    division.ebpf:     file format elf64-bpfle

    Disassembly of section .text:

    0000000000000000 <div_by_1000>:
       0:    63 1a fc ff 00 00 00 00     stxw [%fp+-4],%r1
       8:    61 a0 fc ff 00 00 00 00     ldxw %r0,[%fp+-4]
      10:    37 00 00 00 e8 03 00 00     div %r0,0x3e8
      18:    95 00 00 00 00 00 00 00     exit

Ghidra decompiles this program as:

    ulonglong div_by_1000(uint param_1)
    {
      undefined4 in_stack_00000000;
      return CONCAT44(in_stack_00000000,param_1) / 1000;
    }

This `in_stack_00000000` comes from the way the parameter is loaded from
the stack. The listing shows the following disassembly and p-code
operations:

    ram:00100008 61 a0 fc ff 00       LDXW       R0,[R10 + -0x4=>Stack[-0x4]]
                 00 00 00
                            $U3e00:8 = INT_ADD R10, -4:8
                            R0 = LOAD ram($U3e00:8)

This shows that 8 bytes are indeed loaded from the address in `$U3e00:8`
instead of 4.

After adding `zext` calls, Ghidra decodes the same instruction as:

    ram:00100008 61 a0 fc ff 00       LDXW       R0,[R10 + -0x4=>local_4]
                 00 00 00
                            $U4100:8 = INT_ADD R10, -4:8
                            $U4180:4 = LOAD ram($U4100:8)
                            R0 = INT_ZEXT $U4180:4

This only loads 4 bytes from the stack, as expected.
Moreover the decompilation view is now correct:

    ulonglong div_by_1000(uint param_1)
    {
      return (ulonglong)param_1 / 1000;
    }
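
The difference between the two p-code sequences can be sketched in C as
follows. This is only an illustration of the load semantics, with
hypothetical helper names (`ldxw_before_fix`, `ldxw_after_fix`), not
code from the SLEIGH specification:

```c
#include <stdint.h>
#include <string.h>

/* Models the old behaviour: `R0 = LOAD ram(addr)` reads 8 bytes, so the
 * upper half of R0 is whatever sits next to the 4-byte slot. */
uint64_t ldxw_before_fix(const unsigned char *addr)
{
    uint64_t tmp;
    memcpy(&tmp, addr, sizeof tmp);
    return tmp;
}

/* Models the fixed behaviour: load 4 bytes, then INT_ZEXT to 8 bytes. */
uint64_t ldxw_after_fix(const unsigned char *addr)
{
    uint32_t tmp;
    memcpy(&tmp, addr, sizeof tmp);
    return (uint64_t)tmp;
}
```

With a buffer whose first 4 bytes hold a `uint32_t` value and whose
following bytes hold unrelated data, only `ldxw_after_fix` returns that
value unchanged.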
2025-07-07 16:28:00 +02:00
Nicolas Iooss
c1d96a2140 Fix eBPF CALL operand decoding
The decoding of the CALL instruction's operand did not multiply the
immediate value by 8. Without this, calls are not decoded correctly.

Such a CALL instruction can be emitted when compiling this simple
`single_call.c` program:

    static int one(void) {
        return 1;
    }

    int call_one(void) {
        return one();
    }

with:

    clang -O0 -target bpf -c single_call.c -o single_call.ebpf

Disassembling with LLVM shows:

    $ llvm-objdump -d single_call.ebpf
    single_call.ebpf:	file format elf64-bpf

    Disassembly of section .text:

    0000000000000000 <call_one>:
           0:	85 10 00 00 01 00 00 00	call 1
           1:	95 00 00 00 00 00 00 00	exit

    0000000000000010 <one>:
           2:	b7 00 00 00 01 00 00 00	r0 = 1
           3:	95 00 00 00 00 00 00 00	exit

The first instruction ("call 1") calls the function located at 0x10 (at
index `2:` in the listing). Ghidra considered the call to target
address 9 instead (as `inst_next = 8` and `imm = 1`). Fix this by
multiplying `imm` by 8 when encountering a `disp32` operand (which is
only used by instruction `CALL`).

Adjust ELF relocation R_BPF_64_32 to account for this multiplication
by 8. It is documented to compute (S + A) / 8 - 1, so the division by 8
was missing.
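
The arithmetic above can be checked with a small C sketch. The function
names are hypothetical; the formulas are the ones stated in this
message (`inst_next + imm * 8` for the call target, `(S + A) / 8 - 1`
for the relocated immediate):

```c
#include <stdint.h>

/* Old decoding: the immediate was added without scaling. */
uint64_t call_target_old(uint64_t inst_next, int32_t imm)
{
    return inst_next + (uint64_t)(int64_t)imm;
}

/* Fixed decoding: the immediate counts 8-byte instruction slots. */
uint64_t call_target_fixed(uint64_t inst_next, int32_t imm)
{
    return inst_next + (uint64_t)((int64_t)imm * 8);
}

/* Value written by R_BPF_64_32 for a call, as documented:
 * (S + A) / 8 - 1, with S the symbol value and A the addend. */
int64_t r_bpf_64_32_value(int64_t s, int64_t a)
{
    return (s + a) / 8 - 1;
}
```

For the listing above, `inst_next = 8` and `imm = 1` give 9 with the old
decoding and 0x10 with the fixed one, and a relocation against `one` at
0x10 with a zero addend produces `imm = 1`.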
2025-07-07 16:26:31 +02:00
Nicolas Iooss
adb0eac98a Add support for big endian eBPF programs 2025-07-07 16:13:37 +02:00
Nicolas Iooss
52cb7a36e6 Fix the semantics of eBPF byte swap instructions
eBPF byte swap operations (BE16, BE32, BE64, LE16, LE32, LE64) have
semantics that depend on the endianness of the host processor executing
the eBPF program. For example, on a Little-Endian CPU, BE16 swaps the 2
least significant bytes of the given destination register.

The semantic section of LE16 contains:

    { dst=((dst) >> 8) | ((dst) << 8); }

This has several issues:

- It assumes the instruction always swaps the bytes. This should only
  happen on a Big-Endian host CPU.
- If `dst` does not contain a 16-bit value (meaning `dst >> 16 != 0`),
  the computed value is wrong. The value should be properly masked. For
  example the Linux kernel defines in
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/swab.h?h=v6.14#L14

    #define ___constant_swab16(x) ((__u16)(             \
            (((__u16)(x) & (__u16)0x00ffU) << 8) |      \
            (((__u16)(x) & (__u16)0xff00U) >> 8)))

As the endianness of the CPU has to be the same as the eBPF program
(defined in the ELF header), introduce a macro `ENDIAN` and use it to
implement the byte swap operations.
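
As a sketch of the intended semantics (in C rather than SLEIGH, with
hypothetical function names and a `host_big_endian` flag standing in for
the `ENDIAN` macro), the 16-bit operations could be modelled like this.
It assumes the result is truncated to 16 bits and zero-extended, as the
Linux interpreter does:

```c
#include <stdint.h>
#include <stdbool.h>

/* Masked 16-bit swap, equivalent to the kernel's ___constant_swab16. */
static uint16_t swab16(uint16_t x)
{
    return (uint16_t)(((x & 0x00ffu) << 8) | ((x & 0xff00u) >> 8));
}

/* The previous LE16 semantics: always swaps and never masks. */
uint64_t le16_old(uint64_t dst)
{
    return (dst >> 8) | (dst << 8);
}

/* LE16: convert the low 16 bits to little-endian byte order. */
uint64_t ebpf_le16(uint64_t dst, bool host_big_endian)
{
    uint16_t low = (uint16_t)dst;            /* drop bits 16..63 */
    return host_big_endian ? swab16(low) : low;
}

/* BE16: convert the low 16 bits to big-endian byte order. */
uint64_t ebpf_be16(uint64_t dst, bool host_big_endian)
{
    uint16_t low = (uint16_t)dst;            /* drop bits 16..63 */
    return host_big_endian ? low : swab16(low);
}
```

With `dst = 0x12345678` on a Little-Endian host, LE16 gives 0x5678 and
BE16 gives 0x7856, while the old expression yields a value with stray
upper bits.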
2025-07-07 16:13:36 +02:00
emteere
4723729d80 GP-5804 Set SymbolicPropogator to record register begin/end state in
basic constructor. Better document recordBeginEndState flag.
2025-07-03 17:49:53 +00:00
ghidorahrex
997c64f6db GP-5759: Fixed token piece formatting 2025-06-16 14:29:10 +00:00
Ryan Kurtz
ab849887aa Merge remote-tracking branch
'origin/GP-3952-ghidra_blue-update-script-categories--SQUASHED'
2025-06-13 12:12:47 -04:00
ghidra_blue
7db176b2bd GP-3952 Updated the script categories to simplify and reduce the number of folders. 2025-06-13 15:00:15 +00:00
Ryan Kurtz
5ac69075e3 GP-0: Fixing deprecated calls to Conv 2025-06-13 09:03:48 -04:00
Ryan Kurtz
82baf0aa74 Merge remote-tracking branch 'origin/Ghidra_11.4' 2025-06-11 12:07:08 -04:00
Ryan Kurtz
e08d05a376 Merge remote-tracking branch 'origin/GP-5622_ghidorahrex_aarch64_neon_impl--SQUASHED' into Ghidra_11.4 2025-06-11 11:51:05 -04:00