diff --git a/Ghidra/Features/Decompiler/src/decompile/cpp/architecture.cc b/Ghidra/Features/Decompiler/src/decompile/cpp/architecture.cc
index 8238afc9d4..cb96bbbbce 100644
--- a/Ghidra/Features/Decompiler/src/decompile/cpp/architecture.cc
+++ b/Ghidra/Features/Decompiler/src/decompile/cpp/architecture.cc
@@ -26,8 +26,8 @@ vector ArchitectureCapability::thelist;
-const uint4 ArchitectureCapability::majorversion = 4;
-const uint4 ArchitectureCapability::minorversion = 1;
+const uint4 ArchitectureCapability::majorversion = 5;
+const uint4 ArchitectureCapability::minorversion = 0;
 AttributeId ATTRIB_ADJUSTVMA = AttributeId("adjustvma",103);
 AttributeId ATTRIB_ENABLE = AttributeId("enable",104);
diff --git a/Ghidra/Features/Decompiler/src/main/doc/decompileplugin.xml b/Ghidra/Features/Decompiler/src/main/doc/decompileplugin.xml
index bf9eaeaaca..fc210d599a 100644
--- a/Ghidra/Features/Decompiler/src/main/doc/decompileplugin.xml
+++ b/Ghidra/Features/Decompiler/src/main/doc/decompileplugin.xml
@@ -2071,8 +2071,136 @@
+ + Unions + + Union data-types are fully supported. The Decompiler does not automatically infer unions when analyzing a function; it propagates them into the function from explicitly annotated sources, such as input parameters or global variables. + + + Like a structure, a union data-type is made up of component data-types called fields. But unlike a structure, a union's fields all share the same underlying storage. When the union is applied to a variable, each field potentially describes the whole variable. At any given point where the variable is read or written, a different field may be in effect, even if the underlying data hasn't changed. The Decompiler attempts to infer the particular field by following data-flow to or from the point of use to determine which field best aligns with the specific operations being applied to the variable.
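To make the shared-storage idea concrete, here is a small standalone C++ sketch. It is not Ghidra code; the union `Value` and the helpers `storeFloatReadBits` and `unionExample` are hypothetical names chosen for illustration, and the expected bit pattern assumes an IEEE-754 32-bit `float`. The same four bytes are written once through a `float` field and then viewed through a `uint32_t` field, which is the kind of ambiguity the Decompiler resolves by looking at how the value is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical union for illustration: both fields describe the whole
// 4-byte variable, because every field of a union starts at offset 0.
union Value {
  std::uint32_t i;  // integer view of the storage
  float f;          // floating-point view of the same storage
};

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");
static_assert(sizeof(Value) == sizeof(std::uint32_t),
              "fields overlap instead of being laid out one after another");
static_assert(offsetof(Value, i) == 0 && offsetof(Value, f) == 0,
              "every field starts at offset 0");

// Writes through the float field, then returns the same bytes as the integer
// field would see them. std::memcpy copies the object representation, which
// avoids reading an inactive union member (not guaranteed by standard C++).
std::uint32_t storeFloatReadBits(float x) {
  Value v;
  v.f = x;
  std::uint32_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  return bits;
}

// Usage: the data never changes between the two views; only the field used
// to interpret it does. On IEEE-754 targets, 1.0f is stored as 0x3f800000.
void unionExample() {
  assert(storeFloatReadBits(1.0f) == 0x3f800000u);
}
```

Floating-point operations applied to the variable would point toward field `f`, while masking or shifting the raw bits would point toward field `i`, even though the storage is identical.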
The name of this recovered field is then printed in Decompiler output using syntax similar to that used for structure fields. + + + Depending on the number and variety of fields within the union, it may not be possible to fully distinguish which field is being used in a specific context. In this situation, the Decompiler chooses the first field from the list of best matches. The user has the option of changing this choice with the action. + + + + Typedefs + + Typedef data-types are fully supported. The Decompiler does not automatically infer typedefs when analyzing a function; it propagates them into the function from explicitly annotated sources. + + + A typedef is a copy of another data-type but with an alternate name. In most cases, it can be used interchangeably with the data-type it copies. In general, the Decompiler treats a typedef as a distinct data-type, and it will maintain its identity when it is assigned to variables and is propagated through data-flow. + + + Ghidra supports a specific set of attributes that can be placed directly on a typedef to distinguish it from the data-type it copies. This allows Ghidra to support some non-standard data-types, although the typedef and its copy are no longer interchangeable. The Decompiler supports the following typedef properties: + + Component Offset - See Address Space - See + + + + + Pointer Attributes + + The Decompiler supports some specialized attributes that can be applied to pointer data-types, such as offsets and address spaces (See below). Ghidra implements these attributes on top of typedef data-types only. To add attributes to a pointer, a typedef of the underlying pointer data-type must be created first. Attributes can then be placed directly on the typedef from the Data Type Manager window (See Pointer-Typedef Settings). + + Offset Pointers + + An offset pointer points at a fixed offset relative to the start of its underlying data-type.
Typically, the underlying data-type is a structure and the pointer points at a specific field in the interior of the structure. In general, however, the underlying data-type can be anything, and the offset can point anywhere relative to that data-type, including before its start or past its end. + + + An offset pointer has all the same properties as a normal pointer: an underlying data-type and a size. On top of this, an offset is specified as an integer attribute on the pointer (typedef). This is the number of bytes that need to be added to the start of the underlying data-type to obtain the address actually being pointed at. + + + Because the underlying data-type does not start directly at the address contained in the offset pointer, one can also refer to the offset pointer's direct data-type, i.e. the data-type that is directly at the address contained in the pointer. If the pointer's offset is positive (and small), the direct data-type will generally be that of a field of the underlying data-type. If the offset is negative or larger than the size of the underlying data-type, the direct data-type will be undefined. + + + Offset pointers occur in code where the compiler has maintained knowledge of the position of an underlying data-type relative to a pointer, even if the pointer no longer points directly at the data-type. Because of this, the code may still access fields of the underlying data-type through the pointer. Annotating a variable with an offset pointer allows the Decompiler to recover these accesses. + + + Within the Decompiler's output, the token ADJ is used to indicate that the code is accessing the underlying data-type through the offset pointer. The token uses functional syntax to indicate the particular offset pointer. Then, once the ADJ token is applied, additional pointer syntax, such as ->, is used to indicate what part of the underlying data-type is being accessed.
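The address arithmetic behind an offset pointer can be sketched in C++. This is a simplified model under stated assumptions, not Ghidra's implementation: addresses are plain integers, and `Field`, `StructType`, `offsetPointerValue`, `adj`, `directDataType`, and `offsetExample` are hypothetical names. Adding the offset gives the address stored in the pointer, subtracting it again recovers the start of the underlying data-type (the role ADJ plays in the output), and the direct data-type is whatever field begins at the stored address.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified, hypothetical model of an underlying structure: each field has
// a name and a byte offset from the start of the structure.
struct Field {
  std::string name;
  std::int64_t offset;
};

struct StructType {
  std::int64_t size;          // total size of the structure in bytes
  std::vector<Field> fields;  // fields in layout order
};

// An offset pointer holds the start address of the underlying data-type plus
// its offset attribute. Unsigned wrap-around makes negative offsets work too.
std::uint64_t offsetPointerValue(std::uint64_t start, std::int64_t offset) {
  return start + static_cast<std::uint64_t>(offset);
}

// Model of ADJ(p): subtract the offset to get back to the start of the
// underlying data-type, so ADJ(p)->field names a field of the whole structure.
std::uint64_t adj(std::uint64_t pointerValue, std::int64_t offset) {
  return pointerValue - static_cast<std::uint64_t>(offset);
}

// Direct data-type: the field that starts exactly at the pointed-to address.
// A negative offset, or one at or past the end of the structure, is "undefined".
std::string directDataType(const StructType &st, std::int64_t offset) {
  if (offset < 0 || offset >= st.size) return "undefined";
  for (const Field &f : st.fields)
    if (f.offset == offset) return f.name;
  return "undefined";  // simplification: offsets inside a field are not resolved
}

// Usage with a hypothetical 12-byte structure and an offset pointer at +8.
void offsetExample() {
  StructType rec{12, {{"field0", 0}, {"field1", 4}, {"field2", 8}}};
  std::uint64_t p = offsetPointerValue(0x1000, 8);
  assert(p == 0x1008);
  assert(adj(p, 8) == 0x1000);                 // back at the structure start
  assert(directDataType(rec, 8) == "field2");  // p used as a normal pointer
}
```

In this model, an access written as `ADJ(p)->field1` corresponds to address `adj(p, 8) + 4`, while dereferencing `p` directly reads its direct data-type, `field2`.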
+ + ADJ(structoffptr)->field1 = 2; // Accessing the underlying structure's field
+ iVar1 = *ADJ(intoffptr); // Accessing the underlying integer data-type
+ ADJ(arrayoffptr)[4] = iVar2; // Accessing the underlying array
+ + + If the offset pointer appears in Decompiler output without the ADJ token being applied, it is being treated as a normal pointer to its direct data-type. This generally indicates that the pointer is being used to access data outside the underlying data-type. + + + + Address Space Pointers + + An address space pointer is a normal pointer data-type with a specific address space associated with it (See ). It is created by setting the Address Space attribute on a typedef of a pointer. The attribute value is the name of the specific address space. + + + Address space pointers are useful when a program architecture supports more than one address space containing addressable memory, such as separate code and data address spaces. When a specific section of code manipulates a pointer, it may not be easy to determine which address space the pointer refers to. Address space pointers provide an additional annotation mechanism to help the Decompiler identify the correct address space for a pointer in context. + + + The Decompiler will automatically propagate an address space pointer data-type from parameters and other annotated variables associated with a function. Any constant that the pointer reaches via propagation is assumed to point into the address space associated with the pointer. The Decompiler can then look up the correct symbol, further informing its output. + + + + Forcing Data-types
@@ -2509,17 +2637,19 @@
 The volatile mutability setting indicates that values within the memory region may change unexpectedly, even if the code currently executing does not directly
- write to it.
If a volatile variable is accessed in a function being analyzed by the Decompiler,
- each specific access is replaced with a built-in function call, which prevents constant propagation
- and other transforms across the access. The built-in functions are named based on
- whether the access is a read or write and on the size
- of the access. Within the Decompiler output, the first parameter to a built-in function is a symbol
- indicating the volatile variable. The function returns a value in the case of a volatile read or
- takes a second parameter in the case of a volatile write.
+ write to it. Accessing a variable within a volatile region, either reading or writing, can have other
+ side-effects on the machine state, so in general it cannot be treated as a normal variable.
+ If a volatile variable is accessed in a function being analyzed by the Decompiler,
+ each access is expressed as a copy statement on its own line, separated from other expressions,
+ so that its position within the code and any sequence of accesses are clearly indicated.
+ Any access, either read or write, will always be displayed, even if the value is not directly
+ used by the function. The token representing the variable will be displayed using the
+ Special color, highlighting that the access is volatile
+ (See ).
- X = read_volatile_2(SREG);
- write_volatile_1(DAT_mem_002b,0x20);
+ X = SREG; // Reading volatile SREG
+ DAT_mem_002b = 0x20; // Writing volatile DAT_mem_002b
@@ -3982,8 +4112,10 @@
 Double-clicking a '{' or '}' token, causes the window to navigate to the matching brace within the window. The cursor is set and the window view is adjusted if
- necessary to ensure that the matching brace is visible. Braces may also be navigated via
- the keyboard.
+ necessary to ensure that the matching brace is visible.
+
+
+ Braces may also be navigated via the keyboard.
@@ -4032,15 +4164,69 @@