From fe2d508c1c82a3aea57daedecd8425561964d29b Mon Sep 17 00:00:00 2001
From: caheckman <48068198+caheckman@users.noreply.github.com>
Date: Tue, 22 Sep 2020 16:00:21 -0400
Subject: [PATCH] Changes in response to review
---
.../Decompiler/certification.manifest | 1 +
.../src/main/doc/decompileplugin.xml | 605 +++++++++++++-----
.../src/main/help/help/shared/languages.css | 6 +-
.../DecompilerAnnotations.html | 163 ++++-
.../DecompilePlugin/DecompilerConcepts.html | 121 +++-
.../DecompilePlugin/DecompilerIntro.html | 112 ++--
.../DecompilePlugin/DecompilerOptions.html | 118 +++-
.../DecompilePlugin/DecompilerWindow.html | 144 ++++-
.../DecompilePlugin/images/Undefined.png | Bin 0 -> 16567 bytes
.../app/decompiler/DecompileOptions.java | 166 +++--
.../core/decompile/DecompilerProvider.java | 10 +-
.../BackwardsSliceToPCodeOpsAction.java | 8 +-
.../actions/ForwardSliceToPCodeOpsAction.java | 8 +-
.../program/model/lang/BasicCompilerSpec.java | 6 +-
14 files changed, 1089 insertions(+), 379 deletions(-)
create mode 100644 Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/images/Undefined.png
diff --git a/Ghidra/Features/Decompiler/certification.manifest b/Ghidra/Features/Decompiler/certification.manifest
index 115209e2e9..1e59b644ab 100644
--- a/Ghidra/Features/Decompiler/certification.manifest
+++ b/Ghidra/Features/Decompiler/certification.manifest
@@ -47,6 +47,7 @@ src/main/help/help/topics/DecompilePlugin/images/DecompWindow.png||GHIDRA||||END
src/main/help/help/topics/DecompilePlugin/images/Defuse.png||GHIDRA||||END|
src/main/help/help/topics/DecompilePlugin/images/EditFunctionSignature.png||GHIDRA||||END|
src/main/help/help/topics/DecompilePlugin/images/ForwardSlice.png||GHIDRA||||END|
+src/main/help/help/topics/DecompilePlugin/images/Undefined.png||GHIDRA||||END|
src/main/help/help/topics/DecompilePlugin/images/camera-photo.png||Tango Icons - Public Domain|||Tango|END|
src/main/help/help/topics/DecompilePlugin/images/decompileFunction.gif||GHIDRA||reviewed||END|
src/main/help/help/topics/DecompilePlugin/images/page_edit.png||FAMFAMFAM Icons - CC 2.5||||END|
diff --git a/Ghidra/Features/Decompiler/src/main/doc/decompileplugin.xml b/Ghidra/Features/Decompiler/src/main/doc/decompileplugin.xml
index 72895cc21f..75cafef9b1 100644
--- a/Ghidra/Features/Decompiler/src/main/doc/decompileplugin.xml
+++ b/Ghidra/Features/Decompiler/src/main/doc/decompileplugin.xml
@@ -50,7 +50,14 @@
- Press the icon
+ Press the
+
+
+
+
+
+
+ icon
in the tool bar, or
@@ -88,54 +95,79 @@
Some of the primary capabilities of the decompiler include:
-
-
- Recovering Expressions: The
- decompiler does full data-flow analysis which allows it to
- perform slicing on functions: complicated expressions, which have been split into
- distinct operations/instructions and then mixed together with
- other instructions by the compiling/optimizing process, are
- reconstituted back into a single line.
-
-
- Recovering High-Level Scoped
- Variables: The decompiler understands how compilers
- use processor stacks and registers to implement variables with
- different scopes within a function. Data-flow analysis allows it to
- follow what was originally a single variable as it moves from
- the stack, into a register, into a different register, etc. Thus
- it can effectively recover the original programs concept of a
- variable, minimizing the need to introduce artificial variables
- in the output.
-
-
- Recovering Function Parameters:
- The decompiler understands the parameter passing conventions of
- the compiler and can reconstruct the original form of
- function calls.
-
-
- Using Data-type, Name, and Signature
- Annotations: The decompiler automatically pulls in
- all the different data types and variable names that the user
- has applied to functions, and the C output is altered to reflect
- this. High-level variables are appropriately named, structure
- fields and array indices are calculated and displayed with
- correct syntax, constant char pointers are replaced with
- appropriate quoted strings, etc.
-
-
- Propagating Local Data-types:
- The decompiler infers the data-type of unlabeled variables
- by propagating information from other sources throughout a function.
-
-
- Recovering Structure Definitions:
- The decompiler can be used to create structures that match the usage
- pattern of particular functions and variables, automatically discovering
- component offsets and data-types.
+
+
+
+ Recovering Expressions
+
+
+ The decompiler does full data-flow analysis which allows it to
+ perform slicing on functions: complicated expressions, which have been split into
+ distinct operations/instructions and then mixed together with
+ other instructions by the compiling/optimizing process, are
+ reconstituted back into a single line.
+
-
+
+
+ Recovering High-Level Scoped Variables
+
+
+ The decompiler understands how compilers
+ use processor stacks and registers to implement variables with
+ different scopes within a function. Data-flow analysis allows it to
+ follow what was originally a single variable as it moves from
+ the stack, into a register, into a different register, etc. Thus
+ it can effectively recover the original program's concept of a
+ variable, minimizing the need to introduce artificial variables
+ in the output.
+
+
+
+
+ Recovering Function Parameters
+
+
+ The decompiler understands the parameter passing conventions of
+ the compiler and can reconstruct the original form of
+ function calls.
+
+
+
+
+ Using Data-type, Name, and Signature Annotations
+
+
+ The decompiler automatically pulls in
+ all the different data types and variable names that the user
+ has applied to functions, and the C output is altered to reflect
+ this. High-level variables are appropriately named, structure
+ fields and array indices are calculated and displayed with
+ correct syntax, constant char pointers are replaced with
+ appropriate quoted strings, etc.
+
+
+
+
+ Propagating Local Data-types
+
+
+ The decompiler infers the data-type of unlabeled variables
+ by propagating information from other sources throughout a function.
+
+
+
+
+ Recovering Structure Definitions
+
+
+ The decompiler can be used to create structures that match the usage
+ pattern of particular functions and variables, automatically discovering
+ component offsets and data-types.
+
+
+
+
@@ -170,7 +202,7 @@
size - the maximum number of bytes that can be addressed
- endianess - how groups of bytes are interpreted as integers
+ endianness - how groups of bytes are interpreted as integers
@@ -187,7 +219,7 @@
ram
- A space that models memory accessible via the processors's main data bus. Depending on
+ A space that models memory accessible via the processor's main data bus. Depending on
the architecture, different spaces might be substituted for ram,
such as separate code and data spaces.
@@ -280,7 +312,7 @@
- The integer data-type assumes a twos complement encoding in the endianness of the
+ The integer data-type assumes a two's complement encoding in the endianness of the
address space containing the varnode. Similarly, the floating point data-type assumes
an IEEE 754 standard encoding. The precision of the integer or floating point value is
determined by the varnode's size. A boolean data-type assumes the varnode has a size
@@ -444,12 +476,12 @@
CALL
funcname(...)
-
Branch to a subfunction.
+
Branch to a function, as a call.
CALLIND
(*funcptr)(...)
-
Branch through a pointer to a subfunction.
+
Branch through a pointer to a function, as a call.
RETURN
@@ -539,7 +571,7 @@
INT_2COMP
-
-
Twos complement.
+
Two's complement.
INT_NEGATE
@@ -761,6 +793,47 @@
+
+ P-code Control Flow
+
+ P-code has natural control-flow, with the subtlety that flow
+ happens both within and across machine instructions. Most p-code operators have
+ fall-through semantics, meaning that flow moves to the
+ next operator in the sequence associated with the instruction, or, if the operator is the
+ last in the sequence, flow moves to the first operator in the p-code associated with the next instruction.
+ The p-code operators with branching semantics, such as
+ CBRANCH and BRANCH, can jump to a target operator which is internal to the current instruction, or they can
+ jump to the first p-code operator corresponding to a new instruction at a different address.
+
+
+ Ghidra labels a machine instruction with one of the following Flow Types that describe
+ its overall control-flow. The Flow Type is derived directly from the control-flow of the p-code for the instruction,
+ with the basic types corresponding directly to a specific branching p-code operator.
+
+
+ FALL_THROUGH
+ UNCONDITIONAL_CALL - CALL
+ UNCONDITIONAL_JUMP - BRANCH
+ CONDITIONAL_JUMP - CBRANCH
+ COMPUTED_JUMP - BRANCHIND
+ COMPUTED_CALL - CALLIND
+ TERMINATOR - RETURN
+
+
+ Other Flow Types occur due to a combination of multiple p-code branching operators within the same instruction.
+
+
+ CONDITIONAL_CALL - CALL with CBRANCH
+ CONDITIONAL_TERMINATOR - RETURN with CBRANCH
+ COMPUTED_CALL_TERMINATOR - CALLIND with RETURN
+ CONDITIONAL_COMPUTED_JUMP - CBRANCH with BRANCHIND
+ CONDITIONAL_COMPUTED_CALL - CBRANCH with CALLIND
+ JUMP_TERMINATOR - BRANCH with RETURN
+
+
+
+
+
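The correspondence between branching p-code operators and Flow Types listed above can be pictured as a table lookup. The following is a minimal sketch, not Ghidra's implementation: the names `BRANCHING_OPS`, `FLOW_TYPES` and `flow_type` are hypothetical, and the model assumes every branching operator in an instruction contributes to the Flow Type (it ignores, for example, a CBRANCH whose target is internal to the instruction).

```python
# Toy model: the Flow Type names and operator pairings come from the lists
# above; the lookup function itself is an illustrative assumption.
BRANCHING_OPS = {"BRANCH", "CBRANCH", "BRANCHIND", "CALL", "CALLIND", "RETURN"}

FLOW_TYPES = {
    frozenset(): "FALL_THROUGH",
    frozenset({"CALL"}): "UNCONDITIONAL_CALL",
    frozenset({"BRANCH"}): "UNCONDITIONAL_JUMP",
    frozenset({"CBRANCH"}): "CONDITIONAL_JUMP",
    frozenset({"BRANCHIND"}): "COMPUTED_JUMP",
    frozenset({"CALLIND"}): "COMPUTED_CALL",
    frozenset({"RETURN"}): "TERMINATOR",
    frozenset({"CALL", "CBRANCH"}): "CONDITIONAL_CALL",
    frozenset({"RETURN", "CBRANCH"}): "CONDITIONAL_TERMINATOR",
    frozenset({"CALLIND", "RETURN"}): "COMPUTED_CALL_TERMINATOR",
    frozenset({"CBRANCH", "BRANCHIND"}): "CONDITIONAL_COMPUTED_JUMP",
    frozenset({"CBRANCH", "CALLIND"}): "CONDITIONAL_COMPUTED_CALL",
    frozenset({"BRANCH", "RETURN"}): "JUMP_TERMINATOR",
}


def flow_type(pcode_ops):
    """Return the Flow Type name for one instruction's p-code operator names.

    Returns None for a combination that is not in the lists above.
    """
    branching = frozenset(op for op in pcode_ops if op in BRANCHING_OPS)
    return FLOW_TYPES.get(branching)
```

For instance, an instruction whose p-code contains only arithmetic operators maps to FALL_THROUGH, while one containing both a CBRANCH and a CALL maps to CONDITIONAL_CALL.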
Internal Decompiler Functions
@@ -771,9 +844,11 @@
of p-code operations, displaying them as if they were built-in functions for the language.
-
+
+
+
+ SUB41(x,c) - Truncation operation - SUBPIECE
- SUB41(x,c) - Truncation operation - SUBPIECE
The digit '4' indicates the size of the input operand 'x' in bytes.The digit '1' indicates the size of the output value in bytes.
@@ -804,8 +879,10 @@
+
+
+ CONCAT31(x,y) - Concatenation operator - PIECE
- CONCAT31(x,y) - Concatenation operator - PIECE
The digit '3' indicates the size of the input operand 'x' in bytes.The digit '1' indicates the size of the input operand 'y' in bytes.
@@ -819,8 +896,10 @@
bytes, and 'y' the least significant bytes, in the result.
+
+
+ ZEXT14(x) - Zero-extension operator - INT_ZEXT
- ZEXT14(x) - Zero-extension operator - INT_ZEXT
The digit '1' indicates the size of the input operand 'x' in bytes.The digit '4' indicates the size of the output in bytes.
@@ -833,8 +912,10 @@
significant bytes of the result.
+
+
+ SEXT14(x) - Sign-extension operator - INT_SEXT
- SEXT14(x) - Sign-extension operator - INT_SEXT
The digit '1' indicates the size of the input operand 'x' in bytes.The digit '4' indicates the size of the output in bytes.
@@ -847,8 +928,10 @@
bit of 'x' into the most significant bytes of the result.
+
+
+ SBORROW4(x,y) - Test for signed borrow operator - INT_SBORROW
- SBORROW4(x,y) - Test for signed borrow operator - INT_SBORROW
The digit '4' indicates the size of both input operands 'x' and 'y' in bytes.
@@ -857,8 +940,10 @@
as signed integers.
+
+
+ CARRY4(x,y) - Test for unsigned overflow operator - INT_CARRY
- CARRY4(x,y) - Test for unsigned overflow operator - INT_CARRY
The digit '4' indicates the size of both input operands 'x' and 'y' in bytes.
@@ -867,8 +952,10 @@
as unsigned integers.
+
+
+ SCARRY4(x,y) - Test for signed overflow operator - INT_SCARRY
- SCARRY4(x,y) - Test for signed overflow operator - INT_SCARRY
The digit '4' indicates the size of both input operands 'x' and 'y' in bytes.
@@ -877,7 +964,8 @@
as signed integers.
-
+
+
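As a rough illustration of the operators listed above, the following sketch models them on plain Python integers. It is an assumption-laden toy, not decompiler code: the function names simply mirror the displayed tokens in lower case, sizes are fixed to the digits used in the examples, and the constant 'c' of SUB41 is taken to be the number of least significant bytes dropped, as in the SUBPIECE p-code operator.

```python
def mask(nbytes):
    """Bit mask covering nbytes bytes."""
    return (1 << (8 * nbytes)) - 1


def to_signed(x, nbytes):
    """Interpret the low nbytes of x as a two's complement integer."""
    x &= mask(nbytes)
    sign_bit = 1 << (8 * nbytes - 1)
    return x - (1 << (8 * nbytes)) if x & sign_bit else x


def sub41(x, c):
    """SUBPIECE: drop c least significant bytes of a 4-byte x, keep 1 byte."""
    return ((x & mask(4)) >> (8 * c)) & mask(1)


def concat31(x, y):
    """PIECE: 3-byte x becomes the most significant bytes, 1-byte y the least."""
    return ((x & mask(3)) << 8) | (y & mask(1))


def zext14(x):
    """INT_ZEXT: 1-byte x widened to 4 bytes with zero fill."""
    return x & mask(1)


def sext14(x):
    """INT_SEXT: 1-byte x widened to 4 bytes by replicating the sign bit."""
    return to_signed(x, 1) & mask(4)


def carry4(x, y):
    """INT_CARRY: True if x + y overflows as 4-byte unsigned integers."""
    return (x & mask(4)) + (y & mask(4)) > mask(4)


def scarry4(x, y):
    """INT_SCARRY: True if x + y overflows as 4-byte signed integers."""
    total = to_signed(x, 4) + to_signed(y, 4)
    return not (-(1 << 31) <= total < (1 << 31))


def sborrow4(x, y):
    """INT_SBORROW: True if x - y overflows as 4-byte signed integers."""
    diff = to_signed(x, 4) - to_signed(y, 4)
    return not (-(1 << 31) <= diff < (1 << 31))
```

With these definitions, `sub41(0x11223344, 0)` yields the low byte `0x44`, and `concat31(0x112233, 0x44)` rebuilds `0x11223344`.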
@@ -1102,14 +1190,14 @@
Unaffected
- Prototype models can specify a set of unaffectedmemory locations,
+ Prototype models can specify a set of unaffected memory locations,
whose value must be preserved across the function. I.e. each location
must hold the same value at a function's exit that it held coming into the function.
These encompass a calling convention's saved registers, where a calling function
can store values it doesn't want to change unexpectedly, but also may include other registers that are
known not to change, like the stack pointer.
The decompiler uses the information to determine which locations can be safely propagated across
- a sub-function.
+ a called function.
@@ -1137,8 +1225,12 @@
processor specific form into Ghidra's IR language (see ),
which provides both the control-flow behavior of the instruction and the detailed
semantics describing how the processor and memory state is affected. The translation is controlled by
- the underlying processor model and cannot be directly altered from the tool. Users
- can modify the model specification itself, however.
+ the underlying processor model and, except in limited circumstances, cannot be directly altered
+ from the tool. Flow Overrides (see below) can change how certain control-flow is translated,
+ and, depending on the processor, context registers may affect p-code (see ).
+
+
+ Outside of the tool, users can modify the model specification itself.
See the document "SLEIGH: A Language for Rapid Processor Specification".
@@ -1148,7 +1240,7 @@
fall through, conditional jump, and other
semantics until an instruction with terminator semantics is
reached, which is usually a "return from subroutine"
- instruction. Flow is not traced into subfunctions, in this situation. Instructions
+ instruction. Flow is not traced into called functions, in this situation. Instructions
with call semantics are treated only as if they fall through.
@@ -1202,11 +1294,17 @@
Flow Overrides
- Control-flow behavior for machine instructions is generally determined by the underlying
- Processor model. But this flow can be overridden for individual instructions by various
- Analyzers or manually by the user. The decompiler incorporates these overrides into its
- analysis, and they can have a significant impact on results. Current
- flow overrides include:
+ Control-flow behavior for a machine instruction is generally determined by its underlying
+ p-code (see ), but this can be changed by applying a Flow Override.
+ A Flow Override maintains the overall semantics of a branching instruction
+ but changes how the branch is interpreted. For instance, a JMP instruction, which traditionally
+ represents a branch within a single function, can be overridden to represent a call to a new function.
+ Flow Overrides are applied by Analyzers or manually by the user.
+
+
+ The decompiler automatically incorporates any relevant Flow Overrides into its
+ analysis of a function. This can have a significant impact on results. The
+ types of possible Flow Overrides include:
@@ -1492,8 +1590,11 @@
Duplicate Symbols
- Ghidra allows duplicate symbols, including duplicate function names, in
- any scope higher than a function.
+ Ghidra allows different functions to have the same name, even within the same
+ namespace, in order to model languages that support function overloading.
+ In most languages, such functions would be expected to have distinct prototypes to allow
+ the symbols to be distinguished in context. Ghidra and the decompiler, however, do not check
+ for this, as prototypes may not be known.
@@ -1796,21 +1897,20 @@
For annotations that specifically label a function's formal parameters or return value,
- the Signature Source-Type also affects how they're treated by the decompiler.
- If the Source-Type is set to anything other than DEFAULT, there is a forced
+ the Signature Source also affects how they're treated by the decompiler.
+ If the Signature Source is set to anything other than DEFAULT, there is a forced
one-to-one correspondence between variable annotations and actual parameters in the decompiler's
view of the function. This is stronger than just forcing the data-type; the existence (or not) of
- the variable itself is forced by the annotation in this case. If the Source-Type is forcing and
+ the variable itself is forced by the annotation in this case. If the Signature Source is forcing and
there are no parameter annotations, a void prototype is forced on the function.
- A forcing Source-Type, with a value other than
- DEFAULT, is set typically if debug symbols for the function are read in during
+ A forcing Signature Source is typically set if debug symbols for the function are read in during
Program import (IMPORTED), or if the user manually edits the function prototype
directly (USER_DEFINED).
- If an annotation and the Signature Source-Type force a parameter to exist, specifying an
+ If an annotation and the Signature Source force a parameter to exist, specifying an
undefined data-type in the annotation still directs the decompiler to fill in
the variable's data-type using type propagation. The same holds true for the return value; an
undefined annotation fixes the size of the return value, but the decompiler
@@ -1886,10 +1986,13 @@
Currently, the entire body of the function is included
in the scope of any stack annotation, and the decompiler will allow only a single variable to exist
at the stack address. A stack annotation can be a formal parameter to the function, but otherwise the
- decompiler does not expect to see a value that exists before the start of the function. Similarly,
- because the value is not expected to persist after the function executes, a write to the stack
- address may not show up as an explicit assignment to the variable if the value is simply
- propagated through to another expression.
+ decompiler does not expect to see a value that exists before the start of the function.
+
+
+ The decompiler will continue to perform copy propagation and other transforms on
+ stack locations associated with a variable annotation. In particular, within decompiler output,
+ a specific write operation to a stack address may not show up as an explicit assignment to its variable,
+ if the value is simply copied to another location.
@@ -2024,7 +2127,7 @@
If the boolean property in-line is turned on for a particular function,
it directs the decompiler to inline the effects of the function into the decompilation of any of its calling functions.
- The function will no longer appear as a direct subfunction call in the decompilation, but all of its data-flow
+ The function will no longer appear as a direct function call in the decompilation, but all of its data-flow
will be incorporated into the calling function.
@@ -2040,8 +2143,8 @@
This property is similar in spirit to marking a function as in-line.
A call-fixup directs the decompiler to replace any call to the function with a specific
- chunk of raw p-code. The decompilation of any calling function no longer shows the subfunction call, but the chunk
- of p-code incorporates the subfunction's effects.
+ chunk of raw p-code. The decompilation of any calling function no longer shows the function call, but the chunk
+ of p-code incorporates the called function's effects.
Call-fixups are more flexible than just inlining a function. The call-fixup chunk can be tailored to incorporate all of,
@@ -2056,6 +2159,41 @@
+
+ Signature Source
+
+ Ghidra records a Signature Source for every function,
+ indicating the origin of its prototype information. This is
+ similar to the Symbol Source attached to Ghidra's symbol annotations
+ (see the documentation for
+ Filtering
+ in the Symbol Table). The possible types are:
+
+
+ DEFAULT - for basic or no information
+ ANALYSIS - for information derived by an Analyzer
+ IMPORTED - for information imported from an external source
+ USER_DEFINED - for information set by the user
+
+
+
+
+ Upon import of the Program, if there are debugging symbols available, Ghidra will build
+ annotations of the function's parameters and set the Signature Source type to IMPORTED.
+ Otherwise, it will generally be set to DEFAULT.
+
+
+ However, Ghidra adjusts the Signature Source for a function if there is any change to the
+ prototype. In particular, if the user adds, removes, or edits variable annotations
+ for the function's parameters or return value, Ghidra automatically converts the Signature
+ Source to be USER_DEFINED.
+
+
+ If the Signature Source is set to anything other than DEFAULT, the
+ function's prototype information is forcing on the decompiler. See the discussion
+ in
+
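The rules described in this section can be summarized with a small model. This is a sketch under stated assumptions, not Ghidra's API: `ToyFunction`, `edit_params` and `forced_params` are hypothetical names, and only the four source values and the behavior described above are taken from the text.

```python
from enum import Enum


class SignatureSource(Enum):
    DEFAULT = "DEFAULT"            # basic or no information
    ANALYSIS = "ANALYSIS"          # derived by an Analyzer
    IMPORTED = "IMPORTED"          # imported from an external source
    USER_DEFINED = "USER_DEFINED"  # set by the user


class ToyFunction:
    """Hypothetical stand-in for a function and its parameter annotations."""

    def __init__(self, source=SignatureSource.DEFAULT, params=()):
        self.source = source
        self.params = tuple(params)

    def edit_params(self, params):
        # Any user change to the prototype converts the source to USER_DEFINED.
        self.params = tuple(params)
        self.source = SignatureSource.USER_DEFINED

    def is_forcing(self):
        # Anything other than DEFAULT forces the prototype on the decompiler.
        return self.source is not SignatureSource.DEFAULT

    def forced_params(self):
        """Return the forced parameter list, or None if nothing is forced.

        An empty tuple under a forcing source models a forced void prototype.
        """
        return self.params if self.is_forcing() else None
```

In this model a freshly created function forces nothing, a function imported with debug symbols forces its annotated parameters, and removing every parameter by hand leaves an empty forced list, which corresponds to the forced void prototype.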
+ Discovering Parameters
@@ -2066,7 +2204,7 @@
The input parameters and return value are all forced on the decompiler as a unit based on the
- Signature Source-Type. They are all forced if the Source-Type is set to anything
+ Signature Source. They are all forced if the type is set to anything
other than DEFAULT; otherwise none of them are forced.
@@ -2164,7 +2302,7 @@
Signed or zero extensionBitwise negation
- Integer negation - Twos complement
+ Integer negation - Two's complementAdd or subtract 1
@@ -2242,6 +2380,53 @@
If a register value's region starts in the middle of a function, decompilation is not
affected at all.
+
+ Context Registers
+
+ There is a special class of registers, called context registers, whose
+ values have a different effect on analysis and decompilation than described above.
+
+
+ Context registers are inputs to the disassembly decoding process and directly affect which
+ machine instructions are created.
+
+
+ The value in a context register is examined when Ghidra decodes machine instructions from the underlying
+ bytes in the Program. A specific value generally corresponds to a specific execution mode
+ of the processor. The ARM processor T bit for instance, which selects whether the
+ processor is executing ARM or THUMB instructions, is modeled as a context register in Ghidra.
+ The same set of bytes in the Program can be decoded to machine instructions in more than one way,
+ depending on context register values.
+
+
+ Bytes are typically decoded once using context register values
+ established at the time of disassembly. From Ghidra's more static view of execution, a context register holds
+ only a single value at any point in the code, but the same context register can hold different values for
+ different regions of code. Setting a new value on a region of the Program will affect any subsequent disassembly
+ of code within that region.
+
+
+ If a context register value is changed for a region that has already been disassembled, in order to see
+ the effect of the change, the machine instructions in the region need to be cleared, and disassembly needs
+ to be triggered again. See the documentation on the
+ Clear Plugin.
+
+
+ Values for a context register are set in the same way as any other register, using the
+ Set Register Values ... command
+ described above. Within the
+ Register Manager window,
+ context registers are generally grouped together under the (pseudo-register) heading, contextreg.
+ For details about how context registers are used in processor modeling, see
+ the document "SLEIGH: A Language for Rapid Processor Specification".
+
+
+ Because context registers affect machine instructions, they also affect the underlying p-code and
+ have a substantial impact on decompilation. Although details vary by processor, context register
+ values are typically established during the initial import and analysis of a Program and aren't changed
+ frequently.
+
+
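The decode-once behavior described above can be illustrated with a toy model. Everything here is an assumption made for illustration: `ToyListing` is a hypothetical class, and the rule that a context value of 0 selects 4-byte instructions and 1 selects 2-byte instructions is a deliberate simplification loosely inspired by the ARM/THUMB example, not an accurate decoder.

```python
class ToyListing:
    """Toy model of decode-once disassembly driven by a context value."""

    def __init__(self, data):
        self.data = bytes(data)
        self.context = {}       # address -> context value (0: 4-byte mode, 1: 2-byte mode)
        self.instructions = {}  # start address -> raw bytes of the decoded instruction

    def set_context(self, start, end, value):
        # Setting a value only affects disassembly performed afterwards.
        for addr in range(start, end):
            self.context[addr] = value

    def clear(self, start, end):
        # Remove previously decoded instructions starting in [start, end).
        for addr in [a for a in self.instructions if start <= a < end]:
            del self.instructions[addr]

    def disassemble(self, start, end):
        end = min(end, len(self.data))
        addr = start
        while addr < end:
            if addr in self.instructions:
                # Already decoded: bytes are not decoded a second time.
                addr += len(self.instructions[addr])
                continue
            width = 2 if self.context.get(addr, 0) else 4
            chunk = self.data[addr:addr + width]
            self.instructions[addr] = chunk
            addr += len(chunk)
```

A usage sketch: disassembling eight bytes with the default context yields two 4-byte instructions. Changing the context value alone leaves them in place, and only clearing the region and disassembling again produces four 2-byte instructions.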
@@ -2286,7 +2471,7 @@
-
+ Cache Size (Functions)
@@ -2298,7 +2483,7 @@
-
+ Decompiler Max-Payload (MBytes)
@@ -2311,7 +2496,7 @@
-
+ Decompiler Timeout (seconds)
@@ -2338,13 +2523,13 @@
-
+ Alias Blocking
When deciding if an individual stack location has become dead, the decompiler
must consider aliases, pointers onto the stack that could
- be used to modify the location within a sub-function. One strong heuristic the decompiler
+ be used to modify the location within a called function. One strong heuristic the decompiler
uses is; if the user has explicitly created a variable on the stack between the
base location referenced by the pointer and the individual stack location, then
the decompiler can assume that the pointer is not an alias of the stack location.
@@ -2371,7 +2556,7 @@
-
+ Eliminate unreachable code
@@ -2384,7 +2569,7 @@
-
+ Ignore unimplemented instructions
@@ -2400,7 +2585,7 @@
-
+ Infer constant pointers
@@ -2413,7 +2598,7 @@
-
+ Respect read-only flags
@@ -2434,7 +2619,7 @@
-
+ Simplify extended integer operations
@@ -2445,7 +2630,7 @@
-
+ Simplify predication
@@ -2458,7 +2643,7 @@
-
+ Use in-place assignment operators
@@ -2481,7 +2666,7 @@
-
+ Background Color
@@ -2489,7 +2674,7 @@
-
+ Color for <token>
@@ -2510,7 +2695,7 @@
-
+ Color Default
@@ -2520,7 +2705,7 @@
-
+ Color for Current Variable Highlight
@@ -2528,7 +2713,7 @@
-
+ Color for Highlighting Find Matches
@@ -2537,7 +2722,7 @@
-
+ Comment line indent level
@@ -2547,7 +2732,7 @@
-
+ Comment style
@@ -2556,7 +2741,7 @@
-
+ Disable printing of type casts
@@ -2593,7 +2778,7 @@
-
+ Display Header comment
@@ -2604,7 +2789,7 @@
-
+ Display Line Numbers
@@ -2616,7 +2801,7 @@
-
+ Display Namespaces
@@ -2647,7 +2832,7 @@
-
+ Display Warning comments
@@ -2657,7 +2842,7 @@
-
+ Font
@@ -2667,7 +2852,7 @@
-
+ Integer format
@@ -2692,7 +2877,7 @@
-
+ Maximum characters in a code line
@@ -2703,7 +2888,7 @@
-
+ Number of characters per indent level
@@ -2714,7 +2899,7 @@
-
+ Print 'NULL' for null pointers
@@ -2725,7 +2910,7 @@
-
+ Print calling convention name
@@ -2774,7 +2959,14 @@
To display the decompiler window, position the cursor on a
function in the Code Browser, then select the
- icon from the tool bar, or the
+
+
+
+
+
+
+
+ icon from the tool bar, or the
Decompile option from the
Window menu in the tool.
@@ -2838,14 +3030,22 @@
Main Window
- Initially pushing or selecting
- Decompile from the Window menu in the tool
- brings up the main window. The main window always displays the function
- at the current address within the Code Browser and follows as the user navigates
- within the Program. Any mouse click, menu option, or other action causing the cursor to move to a new
- address in the Listing also causes the main window to display the function containing that address.
- Navigation to new functions is also possible from within the window by double-clicking on function
- tokens (See ).
+ Initially pushing
+
+
+
+
+
+
+
+ or selecting
+ Decompile from the Window menu in the tool
+ brings up the main window. The main window always displays the function
+ at the current address within the Code Browser and follows as the user navigates
+ within the Program. Any mouse click, menu option, or other action causing the cursor to move to a new
+ address in the Listing also causes the main window to display the function containing that address.
+ Navigation to new functions is also possible from within the window by double-clicking on function
+ tokens (See ).
@@ -2868,7 +3068,7 @@
Operator tokens map to the machine instruction which performed that operation.
- Function Name tokens, if they represent a call to a sub-function, map to the
+ Function Name tokens, if they represent a call to another function, map to the
machine instruction executing the call.
@@ -2887,8 +3087,16 @@
Snapshot Windows
- Pressing the
- icon in another Decompiler window's toolbar causes a Snapshot window
+ Pressing the
+
+
+
+
+
+
+
+ icon
+ in another Decompiler window's toolbar causes a Snapshot window
to be created, which initially shows decompilation of the same function. Multiple
Snapshot windows can be brought up to show decompilation of different functions
simultaneously. Snapshot
@@ -2915,6 +3123,41 @@
+
+ Undefined Functions
+
+ If the current location within the Code Browser is in disassembled code, but that code
+ is not contained in a Formal Function Body,
+ then the decompiler window invents a function body on the fly called an
+ Undefined Function. The background color of the window
+ is changed to gray to indicate this special state.
+
+
+
+
+
+
+
+
+ The entry point address of the Undefined Function is chosen by
+ backtracking through the code's control-flow from the current location to the start of
+ a basic block that has no flow coming in except possibly from call instructions.
+ During decompilation, a function body is computed from the selected entry point (as with any function)
+ based on control-flow up to instructions with terminator semantics.
+
+
+ The current address, as indicated by the cursor in the Listing Window for instance, is
+ generally not the entry of the invented function, but the current address will be
+ contained somewhere in the body.
+
+
+ For display purposes in the window, the invented function is given a name based on the
+ computed entry point address with the prefix UndefinedFunction. The function
+ is assigned the default calling convention, and parameters are discovered as part of
+ the decompiler's analysis.
+
+
+
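The entry point selection described above can be sketched as a backward walk over basic blocks. This is a minimal illustration, assuming a hypothetical `preds` mapping from each block to its incoming edges and a breadth-first search order; how the decompiler window actually chooses among several candidate blocks is not specified here.

```python
from collections import deque


def find_entry_block(start, preds):
    """Backtrack from `start` to a block with no incoming flow except calls.

    preds maps a block to a list of (predecessor_block, is_call) pairs.
    Returns the first qualifying block found, or None if every reachable
    block has some non-call flow coming in (for example, a pure loop).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        block = queue.popleft()
        # Incoming edges from call instructions do not count as flow here.
        incoming = [p for p, is_call in preds.get(block, []) if not is_call]
        if not incoming:
            return block
        for pred in incoming:
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return None
```

For example, with blocks A, B, C where A falls into B, B and C form a loop, and A is only reached by a call, starting from C walks back through B and selects A as the entry.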
Tool Bar
@@ -2926,7 +3169,14 @@
Export to C
- - button
+
+
+
+
+
+
+
+ - button
Exports the decompiled result of the current function to a file. A file chooser
@@ -2934,12 +3184,27 @@
is not specified, a ".c" is appended to the filename. If the file already exists,
a final dialog is presented to confirm that the file should be overwritten.
+
+ This action exports a single function at a time. The user can export all functions
+ simultaneously from the Code Browser, by selecting the menu
+ File -> Export Program ... and then choosing
+ C/C++
+ from the drop-down menu. See the full documentation for
+ the Export dialog.
+ Snapshot
- - button
+
+
+
+
+
+
+
+ - button
Creates a new Snapshot window. The Snapshot window
@@ -2952,7 +3217,14 @@
Re-decompile
- - button
+
+
+
+
+
+
+
+ - button
Triggers a re-decompilation of the current function displayed in the window.
@@ -2970,7 +3242,14 @@
Copy
- - button
+
+
+
+
+
+
+
+ - button
Copies the currently selected text in the decompiler window to the clipboard.
@@ -3005,7 +3284,7 @@
and render it using the current Graph Service.
- If no Graph Service is available then this action will no be present.
+ If no Graph Service is available then this action will not be present.
@@ -3052,10 +3331,10 @@
- Sub-function Symbols
+ Function Symbols
- Double clicking a sub-function name causes the
- window itself to navigate away from its current function to the sub-function, triggering
+ Double clicking a called function name causes the
+ window itself to navigate away from its current function to the called function, triggering
a new decompilation if necessary and changing its display.
@@ -3101,7 +3380,7 @@
Control Double Click
Opens a new Snapshot window, navigating it to the selected symbol.
- This is a convenience for immediately decompiling and displaying a sub-function in a
+ This is a convenience for immediately decompiling and displaying a called function in a
new window, without disturbing the active window. The behavior is similar to the
Double Click action, the selected token must represent a function name symbol or possibly
a constant address, but the navigation occurs in the new Snapshot window.
@@ -3307,7 +3586,7 @@
The action is available from any token in the decompiler window. Most tokens trigger editing
- of the current function itself, but a subfunction can be edited by putting the cursor on
+ of the current function itself, but a called function can be edited by putting the cursor on
its name specifically.
@@ -3394,7 +3673,8 @@
Highlight all variable tokens where the value at that point in the function is
directly affected by the value at the selected variable token.
A token is highlighted if there is a direct data-flow path
- starting from the selected point and ending at the token.
+ starting from the selected point and ending at the token. A call operation is not
+ considered a direct data-flow path from its input parameters to its output value.
In the following example, the token b, the output of
@@ -3414,26 +3694,32 @@
directly affects the value at the selected variable token.
A token is highlighted if there is a direct data-flow path
starting from the token and ending at the selected point.
+ A call operation is not considered a direct data-flow path from its input parameters
+ to its output value.
- Forward Inst Slice
+ Forward Operator Slice
Highlight every operator token that manipulates a value directly affected by the
value at the selected variable token. Along each direct data-flow path that
starts
at the selected point, each token representing an operation is highlighted, along with
- any explicit variable it writes to.
+ any explicit variable read or written by the operation. A call operation is not
+ considered a direct data-flow path from its input parameters to its output value.
+ This is an alternate presentation of the slice displayed by the Forward Slice action.
- Backward Inst Slice
+ Backward Operator Slice
Highlight every operator token that manipulates a value that directly affects the
value at the selected variable token. Along each direct data-flow path that
ends
at the selected point, each token representing an operation is highlighted, along with
- any explicit variable it writes to.
+ any explicit variable read or written by the operation. A call operation is not
+ considered a direct data-flow path from its input parameters to its output value.
+ This is an alternate presentation of the slice displayed by the Backward Slice action.
@@ -3491,19 +3777,19 @@
Override Signature
- Override the function prototype corresponding to the sub-function under the cursor.
+ Override the function prototype corresponding to the function under the cursor.
This action can be triggered at call sites, where the function
- being decompiled is calling into another sub-function. Users must select either the token representing
- the sub-function's name or the tokens representing the function pointer at the call site.
+ being decompiled is calling into another function. Users must select either the token representing
+ the called function's name or the tokens representing the function pointer at the call site.
A dialog is brought up where the a complete function declaration, specifying
the return data-type along with the name and data-type for each input parameter. Additionally,
the "Calling Convention", "In Line", and "No Return" properties of the function prototype
can be set (See ).
- Confirming the dialog forces the new function prototype on the decompiler's view of the sub-function,
+ Confirming the dialog forces the new function prototype on the decompiler's view of the called function,
but only for the single selected call site.
@@ -3566,12 +3852,12 @@
Remove Signature Override
- Remove the overriding function prototype applied previously to the sub-function under the cursor.
+ Remove the overriding function prototype applied previously to the called function under the cursor.
This action can only be triggered at call sites, where an overriding
prototype was previously placed by the command. As with
- this command, users must select either the token representing the sub-function's name or the
+ this command, users must select either the token representing the called function's name or the
tokens representing the function pointer at the call site. The action causes the
override to be removed immediately. Parameter information will be drawn from the decompiler's
normal analysis.
@@ -3585,7 +3871,7 @@
The current function can be renamed by selecting the name token within the function's
- declaration at the top of the decompiler window, or individual subfunctions
+ declaration at the top of the decompiler window, or individual called functions
can be renamed by selecting their name token within a call expression.
This action brings up a dialog containing a text field prepopulated with the
name to be changed. The current namespace (and any parent namespaces) is
@@ -3758,6 +4044,12 @@
input parameters to be created as well. In this situation, the action is equivalent to
, and a confirmation dialog comes up to notify the user.
+
+ Setting a data-type on the return value using this action affects decompilation for the
+ function itself and, additionally, any function that calls this function. Within a calling
+ function, the decompiler propagates the data-type into the variable or expression incorporating
+ the return value at each call site.
+
@@ -3791,6 +4083,11 @@
parameter did not exist previously, then its data-type is not
forced by this action.
+
+ Data-type information applied to parameters using this action is especially impactful
+ because it affects decompilation for the function owning the parameter and, additionally,
+ any function that calls this owning function.
+
diff --git a/Ghidra/Features/Decompiler/src/main/help/help/shared/languages.css b/Ghidra/Features/Decompiler/src/main/help/help/shared/languages.css
index 092cafc875..b113644639 100644
--- a/Ghidra/Features/Decompiler/src/main/help/help/shared/languages.css
+++ b/Ghidra/Features/Decompiler/src/main/help/help/shared/languages.css
@@ -18,8 +18,10 @@
FrontPage.css.
*/
-h5 { margin-left: 10px; }
-div.informalexample { margin-left: 50px; }
+h5 { margin-left: 10px; margin-top: 20px; font-family:times new roman; font-size:12pt; font-style:italic; }
+div.informalexample { margin-left: 50px; margin-top: 10px; }
+dd { margin-bottom: 20px; }
+dd p { margin-top: 5px; margin-left: 10px; }
span.term { font-family:times new roman; font-size:14pt; font-weight:bold; }
span.code { font-weight: bold; font-family: courier new; font-size: 14pt; color:#000000; }
diff --git a/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerAnnotations.html b/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerAnnotations.html
index 04aecdc6c7..dbf4bd5c7e 100644
--- a/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerAnnotations.html
+++ b/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerAnnotations.html
@@ -26,8 +26,12 @@
processor specific form into Ghidra's IR language (see “P-code”),
which provides both the control-flow behavior of the instruction and the detailed
semantics describing how the processor and memory state is affected. The translation is controlled by
- the underlying processor model and cannot be directly altered from the tool. Users
- can modify the model specification itself, however.
+ the underlying processor model and, except in limited circumstances, cannot be directly altered
+ from the tool. Flow Overrides (see below) can change how certain control-flow is translated,
+ and, depending on the processor, context registers may affect p-code (see “Context Registers”).
+
+
+ Outside of the tool, users can modify the model specification itself.
See the document "SLEIGH: A Language for Rapid Processor Specification".
@@ -37,7 +41,7 @@
fall through, conditional jump, and other
semantics until an instruction with terminator semantics is
reached, which is usually a "return from subroutine"
- instruction. Flow is not traced into subfunctions, in this situation. Instructions
+ instruction. Flow is not traced into called functions, in this situation. Instructions
with call semantics are treated only as if they fall through.
@@ -103,11 +107,17 @@
Flow Overrides
- Control-flow behavior for machine instructions is generally determined by the underlying
- Processor model. But this flow can be overridden for individual instructions by various
- Analyzers or manually by the user. The decompiler incorporates these overrides into its
- analysis, and they can have a significant impact on results. Current
- flow overrides include:
+ Control-flow behavior for a machine instruction is generally determined by its underlying
+ p-code (see “P-code Control Flow”), but this can be changed by applying a Flow Override.
+ A Flow Override maintains the overall semantics of a branching instruction
+ but changes how the branch is interpreted. For instance, a JMP instruction, which traditionally
+ represents a branch within a single function, can be overridden to represent a call to a new function.
+ Flow Overrides are applied by Analyzers or manually by the user.
+
+
+ The decompiler automatically incorporates any relevant Flow Overrides into its
+ analysis of a function. This can have a significant impact on results. The
+ types of possible Flow Overrides include:
@@ -413,8 +423,11 @@
Duplicate Symbols
- Ghidra allows duplicate symbols, including duplicate function names, in
- any scope higher than a function.
+ Ghidra allows different functions to have the same name, even within the same
+ namespace, in order to model languages that support function overloading.
+ In most languages, such functions would be expected to have distinct prototypes to allow
+ the symbols to be distinguished in context. Ghidra and the decompiler, however, do not check
+ for this, as prototypes may not be known.
@@ -763,21 +776,20 @@
For annotations that specifically label a function's formal parameters or return value,
- the Signature Source-Type also affects how they're treated by the decompiler.
- If the Source-Type is set to anything other than DEFAULT, there is a forced
+ the Signature Source also affects how they're treated by the decompiler.
+ If the Signature Source is set to anything other than DEFAULT, there is a forced
one-to-one correspondence between variable annotations and actual parameters in the decompiler's
view of the function. This is stronger than just forcing the data-type; the existence (or not) of
- the variable itself is forced by the annotation in this case. If the Source-Type is forcing and
+ the variable itself is forced by the annotation in this case. If the Signature Source is forcing and
there are no parameter annotations, a void prototype is forced on the function.
- A forcing Source-Type, with a value other than
- DEFAULT, is set typically if debug symbols for the function are read in during
+ A forcing Signature Source is typically set if debug symbols for the function are read in during
Program import (IMPORTED), or if the user manually edits the function prototype
directly (USER_DEFINED).
- If an annotation and the Signature Source-Type force a parameter to exist, specifying an
+ If an annotation and the Signature Source force a parameter to exist, specifying an
undefined data-type in the annotation still directs the decompiler to fill in
the variable's data-type using type propagation. The same holds true for the return value; an
undefined annotation fixes the size of the return value, but the decompiler
@@ -867,10 +879,13 @@
Currently, the entire body of the function is included
in the scope of any stack annotation, and the decompiler will allow only a single variable to exist
at the stack address. A stack annotation can be a formal parameter to the function, but otherwise the
- decompiler does not expect to see a value that exists before the start of the function. Similarly,
- because the value is not expected to persist after the function executes, a write to the stack
- address may not show up as an explicit assignment to the variable if the value is simply
- propagated through to another expression.
+ decompiler does not expect to see a value that exists before the start of the function.
+
+
+ The decompiler will continue to perform copy propagation and other transforms on
+ stack locations associated with a variable annotation. In particular, within decompiler output,
+ a specific write operation to a stack address may not show up as an explicit assignment to its variable,
+ if the value is simply copied to another location.
@@ -1004,7 +1019,7 @@
If the boolean property in-line is turned on for a particular function,
it directs the decompiler to inline the effects of the function into the decompilation of any of its calling functions.
- The function will no longer appear as a direct subfunction call in the decompilation, but all of its data-flow
+ The function will no longer appear as a direct function call in the decompilation, but all of its data-flow
will be incorporated into the calling function.
@@ -1018,8 +1033,8 @@
This property is similar in spirit to marking a function as in-line.
A call-fixup directs the decompiler to replace any call to the function with a specific
- chunk of raw p-code. The decompilation of any calling function no longer shows the subfunction call, but the chunk
- of p-code incorporates the subfunction's effects.
+ chunk of raw p-code. The decompilation of any calling function no longer shows the function call, but the chunk
+ of p-code incorporates the called function's effects.
Call-fixups are more flexible than just inlining a function. The call-fixup chunk can be tailored to incorporate all of,
@@ -1036,6 +1051,49 @@
+Signature Source
+
+
+ Ghidra records a Signature Source for every function,
+ indicating the origin of its prototype information. This is
+ similar to the Symbol Source attached to Ghidra's symbol annotations
+ (See the documentation for
+ Filtering
+ in the Symbol Table). The possible types are:
+
+
+
+
+DEFAULT - for basic or no information
+
+ANALYSIS - for information derived by an Analyzer
+
+IMPORTED - for information imported from an external source
+
+USER_DEFINED - for information set by the user
+
+
+
+
+
+ Upon import of the Program, if there are debugging symbols available, Ghidra will build
+ annotations of the function's parameters and set the Signature Source type to IMPORTED.
+ Otherwise, it will generally be set to DEFAULT.
+
+
+ However, Ghidra adjusts the Signature Source for a function if there is any change to the
+ prototype. In particular, if the user adds, removes, or edits variable annotations
+ for the function's parameters or return value, Ghidra automatically converts the Signature
+ Source to be USER_DEFINED.
+
+
+ If the Signature Source is set to anything other than DEFAULT, the
+ function's prototype information is forcing on the decompiler. See the discussion
+ in “Forcing Data-types”.
+
+
+
+
Discovering Parameters
@@ -1051,7 +1109,7 @@
The input parameters and return value are all forced on the decompiler as a unit based on the
- Signature Source-Type. They are all forced if the Source-Type is set to anything
+ Signature Source. They are all forced if the type is set to anything
other than DEFAULT; otherwise none of them are forced.
@@ -1165,7 +1223,7 @@
Signed or zero extension
Bitwise negation
-
Integer negation - Twos complement
+
Integer negation - Two's complement
Add or subtract 1
@@ -1252,6 +1310,61 @@
If a register value's region starts in the middle of a function, decompilation is not
affected at all.
+
+
+Context Registers
+
+
+ There is a special class of registers, called context registers, whose
+ values have a different effect on analysis and decompilation than described above.
+
+
+
+
+
+
+
+ Context registers are inputs to the disassembly decoding process and directly affect which
+ machine instructions are created.
+
+
+
+ The value in a context register is examined when Ghidra decodes machine instructions from the underlying
+ bytes in the Program. A specific value generally corresponds to a specific execution mode
+ of the processor. The ARM processor T bit, for instance, which selects whether the
+ processor is executing ARM or THUMB instructions, is modeled as a context register in Ghidra.
+ The same set of bytes in the Program can be decoded to machine instructions in more than one way,
+ depending on context register values.
+
+
+ Bytes are typically decoded once using context register values
+ established at the time of disassembly. From Ghidra's more static view of execution, a context register holds
+ only a single value at any point in the code, but the same context register can hold different values for
+ different regions of code. Setting a new value on a region of the Program will affect any subsequent disassembly
+ of code within that region.
+
+
+ If a context register value is changed for a region that has already been disassembled, in order to see
+ the effect of the change, the machine instructions in the region need to be cleared, and disassembly needs
+ to be triggered again. See the documentation on the
+ Clear Plugin.
+
+
+ Values for a context register are set in the same way as any other register, using the
+ Set Register Values ... command
+ described above. Within the
+ Register Manager window,
+ context registers are generally grouped together under the (pseudo-register) heading, contextreg.
+ For details about how context registers are used in processor modeling, see
+ the document "SLEIGH: A Language for Rapid Processor Specification".
+
+
+ Because context registers affect machine instructions, they also affect the underlying p-code and
+ have a substantial impact on decompilation. Although details vary by processor, context register
+ values are typically established during the initial import and analysis of a Program and aren't changed
+ frequently.
+