From 690ca3ff2bc4c38bd4b243ab9eee40823034158f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Luke=20Sern=C3=A9?=
Table . P-code Operations
This instruction loads data from a dynamic location into the output variable by dereferencing a pointer. The “pointer” comes in two pieces. One piece, input1, is a normal variable containing the offset of the object being pointed at. The other piece, input0, is a constant indicating the space into which the offset applies. The data in input1 … loaded by this instruction is determined by the size of the output variable. It is easy to confuse the address space of the output and input1 variables and the Address Space represented by the ID, which could all be different. Unlike many programming models, there are multiple spaces that a “pointer” can refer to, and so an extra ID is required.
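The two-part pointer can be made concrete with a minimal Python sketch. This is not Ghidra's implementation. It assumes a hypothetical table `SPACES` that maps a numeric space ID to a name and a little-endian `bytearray`. `load` plays the role of LOAD, with the output varnode's size passed explicitly. A companion `store` is included only so that memory can be populated.

```python
# Hypothetical model: a space ID (input0) selects a byte array and an
# offset (input1) indexes into it. Not Ghidra's actual data structures.
SPACES: dict[int, tuple[str, bytearray]] = {
    0: ("ram", bytearray(16)),
    1: ("register", bytearray(16)),
}


def store(space_id: int, offset: int, value: int, size: int) -> None:
    """Write `size` bytes of `value` (little-endian) at `offset` in one space."""
    _, mem = SPACES[space_id]
    mem[offset:offset + size] = value.to_bytes(size, "little")


def load(space_id: int, offset: int, size: int) -> int:
    """LOAD: read `size` bytes (the size of the output varnode) from `offset`."""
    _, mem = SPACES[space_id]
    return int.from_bytes(mem[offset:offset + size], "little")
```

After `store(0, 4, 0x11223344, 4)`:

- `load(0, 4, 4)` returns `0x11223344`.
- `load(0, 4, 2)` returns `0x3344`, because only two output bytes are read.
- `load(1, 4, 4)` returns `0`, because the same offset in a different space names different storage.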
… correct byte offset into the space.
This instruction is the complement of LOAD. The data in the variable input2 is stored at a dynamic location by dereferencing a pointer. As with LOAD, the “pointer” comes in two pieces: a space ID part, and an offset variable. The size of input1 must match the address space specified by the ID, and the amount of data stored is determined by the size of input2.
… of the current machine instruction. This allows branching within the operations forming a single instruction. For example, if the BRANCH occurs as the p-code operation with index 5 for the instruction, it can branch to the operation with index 8 by specifying a constant destination “address” of 3. Negative constants can be used for backward branches.
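The relative-branch arithmetic can be sketched in Python. The op names `BRANCH_REL` and `NOP` and the tuple encoding are hypothetical stand-ins, not real p-code syntax. The sketch shows only how a constant destination is added to the index of the current operation.

```python
def run_instruction(ops: list[tuple[str, int]]) -> list[int]:
    """Execute the p-code ops of ONE machine instruction and return the
    indices of the ops that ran.

    ("BRANCH_REL", k) models a BRANCH whose destination is a constant:
    control moves from index i to index i + k inside the same instruction.
    Any other op is treated as a no-op that is simply recorded.
    """
    executed = []
    i = 0
    while 0 <= i < len(ops):
        kind, arg = ops[i]
        executed.append(i)
        if kind == "BRANCH_REL":
            i += arg  # e.g. index 5 with constant 3 continues at index 8
        else:
            i += 1
    return executed
```

With ten ops and a `("BRANCH_REL", 3)` at index 5, the ops at indices 6 and 7 are skipped. In this toy model, running past the last index simply ends the instruction.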
… sign-extended to the desired size.
This is an unsigned integer division operation. Divide input0 by input1, discarding any fractional part of the quotient, and store the result in output. Both inputs and output must be the same size. There is no handling of division by zero. To simulate a processor’s handling of a division-by-zero trap, other operations must be used before the INT_DIV.
This is a signed integer division operation. The resulting integer is the one closest to the rational value input0/input1 that is no greater in absolute value; that is, the quotient is truncated toward zero. Both inputs and output must be the same size. There is no handling of division by zero. To simulate a processor’s handling of a division-by-zero trap, other operations must be used before the INT_SDIV.
… Input0 and output can be different sizes.
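The difference between the two divisions is easiest to see on a concrete byte. Below is a Python sketch of INT_DIV and INT_SDIV over fixed-size varnodes. The helper names are hypothetical, and the sketch is not taken from Ghidra's emulator.

```python
def _mask(size: int) -> int:
    """All-ones mask for a varnode of `size` bytes."""
    return (1 << (8 * size)) - 1


def to_signed(x: int, size: int) -> int:
    """Interpret the low `size` bytes of x as a two's complement integer."""
    x &= _mask(size)
    sign_bit = 1 << (8 * size - 1)
    return x - (1 << (8 * size)) if x & sign_bit else x


def int_div(a: int, b: int, size: int) -> int:
    """INT_DIV: unsigned quotient with the fractional part discarded."""
    return (a & _mask(size)) // (b & _mask(size))


def int_sdiv(a: int, b: int, size: int) -> int:
    """INT_SDIV: signed quotient truncated toward zero, re-encoded in `size` bytes."""
    sa, sb = to_signed(a, size), to_signed(b, size)
    q = abs(sa) // abs(sb)      # divide magnitudes so the result truncates
    if (sa < 0) != (sb < 0):
        q = -q
    return q & _mask(size)
```

With 1-byte varnodes, `0xF9` is 249 unsigned but -7 signed:

- `int_div(0xF9, 2, 1)` gives `0x7C` (124).
- `int_sdiv(0xF9, 2, 1)` gives `0xFD` (-3).

Python's own `//` floors, so `-7 // 2` would be -4; the sketch divides magnitudes to get truncation instead. A zero divisor raises `ZeroDivisionError` here, whereas p-code itself defines no trap.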
P-Code Reference Manual
… opcodes. Two of these, MULTIEQUAL and INDIRECT, are specific to the graph construction process, but other opcodes can be introduced during subsequent analysis and transformation of a graph and help hold recovered data-type relationships. All of the new opcodes are described in the section called “Additional P-CODE Operations”; none of them can occur in the original raw p-code translation. Finally, a few of the p-code operators, CALL, CALLIND, and RETURN, …
… its opcode. For almost all p-code operations, only the output varnode can have its value modified; there are no indirect effects of the operation. The only possible exceptions are pseudo operations (see the section called “Pseudo P-CODE Operations”), which are sometimes necessary when there is incomplete knowledge of an instruction's behavior.
The list of possible opcodes is similar to many RISC-based instruction sets. The effect of each opcode is described in detail in the following sections, and a reference table is given in the section called “Syntax Reference”. In general, the size or precision of a particular p-code operation is determined by the size of the varnode inputs or output, not by the opcode.
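A one-line Python sketch shows that the opcode alone does not fix the width. The same hypothetical `int_add` helper wraps differently depending on the varnode size it is given.

```python
def int_add(a: int, b: int, size: int) -> int:
    """Add two varnodes of `size` bytes; the result wraps modulo 2**(8*size)."""
    return (a + b) & ((1 << (8 * size)) - 1)
```

`int_add(0xFF, 1, 1)` wraps to `0`, while `int_add(0xFF, 1, 2)` yields `0x100`.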
Although p-code is a distinct language from SLEIGH, because a major purpose of SLEIGH is to specify the translation from machine code to p-code, this document serves as a primer for p-code. The key concepts and terminology are presented in this section, and more detail is given in Section 7.7, “The Semantic Section”. There is also a complete set of tables which list syntax and descriptions for p-code operations in the Appendix.
… general purpose processor. Code for different processors can be translated in a straightforward manner into p-code, and then a single suite of analysis software can be used to do data-flow analysis and decompilation. In this way, the analysis software becomes retargetable, and it isn’t necessary to redesign it for each new processor being analyzed. It is only necessary to specify the translation of the processor’s instruction set into p-code.
An address space for p-code is a generalization of the indexed memory (RAM) that a typical processor has access to, and …
Typically, a processor can be modeled with only two spaces: a ram address space that represents the main memory accessible to the processor via its data bus, and a register address space that is used to implement the processor’s registers. However, the specification designer can define as many address spaces as needed.
… semantics into individual p-code operations. It is called the unique space. There is also a special address space, called the const space, used as a placeholder for constant operands of p-code instructions. For the most part, a SLEIGH specification doesn’t need to be aware of this space, but it can be used in certain situations to force values to be interpreted as constants.
A varnode is the unit of data manipulated by p-code. It is simply a contiguous sequence of bytes in some address … forces an interpretation on each varnode that it uses, as either an integer, a floating-point number, or a boolean value. In the case of an integer, the varnode is interpreted as having a big-endian or little-endian encoding, depending on the specification (see Section 4.1, “Endianess Definition”). Certain instructions also distinguish between signed and unsigned interpretations. For a signed integer, the varnode is considered to have a standard two's complement encoding. For a boolean interpretation, the varnode must be … must be provided and enforced by the specification designer.
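The same bytes yield different integers depending on the endianness and signedness in force. Here is a minimal Python sketch using the standard `int.from_bytes`. The function name `varnode_to_int` is hypothetical.

```python
def varnode_to_int(data: bytes, big_endian: bool, signed: bool) -> int:
    """Interpret a varnode's bytes as an integer.

    big_endian selects the byte order named in the specification's endian
    definition; signed selects a two's complement reading.
    """
    return int.from_bytes(data, "big" if big_endian else "little", signed=signed)
```

For the two bytes `FF 01`:

| Encoding | Unsigned | Signed |
|---|---|---|
| Big-endian | 65281 | -255 |
| Little-endian | 511 | 511 |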
P-code is intended to emulate a target processor by substituting a sequence of p-code operations for each machine instruction. Thus every … general purpose processor instruction sets. They break up into groups.
Table 1. P-code Operations
Fields are the basic building block for family symbols. The mechanisms for building up from fields to the … to think of a constructor as a kind of table in and of itself. But it is only the table that has an actual family symbol identifier associated with it. Most of this chapter is devoted to describing how to define a single constructor. The issues involved in combining multiple constructors into a single table are addressed in Section 7.8, “Tables”.
A single complex statement in the specification file describes a constructor. This statement is always made up of five distinct … in turn.
Every constructor must be part of a table, which is the element with an actual family symbol identifier associated with it. So each constructor starts with the identifier of the table it belongs to, followed by a colon ‘:’.
mode1: ...
… identifier. The identifier instruction is actually reserved for the root table, but should not be used in the table header, as the SLEIGH parser uses the blank identifier to help distinguish assembly mnemonics from operands (see Section 7.3.1, “Mnemonic”).
The display section consists of all characters after the table header ‘:’ up to the SLEIGH keyword is. The section’s primary purpose is to assign disassembly display meaning to the constructor. The section’s secondary purpose is to define local identifiers for the pieces out of which the constructor is being built. Characters in the display section are treated as literals with the following exceptions. …
In particular, all punctuation except ‘^’ loses its special meaning. Those identifiers that are not treated as literals are considered to be new, initially undefined, family symbols. We refer to these new symbols as the operands of the constructor. And for root constructors, these operands frequently correspond to the natural assembly operands. Thinking of it as a family symbol, the constructor’s display meaning becomes the string of literals itself, with each identifier replaced with the display meaning of the symbol corresponding to that identifier.
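The substitution rule can be sketched in Python. This is a hypothetical helper, not the SLEIGH compiler:

- It treats any run of letters, digits, underscores, or dots as an identifier.
- It replaces the identifiers that name operands and keeps every other character literally.
- It simplifies caret handling to deleting the ‘^’.

```python
import re

# Assumed identifier syntax for this sketch only.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def render_display(display: str, operands: dict[str, str]) -> str:
    """Build a display string from a constructor's display section.

    Identifiers found in `operands` are replaced by that operand's display
    string; all other characters, including white space, are kept as
    literals. A '^' only separates tokens, so it is dropped from the output.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(0)
        return operands.get(name, name)

    return IDENT.sub(substitute, display).replace("^", "")
```

For instance, `render_display("( op1 ),op2", {"op1": "r3", "op2": "0x10"})` yields `"( r3 ),0x10"`, preserving the spaces around `op1`. `render_display("bra^cc", {"cc": "eq"})` yields `"braeq"`.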
mode1: ( op1 ),op2 is ...
In the above example, a constructor for table mode1 is being built out of two pieces, symbol op1 and symbol op2. The characters ‘(’, ‘)’, and ‘,’ become literal parts of the disassembly display for symbol mode1. After the display strings for op1 and op2 are found, they are inserted into the string of literals, forming the constructor’s display string. The white space characters surrounding the op1 identifier are preserved as part of this string.
… but only their identifiers are established in the display section.
If the constructor is part of the root instruction table, the first string of characters in the display section that does not contain … if it is legal.
In the above example, the string “var1” is treated as a symbol identifier, but the string “and” is considered to be the mnemonic of the instruction.
… no such requirement.
The ‘^’ character in the display section is used to separate identifiers from other characters where there shouldn’t be white space in the disassembly display. This can be used in any manner but is usually used to attach display characters from a local symbol to the literal characters of the mnemonic.
In the above example, “bra” is treated as literal characters in the
resulting display string followed immediately, with no intervening
spaces, by the display string of the local
symbol cc. Thus the whole constructor actually
… identifiers cc, …
If the ‘^’ is used as the first (non-whitespace) character in the display section of a base constructor, this inhibits the first identifier in the display from being considered the mnemonic, as described in Section 7.3.1, “Mnemonic”. This allows specification of less common situations, where the first part of the mnemonic, rather than perhaps a later part, needs to be considered as an operand. An initial ‘^’ character can also facilitate certain recursive constructions.
Syntactically, this section comes between the keyword is and the delimiter for the following section, either a ‘{’ or a ‘[’. The bit pattern section describes a constructor’s pattern, the subset of possible instruction encodings that the designer wants to match the constructor being defined.
The patterns required for processor specifications can almost always be described as a mask and value pair. Given a specific instruction encoding, we can decide if the encoding matches our pattern by looking at just the bits specified by the mask and seeing if they match a specific value. The fields, as defined in Section 6.1, “Defining Tokens and Fields”, typically give us our masks. So to construct a pattern, we can simply require that the field take on a specific value, as in the example below.
… Assuming the symbol opcode was defined as a field, this says that a root constructor with mnemonic “halt” matches any instruction where the bits defining this field have the value 0x15. The equation “opcode=0x15” is called a constraint.
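The mask/value idea can be sketched in Python. Assume, hypothetically, that opcode occupies bits 10 through 15 of a 16-bit token. The real bit positions come from the field definition, which is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    mask: int   # bits the pattern cares about
    value: int  # required values of those bits

    def matches(self, encoding: int) -> bool:
        return (encoding & self.mask) == self.value


def field_equals(lo: int, hi: int, const: int) -> Pattern:
    """Constraint 'field=const' for a field occupying bits lo..hi inclusive."""
    width = hi - lo + 1
    if not 0 <= const < (1 << width):
        raise ValueError("constant does not fit in the field")
    return Pattern(mask=((1 << width) - 1) << lo, value=const << lo)
```

Under that assumption, `field_equals(10, 15, 0x15)` gives mask `0xFC00` and value `0x5400`. Any encoding whose top six bits are `010101` matches, whatever its low ten bits hold.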
The standard bit encoding of the integer is used when restricting the … field.
More complicated patterns are built out of logical operators. The meaning of these is fairly straightforward. We can force two or more constraints to be true at the same time, a logical and ‘&’, or we can require that either one constraint or another must be true, a logical or ‘|’. By using these with constraints and parentheses for grouping, arbitrarily complicated patterns can be constructed.
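One way to see why ‘|’ can need more than one mask/value check is to represent a pattern as a list of alternative (mask, value) pairs. The Python sketch below is illustrative only. The names `p_and`, `p_or`, and `matches` are hypothetical, and the SLEIGH compiler is not necessarily organized this way.

```python
# A pattern is a list of alternative (mask, value) pairs; it matches an
# encoding if any alternative does.
Alt = tuple[int, int]


def p_and(a: list[Alt], b: list[Alt]) -> list[Alt]:
    """Logical and: every pairing of alternatives, dropping contradictions."""
    out = []
    for ma, va in a:
        for mb, vb in b:
            shared = ma & mb
            if (va & shared) != (vb & shared):
                continue  # both constrain the same bit to different values
            out.append((ma | mb, va | vb))
    return out


def p_or(a: list[Alt], b: list[Alt]) -> list[Alt]:
    """Logical or: keep both sets of alternatives."""
    return a + b


def matches(pattern: list[Alt], encoding: int) -> bool:
    return any((encoding & m) == v for m, v in pattern)
```

Combining `[(0xF0, 0x30)]` with the ‘|’ of `[(0x0F, 0x05)]` and `[(0x0F, 0x06)]` produces two alternatives, `(0xFF, 0x35)` and `(0xFF, 0x36)`. A single mask/value pair cannot express that result.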
… requires two or more mask/value style checks to correctly implement.
The principal way of defining a constructor operand, left undefined by the display section, is in the bit pattern section. If an operand’s identifier is used by itself, not as part of a constraint, then the operand takes on both the display and semantic definition of the global symbol with the same identifier. The syntax is slightly confusing at first. The identifier must appear in the pattern as if it … parsers, a SLEIGH specification is in part a grammar specification. The terminal symbols, or tokens, are the bits of an instruction, and the constructors and tables are the non-terminal symbols. These all build up to the root instruction table, the grammar’s start symbol. So this link from local to global is simply a statement of the grouping of old symbols into the new constructor.
There are some additional complexities to designing a specification for a processor with variable-length instructions. Some initial … designer control over how tokens fit together.
The most important operator for patterns defining variable-length instructions is the concatenation operator ‘;’. When building a constructor with fields from two or more tokens, the pattern must explicitly define the order of the tokens. In terms of the logic of the pattern expressions themselves, the ‘;’ operator has the same meaning as the ‘&’ operator. The combined expression matches only if both subexpressions are true. However, it also requires that the subexpressions involve multiple tokens and explicitly indicates an order for them. … corresponding encoding. The second instruction, add, uses fields op and reg, but it also uses field imm16 contained in immtoken. The ‘;’ operator indicates that token base (via its fields) comes first in the encoding, followed by immtoken. The constraints on base will therefore correspond to constraints … bytes. The length of the final encoding for add will be 3 bytes, the sum of the lengths of the two tokens.
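The effect of ‘;’ on decoding can be sketched in Python for the add example. The token definitions are not shown in this section, so the following are all assumptions chosen for illustration:

- base is one byte.
- immtoken is two bytes.
- The field positions and the byte order are as written in the docstring below.

Only the 3-byte total follows from the text.

```python
from typing import Optional


def decode_add(code: bytes) -> Optional[dict]:
    """Match 'op=3 & reg; imm16' against the start of `code`.

    Hypothetical layout: token base is 1 byte with op in bits 4-7 and reg in
    bits 0-3; token immtoken is the next 2 bytes holding imm16 (big-endian).
    """
    if len(code) < 3:
        return None
    base = code[0]                            # first token in the ';' sequence
    if (base >> 4) != 3:                      # constraint op=3
        return None
    reg = base & 0x0F                         # operand reg, same token
    imm16 = int.from_bytes(code[1:3], "big")  # second token follows base
    return {"reg": reg, "imm16": imm16, "length": 1 + 2}
```

Decoding the bytes `32 12 34` under these assumptions gives register 2, immediate `0x1234`, and a length of 3 bytes.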
If two pattern expressions are combined with the ‘&’ or ‘|’ operator, where the concatenation operator ‘;’ is also being used, the designer must make sure that the tokens underlying each expression are the same and come in the same order. In the example add instruction, for instance, the ‘&’ operator combines the “op=3” and “reg” expressions. Both of these expressions involve only the token base, so the matching requirement is satisfied. The ‘&’ and ‘|’ operators can combine expressions built out of more than one token, but the tokens must come in the same order. Also, these operators have higher precedence than the ‘;’ operator, so parentheses may be necessary to get the intended meaning.
The ellipsis operator ‘...’ is used to satisfy the token matching requirements of the ‘&’ and ‘|’ operators (described in the previous section), when the operands are of different lengths. The ellipsis is a unary operator applied to a pattern expression that extends its token length before it is combined with another expression. Depending … extension.
addrmode: reg is reg & mode=0 { ...
addrmode: #imm16 is mode=1; imm16 { ...
:xor "A",addrmode is op=4 ... & addrmode { ...
… whatever the length of addrmode turns out …
Since the op constraint occurs to the left of the ellipsis, it is considered left-justified, and the matching requirement for ‘&’ will insist that base is the first token in all forms of addrmode. This allows the xor instruction's constraint on op and the addrmode … constraints on a single byte in the final encoding.
It is not necessary for a global symbol, which is needed by a constructor, to appear in the display section of the definition. If … operand. Such an operand behaves and is parsed exactly like any other operand, but there is absolutely no visible indication of the operand in the final display of the assembly instruction. The one common type of instruction that uses this is the relative branch (see Section 7.5.1, “Relative Branches”), but it is otherwise needed only in more esoteric instructions. It is useful in situations where you need to break up the parsing of an instruction along lines that don’t quite match the assembly.
Occasionally there is a need for an empty pattern when building tables. An empty pattern matches everything. There is a predefined … to indicate an empty pattern.
A constraint does not have to be of the form “field = constant”, although this is almost always what is needed. In certain situations, it may be more convenient to use a different kind of constraint. Special care should be taken when designing these … of parsing states for a single constraint. A constraint can actually be built out of arbitrary expressions. These pattern expressions are more commonly used in disassembly actions and are defined in Section 7.5.2, “General Actions and Pattern Expressions”, but they can also be used in constraints. So in general, a constraint is any equation where the left-hand side is a single family symbol, the right-hand side is an arbitrary pattern expression, and the constraint operator is one of the following:
Table 3. Constraint Operators
Figure 1. Two Encodings and the Resulting Specific Symbol Trees
In Figure 1, “Two Encodings and the Resulting Specific Symbol Trees”, we can see the breakdown of two typical instructions in the example instruction set. For each instruction, we see how the encodings split into the relevant fields and the resulting tree of specific symbols. Each node in the … and p-code for these encodings by walking the trees.
If the nodes of each tree are replaced with the display information of the corresponding specific symbol, we see how the disassembly statement is built.
Figure 2, “Two Disassembly Trees”, shows the resulting disassembly trees corresponding to the specific symbol trees in Figure 1, “Two Encodings and the Resulting Specific Symbol Trees”. The display information comes from constructor display sections, the names of attached registers, or the integer interpretation of fields. The identifiers in a constructor display section serve as placeholders for the subtrees below them. By … statements corresponding to the original instruction encodings.
A similar procedure produces the resulting p-code translation of the instruction. If each node in the specific symbol tree is replaced with the corresponding p-code, we see how the final translation is built.
Figure 3, “Two P-code Trees”, lists the final p-code translation for our example instructions and shows the trees from which the translation is derived. Symbol names within the p-code for a particular node, as with the disassembly tree, are placeholders for the subtree below them. The final translation is put together by concatenating the p-code from each node, traversing the nodes in a depth-first order. Thus the p-code of a child tends to come before the p-code of the parent node (but see Section 7.9, “P-code Macros”). Placeholders are filled in with the appropriate varnode, as determined by the export statement of the root of the corresponding subtree.
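The traversal rule can be sketched in Python. The `Node` class, the `{name}` placeholder syntax, and the p-code strings are hypothetical. The sketch shows only the default ordering (child before parent) and placeholder filling; it does not model the build directive or macros.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One specific symbol in the tree (hypothetical representation)."""
    pcode: list[str]    # statements; "{name}" is a placeholder for child `name`
    export: str         # varnode this subtree exports to its parent
    children: dict[str, "Node"] = field(default_factory=dict)


def translate(node: Node) -> list[str]:
    """Concatenate p-code depth-first: children first, then this node,
    with each placeholder filled by the child's exported varnode."""
    out: list[str] = []
    for child in node.children.values():
        out.extend(translate(child))
    exports = {name: child.export for name, child in node.children.items()}
    out.extend(stmt.format(**exports) for stmt in node.pcode)
    return out
```

Take a child whose p-code is `tmp = r2 + 8` and which exports `*[ram]:4 tmp`. If the parent's p-code is `r1 = {mode}`, translation emits the child's statement first and then `r1 = *[ram]:4 tmp`.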
SLEIGH supports a macro facility for encapsulating semantic actions. The syntax, in effect, allows the designer to define p-code subroutines which can be invoked as part of a constructor’s semantic action. The subroutine is expanded automatically at compile time.
… anywhere in the file before its first use. This is followed by the global identifier for the new macro and a parameter list, comma-separated and in parentheses. The body of the definition comes next, surrounded by curly braces. The body is a sequence of semantic statements with the same syntax as a constructor’s semantic section. The identifiers in the macro’s parameter list are local in scope. The macro can refer to these and any global specific symbol.
… directive, however, should not be used in a macro.
Because the nodes of a specific symbol tree are traversed in a depth-first order, the p-code for a child node in general comes before … used to affect these issues in the rare cases where it is necessary. The build directive occurs as another form of statement in the semantic section of a constructor. The keyword build is followed by one of the constructor’s operand identifiers. Then, instead of filling in the operand’s associated p-code based on an arbitrary traversal of the symbol tree, the directive specifies that the operand’s p-code must occur at that point in the p-code for the parent constructor.
… efficient to treat the condition bit which distinguishes the variants as a special operand.
cc: "c" is condition=1 { if (flag==1) goto inst_next; }
cc: is condition=0 { }
:and^cc r1,r2 is opcode=0x67 & cc & r1 & r2 {
…
In this example, the conditional variant is distinguished by a ‘c’ appended to the assembly mnemonic. The cc operand performs the conditional side-effect, checking a flag in one case, or doing nothing in the other. The two forms of the instruction can now … normal action of the instruction.
For processors with a pipelined architecture, multiple instructions are typically executing simultaneously. This can lead to processor … Because the delayslot directive combines two or more instructions into one, the meaning of the symbols inst_next and inst_next2 becomes ambiguous. It is no longer clear what exactly the “next instruction” is. SLEIGH uses the following conventions for interpreting an inst_next symbol. If it is used in the semantic section, the symbol refers to the address of the instruction … when computing the value of inst_next2.
For most practical specifications, the disassembly and semantic meaning of an instruction can be determined by looking only at the … necessary.
SLEIGH solves these problems by introducing context variables. The syntax for defining these symbols was described in Section 6.4, “Context Variables”. As mentioned there, the easiest and most common way to use a context variable is as just another field to use in our bit patterns. It gives us the extra information we need to distinguish between different instructions whose encodings are otherwise the same.
Suppose a processor supports the use of two different sets of registers in its main addressing mode, based on the setting of a … although see the following sections.
SLEIGH can make direct modifications to context variables through statements in the disassembly action section of a constructor. The left-hand side of an assignment statement in this section can be a context variable; see Section 7.5.2, “General Actions and Pattern Expressions”. Because the result of this assignment is calculated in the middle of the instruction disassembly, the change in value of the context variable can potentially affect any remaining parsing for that instruction. A modal variable is being … use mode, its value will have reverted to its original global state. The same holds for any context variable modified with this syntax. If an instruction needs to permanently modify the state of a context variable, the designer must use constructions described in Section 8.3, “Global Context Change”.
Clearly, the behavior of the above example could be easily replicated … by build directives.
It is possible for an instruction to attempt a permanent change to a context variable, which would then affect the parsing of other … select r registers via rreg1, and smode sets mode to 1 in order to select s registers. As is described in Section 8.2, “Local Context Change”, these assignments by themselves cause only a local context change. However, the subsequent globalset directives make the change persist outside of the instructions … of mode begins at the next address.
A global change to context that affects instruction decoding is typically open-ended. That is, once the mode switching instruction is executed, a permanent change … is encountered.
Flow following behavior can be overridden by adding the noflow attribute to the definition of the context field (see Section 6.4, “Context Variables”). In this case, a globalset directive only affects the context of a single instruction at the specified address. Subsequent instructions retain their original context. This can be useful in a variety of situations but is typically … end and what to do if there are conflicts.
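The contrast between flow following and noflow can be sketched with a toy Python model. It is a deliberate simplification with hypothetical names throughout:

- It uses a single straight-line list of instruction addresses with no branches.
- It ignores conflicts.

Real context propagation in Ghidra also has to follow control flow.

```python
def propagate_context(flow: list[int], sets: dict[int, int], default: int,
                      noflow: bool = False) -> dict[int, int]:
    """Context value seen at each address along a straight-line flow.

    `sets` maps an address to the value a globalset assigns there. With flow
    following, the value persists for subsequent instructions; with noflow,
    only the instruction at that address sees it.
    """
    seen: dict[int, int] = {}
    current = default
    for addr in flow:
        if addr in sets:
            seen[addr] = sets[addr]
            if not noflow:
                current = sets[addr]
        else:
            seen[addr] = current
    return seen
```

Consider addresses 0, 2, 4, and 6 with a change to 1 at address 2:

- With flow following, addresses 2, 4, and 6 all see 1.
- With noflow, only address 2 sees it, and 4 and 6 keep the original 0.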
SLEIGH files must start with all the definitions needed by the rest of the specification. All definition statements start with the keyword -define and end with a semicolon ‘;’. +define and end with a semicolon ‘;’.
The first definition in any SLEIGH specification must be for endianess. Either
@@ -46,7 +46,7 @@ define endian=little; This defines how the processor interprets contiguous sequences of bytes as integers or other values and globally affects values across all address spaces. It also affects how integer fields -within an instruction are interpreted, (see Section 6.1, “Defining Tokens and Fields”), +within an instruction are interpreted, (see Section 6.1, “Defining Tokens and Fields”), although it is possible to override this setting in the rare case that endianess is different for data versus instruction encoding. The specification designer generally only needs to worry about @@ -56,7 +56,7 @@ otherwise the specification language hides endianess issues.An alignment definition looks like
@@ -73,7 +73,7 @@ instruction as an error.The definition of an address space looks like
@@ -115,7 +115,7 @@ and store from dynamic pointers into the space.A space of type register_space is -intended to model the processor’s general-purpose registers. In terms +intended to model the processor’s general-purpose registers. In terms of accessing and manipulating data within the space, SLEIGH and p-code make no distinction between the type ram_space or the @@ -157,8 +157,8 @@ At least one space needs to be labeled with the default attribute. This should be the space that the processor accesses with its main address bus. In terms of the rest of the specification file, this sets the default -space referred to by the ‘*’ operator (see -Section 7.7.1.2, “The '*' Operator”). It also has meaning to +space referred to by the ‘*’ operator (see +Section 7.7.1.2, “The '*' Operator”). It also has meaning to GHIDRA.
@@ -184,7 +184,7 @@ bits).
The general purpose registers of the processors can be named with the following define syntax: @@ -194,8 +194,8 @@ define spacename offset=stringlist is either a single string or a white -space separated list of strings in square brackets ‘[’ and ‘]’. A -string of just “_” indicates a skip in the sequence for that +space separated list of strings in square brackets ‘[’ and ‘]’. A +string of just “_” indicates a skip in the sequence for that definition. The offset corresponding to that position in the list of names will not have a varnode defined at it.
@@ -228,7 +228,7 @@ define register offset=0 size=1Many processors define registers that either consist of a single bit or otherwise don't use an integral number of bytes. A recurring @@ -245,7 +245,7 @@ models because the smallest object they can manipulate directly is a byte. In order to manipulate single bits, p-code must use a combination of bitwise logical, extension, and truncation operations. So a register defined as a bit range is not really a -varnode as described in Section 1.2, “Varnodes”, but is +varnode as described in Section 1.2, “Varnodes”, but is really just a signal to the SLEIGH compiler to fill in the proper operators to simulate the bit manipulation. Using this feature may greatly increase the complexity of the compiled specification with @@ -265,7 +265,7 @@ register. In this example, statusreg is d first as a 4 byte register, and the bit registers themselves are built by the following define bitrange statement. A single bit register definition consists of an identifier -for the register, followed by ‘=’, then the name of the register +for the register, followed by ‘=’, then the name of the register containing the bits, and finally a pair of numbers in square brackets. The first number indicates the lowest significant bit in the containing register of the bit range, where bit 0 is the least @@ -282,11 +282,11 @@ bit of statusreg respectively.
The syntax for defining a new bit register is consistent with the pseudo bit range operator, described in -Section 7.7.1.5, “Bit Range Operator”, and the resulting symbol +Section 7.7.1.5, “Bit Range Operator”, and the resulting symbol is really just a placeholder for this operator. Whenever SLEIGH sees this symbol it generates p-code precisely as if the designer had used the bit range operator -instead. Section 7.7.1.5, “Bit Range Operator”, provides some +instead. Section 7.7.1.5, “Bit Range Operator”, provides some additional details about how p-code is generated, which apply to the use of bit range registers.
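To make the placeholder idea concrete, the shift-and-mask arithmetic that a bit range register stands for can be sketched in Python. This illustrates the arithmetic only. It is not the p-code SLEIGH actually emits. The helper names read_bitrange and write_bitrange are hypothetical, and the sketch assumes the second number in the square brackets gives the width of the range in bits.

```python
def read_bitrange(reg_value: int, lsb: int, nbits: int) -> int:
    """Extract nbits bits starting at bit lsb (bit 0 = least significant)."""
    return (reg_value >> lsb) & ((1 << nbits) - 1)


def write_bitrange(reg_value: int, lsb: int, nbits: int, new_bits: int) -> int:
    """Return reg_value with the bit range replaced by the low nbits of new_bits.

    The other bits of the containing register are preserved, so a write has
    to start from the current value of the whole containing register.
    """
    mask = ((1 << nbits) - 1) << lsb
    return (reg_value & ~mask) | ((new_bits << lsb) & mask)


if __name__ == "__main__":
    statusreg = 0x0400                                # only bit 10 set
    print(read_bitrange(statusreg, 10, 1))            # 1
    print(hex(write_bitrange(statusreg, 11, 1, 1)))   # 0xc00
```

The write case shows why bit range registers can make the compiled specification more complex. Every assignment expands into a read, a mask, and a merge of the containing register.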
@@ -299,14 +299,14 @@ used as an alternate syntax for defining overlapping registers.The specification designer can define new p-code operations using a define pcodeop statement. This statement automatically reserves an internal form for the new p-code operation and associates an identifier with it. This identifier can then be used in semantic expressions (see -Section 7.7.1.8, “User-Defined Operations”). The following example defines a +Section 7.7.1.8, “User-Defined Operations”). The following example defines a new p-code operation arctan.
@@ -338,15 +338,15 @@ actions that are too esoteric or too complicated to implement.
-Prev | -- | Next +Prev | + | + Next |
3. Preprocessing | +3. Preprocessing | Home | -5. Introduction to Symbols | + 5. Introduction to Symbols |
A SLEIGH specification is typically contained in a single file, -although see Section 3.1, “Including Files”. The file must +although see Section 3.1, “Including Files”. The file must follow a specific format as parsed by the SLEIGH compiler. In this section, we list the basic formatting rules for this file as enforced by the compiler.
-Comments start with the ‘#’ character and continue to the end of the +Comments start with the ‘#’ character and continue to the end of the line. Comments can appear anywhere except the display section of a -constructor (see Section 7.3, “The Display Section”) where the ‘#’ character will be +constructor (see Section 7.3, “The Display Section”) where the ‘#’ character will be interpreted as something that should be printed in disassembly.
Identifiers are made up of letters a-z, capitals A-Z, digits 0-9 and -the characters ‘.’ and ‘_’. An identifier can use these characters in +the characters ‘.’ and ‘_’. An identifier can use these characters in any order and for any length, but it must not start with a digit.
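The identifier rule fits in a single regular expression. The Python sketch below checks only the lexical form described above. It does not reject reserved words or predefined symbols, and is_sleigh_identifier is a hypothetical helper name.

```python
import re

# Letters, digits, '.' and '_', in any order and any length, but the first
# character may not be a digit. Reserved words are not checked here.
_IDENTIFIER = re.compile(r"[A-Za-z._][A-Za-z0-9._]*")


def is_sleigh_identifier(text: str) -> bool:
    """Return True if text has the lexical form of a SLEIGH identifier."""
    return _IDENTIFIER.fullmatch(text) is not None


if __name__ == "__main__":
    for name in ["statusreg", ".L1", "_tmp", "r0", "0x10", "a-b"]:
        print(name, is_sleigh_identifier(name))
```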
String literals can be used, when specifying names and when specifying how disassembly should be printed, so that special characters are treated as literals. Strings are surrounded by the double quote -character ‘”’ and all characters in between lose their special +character ‘”’ and all characters in between lose their special meaning.
Integers are specified either in a decimal format or in a standard C-style hexadecimal format by prepending the -number with “0x”. Alternately, a binary representation of an integer +number with “0x”. Alternately, a binary representation of an integer can be given by prepending the string of '0' and '1' characters with "0b".
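The three integer spellings can be recognized with a small Python sketch. It accepts only the lowercase 0x and 0b prefixes exactly as written above. This sketch does not decide whether SLEIGH also accepts uppercase prefixes. The name parse_sleigh_int is hypothetical.

```python
import re

# Strict format check first, so forms that Python's int() would otherwise
# tolerate (underscores, signs, surrounding spaces) are rejected.
_INT_LITERAL = re.compile(r"0x[0-9A-Fa-f]+|0b[01]+|[0-9]+")


def parse_sleigh_int(literal: str) -> int:
    """Parse a decimal, 0x-prefixed hexadecimal, or 0b-prefixed binary literal."""
    if _INT_LITERAL.fullmatch(literal) is None:
        raise ValueError(f"not an integer literal: {literal!r}")
    if literal.startswith("0x"):
        return int(literal[2:], 16)
    if literal.startswith("0b"):
        return int(literal[2:], 2)
    return int(literal, 10)


if __name__ == "__main__":
    print(parse_sleigh_int("42"), parse_sleigh_int("0x1F"), parse_sleigh_int("0b101"))
```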
@@ -82,21 +82,21 @@ can be given by prepending the string of '0' and '1' characters with "0b".Numbers are treated as unsigned except when used in patterns where they are treated as signed (see -Section 7.4, “The Bit Pattern Section”). The number of bytes used to +Section 7.4, “The Bit Pattern Section”). The number of bytes used to encode the integer when specifying the semantics of an instruction is inferred from other parts of the syntax (see -Section 7.3, “The Display Section”). Otherwise, integers should +Section 7.3, “The Display Section”). Otherwise, integers should be thought of as having arbitrary precision. Currently, SLEIGH stores integers internally with 64 bits of precision.
White space characters include space, tab, line-feed, vertical -line-feed, and carriage-return (‘ ‘, ‘\t’, ‘\r’, ‘\v’, -‘\n’). Variations in spacing have no effect on the parsing of the file +line-feed, and carriage-return (‘ ‘, ‘\t’, ‘\r’, ‘\v’, +‘\n’). Variations in spacing have no effect on the parsing of the file except in string literals.
@@ -106,15 +106,15 @@ except in string literals.-Prev | -- | Next +Prev | + | + Next |
SLEIGH | +SLEIGH | Home | -3. Preprocessing | + 3. Preprocessing |
SLEIGH provides support for simple file inclusion, macros, and other basic preprocessing functions. These are all invoked with directives -that start with the ‘@’ character, which must be the first character +that start with the ‘@’ character, which must be the first character in the line.
In general a single SLEIGH specification is contained in a single file, and the compiler is invoked on one file at a time. Multiple @@ -54,7 +54,7 @@ own @include directives.
SLEIGH allows simple (unparameterized) macro definitions and expansions. A macro definition occurs on one line and starts with @@ -62,7 +62,7 @@ the @define directive. This is followed by an identifier for the macro and then a string to which the macro should expand. The string must either be a proper identifier itself or surrounded with double quotes. The macro can then be -expanded with typical “$(identifier)” syntax at any other point in the +expanded with typical “$(identifier)” syntax at any other point in the specification following the definition.
@@ -72,9 +72,9 @@ define endian=$(ENDIAN);
This example defines a macro identified as ENDIAN -with the string “big”, and then expands the macro in a later SLEIGH +with the string “big”, and then expands the macro in a later SLEIGH statement. Macro definitions can also be made from the command line -and in the “.spec” file, allowing multiple specification variations to +and in the “.spec” file, allowing multiple specification variations to be derived from one file. SLEIGH also has an @undef directive which removes the definition of a macro from that point on in the file. @@ -85,7 +85,7 @@ definition of a macro from that point on in the file.
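The $(identifier) expansion step can be sketched in Python as a regular-expression substitution over a dictionary of defined macros. In this model, @define adds a key and @undef deletes it. The function name expand_macros is hypothetical. Treating an undefined macro as an error is an assumption of this sketch, not documented SLEIGH behavior.

```python
import re

# $(NAME), where NAME has the identifier form described in this manual.
_MACRO_REF = re.compile(r"\$\(([A-Za-z._][A-Za-z0-9._]*)\)")


def expand_macros(line: str, macros: dict) -> str:
    """Replace each $(NAME) in line with the string bound to NAME."""

    def substitute(match):
        name = match.group(1)
        if name not in macros:
            # Assumption of this sketch: expanding an undefined macro is an error.
            raise KeyError(f"undefined macro: {name}")
        return macros[name]

    return _MACRO_REF.sub(substitute, line)


if __name__ == "__main__":
    macros = {"ENDIAN": "big"}   # as if "@define ENDIAN big" had been seen
    print(expand_macros("define endian=$(ENDIAN);", macros))  # define endian=big;
```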
SLEIGH supports several directives that allow conditional inclusion of parts of a specification, based on the existence of a macro, or its @@ -103,7 +103,7 @@ and @endif.
The @ifdef directive is followed by a macro identifier and evaluates to true if the macro is defined. @@ -129,14 +129,14 @@ or @elif directive (See below).
The @if directive is followed by a
boolean expression with macros as the variables and strings as the
constants. Comparisons between macros and strings are currently
limited to string equality or inequality. But individual comparisons
can be combined arbitrarily using parentheses and the boolean
-operators ‘&&’, ‘||’, and ‘^^’. These represent a logical
+operators ‘&&’, ‘||’, and ‘^^’. These represent a logical
and, a logical or, and
a logical exclusive-or operation respectively. It
is possible to test whether a particular macro is defined within the
@@ -158,7 +158,7 @@ is defined.
An @else directive splits the lines
bounded by an @if directive and
@@ -180,12 +180,12 @@ one @else, which must occur after all
the @elif directives.
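One way to see how these directives nest is to model them as a stack of open blocks. The Python sketch below handles only @ifdef, @ifndef, @else, and @endif. Evaluation of @if and @elif expressions is deliberately left out. The function name filter_conditionals is hypothetical and not part of the SLEIGH compiler.

```python
def filter_conditionals(lines, defined):
    """Keep only the lines that lie in active @ifdef/@ifndef/@else branches.

    lines:   source lines, with directives starting in the first column
    defined: set of macro names currently defined
    """
    kept = []
    stack = []        # one (enclosing_active, branch_already_taken) per open block
    active = True     # whether lines at the current position are included
    for line in lines:
        words = line.split()
        directive = words[0] if words else ""
        if directive in ("@ifdef", "@ifndef"):
            condition = words[1] in defined
            if directive == "@ifndef":
                condition = not condition
            stack.append((active, condition))
            active = active and condition
        elif directive == "@else":
            enclosing, taken = stack[-1]
            active = enclosing and not taken
            stack[-1] = (enclosing, True)
        elif directive == "@endif":
            enclosing, _ = stack.pop()
            active = enclosing
        elif active:
            kept.append(line)
    return kept


if __name__ == "__main__":
    src = ["@ifdef BIG", "define endian=big;", "@else", "define endian=little;", "@endif"]
    print(filter_conditionals(src, {"BIG"}))   # ['define endian=big;']
    print(filter_conditionals(src, set()))     # ['define endian=little;']
```

Each stack entry records whether the enclosing region was active and whether a branch of this block has already been taken. Nested blocks inside an inactive branch therefore stay inactive, whatever their own condition is.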
@@ -198,15 +198,15 @@ the @elif directives.
We list all the p-code operations by name along with the syntax for
invoking them within the semantic section of a constructor definition
-(see Section 7.7, “The Semantic Section”), and with a
+(see Section 7.7, “The Semantic Section”), and with a
description of the operator. The terms v0
and v1 represent identifiers of individual input
varnodes to the operation. In terms of syntax, v0
@@ -46,7 +46,7 @@ to lowest.
Table 5. Semantic Expression Operators and Syntax Table 5. Semantic Expression Operators and Syntax Table 6. Basic Statements and Associated Operators Table 6. Basic Statements and Associated Operators Table 7. Branching Statements Table 7. Branching Statements
After the definition section, we are prepared to start writing the
body of the specification. This part of the specification shows how
@@ -61,7 +61,7 @@ Formally a Specific Symbol is defined as
The named registers that we defined earlier are the simplest examples
of specific symbols (see
-Section 4.4, “Naming Registers”). The symbol identifier
+Section 4.4, “Naming Registers”). The symbol identifier
itself is the string that will get printed in disassembly and the
varnode associated with the symbol is the one constructed by the
define statement.
@@ -79,7 +79,7 @@ instructions to specific symbols.
The set of instruction encodings that map to a single specific symbol
is called an instruction pattern and is described
-more fully in Section 7.4, “The Bit Pattern Section”. In most cases, this
+more fully in Section 7.4, “The Bit Pattern Section”. In most cases, this
can be thought of as a mask on the bits of the instruction and a value
that the remaining unmasked bits must match. At any rate, the family
symbol identifier, when taken out of context, represents the entire
@@ -98,14 +98,14 @@ that simulate the instruction.
The symbol responsible for combining smaller family symbols is called
a table, which is fully described in
-Section 7.8, “Tables”. Any table symbol
+Section 7.8, “Tables”. Any table symbol
can be used in the definition of other table
symbols until the root symbol is fully described. The root symbol has
the predefined identifier instruction.
Almost all identifiers live in the same global "scope". The global scope includes
All of the names in this scope must be unique. Each
-individual constructor (defined in Section 7, “Constructors”)
+individual constructor (defined in Section 7, “Constructors”)
defines a local scope for operand names. As with most languages, a
local symbol with the same name as a global
symbol hides the global symbol while that scope
@@ -143,13 +143,13 @@ is in effect.
We list all of the symbols that are predefined by SLEIGH.
Table 2. Predefined Symbols Table 2. Predefined Symbols
A token is one of the byte-sized pieces that make
up the machine code instructions being modeled.
@@ -57,7 +57,7 @@ field and the range of bits within the token making up the field. The
size of a field does not need to be a multiple of
8. The range is inclusive where the least significant bit in the token
is labeled 0. When defining tokens that are bigger than 1 byte, the
-global endianess setting (See Section 4.1, “Endianess Definition”)
+global endianess setting (See Section 4.1, “Endianess Definition”)
will affect this labeling. Although it is rarely required, it is possible to override
the global endianess setting for a specific token by appending either the qualifier
endian=little or endian=big
@@ -88,11 +88,11 @@ different names.
Fields are the most basic form of family symbol; they define a natural
map from instruction bits to a specific symbol as follows. We take the
-set of bits within the instruction as given by the field’s defining
+set of bits within the instruction as given by the field’s defining
range and treat them as an integer encoding. The resulting integer is
both the display portion and the semantic meaning of the specific
symbol. The display string is obtained by converting the integer into
@@ -113,7 +113,7 @@ the dec attribute is not supported]
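The default interpretation can be sketched in Python. The token's bytes are combined into one integer according to the endianess, and a field is the inclusive bit range low..high shifted down to bit 0. The names token_value and field_value are hypothetical, and signed fields and display formatting are left out. The usage lines show why the endianess setting matters for multi-byte tokens: the same two bytes give a different value for the same bit range.

```python
def token_value(token_bytes: bytes, endian: str) -> int:
    """Interpret the token's bytes as one integer ("big" or "little" endian)."""
    return int.from_bytes(token_bytes, endian)


def field_value(token: int, low: int, high: int) -> int:
    """Default (unsigned) value of the field covering bits low..high, inclusive."""
    width = high - low + 1
    return (token >> low) & ((1 << width) - 1)


if __name__ == "__main__":
    raw = bytes([0x12, 0x34])
    big = token_value(raw, "big")          # 0x1234
    little = token_value(raw, "little")    # 0x3412
    print(field_value(big, 12, 15), field_value(little, 12, 15))  # 1 3
```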
The default interpretation of a field is probably the most natural but
of course processors interpret fields within an instruction in a wide
@@ -124,7 +124,7 @@ interpretations must be built up out of tables.
Probably the most common processor interpretation
of a field is as an encoding of a particular register. In SLEIGH this
@@ -140,7 +140,7 @@ space separated list of field identifiers surrounded by square
brackets. A registerlist must be a square bracket
surrounded and space separated list of register identifiers as created
with define statements (see
-Section 4.4, “Naming Registers”). For each field in
+Section 4.4, “Naming Registers”). For each field in
the fieldlist, instead of having the display and
semantic meaning of an integer, the field becomes a look-up table for
the given list of registers. The original integer interpretation is
@@ -152,7 +152,7 @@ display and semantic meaning of the field are now taken from the new
register.
-A particular integer can remain unspecified by putting a ‘_’ character
+A particular integer can remain unspecified by putting a ‘_’ character
in the appropriate position of the register list or also if the length
of the register list is less than the integer. A specific integer
encoding of the field that is unspecified like this
@@ -163,7 +163,7 @@ of the instruction.
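A rough Python model of the look-up that attach variables sets up is shown below. The field's integer indexes the register list. A ‘_’ entry, or a position past the end of the list, yields no register. The function name attach_register is hypothetical. Returning None is only this sketch's way of marking an unspecified encoding and says nothing about how SLEIGH itself reports it.

```python
from typing import Optional, Sequence


def attach_register(field_int: int, registers: Sequence[str]) -> Optional[str]:
    """Look up the register selected by a field's integer encoding.

    Returns None when the encoding is left unspecified, either by a "_"
    placeholder or because the list is too short to reach that position.
    """
    if field_int >= len(registers):
        return None
    name = registers[field_int]
    return None if name == "_" else name


if __name__ == "__main__":
    regs = ["r0", "r1", "_", "r3"]
    print([attach_register(i, regs) for i in range(5)])
    # ['r0', 'r1', None, 'r3', None]
```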
Sometimes a processor interprets a field as an integer but not the
integer given by the default interpretation. A different integer
@@ -180,12 +180,12 @@ register interpretation is assigned to fields with
an attach variables statement, the
integers in the list are assigned to each field specified in
the fieldlist. [Currently SLEIGH does not support
-unspecified positions in the list using a ‘_’]
+unspecified positions in the list using a ‘_’]
It is possible to just modify the display characteristics of a field
without changing the semantic meaning. The need for this is rare, but
@@ -196,7 +196,7 @@ appropriate to define overlapping fields, one of which is defined to
have no semantic meaning. The most convenient way to break down the
required disassembly may not be the most convenient way to break down
the semantics. It is also possible to have symbols with semantic
-meaning but no display meaning (see Section 7.4.5, “Invisible Operands”).
+meaning but no display meaning (see Section 7.4.5, “Invisible Operands”).
At any rate we can list the display interpretation of a field directly
@@ -218,7 +218,7 @@ encodings.
SLEIGH supports the concept of context
variables. For the most part processor instructions can be
@@ -254,12 +254,12 @@ By default, globally setting a context variable affects instruction decoding
from the point of the change, forward,
following the flow of the instructions, but if the variable is labeled as
noflow, any change is limited to a
-single instruction. (See Section 8.3.1, “Context Flow”)
+single instruction. (See Section 8.3.1, “Context Flow”)
Once the context variable is defined, in terms of the specification
syntax, it can be treated as if it were just another field. See
-Section 8, “Using Context”, for a complete discussion of how to
+Section 8, “Using Context”, for a complete discussion of how to
use context variables.
-@if PROCESSOR == “mips”
-@ define ENDIAN “big”
-@elif ((PROCESSOR==”x86”)&&(OS!=”win”))
-@ define ENDIAN “little”
+@if PROCESSOR == “mips”
+@ define ENDIAN “big”
+@elif ((PROCESSOR==”x86”)&&(OS!=”win”))
+@ define ENDIAN “little”
@else
-@ define ENDIAN “unknown”
+@ define ENDIAN “unknown”
@endif
diff --git a/GhidraDocs/languages/html/sleigh_ref.html b/GhidraDocs/languages/html/sleigh_ref.html
index d5e608dccd..b21fc5f37d 100644
--- a/GhidraDocs/languages/html/sleigh_ref.html
+++ b/GhidraDocs/languages/html/sleigh_ref.html
@@ -1,34 +1,34 @@
-
-
-Prev
-
- Next
+Prev 
+ 
+ Next
-
2. Basic Specification Layout
+2. Basic Specification Layout 
Home
- 4. Basic Definitions
+ 4. Basic Definitions
diff --git a/GhidraDocs/languages/html/sleigh_symbols.html b/GhidraDocs/languages/html/sleigh_symbols.html
index df994eb7bd..2022555fb4 100644
--- a/GhidraDocs/languages/html/sleigh_symbols.html
+++ b/GhidraDocs/languages/html/sleigh_symbols.html
@@ -1,24 +1,24 @@
-
-
-Prev
-
-
+Prev 
+ 
+ 
-
8. Using Context
+8. Using Context 
Home
-
+ 
diff --git a/GhidraDocs/languages/html/sleigh_tokens.html b/GhidraDocs/languages/html/sleigh_tokens.html
index 2239604d1b..90f10cfc67 100644
--- a/GhidraDocs/languages/html/sleigh_tokens.html
+++ b/GhidraDocs/languages/html/sleigh_tokens.html
@@ -1,24 +1,24 @@
-
-
-Prev
-
- Next
+Prev 
+ 
+ Next
-
4. Basic Definitions
+4. Basic Definitions 
Home
- 6. Tokens and Fields
+ 6. Tokens and Fields
-Prev
-
- Next
+Prev 
+ 
+ Next
-
5. Introduction to Symbols
+5. Introduction to Symbols 
Home
- 7. Constructors
+ 7. Constructors