mirror of
https://github.com/NationalSecurityAgency/ghidra.git
synced 2025-10-03 09:49:23 +02:00
GP-5692 updated bsim tutorial
parent
e2332dec70
commit
4226376a95
7 changed files with 14 additions and 13 deletions
|
@ -22,7 +22,7 @@ To generate the signature files, execute the following commands in a shell (adju
|
||||||
```bash
|
```bash
|
||||||
cd <ghidra_install_dir>/support
|
cd <ghidra_install_dir>/support
|
||||||
mkdir ~/bsim_sigs
|
mkdir ~/bsim_sigs
|
||||||
./bsim generatesigs ghidra:/<ghidra_project_dir>/postgres_object_files --bsim file:/<database_dir>/example ~/bsim_sigs
|
./bsim generatesigs ghidra:/<ghidra_project_dir>/postgres_object_files ~/bsim_sigs --bsim file:/<database_dir>/example
|
||||||
```
|
```
|
||||||
|
|
||||||
- The ``ghidra:/`` argument is the local project which holds the analyzed binaries.
|
- The ``ghidra:/`` argument is the local project which holds the analyzed binaries.
|
||||||
|
|
|
@ -110,13 +110,13 @@ We use these different versions to demonstrate some of the capabilities of BSim.
|
||||||
The executable-level results are covered in [From Matching Functions to Matching Executables](BSimTutorial_Exe_Results.md).
|
The executable-level results are covered in [From Matching Functions to Matching Executables](BSimTutorial_Exe_Results.md).
|
||||||
1. Right-click on the row of the match and perform the **Compare Functions** action to bring up the side-by-side comparison.
|
1. Right-click on the row of the match and perform the **Compare Functions** action to bring up the side-by-side comparison.
|
||||||
- The **Listing View** tab shows the disassembly.
|
- The **Listing View** tab shows the disassembly.
|
||||||
- The **Decompiler Diff View** tab shows the decompiled code.
|
- The **Decompiler View** tab shows the decompiled code.
|
||||||
- Differences in the code are automatically highlighted in cyan.
|
- Differences in the code are automatically highlighted in cyan.
|
||||||
- Either view can be toggled between a horizontal split and a vertical split using the drop-down menu.
|
- Either view can be toggled between a horizontal split and a vertical split using the drop-down menu.
|
||||||
1. Examine the diff views to verify that the match is valid.
|
1. Examine the diff views to verify that the match is valid.
|
||||||
1. Using the **Apply Name** action in the BSim Search Results table, apply the name from the search result to the queried function.
|
1. Using the **Apply Name** action in the BSim Search Results table, apply the name from the search result to the queried function.
|
||||||
|
|
||||||
**Note**: We cover the Decompiler Diff View in greater detail and discuss the various "Apply" actions in [Evaluating Matches and Applying Information](BSimTutorial_Evaluating_Matches.md).
|
**Note**: We cover the Decompiler View in greater detail and discuss the various "Apply" actions in [Evaluating Matches and Applying Information](BSimTutorial_Evaluating_Matches.md).
|
||||||
|
|
||||||
### Exercise: Changes to the Source Code
|
### Exercise: Changes to the Source Code
|
||||||
|
|
||||||
|
@ -136,7 +136,7 @@ We use these different versions to demonstrate some of the capabilities of BSim.
|
||||||
``<ghidra_install_dir>/GPL/DemanglerGnu/os/mac_arm_64/demangler_gnu_v2_41``.
|
``<ghidra_install_dir>/GPL/DemanglerGnu/os/mac_arm_64/demangler_gnu_v2_41``.
|
||||||
- This executable is based on the same source code as the executable in `example` but compiled for a different architecture.
|
- This executable is based on the same source code as the executable in `example` but compiled for a different architecture.
|
||||||
- **Note**: this file has the same name as the one we used to populate the BSim database, so you will have to give the resulting Ghidra program a different name or import it into a different directory in your Ghidra project.
|
- **Note**: this file has the same name as the one we used to populate the BSim database, so you will have to give the resulting Ghidra program a different name or import it into a different directory in your Ghidra project.
|
||||||
1. Navigate to ``_expandargv`` and issue a BSim query.
|
1. Navigate to ``_expandargv`` and issue a BSim query with a similarity bound of 0.5.
|
||||||
In the decompiler diff view of the single match, what differences do you see regarding ``memmove`` and ``memcpy``?
|
In the decompiler diff view of the single match, what differences do you see regarding ``memmove`` and ``memcpy``?
|
||||||
<details><summary>In the arm64 version...</summary> In the arm64 version, the compiler replaced these functions with __memmove_chk and __memcpy_chk. The __chk versions have an extra parameter related to preventing buffer overflows. Neither the names nor the bodies of callees are incorporated into BSim signatures, but the arguments of a call are, so this change partly explains why the BSim vectors are not identical.</details>
|
<details><summary>In the arm64 version...</summary> In the arm64 version, the compiler replaced these functions with __memmove_chk and __memcpy_chk. The __chk versions have an extra parameter related to preventing buffer overflows. Neither the names nor the bodies of callees are incorporated into BSim signatures, but the arguments of a call are, so this change partly explains why the BSim vectors are not identical.</details>
|
||||||
1. Examine the **Listing View** tab and verify that the architectures are indeed different.
|
1. Examine the **Listing View** tab and verify that the architectures are indeed different.
|
||||||
|
|
|
@ -20,6 +20,7 @@ Next, perform the following steps from the Ghidra Code Browser:
|
||||||
We now populate the database with an executable which is contained in the Ghidra distribution.
|
We now populate the database with an executable which is contained in the Ghidra distribution.
|
||||||
|
|
||||||
1. Import and analyze the executable ``<ghidra_install_dir>/GPL/DemanglerGnu/os/linux_x86_64/demangler_gnu_v2_41`` using the default analysis options.
|
1. Import and analyze the executable ``<ghidra_install_dir>/GPL/DemanglerGnu/os/linux_x86_64/demangler_gnu_v2_41`` using the default analysis options.
|
||||||
|
1. Save the analyzed program.
|
||||||
1. Run the Ghidra script ``AddProgramToH2BSimDatabaseScript.java`` on this program.
|
1. Run the Ghidra script ``AddProgramToH2BSimDatabaseScript.java`` on this program.
|
||||||
- The script will ask you to select an H2 database file. Use ``example.mv.db`` in the database directory.
|
- The script will ask you to select an H2 database file. Use ``example.mv.db`` in the database directory.
|
||||||
1. In general you can run this script on other programs to add their signatures to this database, but that's not necessary for the exercises in the next section.
|
1. In general you can run this script on other programs to add their signatures to this database, but that's not necessary for the exercises in the next section.
|
||||||
|
|
|
@ -88,7 +88,7 @@ The token matching algorithm matches a function call in one program to a functio
|
||||||
However, given a matched pair of calls, you can bring up a new comparison window for the callees with the **Compare Matching Callees** action.
|
However, given a matched pair of calls, you can bring up a new comparison window for the callees with the **Compare Matching Callees** action.
|
||||||
|
|
||||||
1. Click in the left panel of the decompile diff window and press ``Ctrl-F``.
|
1. Click in the left panel of the decompile diff window and press ``Ctrl-F``.
|
||||||
1. Enter ``FUN_`` and search for matched function calls where the callee in the left window has a default name and the callee in the right window has a non-default name.
|
1. Enter ``FUN_`` and search for matched function calls where the callee in the left window has a default name and the callee in the right window has a non-default name (and is not an external function).
|
||||||
1. Right-click on one of the matched tokens and perform the **Compare Matching Callees** action.
|
1. Right-click on one of the matched tokens and perform the **Compare Matching Callees** action.
|
||||||
1. In the comparison of the callees, apply the function signature and data types from the right function to the left function.
|
1. In the comparison of the callees, apply the function signature and data types from the right function to the left function.
|
||||||
Verify that the update is reflected in the decompiler diff view of the callers.
|
Verify that the update is reflected in the decompiler diff view of the callers.
|
||||||
|
|
|
@ -3,7 +3,7 @@
|
||||||
In this section, we discuss the Executable Results table.
|
In this section, we discuss the Executable Results table.
|
||||||
Each row of this table corresponds to one executable in the database.
|
Each row of this table corresponds to one executable in the database.
|
||||||
The information in one row is an aggregation of all of the function-level matches into that row's executable.
|
The information in one row is an aggregation of all of the function-level matches into that row's executable.
|
||||||
Your Executable Results table from the previous query should look similar to the following:
|
Here is an example Executable Results table from the previous query (yours may look different depending on the compiler used to create the example files):
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
|
@ -18,17 +18,17 @@ If you select a single row in the table and right-click on it, you will see the
|
||||||
|
|
||||||
1. Sort the Executable results by descending **Function Count**.
|
1. Sort the Executable results by descending **Function Count**.
|
||||||
An entry in this column shows the number of queried functions which have at least one match in the row's executable (if ``foo`` has 2 or more matches into a given executable, it still only contributes 1 to the function count).
|
An entry in this column shows the number of queried functions which have at least one match in the row's executable (if ``foo`` has 2 or more matches into a given executable, it still only contributes 1 to the function count).
|
||||||
What position is ``demangler_gnu_v2_41``?
|
What position is ``demangler_gnu_v2_41`` in your table?
|
||||||
<details><summary>In this table...</summary> It's in the first position.</details>
|
|
||||||
1. An entry in the **Confidence** column shows the sum of the confidence scores of all matches into the corresponding executable.
|
1. An entry in the **Confidence** column shows the sum of the confidence scores of all matches into the corresponding executable.
|
||||||
If ``foo`` has more than one match into a given executable, only the one with the highest (function-level) confidence contributes to the (executable-level) confidence score.
|
If ``foo`` has more than one match into a given executable, only the one with the highest (function-level) confidence contributes to the (executable-level) confidence score.
|
||||||
Sort the Executable results by descending confidence and observe that ``demangler_gnu_v2_41`` is now much further down the list.
|
Sort the Executable results by descending confidence and observe that ``demangler_gnu_v2_41`` is now further down the list.
|
||||||
<details><summary>What could explain this?</summary> If there are many function matches but the sum of all the confidences is relatively low, it is likely that many of the matches involve small functions with common BSim signatures.</details>
|
<details><summary>What could explain this?</summary> If there are many function matches but the sum of all the confidences is relatively low, it is likely that many of the matches involve small functions with common BSim signatures.</details>
|
||||||
1. In the Executable match table, right-click on ``demangler_gnu_v2_41`` and apply the filter action.
|
1. In the Executable match table, right-click on ``demangler_gnu_v2_41`` and apply the filter action.
|
||||||
Sort the filtered function matches by descending confidence.
|
Sort the filtered function matches by descending confidence.
|
||||||
Starting at the top, examine some of the matches and convince yourself that the given explanation is correct.
|
Starting at the top, examine some of the matches and convince yourself that the given explanation is correct.
|
||||||
- **Note**: You can remove the filter using the **Filter Results** icon  in the toolbar.
|
- **Note**: You can remove the filter using the **Filter Results** icon  in the toolbar.
|
||||||
We'll discuss this further in [BSim Filters](BSimTutorial_Filters.md).
|
We'll discuss this further in [BSim Filters](BSimTutorial_Filters.md).
|
||||||
|
1. Determine the highest confidence match in ``demangler_gnu_v2_41`` and query all functions in ``postgres`` again, this time with a Confidence bound slightly higher than this confidence value. Verify that ``demangler_gnu_v2_41`` is not in the list of executable matches.
|
||||||
|
|
||||||
From this exercise, we see that unrelated functions can be duplicates of each other, either because they are small or because they perform a common generic action.
|
From this exercise, we see that unrelated functions can be duplicates of each other, either because they are small or because they perform a common generic action.
|
||||||
Keep in mind that such functions can "pollute" the results of a blanket query.
|
Keep in mind that such functions can "pollute" the results of a blanket query.
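
The aggregation rules described above for **Function Count** and **Confidence** can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the match rows, the executable names `exeA` and `exeB`, and the helper `aggregate_matches` are hypothetical and are not BSim output or part of the BSim tooling.

```bash
# Hypothetical function-level matches, one per line:
#   queried_function  matched_executable  confidence
# (made-up values for illustration, not real BSim results)
matches='foo exeA 10.0
foo exeA 4.0
bar exeA 3.5
foo exeB 20.0'

# For each executable, print:
#   - the number of distinct queried functions with at least one match
#   - the sum of the best confidence per queried function
aggregate_matches() {
  awk '
    {
      key = $2 SUBSEP $1          # (executable, queried function)
      conf = $3 + 0               # force numeric comparison
      if (!(key in best)) {
        count[$2]++               # first match of this function in this executable
        best[key] = conf
      } else if (conf > best[key]) {
        best[key] = conf          # keep only the highest confidence
      }
    }
    END {
      for (key in best) {
        split(key, parts, SUBSEP)
        total[parts[1]] += best[key]
      }
      for (exe in count) printf "%s %d %.1f\n", exe, count[exe], total[exe]
    }
  ' | sort
}

printf '%s\n' "$matches" | aggregate_matches
# prints:
# exeA 2 13.5
# exeB 1 20.0
```

In this toy data, `foo` matches `exeA` twice, but it adds only 1 to the function count and only its best confidence (10.0) to the sum.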
|
||||||
|
|
|
@ -16,7 +16,7 @@ To build the files, execute the following commands in a shell: [^1]
|
||||||
[^1]: You may need to install additional packages and/or change some build options in order for PostgreSQL to build successfully. The error messages are generally informative. See the comments in ``make-postgres.sh``.
|
[^1]: You may need to install additional packages and/or change some build options in order for PostgreSQL to build successfully. The error messages are generally informative. See the comments in ``make-postgres.sh``.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cd <ghidra_install_dir>/Features/BSim
|
cd <ghidra_install_dir>/Ghidra/Features/BSim
|
||||||
export CFLAGS="-O2 -g"
|
export CFLAGS="-O2 -g"
|
||||||
./support/make-postgres.sh
|
./support/make-postgres.sh
|
||||||
mkdir ~/postgres_object_files
|
mkdir ~/postgres_object_files
|
||||||
|
|
|
@ -14,7 +14,7 @@ You should see the following result:
|
||||||
1. Sort the table by the "Hit Count" column in ascending order. Typically, the functions with the largest hit counts will have low self-significance.
|
1. Sort the table by the "Hit Count" column in ascending order. Typically, the functions with the largest hit counts will have low self-significance.
|
||||||
Verify that that is the case for this table.
|
Verify that that is the case for this table.
|
||||||
1. Q: Examine the functions with the highest hit count. Why are there so many matches for these functions?
|
1. Q: Examine the functions with the highest hit count. Why are there so many matches for these functions?
|
||||||
<details><summary>Answer:</summary> These are all instances of PostgreSQL statistics-reporting functions. Their bodies are quite similar and they have identical BSim signatures.</details>
|
<details><summary>Answer:</summary> These are all instances of PostgreSQL error-reporting functions. Their bodies are quite similar and they have identical BSim signatures.</details>
|
||||||
|
|
||||||
## Exercise: Selections and Queries
|
## Exercise: Selections and Queries
|
||||||
|
|
||||||
|
@ -22,7 +22,7 @@ Using the hit count column, it is possible to exclude functions with large numbe
|
||||||
|
|
||||||
1. In the Overview Table, select all functions whose hit count is 2 or less.
|
1. In the Overview Table, select all functions whose hit count is 2 or less.
|
||||||
1. Right-click on the selection and perform the **Search Selected Functions** action.
|
1. Right-click on the selection and perform the **Search Selected Functions** action.
|
||||||
Sort the query results by descending **Function Count** and verify that ``demangler_gnu_v2_41`` is far down the list.
|
Sort the query results by descending **Confidence** and verify that ``demangler_gnu_v2_41`` is far down the list.
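
The selection step above amounts to a simple threshold on the hit count. As a minimal sketch, assuming the Overview rows were available as plain text (the function names, the counts, and the helper `select_low_hit_count` are hypothetical):

```bash
# Hypothetical Overview rows: function_name hit_count
# (illustrative values only)
overview='parse_expr 1
report_error 57
init_table 2
report_warning 57'

# Print the names of functions whose hit count is at most the given bound.
select_low_hit_count() {
  awk -v bound="$1" '($2 + 0) <= (bound + 0) { print $1 }'
}

printf '%s\n' "$overview" | select_low_hit_count 2
# prints:
# parse_expr
# init_table
```

Functions with large hit counts are dropped before the query, so they cannot dominate the executable-level results.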
|
||||||
|
|
||||||
## Exercise: Vector Hashes
|
## Exercise: Vector Hashes
|
||||||
|
|
||||||
|
@ -36,7 +36,7 @@ An optional column, **Vector Hash**, can be used to distinguish between these tw
|
||||||
1. Enable the **Vector Hash** Column in the Overview Table.
|
1. Enable the **Vector Hash** Column in the Overview Table.
|
||||||
1. Find two functions with the same vector hash.
|
1. Find two functions with the same vector hash.
|
||||||
1. Select the two corresponding rows in the table and then transfer the selection to the Listing using the  icon in the BSim Overview toolbar.
|
1. Select the two corresponding rows in the table and then transfer the selection to the Listing using the  icon in the BSim Overview toolbar.
|
||||||
1. In the Listing, press ``Shift-C`` or right-click and perform the **Compare Selected Functions** action.
|
1. In the Listing, right-click and perform **Function -> Compare Function(s)**.
|
||||||
1. In the resulting Function Comparison window, convince yourself that these two functions should have the same BSim signature.
|
1. In the resulting Function Comparison window, convince yourself that these two functions should have the same BSim signature.
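
Finding functions that share a vector hash is a grouping problem. The sketch below assumes the **Vector Hash** column has been copied out as text. The function names, the hash values, and the helper `shared_vector_hashes` are hypothetical.

```bash
# Hypothetical rows: function_name vector_hash
# (illustrative values only)
rows='func_a 0x1a2b
func_b 0x9f00
func_c 0x1a2b
func_d 0x77aa'

# Print each vector hash shared by two or more functions,
# followed by the comma-separated names of those functions.
shared_vector_hashes() {
  awk '
    {
      if ($2 in names) names[$2] = names[$2] "," $1
      else names[$2] = $1
      n[$2]++
    }
    END {
      for (h in n) if (n[h] > 1) printf "%s %s\n", h, names[h]
    }
  ' | sort
}

printf '%s\n' "$rows" | shared_vector_hashes
# prints:
# 0x1a2b func_a,func_c
```

Any pair listed on one output line is a candidate for the side-by-side comparison described in the steps above.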
|
||||||
|
|
||||||
Next Section: [Queries and Filters](BSimTutorial_Filters.md)
|
Next Section: [Queries and Filters](BSimTutorial_Filters.md)
|
||||||
|
|