GP-5692 updated bsim tutorial

This commit is contained in:
James 2025-05-16 18:09:06 +00:00
parent e2332dec70
commit 4226376a95
7 changed files with 14 additions and 13 deletions

View file

@ -22,7 +22,7 @@ To generate the signature files, execute the following commands in a shell (adju
```bash
cd <ghidra_install_dir>/support
mkdir ~/bsim_sigs
./bsim generatesigs ghidra:/<ghidra_project_dir>/postgres_object_files --bsim file:/<database_dir>/example ~/bsim_sigs
./bsim generatesigs ghidra:/<ghidra_project_dir>/postgres_object_files ~/bsim_sigs --bsim file:/<database_dir>/example
```
- The ``ghidra:/`` argument is the local project which holds the analyzed binaries.

View file

@ -110,13 +110,13 @@ We use these different versions to demonstrate some of the capabilities of BSim.
The executable-level results are covered in [From Matching Functions to Matching Executables](BSimTutorial_Exe_Results.md).
1. Right-click on the row of the match and perform the **Compare Functions** action to bring up the side-by-side comparison.
- The **Listing View** tab shows the disassembly.
- The **Decompiler Diff View** tab shows the decompiled code.
- The **Decompiler View** tab shows the decompiled code.
- Differences in the code are automatically highlighted in cyan.
- Either view can be toggled between a horizontal split and a vertical split using the drop-down menu.
1. Examine the diff views to verify that the match is valid.
1. Using the **Apply Name** action in the BSim Search Results table, apply the name from the search result to the queried function.
**Note**: We cover the Decompiler Diff View in greater detail and discuss the various "Apply" actions in [Evaluating Matches and Applying Information](BSimTutorial_Evaluating_Matches.md).
**Note**: We cover the Decompiler View in greater detail and discuss the various "Apply" actions in [Evaluating Matches and Applying Information](BSimTutorial_Evaluating_Matches.md).
### Exercise: Changes to the Source Code
@ -136,7 +136,7 @@ We use these different versions to demonstrate some of the capabilities of BSim.
``<ghidra_install_dir>/GPL/DemanglerGnu/os/mac_arm_64/demangler_gnu_v2_41``.
- This executable is based on the same source code as the executable in `example` but compiled for a different architecture.
- **Note**: this file has the same name as the one we used to populate the BSim database, so you will have to give the resulting Ghidra program a different name or import it into a different directory in your Ghidra project.
1. Navigate to ``_expandargv`` and issue a BSim query.
1. Navigate to ``_expandargv`` and issue a BSim query with a similarity bound of 0.5.
In the decompiler diff view of the single match, what differences do you see regarding ``memmove`` and ``memcpy``?
<details><summary>In the arm64 version...</summary> In the arm64_version, the compiler replaced these functions with __memmove_chk and __memcpy_chk. The __chk versions have an extra parameter related to preventing buffer overflows. Neither the names nor the bodies of callees are incorporated into BSim signatures, but the arguments of a call are, so this change partly explains why the BSim vectors are not identical.</details>
1. Examine the **Listing View** tab and verify that the architectures are indeed different.

View file

@ -20,6 +20,7 @@ Next, perform the following steps from the Ghidra Code Browser:
We now populate the database with an executable which is contained in the Ghidra distribution.
1. Import and analyze the executable ``<ghidra_install_dir>/GPL/DemanglerGnu/os/linux_x86_64/demangler_gnu_v2_41`` using the default analysis options.
1. Save the analyzed program.
1. Run the Ghidra script ``AddProgramToH2BSimDatabaseScript.java`` on this program.
- The script will ask you to select an H2 database file. Use ``example.mv.db`` in the database directory.
1. In general you can run this script on other programs to add their signatures to this database, but that's not necessary for the exercises in the next section.

View file

@ -88,7 +88,7 @@ The token matching algorithm matches a function call in one program to a functio
However, given a matched pair of calls, you can bring up a new comparison window for the callees with the **Compare Matching Callees** action.
1. Click in the left panel of the decompile diff window and press ``Ctrl-F``.
1. Enter ``FUN_`` and search for matched function calls where the callee in the left window has a default name and the callee in the right window has a non-default name.
1. Enter ``FUN_`` and search for matched function calls where the callee in the left window has a default name and the callee in the right window has a non-default name (and is not an external function).
1. Right-click on one of the matched tokens and perform the **Compare Matching Callees** action.
1. In the comparison of the callees, apply the function signature and data types from the right function to the left function.
Verify that the update is reflected in the decompiler diff view of the callers.

View file

@ -3,7 +3,7 @@
In this section, we discuss the Executable Results table.
Each row of this table corresponds to one executable in the database.
The information in one row is an aggregation of all of the function-level matches into that row's executable.
Your Executable Results table from the previous query should look similar to the following:
Here is an example Executable Results table from the previous query (yours may look different depending on the compiler used to create the example files):
![executable results](images/exe_results.png)
@ -18,17 +18,17 @@ If you select a single row in the table and right-click on it, you will see the
1. Sort the Executable results by descending **Function Count**.
An entry in this column shows the number of queried functions which have at least one match in the row's executable (if ``foo`` has 2 or more matches into a given executable, it still only contributes 1 to the function count).
What position is ``demangler_gnu_v2_41``?
<details><summary>In this table...</summary> It's in the first position.</details>
What position is ``demangler_gnu_v2_41`` in your table?
1. An entry in the **Confidence** column shows the sum of the confidence scores of all matches into the corresponding executable.
If ``foo`` has more than one match into a given executable, only the one with the highest (function-level) confidence contributes to the (executable-level) confidence score.
Sort the Executable results by descending confidence and observe that ``demangler_gnu_v2_41`` is now much further down the list.
Sort the Executable results by descending confidence and observe that ``demangler_gnu_v2_41`` is now further down the list.
<details><summary>What could explain this?</summary> If there are many function matches but the sum of all the confidences is relatively low, it is likely that many of the matches involve small functions with common BSim signatures.</details>
1. In the Executable match table, right click on ``demangler_gnu_v2_41`` and apply the filter action.
Sort the filtered function matches by descending confidence.
Starting at the top, examine some of the matches and convince yourself that the given explanation is correct.
- **Note**: You can remove the filter using the **Filter Results** icon ![Filter Results](images/exec.png) in the toolbar.
We'll discuss this further in [BSim Filters](BSimTutorial_Filters.md)
1. Determine the highest confidence match in ``demangler_gnu_v2_41`` and query all functions in ``postgres`` again, this time with a Confidence bound slightly higher than this confidence value. Verify that ``demangler_gnu_v2_41`` is not in the list of executable matches.
From this exercise, we see that unrelated functions can be duplicates of each other, either because they are small or because they perform a common generic action.
Keep in mind that such functions can "pollute" the results of a blanket query.

View file

@ -16,7 +16,7 @@ To build the files, execute the following commands in a shell: [^1]
[^1]: You may need to install additional packages and/or change some build options in order for PostgreSQL to build successfully. The error messages are generally informative. See the comments in ``make-postgres.sh``.
```bash
cd <ghidra_install_dir>/Features/BSim
cd <ghidra_install_dir>/Ghidra/Features/BSim
export CFLAGS="-O2 -g"
./support/make-postgres.sh
mkdir ~/postgres_object_files

View file

@ -14,7 +14,7 @@ You should see the following result:
1. Sort the table by the "Hit Count" column in ascending order. Typically, the functions with the largest hit counts will have low self-significance.
Verify that that is the case for this table.
1. Q: Examine the functions with the highest hit count. Why are there so many matches for these functions?
<details><summary>Answer:</summary> These are all instances of PostgreSQL statistics-reporting functions. Their bodies are quite similar and they have identical BSim signatures.</details>
<details><summary>Answer:</summary> These are all instances of PostgreSQL error-reporting functions. Their bodies are quite similar and they have identical BSim signatures.</details>
## Exercise: Selections and Queries
@ -22,7 +22,7 @@ Using the hit count column, it is possible to exclude functions with large numbe
1. In the Overview Table, select all functions whose hit count is 2 or less.
1. Right-click on the selection and perform the **Search Selected Functions** action.
Sort the query results by descending **Function Count** and verify that ``demangler_gnu_v2_41`` is far down the list.
Sort the query results by descending **Confidence** and verify that ``demangler_gnu_v2_41`` is far down the list.
## Exercise: Vector Hashes
@ -36,7 +36,7 @@ An optional column, **Vector Hash**, can be used to distinguish between these tw
1. Enable the **Vector Hash** Column in the Overview Table.
1. Find two functions with the same vector hash.
1. Select the two corresponding rows in the table and then transfer the selection to the Listing using the ![make selection icon](images/text_align_justify.png) icon in the BSim Overview toolbar.
1. In the Listing, press ``Shift-C`` or right-click and perform the **Compare Selected Functions** action.
1. In the Listing, right-click and perform **Function -> Compare Function(s)**
1. In the resulting Function Comparison window, convince yourself that these two functions should have the same BSim signature.
Next Section: [Queries and Filters](BSimTutorial_Filters.md)