diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_BSim_Command_Line.html b/GhidraDocs/GhidraClass/BSim/BSimTutorial_BSim_Command_Line.html
deleted file mode 100755
index 5daff89a45..0000000000
--- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_BSim_Command_Line.html
+++ /dev/null
@@ -1,83 +0,0 @@
-

BSim Databases from the Command Line

- -

The bsim command-line utility, located in the support directory of a Ghidra distribution, is used to create, populate, and manage BSim databases. -It works for all BSim database backends. -This utility offers a number of commands, many of which have several options. -In this section, we cover only a small subset of the possibilities.

- -

Running bsim with no arguments will print a detailed usage message.

- -

Generating Signature Files

- -

The first step is to create signature files from the binaries in the Ghidra project. -Signature files are XML files which contain the BSim signatures and metadata needed by the BSim server.

- -

Important: It’s simplest to exit Ghidra before performing the next steps, because:

- - -

To generate the signature files, execute the following commands in a shell (adjust as necessary for Windows).

- -
cd <ghidra_install_dir>/support
-mkdir ~/bsim_sigs
-./bsim generatesigs ghidra:/<ghidra_project_dir>/postgres_object_files --bsim file:/<database_dir>/example ~/bsim_sigs
-
- - - -

Committing Signature Files

- -

Now, we commit the signatures to the BSim database with the following command (still in the support directory).

- -
./bsim commitsigs file:/<database_dir>/example ~/bsim_sigs 
-
- -

Once the signatures have been committed, start Ghidra again.
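
The generate and commit steps above can be scripted. The following is a minimal sketch, assuming Python is available; the helper name `bsim_commands` and the example paths are hypothetical, and only the `generatesigs` and `commitsigs` invocations shown above are used. It builds the argument lists without executing anything.

```python
from pathlib import Path


def bsim_commands(support_dir, project_url, db_url, sig_dir):
    """Build argv lists for the generatesigs and commitsigs steps shown above.

    All arguments are caller-supplied strings; nothing is executed here.
    """
    bsim = str(Path(support_dir) / "bsim")
    generate = [bsim, "generatesigs", project_url, "--bsim", db_url, sig_dir]
    commit = [bsim, "commitsigs", db_url, sig_dir]
    return [generate, commit]


if __name__ == "__main__":
    # Hypothetical locations; substitute your own install, project, and database paths.
    for argv in bsim_commands(
        "/opt/ghidra/support",
        "ghidra:/home/user/projects/postgres_object_files",
        "file:/home/user/bsim_db/example",
        "/home/user/bsim_sigs",
    ):
        print(" ".join(argv))
```

To actually run the steps, each list could be passed to `subprocess.run(argv, check=True)` so that a failed `generatesigs` stops the script before `commitsigs` is attempted.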

- -

Aside: Creating a Database

- -

We continue to use the database example, so this step isn’t necessary for the exercises.

- -

However, if we hadn’t created example using CreateH2BSimDatabaseScript.java, we could have used the following command:

- -
./bsim createdatabase file:/<database_dir>/example medium_nosize
-
- - -

Aside: Executable Categories and Function Tags

- -

It’s worth a brief note about Executable Categories and Function Tags, although they are not used in any of the following exercises.

- -

A BSim database can record user-defined metadata about an executable (executable categories) or about a function (function tags). -Categories and tags can then be used as filter elements in a BSim query. -For example, you could restrict a BSim query to search only in executables of the category “OPEN_SOURCE” or to functions which have been tagged “COMPRESSION_FUNCTIONS”.

- -

Executable categories in BSim are implemented using program properties, and function tags in BSim correspond to function tags in Ghidra. Properties and tags both have uses in Ghidra which are independent of BSim. -So, if we want a BSim database to record a particular category or tag, we must indicate that explicitly.

- -

For example, to tell the database to record the ORIGIN category, execute the command:

- -
./bsim addexecategory file:/<database_dir>/example ORIGIN
-
- -

Executable categories can be added to a program using the script SetExecutableCategoryScript.java.

- -

Next Section: Evaluating Matches and Applying Information

diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Basic_Queries.html b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Basic_Queries.html
deleted file mode 100755
index 331432a81c..0000000000
--- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Basic_Queries.html
+++ /dev/null
@@ -1,200 +0,0 @@
-

Basic BSim Queries

- -

In this section, we demonstrate some applications of our BSim database.

- -

Registering a BSim Database

- -

In order to query the database, you must register it with Ghidra:

- -
    -
  1. From the Code Browser, select BSim -> Manage Servers.
  2. -
  3. In the BSim Server Manager dialog, click the green plus add server icon.
  4. -
  5. Select the File radio button and use the chooser to select example.mv.db
  6. -
  7. Click OK
  8. -
  9. Click Dismiss to close the dialog.
  10. -
- -

How to Query a BSim Database

- -

Before presenting the exercises, we describe the general mechanics of querying a BSim database.

- -

Initiating a BSim Query

- -

There are a number of ways to initiate a BSim query, including:

- - - -

For these cases, the function(s) being queried depend on the current selection. -If there is no selection, the function containing the current address is queried. -If there is a selection, all functions whose entry points are within the selection are queried. -An easy way to query all functions in a program is to select all addresses with Ctrl-A in the Listing window and then initiate a BSim query.
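
The selection rule above can be sketched as follows. This is an illustrative model only, not Ghidra code: the function `functions_to_query`, the dictionary layout, and the assumption that each function body is one contiguous address range are all hypothetical simplifications.

```python
def functions_to_query(functions, current_address, selection=None):
    """Return the names of the functions a query would cover.

    functions: dict mapping name -> (entry, end), an inclusive address range
               (hypothetical model; real function bodies need not be contiguous).
    selection: list of inclusive (start, end) ranges, or None/empty for no selection.
    """
    if selection:
        # With a selection, only functions whose ENTRY POINT is selected are queried.
        return sorted(
            name
            for name, (entry, _end) in functions.items()
            if any(lo <= entry <= hi for lo, hi in selection)
        )
    # Without a selection, the function containing the current address is queried.
    return sorted(
        name
        for name, (entry, end) in functions.items()
        if entry <= current_address <= end
    )


if __name__ == "__main__":
    funcs = {"a": (0x1000, 0x10FF), "b": (0x1100, 0x11FF)}
    print(functions_to_query(funcs, 0x1104))
    print(functions_to_query(funcs, 0x1104, [(0x10F0, 0x1100)]))
```

Note that in the second call the selection overlaps the body of `a` but not its entry point, so under this rule only `b` is queried.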

- -

It is also possible to initiate a BSim query from the Decompiler window. -Simply right-click on a function name token and select BSim… to query the corresponding function. -This action is available on the name token in the decompiled function’s signature as well as tokens corresponding to names of callees.

- -

All of these actions bring up the BSim Search Dialog.

- -

The BSim Search Dialog

- -

From the BSim Search Dialog, you can

- - - -

bsim search dialog icon

- -

Selecting a BSim Database

- -

To query a registered BSim database, select that server from the BSim Server drop-down.

- -

Setting Query Options

- -

Similarity and confidence are scores used to evaluate the relationship between two vectors. -The respective fields in the dialog set lower bounds for these values for the matches returned by BSim.

- - - -

Confidence is used to judge the significance of a match. -For example, many executables contain a function which simply returns a constant value. -Given two executables, each with such a function, the similarity score between the corresponding BSim vectors will be 1.0. -However, the confidence score of the match will be quite low, indicating that it is not very significant that the two executables “share” this code.

- -

In general, setting the thresholds involves a tradeoff: lower values mean that the database is more likely to return legitimate matches with significant differences, but also more likely to return matches which simply happen to share some features by chance. -The results of a BSim query can be sorted by the similarity and/or confidence of each match, so a common practice is to set the thresholds relatively low and to examine the matches in descending sort order.

- -

The Matches per Function bound controls the number of results returned for a single function. -Note that in large collections, certain small or common functions might have substantial numbers of identical matches.
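
To make the effect of the three settings concrete, here is a minimal sketch of how lower bounds and a per-function cap could shape a result set. The `Match` record, the function `filter_matches`, the inclusive comparison, and the choice to keep the highest-confidence matches when the cap is exceeded are assumptions made for illustration; they are not taken from the BSim implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    queried: str       # function that was queried
    matched: str       # function found in the database
    similarity: float
    confidence: float


def filter_matches(matches, min_similarity, min_confidence, max_per_function):
    """Apply lower bounds, then keep at most max_per_function matches per queried function."""
    kept = defaultdict(list)
    for m in matches:
        # Both scores must meet their lower bound (inclusive here, by assumption).
        if m.similarity >= min_similarity and m.confidence >= min_confidence:
            kept[m.queried].append(m)
    return {
        func: sorted(ms, key=lambda m: (m.confidence, m.similarity), reverse=True)[:max_per_function]
        for func, ms in kept.items()
    }


if __name__ == "__main__":
    sample = [
        Match("foo", "returns_constant", 1.0, 3.0),   # identical but insignificant
        Match("foo", "candidate_b", 0.9, 40.0),
        Match("foo", "candidate_c", 0.8, 25.0),
        Match("foo", "candidate_d", 0.5, 50.0),
    ]
    for m in filter_matches(sample, 0.7, 10.0, 2)["foo"]:
        print(m.matched, m.similarity, m.confidence)
```

The first sample row mirrors the constant-returning function discussed above: its similarity is 1.0, yet a confidence bound removes it.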

- -

Filters are discussed in BSim Filters.

- -

Performing the Query

- -

Click the Search button in the dialog to perform a query.

- -

After successfully issuing a query, you will also see a Search Function(s) action (without the ellipsis) in certain contexts. -This will perform a BSim query on the selected functions using the same parameters as the last query (skipping the BSim Search Dialog).

- -

Exercises

- -

The database example contains vectors from a Linux executable used by Ghidra’s GNU demangler. -Ghidra ships with several other versions of this executable. -We use these different versions to demonstrate some of the capabilities of BSim.

- -

Note: Use the default query settings and autoanalysis options for the exercises unless otherwise specified.

- -

Exercise: Function Identification

- -
    -
  1. Import and analyze the binary <ghidra_install_dir>/GPL/DemanglerGnu/os/win_x86_64/demangler_gnu_v2_41.exe. - -
  2. -
  3. Examine this binary in Ghidra and verify that the original function names are not present. - -
  4. -
  5. Using the default query options, query example for matches to the function at 140006760.
  6. -
  7. You should see the following search results: - search results - -
  8. -
  9. Right-click on the row of the match and perform the Compare Functions action to bring up the side-by-side comparison. - -
  10. -
  11. Examine the diff views to verify that the match is valid.
  12. -
  13. Using the Apply Name action in the BSim Search Results table, apply the name from the search result to the queried function.
  14. -
- -

Note: We cover the Decompiler Diff View in greater detail and discuss the various “Apply” actions in Evaluating Matches and Applying Information.

- -

Exercise: Changes to the Source Code

- -
    -
  1. Import and analyze the executable <ghidra_install_dir>/GPL/DemanglerGnu/os/linux_x86_64/demangler_gnu_v2_24. - -
  2. -
  3. Navigate to the function expandargv in demangler_gnu_v2_24 and issue a BSim query.
  4. -
  5. What differences do you see in the decompiled code of the single match? -
    In demangler_gnu_v2_41... The main differences are that the call to dupargv is now in an if clause (and the decompiler creates a related local variable) and that there are two additional calls to free.
    -
  6. -
  7. The relevant source files are included with the Ghidra distribution: - -
  8. -
  9. Verify that the differences you found are present in the source.
  10. -
- -

Exercise: Cross-architectural Matching

- -
    -
  1. Import and analyze the executable -<ghidra_install_dir>/GPL/DemanglerGnu/os/mac_arm_64/demangler_gnu_v2_41. - -
  2. -
  3. Navigate to _expandargv and issue a BSim query. -In the decompiler diff view of the single match, what differences do you see regarding memmove and memcpy? -
    In the arm64 version... In the arm64 version, the compiler replaced these functions with __memmove_chk and __memcpy_chk. The __chk versions take an extra parameter related to preventing buffer overflows. Neither the names nor the bodies of callees are incorporated into BSim signatures, but the arguments of a call are, so this change partly explains why the BSim vectors are not identical.
    -
  4. -
  5. Examine the Listing View tab and verify that the architectures are indeed different.
  6. -
- -

A Remark on Query Thresholds and Indices

- -

Q: If you set the similarity and confidence thresholds to 0.0, will a BSim query return all of the functions in the database?

- -

A: No, because

- - -

Next Section: Ghidra from the Command Line

-
diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Creating_Database_From_GUI.html b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Creating_Database_From_GUI.html
deleted file mode 100755
index bbd0342634..0000000000
--- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Creating_Database_From_GUI.html
+++ /dev/null
@@ -1,38 +0,0 @@
-

Creating and Populating a BSim Database from the Ghidra GUI

- -

This section explains how to create and populate an H2-backed BSim database from the Ghidra GUI.

- -

Creating the Database

- -

To create a BSim database, first create a directory on your file system to contain the database.

- -

Next, perform the following steps from the Ghidra Code Browser:

- -
    -
  1. Run the Ghidra script CreateH2BSimDatabaseScript.java.
  2. -
  3. In the resulting dialog: -
      -
    1. Enter “example” in the Database Name field.
    2. -
    3. Select the new directory in the Database Directory field.
    4. -
    5. Don’t change any of the other fields.
    6. -
    -
  4. -
  5. Click OK.
  6. -
- -

Populating the Database

- -

We now populate the database with an executable which is contained in the Ghidra distribution.

- -
    -
  1. Import and analyze the executable <ghidra_install_dir>/GPL/DemanglerGnu/os/linux_x86_64/demangler_gnu_v2_41 using the default analysis options.
  2. -
  3. Run the Ghidra script AddProgramToH2BSimDatabaseScript.java on this program. - -
  4. -
  5. In general you can run this script on other programs to add their signatures to this database, but that’s not necessary for the exercises in the next section.
  6. -
- -

Next Section: Basic BSim Queries

-
diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Enabling.html b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Enabling.html
deleted file mode 100755
index 2504edf8aa..0000000000
--- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Enabling.html
+++ /dev/null
@@ -1,22 +0,0 @@
-

Starting Ghidra and Enabling the BSim Plugin

- -

To begin the tutorial, perform the following steps:

- -
    -
  1. Launch Ghidra.
  2. -
  3. Create a new non-shared project for this tutorial.
  4. -
  5. Launch the Code Browser.
  6. -
- -

To enable BSim, perform the following steps:

- -
    -
  1. Select File -> Configure from the Code Browser.
  2. -
  3. Click on the Configure link of the BSim entry.
  4. -
  5. In the resulting dialog, ensure that the checkbox for BSimSearchPlugin is checked.
  6. -
- -

configure dialog

- -

Next Section: Creating and Populating a BSim Database from the GUI

-
diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Evaluating_Matches.html b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Evaluating_Matches.html
deleted file mode 100755
index be7a5a6f1c..0000000000
--- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Evaluating_Matches.html
+++ /dev/null
@@ -1,135 +0,0 @@
-

Evaluating Matches and Applying Information

- -

Summarizing what we’ve created over the last few sections, we now have:

-
    -
  1. A stripped executable (postgres).
  2. -
  3. A Ghidra project containing some object files with debug information [1] used to build that executable.
  4. -
  5. A BSim database containing the BSim signatures of the object files.
  6. -
- -

We now demonstrate using BSim to help reverse engineer postgres. -While doing this, we’ll showcase some of the features available in the decompiler diff view.

- -

Exercise: Exploring the Highlights

- -

Import the stripped postgres executable into the tutorial project and analyze it, then perform the following steps:

- -
    -
  1. Select all functions in postgres via Ctrl-A in the Listing.
  2. -
  3. Perform a BSim query of the database example. - -
  4. -
  5. Sort the rows by confidence and find the row with grouping_planner as the matching function. -The corresponding function in postgres should have a default name.
  6. -
  7. Examine this match in the side-by-side decompiler view. -Note that the matching function has better data type information due to the debug information.
  8. -
  9. Q: Why does the placement of the double argument differ between the functions? -
    Answer: Floating-point values and integer/pointer values are passed in separate sets of registers. -Neither ordering is wrong since both are consistent with the instructions of the function. -The debug info records a specific signature (and ordering) for the function, which Ghidra applies. -In the version without debug information, the decompiler used heuristics to determine the function's signature.
    -
  10. -
- -

For matches with a fair number of differences, the decompiler diff panel can get pretty colorful. -Furthermore, as you click around, tokens will gain and lose highlights of various colors. -It’s worth giving a brief explanation of when highlighting happens and what the different colors mean. -Some terminology: if you click on a token in a decompiler panel, that token becomes the focused token.

- -

Decomp Diff Window

- -

The colors:

- - - -

Exercise: Locking and Unlocking Scrolling

- -

By default, scrolling in the diff window is synchronized. -This means that scrolling within one window will also scroll within the other window. -In the decompiler diff window, scrolling works by matching one line in the left function with one line in the right function. -The two functions are aligned using those lines. -Initially, the functions are aligned using the functions’ signatures.

- -

As you click around in either function, the “aligning lines” will change. -If the focused token has a match, the scrolling is re-centered based on the lines containing the matched tokens. -If the focused token does not have a match, the functions will be aligned using the closest token to the focused token which does have a match.

- -

Synchronized scrolling can be toggled using the lock and unlock icons in the toolbar.

- -
    -
  1. Experiment with locking and unlocking synchronized scrolling.
  2. -
- -

Exercise: Applying Signatures

- -

If you are satisfied with a given match, you might want to apply information about the matching function to the queried function. -For example, you might want to apply the name or signature of the function. -There are some subtleties which determine how much information is safe to apply. -Hence there are three actions available under the Apply From Other menu when you right-click in the left panel:

- -
    -
  1. Function Name will apply the right function’s name and namespace to the function on the left.
  2. -
  3. Function Signature will apply the name, namespace, and “skeleton” data types. - Structure and union data types are not transferred. - Instead, empty placeholder structures are created.
  4. -
  5. Function Signature and Data Types will apply the name and signature with full data types. -This may result in many data types being imported into the program (consider structures which refer to other structures).
  6. -
- -

Warning: You should be absolutely certain that the data types are exactly the same before applying signatures and data types. -If there have been any changes to a data type’s definition, you could end up bringing incorrect data types into a program, even using BSim matches with 1.0 similarity. -Applying full data types is also problematic for cross-architecture matches.

- -
    -
  1. Since we know it’s safe, apply the function signature and data types to the left function.
  2. -
- -

There are similarly-named actions available on rows of the Function Matches table in the BSim Search Results window. -The Status column contains information about which rows have had their matches applied.

- -

Exercise: Comparing Callees

- -

The token matching algorithm matches a function call in one program to a function call in another by considering the data flow into and out of the CALL instruction, but it does not do anything with the bodies of the callees. -However, given a matched pair of calls, you can bring up a new comparison window for the callees with the Compare Matching Callees action.

- -
    -
  1. Click in the left panel of the decompile diff window and press Ctrl-F.
  2. -
  3. Enter FUN_ and search for matched function calls where the callee in the left window has a default name and the callee in the right window has a non-default name.
  4. -
  5. Right-click on one of the matched tokens and perform the Compare Matching Callees action.
  6. -
  7. In the comparison of the callees, apply the function signature and data types from the right function to the left function. -Verify that the update is reflected in the decompiler diff view of the callers.
  8. -
- -

Exercise: Multiple Comparisons

- -

The function shown in a panel is controlled by a drop-down menu at the top of the panel. -This can be useful when you’d like to evaluate multiple matches to a single function.

- -

Exercise:

- -
    -
  1. In the BSim Search Results window, right-click on a table column name, select Add/Remove Columns, and enable the Matches column.
  2. -
  3. Find two functions in postgres, each of which has exactly two matches. -Select the corresponding four rows in the matches table and perform the Compare Functions action.
  4. -
  5. Experiment with the drop-downs in each panel.
  6. -
- -

In the next section, we discuss the Executable Results table.

- -

Next Section: From Matching Functions to Matching Executables

-
-
    -
  1. -

    Having debug information isn’t necessary to use BSim (as we’ve seen in a previous exercise), but it is convenient. Note that applying debug information can change BSim signatures, which can negatively impact matching between functions with debug information and functions without it. 

    -
  2. -
-
diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Exe_Results.html b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Exe_Results.html
deleted file mode 100755
index 144fb2c2e3..0000000000
--- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Exe_Results.html
+++ /dev/null
@@ -1,46 +0,0 @@
-

From Matching Functions to Matching Executables

- -

In this section, we discuss the Executable Results table. -Each row of this table corresponds to one executable in the database. -The information in one row is an aggregation of all of the function-level matches into that row’s executable. -Your Executable Results table from the previous query should look similar to the following:

- -

executable results

- -

If you select a single row in the table and right-click on it, you will see the following actions:

- - - -

Exercise

- -
    -
  1. Sort the Executable results by descending Function Count. -An entry in this column shows the number of queried functions which have at least one match in the row’s executable (if foo has 2 or more matches into a given executable, it still only contributes 1 to the function count). -What position is demangler_gnu_v2_41? -
    In this table... It's in the first position.
    -
  2. -
  3. An entry in the Confidence column shows the sum of the confidence scores of all matches into the corresponding executable. -If foo has more than one match into a given executable, only the one with the highest (function-level) confidence contributes to the (executable-level) confidence score. -Sort the Executable results by descending confidence and observe that demangler_gnu_v2_41 is now much further down the list. -
    What could explain this? If there are many function matches but the sum of all the confidences is relatively low, it is likely that many of the matches involve small functions with common BSim signatures.
    -
  4. -
  5. In the Executable match table, right click on demangler_gnu_v2_41 and apply the filter action. -Sort the filtered function matches by descending confidence. -Starting at the top, examine some of the matches and convince yourself that the given explanation is correct. - -
  6. -
- -

From this exercise, we see that unrelated functions can be duplicates of each other, either because they are small or because they perform a common generic action. -Keep in mind that such functions can “pollute” the results of a blanket query. -In the next section, we demonstrate a technique to restrict queries to functions which are more likely to have meaningful matches.
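
The two aggregation rules from the exercise can be sketched as follows. This is a toy model under the definitions stated above (Function Count counts each queried function once per executable; Confidence sums only the best match per queried function); the function `aggregate_by_executable` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def aggregate_by_executable(matches):
    """Aggregate function-level matches into per-executable rows.

    matches: iterable of (queried_function, executable, confidence).
    Returns a dict mapping executable -> (function_count, confidence_sum).
    """
    best = defaultdict(dict)  # executable -> {queried function -> best confidence}
    for func, exe, conf in matches:
        prev = best[exe].get(func)
        if prev is None or conf > prev:
            best[exe][func] = conf
    return {
        exe: (len(per_func), sum(per_func.values()))
        for exe, per_func in best.items()
    }


if __name__ == "__main__":
    rows = aggregate_by_executable([
        ("foo", "exe_A", 10.0),
        ("foo", "exe_A", 25.0),   # second match for foo: count stays 1, best confidence wins
        ("bar", "exe_A", 5.0),
        ("foo", "exe_B", 40.0),
    ])
    for exe, (count, conf) in sorted(rows.items()):
        print(exe, count, conf)
```

In this sample, `exe_A` leads when sorting by function count while `exe_B` leads when sorting by confidence, which is the same kind of reordering observed in the exercise.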

- -

Next Section: Overview Queries

diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Filters.html b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Filters.html
deleted file mode 100755
index 282901dfcc..0000000000
--- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Filters.html
+++ /dev/null
@@ -1,21 +0,0 @@
-

BSim Filters

- -

There are a number of filters that can be applied to BSim queries, involving names, architectures, compilers, ingest dates, user-defined executable categories, and other attributes.

- -

Filters can be applied server-side or client-side. -Server-side filters affect the query results sent to Ghidra from a BSim server and can be applied using the Filters drop-down in the BSim Search dialog. -Client-side filters apply to the BSim Search results table and can be added and removed at will using the Filter Results icon. -However, to “undo” a server-side filter, you have to issue another BSim query without the filter.
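
The practical difference between the two kinds of filter can be sketched with a toy model. Everything here is hypothetical (the class `ResultsTable`, the function `server_side_query`, and the row layout); the point is only that a client-side filter hides rows that are still held locally, while a server-side filter means the rows never arrive.

```python
class ResultsTable:
    """Toy model of a results table with removable client-side filters."""

    def __init__(self, rows):
        self._rows = list(rows)   # everything the server returned
        self._filters = []        # client-side predicates

    def add_filter(self, predicate):
        self._filters.append(predicate)

    def clear_filters(self):
        self._filters.clear()

    def visible_rows(self):
        return [r for r in self._rows if all(p(r) for p in self._filters)]


def server_side_query(all_matches, excluded_executable):
    """Model a server-side 'Executable name does not equal' filter."""
    return [m for m in all_matches if m["executable"] != excluded_executable]


if __name__ == "__main__":
    database = [
        {"function": "f1", "executable": "demangler_gnu_v2_41"},
        {"function": "f2", "executable": "some_object_file.o"},
    ]
    table = ResultsTable(database)
    table.add_filter(lambda r: r["executable"] != "demangler_gnu_v2_41")
    print(len(table.visible_rows()))
    table.clear_filters()          # client-side: the hidden row comes back
    print(len(table.visible_rows()))
    filtered = ResultsTable(server_side_query(database, "demangler_gnu_v2_41"))
    print(len(filtered.visible_rows()))  # server-side: the row was never received
```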

- -

Exercise: Filters

- -
    -
  1. Select all functions in postgres and bring up the BSim Search dialog.
  2. -
  3. Apply an Executable name does not equal filter with demangler_gnu_v2_41 as the name to exclude.
  4. -
  5. Perform the query and verify demangler_gnu_v2_41 is not in the list of executables with matches.
  6. -
  7. Using the Search Info icon in the BSim Search Results toolbar, you can see the server-side filters applied to the query. -Verify that this information is correct.
  8. -
  9. Using the Filter Results icon, you can apply client-side filters to the query results. Experiment with applying and removing some client-side filters.
  10. -
- -

Next Section: Scripting and Visualization

diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Ghidra_Command_Line.html b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Ghidra_Command_Line.html
deleted file mode 100755
index ca7ef85e74..0000000000
--- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Ghidra_Command_Line.html
+++ /dev/null
@@ -1,53 +0,0 @@
-

Ghidra Analysis from the Command Line

- -

For the remaining exercises, we need to populate our BSim database with a number of binaries. -We’d like a consistent set of binaries for the tutorial, but we don’t want to clutter the Ghidra distribution with dozens of additional executables. -Fortunately, the BSim plugin includes a script for building the PostgreSQL backend, and that build process creates hundreds of object files. -So we can just build PostgreSQL and harvest the object files we need.

- -

Note: For the tutorial, we continue to use the H2 BSim backend. -We do not run any PostgreSQL code; we simply analyze some files produced when building PostgreSQL.

- -

Note that these files must be built on a machine running Linux. -Windows users can build these files in a Linux virtual machine.

- -

To build the files, execute the following commands in a shell [1]:

- -
cd <ghidra_install_dir>/Features/BSim
-export CFLAGS="-O2 -g"
-./make-postgres.sh
-mkdir ~/postgres_object_files
-cd build
-find . -name 'p*o' -size +100000c -size -700000c -exec cp {} ~/postgres_object_files/ \;
-cd os/linux_x86_64/postgresql/bin
-strip -s postgres
-
- -

To continue on Windows, transfer the ~/postgres_object_files directory and the stripped postgres executable to your Windows machine.

- -

Importing and Analyzing the Exercise Files

- -

Now that we have the executables, we can analyze them with the headless analyzer [2]. -The headless analyzer is distinct from BSim, but using it is the only feasible way to analyze substantial numbers of binaries.

- -

To analyze the files in Linux, execute the following commands in a shell.

- -
cd <ghidra_install_dir>/support
-./analyzeHeadless <ghidra_project_dir> postgres_object_files -import ~/postgres_object_files/*
-
-

(On Windows, use analyzeHeadless.bat and adjust paths accordingly.)

- -

This will create a local Ghidra project called postgres_object_files in the directory <ghidra_project_dir>.

- -

Next Section: BSim from the Command Line

- -
-
    -
  1. -

    You may need to install additional packages and/or change some build options in order for PostgreSQL to build successfully. The error messages are generally informative. See the comments in make-postgres.sh.

    -
  2. -
  3. -

    The headless analyzer has its own documentation: <ghidra_install_dir>/support/analyzeHeadlessREADME.html

    -
  4. -
-
diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Intro.html b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Intro.html
deleted file mode 100755
index 258a3b9c9a..0000000000
--- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Intro.html
+++ /dev/null
@@ -1,114 +0,0 @@
-

Introduction to BSim

- -

As you’ve reverse engineered software, you’ve likely asked the following questions:

- - - -

BSim is intended to help with these questions (and others) by providing a way to search collections of binaries for similar, but not necessarily identical, functions.

- -

How Does BSim Work?

- -

The idea behind BSim is to generate a feature vector for each function in a binary. -The vectors are generated by Ghidra’s decompiler. -Each feature represents a small piece of data flow and/or control flow of the associated function. -The decompiler normalizes the feature vector representation so that different, but functionally equivalent, pieces of code often produce the same features. -Certain attributes, such as values of constants, names of registers, and data types, are intentionally not incorporated into the features.

- -

BSim vectors are compared using cosine similarity. -Discrepancies between the vectors for foo and bar which are caused by differences in compilers, target architectures, and/or small changes to the source code typically result in vectors which are close but not identical.
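
As a worked illustration of cosine similarity, the sketch below compares two sparse vectors stored as dictionaries. The feature identifiers and unit weights are made up for the example, and the function `cosine_similarity` is a generic textbook implementation; how BSim actually weights features is not described here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors given as {feature_id: weight} dicts."""
    # Only features present in both vectors contribute to the dot product.
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical features: foo and bar share two of their three features.
    foo = {"feat_1": 1.0, "feat_2": 1.0, "feat_3": 1.0}
    bar = {"feat_1": 1.0, "feat_2": 1.0, "feat_4": 1.0}
    print(f"{cosine_similarity(foo, bar):.3f}")
```

Here the dot product is 2 and each vector has norm √3, so the similarity is 2/3: close, but not identical, as described above.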

- -

BSim vectors can be stored in a dedicated database. -BSim databases intended to hold large numbers of vectors [1] maintain an index based on locality-sensitive hashing. -The index drastically reduces the number of vector comparisons needed and allows for rapid retrieval of results.
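
To give a feel for why a locality-sensitive index reduces comparisons, here is a generic sketch of one well-known scheme, random-hyperplane hashing for cosine similarity. It is not a description of BSim's actual index; the functions `make_hyperplanes`, `lsh_key`, and `build_index`, the dense-vector layout, and all parameters are assumptions chosen for illustration.

```python
import random


def make_hyperplanes(dim, bits, seed=0):
    """Generate `bits` random hyperplane normals of dimension `dim` (seeded for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def lsh_key(vector, hyperplanes):
    """Hash a dense vector to a tuple of bits: which side of each hyperplane it lies on."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Bucket named vectors by key, so a query only compares against its own bucket."""
    index = {}
    for name, vec in vectors.items():
        index.setdefault(lsh_key(vec, hyperplanes), []).append(name)
    return index


if __name__ == "__main__":
    planes = make_hyperplanes(dim=4, bits=8)
    stored = {"foo": [1.0, 2.0, 0.0, 1.0], "bar": [-3.0, 0.5, 2.0, -1.0]}
    index = build_index(stored, planes)
    query = [2.0, 4.0, 0.0, 2.0]  # same direction as foo, so it lands in foo's bucket
    print(index.get(lsh_key(query, planes), []))
```

Vectors pointing in similar directions tend to fall on the same side of most hyperplanes and so share a bucket; a lookup then compares the query only against that bucket's contents rather than against every stored vector.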

- -

Querying foo against a BSim database typically yields a number of potential matches. -Each individual match for foo can be compared to foo in a side-by-side view, and certain information (such as function name) can be quickly copied from a match to foo.

- -

We frequently call a function’s BSim vector its BSim signature, or just its signature when the context is clear.

- -

Why “BSim”?

- -

We can think of each feature as representing a small piece of the behavior of a function, analogous to a snippet of source code. -Functions whose BSim vectors are close typically have many features in common, that is, they have similar behavior. -Hence the name “BSim”: Behavioral Similarity.

- -

BSim Clients, BSim Databases, and Ghidra Projects

- -

Using BSim involves the following components:

- - - -

Database Backends

- -

There are three supported database backends for BSim:

- -
    -
  1. -

    PostgreSQL

    - - -
  2. -
  3. -

    Elasticsearch

    - - -
  4. -
  5. -

    H2

    - - -
  6. -
- -

Next Section: Starting Ghidra and Enabling BSim

- -
-
    -
  1. -

    Creating a database requires a database template, which determines the specifics of the index. Currently, Ghidra provides a medium template, intended for databases holding up to 10 million unique vectors, and a large template, intended for databases holding up to 100 million unique vectors. 

    -
  2. -
-
diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Overview_Queries.html b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Overview_Queries.html
deleted file mode 100755
index 167d54c324..0000000000
--- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Overview_Queries.html
+++ /dev/null
@@ -1,51 +0,0 @@
-

Overview Queries

- -

An Overview Query queries a BSim database for the number of matches to each function in an executable. -The matching functions themselves are not returned. -Similarity and Confidence thresholds can be set for an Overview Query, but there is no “Matches per Function” bound and no filters can be applied.

- -

To perform an Overview Query, select BSim -> Perform Overview… from the Code Browser.

- -

Exercise: Hit Counts and Self-Significance

- -
    -
  1. Perform an Overview query on postgres using the default query thresholds. -You should see the following result: -overview window
  2. -
  3. Sort the table by the “Hit Count” column in ascending order. Typically, the functions with the largest hit counts will have low self-significance. -Verify that this is the case for this table.
  4. -
  5. Q: Examine the functions with the highest hit count. Why are there so many matches for these functions? -
    Answer: These are all instances of PostgreSQL statistics-reporting functions. Their bodies are quite similar and they have identical BSim signatures.
    -
  6. -
- -

Exercise: Selections and Queries

- -

Using the hit count column, it is possible to exclude functions with large numbers of matches.

- -
    -
  1. In the Overview Table, select all functions whose hit count is 2 or less.
  2. -
  3. Right-click on the selection and perform the Search Selected Functions action. -Sort the query results by descending Function Count and verify that demangler_gnu_v2_41 is far down the list.
  4. -
- -

Exercise: Vector Hashes

- -

Suppose foo and bar have the same number of hits in the Overview table. -There are two possibilities:

-
    -
  1. foo and bar have distinct feature vectors which happen to have the same number of matches.
  2. -
  3. foo and bar have the same feature vector.
  4. -
- -

An optional column, Vector Hash, can be used to distinguish between these two cases.
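
The distinction between the two cases can be sketched by grouping overview rows on their vector hash. The row layout, hash strings, and the function `group_by_vector_hash` are hypothetical; only the idea that equal hashes indicate equal feature vectors comes from the text above.

```python
from collections import defaultdict


def group_by_vector_hash(rows):
    """Group functions that share a vector hash.

    rows: iterable of (function_name, hit_count, vector_hash).
    Returns {vector_hash: sorted names} for hashes shared by two or more functions.
    """
    groups = defaultdict(list)
    for name, _hits, vhash in rows:
        groups[vhash].append(name)
    return {h: sorted(names) for h, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    overview = [
        ("foo", 12, "hash_a"),
        ("bar", 12, "hash_a"),   # same hit count AND same hash: same feature vector
        ("baz", 12, "hash_b"),   # same hit count, different hash: distinct vector
    ]
    print(group_by_vector_hash(overview))
```

In the sample, all three functions have the same hit count, but only `foo` and `bar` share a feature vector.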

- -
    -
  1. Enable the Vector Hash Column in the Overview Table.
  2. -
  3. Find two functions with the same vector hash.
  4. -
  5. Select the two corresponding rows in the table and then transfer the selection to the Listing using the make selection icon in the BSim Overview toolbar.
  6. -
  7. In the Listing, press Shift-C or right-click and perform the Compare Selected Functions action.
  8. -
  9. In the resulting Function Comparison window, convince yourself that these two functions should have the same BSim signature.
  10. -
- -

Next Section: Queries and Filters

diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Scripting.html b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Scripting.html
deleted file mode 100755
index c2d3c4c2d1..0000000000
--- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Scripting.html
+++ /dev/null
@@ -1,23 +0,0 @@
-

Scripting and Visualization

- -

Finally, we briefly mention a few other topics related to BSim.

- -

Scripting BSim

- -

There are a number of example scripts in the BSim script category, which demonstrate how to interact with BSim programmatically.

- -

script manager

- -

Visualizing Features

- -

Finally, if you’d like to see the particular BSim features in a function, you can use the BSim Feature Visualizer. -This plugin allows you to highlight regions of the decompiled code corresponding to a particular feature and to display a graph representing the feature.

- -

To use this plugin, first enable the BSimFeatureVisualizerPlugin via File -> Configure from the Code Browser. -You can then bring it up via BSim -> BSim Feature Visualizer.

- -

feature visualizer

- -

This is the end of the tutorial.

- -

Return to the Beginning

diff --git a/GhidraDocs/GhidraClass/BSim/README.html b/GhidraDocs/GhidraClass/BSim/README.html
deleted file mode 100755
index e6b4c0082f..0000000000
--- a/GhidraDocs/GhidraClass/BSim/README.html
+++ /dev/null
@@ -1,24 +0,0 @@
-

BSim Tutorial

- -

BSim is a Ghidra plugin for finding structurally similar functions in (potentially large) collections of binaries. -It is based on Ghidra’s decompiler and can find matches across compilers, architectures, and/or small changes to source code.

- -

This tutorial demonstrates how to create a small BSim database and walks through some typical use cases.

- -

Detailed information about BSim can be found in the “BSim” entry of the Ghidra Help.

- -
    -
  1. Introduction to BSim
  2. -
  3. Starting Ghidra and Enabling BSim
  4. -
  5. Creating and Populating a BSim Database from the GUI
  6. -
  7. Basic BSim Queries
  8. -
  9. Ghidra from the Command Line
  10. -
  11. BSim from the Command Line
  12. -
  13. Evaluating Matches
  14. -
  15. From Matching Functions to Matching Executables
  16. -
  17. Overview Queries
  18. -
  19. BSim Filters
  20. -
  21. Scripting and Visualization
  22. -
- -

Next Section: Introduction to BSim

diff --git a/GhidraDocs/build.gradle b/GhidraDocs/build.gradle
index 97cded2550..076d90275e 100644
--- a/GhidraDocs/build.gradle
+++ b/GhidraDocs/build.gradle
@@ -55,4 +55,8 @@ rootProject.assembleMarkdownToHtml {
     from ("${this.projectDir}/InstallationGuide.md") {
         into "docs"
     }
+    from ("${this.projectDir}/GhidraClass/BSim") {
+        include "*.md"
+        into "docs/GhidraClass/BSim"
+    }
 }