diff --git a/Ghidra/Features/BSim/ghidra_scripts/AddProgramToH2BSimDatabaseScript.java b/Ghidra/Features/BSim/ghidra_scripts/AddProgramToH2BSimDatabaseScript.java index f031634822..91ca083fc4 100644 --- a/Ghidra/Features/BSim/ghidra_scripts/AddProgramToH2BSimDatabaseScript.java +++ b/Ghidra/Features/BSim/ghidra_scripts/AddProgramToH2BSimDatabaseScript.java @@ -13,9 +13,8 @@ * See the License for the specific language governing permissions and * limitations under the License. */ -//Generate BSim signatures for the current program. The URL for the program is -//created from the local storage location. These signatures are intended for the -//in-memory database backend. +//Generates and commits the BSim signatures for the currentProgram to the +//selected H2 BSim database //@category BSim import java.io.File; import java.io.IOException; @@ -41,9 +40,6 @@ import ghidra.program.model.listing.FunctionManager; import ghidra.util.MessageType; import ghidra.util.Msg; -//@category BSim -//Generates and commits the BSim signatures for the currentProgram to the -//selected H2 BSim database public class AddProgramToH2BSimDatabaseScript extends GhidraScript { private static final String DATABASE = "H2 Database"; diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_BSim_Command_Line.html b/GhidraDocs/GhidraClass/BSim/BSimTutorial_BSim_Command_Line.html new file mode 100755 index 0000000000..aef41cf5d3 --- /dev/null +++ b/GhidraDocs/GhidraClass/BSim/BSimTutorial_BSim_Command_Line.html @@ -0,0 +1,83 @@ +
The bsim
command-line utility, located in the support
directory of a Ghidra distribution, is used to create, populate, and manage BSim databases.
+It works for all BSim database backends.
+This utility offers a number of commands, many of which have several options.
+In this section, we cover only a small subset of the possibilities.
Running bsim
with no arguments will print a detailed usage message.
The first step is to create signature files from the binaries in the Ghidra project. +Signature files are XML files which contain the BSim signatures and metadata needed by the BSim server.
+ +Important: It’s simplest to exit Ghidra before performing the next steps, because:
+- The H2-backed database can only be accessed by one process at a time.
+- If you have the postgres_object_files project open in Ghidra, signature generation will fail.
+Non-shared projects are locked when open, and the lock will prevent the signature-generating process from accessing the project.
+ +To generate the signature files, execute the following commands in a shell (adjust as necessary for Windows).
+ +cd <ghidra_install_dir>/support
+mkdir ~/bsim_sigs
+./bsim generatesigs ghidra:/<ghidra_project_dir>/postgres_object_files bsim=file:/<database_dir>/example ~/bsim_sigs
+
+
+The ghidra:/ argument is the local project which holds the analyzed binaries.
+Note that there is only one forward slash in the URL for a local project.
+The bsim= argument is the URL of the BSim database.
+This command does not add any signatures to the database, but it does query the database for its settings.
+ +Now, we commit the signatures to the BSim database with the following command (still in the support directory).
./bsim commitsigs file:/<database_dir>/example ~/bsim_sigs
+
+
+Once the signatures have been committed, start Ghidra again.
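
The generate-then-commit workflow above can also be driven from a script. The following is a minimal sketch in Python that only assembles the two argument lists in the form shown above; the helper names and the example directories are hypothetical, POSIX-style paths are assumed, and nothing is executed unless you pass the lists to `subprocess.run` yourself from the support directory.

```python
def bsim_file_url(database_dir, database_name):
    """Build a file:/ URL for an H2 BSim database (POSIX-style path assumed)."""
    return f"file:/{database_dir.strip('/')}/{database_name}"


def generatesigs_cmd(project_dir, project_name, database_dir, database_name, sig_dir):
    """Argument list mirroring the `bsim generatesigs` command shown above."""
    # Local project URLs use a single forward slash after "ghidra:".
    project_url = f"ghidra:/{project_dir.strip('/')}/{project_name}"
    return [
        "./bsim",
        "generatesigs",
        project_url,
        f"bsim={bsim_file_url(database_dir, database_name)}",
        sig_dir,
    ]


def commitsigs_cmd(database_dir, database_name, sig_dir):
    """Argument list mirroring the `bsim commitsigs` command shown above."""
    return ["./bsim", "commitsigs", bsim_file_url(database_dir, database_name), sig_dir]


if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    print(generatesigs_cmd("/home/user/projects", "postgres_object_files",
                           "/home/user/bsim_db", "example", "/tmp/bsim_sigs"))
    print(commitsigs_cmd("/home/user/bsim_db", "example", "/tmp/bsim_sigs"))
```

Note that a literal `~` is not expanded when a command is run without a shell, so expand it first (for example with `os.path.expanduser`) if you reuse the `~/bsim_sigs` directory.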
+ +We continue to use the database example
, so this step isn’t necessary for the exercises.
However, if we hadn’t created example
using CreateH2BSimDatabaseScript.java
, we could have used the following command:
./bsim createdatabase file:/<database_dir>/example medium_nosize
+
+medium_nosize is a database template.
+The createdatabase command can also be used to create a BSim database on a PostgreSQL or Elasticsearch server, provided the servers are configured and running.
+See the “BSim” entry in the Ghidra help for details.
+ +It’s worth a brief note about Executable Categories and Function Tags, although they are not used in any of the following exercises.
+ +A BSim database can record user-defined metadata about an executable (executable categories) or about a function (function tags). +Categories and tags can then be used as filter elements in a BSim query. +For example, you could restrict a BSim query to search only in executables of the category “OPEN_SOURCE” or to functions which have been tagged “COMPRESSION_FUNCTIONS”.
+ +Executable categories in BSim are implemented using program properties, and function tags in BSim correspond to function tags in Ghidra. Properties and tags both have uses in Ghidra which are independent of BSim. +So, if we want a BSim database to record a particular category or tag, we must indicate that explicitly.
+ +For example, to inform the database that we wish to record the ORIGIN category, you would execute the command
+ +./bsim addexecategory file:/<database_dir>/example ORIGIN
+
+
+Executable categories can be added to a program using the script SetExecutableCategoryScript.java
.
Next Section: Evaluating Matches and Applying Information
diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_BSim_Command_Line.md b/GhidraDocs/GhidraClass/BSim/BSimTutorial_BSim_Command_Line.md index a37261f94f..01b03cc887 100644 --- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_BSim_Command_Line.md +++ b/GhidraDocs/GhidraClass/BSim/BSimTutorial_BSim_Command_Line.md @@ -10,7 +10,7 @@ Running ``bsim`` with no arguments will print a detailed usage message. ## Generating Signature Files The first step is to create signature files from the binaries in the Ghidra project. -Signature files are XML files which contain the BSim vectors and other metadata needed by the BSim server. +Signature files are XML files which contain the BSim signatures and metadata needed by the BSim server. **Important**: It's simplest to exit Ghidra before performing the next steps, because: - The H2-backed database can only be accessed by one process at a time. @@ -44,7 +44,7 @@ Once the signatures have been committed, start Ghidra again. We continue to use the database ``example``, so this step isn't necessary for the exercises. -However, if we hadn't created ``example`` using a script, we could have used the following command: +However, if we hadn't created ``example`` using ``CreateH2BSimDatabaseScript.java``, we could have used the following command: ```bash ./bsim createdatabase file:/In this section, we demonstrate some applications of our BSim database.
+ +In order to query the database, you must register it with Ghidra:
+ +example.mv.db
Before presenting the exercises, we describe the general mechanics of querying a BSim database.
+ +There are a number of ways to initiate a BSim query, including:
+ +For these cases, the function(s) being queried depend on the current selection.
+If there is no selection, the function containing the current address is queried.
+If there is a selection, all functions whose entry points are within the selection are queried.
+An easy way to query all functions in a program is to select all addresses with Ctrl-A
in the Listing window and then initiate a BSim query.
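
The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the stated behavior, not Ghidra code: the `Function` record, the address values, and the helper name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    name: str
    entry: int  # entry point address
    start: int  # first address of the function body
    end: int    # last address of the function body (inclusive)


def functions_to_query(functions, current_address, selection=None):
    """Return the functions a BSim query would cover under the rule above.

    selection is a list of inclusive (low, high) address ranges, or None.
    """
    if selection:
        # With a selection: every function whose entry point lies inside it.
        return [f for f in functions
                if any(low <= f.entry <= high for low, high in selection)]
    # Without a selection: the function containing the current address.
    return [f for f in functions if f.start <= current_address <= f.end]


funcs = [Function("a", 0x1000, 0x1000, 0x10FF),
         Function("b", 0x1100, 0x1100, 0x11FF)]
```

With these two hypothetical functions, a selection of `0x10F0` through `0x1100` overlaps the body of `a` but contains only the entry point of `b`, so only `b` is queried.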
It is also possible to initiate a BSim query from the Decompiler window. +Simply right-click on a function name token and select BSim… to query the corresponding function. +This action is available on the name token in the decompiled function’s signature as well as tokens corresponding to names of callees.
+ +All of these actions bring up the BSim Search Dialog.
+ +From the BSim Search Dialog, you can
+ +To query a registered BSim database, select that server from the BSim Server drop-down.
+ +Similarity and confidence are scores used to evaluate the relationship between two vectors. +The respective fields in the dialog set lower bounds for these values for the matches returned by BSim.
+ +Confidence is used to judge the significance of a match. +For example, many executables contain a function which simply returns a constant value. +Given two executables, each with such a function, the similarity score between the corresponding BSim vectors will be 1.0. +However, the confidence score of the match will be quite low, indicating that it is not very significant that the two executables “share” this code.
+ +In general, setting the thresholds involves a tradeoff: lower values mean that the database is more likely to return legitimate matches with significant differences, but also more likely to return matches which simply happen to share some features by chance. +The results of a BSim query can be sorted by the similarity and/or confidence of each match, so a common practice is to set the thresholds relatively low and to examine the matches in descending sort order.
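
The practice of setting low thresholds and then reviewing matches in descending order can be sketched as follows. This is a minimal illustration, assuming the thresholds are inclusive lower bounds; the `Match` record and all score values are made up for the example and do not come from a real query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    function: str
    similarity: float
    confidence: float


def filter_and_rank(matches, min_similarity, min_confidence):
    """Keep matches at or above both thresholds, best confidence first."""
    kept = [m for m in matches
            if m.similarity >= min_similarity and m.confidence >= min_confidence]
    # Sort by confidence, then similarity, both descending.
    return sorted(kept, key=lambda m: (m.confidence, m.similarity), reverse=True)


# Hypothetical matches: a trivial function can be identical yet insignificant.
matches = [
    Match("return_zero", 1.0, 2.5),
    Match("parse_args", 0.85, 40.0),
    Match("unrelated", 0.5, 3.0),
]
```

Raising the thresholds to `0.7` and `10.0` leaves only `parse_args`, while thresholds of `0.0` keep all three and place the trivial identical function last.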
+ +The Matches per Function bound controls the number of results returned for a single function. +Note that in large collections, certain small or common functions might have substantial numbers of identical matches.
+ +Filters are discussed in BSim Filters.
+ +Click the Search button in the dialog to perform a query.
+ +After successfully issuing a query, you will also see a Search Function(s) action (without the ellipsis) in certain contexts. +This will perform a BSim query on the selected functions using the same parameters as the last query (skipping the BSim Search Dialog).
+ +The database example
contains vectors from a Linux executable used by Ghidra’s GNU demangler.
+Ghidra ships with several other versions of this executable.
+We use these different versions to demonstrate some of the capabilities of BSim.
Note: Use the default query settings and autoanalysis options for the exercises unless otherwise specified.
+ +<ghidra_install_dir>/GPL/DemanglerGnu/os/win_x86_64/demangler_gnu_v2_41.exe
.
+ demangler_gnu_v2_41
but compiled with Visual Studio instead of GCC.demangler_gnu_v2_41
.example
for matches to the function at 140006760
.Note: We cover the Decompiler Diff View in greater detail and discuss the various “Apply” actions in Evaluating Matches and Applying Information.
+ +<ghidra_install_dir>/GPL/DemanglerGnu/os/linux_x86_64/demangler_gnu_v2_24
.
+ example
.expandargv
in demangler_gnu_v2_24
and issue a BSim query.<ghidra_install_dir>/GPL/DemanglerGnu/src/demangler_gnu_v2_24/c/argv.c
<ghidra_install_dir>/GPL/DemanglerGnu/src/demangler_gnu_v2_41/c/argv.c
<ghidra_install_dir>/GPL/DemanglerGnu/os/mac_arm_64/demangler_gnu_v2_41
.
+ example
but compiled for a different architecture._expandargv
and issue a BSim query.
+In the decompiler diff view of the single match, what differences do you see regarding memmove
and memcpy
?
+ Q: If you set the similarity and confidence thresholds to 0.0, will a BSim query return all of the functions in the database?
+ +A: No, because
+Next Section: Ghidra from the Command Line
+ diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Basic_Queries.md b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Basic_Queries.md index c14a6885c6..c56d29908a 100644 --- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Basic_Queries.md +++ b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Basic_Queries.md @@ -7,7 +7,7 @@ In this section, we demonstrate some applications of our BSim database. In order to query the database, you must register it with Ghidra: 1. From The Code Browser, Select **BSim -> Manage Servers**. -1. In the BSim Server Manager dialog, click the green plus. +1. In the BSim Server Manager dialog, click the green plus . 1. Select the **File** radio button and use the chooser to select ``example.mv.db`` 1. Click **OK** 1. Click **Dismiss** to close the dialog. @@ -27,7 +27,7 @@ There are a number of ways to initiate a BSim query, including: For these cases, the function(s) being queried depend on the current selection. If there is no selection, the function containing the current address is queried. If there is a selection, all functions whose entry points are within the selection are queried. -For example, to query all functions in the program, first select all addresses in the program via ``Ctrl-A`` in the Listing window. +An easy way to query all functions in a program is to select all addresses with ``Ctrl-A`` in the Listing window and then initiate a BSim query. It is also possible to initiate a BSim query from the Decompiler window. Simply right-click on a function name token and select **BSim...** to query the corresponding function. @@ -44,7 +44,7 @@ From the BSim Search Dialog, you can - Bound the number of results returned for each function. - Set query filters. - + #### Selecting a BSim Database @@ -86,7 +86,7 @@ Filters are discussed in [BSim Filters](BSimTutorial_Filters.md). Click the **Search** button in the dialog to perform a query. 
After successfully issuing a query, you will also see a **Search Function(s)** action (without the ellipsis) in certain contexts. -This will perform a BSim query on the selected functions using the same parameters as the last query (skipping the BSim Seach Dialog). +This will perform a BSim query on the selected functions using the same parameters as the last query (skipping the BSim Search Dialog). ## Exercises @@ -96,7 +96,7 @@ We use these different versions to demonstrate some of the capabilities of BSim. **Note**: Use the default query settings and autoanalysis options for the exercises unless otherwise specified. -### Exercise 1: Function Identification +### Exercise: Function Identification 1. Import and analyze the binary ``This section explains how to create and populate an H2-backed BSim database from the Ghidra GUI.
+ +To create a BSim database, first create a directory on your file system to contain the database.
+ +Next, perform the following steps from the Ghidra Code Browser:
+ +CreateH2BSimDatabaseScript.java
.We now populate the database with an executable which is contained in the Ghidra distribution.
+ +<ghidra_install_dir>/GPL/DemanglerGnu/os/linux_x86_64/demangler_gnu_v2_41
using the default analysis options.AddProgramToH2BSimDatabaseScript.java
on this program.
+ example.mv.db
in the database directory.Next Section: Basic BSim Queries
+ diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Enabling.html b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Enabling.html new file mode 100755 index 0000000000..2504edf8aa --- /dev/null +++ b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Enabling.html @@ -0,0 +1,22 @@ +To begin the tutorial, perform the following steps:
+ +To enable BSim, perform the following steps:
+ +Configure
link of the BSim
entry.BSimSearchPlugin
is checked.Next Section: Creating and Populating a BSim Database from the GUI
+ diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Enabling.md b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Enabling.md index 429a6bb36a..8737cb6561 100644 --- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Enabling.md +++ b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Enabling.md @@ -12,7 +12,7 @@ To enable BSim, perform the following steps: 1. Click on the ``Configure`` link of the ``BSim`` entry. 1. In the resulting dialog, ensure that the checkbox for ``BSimSearchPlugin`` is checked. - + Next Section: [Creating and Populating a BSim Database from the GUI](BSimTutorial_Creating_Database_From_GUI.md) diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Evaluating_Matches.html b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Evaluating_Matches.html new file mode 100755 index 0000000000..f70196747b --- /dev/null +++ b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Evaluating_Matches.html @@ -0,0 +1,135 @@ +Summarizing what we’ve created over the last few sections, we now have:
+postgres
).We now demonstrate using BSim to help reverse engineer postgres
.
+While doing this, we’ll showcase some of the features available in the decompiler diff view.
Import and analyze the stripped postgres
executable into the tutorial project, then perform the following steps:
postgres
via Ctrl-A
in the Listing.example
.
+ grouping_planner
as the matching function.
+The corresponding function in postgres
should have a default name.double
argument between the functions?
+ For matches with a fair number of differences, the decompiler diff panel can get pretty colorful. +Furthermore, as you click around, tokens will gain and lose highlights of various colors. +It’s worth giving a brief explanation of when highlighting happens and what the different colors mean. +Some terminology: if you click on a token in a decompiler panel, that token becomes the focused token.
+ +The colors:
+ +By default, scrolling in the diff window is synchronized. +This means that scrolling within one window will also scroll within the other window. +In the decompiler diff window, scrolling works by matching one line in the left function with one line in the right function. +The two functions are aligned using those lines. +Initially, the functions are aligned using the functions’ signatures.
+ +As you click around in either function, the “aligning lines” will change. +If the focused token has a match, the scrolling is re-centered based on the lines containing the matched tokens. +If the focused token does not have a match, the functions will be aligned using the closest token to the focused token which does have a match.
+ +Synchronized scrolling can be toggled using the and
icons in the toolbar.
If you are satisfied with a given match, you might want to apply information about the matching function to the queried function. +For example, you might want to apply the name or signature of the function. +There are some subtleties which determine how much information is safe to apply. +Hence there are three actions available under the Apply From Other menu when you right-click in the left panel:
+ +Warning: You should be absolutely certain that the datatypes are the exactly the same before applying signatures and data types. +If there have been any changes to a datatype’s definition, you could end up bringing incorrect datatypes into a program, even using BSim matches with 1.0 similarity. +Applying full data types is also problematic for cross-architecture matches.
+ +There are similarly-named actions available on rows of the Function Matches table in the BSim Search Results window. +The Status column contains information about which rows have had their matches applied.
+ +The token matching algorithm matches a function call in one program to a function call in another by considering the data flow into and out of the CALL
instruction, but it does not do anything with the bodies of the callees.
+However, given a matched pair of calls, you can bring up a new comparison window for the callees with the Compare Matching Callees action.
Ctrl-F
.FUN_
and search for matched function calls where the callee in the left window has a default name and the callee in the right window has a non-default name.The function shown in a panel is controlled by a drop-down menu at the top of the panel. +This can be useful when you’d like to evaluate multiple matches to a single function.
+ +Exercise:
+ +postgres
, each of which has exactly two matches.
+Select the corresponding four rows in the matches table and perform the Compare Functions action.In the next section, we discuss the Executable Results table.
+ +Next Section: From Matching Functions to Matching Executables
+Having debug information isn’t necessary to use BSim (as we’ve seen in a previous exercise), but it is convenient. Note that applying debug information can change BSim signatures, which can negatively impact matching between functions with debug information and functions without it. ↩
+In this section, we discuss the Executable Results table. +Each row of this table corresponds to one executable in the database. +The information in one row is an aggregation of all of the function-level matches into that row’s executable. +Your Executable Results table from the previous query should look similar to the following:
+ +If you select a single row in the table and right-click on it, you will see the following actions:
+ +foo
has 2 or more matches into a given executable, it still only contributes 1 to the function count).
+What position is demangler_gnu_v2_41
?
+ foo
has more than one match into a given executable, only the one with the highest (function-level) confidence contributes to the (executable-level) confidence score.
+Sort the Executable results by descending confidence and observe that demangler_gnu_v2_41
is now much further down the list.
+ demangler_gnu_v2_41
and apply the filter action.
+Sort the filtered function matches by descending confidence.
+Starting at the top, examine some of the matches and convince yourself that the given explanation is correct.
+ From this exercise, we see that unrelated functions can be duplicates of each other, either because they are small or because they perform a common generic action. +Keep in mind that such functions can “pollute” the results of a blanket query. +In the next section, we demonstrate a technique to restrict queries to functions which are more likely to have meaningful matches.
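
The aggregation rules for the Executable Results table described above can be sketched in Python. This is an illustration of the stated rules only (one count per queried function, and only the highest-confidence match per function contributing to the sum); the function name, the input format, and the scores are hypothetical.

```python
from collections import defaultdict


def executable_results(matches):
    """Aggregate function-level matches into per-executable rows.

    matches: iterable of (queried_function, executable, confidence).
    Returns {executable: (function_count, confidence_sum)}.
    """
    best = defaultdict(dict)  # executable -> {queried function: best confidence}
    for func, exe, conf in matches:
        prev = best[exe].get(func)
        if prev is None or conf > prev:
            best[exe][func] = conf
    return {exe: (len(per_func), sum(per_func.values()))
            for exe, per_func in best.items()}


# Hypothetical matches: foo matches two functions in libA.
rows = executable_results([
    ("foo", "libA", 10.0),
    ("foo", "libA", 25.0),
    ("bar", "libA", 5.0),
    ("foo", "libB", 7.0),
])
```

Here `foo` contributes 1 to the function count of `libA` and only its best confidence of 25.0 to the sum, giving `libA` a row of 2 functions and a confidence of 30.0.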
+ +Next Section: Overview Queries
diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Exe_Results.md b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Exe_Results.md index c0768f40f9..bbe37b0184 100644 --- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Exe_Results.md +++ b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Exe_Results.md @@ -23,15 +23,15 @@ If you select a single row in the table and right-click on it, you will see the 1. An entry in the **Confidence** column shows the sum of the confidence scores of all matches into the corresponding executable. If ``foo`` has more than one match into a given executable, only the one with the highest (function-level) confidence contributes to the (executable-level) confidence score. Sort the Executable results by descending confidence and observe that ``demangler_gnu_v2_41`` is now much further down the list. -There are a number of filters that can be applied to BSim queries, involving names, architectures, compilers, ingest dates, user-defined executable categories, and other attributes.
+ +Filters can be applied server-side or client-side.
+Server-side filters affect the query results sent to Ghidra from a BSim server and can be applied using the Filters drop-down in the BSim Search dialog.
+Client-side filters apply to the BSim Search results table and can be added and removed at will using the Filter Results icon .
+However, to “undo” a server-side filter, you have to issue another BSim query without the filter.
postgres
and bring up the BSim Search dialog.demangler_gnu_v2_41
as the name to exclude.demangler_gnu_v2_41
is not in the list of executables with matches.Next Section: Scripting and Visualization
diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Filters.md b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Filters.md index 8821bf5454..c6c9d7adfd 100644 --- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Filters.md +++ b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Filters.md @@ -1,13 +1,13 @@ # BSim Filters -There are a number of filters that can be applied to BSim queries, involving names, architectures, compilers, ingest dates, user-defined executable categories, and many other attributes. +There are a number of filters that can be applied to BSim queries, involving names, architectures, compilers, ingest dates, user-defined executable categories, and other attributes. Filters be can applied *server-side* or *client-side*. -Server-side filters affect the query results sent to Ghidra from a BSim server. -Client-side filters apply to the BSim Search results table and can be added and removed at will. -However, to "undo" a server-side filter, you have to issue an additional BSim query without the filter. +Server-side filters affect the query results sent to Ghidra from a BSim server and can be applied using the **Filters** drop-down in the BSim Search dialog. +Client-side filters apply to the BSim Search results table and can be added and removed at will using the **Filter Results** icon . +However, to "undo" a server-side filter, you have to issue another BSim query without the filter. + -Server-side filters can be applied using the **Filters** drop-down in the BSim Search dialog. ## Exercise: Filters diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Ghidra_Command_Line.html b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Ghidra_Command_Line.html new file mode 100755 index 0000000000..6630087b96 --- /dev/null +++ b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Ghidra_Command_Line.html @@ -0,0 +1,56 @@ +For the remaining exercises, we need to populate our BSim database with a number of binaries. 
+We’d like a consistent set of binaries for the tutorial, but we don’t want to clutter the Ghidra distribution with dozens of additional executables. +Fortunately, the BSim plugin includes a script for building the PostgreSQL backend, and that build process creates hundreds of object files. +So we can just build PostgreSQL and harvest the object files we need.
+ +Note: For the tutorial, we continue to use the H2 BSim backend. +We do not run any PostgreSQL code, we simply analyze some files produced when building PostgreSQL.
+ +Note that these files must be built on a machine running Linux. +Windows users can build these files in a Linux virtual machine.
+ +First, download postgresql-15.3.tar.gz
from the PostgreSQL web site.
+Put this file in <ghidra_install_dir>/Ghidra/Features/BSim
.
To build the files, execute the following commands in a shell [1]:
+ +cd <ghidra_install_dir>/Ghidra/Features/BSim
+export CFLAGS="-O2 -g"
+./make-postgres.sh
+mkdir ~/postgres_object_files
+cd build
+find . -name 'p*o' -size +100000c -size -700000c -exec cp {} ~/postgres_object_files/ \;
+cd os/linux_x86_64/postgresql/bin
+strip -s postgres
+
+
+To continue on Windows, transfer the ~/postgres_object_files
directory and the stripped postgres
executable to your Windows machine.
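
For readers who prefer to script the harvesting step, the `find` invocation above can be approximated in Python. This is a sketch under stated assumptions: the helper name is hypothetical, only regular files are considered, and the size bounds are treated as strict (more than 100000 bytes, fewer than 700000 bytes), which matches the `+`/`-` semantics of `find -size` with the `c` suffix.

```python
import fnmatch
import shutil
from pathlib import Path


def harvest_object_files(build_dir, dest_dir, pattern="p*o",
                         min_size=100000, max_size=700000):
    """Copy files whose name matches pattern and whose size is strictly
    between min_size and max_size bytes; return the copied file names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(build_dir).rglob("*")):
        if not path.is_file():
            continue
        if not fnmatch.fnmatchcase(path.name, pattern):
            continue
        if min_size < path.stat().st_size < max_size:
            # Like cp, a later file with the same name overwrites an earlier one.
            shutil.copy(path, dest / path.name)
            copied.append(path.name)
    return sorted(copied)
```

As with the shell command, files from different subdirectories land in a single flat destination directory.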
Now that we have the executables, we can analyze them with the headless analyzer [2]. +The headless analyzer is distinct from BSim, but using it is the only feasible way to analyze substantial numbers of binaries.
+ +To analyze the files in Linux, execute the following commands in a shell.
+ +cd <ghidra_install_dir>/support
+./analyzeHeadless <ghidra_project_dir> postgres_object_files -import ~/postgres_object_files/*
+
+(On Windows, use analyzeHeadless.bat
and adjust paths accordingly.)
This will create a local Ghidra project called postgres_object_files
in the directory <ghidra_project_dir>
.
Next Section: BSim from the Command Line
+ +You may need to install additional packages and/or change some build options in order for PostgreSQL to build successfully. The error messages are generally informative. See the comments in make-postgres.sh
. ↩
The headless analyzer has its own documentation: <ghidra_install_dir>/support/analyzeHeadlessREADME.html
. ↩
As you’ve reverse engineered software, you’ve likely asked the following questions:
+ +BSim is intended to help with these questions (and others) by providing a way to search collections of binaries for similar, but not necessarily identical, functions.
+ +The idea behind BSim is to generate a feature vector for each function in a binary. +The vectors are generated by Ghidra’s decompiler. +Each feature represents a small piece of data flow and/or control flow of the associated function. +The decompiler normalizes the feature vector representation so that different, but functionally equivalent, pieces of code often produce the same features. +Certain attributes, such as values of constants, names of registers, and data types, are intentionally not incorporated into the features.
+ +BSim vectors are compared using cosine similarity.
+Discrepancies between the vectors for foo
and bar
which are caused by differences in compilers, target architectures, and/or small changes to the source code typically result in vectors which are close but not identical.
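
To make the comparison concrete, here is a minimal sketch of cosine similarity over sparse feature vectors. The feature ids and weights below are invented for illustration; real BSim features and their weights are produced by the decompiler and the database configuration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors given as {feature: weight}."""
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_u * norm_v)


# Hypothetical vectors: foo and bar share three of their four features.
foo = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}
bar = {1: 1.0, 2: 1.0, 3: 1.0, 5: 1.0}
```

With these values, `cosine_similarity(foo, bar)` is 0.75 and `cosine_similarity(foo, foo)` is 1.0, which illustrates "close but not identical" versus an exact match.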
BSim vectors can be stored in a dedicated database. +BSim databases intended to hold large1 numbers of vectors maintain an index based on locality-sensitive hashing. +The index drastically reduces the number of vector comparisons needed and allows for rapid retrieval of results.
+ +Querying foo
against a BSim database typically yields a number of potential matches.
+Each individual match for foo
can be compared to foo
in a side-by-side view, and certain information (such as function name) can be quickly copied from a match to foo
.
We frequently call BSim vectors the BSim signature of a function, or just the signature when the context is clear.
+ +We can think of each feature as representing a small piece of the behavior of a function, analogous to a snippet of source code. +Functions whose BSim vectors are close typically have many features in common, that is, they have similar behavior. +Hence the name “BSim”: Behavioral Similiarity.
+ +Using BSim involves the following components:
+ +There are three supported database backends for BSim:
+ +PostgreSQL
+ +Elasticsearch
+ +BSimElasticPlugin
extension contains an Elasticsearch plugin for BSim.H2
+ +Next Section: Starting Ghidra and Enabling BSim
+ +Creating a database requires a database template, which determines the specifics of the index. Currently, Ghidra provides a medium template, intended for databases holding up to 10 million unique vectors, and a large template, intended for databases holding up to 100 million unique vectors. ↩
+An Overview Query queries a BSim database for the number of matches to each function in an executable. +The matching functions themselves are not returned. +Similarity and Confidence thresholds can be set for an Overview Query, but there is no “Matches per Function” bound and no filters can be applied.
+ +To perform an Overview Query, select BSim -> Perform Overview… from the Code Browser.
+ +postgres
using the default query thresholds.
+You should see the following result:
+Using the hit count column, it is possible to exclude functions with large numbers of matches.
+ +demangler_gnu_v2_41
is far down the list.Suppose foo
and bar
have the same number of hits in the Overview table.
+There are two possibilities:
foo
and bar
have distinct feature vectors which happen to have the same number of matches.foo
and bar
have the same feature vector.An optional column, Vector Hash, can be used to distinguish between these two cases.
+ +Shift-C
or right-click and perform the Compare Selected Functions action.Next Section: Queries and Filters
diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Overview_Queries.md b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Overview_Queries.md index 2286673cee..b2034c01bc 100644 --- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Overview_Queries.md +++ b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Overview_Queries.md @@ -2,11 +2,11 @@ An **Overview Query** queries a BSim database for the number of matches to each function in an executable. The matching functions themselves are not returned. -Similarity and Confidence thresholds can be set for an Overview query, but there is no "Matches per Function" bound and no filters can be set. +Similarity and Confidence thresholds can be set for an Overview Query, but there is no "Matches per Function" bound and no filters can be applied. To perform an Overview Query, select **BSim -> Perform Overview...** from the Code Browser. -## Exercise 1: Hit Counts and Self-Significance. +## Exercise: Hit Counts and Self-Significance 1. Perform an Overview query on ``postgres`` using the default query thresholds. You should see the following result: @@ -14,9 +14,9 @@ You should see the following result: 1. Sort the table by the "Hit Count" column in ascending order. Typically, the functions with the largest hit counts will have low self-significance. Verify that that is the case for this table. 1. Q: Examine the functions with the highest hit count. Why are there so many matches for these functions? -Finally, we briefly mention a few other topics related to BSim.
+ +There are a number of example scripts in the BSim
script category, which demonstrate how to interact with BSim programmatically.
Finally, if you’d like to see the particular BSim features in a function, you can use the BSim Feature Visualizer. +This plugin allows you to highlight regions of the decompiled code corresponding to a particular feature and to display a graph representing the feature.
+ +To use this plugin, first enable the BSimFeatureVisualizerPlugin
via File -> Configure from the Code Browser.
+You can then bring it up via BSim -> BSim Feature Visualizer.
This is the end of the tutorial.
+ + diff --git a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Scripting.md b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Scripting.md index 37be02df9b..37f52720f7 100644 --- a/GhidraDocs/GhidraClass/BSim/BSimTutorial_Scripting.md +++ b/GhidraDocs/GhidraClass/BSim/BSimTutorial_Scripting.md @@ -4,9 +4,9 @@ Finally, we briefly mention a few other topics related to BSim. ## Scripting BSim -There are are number of example scripts in the ``BSim`` script category, which demonstrate how to interact with BSim programmatically: +There are are number of example scripts in the ``BSim`` script category, which demonstrate how to interact with BSim programmatically. - + ## Visualizing Features @@ -14,10 +14,10 @@ Finally, if you'd like to see the particular BSim features in a function, you ca This plugin allows you to highlight regions of the decompiled code corresponding to a particular feature and to display a graph representing the feature. To use this plugin, first enable the ``BSimFeatureVisualizerPlugin`` via **File -> Configure** from the Code Browser. -You can then bring it via **BSim -> BSim Feature Visualizer**. +You can then bring it up via **BSim -> BSim Feature Visualizer**. - + This is the end of the tutorial. -[Return to the Beginning](README.md) \ No newline at end of file +[Return to the Beginning](README.md) diff --git a/GhidraDocs/GhidraClass/BSim/README.html b/GhidraDocs/GhidraClass/BSim/README.html new file mode 100755 index 0000000000..e6b4c0082f --- /dev/null +++ b/GhidraDocs/GhidraClass/BSim/README.html @@ -0,0 +1,24 @@ +BSim is a Ghidra plugin for finding structurally similar functions in (potentially large) collections of binaries. +It is based on Ghidra’s decompiler and can find matches across compilers, architectures, and/or small changes to source code.
+ +This tutorial demonstrates how to create a small BSim database and walks through some typical use cases.
+ +Detailed information about BSim can be found in the “BSim” entry of the Ghidra Help.
+ +Next Section: Introduction to BSim
diff --git a/GhidraDocs/GhidraClass/BSim/images/Plus2.png b/GhidraDocs/GhidraClass/BSim/images/Plus2.png new file mode 100644 index 0000000000..add4ad53dd Binary files /dev/null and b/GhidraDocs/GhidraClass/BSim/images/Plus2.png differ diff --git a/GhidraDocs/GhidraClass/BSim/images/script_manager.png b/GhidraDocs/GhidraClass/BSim/images/script_manager.png index 295704e53b..fac7a09989 100644 Binary files a/GhidraDocs/GhidraClass/BSim/images/script_manager.png and b/GhidraDocs/GhidraClass/BSim/images/script_manager.png differ diff --git a/GhidraDocs/certification.manifest b/GhidraDocs/certification.manifest index bc47c5fc0c..e6168f473a 100644 --- a/GhidraDocs/certification.manifest +++ b/GhidraDocs/certification.manifest @@ -21,18 +21,31 @@ GhidraClass/AdvancedDevelopment/GhidraAdvancedDevelopment.html||GHIDRA|||This fi GhidraClass/AdvancedDevelopment/GhidraAdvancedDevelopment_withNotes.html||Public Domain|||Slight modification of code that is available for distribution, without restrictions, (original extremely permissive wtf license allows us to change IP to Public Domain),from https://github.com/paulrouget/dzslides.|END| GhidraClass/AdvancedDevelopment/Images/GhidraLogo64.png||GHIDRA||||END| GhidraClass/AdvancedDevelopment/Images/highLevelClasses.png||GHIDRA||||END| +GhidraClass/BSim/BSimTutorial_BSim_Command_Line.html||GHIDRA||||END| GhidraClass/BSim/BSimTutorial_BSim_Command_Line.md||GHIDRA||||END| +GhidraClass/BSim/BSimTutorial_Basic_Queries.html||GHIDRA||||END| GhidraClass/BSim/BSimTutorial_Basic_Queries.md||GHIDRA||||END| +GhidraClass/BSim/BSimTutorial_Creating_Database_From_GUI.html||GHIDRA||||END| GhidraClass/BSim/BSimTutorial_Creating_Database_From_GUI.md||GHIDRA||||END| +GhidraClass/BSim/BSimTutorial_Enabling.html||GHIDRA||||END| GhidraClass/BSim/BSimTutorial_Enabling.md||GHIDRA||||END| +GhidraClass/BSim/BSimTutorial_Evaluating_Matches.html||GHIDRA||||END| GhidraClass/BSim/BSimTutorial_Evaluating_Matches.md||GHIDRA||||END| 
+GhidraClass/BSim/BSimTutorial_Exe_Results.html||GHIDRA||||END| GhidraClass/BSim/BSimTutorial_Exe_Results.md||GHIDRA||||END| +GhidraClass/BSim/BSimTutorial_Filters.html||GHIDRA||||END| GhidraClass/BSim/BSimTutorial_Filters.md||GHIDRA||||END| +GhidraClass/BSim/BSimTutorial_Ghidra_Command_Line.html||GHIDRA||||END| GhidraClass/BSim/BSimTutorial_Ghidra_Command_Line.md||GHIDRA||||END| +GhidraClass/BSim/BSimTutorial_Intro.html||GHIDRA||||END| GhidraClass/BSim/BSimTutorial_Intro.md||GHIDRA||||END| +GhidraClass/BSim/BSimTutorial_Overview_Queries.html||GHIDRA||||END| GhidraClass/BSim/BSimTutorial_Overview_Queries.md||GHIDRA||||END| +GhidraClass/BSim/BSimTutorial_Scripting.html||GHIDRA||||END| GhidraClass/BSim/BSimTutorial_Scripting.md||GHIDRA||||END| +GhidraClass/BSim/README.html||GHIDRA||||END| GhidraClass/BSim/README.md||GHIDRA||||END| +GhidraClass/BSim/images/Plus2.png||GHIDRA||||END| GhidraClass/BSim/images/actions.png||GHIDRA||||END| GhidraClass/BSim/images/basic_query.png||GHIDRA||||END| GhidraClass/BSim/images/bsim_search_dialog.png||GHIDRA||||END|