GP-4009 Introduced BSim functionality including support for postgresql,
elasticsearch and h2 databases.  Added BSim correlator to Version Tracking.
caheckman 2023-11-17 01:13:42 +00:00 committed by ghidra1
parent f0f5b8f2a4
commit 0865a3dfb0
509 changed files with 77125 additions and 934 deletions

@ -0,0 +1,67 @@
<?xml version='1.0' encoding='ISO-8859-1' ?>
<!--
This is an XML file intended to be parsed by the Ghidra help system. It is loosely based
upon the JavaHelp table of contents document format. The Ghidra help system uses a
TOC_Source.xml file to allow a module with help to define how its contents appear in the
Ghidra help viewer's table of contents. The main document (in the Base module)
defines a basic structure for the
Ghidra table of contents system. Other TOC_Source.xml files may use this structure to insert
their files directly into this structure (and optionally define a substructure).
In this document, a tag can be either a <tocdef> or a <tocref>. The former is a definition
of an XML item that may have a link and may contain other <tocdef> and <tocref> children.
<tocdef> items may be referred to in other documents by using a <tocref> tag with the
appropriate id attribute value. Using these two tags allows any module to define a place
in the table of contents system (<tocdef>), which also provides a place for
other TOC_Source.xml files to insert content (<tocref>).
During the help build time, all TOC_Source.xml files will be parsed and validated to ensure
that all <tocref> tags point to valid <tocdef> tags. From these files will be generated
<module name>_TOC.xml files, which are table of contents files written in the format
desired by the JavaHelp system. Additionally, the generated files will be merged together
as they are loaded by the JavaHelp system. In the end, when displaying help in the Ghidra
help GUI, there will be one table of contents that has been created from the definitions in
all of the modules' TOC_Source.xml files.
Tags and Attributes
<tocdef>
-id - the name of the definition (this must be unique across all TOC_Source.xml files)
-text - the display text of the node, as seen in the help GUI
-target** - the file to display when the node is clicked in the GUI
-sortgroup - this is a string that defines where a given node should appear under a given
parent. The string values will be sorted by the JavaHelp system using
a java.text.RuleBasedCollator. If this attribute is not specified, then
the text attribute will be used.
<tocref>
-id - The id of the <tocdef> that this reference points to
**The URL for the target is relative and should start with 'help/topics'. This text is
used by the Ghidra help system to provide a universal starting point for all links so that
they can be resolved at runtime, across modules.
-->
<tocroot>
<tocref id="Ghidra Functionality">
<tocref id="Version Tracking">
<tocref id="VTCorrelators">
<tocdef id="BSimCorrelator"
text="BSim Program Correlator"
target="help/topics/BSimCorrelator/BSim_Correlator.html" />
</tocref>
</tocref>
</tocref>
</tocroot>
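
The header comment above says that, at help build time, every <tocref> must point to a <tocdef> defined in some TOC_Source.xml file. The following is a minimal sketch of that check, not the Ghidra help build's actual validator. The class name, method names, and the idea of passing documents as strings are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Illustrative only: a hypothetical stand-in for the help build's validation step. */
class TocRefCheckSketch {

    /** Collects the id attribute of every element with the given tag name across all documents. */
    static Set<String> collectIds(List<String> xmlDocs, String tag) {
        Set<String> ids = new TreeSet<>();
        for (String xml : xmlDocs) {
            try {
                Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
                NodeList nodes = doc.getElementsByTagName(tag);
                for (int i = 0; i < nodes.getLength(); i++) {
                    ids.add(((Element) nodes.item(i)).getAttribute("id"));
                }
            }
            catch (Exception e) {
                throw new IllegalArgumentException("Could not parse TOC document", e);
            }
        }
        return ids;
    }

    /** Returns the tocref ids that have no matching tocdef id in any of the documents. */
    static Set<String> unresolvedRefs(List<String> xmlDocs) {
        Set<String> refs = collectIds(xmlDocs, "tocref");
        refs.removeAll(collectIds(xmlDocs, "tocdef"));
        return refs;
    }
}
```

Given the file above on its own, the sketch would report its three <tocref> ids as unresolved. They resolve only once the documents that define "Ghidra Functionality", "Version Tracking" and "VTCorrelators" are included in the same list.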

@ -0,0 +1,99 @@
<!DOCTYPE doctype PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN">
<HTML>
<HEAD>
<META name="generator" content=
"HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net">
<TITLE>BSim Program Correlator</TITLE>
<META http-equiv="Content-Type" content="text/html; charset=windows-1252">
<LINK rel="stylesheet" type="text/css" href="help/shared/DefaultStyle.css">
</HEAD>
<BODY lang="EN-US">
<H1><A name="BSim_Correlator"></A>BSim Program Correlator</H1>
<BLOCKQUOTE>
<P>The BSim <A href="help/topics/VersionTrackingPlugin/VT_Correlators.html">Program
Correlator</A> uses the decompiler to generate confidence scores between potentially matching
functions in the source and destination programs. Function call-graphs are used to further
boost the scores and distinguish between conflicting matches.</P>
<P>The decompiler generates a formal feature vector for a function, where individual features
are extracted from the control-flow and data-flow characteristics of its normalized p-code
representation. </P>
<P>Functions are compared by comparing their corresponding feature vectors, from which
similarity and confidence scores are extracted.</P>
<P>A confidence score, for this correlator, is an open-ended floating-point value
(ranging from -infinity to +infinity) describing the amount of correspondence between the
control-flow and data-flow of two functions. A good working range for setting thresholds
(below) and for describing function pairs with some matching features is 0.0 to 100.0.
A score of 0.0 corresponds to functions with roughly equal amounts of similar and dissimilar features.
A score of 10.0 is typical for small identical functions, and 100.0 is achieved by pairs
of larger identical functions.</P>
<P>The correlator initially collects high confidence (high scoring) matches as a "seed" set.
Then, using call-graph information, the seed matches are extended to additional matches
throughout the programs.</P>
<P>There are four options for the BSim Program Correlator:</P>
<P><B>Confidence Threshold for a Match</B></P>
<BLOCKQUOTE>
<P>This option sets the threshold for accepting
a new match by following the call-graph from a previously accepted pair of matching functions.
Because potential pairs are drawn from the local call-graph neighborhood of an
accepted pair, this threshold is typically set lower than the seed threshold.</P>
</BLOCKQUOTE>
<P><B>Confidence Threshold for a Seed</B></P>
<BLOCKQUOTE>
<P>This establishes the threshold for choosing
potential matches as part of the initial "seed" set. Be careful setting this threshold
lower than the default, as any false match in the initial seed set is more likely to propagate.</P>
</BLOCKQUOTE>
<P><B>Memory Model</B></P>
<BLOCKQUOTE>
<P>The memory model option selects how much memory to use for finding
matches. If you run out of memory correlating large programs, lower this choice to "Medium"
or "Small". Note, however, that correlation may then be slightly less accurate.</P>
</BLOCKQUOTE>
<P><B>Use Accepted Matches as Seeds</B></P>
<BLOCKQUOTE>
<P>This option indicates whether to include
previously accepted matches, typically from other correlators, in the initial "seed" set.
The BSim Program Correlator will still try to find additional seed matches to merge
with the already accepted matches. To use only the incoming accepted
matches, set the Confidence Threshold for a Seed extremely high (for example, 99999999).
Be careful to accept only high confidence matches prior to using this option, as
any errors in the initial seed set are more likely to propagate.</P>
</BLOCKQUOTE>
</BLOCKQUOTE><!-- Main content blockquote -->
<P class="relatedtopic">Related Topics:</P>
<UL>
<LI><A href="help/topics/VersionTrackingPlugin/VT_Correlators.html">Version Tracking Program
Correlators</A></LI>
<LI><A href="help/topics/VersionTrackingPlugin/VT_Wizard.html">Version Tracking
Wizard</A></LI>
<LI><A href="help/topics/VersionTrackingPlugin/VT_Tool.html">Version Tracking Tool</A></LI>
<LI><A href="help/topics/VersionTrackingPlugin/Version_Tracking_Intro.html">Version Tracking
Introduction</A></LI>
</UL><BR>
<BR>
</BODY>
</HTML>
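
The help page above describes two thresholds: a higher one for choosing seeds and a lower one for accepting matches found by following the call-graph from an already accepted pair. The sketch below is a toy model of that idea under simplifying assumptions. Functions are plain strings, only callees are followed, and conflicts are resolved greedily by score. The class, its `Candidate` type and the `match` method are hypothetical and are not the correlator's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: a toy seed-then-extend matcher, not the BSim algorithm itself. */
class SeedAndExtendSketch {

    /** Hypothetical candidate pair: function names plus a confidence score. */
    static final class Candidate {
        final String src;
        final String dst;
        final double conf;

        Candidate(String src, String dst, double conf) {
            this.src = src;
            this.dst = dst;
            this.conf = conf;
        }
    }

    /** Returns accepted matches as a source-name to destination-name map. */
    static Map<String, String> match(List<Candidate> candidates,
            Map<String, List<String>> srcCalls, Map<String, List<String>> dstCalls,
            double seedThreshold, double matchThreshold) {
        // Visit the best-scoring candidates first
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Candidate c) -> c.conf).reversed());

        Map<String, String> accepted = new LinkedHashMap<>();
        Set<String> usedDst = new HashSet<>();
        Deque<Candidate> work = new ArrayDeque<>();

        // Stage 1: seeds must clear the higher threshold on their own
        for (Candidate c : sorted) {
            if (c.conf >= seedThreshold && tryAccept(c, accepted, usedDst)) {
                work.add(c);
            }
        }
        // Stage 2: callees of an accepted pair only need to clear the lower threshold
        while (!work.isEmpty()) {
            Candidate parent = work.poll();
            List<String> srcCallees = srcCalls.getOrDefault(parent.src, List.of());
            List<String> dstCallees = dstCalls.getOrDefault(parent.dst, List.of());
            for (Candidate c : sorted) {
                if (c.conf >= matchThreshold && srcCallees.contains(c.src) &&
                    dstCallees.contains(c.dst) && tryAccept(c, accepted, usedDst)) {
                    work.add(c);
                }
            }
        }
        return accepted;
    }

    /** Accepts the pair unless either function is already part of an accepted match. */
    private static boolean tryAccept(Candidate c, Map<String, String> accepted,
            Set<String> usedDst) {
        if (accepted.containsKey(c.src) || usedDst.contains(c.dst)) {
            return false;
        }
        accepted.put(c.src, c.dst);
        usedDst.add(c.dst);
        return true;
    }
}
```

In this model a low-scoring pair is accepted only when it sits in the call-graph neighborhood of an accepted pair, which is why the match threshold can be set lower than the seed threshold.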

@ -0,0 +1,350 @@
/* ###
* IP: GHIDRA
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package ghidra.feature.vt.api;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import generic.cache.CachingPool;
import generic.cache.CountingBasicFactory;
import generic.concurrent.QCallback;
import generic.jar.ResourceFile;
import generic.lsh.LSHMemoryModel;
import generic.lsh.vector.*;
import ghidra.app.decompiler.*;
import ghidra.app.decompiler.parallel.ParallelDecompiler;
import ghidra.app.decompiler.signature.SignatureResult;
import ghidra.feature.vt.api.main.VTMatchInfo;
import ghidra.feature.vt.api.main.VTMatchSet;
import ghidra.feature.vt.api.util.VTAbstractProgramCorrelator;
import ghidra.feature.vt.api.util.VTFunctionSizeUtil;
import ghidra.features.bsim.query.GenSignatures;
import ghidra.framework.options.ToolOptions;
import ghidra.program.model.address.*;
import ghidra.program.model.lang.CompilerSpec.EvaluationModelType;
import ghidra.program.model.lang.LanguageID;
import ghidra.program.model.lang.PrototypeModel;
import ghidra.program.model.listing.*;
import ghidra.program.model.symbol.Reference;
import ghidra.program.model.symbol.ReferenceManager;
import ghidra.util.Msg;
import ghidra.util.exception.CancelledException;
import ghidra.util.task.TaskMonitor;
import ghidra.util.xml.SpecXmlUtils;
import ghidra.xml.NonThreadedXmlPullParserImpl;
import ghidra.xml.XmlPullParser;
/**
* Correlator which discovers functional matches by comparing data-flow feature vectors.
* An initial seed set of high confidence matches is chosen. The match set is extended
* from the seeds by using local neighborhoods around the accepted match to efficiently
* discover new matches.
*/
public class BSimProgramCorrelator extends VTAbstractProgramCorrelator {
private LSHVectorFactory vectorFactory;
private static final int TIMEOUT = 60;
public static final double SIMILARITY_THRESHOLD = 0.5;
// note that the utils function strips out thunks now so we just set
// minimum size to 0 assuming call graph will save us
public static final int FUNCTION_MINIMUM_SIZE = 0;
protected BSimProgramCorrelator(Program sourceProgram, AddressSetView sourceAddressSet,
Program destinationProgram, AddressSetView destinationAddressSet, ToolOptions options) {
super(sourceProgram, sourceAddressSet, destinationProgram, destinationAddressSet, options);
vectorFactory = new WeightedLSHCosineVectorFactory();
}
@Override
public String getName() {
return BSimProgramCorrelatorFactory.NAME;
}
@Override
protected void doCorrelate(VTMatchSet matchSet, TaskMonitor monitor) throws CancelledException {
ToolOptions options = getOptions();
LSHMemoryModel model = options.getEnum(BSimProgramCorrelatorFactory.MEMORY_MODEL,
BSimProgramCorrelatorFactory.MEMORY_MODEL_DEFAULT);
double confThreshold = options.getDouble(BSimProgramCorrelatorFactory.SEED_CONF_THRESHOLD,
BSimProgramCorrelatorFactory.SEED_CONF_THRESHOLD_DEFAULT);
double impThreshold = options.getDouble(BSimProgramCorrelatorFactory.IMPLICATION_THRESHOLD,
BSimProgramCorrelatorFactory.IMPLICATION_THRESHOLD_DEFAULT);
boolean useAcceptedMatchesAsSeeds =
options.getBoolean(BSimProgramCorrelatorFactory.USE_ACCEPTED_MATCHES_AS_SEEDS,
BSimProgramCorrelatorFactory.USE_ACCEPTED_MATCHES_AS_SEEDS_DEFAULT);
boolean useNamespace = false; // By default we don't have namespace info
boolean useCallRefs = false; // By default we use decompiler to generate callgraph
List<FunctionPair> result;
try {
LanguageID id1 = getSourceProgram().getLanguageID();
LanguageID id2 = getDestinationProgram().getLanguageID();
//Use special weights for LSHCosineVectors
ResourceFile defaultWeightsFile = GenSignatures.getWeightsFile(id1, id2);
if (defaultWeightsFile == null) {
// known limitation; hoped to be fixed in the future
Msg.showWarn(this, null, "Cannot Compare Programs",
"<html>Cannot currently compare programs with such different architectures.<br>" +
"Source program is " + id1.getIdAsString() + "<br>" +
"Destination program is " + id2.getIdAsString());
return;
}
if (defaultWeightsFile.getName().contains("cpool")) {
// With constant pool languages (Dalvik, JVM)
useNamespace = true; // We have reliable namespace info
useCallRefs = true; // We don't have absolute calls, use references
}
// Close the weights stream even if parsing fails
try (InputStream input = defaultWeightsFile.getInputStream()) {
XmlPullParser parser = new NonThreadedXmlPullParserImpl(input, "Vector weights parser",
SpecXmlUtils.getXmlHandler(), false);
vectorFactory.readWeights(parser);
}
monitor.setMessage("Generating source dictionary");
List<FunctionNode> rawSourceNodes =
generateNodes(getSourceProgram(), getSourceAddressSet(), useCallRefs, monitor);
FunctionNodeContainer sourceNodes =
new FunctionNodeContainer(getSourceProgram(), rawSourceNodes);
monitor.setMessage("Generating destination dictionary");
List<FunctionNode> rawDestNodes = generateNodes(getDestinationProgram(),
getDestinationAddressSet(), useCallRefs, monitor);
FunctionNodeContainer destNodes =
new FunctionNodeContainer(getDestinationProgram(), rawDestNodes);
BSimProgramCorrelatorMatching omni =
new BSimProgramCorrelatorMatching(sourceNodes, destNodes, vectorFactory,
confThreshold, impThreshold, SIMILARITY_THRESHOLD, useNamespace, model);
omni.discoverPotentialMatches(monitor);
if (!omni.generateSeeds(matchSet, useAcceptedMatchesAsSeeds, monitor)) {
Msg.info(this, "BSim Program Correlator could not find any seeds");
}
result = omni.doMatching(monitor); //Do the matching!
}
catch (InterruptedException e) {
Msg.error(this, "Error Correlating", e); // log the exception itself; its cause may be null
CancelledException cancelledException = new CancelledException();
cancelledException.initCause(e);
throw cancelledException;
}
catch (CancelledException ce) {
throw ce;
}
catch (Exception e) {
Msg.error(this, "Error Correlating", e); // log the exception itself; its cause may be null
CancelledException cancelledException = new CancelledException();
cancelledException.initCause(e);
throw cancelledException;
}
wrapUp(result, matchSet, monitor); // Display matches, print stuff, etc.
return;
}
private static void addExternalFunctions(Program program, List<FunctionNode> list,
LSHVectorFactory vFactory, TaskMonitor monitor) throws CancelledException {
FunctionIterator iter = program.getFunctionManager().getExternalFunctions();
// Create a generic feature vector to represent external functions
int[] externalFeatures = new int[1];
externalFeatures[0] = 0xfade5eed;
LSHVector externalVector = vFactory.buildVector(externalFeatures);
while (iter.hasNext()) {
monitor.checkCancelled();
Function func = iter.next();
FunctionNode node = new FunctionNode(func, externalVector, new ArrayList<Address>());
list.add(node);
}
}
private List<FunctionNode> generateNodes(final Program program, AddressSetView addrSet,
boolean useCallRefs, final TaskMonitor monitor)
throws InterruptedException, CancelledException, Exception {
monitor.checkCancelled();
CachingPool<DecompInterface> decompilerPool = new CachingPool<DecompInterface>(
new DecompilerFactory(program, vectorFactory.getSettings()));
ParallelDecompilerCallback callback =
new ParallelDecompilerCallback(decompilerPool, vectorFactory, useCallRefs);
List<FunctionNode> results = null;
try {
AddressSetView refinedAddressSet = VTFunctionSizeUtil.minimumSizeFunctionFilter(program,
addrSet, FUNCTION_MINIMUM_SIZE, monitor);
results = ParallelDecompiler.decompileFunctions(callback, program, refinedAddressSet,
monitor);
}
finally {
decompilerPool.dispose();
}
addExternalFunctions(program, results, vectorFactory, monitor);
monitor.setMessage("Collecting dictionary results");
return results;
}
private static void wrapUp(List<FunctionPair> result, final VTMatchSet matchSet,
final TaskMonitor monitor) throws CancelledException {
//Populate the table with matches.
monitor.setMessage("Adding results to database");
monitor.setIndeterminate(false);
monitor.initialize(result.size());
int ii = 0;
for (FunctionPair resMatch : result) {
VTMatchInfo match = resMatch.getMatch(matchSet);
++ii;
if (ii % 1000 == 0) {
monitor.checkCancelled();
monitor.incrementProgress(1000);
}
matchSet.addMatch(match);
}
return;
}
/**
* Establish decompiler options for the feature vector calculation
* @param program is the specific program to decompile
* @return the formal options object
*/
private static DecompileOptions getDecompilerOptions(Program program) {
DecompileOptions options = new DecompileOptions();
options.setNoCastPrint(true);
try {
final PrototypeModel model = program.getCompilerSpec()
.getPrototypeEvaluationModel(EvaluationModelType.EVAL_CURRENT);
options.setProtoEvalModel(model.getName());
}
catch (Exception e) {
Msg.warn(BSimProgramCorrelator.class,
"problem setting prototype evaluation model: " + e.getMessage());
}
options.setDefaultTimeout(TIMEOUT);
return options;
}
//==================================================================================================
// Inner Classes
//==================================================================================================
private static class DecompilerFactory extends CountingBasicFactory<DecompInterface> {
private Program program;
private int settings;
DecompilerFactory(Program program, int set) {
this.program = program;
settings = set;
}
@Override
public DecompInterface doCreate(int itemNumber) throws IOException {
DecompInterface decompiler = new DecompInterface();
decompiler.setOptions(getDecompilerOptions(program));
decompiler.setSignatureSettings(settings);
if (!decompiler.openProgram(program)) {
throw new IOException(decompiler.getLastMessage());
}
return decompiler;
}
@Override
public void doDispose(DecompInterface decompiler) {
decompiler.dispose();
}
}
private static class ParallelDecompilerCallback implements QCallback<Function, FunctionNode> {
private LSHVectorFactory vectorFactory;
private CachingPool<DecompInterface> pool;
private boolean callsByReference;
ParallelDecompilerCallback(CachingPool<DecompInterface> decompilerPool,
LSHVectorFactory vFactory, boolean refCalls) {
vectorFactory = vFactory;
this.pool = decompilerPool;
callsByReference = refCalls;
}
private ArrayList<Address> getCallAddressesByReference(Function function,
TaskMonitor monitor) throws CancelledException {
ArrayList<Address> resultList = new ArrayList<Address>();
Program program = function.getProgram();
ReferenceManager referenceManager = program.getReferenceManager();
AddressSetView addresses = function.getBody();
AddressIterator addressIterator = addresses.getAddresses(true);
while (addressIterator.hasNext()) {
monitor.checkCancelled();
Address address = addressIterator.next();
Reference[] referencesFrom = referenceManager.getReferencesFrom(address);
if (referencesFrom != null) {
for (Reference reference : referencesFrom) {
if (reference.getReferenceType().isCall()) {
resultList.add(reference.getToAddress());
}
}
}
}
return resultList;
}
@Override
public FunctionNode process(Function function, TaskMonitor monitor) throws Exception {
monitor.checkCancelled();
DecompInterface decompiler = pool.get();
try {
LSHVector vec = null;
ArrayList<Address> callAddresses = null;
SignatureResult sigres =
decompiler.generateSignatures(function, !callsByReference, TIMEOUT, monitor);
if (sigres == null) {
callAddresses = new ArrayList<Address>();
}
else {
vec = vectorFactory.buildVector(sigres.features);
if (callsByReference) {
callAddresses = getCallAddressesByReference(function, monitor);
}
else {
// It will take a second pass through the data to figure out how the call graph fits together.
callAddresses = sigres.calllist;
}
}
FunctionNode res = new FunctionNode(function, vec, callAddresses);
if (res.getVector() == null) {
String errmsg = decompiler.getLastMessage();
if (errmsg.startsWith("Bad command")) {
throw new DecompileException(BSimProgramCorrelatorFactory.NAME, errmsg);
}
}
return res;
}
finally {
pool.release(decompiler);
}
}
}
}
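
The `process` callback above borrows a decompiler from a pool, uses it, and returns it in a `finally` block so that it is released even when signature generation throws. The sketch below shows that borrow/release pattern with a deliberately simple pool. `SimplePoolSketch` and its methods are hypothetical and do not reproduce the behavior of Ghidra's `CachingPool`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Illustrative only: a minimal object pool, not Ghidra's CachingPool. */
class SimplePoolSketch<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private int created = 0;

    SimplePoolSketch(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Hands out an idle item if one exists, otherwise creates a new one. */
    synchronized T get() {
        T item = idle.poll();
        if (item == null) {
            item = factory.get();
            created++;
        }
        return item;
    }

    /** Returns an item to the pool so a later caller can reuse it. */
    synchronized void release(T item) {
        idle.push(item);
    }

    /** Number of items the factory has been asked to create so far. */
    synchronized int createdCount() {
        return created;
    }

    /** Borrow, use, and always release: the same shape as the callback's try/finally. */
    static int lengthWithPool(SimplePoolSketch<StringBuilder> pool, String text) {
        StringBuilder sb = pool.get();
        try {
            sb.setLength(0); // reset any state left by the previous borrower
            sb.append(text);
            return sb.length();
        }
        finally {
            pool.release(sb);
        }
    }
}
```

Because every borrower releases in `finally`, sequential calls reuse a single pooled object, and an expensive resource such as a decompiler process is not recreated for each function.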

@ -0,0 +1,105 @@
/* ###
* IP: GHIDRA
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package ghidra.feature.vt.api;
import generic.lsh.LSHMemoryModel;
import ghidra.feature.vt.api.main.VTProgramCorrelator;
import ghidra.feature.vt.api.main.VTProgramCorrelatorAddressRestrictionPreference;
import ghidra.feature.vt.api.util.VTAbstractProgramCorrelatorFactory;
import ghidra.feature.vt.api.util.VTOptions;
import ghidra.program.model.address.AddressSetView;
import ghidra.program.model.listing.Program;
import ghidra.util.HelpLocation;
public class BSimProgramCorrelatorFactory extends VTAbstractProgramCorrelatorFactory {
public static final String NAME = "BSim Function Matching";
public static final String DESC =
"Finds function matches by using data flow and call graph similarities between the " +
"source and destination programs.";
public static final String MEMORY_MODEL = "Memory Model";
public static final LSHMemoryModel MEMORY_MODEL_DEFAULT = LSHMemoryModel.LARGE;
public static final String MEMORY_MODEL_DESC =
"Amount of memory used to compute matches. Smaller models are slightly less accurate.";
public static final String SEED_CONF_THRESHOLD = "Confidence Threshold for a Seed";
public static final double SEED_CONF_THRESHOLD_DEFAULT = 10.0;
public static final String SEED_CONF_THRESHOLD_DESC =
"For threshold N, the probability that a seed is incorrect is approximately 1/2^(N/5+9).";
public static final String IMPLICATION_THRESHOLD = "Confidence Threshold for a Match";
public static final double IMPLICATION_THRESHOLD_DEFAULT = 0.0;
public static final String IMPLICATION_THRESHOLD_DESC =
"For threshold N, the probability that a match is incorrect is approximately 1/2^(N/5+9).";
public static final String USE_ACCEPTED_MATCHES_AS_SEEDS = "Use Accepted Matches as Seeds";
public static final boolean USE_ACCEPTED_MATCHES_AS_SEEDS_DEFAULT = true;
public static final String USE_ACCEPTED_MATCHES_AS_SEEDS_DESC =
"Already accepted matches will also be used as seeds.";
@Override
public int getPriority() {
return 50;
}
@Override
protected VTProgramCorrelator doCreateCorrelator(Program sourceProgram,
AddressSetView sourceAddressSet, Program destinationProgram,
AddressSetView destinationAddressSet, VTOptions options) {
return new BSimProgramCorrelator(sourceProgram, sourceAddressSet, destinationProgram,
destinationAddressSet, options);
}
@Override
public VTProgramCorrelatorAddressRestrictionPreference getAddressRestrictionPreference() {
return VTProgramCorrelatorAddressRestrictionPreference.RESTRICTION_NOT_ALLOWED;
}
@Override
public VTOptions createDefaultOptions() {
VTOptions options = new VTOptions(NAME);
HelpLocation help = new HelpLocation("BSimCorrelator", "BSim_Correlator");
options.setEnum(MEMORY_MODEL, MEMORY_MODEL_DEFAULT);
options.registerOption(MEMORY_MODEL, MEMORY_MODEL_DEFAULT, help, MEMORY_MODEL_DESC);
options.setDouble(SEED_CONF_THRESHOLD, SEED_CONF_THRESHOLD_DEFAULT);
options.registerOption(SEED_CONF_THRESHOLD, SEED_CONF_THRESHOLD_DEFAULT, help,
SEED_CONF_THRESHOLD_DESC);
options.setDouble(IMPLICATION_THRESHOLD, IMPLICATION_THRESHOLD_DEFAULT);
options.registerOption(IMPLICATION_THRESHOLD, IMPLICATION_THRESHOLD_DEFAULT, help,
IMPLICATION_THRESHOLD_DESC);
options.setBoolean(USE_ACCEPTED_MATCHES_AS_SEEDS, USE_ACCEPTED_MATCHES_AS_SEEDS_DEFAULT);
options.registerOption(USE_ACCEPTED_MATCHES_AS_SEEDS, USE_ACCEPTED_MATCHES_AS_SEEDS_DEFAULT,
help, USE_ACCEPTED_MATCHES_AS_SEEDS_DESC);
options.setOptionsHelpLocation(help);
return options;
}
@Override
public String getDescription() {
return DESC;
}
@Override
public String getName() {
return NAME;
}
}
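
The option descriptions in the factory above state that, for a threshold N, the probability of an incorrect seed or match is approximately 1/2^(N/5+9). As a quick worked example of that stated formula, the sketch below evaluates it for the two defaults. The class and method names are hypothetical, and the formula is taken from the description strings rather than derived here.

```java
/** Illustrative only: evaluates the approximation quoted in the option descriptions. */
class ConfidenceErrorSketch {

    /** Approximate probability of an incorrect match for threshold N: 1 / 2^(N/5 + 9). */
    static double approxErrorProbability(double threshold) {
        return Math.pow(2.0, -(threshold / 5.0 + 9.0));
    }

    public static void main(String[] args) {
        // Seed default (10.0): 1 / 2^11 = 1/2048
        System.out.println(approxErrorProbability(10.0));
        // Match default (0.0): 1 / 2^9 = 1/512
        System.out.println(approxErrorProbability(0.0));
    }
}
```

Under this approximation each additional 5 points of threshold halves the estimated error rate, so the default seed threshold of 10.0 is four times stricter than the default match threshold of 0.0.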

@ -0,0 +1,787 @@
/* ###
* IP: GHIDRA
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package ghidra.feature.vt.api;
import java.util.*;
import java.util.Map.Entry;
import org.apache.commons.collections4.MultiValuedMap;
import org.apache.commons.collections4.multimap.HashSetValuedHashMap;
import generic.concurrent.*;
import generic.lsh.LSHMemoryModel;
import generic.lsh.vector.LSHVectorFactory;
import generic.lsh.vector.VectorCompare;
import ghidra.feature.vt.api.NeighborGenerator.NeighborhoodPair;
import ghidra.feature.vt.api.main.*;
import ghidra.program.model.address.Address;
import ghidra.program.model.listing.Function;
import ghidra.program.model.listing.Program;
import ghidra.util.Msg;
import ghidra.util.exception.CancelledException;
import ghidra.util.task.TaskMonitor;
/**
* Class for running the BSim function matching algorithm, which happens in stages:
* 1) Construct BSimProgramCorrelatorMatching with prepopulated FunctionNodeContainers, one each for the source and destination programs
* 2) Call discoverPotentialMatches to do raw vector comparisons among source and destination
* 3) Call generateSeeds to select an initial set of high confidence matches
* 4) Call doMatching to extend the seed set into a full list of matches
*/
public class BSimProgramCorrelatorMatching {
private SortedSet<PotentialPair> implications; // Current potential matches sorted by score
private FunctionNodeContainer sourceNodes; // Nodes (functions) associated with the source program
private FunctionNodeContainer destNodes; // Nodes associated with the destination program
private LSHVectorFactory vectorFactory; // Factory for generating weighted vectors for comparing nodes
private LinkedList<FunctionPair> matches; // The list of final matches
private Set<FunctionPair> seeds; // Initial set of match pairs used for growing out full set of matches
private List<FunctionPair> discoveredMatches; // Raw set of pairs of similar functions
private double confThreshold; // Initial confidence threshold for selecting seed matches
private double impThreshold; // Confidence threshold for extending to additional matches
private double potentialSimThreshold; // Similarity threshold used when discovering potential matches
private LSHMemoryModel memoryModel; // The memory model to use when binning vectors
private boolean useNamespaceNeighbors; // True if namespace information is used in matching
/**
* This class is used to lookup potential matches in the {@link BinningSystem} and do
* secondary testing by computing similarities of feature vectors.
* Searching happens in parallel.
*/
private class MatchingCallback implements QCallback<FunctionNode, List<FunctionPair>> {
private BinningSystem sourceBinning;
private double simThreshold;
MatchingCallback(BinningSystem sourceBinning, double simThreshold) {
this.sourceBinning = sourceBinning;
this.simThreshold = simThreshold;
}
@Override
public List<FunctionPair> process(FunctionNode queryNode, TaskMonitor monitor)
throws Exception {
monitor.checkCancelled();
if ((queryNode == null) || (queryNode.getVector() == null)) {
monitor.incrementProgress(1);
return null;
}
List<FunctionPair> associates = new LinkedList<FunctionPair>();
findSimilarNodes(associates, queryNode, monitor);
monitor.incrementProgress(1);
return associates;
}
/**
* Lookup potential matches for -queryNode- in the binning system,
* and perform secondary testing to see if we have a full (potential) match.
* Pairs that exceed the threshold are added to the -results- list
* @param results is the list of FunctionPairs passing the similarity test
* @param queryNode is the base FunctionNode to compare
* @param monitor is the TaskMonitor
* @throws CancelledException if the user cancels the correlation
*/
private void findSimilarNodes(List<FunctionPair> results, FunctionNode queryNode,
TaskMonitor monitor) throws CancelledException {
//Set up for matching via feature vector comparison.
Set<FunctionNode> neighbors = sourceBinning.lookup(queryNode);
VectorCompare veccompare = new VectorCompare();
//Check each neighbor from the system of binnings to see if they pass a round of matching.
for (FunctionNode neighbor : neighbors) {
monitor.checkCancelled();
//Feature vector computations
double similarity = neighbor.getVector().compare(queryNode.getVector(), veccompare);
if (similarity < simThreshold) {
continue;
}
double confidence = vectorFactory.calculateSignificance(veccompare);
//Create FunctionPair (bridge in the graph from source to dest)
FunctionPair newPair =
new FunctionPair(neighbor, queryNode, similarity, confidence);
results.add(newPair);
}
}
}
/**
* @param sourceNodes is the container for source functions
* @param destNodes is the container for destination functions
* @param vFactory is the factory for building feature vectors during analysis
* @param conf is the initial confidence threshold for seeds
* @param imp is the follow-on confidence for extending to additional matches
* @param sim is the similarity threshold used when discovering matches
* @param useNamespace true if namespace info is used to find additional matches
* @param model is the memory model to use when discovering seed matches
*/
public BSimProgramCorrelatorMatching(FunctionNodeContainer sourceNodes,
FunctionNodeContainer destNodes, LSHVectorFactory vFactory, double conf, double imp,
double sim, boolean useNamespace, LSHMemoryModel model) {
this.sourceNodes = sourceNodes;
this.destNodes = destNodes;
this.vectorFactory = vFactory;
confThreshold = conf;
impThreshold = imp;
potentialSimThreshold = sim;
useNamespaceNeighbors = useNamespace;
memoryModel = model;
implications = new TreeSet<PotentialPair>();
}
/**
* Formally accept a FunctionPair as a match. Update bookkeeping to indicate the match.
* @param bridge is the pair to accept as a match
*/
private void acceptMatch(FunctionPair bridge) {
FunctionNode sourceNode = bridge.getSourceNode();
FunctionNode destNode = bridge.getDestNode();
sourceNode.setAcceptedMatch(true);
destNode.setAcceptedMatch(true);
matches.add(bridge);
// Given the pair, remove the source and destination as potential matches from any other node.
Iterator<Entry<FunctionNode, FunctionPair>> iter = sourceNode.getAssociateIterator();
while (iter.hasNext()) {
iter.next().getKey().removeAssociate(sourceNode);
}
iter = destNode.getAssociateIterator();
while (iter.hasNext()) {
iter.next().getKey().removeAssociate(destNode);
}
sourceNode.clearAssociates(); // Clear old potential matches
destNode.clearAssociates();
}
/**
* Do vector comparisons between the source and destination FunctionNodes.
* Anything discovered that exceeds {@link #potentialSimThreshold} is placed into {@link #discoveredMatches}
* A {@link BinningSystem} is built, then individual FunctionNodes are searched in parallel.
* @param monitor is the TaskMonitor
* @throws Exception for user cancellation or other problems
*/
public void discoverPotentialMatches(TaskMonitor monitor) throws Exception {
BinningSystem binning = new BinningSystem(memoryModel);
monitor.setMessage("Binning source functions...");
monitor.initialize(sourceNodes.size());
binning.add(sourceNodes.iterator(), monitor);
monitor.setMessage("Zealously over-pairing matches...");
monitor.initialize(destNodes.size());
//
// Queue setup
//
GThreadPool pool = GThreadPool.getPrivateThreadPool("BSimProgramCorrelatorMatching");
QCallback<FunctionNode, List<FunctionPair>> callback =
new MatchingCallback(binning, potentialSimThreshold);
// @formatter:off
ConcurrentQ<FunctionNode, List<FunctionPair>> queue =
new ConcurrentQBuilder<FunctionNode, List<FunctionPair>>()
.setThreadPool(pool)
.setCollectResults(true)
.setMonitor(monitor)
.build(callback);
// @formatter:on
//
// Submit and wait for results
//
queue.add(destNodes.iterator());
Collection<QResult<FunctionNode, List<FunctionPair>>> results;
try {
results = queue.waitForResults();
}
finally {
queue.dispose();
}
discoveredMatches = new LinkedList<FunctionPair>();
for (QResult<FunctionNode, List<FunctionPair>> result : results) {
monitor.checkCancelled();
List<FunctionPair> pieces = result.getResult();
if (pieces == null) {
continue;
}
for (FunctionPair bridge : pieces) {
monitor.checkCancelled();
if (bridge != null) {
FunctionNode sourceNode = bridge.getSourceNode();
FunctionNode destNode = bridge.getDestNode();
sourceNode.addAssociate(destNode, bridge);
destNode.addAssociate(sourceNode, bridge);
discoveredMatches.add(bridge);
}
}
}
}
/**
* Find the last index in the (sorted) list where the confidence is >= threshold
* @param pairs is the sorted list
* @param threshold to find
* @return the index
*/
private static int findIndexMatchingThreshold(ArrayList<FunctionPair> pairs, double threshold) {
int min = 0;
int max = pairs.size() - 1;
while (min < max) {
int mid = (min + max + 1) / 2; // Guarantee if min != max, then mid != min
FunctionPair pair = pairs.get(mid);
if (pair.getConfResult() < threshold) {
max = mid - 1;
}
else {
min = mid;
}
}
return min;
}
/**
* Choose seed FunctionNode pairs with the highest confidence from among {@link #discoveredMatches}
* making sure there are no conflicts (a FunctionNode that is involved in multiple matches).
* Selection happens in rounds. During a round:
* a) "Accept" all pairs for which there is no immediate conflict
* b) If a pair has conflicts, throw it out if either:
* 1) The number of children is different between source and dest (difference > threshold)
* 2) The function length is different between source and dest (difference > threshold)
*
* Between rounds the "accepted" pairs and the "thrown out" pairs may remove conflicts from the
* remaining pairs. Each round the threshold for throwing out a conflict is tightened.
*
* The process terminates when no new pairs are accepted during a round.
* The accepted pairs are sorted by confidence, and those exceeding {@link #confThreshold} become
* the final seed set.
* @param monitor is the TaskMonitor
* @throws CancelledException if the user cancels the correlation
*/
private void chooseSeeds(TaskMonitor monitor) throws CancelledException {
monitor.setMessage("Generating seeds...");
ArrayList<FunctionPair> finalPairs = new ArrayList<FunctionPair>();
HashSet<FunctionNode> matchedSource = new HashSet<FunctionNode>(); // Source functions that are matched
HashSet<FunctionNode> matchedDest = new HashSet<FunctionNode>(); // Dest functions that are matched
MultiValuedMap<FunctionNode, FunctionPair> sourceHoldOn =
new HashSetValuedHashMap<FunctionNode, FunctionPair>(); // Conflicting source functions held for next round
MultiValuedMap<FunctionNode, FunctionPair> destHoldOn =
new HashSetValuedHashMap<FunctionNode, FunctionPair>(); // Conflicting dest functions held for next round
MultiValuedMap<FunctionNode, FunctionPair> sourceFormatted =
new HashSetValuedHashMap<FunctionNode, FunctionPair>(); // Current set of potential pairs, indexed by source
MultiValuedMap<FunctionNode, FunctionPair> destFormatted =
new HashSetValuedHashMap<FunctionNode, FunctionPair>(); // Current set of potential pairs, indexed by dest
for (FunctionPair pair : discoveredMatches) { // Copy putative matches into the "current" set of potential pairs
sourceFormatted.put(pair.getSourceNode(), pair);
destFormatted.put(pair.getDestNode(), pair);
}
discoveredMatches = null; // The raw match list is no longer needed beyond this point
int keepLen = sourceFormatted.size();
if (keepLen == 0) {
return;
}
boolean changed = true;
double ratioThresh = .5; // Initial threshold for throwing out pairs. Counts can differ by a factor of 2 to 1.
while (changed) { // Keep going until no change (no new pairs)
monitor.checkCancelled();
final Collection<FunctionPair> values = sourceFormatted.values();
monitor.initialize(values.size());
for (FunctionPair entry : values) {
monitor.checkCancelled();
monitor.incrementProgress(1);
if (!hasConflicts(entry, sourceFormatted, destFormatted)) { // Check for conflicts in our current set
finalPairs.add(entry); // Accept immediately if no conflicts
matchedSource.add(entry.getSourceNode());
matchedDest.add(entry.getDestNode());
}
else {
if (!matchedSource.contains(entry.getSourceNode()) &&
!matchedDest.contains(entry.getDestNode())) {
// If there is a conflict, but neither side has been matched yet,
// decide if we throw out pair by comparing count ratios to ratioThresh
// Compute "number of children" ratio
double leftside =
Math.min((double) entry.getSourceNode().getChildren().size(),
(double) entry.getDestNode().getChildren().size());
double rightside =
Math.max((double) entry.getSourceNode().getChildren().size(),
(double) entry.getDestNode().getChildren().size());
double childRatio = (rightside == 0 ? 0 : leftside / rightside); // Always <= 1.0
// Compute byte length ratio
leftside = (double) entry.getSourceNode().getLen() /
(double) entry.getDestNode().getLen();
double lenRatio = Math.min(leftside, 1 / leftside); // Always <= 1.0
if (lenRatio > ratioThresh && childRatio > ratioThresh) { // Test both ratios against threshold
// Keep (don't throw out) if both ratios exceed threshold
sourceHoldOn.put(entry.getSourceNode(), entry);
destHoldOn.put(entry.getDestNode(), entry);
}
}
}
}
sourceFormatted = sourceHoldOn; // Update our "current" set of sources
destFormatted = destHoldOn; // Update our "current" set of dests
changed = (keepLen != values.size()); // Did we get any new pairs this round?
keepLen = sourceHoldOn.values().size();
sourceHoldOn = new HashSetValuedHashMap<FunctionNode, FunctionPair>();
destHoldOn = new HashSetValuedHashMap<FunctionNode, FunctionPair>();
ratioThresh = (2 + ratioThresh) / 3; // Tighten the ratio threshold for next round
// Move closer to 1.0 threshold (counts are exactly equal)
}
if (finalPairs.isEmpty()) {
return; // found no seeds
}
Collections.sort(finalPairs, CONF_COMPARATOR);
double curConf = finalPairs.get(0).getConfResult();
if (curConf < confThreshold) {
Msg.warn(this, "Initial value of seed confidence too high (" + confThreshold +
")...resetting seed confidence to " + curConf);
confThreshold = curConf;
}
int lastIndex = findIndexMatchingThreshold(finalPairs, confThreshold); // Last index that still meets threshold
for (int i = 0; i < lastIndex + 1; ++i) {
FunctionPair pair = finalPairs.get(i);
seeds.add(pair);
}
}
private static boolean hasConflicts(FunctionPair entry,
MultiValuedMap<FunctionNode, FunctionPair> sourceFormatted,
MultiValuedMap<FunctionNode, FunctionPair> destFormatted) {
Collection<FunctionPair> sources = sourceFormatted.get(entry.getSourceNode());
if (sources != null && sources.size() > 1) {
return true;
}
Collection<FunctionPair> dests = destFormatted.get(entry.getDestNode());
if (dests != null && dests.size() > 1) {
return true;
}
return false;
}
/**
* Generate seed matches, placing the FunctionPair into the {@link #seeds} container.
* Seeds come from a) previously accepted matches and b) the {@link #discoveredMatches}
* @param matchSet is used to identify already accepted matches
* @param useAcceptedMatchesAsSeeds is true if previously accepted matches are considered seeds
* @param monitor is the TaskMonitor
* @return true if at least one seed was identified
* @throws CancelledException if the user cancels the correlation
*/
public boolean generateSeeds(VTMatchSet matchSet, boolean useAcceptedMatchesAsSeeds,
TaskMonitor monitor) throws CancelledException {
seeds = new HashSet<FunctionPair>();
if (useAcceptedMatchesAsSeeds) {
findAcceptedSeeds(matchSet, monitor);
}
chooseSeeds(monitor);
return !seeds.isEmpty();
}
/**
* Establish what neighborhood generation strategy will be used
* @param round - which round to build a strategy for
* @return an array of NeighborGenerators
*/
private NeighborGenerator[] buildNeighborGenerators(int round) {
ArrayList<NeighborGenerator> generatorList = new ArrayList<NeighborGenerator>();
if (round == 0) {
// For first round only collect new matches from "close" relationships (i.e. parent/child)
// of the seed match.
generatorList.add(new NeighborGenerator.Children(vectorFactory, impThreshold));
generatorList.add(new NeighborGenerator.Parents(vectorFactory, impThreshold));
// If the format includes explicit namespace information for functions,
// use it when generating new matches.
if (useNamespaceNeighbors) {
generatorList.add(
new NamespaceNeighborhood(vectorFactory, impThreshold, sourceNodes, destNodes));
}
}
else {
// For later rounds, also collect matches from more distant relationships (grandparent, grandchild, etc.)
generatorList.add(new NeighborGenerator.Children(vectorFactory, impThreshold));
generatorList.add(new NeighborGenerator.Parents(vectorFactory, impThreshold));
generatorList.add(new NeighborGenerator.GrandChildren(vectorFactory, impThreshold));
generatorList.add(new NeighborGenerator.Siblings(vectorFactory, impThreshold));
generatorList.add(new NeighborGenerator.Spouses(vectorFactory, impThreshold));
generatorList.add(new NeighborGenerator.GrandParents(vectorFactory, impThreshold));
if (useNamespaceNeighbors) {
generatorList.add(
new NamespaceNeighborhood(vectorFactory, impThreshold, sourceNodes, destNodes));
}
}
NeighborGenerator[] res = new NeighborGenerator[generatorList.size()];
generatorList.toArray(res);
return res;
}
/**
* Given a set of -seeds-, iteratively extend the set of matches.
* Loop, greedily picking the best relative match while maintaining score sorts and other bookkeeping.
* @param monitor is the TaskMonitor
* @return the final list of FunctionPairs as official matches
* @throws CancelledException if the user cancels the correlation
*/
public List<FunctionPair> doMatching(TaskMonitor monitor) throws CancelledException {
matches = new LinkedList<FunctionPair>();
for (int round = 0; round < 2; round++) {
monitor.checkCancelled();
NeighborGenerator[] generatorList = buildNeighborGenerators(round);
if (round == 0) {
monitor.setMessage("Matching round 1...");
monitor.initialize(seeds.size());
for (FunctionPair bridge : seeds) {
monitor.checkCancelled();
monitor.incrementProgress(1);
acceptMatch(bridge);
PotentialPair impliedPair = analyze(bridge, generatorList);
if (impliedPair != null) {
implications.add(impliedPair);
}
}
seeds = null; // seeds are no longer needed, free up memory
}
else {
implications.clear();
monitor.setMessage("Matching round 2...");
monitor.initialize(matches.size());
for (FunctionPair bridge : matches) {
monitor.checkCancelled();
monitor.incrementProgress(1);
PotentialPair impliedPair = analyze(bridge, generatorList);
if (impliedPair != null) {
implications.add(impliedPair);
}
}
}
monitor.setMessage("Gathering matches for round " + (round + 1) + "...");
int maxSize = implications.size();
monitor.initialize(maxSize + 1);
while (true) {
monitor.checkCancelled();
int size = implications.size();
if (size > maxSize) {
maxSize = size;
monitor.setMaximum(maxSize + 1);
}
monitor.setProgress((maxSize - size) + 1);
if (size == 0) {
break;
}
PotentialPair bestImplied = implications.last();
implications.remove(bestImplied);
FunctionPair bridge =
bestImplied.getSource().findEdge(bestImplied.getDestination());
if (bridge != null) {
acceptMatch(bridge);
PotentialPair impliedPair = analyze(bridge, generatorList);
if (impliedPair != null) {
implications.add(impliedPair);
}
}
// Let pair that produced this new match select a new PotentialPair
PotentialPair impliedPair = analyze(bestImplied.getOrigin(), generatorList);
if (impliedPair != null) {
implications.add(impliedPair);
}
if (implications.isEmpty() || implications.last().getScore() < impThreshold) {
break;
}
}
}
//Hole Patching
LinkedList<FunctionPair> matchCopy = new LinkedList<FunctionPair>(matches);
VectorCompare veccompare = new VectorCompare();
monitor.setMessage("Patching holes...");
monitor.initialize(matches.size());
for (FunctionPair bridge : matchCopy) {
monitor.checkCancelled();
monitor.incrementProgress(1);
if (bridge.getSourceNode().getParents().size() == 1 &&
bridge.getDestNode().getParents().size() == 1) {
FunctionNode sp = bridge.getSourceNode().getParents().iterator().next();
FunctionNode dp = bridge.getDestNode().getParents().iterator().next();
if (sp.findEdge(dp) == null && !sp.isAcceptedMatch() && !dp.isAcceptedMatch()) {
double similarity = sp.getVector().compare(dp.getVector(), veccompare);
double confidence = vectorFactory.calculateSignificance(veccompare);
FunctionPair rentBridge = new FunctionPair(sp, dp, similarity, confidence);
acceptMatch(rentBridge);
}
}
}
return matches;
}
//Compare pairs by confidence.
private static final Comparator<FunctionPair> CONF_COMPARATOR = new Comparator<FunctionPair>() {
@Override
public int compare(FunctionPair o1, FunctionPair o2) {
return Double.compare(o2.getConfResult(), o1.getConfResult());
}
};
/**
* Run through the VersionTrack match-set looking for matches between functions
* that have been formally marked as "accepted"
* @param myMatchSet is the match-set to examine
* @param monitor is the TaskMonitor
* @throws CancelledException if the user cancels the correlation
*/
private void findAcceptedSeeds(VTMatchSet myMatchSet, TaskMonitor monitor)
throws CancelledException {
monitor.setMessage("Using accepted matches as seeds...");
VTSession session = myMatchSet.getSession();
VTAssociationManager associationManager = session.getAssociationManager();
int associationCount = associationManager.getAssociationCount();
monitor.initialize(associationCount);
List<VTAssociation> associations = associationManager.getAssociations();
Program sourceProgram = sourceNodes.getProgram();
Program destinationProgram = destNodes.getProgram();
for (VTAssociation association : associations) {
monitor.checkCancelled();
if (association.getType().equals(VTAssociationType.FUNCTION) &&
association.getStatus() == VTAssociationStatus.ACCEPTED) {
Address sourceAddress = association.getSourceAddress();
Function sourceFunction = sourceProgram.getListing().getFunctionAt(sourceAddress);
Address destinationAddress = association.getDestinationAddress();
Function destinationFunction =
destinationProgram.getListing().getFunctionAt(destinationAddress);
if (sourceFunction != null && destinationFunction != null) {
FunctionNode sn = sourceNodes.get(sourceAddress);
if (sn != null) {
FunctionNode dn = destNodes.get(destinationAddress);
if (dn != null) {
FunctionPair bridge = sn.findEdge(dn);
if (bridge != null) {
seeds.add(bridge);
}
}
}
}
}
monitor.incrementProgress(1);
}
}
/**
* Given an accepted FunctionPair and methods for generating neighborhoods:
* for each generation method, generate a source neighborhood and a dest neighborhood
* and search for the pair between the two neighborhoods with the highest confidence score.
*
* @param pair is the accepted FunctionPair
* @param generatorList is the list of neighborhood generators
* @return the highest confidence pair across all pairs of neighborhoods
*/
private PotentialPair analyze(FunctionPair pair, NeighborGenerator[] generatorList) {
FunctionNode sourceNode = pair.getSourceNode();
FunctionNode destNode = pair.getDestNode();
double confResult = pair.getConfResult();
double implicationScore = 0;
PotentialPair bestImplied = null;
for (NeighborGenerator generator : generatorList) {
NeighborhoodPair nPair = generator.generate(sourceNode, destNode);
PotentialPair srcToDestPair =
calculateBestNeighbor(nPair.srcNeighbors, nPair.destNeighbors, confResult);
if (srcToDestPair.getScore() > implicationScore) {
implicationScore = srcToDestPair.getScore();
bestImplied = srcToDestPair;
}
PotentialPair destToSrcPair =
calculateBestNeighbor(nPair.destNeighbors, nPair.srcNeighbors, confResult);
destToSrcPair.swap(); // PotentialPair is returned with opposite from and to nodes
if (destToSrcPair.getScore() > implicationScore) {
implicationScore = destToSrcPair.getScore();
bestImplied = destToSrcPair;
}
}
if (bestImplied != null) {
bestImplied.setOrigin(pair);
}
return bestImplied;
}
/**
* Among a -range- of pairs with the same score, return a pair that does not conflict with
* any other pair in the range, i.e. the source and destination of the pair are not
* involved in another pair (with the same score).
* @param potentialPairs is the (ordered) set of pairs
* @param firstIndex is the start index of the range
* @param lastIndex is the last index of the range
* @return an unconflicted pair or null if none exist
*/
private static PotentialPair unconflictedPair(ArrayList<PotentialPair> potentialPairs,
int firstIndex, int lastIndex) {
for (int i = firstIndex; i <= lastIndex; i++) {
FunctionNode myFrom = potentialPairs.get(i).getSource();
FunctionNode myTo = potentialPairs.get(i).getDestination();
boolean useMe = true;
for (int j = firstIndex; j <= lastIndex; j++) { // Look for conflicts in entries with same score
if (i == j) {
continue;
}
FunctionNode yourFrom = potentialPairs.get(j).getSource();
FunctionNode yourTo = potentialPairs.get(j).getDestination();
if (myFrom == yourFrom || myTo == yourTo) {
useMe = false; // Conflict found. Can't use this one.
break;
}
}
if (useMe) { // No conflict found
return potentialPairs.get(i); // Use this entry
}
}
return null;
}
/**
* Adjust an original confidence score between functions -a- and -b-
* based on the likelihood of children matching and parents matching.
* @param conf is the original confidence
* @param a is one side of the function pair
* @param b is the other side
* @return the adjusted score
*/
private static double adjustConfidenceScore(double conf, FunctionNode a, FunctionNode b) {
final int childrenSize = b.getChildren().size();
double ratio = (childrenSize == 0 ? 0 : (double) a.getChildren().size() / childrenSize);
final double kidRatio = Math.min(ratio, 1 / ratio);
final int parentsSize = b.getParents().size();
ratio = (parentsSize == 0 ? 0 : (double) a.getParents().size() / parentsSize);
final double rentRatio = Math.min(ratio, 1 / ratio);
ratio = (double) a.getLen() / b.getLen();
final double lenRatio = Math.min(ratio, 1 / ratio);
return 0.25 * conf * lenRatio * (1 + kidRatio) * (1 + rentRatio);
}
/**
* Find the first PotentialPair where there is no conflict.
* Sort the pairs based on score, and divide them into ranges of equal score.
* Look for the first PotentialPair whose source and dest are not involved with any
* other pair within an equal score range.
* @param potentialPairs is the array of pairs
* @return the first (highest scoring) unconflicted pair (or null)
*/
private static PotentialPair findFirstUnconflictedPair(
ArrayList<PotentialPair> potentialPairs) {
Collections.sort(potentialPairs); // Sort pairs based on score
int lastIndex = potentialPairs.size() - 1;
while (lastIndex >= 0) {
double score = potentialPairs.get(lastIndex).getScore();
int firstIndex = lastIndex - 1;
while (firstIndex >= 0 && potentialPairs.get(firstIndex).getScore() >= score) {
firstIndex -= 1;
}
PotentialPair bestPair = unconflictedPair(potentialPairs, firstIndex + 1, lastIndex);
if (bestPair != null) {
return bestPair;
}
lastIndex = firstIndex;
}
return PotentialPair.EMPTY_PAIR; // No match found. We get here in the case of conflict-only matrices.
}
/**
* Given matching neighborhoods, look at "matrix" of scores for pairs across them.
* Return the most likely pair.
* @param aNeighbors is the first neighborhood
* @param bNeighbors is the second neighborhood
* @param confResult is the confidence score associated with the accepted match
* @return the most likely pair as a PotentialPair
*/
private PotentialPair calculateBestNeighbor(Set<FunctionNode> aNeighbors,
Set<FunctionNode> bNeighbors, double confResult) {
ArrayList<PotentialPair> potentialPairs = new ArrayList<PotentialPair>();
PotentialPair bestPair = PotentialPair.EMPTY_PAIR;
int bestCount = 0; // Number of pairs with the same (currently) best score
// CRITICAL LOOP
for (FunctionNode relative : aNeighbors) { // For every function in the source neighborhood
if (relative.isAcceptedMatch()) {
continue;
}
double bestAdjustedScore = 0; // Best score you're seeing for just this relative.
double relSum = 0; // Sum of relative's scores for associates...for normalizing.
double bestOriginalScore = 0; // So that we can recover the entry without computation.
FunctionNode bestRelAssoc = null; // The highest scoring associate
// CRITICAL INNER LOOP
Iterator<Entry<FunctionNode, FunctionPair>> iter = relative.getAssociateIterator();
while (iter.hasNext()) { // Run through every putative match to -relative-
Entry<FunctionNode, FunctionPair> entry = iter.next();
final FunctionNode associate = entry.getKey();
final double value = entry.getValue().getConfResult();
if (bNeighbors.contains(associate)) { // Does the dest side of the match lie in dest neighborhood
double entryAdjusted = adjustConfidenceScore(value, relative, associate);
relSum += entryAdjusted; // Keep track of score sum for normalization
if (entryAdjusted >= bestAdjustedScore) { // Keep track of highest scoring pair
bestAdjustedScore = entryAdjusted;
bestRelAssoc = associate;
bestOriginalScore = value;
}
}
}
if (relSum > 0) {
// Compute a final score that takes into account the dimensions of the neighborhoods
// and scores of other potential pairs across the neighborhoods
double tempMax = bNeighbors.size() * (bestOriginalScore + confResult) *
bestAdjustedScore / relSum;
PotentialPair newPair = new PotentialPair(relative, bestRelAssoc, tempMax);
potentialPairs.add(newPair);
if (tempMax > bestPair.getScore()) { // We have seen a new maximum.
bestPair = newPair; // Keep track of the new best
bestCount = 1; // Restart the counter
}
else if (tempMax == bestPair.getScore()) { // A tie score with the current best
bestCount += 1;
}
}
}
if (bestCount == 0 || bestPair.getScore() == 0) {
return PotentialPair.EMPTY_PAIR; // The default null object passed for nothing found.
}
if (bestCount == 1) { // There is a unique best entry. Use it.
return bestPair;
}
return findFirstUnconflictedPair(potentialPairs); // The best pair is a tie, we need to go deeper into the list
}
}
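
As a reading aid for adjustConfidenceScore in the file above, here is a minimal standalone sketch of the same scaling, written against plain integer counts rather than FunctionNode objects. The class name ConfidenceAdjustSketch and the helper symmetricRatio are hypothetical and not part of the commit. The sketch assumes, as the original does, that a ratio counts as 0 when either side has no children or parents.

```java
// Hypothetical standalone sketch; not part of the commit.
final class ConfidenceAdjustSketch {

    // Symmetric ratio in [0, 1]: 1.0 when the counts are equal,
    // 0.0 when either count is zero (assumption mirroring the original's zero handling).
    static double symmetricRatio(int a, int b) {
        if (a == 0 || b == 0) {
            return 0.0;
        }
        return Math.min((double) a / b, (double) b / a);
    }

    // Scale a raw confidence by how well the two functions agree on
    // body length, number of children, and number of parents.
    static double adjust(double conf, int kidsA, int kidsB, int rentsA, int rentsB,
            int lenA, int lenB) {
        double kidRatio = symmetricRatio(kidsA, kidsB);
        double rentRatio = symmetricRatio(rentsA, rentsB);
        double lenRatio = symmetricRatio(lenA, lenB);
        return 0.25 * conf * lenRatio * (1 + kidRatio) * (1 + rentRatio);
    }

    public static void main(String[] args) {
        // Perfect agreement on all three measures leaves the confidence unchanged.
        System.out.println(adjust(8.0, 3, 3, 2, 2, 50, 50));
        // Halved length and child-count agreement reduce the score.
        System.out.println(adjust(10.0, 2, 4, 1, 1, 100, 200));
    }
}
```

The factor 0.25 normalizes the product so that the adjusted score equals the original confidence only when all three ratios are 1. A pair of leaf functions with no callers scores at most a quarter of its raw confidence, because both graph ratios fall back to 0.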

@ -0,0 +1,119 @@
/* ###
* IP: GHIDRA
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package ghidra.feature.vt.api;
import java.util.*;
import generic.lsh.*;
import generic.lsh.vector.HashEntry;
import ghidra.util.exception.CancelledException;
import ghidra.util.task.TaskMonitor;
/**
* Container for FunctionNodes so that nodes that are "near" each other
* (meaning the nodes' feature vectors have high cosine-similarity)
* can be discovered. As nodes are added, they are distributed across
* bins, where similar nodes tend to be placed into the same bins.
*/
class BinningSystem {
private final int L; // Number of distinct binnings
private int[][] partitionIdentities;
private TreeMap<Integer, TreeSet<FunctionNode>>[] binSys;
/**
* Construct a container that holds the FunctionNodes, indexed according to the given model.
* @param model is the particular configuration model to use for this (must not be null)
*/
@SuppressWarnings("unchecked")
public BinningSystem(LSHMemoryModel model) {
int k = model.getK(); // k = number of hyperplanes comprising each binning.
L = KandL.memoryModelToL(model);
this.partitionIdentities = new int[L][];
this.binSys = new TreeMap[L]; // A system of L binnings.
Random random = new Random(23);
for (int ii = 0; ii < L; ++ii) {
this.partitionIdentities[ii] = new int[k];
for (int jj = 0; jj < k; ++jj) {
this.partitionIdentities[ii][jj] = random.nextInt();
}
this.binSys[ii] = new TreeMap<Integer, TreeSet<FunctionNode>>();
}
}
/**
* Add a list of {@link FunctionNode} objects into the bins
* @param iter is an iterator over the raw FunctionNodes to add
* @param monitor is the TaskMonitor
* @throws CancelledException for user cancellation of the correlator
*/
public void add(Iterator<FunctionNode> iter, TaskMonitor monitor) throws CancelledException {
while (iter.hasNext()) {
FunctionNode node = iter.next();
monitor.checkCancelled();
monitor.incrementProgress(1);
if (node.getVector() == null) {
continue;
}
int[] features = getBinIds(node);
for (int ii = 0; ii < features.length; ++ii) {
TreeSet<FunctionNode> list = binSys[ii].get(features[ii]);
if (list == null) {
list = new TreeSet<FunctionNode>();
binSys[ii].put(features[ii], list);
}
list.add(node);
}
}
}
/**
* Returns the union of all the bins containing the exemplar FunctionNode.
* These nodes are likely to be similar to the exemplar, but need secondary testing.
* @param node is the exemplar
* @return a set of FunctionNodes
*/
public Set<FunctionNode> lookup(FunctionNode node) {
TreeSet<FunctionNode> result = new TreeSet<FunctionNode>();
int[] features = getBinIds(node);
if (features == null) { // Node has no feature vector, so it was never binned
return result;
}
for (int ii = 0; ii < features.length; ++ii) {
TreeSet<FunctionNode> list = binSys[ii].get(features[ii]);
if (list != null) {
result.addAll(list);
}
}
return result;
}
/**
* Given a node, calculate the binId for each binning in this system
* @param node is the FunctionNode to label
* @return an array of ids
*/
private int[] getBinIds(FunctionNode node) {
if (node.getVector() == null) {
return null;
}
int[] result = new int[L];
HashEntry[] entries = node.getVector().getEntries();
for (int ii = 0; ii < L; ++ii) {
int hash = Partition.hash(partitionIdentities[ii], entries);
result[ii] = hash;
}
return result;
}
}
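
To illustrate the idea behind BinningSystem, the following is a minimal sketch of textbook random-hyperplane binning over dense double vectors. It is not the implementation behind Partition.hash, whose internals are not shown in this diff. The class HyperplaneBinningSketch, its integer item ids, and the use of Gaussian hyperplanes are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of random-hyperplane binning; not the commit's Partition.hash.
final class HyperplaneBinningSketch {
    private final double[][][] planes;                       // [table][hyperplane][dimension]
    private final List<Map<Integer, Set<Integer>>> tables = new ArrayList<>();

    // numTables plays the role of L, k is the number of hyperplanes per table (k <= 31).
    HyperplaneBinningSketch(int numTables, int k, int dim, long seed) {
        Random random = new Random(seed);
        planes = new double[numTables][k][dim];
        for (double[][] table : planes) {
            for (double[] plane : table) {
                for (int d = 0; d < dim; ++d) {
                    plane[d] = random.nextGaussian();
                }
            }
            tables.add(new HashMap<>());
        }
    }

    // One bit per hyperplane: which side of the plane the vector falls on.
    int binId(int table, double[] vec) {
        int id = 0;
        for (double[] plane : planes[table]) {
            double dot = 0;
            for (int d = 0; d < vec.length; ++d) {
                dot += plane[d] * vec[d];
            }
            id = (id << 1) | (dot >= 0 ? 1 : 0);
        }
        return id;
    }

    // Place the item into one bin per table.
    void add(int itemId, double[] vec) {
        for (int t = 0; t < planes.length; ++t) {
            tables.get(t).computeIfAbsent(binId(t, vec), key -> new TreeSet<>()).add(itemId);
        }
    }

    // Union of the bins the query vector falls into, across all tables.
    Set<Integer> lookup(double[] vec) {
        Set<Integer> result = new TreeSet<>();
        for (int t = 0; t < planes.length; ++t) {
            Set<Integer> bin = tables.get(t).get(binId(t, vec));
            if (bin != null) {
                result.addAll(bin);
            }
        }
        return result;
    }
}
```

Because only the sign of each dot product matters, vectors pointing in the same direction always share every bin, and vectors at a small angle share a bin in at least one table with high probability. Candidates returned by lookup still need an exact similarity check, which matches the "secondary testing" note in the Javadoc above.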

@ -0,0 +1,200 @@
/* ###
* IP: GHIDRA
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package ghidra.feature.vt.api;
import java.util.*;
import java.util.Map.Entry;
import generic.lsh.vector.LSHVector;
import ghidra.program.model.address.Address;
import ghidra.program.model.listing.Function;
/**
* Information about a single function the correlator is attempting to match
*/
public class FunctionNode implements Comparable<FunctionNode> {
private final Address addr; // Address of the function represented, also unique identifier
private final String name; // Name of the function this node represents.
private final LSHVector vec; // Feature vector
private ArrayList<Address> callAddresses; // Addresses of functions this node calls.
private final Set<FunctionNode> children; // Who do I call in the call graph?
private final Set<FunctionNode> parents; // Who calls me in the call graph?
private Map<FunctionNode, FunctionPair> associates; // Potential matches on the other side? And what's our conf?
private final int len; // Number of addresses in the body of this function
private boolean acceptedMatch; // Has this node been formally matched with something
/**
* Allocate a container for FunctionNodes as needed by the NeighborGenerators. These are generally small sets
* where we need to check containment constantly.
* @return the container
*/
public static Set<FunctionNode> neigborhoodAllocate() {
return new HashSet<FunctionNode>();
}
public FunctionNode(Function function, LSHVector vector, ArrayList<Address> callAddresses) {
this.addr = function.getEntryPoint();
this.name = function.getName();
this.vec = vector;
this.callAddresses = callAddresses; //It will take a second pass through the data to figure out how the call graph fits together.
this.associates = new HashMap<FunctionNode, FunctionPair>();
this.children = neigborhoodAllocate();
this.parents = neigborhoodAllocate();
int val = (int) function.getBody().getNumAddresses();
this.len = (val == 0) ? 1 : val; // Guarantee a non-zero length
this.acceptedMatch = false;
}
@Override
public int hashCode() {
return ((addr == null) ? 0 : addr.hashCode());
}
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
FunctionNode other = (FunctionNode) obj;
if (addr == null) {
if (other.addr != null) {
return false;
}
}
else if (!addr.equals(other.addr)) {
return false;
}
return true;
}
@Override
public int compareTo(FunctionNode other) {
return addr.compareTo(other.addr); // Compare by address
}
@Override
public String toString() {
return name;
}
/**
* @return the Address of the entry point of the Function represented by this node
*/
public Address getAddress() {
return addr;
}
/**
* @return the feature vector associated with this node (function)
*/
public LSHVector getVector() {
return vec;
}
/**
* Grab the raw call addresses, releasing the memory in the process
* @return the list of addresses
*/
public List<Address> releaseCallAddresses() {
List<Address> res = callAddresses;
callAddresses = null; // Release our reference to addresses
return res;
}
/**
* @return the set of functions (FunctionNodes) called by this function
*/
public Set<FunctionNode> getChildren() {
return children;
}
/**
* @return the set of functions (FunctionNodes) that call this function
*/
public Set<FunctionNode> getParents() {
return parents;
}
/**
* Add a (potential) match for this node. The match
* is stored with a FunctionPair object holding similarity information
* @param other is the potentially matching FunctionNode
* @param pair is the FunctionPair describing the similarity
*/
public void addAssociate(FunctionNode other, FunctionPair pair) {
associates.put(other, pair);
}
/**
* Remove what was previously considered a potential match.
* @param other is the matching FunctionNode
*/
public void removeAssociate(FunctionNode other) {
associates.remove(other);
}
/**
* Clear all potential matches.
*/
public void clearAssociates() {
associates.clear();
}
/**
* @return an iterator over all potential matches for this node
*/
public Iterator<Entry<FunctionNode, FunctionPair>> getAssociateIterator() {
return associates.entrySet().iterator();
}
/**
* If -other- is a potential match, return the FunctionPair describing the similarity
* @param other is the possible potential match
* @return the FunctionPair describing the match or null, if -other- is not a potential match
*/
public FunctionPair findEdge(FunctionNode other) {
return associates.get(other);
}
/**
* @return the number of addresses in the function body represented by this node
*/
public int getLen() {
return len;
}
/**
* @return true if this node has been formally matched by the correlator
*/
public boolean isAcceptedMatch() {
return acceptedMatch;
}
/**
* Mark whether this node has been matched by the correlator
* @param used is true if this node has been matched
*/
public void setAcceptedMatch(boolean used) {
this.acceptedMatch = used;
}
}
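
The associate methods of FunctionNode (addAssociate, removeAssociate, clearAssociates) keep a symmetric map of potential matches on both sides. The sketch below shows, under simplifying assumptions, how accepting one pair prunes both nodes from every other node's candidate map, in the spirit of the correlator's acceptMatch. The class AssociateSketch, its Node type, and the use of a bare Double in place of FunctionPair are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of symmetric associate bookkeeping; not part of the commit.
final class AssociateSketch {

    static final class Node {
        final String name;
        final Map<Node, Double> associates = new HashMap<>(); // candidate -> confidence
        boolean accepted;

        Node(String name) {
            this.name = name;
        }
    }

    // Record a potential match on both sides so either node can find the edge.
    static void link(Node src, Node dst, double conf) {
        src.associates.put(dst, conf);
        dst.associates.put(src, conf);
    }

    // Accept (src, dst) and withdraw both nodes as candidates everywhere else.
    static void accept(Node src, Node dst) {
        src.accepted = true;
        dst.accepted = true;
        for (Node other : src.associates.keySet()) {
            other.associates.remove(src); // modifies the other node's map, not the one iterated
        }
        for (Node other : dst.associates.keySet()) {
            other.associates.remove(dst);
        }
        src.associates.clear();
        dst.associates.clear();
    }
}
```

Storing each edge on both endpoints costs extra memory but makes the pruning step proportional to the number of candidates of the two accepted nodes, rather than to the total number of nodes.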

@ -0,0 +1,101 @@
/* ###
* IP: GHIDRA
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package ghidra.feature.vt.api;
import java.util.*;
import ghidra.program.model.address.Address;
import ghidra.program.model.listing.*;
/**
* Container of FunctionNodes corresponding to functions in a single Program
*/
public class FunctionNodeContainer {
private Program program; // Program containing all the functions
private Map<Address, FunctionNode> addrToNode; // Map from Address to FunctionNode representing the function
public FunctionNodeContainer(Program program, List<FunctionNode> nodeList) {
this.program = program;
addrToNode = new TreeMap<Address, FunctionNode>();
for (FunctionNode node : nodeList) {
addrToNode.put(node.getAddress(), node);
}
generateCallGraph();
}
public Program getProgram() {
return program;
}
/**
* Get the FunctionNode associated with a specific address
* @param addr the Address to search for
* @return the corresponding FunctionNode (or null if addr maps to nothing)
*/
public FunctionNode get(Address addr) {
return addrToNode.get(addr);
}
/**
* @return the number of FunctionNodes held in this container
*/
public int size() {
return addrToNode.size();
}
/**
* @return an iterator over all FunctionNodes in this container, in address order
*/
public Iterator<FunctionNode> iterator() {
return addrToNode.values().iterator();
}
/**
* Generate the program call graph in terms of FunctionNodes.
* Uses the call addresses attached to each raw FunctionNode, following thunks
* to the function they wrap. Once the edges are built, the original call
* address lists are released.
*/
private void generateCallGraph() {
FunctionManager mgr = program.getFunctionManager();
for (FunctionNode node : addrToNode.values()) { // Visit every node, in address order
if (node != null) {
List<Address> callAddresses = node.releaseCallAddresses();
for (Address addr : callAddresses) {
FunctionNode kid;
for (;;) {
kid = addrToNode.get(addr); // Look up the node (call-graph vertex) for the called address
if (kid != null) {
break;
}
Function f = mgr.getFunctionAt(addr); // If addr does not link to a node, it is most likely a thunk
if (f == null) {
break;
}
if (!f.isThunk()) {
break;
}
addr = f.getThunkedFunction(false).getEntryPoint(); // Replace with address of thunked function
}
if (kid != null) {
node.getChildren().add(kid);
kid.getParents().add(node);
}
}
}
}
}
}
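
The thunk-following loop in generateCallGraph can be looked at in isolation. The following is a minimal sketch, not the Ghidra implementation: addresses are modeled as plain long values, and the `nodes` and `thunks` maps are hypothetical stand-ins for addrToNode and for the FunctionManager / getThunkedFunction lookups.

```java
import java.util.HashMap;
import java.util.Map;

final class ThunkResolutionSketch {
    /**
     * Follow a chain of thunks starting at addr until an address that has a node is found.
     * Returns null if the chain ends at an address that is neither a node nor a thunk.
     * Both maps are hypothetical stand-ins, not Ghidra API.
     */
    static String resolve(long addr, Map<Long, String> nodes, Map<Long, Long> thunks) {
        for (;;) {
            String node = nodes.get(addr);
            if (node != null) {
                return node; // Found a call-graph vertex
            }
            Long target = thunks.get(addr);
            if (target == null) {
                return null; // Not a thunk, so there is nothing to link to
            }
            addr = target; // Replace with the address of the thunked function
        }
    }

    public static void main(String[] args) {
        Map<Long, String> nodes = new HashMap<>();
        nodes.put(0x3000L, "memcpy_impl");
        Map<Long, Long> thunks = new HashMap<>();
        thunks.put(0x1000L, 0x2000L); // thunk -> thunk
        thunks.put(0x2000L, 0x3000L); // thunk -> real function
        System.out.println(resolve(0x1000L, nodes, thunks));
        System.out.println(resolve(0x4000L, nodes, thunks));
    }
}
```

Like the loop it mirrors, the sketch assumes that thunk chains are acyclic; a cycle in the hypothetical `thunks` map would never terminate.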

@ -0,0 +1,133 @@
/* ###
* IP: GHIDRA
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package ghidra.feature.vt.api;
import ghidra.feature.vt.api.main.*;
/**
* A possible match between source and destination.
*/
public class FunctionPair {
private FunctionNode sourceNode; // Function from the source program
private FunctionNode destNode; // Function from the destination program
private double simResult; // Similarity of the pair (0.0 to 1.0)
private double confResult; // Confidence score of the pair
/**
* Constructor
* @param source the source function
* @param dest the destination function
* @param simRes the computed similarity score
* @param confRes the computed confidence score
*/
public FunctionPair(FunctionNode source, FunctionNode dest, double simRes, double confRes) {
this.sourceNode = source;
this.destNode = dest;
this.simResult = simRes;
this.confResult = confRes;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((destNode == null) ? 0 : destNode.hashCode());
result = prime * result + ((sourceNode == null) ? 0 : sourceNode.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
FunctionPair other = (FunctionPair) obj;
if (destNode == null) {
if (other.destNode != null) {
return false;
}
}
else if (!destNode.equals(other.destNode)) {
return false;
}
if (sourceNode == null) {
if (other.sourceNode != null) {
return false;
}
}
else if (!sourceNode.equals(other.sourceNode)) {
return false;
}
return true;
}
/**
* Compute the formal Version Tracking match record corresponding to this pair
* @param matchSet is the match set the record should be added to
* @return the match record
*/
public VTMatchInfo getMatch(VTMatchSet matchSet) {
VTMatchInfo result = new VTMatchInfo(matchSet);
result.setSimilarityScore(new VTScore(simResult));
result.setConfidenceScore(new VTScore(confResult));
result.setAssociationType(VTAssociationType.FUNCTION);
result.setSourceAddress(sourceNode.getAddress());
result.setDestinationAddress(destNode.getAddress());
result.setSourceLength(sourceNode.getLen());
result.setDestinationLength(destNode.getLen());
return result;
}
@Override
public String toString() {
return sourceNode.toString() + "," + destNode.toString();
}
/**
* @return info about the source function
*/
public FunctionNode getSourceNode() {
return sourceNode;
}
/**
* @return info about the destination function
*/
public FunctionNode getDestNode() {
return destNode;
}
/**
* @return the similarity score of the pair
*/
public double getSimResult() {
return simResult;
}
/**
* @return the confidence score of the pair
*/
public double getConfResult() {
return confResult;
}
}
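
FunctionPair identifies a pair by its two endpoints only; the similarity and confidence scores take no part in equals or hashCode. A minimal sketch of the same identity rule written with java.util.Objects follows. The class name is hypothetical and String values stand in for FunctionNode, so this is an illustration rather than a proposed replacement.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class PairIdentitySketch {
    private final String source;     // stand-in for the source FunctionNode
    private final String dest;       // stand-in for the destination FunctionNode
    private final double similarity; // deliberately ignored by equals/hashCode

    PairIdentitySketch(String source, String dest, double similarity) {
        this.source = source;
        this.dest = dest;
        this.similarity = similarity;
    }

    double getSimilarity() {
        return similarity;
    }

    @Override
    public int hashCode() {
        return Objects.hash(dest, source); // null-safe, endpoints only
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        PairIdentitySketch other = (PairIdentitySketch) obj;
        return Objects.equals(dest, other.dest) && Objects.equals(source, other.source);
    }

    public static void main(String[] args) {
        Set<PairIdentitySketch> pairs = new HashSet<>();
        pairs.add(new PairIdentitySketch("src_func", "dst_func", 0.95));
        pairs.add(new PairIdentitySketch("src_func", "dst_func", 0.40)); // same endpoints
        System.out.println(pairs.size());
    }
}
```

One consequence worth keeping in mind: a hash-based collection keeps only one pair per (source, destination) combination, whatever the scores are.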

@ -0,0 +1,136 @@
/* ###
* IP: GHIDRA
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package ghidra.feature.vt.api;
import java.util.Set;
import java.util.TreeMap;
import generic.lsh.vector.LSHVectorFactory;
import ghidra.program.model.listing.Function;
import ghidra.program.model.symbol.*;
/**
* A neighborhood generator that, for a given function, generates all functions
* in the same namespace. For efficiency, it caches the namespace sets it generates.
*/
public class NamespaceNeighborhood extends NeighborGenerator {
private FunctionNodeContainer sourceNodes; // Reference to global set of source functions
private FunctionNodeContainer destNodes; // Reference to global set of destination functions
private TreeMap<Long, Set<FunctionNode>> sourceSets; // Map from namespace ID to matching set of source functions
private TreeMap<Long, Set<FunctionNode>> destSets; // Map from namespace ID to matching set of dest functions
private TreeMap<PairLabel, NeighborhoodPair> namespacePair; // Map from pair of namespace IDs to pair of namespace sets
private PairLabel cacheKey; // internal key for quick lookups into namespacePair map
private static class PairLabel implements Comparable<PairLabel> {
public Long srcLabel;
public Long destLabel;
@Override
public int compareTo(PairLabel o) {
int srcCmp = Long.compare(srcLabel.longValue(), o.srcLabel.longValue());
if (srcCmp != 0) {
return srcCmp;
}
return Long.compare(destLabel.longValue(), o.destLabel.longValue());
}
}
public NamespaceNeighborhood(LSHVectorFactory vectorFactory, double impThreshold,
FunctionNodeContainer sourceNodes, FunctionNodeContainer destNodes) {
super(vectorFactory, impThreshold);
this.sourceNodes = sourceNodes;
this.destNodes = destNodes;
sourceSets = new TreeMap<Long, Set<FunctionNode>>();
destSets = new TreeMap<Long, Set<FunctionNode>>();
namespacePair = new TreeMap<PairLabel, NeighborhoodPair>();
cacheKey = new PairLabel();
}
private Namespace getNamespace(FunctionNode root, FunctionNodeContainer container) {
Function function =
container.getProgram().getFunctionManager().getFunctionAt(root.getAddress());
if (function == null) {
return null;
}
Namespace namespace = function.getParentNamespace();
return namespace;
}
private Set<FunctionNode> buildNeighborhood(Namespace namespace, Long namespaceKey,
FunctionNodeContainer container, TreeMap<Long, Set<FunctionNode>> sets) {
Set<FunctionNode> resultSet = sets.get(namespaceKey);
if (resultSet == null) {
resultSet = FunctionNode.neigborhoodAllocate();
SymbolTable symbolTable = container.getProgram().getSymbolTable();
SymbolIterator iter = symbolTable.getSymbols(namespace);
while (iter.hasNext()) {
Symbol sym = iter.next();
if (sym.getSymbolType() != SymbolType.FUNCTION) {
continue;
}
FunctionNode node = container.get(sym.getAddress());
if (node != null) {
resultSet.add(node);
}
}
sets.put(namespaceKey, resultSet);
}
return resultSet;
}
private NeighborhoodPair findPair(Long srcKey, Long destKey) {
cacheKey.srcLabel = srcKey;
cacheKey.destLabel = destKey;
return namespacePair.get(cacheKey);
}
private void cachePair(Long srcKey, Long destKey, NeighborhoodPair pair) {
PairLabel newLabel = new PairLabel();
newLabel.srcLabel = srcKey;
newLabel.destLabel = destKey;
namespacePair.put(newLabel, pair);
}
@Override
public NeighborhoodPair generate(FunctionNode srcRoot, FunctionNode destRoot) {
Namespace srcNamespace = getNamespace(srcRoot, sourceNodes);
Namespace destNamespace = getNamespace(destRoot, destNodes);
if (srcNamespace == null || destNamespace == null) {
NeighborhoodPair pair = new NeighborhoodPair();
pair.srcNeighbors = FunctionNode.neigborhoodAllocate();
pair.destNeighbors = FunctionNode.neigborhoodAllocate();
return pair; // Empty pair
}
Long srcNamespaceKey = srcNamespace.getID();
Long destNamespaceKey = destNamespace.getID();
NeighborhoodPair pair = findPair(srcNamespaceKey, destNamespaceKey);
if (pair == null) {
pair = new NeighborhoodPair();
pair.srcNeighbors =
buildNeighborhood(srcNamespace, srcNamespaceKey, sourceNodes, sourceSets);
pair.destNeighbors =
buildNeighborhood(destNamespace, destNamespaceKey, destNodes, destSets);
cachePair(srcNamespaceKey, destNamespaceKey, pair);
}
if (!pair.isFilledOut) {
if (fillOutPairs(pair, 10000)) {
pair.isFilledOut = true;
}
}
return pair;
}
}
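
NamespaceNeighborhood keeps one mutable PairLabel (cacheKey) for lookups and allocates a fresh PairLabel only when a pair is stored. A minimal sketch of that pattern is shown below; the class, the `Key` type, and the String values are hypothetical stand-ins for PairLabel and NeighborhoodPair.

```java
import java.util.TreeMap;

final class PairCacheSketch {
    /** Composite key ordered by src, then dest (stand-in for PairLabel). */
    static final class Key implements Comparable<Key> {
        long src;
        long dest;

        @Override
        public int compareTo(Key o) {
            int cmp = Long.compare(src, o.src);
            return (cmp != 0) ? cmp : Long.compare(dest, o.dest);
        }
    }

    private final TreeMap<Key, String> cache = new TreeMap<>();
    private final Key probe = new Key(); // reused for lookups only, never stored in the map

    String find(long src, long dest) {
        probe.src = src;
        probe.dest = dest;
        return cache.get(probe); // TreeMap locates entries via compareTo
    }

    void store(long src, long dest, String value) {
        Key key = new Key(); // stored keys must not be mutated afterwards
        key.src = src;
        key.dest = dest;
        cache.put(key, value);
    }

    public static void main(String[] args) {
        PairCacheSketch sketch = new PairCacheSketch();
        sketch.store(1L, 2L, "namespace pair (1,2)");
        System.out.println(sketch.find(1L, 2L));
        System.out.println(sketch.find(2L, 1L));
    }
}
```

Reusing the probe avoids an allocation per lookup. The trade-off is that the object is not safe to share between threads, and the probe must never be inserted into the map itself.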

@ -0,0 +1,287 @@
/* ###
* IP: GHIDRA
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package ghidra.feature.vt.api;
import java.util.ArrayList;
import java.util.Set;
import generic.lsh.vector.*;
/**
* Classes for constructing a "neighborhood" of functions around a function
* that is known to have a match. Restricting comparisons to neighborhoods greatly
* reduces both search time and uncertainty when looking for additional matches.
*/
public abstract class NeighborGenerator {
public static final int RELATIVE_COMPARES = 25; // Maximum number of extra compares between "relative" sets
private double impThreshold; // Confidence threshold for extending to additional matches
private LSHVectorFactory vectorFactory;
public static class NeighborhoodPair {
public Set<FunctionNode> srcNeighbors;
public Set<FunctionNode> destNeighbors;
public boolean isFilledOut = false;
}
public NeighborGenerator(LSHVectorFactory vectorFactory, double impThreshold) {
this.vectorFactory = vectorFactory;
this.impThreshold = impThreshold;
}
/**
* Given roots from the source program and the destination program,
* generate a neighborhood of functions related to each root.
* @param srcRoot is the root from the source program
* @param destRoot is the root from the destination program
* @return a pair of "neighborhoods" as a set of FunctionNodes
*/
public abstract NeighborhoodPair generate(FunctionNode srcRoot, FunctionNode destRoot);
/**
* Compare the feature vector of every source against every destination and create a
* new putative match (associate) whenever the confidence score is at least {@link #impThreshold}.
* Pairs that already have an associate edge are skipped.
* @param unmatchedSource is the list of sources
* @param unmatchedDest is the list of destinations
*/
private void searchForNewMatches(ArrayList<FunctionNode> unmatchedSource,
ArrayList<FunctionNode> unmatchedDest) {
VectorCompare veccompare = new VectorCompare();
for (FunctionNode src : unmatchedSource) {
LSHVector srcvec = src.getVector();
for (FunctionNode dst : unmatchedDest) {
if (src.findEdge(dst) != null) {
continue; // This pair has already been compared
}
// Feature vector computations
double similarity = srcvec.compare(dst.getVector(), veccompare);
double confidence = vectorFactory.calculateSignificance(veccompare);
if (confidence < impThreshold) {
continue;
}
FunctionPair newPair = new FunctionPair(src, dst, similarity, confidence);
src.addAssociate(dst, newPair);
dst.addAssociate(src, newPair);
}
}
}
/**
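* Compare nodes that have not been compared before, adding an associate for each pair that
* passes the threshold. Nodes that are already accepted matches, or that have no feature
* vector, are skipped. Nothing is compared if either set is empty or the number of
* comparisons would exceed maxCompares.
* @param pair is the two sets of nodes being compared
* @param maxCompares is the maximum number of comparisons to perform
* @return true if comparisons were actually performed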
*/
protected boolean fillOutPairs(NeighborhoodPair pair, int maxCompares) {
ArrayList<FunctionNode> unmatchedSource = new ArrayList<FunctionNode>();
ArrayList<FunctionNode> unmatchedDest = null;
for (FunctionNode src : pair.srcNeighbors) {
if (src.isAcceptedMatch()) {
continue;
}
if (src.getVector() == null) {
continue;
}
unmatchedSource.add(src);
}
if (unmatchedSource.isEmpty()) {
return false;
}
if (unmatchedSource.size() > maxCompares) {
return false;
}
unmatchedDest = new ArrayList<FunctionNode>();
for (FunctionNode dst : pair.destNeighbors) {
if (dst.isAcceptedMatch()) {
continue;
}
if (dst.getVector() == null) {
continue;
}
unmatchedDest.add(dst);
}
if (unmatchedDest.isEmpty()) {
return false;
}
// Multiply as long so that a very large destination set cannot overflow int
if ((long) unmatchedSource.size() * unmatchedDest.size() > maxCompares) {
return false;
}
searchForNewMatches(unmatchedSource, unmatchedDest);
return true;
}
/**
* Parents of -root-
*/
public static class Parents extends NeighborGenerator {
public Parents(LSHVectorFactory vectorFactory, double impThreshold) {
super(vectorFactory, impThreshold);
}
@Override
public NeighborhoodPair generate(FunctionNode srcRoot, FunctionNode destRoot) {
NeighborhoodPair pair = new NeighborhoodPair();
pair.srcNeighbors = srcRoot.getParents();
pair.destNeighbors = destRoot.getParents();
fillOutPairs(pair, RELATIVE_COMPARES);
return pair;
}
}
/**
* Children of -root-
*/
public static class Children extends NeighborGenerator {
public Children(LSHVectorFactory vectorFactory, double impThreshold) {
super(vectorFactory, impThreshold);
}
@Override
public NeighborhoodPair generate(FunctionNode srcRoot, FunctionNode destRoot) {
NeighborhoodPair pair = new NeighborhoodPair();
pair.srcNeighbors = srcRoot.getChildren();
pair.destNeighbors = destRoot.getChildren();
fillOutPairs(pair, RELATIVE_COMPARES);
return pair;
}
}
/**
* Grandparents of -root-
*/
public static class GrandParents extends NeighborGenerator {
public GrandParents(LSHVectorFactory vectorFactory, double impThreshold) {
super(vectorFactory, impThreshold);
}
@Override
public NeighborhoodPair generate(FunctionNode srcRoot, FunctionNode destRoot) {
NeighborhoodPair pair = new NeighborhoodPair();
Set<FunctionNode> tempRels = srcRoot.getParents();
pair.srcNeighbors = FunctionNode.neigborhoodAllocate();
for (FunctionNode rel : tempRels) {
pair.srcNeighbors.addAll(rel.getParents());
}
pair.srcNeighbors.remove(srcRoot);
tempRels = destRoot.getParents();
pair.destNeighbors = FunctionNode.neigborhoodAllocate();
for (FunctionNode rel : tempRels) {
pair.destNeighbors.addAll(rel.getParents());
}
pair.destNeighbors.remove(destRoot);
fillOutPairs(pair, RELATIVE_COMPARES);
return pair;
}
}
/**
* Grandchildren of -root-
*/
public static class GrandChildren extends NeighborGenerator {
public GrandChildren(LSHVectorFactory vectorFactory, double impThreshold) {
super(vectorFactory, impThreshold);
}
@Override
public NeighborhoodPair generate(FunctionNode srcRoot, FunctionNode destRoot) {
NeighborhoodPair pair = new NeighborhoodPair();
Set<FunctionNode> tempRels = srcRoot.getChildren();
pair.srcNeighbors = FunctionNode.neigborhoodAllocate();
for (FunctionNode rel : tempRels) {
pair.srcNeighbors.addAll(rel.getChildren());
}
pair.srcNeighbors.remove(srcRoot);
tempRels = destRoot.getChildren();
pair.destNeighbors = FunctionNode.neigborhoodAllocate();
for (FunctionNode rel : tempRels) {
pair.destNeighbors.addAll(rel.getChildren());
}
pair.destNeighbors.remove(destRoot);
fillOutPairs(pair, RELATIVE_COMPARES);
return pair;
}
}
/**
* Functions that share a parent with -root-
*/
public static class Siblings extends NeighborGenerator {
public Siblings(LSHVectorFactory vectorFactory, double impThreshold) {
super(vectorFactory, impThreshold);
}
@Override
public NeighborhoodPair generate(FunctionNode srcRoot, FunctionNode destRoot) {
NeighborhoodPair pair = new NeighborhoodPair();
Set<FunctionNode> tempRels = srcRoot.getParents();
pair.srcNeighbors = FunctionNode.neigborhoodAllocate();
for (FunctionNode rel : tempRels) {
pair.srcNeighbors.addAll(rel.getChildren());
}
pair.srcNeighbors.remove(srcRoot);
tempRels = destRoot.getParents();
pair.destNeighbors = FunctionNode.neigborhoodAllocate();
for (FunctionNode rel : tempRels) {
pair.destNeighbors.addAll(rel.getChildren());
}
pair.destNeighbors.remove(destRoot);
fillOutPairs(pair, RELATIVE_COMPARES);
return pair;
}
}
/**
* Functions that share a child with -root-
*/
public static class Spouses extends NeighborGenerator {
public Spouses(LSHVectorFactory vectorFactory, double impThreshold) {
super(vectorFactory, impThreshold);
}
@Override
public NeighborhoodPair generate(FunctionNode srcRoot, FunctionNode destRoot) {
NeighborhoodPair pair = new NeighborhoodPair();
Set<FunctionNode> tempRels = srcRoot.getChildren();
pair.srcNeighbors = FunctionNode.neigborhoodAllocate();
for (FunctionNode rel : tempRels) {
pair.srcNeighbors.addAll(rel.getParents());
}
pair.srcNeighbors.remove(srcRoot);
tempRels = destRoot.getChildren();
pair.destNeighbors = FunctionNode.neigborhoodAllocate();
for (FunctionNode rel : tempRels) {
pair.destNeighbors.addAll(rel.getParents());
}
pair.destNeighbors.remove(destRoot);
fillOutPairs(pair, RELATIVE_COMPARES);
return pair;
}
}
}
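
The relative generators above all follow one pattern: take a first hop from the root, union the second hop, then remove the root itself. A minimal sketch of the Siblings case on a toy call graph follows, together with the kind of comparison budget that fillOutPairs enforces. Everything here is an assumption made for illustration: functions are plain strings, the `parents` and `children` maps are hypothetical stand-ins for FunctionNode.getParents() and getChildren(), and the budget helper multiplies as long by choice of this sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SiblingSketch {
    /** Functions that share at least one parent (caller) with root, excluding root. */
    static Set<String> siblings(String root, Map<String, Set<String>> parents,
            Map<String, Set<String>> children) {
        Set<String> result = new HashSet<>();
        for (String parent : parents.getOrDefault(root, Collections.emptySet())) {
            result.addAll(children.getOrDefault(parent, Collections.emptySet()));
        }
        result.remove(root);
        return result;
    }

    /** True if both sets are non-empty and an all-pairs compare fits the budget. */
    static boolean withinBudget(int sources, int dests, int maxCompares) {
        return sources > 0 && dests > 0 && (long) sources * dests <= maxCompares;
    }

    public static void main(String[] args) {
        // Toy call graph: main calls a and b; util calls b and c
        Map<String, Set<String>> children = new HashMap<>();
        children.put("main", new HashSet<>(Arrays.asList("a", "b")));
        children.put("util", new HashSet<>(Arrays.asList("b", "c")));
        Map<String, Set<String>> parents = new HashMap<>();
        parents.put("a", new HashSet<>(Arrays.asList("main")));
        parents.put("b", new HashSet<>(Arrays.asList("main", "util")));
        parents.put("c", new HashSet<>(Arrays.asList("util")));

        System.out.println(new TreeSet<>(siblings("b", parents, children)));
        System.out.println(withinBudget(5, 5, 25));
        System.out.println(withinBudget(5, 6, 25));
    }
}
```

Swapping the two maps in the call gives the Spouses case (functions that share a child with the root), and using the same map for both hops gives grandparents or grandchildren.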

@ -0,0 +1,66 @@
/* ###
* IP: GHIDRA
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package ghidra.feature.vt.api;
/**
* Given a matching FunctionPair, this object represents a different
* potential match taken from neighborhoods of the match endpoints.
*/
public class PotentialPair implements Comparable<PotentialPair> {
private FunctionPair originBridge; // Accepted match that induced this potential match
private FunctionNode fromNode; // Source node of potential match
private FunctionNode toNode; // Destination node of potential match
private double score; // implication score associated with potential match
public static final PotentialPair EMPTY_PAIR = new PotentialPair(null, null, 0.0);
public PotentialPair(FunctionNode src, FunctionNode dest, double sc) {
fromNode = src;
toNode = dest;
score = sc;
}
public double getScore() {
return score;
}
public FunctionNode getSource() {
return fromNode;
}
public FunctionNode getDestination() {
return toNode;
}
public FunctionPair getOrigin() {
return originBridge;
}
public void setOrigin(FunctionPair pair) {
originBridge = pair;
}
public void swap() {
FunctionNode tmp = fromNode;
fromNode = toNode;
toNode = tmp;
}
/**
* Order potential pairs by ascending score
*/
@Override
public int compareTo(PotentialPair o) {
return Double.compare(score, o.score);
}
}
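
PotentialPair's natural ordering is ascending by score. How the correlator consumes that ordering is not shown in this file, so the following is only a sketch of what the ordering gives a caller: with a reversed comparator, a PriorityQueue hands back the highest-scoring candidate first. The `Candidate` class is a hypothetical stand-in for PotentialPair.

```java
import java.util.Collections;
import java.util.PriorityQueue;

final class ScoreOrderingSketch {
    /** Stand-in for PotentialPair: comparable by score only. */
    static final class Candidate implements Comparable<Candidate> {
        final String label;
        final double score;

        Candidate(String label, double score) {
            this.label = label;
            this.score = score;
        }

        @Override
        public int compareTo(Candidate o) {
            return Double.compare(score, o.score); // ascending by score
        }
    }

    /** Return the label of the highest-scoring candidate, or null if there are none. */
    static String best(Candidate... candidates) {
        // Reverse the natural (ascending) order so poll() yields the largest score
        PriorityQueue<Candidate> queue = new PriorityQueue<>(Collections.reverseOrder());
        Collections.addAll(queue, candidates);
        Candidate top = queue.poll();
        return (top == null) ? null : top.label;
    }

    public static void main(String[] args) {
        System.out.println(best(
            new Candidate("weak", 0.2),
            new Candidate("strong", 0.9),
            new Candidate("medium", 0.5)));
    }
}
```

Using Double.compare rather than subtracting scores keeps the comparison well defined for NaN and for values that differ by less than 1.0.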

@ -0,0 +1,35 @@
/* ###
* IP: GHIDRA
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package ghidra.feature.vt.gui.validator;
import ghidra.app.plugin.core.analysis.validator.PostAnalysisValidator;
import ghidra.app.plugin.core.decompiler.validator.DecompilerParameterIDValidator;
import ghidra.feature.vt.api.main.VTSession;
import ghidra.program.model.listing.Program;
public class DecompilerParameterIDVTPreconditionValidator extends
VTPostAnalysisPreconditionValidatorAdaptor {
public DecompilerParameterIDVTPreconditionValidator(Program sourceProgram,
Program destinationProgram, VTSession existingResults) {
super(sourceProgram, destinationProgram, existingResults);
}
@Override
protected PostAnalysisValidator createPostAnalysisPreconditionValidator(Program program) {
return new DecompilerParameterIDValidator(program);
}
}

@ -0,0 +1,50 @@
/* ###
* IP: GHIDRA
* EXCLUDE: YES
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package ghidra.feature.vt.api;
import ghidra.program.model.address.Address;
import ghidra.program.model.address.AddressFactory;
import ghidra.program.model.address.AddressSet;
import ghidra.program.model.address.AddressSetView;
import ghidra.program.model.address.AddressSpace;
import org.junit.Test;
public class BSimSelfSimilarCorrelatorTest extends AbstractSelfSimilarCorrelatorTest {
public BSimSelfSimilarCorrelatorTest() {
super();
}
@Test
public void testFlow() throws Exception {
exerciseFunctionsForFactory(new BSimProgramCorrelatorFactory(),
// with default settings these three functions won't get matched
getSourceMinus(0x010031ee, 0x01003ac0, 0x01004c1d));
}
private AddressSetView getSourceMinus(long... addresses) {
AddressFactory addressFactory = sourceProgram.getAddressFactory();
AddressSpace addressSpace = addressFactory.getDefaultAddressSpace();
AddressSet set =
new AddressSet(sourceProgram.getMemory().getInitializedAddressSet());
for (long l : addresses) {
Address address = addressSpace.getAddress(l);
set = set.subtract(new AddressSet(address, address));
}
return set;
}
}