Mirror of https://github.com/NationalSecurityAgency/ghidra.git (synced 2025-10-05 19:42:36 +02:00)

GP-4009 Introduced BSim functionality including support for postgresql,
elasticsearch and h2 databases. Added BSim correlator to Version Tracking.

parent f0f5b8f2a4
commit 0865a3dfb0
509 changed files with 77125 additions and 934 deletions

Ghidra/Features/VersionTrackingBSim/src/main/help/help/TOC_Source.xml (executable file, 67 lines added)
@@ -0,0 +1,67 @@
<?xml version='1.0' encoding='ISO-8859-1' ?>
<!--

	This is an XML file intended to be parsed by the Ghidra help system. It is loosely based
	upon the JavaHelp table of contents document format. The Ghidra help system uses a
	TOC_Source.xml file to allow a module with help to define how its contents appear in the
	Ghidra help viewer's table of contents. The main document (in the Base module) defines a
	basic structure for the Ghidra table of contents system. Other TOC_Source.xml files may use
	this structure to insert their files directly into it (and optionally define a substructure).


	In this document, a tag can be either a <tocdef> or a <tocref>. The former is a definition
	of an XML item that may have a link and may contain other <tocdef> and <tocref> children.
	<tocdef> items may be referred to in other documents by using a <tocref> tag with the
	appropriate id attribute value. Using these two tags allows any module to define a place
	in the table of contents system (<tocdef>), which also provides a place for
	other TOC_Source.xml files to insert content (<tocref>).

	At help build time, all TOC_Source.xml files are parsed and validated to ensure that all
	<tocref> tags point to valid <tocdef> tags. These files are then used to generate
	<module name>_TOC.xml files, which are table of contents files written in the format
	expected by the JavaHelp system. The generated files are merged together as they are
	loaded by the JavaHelp system, so that the Ghidra help GUI displays a single table of
	contents built from the definitions in all of the modules' TOC_Source.xml files.


	Tags and Attributes

	<tocdef>
	 -id          - the name of the definition (this must be unique across all TOC_Source.xml files)
	 -text        - the display text of the node, as seen in the help GUI
	 -target**    - the file to display when the node is clicked in the GUI
	 -sortgroup   - a string that defines where a given node should appear under its
	                parent. The string values are sorted by the JavaHelp system using
	                a java.text.RuleBasedCollator. If this attribute is not specified, then
	                the value of the text attribute is used.

	<tocref>
	 -id          - the id of the <tocdef> that this reference points to

	**The URL for the target is relative and should start with 'help/topics'. This text is
	used by the Ghidra help system to provide a universal starting point for all links so that
	they can be resolved at runtime, across modules.


-->

<tocroot>
	<tocref id="Ghidra Functionality">
		<tocref id="Version Tracking">
			<tocref id="VTCorrelators">
				<tocdef id="BSimCorrelator"
					text="BSim Program Correlator"
					target="help/topics/BSimCorrelator/BSim_Correlator.html" />
			</tocref>
		</tocref>
	</tocref>
</tocroot>
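
The comment in TOC_Source.xml states that, at help build time, every <tocref> must resolve to a <tocdef> defined in some module's TOC_Source.xml. A minimal sketch of such a cross-check, using only the JDK's DOM parser, might look like the following. TocRefCheck and findDanglingRefs are hypothetical names; this is not Ghidra's help-build implementation, and the "base" module contents in main are a simplified stand-in for the real Base module definitions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch of a tocref/tocdef cross-check; not Ghidra's help-build code. */
public class TocRefCheck {

    /** Returns every tocref id that has no matching tocdef id in any of the given documents. */
    static Set<String> findDanglingRefs(List<String> tocSources) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Set<String> defIds = new HashSet<>();
        Set<String> refIds = new TreeSet<>(); // sorted, for stable reporting
        for (String xml : tocSources) {
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            collectIds(doc, "tocdef", defIds);
            collectIds(doc, "tocref", refIds);
        }
        // ids are global, so a definition in any module satisfies a reference
        refIds.removeAll(defIds);
        return refIds;
    }

    // Adds the id attribute of every element with the given tag name to 'out'.
    private static void collectIds(Document doc, String tag, Set<String> out) {
        NodeList nodes = doc.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(((Element) nodes.item(i)).getAttribute("id"));
        }
    }

    public static void main(String[] args) throws Exception {
        // Simplified "base" module defining the nodes that the BSim module refers to
        String base = "<tocroot><tocdef id='Version Tracking' text='VT'>"
            + "<tocdef id='VTCorrelators' text='Correlators'/></tocdef></tocroot>";
        String bsim = "<tocroot><tocref id='Version Tracking'><tocref id='VTCorrelators'>"
            + "<tocdef id='BSimCorrelator' text='BSim Program Correlator'/>"
            + "</tocref></tocref></tocroot>";
        System.out.println(findDanglingRefs(List.of(base, bsim)));
        System.out.println(findDanglingRefs(List.of(bsim)));
    }
}
```

Run as-is, main prints [] and then [VTCorrelators, Version Tracking]: the BSim document's references only resolve once a module that defines those ids is included in the same build.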

@@ -0,0 +1,99 @@
<!DOCTYPE doctype PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN">

<HTML>
  <HEAD>
    <META name="generator" content=
    "HTML Tidy for Java (vers. 2009-12-01), see jtidy.sourceforge.net">

    <TITLE>BSim Program Correlator</TITLE>
    <META http-equiv="Content-Type" content="text/html; charset=windows-1252">
    <LINK rel="stylesheet" type="text/css" href="help/shared/DefaultStyle.css">
  </HEAD>

  <BODY lang="EN-US">
    <H1><A name="BSim_Correlator"></A>BSim Program Correlator</H1>

    <BLOCKQUOTE>
      <P>The BSim <A href="help/topics/VersionTrackingPlugin/VT_Correlators.html">Program
      Correlator</A> uses the decompiler to generate confidence scores between potentially matching
      functions in the source and destination programs. Function call-graphs are used to further
      boost the scores and distinguish between conflicting matches.</P>

      <P>The decompiler generates a formal feature vector for a function, where individual features
      are extracted from the control-flow and data-flow characteristics of its normalized p-code
      representation.</P>

      <P>Functions are compared through their corresponding feature vectors, from which
      similarity and confidence scores are computed.</P>

      <P>A confidence score, for this correlator, is an open-ended floating-point value
      (ranging from -infinity to +infinity) describing the amount of correspondence between the
      control-flow and data-flow of two functions. A good working range for setting thresholds
      (see the options below) and for describing function pairs with some matching features is
      0.0 to 100.0. A score of 0.0 corresponds to functions with roughly equal amounts of similar
      and dissimilar features. A score of 10.0 is typical for small identical functions, and 100.0
      is achieved by pairs of larger identical functions.</P>

      <P>The correlator initially collects high-confidence (high-scoring) matches as a "seed" set.
      Then, using call-graph information, the seed matches are extended to additional matches
      throughout the programs.</P>

      <P>There are four options for the BSim Program Correlator:</P>

      <P><B>Confidence Threshold for a Match</B></P>

      <BLOCKQUOTE>
        <P>This option sets the threshold for accepting a new match found by following the
        call-graph from a previously accepted pair of matching functions. Because potential pairs
        are drawn from the local call-graph neighborhood of an accepted pair, this threshold is
        typically set lower than the seed threshold.</P>
      </BLOCKQUOTE>

      <P><B>Confidence Threshold for a Seed</B></P>

      <BLOCKQUOTE>
        <P>This option sets the threshold for choosing potential matches as part of the initial
        "seed" set. Be careful when setting this threshold lower than the default, as any false
        match in the initial seed set is more likely to propagate.</P>
      </BLOCKQUOTE>

      <P><B>Memory Model</B></P>

      <BLOCKQUOTE>
        <P>The memory model option selects how much memory to use for finding matches. If you
        run out of memory while correlating large programs, lower this choice to "Medium" or
        "Small"; note, however, that correlation may be slightly less accurate.</P>
      </BLOCKQUOTE>

      <P><B>Use Accepted Matches as Seeds</B></P>

      <BLOCKQUOTE>
        <P>This option indicates whether to include previously accepted matches, typically from
        other correlators, in the initial "seed" set. The BSim Program Correlator will still try
        to find additional seed matches to merge with the already accepted matches. If you want
        to use only the incoming accepted matches, set the Confidence Threshold for a Seed
        extremely high (for example, 99999999). Be careful to accept only high-confidence matches
        before using this option, as any errors in the initial seed set are more likely to
        propagate.</P>
      </BLOCKQUOTE>
    </BLOCKQUOTE><!-- Main content blockquote -->

    <P class="relatedtopic">Related Topics:</P>

    <UL>
      <LI><A href="help/topics/VersionTrackingPlugin/VT_Correlators.html">Version Tracking Program
      Correlators</A></LI>

      <LI><A href="help/topics/VersionTrackingPlugin/VT_Wizard.html">Version Tracking
      Wizard</A></LI>

      <LI><A href="help/topics/VersionTrackingPlugin/VT_Tool.html">Version Tracking Tool</A></LI>

      <LI><A href="help/topics/VersionTrackingPlugin/Version_Tracking_Intro.html">Version Tracking
      Introduction</A></LI>
    </UL><BR>
    <BR>
  </BODY>
</HTML>
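
The help page describes a two-stage process: high-confidence seed pairs are chosen first, then matches are extended through the call-graph neighborhoods of accepted pairs using the (typically lower) match threshold. The following Java sketch illustrates that idea under simplifying assumptions: confidence scores are supplied as a precomputed table, a neighborhood is just the direct callees of each function, and conflicts are resolved greedily by highest score. SeedExtendSketch and its methods are hypothetical; the actual BSimProgramCorrelatorMatching also uses LSH binning, neighborhood generation and optional namespace information, none of which is reproduced here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical seed-and-extend matcher; a simplification, not BSimProgramCorrelatorMatching. */
public class SeedExtendSketch {

    record Pair(String src, String dst, double score) {}

    static List<Pair> match(Map<String, Map<String, Double>> score,
            Map<String, List<String>> srcCalls, Map<String, List<String>> dstCalls,
            double seedThreshold, double matchThreshold) {
        Set<String> usedSrc = new HashSet<>();
        Set<String> usedDst = new HashSet<>();
        List<Pair> accepted = new ArrayList<>();
        Deque<Pair> frontier = new ArrayDeque<>();

        // Stage 1 (seeds): every pair scoring >= seedThreshold, best first, one-to-one
        List<Pair> seeds = new ArrayList<>();
        score.forEach((s, row) -> row.forEach((d, c) -> {
            if (c >= seedThreshold) {
                seeds.add(new Pair(s, d, c));
            }
        }));
        acceptBestFirst(seeds, usedSrc, usedDst, accepted, frontier);

        // Stage 2 (extend): pair callees of an accepted source function only with callees
        // of its accepted destination counterpart, using the lower match threshold
        while (!frontier.isEmpty()) {
            Pair base = frontier.poll();
            List<Pair> local = new ArrayList<>();
            for (String s : srcCalls.getOrDefault(base.src(), List.of())) {
                for (String d : dstCalls.getOrDefault(base.dst(), List.of())) {
                    double c = score.getOrDefault(s, Map.of())
                        .getOrDefault(d, Double.NEGATIVE_INFINITY);
                    if (c >= matchThreshold) {
                        local.add(new Pair(s, d, c));
                    }
                }
            }
            acceptBestFirst(local, usedSrc, usedDst, accepted, frontier);
        }
        return accepted;
    }

    // Greedily accept candidates in descending score order, skipping already-matched functions.
    private static void acceptBestFirst(List<Pair> candidates, Set<String> usedSrc,
            Set<String> usedDst, List<Pair> accepted, Deque<Pair> frontier) {
        candidates.sort(Comparator.comparingDouble(Pair::score).reversed());
        for (Pair p : candidates) {
            if (!usedSrc.contains(p.src()) && !usedDst.contains(p.dst())) {
                usedSrc.add(p.src());
                usedDst.add(p.dst());
                accepted.add(p);
                frontier.add(p); // newly accepted pairs are extended in turn
            }
        }
    }

    public static void main(String[] args) {
        // Source main() calls parse() and emit(); destination FUN_a calls FUN_b and FUN_c
        Map<String, Map<String, Double>> score = Map.of(
            "main", Map.of("FUN_a", 40.0),
            "parse", Map.of("FUN_b", 3.0, "FUN_c", 1.0),
            "emit", Map.of("FUN_c", 2.5));
        Map<String, List<String>> srcCalls = Map.of("main", List.of("parse", "emit"));
        Map<String, List<String>> dstCalls = Map.of("FUN_a", List.of("FUN_b", "FUN_c"));
        for (Pair p : match(score, srcCalls, dstCalls, 10.0, 0.0)) {
            System.out.println(p.src() + " -> " + p.dst() + " (" + p.score() + ")");
        }
    }
}
```

In main, the emit/FUN_c pair scores only 2.5, well below the seed threshold of 10.0, yet it is still accepted because it lies in the call-graph neighborhood of the accepted main/FUN_a seed. This illustrates why the match threshold can reasonably be set lower than the seed threshold.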

@@ -0,0 +1,350 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package ghidra.feature.vt.api;

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import generic.cache.CachingPool;
import generic.cache.CountingBasicFactory;
import generic.concurrent.QCallback;
import generic.jar.ResourceFile;
import generic.lsh.LSHMemoryModel;
import generic.lsh.vector.*;
import ghidra.app.decompiler.*;
import ghidra.app.decompiler.parallel.ParallelDecompiler;
import ghidra.app.decompiler.signature.SignatureResult;
import ghidra.feature.vt.api.main.VTMatchInfo;
import ghidra.feature.vt.api.main.VTMatchSet;
import ghidra.feature.vt.api.util.VTAbstractProgramCorrelator;
import ghidra.feature.vt.api.util.VTFunctionSizeUtil;
import ghidra.features.bsim.query.GenSignatures;
import ghidra.framework.options.ToolOptions;
import ghidra.program.model.address.*;
import ghidra.program.model.lang.CompilerSpec.EvaluationModelType;
import ghidra.program.model.lang.LanguageID;
import ghidra.program.model.lang.PrototypeModel;
import ghidra.program.model.listing.*;
import ghidra.program.model.symbol.Reference;
import ghidra.program.model.symbol.ReferenceManager;
import ghidra.util.Msg;
import ghidra.util.exception.CancelledException;
import ghidra.util.task.TaskMonitor;
import ghidra.util.xml.SpecXmlUtils;
import ghidra.xml.NonThreadedXmlPullParserImpl;
import ghidra.xml.XmlPullParser;

/**
 * Correlator which discovers functional matches by comparing data-flow feature vectors.
 * An initial seed set of high confidence matches is chosen. The match set is extended
 * from the seeds by using local neighborhoods around the accepted matches to efficiently
 * discover new matches.
 */
public class BSimProgramCorrelator extends VTAbstractProgramCorrelator {

	private LSHVectorFactory vectorFactory;
	private static final int TIMEOUT = 60; // decompiler timeout, in seconds
	public static final double SIMILARITY_THRESHOLD = 0.5;
	// The size-filter utility now strips out thunks, so the minimum size is set to 0,
	// relying on the call-graph to disambiguate very small functions
	public static final int FUNCTION_MINIMUM_SIZE = 0;

	protected BSimProgramCorrelator(Program sourceProgram, AddressSetView sourceAddressSet,
			Program destinationProgram, AddressSetView destinationAddressSet, ToolOptions options) {
		super(sourceProgram, sourceAddressSet, destinationProgram, destinationAddressSet, options);
		vectorFactory = new WeightedLSHCosineVectorFactory();
	}

	@Override
	public String getName() {
		return BSimProgramCorrelatorFactory.NAME;
	}

	@Override
	protected void doCorrelate(VTMatchSet matchSet, TaskMonitor monitor) throws CancelledException {
		ToolOptions options = getOptions();
		LSHMemoryModel model = options.getEnum(BSimProgramCorrelatorFactory.MEMORY_MODEL,
			BSimProgramCorrelatorFactory.MEMORY_MODEL_DEFAULT);

		double confThreshold = options.getDouble(BSimProgramCorrelatorFactory.SEED_CONF_THRESHOLD,
			BSimProgramCorrelatorFactory.SEED_CONF_THRESHOLD_DEFAULT);
		double impThreshold = options.getDouble(BSimProgramCorrelatorFactory.IMPLICATION_THRESHOLD,
			BSimProgramCorrelatorFactory.IMPLICATION_THRESHOLD_DEFAULT);

		boolean useAcceptedMatchesAsSeeds =
			options.getBoolean(BSimProgramCorrelatorFactory.USE_ACCEPTED_MATCHES_AS_SEEDS,
				BSimProgramCorrelatorFactory.USE_ACCEPTED_MATCHES_AS_SEEDS_DEFAULT);

		boolean useNamespace = false; // By default we don't have namespace info
		boolean useCallRefs = false; // By default we use the decompiler to generate the call-graph

		List<FunctionPair> result;
		try {
			LanguageID id1 = getSourceProgram().getLanguageID();
			LanguageID id2 = getDestinationProgram().getLanguageID();
			// Use special weights for LSHCosineVectors
			ResourceFile defaultWeightsFile = GenSignatures.getWeightsFile(id1, id2);
			if (defaultWeightsFile == null) {

				// known limitation; hoped to be fixed in the future
				Msg.showWarn(this, null, "Cannot Compare Programs",
					"<html>Cannot currently compare programs with such different architectures.<br>" +
						"Source program is " + id1.getIdAsString() + "<br>" +
						"Destination program is " + id2.getIdAsString());
				return;
			}
			if (defaultWeightsFile.getName().contains("cpool")) {
				// With constant pool languages (Dalvik, JVM)
				useNamespace = true; // We have reliable namespace info
				useCallRefs = true; // We don't have absolute calls, use references
			}
			// try-with-resources closes the weights stream even if parsing fails
			try (InputStream input = defaultWeightsFile.getInputStream()) {
				XmlPullParser parser = new NonThreadedXmlPullParserImpl(input,
					"Vector weights parser", SpecXmlUtils.getXmlHandler(), false);
				vectorFactory.readWeights(parser);
			}

			monitor.setMessage("Generating source dictionary");
			List<FunctionNode> rawSourceNodes =
				generateNodes(getSourceProgram(), getSourceAddressSet(), useCallRefs, monitor);
			FunctionNodeContainer sourceNodes =
				new FunctionNodeContainer(getSourceProgram(), rawSourceNodes);

			monitor.setMessage("Generating destination dictionary");
			List<FunctionNode> rawDestNodes = generateNodes(getDestinationProgram(),
				getDestinationAddressSet(), useCallRefs, monitor);
			FunctionNodeContainer destNodes =
				new FunctionNodeContainer(getDestinationProgram(), rawDestNodes);

			BSimProgramCorrelatorMatching omni =
				new BSimProgramCorrelatorMatching(sourceNodes, destNodes, vectorFactory,
					confThreshold, impThreshold, SIMILARITY_THRESHOLD, useNamespace, model);
			omni.discoverPotentialMatches(monitor);
			if (!omni.generateSeeds(matchSet, useAcceptedMatchesAsSeeds, monitor)) {
				Msg.info(this, "BSim Program Correlator could not find any seeds");
			}
			result = omni.doMatching(monitor); // Do the matching!
		}
		catch (InterruptedException e) {
			// Log the exception itself; e.getCause() may be null and would lose the stack trace
			Msg.error(this, "Error Correlating", e);
			CancelledException cancelledException = new CancelledException();
			cancelledException.initCause(e);
			throw cancelledException;
		}
		catch (CancelledException ce) {
			throw ce;
		}
		catch (Exception e) {
			Msg.error(this, "Error Correlating", e);
			CancelledException cancelledException = new CancelledException();
			cancelledException.initCause(e);
			throw cancelledException;
		}

		wrapUp(result, matchSet, monitor); // Display matches, print stuff, etc.

		return;
	}

	private static void addExternalFunctions(Program program, List<FunctionNode> list,
			LSHVectorFactory vFactory, TaskMonitor monitor) throws CancelledException {
		FunctionIterator iter = program.getFunctionManager().getExternalFunctions();
		// Create a single generic feature vector shared by all external functions
		int[] externalFeatures = new int[1];
		externalFeatures[0] = 0xfade5eed;
		LSHVector externalVector = vFactory.buildVector(externalFeatures);
		while (iter.hasNext()) {
			monitor.checkCancelled();
			Function func = iter.next();
			FunctionNode node = new FunctionNode(func, externalVector, new ArrayList<Address>());
			list.add(node);
		}
	}

	private List<FunctionNode> generateNodes(final Program program, AddressSetView addrSet,
			boolean useCallRefs, final TaskMonitor monitor)
			throws InterruptedException, CancelledException, Exception {

		monitor.checkCancelled();

		CachingPool<DecompInterface> decompilerPool = new CachingPool<DecompInterface>(
			new DecompilerFactory(program, vectorFactory.getSettings()));
		ParallelDecompilerCallback callback =
			new ParallelDecompilerCallback(decompilerPool, vectorFactory, useCallRefs);

		List<FunctionNode> results = null;
		try {
			AddressSetView refinedAddressSet = VTFunctionSizeUtil.minimumSizeFunctionFilter(program,
				addrSet, FUNCTION_MINIMUM_SIZE, monitor);
			results = ParallelDecompiler.decompileFunctions(callback, program, refinedAddressSet,
				monitor);
		}
		finally {
			decompilerPool.dispose();
		}

		addExternalFunctions(program, results, vectorFactory, monitor);

		monitor.setMessage("Collecting dictionary results");
		return results;
	}

	private static void wrapUp(List<FunctionPair> result, final VTMatchSet matchSet,
			final TaskMonitor monitor) throws CancelledException {

		// Populate the table with matches.
		monitor.setMessage("Adding results to database");
		monitor.setIndeterminate(false);
		monitor.initialize(result.size());
		int ii = 0;
		for (FunctionPair resMatch : result) {
			VTMatchInfo match = resMatch.getMatch(matchSet);
			++ii;
			if (ii % 1000 == 0) {
				monitor.checkCancelled();
				monitor.incrementProgress(1000);
			}
			matchSet.addMatch(match);
		}
		return;
	}

	/**
	 * Establish decompiler options for the feature vector calculation
	 * @param program is the specific program to decompile
	 * @return the formal options object
	 */
	private static DecompileOptions getDecompilerOptions(Program program) {
		DecompileOptions options = new DecompileOptions();
		options.setNoCastPrint(true);
		try {
			final PrototypeModel model = program.getCompilerSpec()
					.getPrototypeEvaluationModel(EvaluationModelType.EVAL_CURRENT);
			options.setProtoEvalModel(model.getName());
		}
		catch (Exception e) {
			Msg.warn(BSimProgramCorrelator.class,
				"problem setting prototype evaluation model: " + e.getMessage());
		}
		options.setDefaultTimeout(TIMEOUT);
		return options;
	}

	//==================================================================================================
	// Inner Classes
	//==================================================================================================

	private static class DecompilerFactory extends CountingBasicFactory<DecompInterface> {

		private Program program;
		private int settings;

		DecompilerFactory(Program program, int set) {
			this.program = program;
			settings = set;
		}

		@Override
		public DecompInterface doCreate(int itemNumber) throws IOException {
			DecompInterface decompiler = new DecompInterface();
			decompiler.setOptions(getDecompilerOptions(program));
			decompiler.setSignatureSettings(settings);
			if (!decompiler.openProgram(program)) {
				throw new IOException(decompiler.getLastMessage());
			}
			return decompiler;
		}

		@Override
		public void doDispose(DecompInterface decompiler) {
			decompiler.dispose();
		}
	}

	private static class ParallelDecompilerCallback implements QCallback<Function, FunctionNode> {

		private LSHVectorFactory vectorFactory;
		private CachingPool<DecompInterface> pool;
		private boolean callsByReference;

		ParallelDecompilerCallback(CachingPool<DecompInterface> decompilerPool,
				LSHVectorFactory vFactory, boolean refCalls) {
			vectorFactory = vFactory;
			this.pool = decompilerPool;
			callsByReference = refCalls;
		}

		private ArrayList<Address> getCallAddressesByReference(Function function,
				TaskMonitor monitor) throws CancelledException {
			ArrayList<Address> resultList = new ArrayList<Address>();
			Program program = function.getProgram();
			ReferenceManager referenceManager = program.getReferenceManager();
			AddressSetView addresses = function.getBody();
			AddressIterator addressIterator = addresses.getAddresses(true);
			while (addressIterator.hasNext()) {
				monitor.checkCancelled();
				Address address = addressIterator.next();
				Reference[] referencesFrom = referenceManager.getReferencesFrom(address);
				if (referencesFrom != null) {
					for (Reference reference : referencesFrom) {
						if (reference.getReferenceType().isCall()) {
							resultList.add(reference.getToAddress());
						}
					}
				}
			}
			return resultList;
		}

		@Override
		public FunctionNode process(Function function, TaskMonitor monitor) throws Exception {

			monitor.checkCancelled();
			DecompInterface decompiler = pool.get();
			try {
				LSHVector vec = null;
				ArrayList<Address> callAddresses = null;
				SignatureResult sigres =
					decompiler.generateSignatures(function, !callsByReference, TIMEOUT, monitor);
				if (sigres == null) {
					callAddresses = new ArrayList<Address>();
				}
				else {
					vec = vectorFactory.buildVector(sigres.features);
					if (callsByReference) {
						callAddresses = getCallAddressesByReference(function, monitor);
					}
					else {
						// Raw call list; a second pass through the data is needed to work out
						// how the call graph fits together
						callAddresses = sigres.calllist;
					}
				}
				FunctionNode res = new FunctionNode(function, vec, callAddresses);
				if (res.getVector() == null) {
					// getLastMessage() may return null, so guard before testing its prefix
					String errmsg = decompiler.getLastMessage();
					if (errmsg != null && errmsg.startsWith("Bad command")) {
						throw new DecompileException(BSimProgramCorrelatorFactory.NAME, errmsg);
					}
				}
				return res;
			}
			finally {
				pool.release(decompiler);
			}
		}
	}
}
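
MatchingCallback in BSimProgramCorrelatorMatching discards candidate pairs whose vector similarity falls below SIMILARITY_THRESHOLD (0.5) before a confidence score is computed. As a rough illustration of that filter, here is a minimal sketch assuming plain, unweighted cosine similarity over feature-hash counts. BSim's WeightedLSHCosineVectorFactory applies per-feature weights read from a weights file (the readWeights call in doCorrelate), so real similarity values will differ; CosineFilterSketch and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of a cosine-similarity filter; not BSim's LSHVector code. */
public class CosineFilterSketch {

    static final double SIMILARITY_THRESHOLD = 0.5; // same value as in BSimProgramCorrelator

    // Build a sparse vector: feature hash -> occurrence count (unweighted, an assumption).
    static Map<Integer, Double> toVector(int[] features) {
        Map<Integer, Double> v = new HashMap<>();
        for (int f : features) {
            v.merge(f, 1.0, Double::sum);
        }
        return v;
    }

    // Cosine similarity of two sparse vectors: dot(a, b) / (|a| * |b|).
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double normA = Math.sqrt(a.values().stream().mapToDouble(x -> x * x).sum());
        double normB = Math.sqrt(b.values().stream().mapToDouble(x -> x * x).sum());
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Mirrors the "skip if similarity < threshold" test, i.e. pairs at or above it pass.
    static boolean passesFilter(int[] f1, int[] f2) {
        return cosine(toVector(f1), toVector(f2)) >= SIMILARITY_THRESHOLD;
    }

    public static void main(String[] args) {
        int[] f1 = {1, 2, 3, 4};
        int[] f2 = {1, 2, 3, 5}; // shares 3 of 4 features with f1
        int[] f3 = {7, 8, 9, 1}; // shares 1 of 4 features with f1
        System.out.println(cosine(toVector(f1), toVector(f2)));
        System.out.println(passesFilter(f1, f2));
        System.out.println(passesFilter(f1, f3));
    }
}
```

With four distinct features per function, f1 and f2 have cosine 3/4 = 0.75 and pass, while f1 and f3 have cosine 1/4 = 0.25 and are filtered out. In the real correlator, a pair that passes this cheap test then receives a confidence score from calculateSignificance.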

@@ -0,0 +1,105 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package ghidra.feature.vt.api;

import generic.lsh.LSHMemoryModel;
import ghidra.feature.vt.api.main.VTProgramCorrelator;
import ghidra.feature.vt.api.main.VTProgramCorrelatorAddressRestrictionPreference;
import ghidra.feature.vt.api.util.VTAbstractProgramCorrelatorFactory;
import ghidra.feature.vt.api.util.VTOptions;
import ghidra.program.model.address.AddressSetView;
import ghidra.program.model.listing.Program;
import ghidra.util.HelpLocation;

public class BSimProgramCorrelatorFactory extends VTAbstractProgramCorrelatorFactory {
	public static final String NAME = "BSim Function Matching";
	public static final String DESC =
		"Finds function matches by using data flow and call graph similarities between the " +
			"source and destination programs.";

	public static final String MEMORY_MODEL = "Memory Model";
	public static final LSHMemoryModel MEMORY_MODEL_DEFAULT = LSHMemoryModel.LARGE;
	public static final String MEMORY_MODEL_DESC =
		"Amount of memory used to compute matches. Smaller models are slightly less accurate.";

	public static final String SEED_CONF_THRESHOLD = "Confidence Threshold for a Seed";
	public static final double SEED_CONF_THRESHOLD_DEFAULT = 10.0;
	public static final String SEED_CONF_THRESHOLD_DESC =
		"For threshold N, the probability that a seed is incorrect is approximately 1/2^(N/5+9).";

	public static final String IMPLICATION_THRESHOLD = "Confidence Threshold for a Match";
	public static final double IMPLICATION_THRESHOLD_DEFAULT = 0.0;
	public static final String IMPLICATION_THRESHOLD_DESC =
		"For threshold N, the probability that a match is incorrect is approximately 1/2^(N/5+9).";

	public static final String USE_ACCEPTED_MATCHES_AS_SEEDS = "Use Accepted Matches as Seeds";
	public static final boolean USE_ACCEPTED_MATCHES_AS_SEEDS_DEFAULT = true;
	public static final String USE_ACCEPTED_MATCHES_AS_SEEDS_DESC =
		"Already accepted matches will also be used as seeds.";

	@Override
	public int getPriority() {
		return 50;
	}

	@Override
	protected VTProgramCorrelator doCreateCorrelator(Program sourceProgram,
			AddressSetView sourceAddressSet, Program destinationProgram,
			AddressSetView destinationAddressSet, VTOptions options) {
		return new BSimProgramCorrelator(sourceProgram, sourceAddressSet, destinationProgram,
			destinationAddressSet, options);
	}

	@Override
	public VTProgramCorrelatorAddressRestrictionPreference getAddressRestrictionPreference() {
		return VTProgramCorrelatorAddressRestrictionPreference.RESTRICTION_NOT_ALLOWED;
	}

	@Override
	public VTOptions createDefaultOptions() {
		VTOptions options = new VTOptions(NAME);
		HelpLocation help = new HelpLocation("BSimCorrelator", "BSim_Correlator");

		options.setEnum(MEMORY_MODEL, MEMORY_MODEL_DEFAULT);
		options.registerOption(MEMORY_MODEL, MEMORY_MODEL_DEFAULT, help, MEMORY_MODEL_DESC);

		options.setDouble(SEED_CONF_THRESHOLD, SEED_CONF_THRESHOLD_DEFAULT);
		options.registerOption(SEED_CONF_THRESHOLD, SEED_CONF_THRESHOLD_DEFAULT, help,
			SEED_CONF_THRESHOLD_DESC);

		options.setDouble(IMPLICATION_THRESHOLD, IMPLICATION_THRESHOLD_DEFAULT);
		options.registerOption(IMPLICATION_THRESHOLD, IMPLICATION_THRESHOLD_DEFAULT, help,
			IMPLICATION_THRESHOLD_DESC);

		options.setBoolean(USE_ACCEPTED_MATCHES_AS_SEEDS, USE_ACCEPTED_MATCHES_AS_SEEDS_DEFAULT);
		options.registerOption(USE_ACCEPTED_MATCHES_AS_SEEDS, USE_ACCEPTED_MATCHES_AS_SEEDS_DEFAULT,
			help, USE_ACCEPTED_MATCHES_AS_SEEDS_DESC);

		options.setOptionsHelpLocation(help);

		return options;
	}

	@Override
	public String getDescription() {
		return DESC;
	}

	@Override
	public String getName() {
		return NAME;
	}
}
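
The option descriptions in BSimProgramCorrelatorFactory relate a confidence threshold N to an approximate probability of an incorrect seed or match, 1/2^(N/5+9). The short Java sketch below simply evaluates that stated approximation, plus its algebraic inverse, so the effect of a threshold setting can be read off as "about 1 in K". ThresholdProbabilitySketch and its methods are hypothetical, and the results are only as accurate as the approximation quoted in the descriptions.

```java
/** Evaluates the approximation quoted in the option descriptions; illustrative only. */
public class ThresholdProbabilitySketch {

    // P(incorrect) ~= 1 / 2^(N/5 + 9) for a confidence threshold N
    static double approxErrorProbability(double threshold) {
        return Math.pow(2.0, -(threshold / 5.0 + 9.0));
    }

    // Algebraic inverse of the above: the threshold N whose approximate error probability is p,
    // i.e. N = 5 * (log2(1/p) - 9)
    static double thresholdFor(double p) {
        double log2InverseP = Math.log(1.0 / p) / Math.log(2.0);
        return 5.0 * (log2InverseP - 9.0);
    }

    public static void main(String[] args) {
        for (double n : new double[] {0.0, 10.0, 20.0}) {
            double p = approxErrorProbability(n);
            System.out.printf("N = %4.1f -> p ~= %.3e (about 1 in %.0f)%n", n, p, 1.0 / p);
        }
        System.out.printf("threshold for p = 1e-6: %.2f%n", thresholdFor(1e-6));
    }
}
```

With the defaults registered by this factory, the seed threshold of 10.0 corresponds to 2^-11 (about 1 in 2048) and the match threshold of 0.0 to 2^-9 (about 1 in 512). Each additional 5 points of threshold halves the approximate error probability.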

@@ -0,0 +1,787 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package ghidra.feature.vt.api;

import java.util.*;
import java.util.Map.Entry;

import org.apache.commons.collections4.MultiValuedMap;
import org.apache.commons.collections4.multimap.HashSetValuedHashMap;

import generic.concurrent.*;
import generic.lsh.LSHMemoryModel;
import generic.lsh.vector.LSHVectorFactory;
import generic.lsh.vector.VectorCompare;
import ghidra.feature.vt.api.NeighborGenerator.NeighborhoodPair;
import ghidra.feature.vt.api.main.*;
import ghidra.program.model.address.Address;
import ghidra.program.model.listing.Function;
import ghidra.program.model.listing.Program;
import ghidra.util.Msg;
import ghidra.util.exception.CancelledException;
import ghidra.util.task.TaskMonitor;

/**
 * Class for running the BSim function matching algorithm, which happens in stages:
 *   1) Construct BSimProgramCorrelatorMatching with prepopulated FunctionNodeContainers,
 *      one each for the source and destination programs
 *   2) Call discoverPotentialMatches to do raw vector comparisons between source and destination
 *   3) Call generateSeeds to select an initial set of high confidence matches
 *   4) Call doMatching to extend the seed set into a full list of matches
 */
public class BSimProgramCorrelatorMatching {

	private SortedSet<PotentialPair> implications; // Current potential matches sorted by score
	private FunctionNodeContainer sourceNodes; // Nodes (functions) associated with the source program
	private FunctionNodeContainer destNodes; // Nodes associated with the destination program
	private LSHVectorFactory vectorFactory; // Factory for generating weighted vectors for comparing nodes
	private LinkedList<FunctionPair> matches; // The list of final matches
	private Set<FunctionPair> seeds; // Initial set of match pairs used for growing out the full set of matches
	private List<FunctionPair> discoveredMatches; // Raw set of pairs of similar functions
	private double confThreshold; // Initial confidence threshold for selecting seed matches
	private double impThreshold; // Confidence threshold for extending to additional matches
	private double potentialSimThreshold; // Similarity threshold used when discovering potential matches
	private LSHMemoryModel memoryModel; // The memory model to use when binning vectors
	private boolean useNamespaceNeighbors; // True if namespace information is used in matching

	/**
	 * This class is used to look up potential matches in the {@link BinningSystem} and do
	 * secondary testing by computing similarities of feature vectors.
	 * Searching happens in parallel.
	 */
	private class MatchingCallback implements QCallback<FunctionNode, List<FunctionPair>> {

		private BinningSystem sourceBinning;
		private double simThreshold;

		MatchingCallback(BinningSystem sourceBinning, double simThreshold) {
			this.sourceBinning = sourceBinning;
			this.simThreshold = simThreshold;
		}

		@Override
		public List<FunctionPair> process(FunctionNode queryNode, TaskMonitor monitor)
				throws Exception {
			monitor.checkCancelled();

			if ((queryNode == null) || (queryNode.getVector() == null)) {
				monitor.incrementProgress(1);
				return null;
			}

			List<FunctionPair> associates = new LinkedList<FunctionPair>();
			findSimilarNodes(associates, queryNode, monitor);
			monitor.incrementProgress(1);
			return associates;
		}

		/**
		 * Look up potential matches for -queryNode- in the binning system,
		 * and perform secondary testing to see if we have a full (potential) match.
		 * Pairs whose similarity meets or exceeds the threshold are added to the -results- list.
		 * @param results is the list of FunctionPairs passing the similarity test
		 * @param queryNode is the base FunctionNode to compare
		 * @param monitor is the TaskMonitor
		 * @throws CancelledException if the user cancels the correlation
		 */
		private void findSimilarNodes(List<FunctionPair> results, FunctionNode queryNode,
				TaskMonitor monitor) throws CancelledException {

			// Set up for matching via feature vector comparison.
			Set<FunctionNode> neighbors = sourceBinning.lookup(queryNode);
			VectorCompare veccompare = new VectorCompare();

			// Check each neighbor from the system of binnings to see if it passes a round of matching.
			for (FunctionNode neighbor : neighbors) {
				monitor.checkCancelled();

				// Feature vector computations
				double similarity = neighbor.getVector().compare(queryNode.getVector(), veccompare);
				if (similarity < simThreshold) {
					continue;
				}
				double confidence = vectorFactory.calculateSignificance(veccompare);

				// Create FunctionPair (bridge in the graph from source to dest)
				FunctionPair newPair =
					new FunctionPair(neighbor, queryNode, similarity, confidence);

				results.add(newPair);
			}
		}
	}

	/**
	 * Construct a matcher between the given source and destination functions.
	 * @param sourceNodes is the container for source functions
	 * @param destNodes is the container for destination functions
	 * @param vFactory is the factory for building feature vectors during analysis
	 * @param conf is the initial confidence threshold for seeds
	 * @param imp is the follow-on confidence threshold for extending to additional matches
	 * @param sim is the similarity threshold used when discovering matches
	 * @param useNamespace true if namespace info is used to find additional matches
	 * @param model is the memory model to use when binning vectors to discover potential matches
	 */
	public BSimProgramCorrelatorMatching(FunctionNodeContainer sourceNodes,
			FunctionNodeContainer destNodes, LSHVectorFactory vFactory, double conf, double imp,
			double sim, boolean useNamespace, LSHMemoryModel model) {
		this.sourceNodes = sourceNodes;
		this.destNodes = destNodes;
		this.vectorFactory = vFactory;
		confThreshold = conf;
		impThreshold = imp;
		potentialSimThreshold = sim;
		useNamespaceNeighbors = useNamespace;
		memoryModel = model;
		implications = new TreeSet<PotentialPair>();
	}

	/**
	 * Formally accept a FunctionPair as a match. Update bookkeeping to indicate the match.
	 * @param bridge is the pair to accept as a match
	 */
	private void acceptMatch(FunctionPair bridge) {
		FunctionNode sourceNode = bridge.getSourceNode();
		FunctionNode destNode = bridge.getDestNode();
		sourceNode.setAcceptedMatch(true);
		destNode.setAcceptedMatch(true);
		matches.add(bridge);

		// Remove the source and destination as potential matches for every other node.
		Iterator<Entry<FunctionNode, FunctionPair>> iter = sourceNode.getAssociateIterator();
		while (iter.hasNext()) {
			iter.next().getKey().removeAssociate(sourceNode);
		}
		iter = destNode.getAssociateIterator();
		while (iter.hasNext()) {
			iter.next().getKey().removeAssociate(destNode);
		}
		sourceNode.clearAssociates(); // Clear old potential matches
		destNode.clearAssociates();
	}

	/**
	 * Do vector comparisons between the source and destination FunctionNodes.
	 * Any pair whose similarity meets or exceeds {@link #potentialSimThreshold} is placed into
	 * {@link #discoveredMatches}. A {@link BinningSystem} is built from the source functions,
	 * then individual destination FunctionNodes are searched against it in parallel.
	 * @param monitor is the TaskMonitor
	 * @throws Exception for user cancellation or other problems
	 */
	public void discoverPotentialMatches(TaskMonitor monitor) throws Exception {

		BinningSystem binning = new BinningSystem(memoryModel);
		monitor.setMessage("Binning source functions...");
		monitor.initialize(sourceNodes.size());
		binning.add(sourceNodes.iterator(), monitor);

		monitor.setMessage("Zealously over-pairing matches...");
		monitor.initialize(destNodes.size());

		//
		// Queue setup
		//
		GThreadPool pool = GThreadPool.getPrivateThreadPool("BSimProgramCorrelatorMatching");
		QCallback<FunctionNode, List<FunctionPair>> callback =
			new MatchingCallback(binning, potentialSimThreshold);

		// @formatter:off
		ConcurrentQ<FunctionNode, List<FunctionPair>> queue =
			new ConcurrentQBuilder<FunctionNode, List<FunctionPair>>()
				.setThreadPool(pool)
				.setCollectResults(true)
				.setMonitor(monitor)
				.build(callback);
		// @formatter:on

		//
		// Submit and wait for results
		//
		queue.add(destNodes.iterator());

		Collection<QResult<FunctionNode, List<FunctionPair>>> results;
		try {
			results = queue.waitForResults();
		}
		finally {
			queue.dispose();
		}

		discoveredMatches = new LinkedList<FunctionPair>();
		for (QResult<FunctionNode, List<FunctionPair>> result : results) {
			monitor.checkCancelled();
			List<FunctionPair> pieces = result.getResult();
			if (pieces == null) {
				continue;
			}
			for (FunctionPair bridge : pieces) {
				monitor.checkCancelled();
				if (bridge != null) {
					FunctionNode sourceNode = bridge.getSourceNode();
					FunctionNode destNode = bridge.getDestNode();
					sourceNode.addAssociate(destNode, bridge);
					destNode.addAssociate(sourceNode, bridge);
					discoveredMatches.add(bridge);
				}
			}
		}
	}

	/**
	 * Find the last index in the list where the confidence is >= threshold.
	 * The list must be sorted in descending order of confidence (see {@link #CONF_COMPARATOR}),
	 * and its first element is assumed to meet the threshold: if no element does,
	 * 0 is still returned.
	 * @param pairs is the sorted list
	 * @param threshold is the confidence threshold to search for
	 * @return the index
	 */
	private static int findIndexMatchingThreshold(ArrayList<FunctionPair> pairs, double threshold) {
		int min = 0;
		int max = pairs.size() - 1;
		while (min < max) {
			int mid = (min + max + 1) / 2; // Guarantee if min != max, then mid != min
			FunctionPair pair = pairs.get(mid);
			if (pair.getConfResult() < threshold) {
				max = mid - 1;
			}
			else {
				min = mid;
			}
		}
		return min;
}
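
	// Worked example (illustrative values only): for confidences [9.0, 7.0, 5.0, 3.0], sorted
	// in descending order, and threshold 5.0, the search starts with min = 0 and max = 3,
	// visits mid = 2 (5.0 is not below 5.0, so min = 2), then mid = 3 (3.0 < 5.0, so max = 2),
	// and returns 2, the last index whose confidence still meets the threshold.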

	/**
	 * Choose seed FunctionNode pairs with the highest confidence from among
	 * {@link #discoveredMatches}, making sure there are no conflicts (a FunctionNode that is
	 * involved in multiple matches).
	 * Selection happens in rounds. During a round:
	 *   a) "Accept" all pairs for which there is no immediate conflict
	 *   b) If a pair has conflicts, throw it out if any of the following holds:
	 *      1) The number of children differs too much between source and dest
	 *         (min/max ratio <= threshold)
	 *      2) The function length differs too much between source and dest
	 *         (min/max ratio <= threshold)
	 *      3) Its source or dest has already been accepted in another pair
	 *
	 * Between rounds the "accepted" pairs and the "thrown out" pairs may remove conflicts from the
	 * remaining pairs. Each round the threshold for throwing out a conflict is tightened.
	 *
	 * The process terminates when a round neither accepts nor throws out any pair.
	 * The accepted pairs are sorted by confidence, and those meeting or exceeding
	 * {@link #confThreshold} become the final seed set.
	 * @param monitor is the TaskMonitor
	 * @throws CancelledException if the user cancels the correlation
	 */
	private void chooseSeeds(TaskMonitor monitor) throws CancelledException {
		monitor.setMessage("Generating seeds...");
		ArrayList<FunctionPair> finalPairs = new ArrayList<FunctionPair>();
		HashSet<FunctionNode> matchedSource = new HashSet<FunctionNode>(); // Source functions that are matched
		HashSet<FunctionNode> matchedDest = new HashSet<FunctionNode>(); // Dest functions that are matched
		MultiValuedMap<FunctionNode, FunctionPair> sourceHoldOn =
			new HashSetValuedHashMap<FunctionNode, FunctionPair>(); // Conflicting source functions held for next round
		MultiValuedMap<FunctionNode, FunctionPair> destHoldOn =
			new HashSetValuedHashMap<FunctionNode, FunctionPair>(); // Conflicting dest functions held for next round
		MultiValuedMap<FunctionNode, FunctionPair> sourceFormatted =
			new HashSetValuedHashMap<FunctionNode, FunctionPair>(); // Current set of potential pairs, indexed by source
		MultiValuedMap<FunctionNode, FunctionPair> destFormatted =
			new HashSetValuedHashMap<FunctionNode, FunctionPair>(); // Current set of potential pairs, indexed by dest

		for (FunctionPair pair : discoveredMatches) { // Copy putative matches into the "current" set of potential pairs
			sourceFormatted.put(pair.getSourceNode(), pair);
			destFormatted.put(pair.getDestNode(), pair);
		}
		discoveredMatches = null; // The raw match list is no longer needed beyond this point

		int keepLen = sourceFormatted.size();
		if (keepLen == 0) {
			return;
		}

		boolean changed = true;
		double ratioThresh = .5; // Initial threshold for throwing out pairs. Counts can differ by a factor of 2 to 1.
		while (changed) { // Keep going until a round makes no change
			monitor.checkCancelled();
			final Collection<FunctionPair> values = sourceFormatted.values();
			monitor.initialize(values.size());
			for (FunctionPair entry : values) {
				monitor.checkCancelled();
				monitor.incrementProgress(1);
				if (!hasConflicts(entry, sourceFormatted, destFormatted)) { // Check for conflicts in our current set
					finalPairs.add(entry); // Accept immediately if no conflicts
					matchedSource.add(entry.getSourceNode());
					matchedDest.add(entry.getDestNode());
				}
				else {
					if (!matchedSource.contains(entry.getSourceNode()) &&
						!matchedDest.contains(entry.getDestNode())) {
						// If there is a conflict, but neither side has been matched yet,
						// decide if we throw out pair by comparing count ratios to ratioThresh

						// Compute "number of children" ratio
						double leftside =
							Math.min((double) entry.getSourceNode().getChildren().size(),
								(double) entry.getDestNode().getChildren().size());
						double rightside =
							Math.max((double) entry.getSourceNode().getChildren().size(),
								(double) entry.getDestNode().getChildren().size());
						double childRatio = (rightside == 0 ? 0 : leftside / rightside); // Always <= 1.0

						// Compute byte length ratio
						leftside = (double) entry.getSourceNode().getLen() /
							(double) entry.getDestNode().getLen();
						double lenRatio = Math.min(leftside, 1 / leftside); // Always <= 1.0
						if (lenRatio > ratioThresh && childRatio > ratioThresh) { // Test both ratios against threshold
							// Keep (don't throw out) if both ratios exceed threshold
							sourceHoldOn.put(entry.getSourceNode(), entry);
							destHoldOn.put(entry.getDestNode(), entry);
						}
					}
				}
			}
			sourceFormatted = sourceHoldOn; // Update our "current" set of sources
			destFormatted = destHoldOn; // Update our "current" set of dests
			// Continue only if this round accepted or threw out at least one pair
			int heldLen = sourceHoldOn.values().size();
			changed = (keepLen != heldLen);
			keepLen = heldLen;
			sourceHoldOn = new HashSetValuedHashMap<FunctionNode, FunctionPair>();
			destHoldOn = new HashSetValuedHashMap<FunctionNode, FunctionPair>();
			ratioThresh = (2 + ratioThresh) / 3; // Tighten the ratio threshold for next round
// Move closer to 1.0 threshold (counts are exactly equal)
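			// Worked example of this schedule: starting from 0.5, later rounds use
			// (2 + 0.5) / 3 = 0.833..., then (2 + 0.833...) / 3 = 0.944..., then 0.981...
			// Each round leaves one third of the previous gap to 1.0, since
			// 1 - (2 + r) / 3 = (1 - r) / 3.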
		}
		if (finalPairs.isEmpty()) {
			return; // found no seeds
		}
		Collections.sort(finalPairs, CONF_COMPARATOR); // Highest confidence first

		double curConf = finalPairs.get(0).getConfResult();
		if (curConf < confThreshold) {
			Msg.warn(this, "Initial value of seed confidence too high (" + confThreshold +
				")...resetting seed confidence to " + curConf);
			confThreshold = curConf;
		}
		int lastIndex = findIndexMatchingThreshold(finalPairs, confThreshold); // Last index that still meets threshold
		for (int i = 0; i < lastIndex + 1; ++i) {
			FunctionPair pair = finalPairs.get(i);
			seeds.add(pair);
		}
	}

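	/**
	 * Check whether a pair conflicts with another pair in the current set, i.e. whether its
	 * source or its destination FunctionNode appears in more than one current pair.
	 * @param entry is the pair to check
	 * @param sourceFormatted is the current set of pairs, indexed by source
	 * @param destFormatted is the current set of pairs, indexed by dest
	 * @return true if the pair has a conflict
	 */
	private static boolean hasConflicts(FunctionPair entry,
			MultiValuedMap<FunctionNode, FunctionPair> sourceFormatted,
			MultiValuedMap<FunctionNode, FunctionPair> destFormatted) {
		Collection<FunctionPair> sources = sourceFormatted.get(entry.getSourceNode());
		if (sources != null && sources.size() > 1) {
			return true;
		}
		Collection<FunctionPair> dests = destFormatted.get(entry.getDestNode());
		if (dests != null && dests.size() > 1) {
			return true;
		}
		return false;
	}
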
	/**
	 * Generate seed matches, placing the FunctionPairs into the {@link #seeds} container.
	 * Seeds come from a) previously accepted matches and b) the {@link #discoveredMatches}.
	 * @param matchSet is used to identify already accepted matches
	 * @param useAcceptedMatchesAsSeeds is true if previously accepted matches are considered seeds
	 * @param monitor is the TaskMonitor
	 * @return true if at least one seed was identified
	 * @throws CancelledException if the user cancels the correlation
	 */
	public boolean generateSeeds(VTMatchSet matchSet, boolean useAcceptedMatchesAsSeeds,
			TaskMonitor monitor) throws CancelledException {
		seeds = new HashSet<FunctionPair>();
		if (useAcceptedMatchesAsSeeds) {
			findAcceptedSeeds(matchSet, monitor);
		}
		chooseSeeds(monitor);
		return !seeds.isEmpty();
	}

	/**
	 * Establish which neighborhood generation strategy will be used.
	 * @param round is the round (0 for the first round) to build a strategy for
	 * @return an array of NeighborGenerators
	 */
	private NeighborGenerator[] buildNeighborGenerators(int round) {
		ArrayList<NeighborGenerator> generatorList = new ArrayList<NeighborGenerator>();
		if (round == 0) {
			// For the first round, only collect new matches from "close" relationships
			// (i.e. parent/child) of the seed match.
			generatorList.add(new NeighborGenerator.Children(vectorFactory, impThreshold));
			generatorList.add(new NeighborGenerator.Parents(vectorFactory, impThreshold));
			// If the format includes explicit namespace information for functions,
			// use it when generating new matches.
			if (useNamespaceNeighbors) {
				generatorList.add(
					new NamespaceNeighborhood(vectorFactory, impThreshold, sourceNodes, destNodes));
			}
		}
		else {
			// For later rounds, also collect matches from more distant relationships
			// (grandparent, grandchild, sibling, etc.)
			generatorList.add(new NeighborGenerator.Children(vectorFactory, impThreshold));
			generatorList.add(new NeighborGenerator.Parents(vectorFactory, impThreshold));
			generatorList.add(new NeighborGenerator.GrandChildren(vectorFactory, impThreshold));
			generatorList.add(new NeighborGenerator.Siblings(vectorFactory, impThreshold));
			generatorList.add(new NeighborGenerator.Spouses(vectorFactory, impThreshold));
			generatorList.add(new NeighborGenerator.GrandParents(vectorFactory, impThreshold));
			if (useNamespaceNeighbors) {
				generatorList.add(
					new NamespaceNeighborhood(vectorFactory, impThreshold, sourceNodes, destNodes));
			}
		}
		NeighborGenerator[] res = new NeighborGenerator[generatorList.size()];
		generatorList.toArray(res);
		return res;
	}

	/**
	 * Given the set of {@link #seeds}, iteratively extend the set of matches.
	 * Loop, greedily picking the best relative match, while maintaining the score-sorted
	 * {@link #implications} and other bookkeeping. Matching runs in two rounds: the first
	 * extends outward from the seeds using close relationships, and the second extends from
	 * every match found so far using more distant relationships as well. Finally, holes are
	 * patched: if both functions of a match have exactly one parent, and those parents are
	 * unmatched and not already paired as a potential match, the parents are accepted as a match.
	 * @param monitor is the TaskMonitor
	 * @return the final list of FunctionPairs as official matches
	 * @throws CancelledException if the user cancels the correlation
	 */
	public List<FunctionPair> doMatching(TaskMonitor monitor) throws CancelledException {
		matches = new LinkedList<FunctionPair>();

		for (int round = 0; round < 2; round++) {
			monitor.checkCancelled();
			NeighborGenerator[] generatorList = buildNeighborGenerators(round);
			if (round == 0) {
				monitor.setMessage("Matching round 1...");
				monitor.initialize(seeds.size());
				for (FunctionPair bridge : seeds) {
					monitor.checkCancelled();
					monitor.incrementProgress(1);
					acceptMatch(bridge);
					PotentialPair impliedPair = analyze(bridge, generatorList);
					if (impliedPair != null) {
						implications.add(impliedPair);
					}
				}
				seeds = null; // Seeds are no longer needed, free up memory
			}
			else {
				implications.clear();
				monitor.setMessage("Matching round 2...");
				monitor.initialize(matches.size());
				for (FunctionPair bridge : matches) {
					monitor.checkCancelled();
					monitor.incrementProgress(1);
					PotentialPair impliedPair = analyze(bridge, generatorList);
					if (impliedPair != null) {
						implications.add(impliedPair);
					}
				}
			}
			monitor.setMessage("Gathering matches for round " + (round + 1) + "...");
			int maxSize = implications.size();
			monitor.initialize(maxSize + 1);
			while (true) {
				monitor.checkCancelled();
				int size = implications.size();
				if (size > maxSize) {
					maxSize = size;
					monitor.setMaximum(maxSize + 1);
				}
				monitor.setProgress((maxSize - size) + 1);
				if (size == 0) {
					break;
				}
				PotentialPair bestImplied = implications.last();
				implications.remove(bestImplied);
				FunctionPair bridge =
					bestImplied.getSource().findEdge(bestImplied.getDestination());
				if (bridge != null) {
					acceptMatch(bridge);
					PotentialPair impliedPair = analyze(bridge, generatorList);
					if (impliedPair != null) {
						implications.add(impliedPair);
					}
				}
				// Let the pair that implied this match select a new PotentialPair
				PotentialPair impliedPair = analyze(bestImplied.getOrigin(), generatorList);
				if (impliedPair != null) {
					implications.add(impliedPair);
				}
				if (implications.isEmpty() || implications.last().getScore() < impThreshold) {
					break;
				}
			}
		}

		// Hole patching
		LinkedList<FunctionPair> matchCopy = new LinkedList<FunctionPair>(matches);
		VectorCompare veccompare = new VectorCompare();
		monitor.setMessage("Patching holes...");
		monitor.initialize(matches.size());
		for (FunctionPair bridge : matchCopy) {
			monitor.checkCancelled();
			monitor.incrementProgress(1);
			if (bridge.getSourceNode().getParents().size() == 1 &&
				bridge.getDestNode().getParents().size() == 1) {
				FunctionNode sp = bridge.getSourceNode().getParents().iterator().next();
				FunctionNode dp = bridge.getDestNode().getParents().iterator().next();
				// Skip parents without feature vectors, which cannot be compared
				if (sp.findEdge(dp) == null && !sp.isAcceptedMatch() && !dp.isAcceptedMatch() &&
					sp.getVector() != null && dp.getVector() != null) {
					double similarity = sp.getVector().compare(dp.getVector(), veccompare);
					double confidence = vectorFactory.calculateSignificance(veccompare);
					FunctionPair rentBridge = new FunctionPair(sp, dp, similarity, confidence);
					acceptMatch(rentBridge);
				}
			}
		}

		return matches;
	}

	// Compare pairs by confidence, in descending order (highest confidence first).
	private static final Comparator<FunctionPair> CONF_COMPARATOR = new Comparator<FunctionPair>() {
		@Override
		public int compare(FunctionPair o1, FunctionPair o2) {
			return Double.compare(o2.getConfResult(), o1.getConfResult());
		}
	};

	/**
	 * Run through the associations of the Version Tracking session looking for matches between
	 * functions that have been formally marked as "accepted", and add the corresponding
	 * FunctionPairs to {@link #seeds}. Only pairs that were also discovered as potential
	 * matches can become seeds.
	 * @param myMatchSet is the match set whose session is examined
	 * @param monitor is the TaskMonitor
	 * @throws CancelledException if the user cancels the correlation
	 */
	private void findAcceptedSeeds(VTMatchSet myMatchSet, TaskMonitor monitor)
			throws CancelledException {
		monitor.setMessage("Using accepted matches as seeds...");
		VTSession session = myMatchSet.getSession();
		VTAssociationManager associationManager = session.getAssociationManager();
		int associationCount = associationManager.getAssociationCount();
		monitor.initialize(associationCount);
		List<VTAssociation> associations = associationManager.getAssociations();
		Program sourceProgram = sourceNodes.getProgram();
		Program destinationProgram = destNodes.getProgram();

		for (VTAssociation association : associations) {
			monitor.checkCancelled();
			if (association.getType().equals(VTAssociationType.FUNCTION) &&
				association.getStatus() == VTAssociationStatus.ACCEPTED) {

				Address sourceAddress = association.getSourceAddress();
				Function sourceFunction = sourceProgram.getListing().getFunctionAt(sourceAddress);
				Address destinationAddress = association.getDestinationAddress();
				Function destinationFunction =
					destinationProgram.getListing().getFunctionAt(destinationAddress);

				if (sourceFunction != null && destinationFunction != null) {
					FunctionNode sn = sourceNodes.get(sourceAddress);
					if (sn != null) {
						FunctionNode dn = destNodes.get(destinationAddress);
						if (dn != null) {
							FunctionPair bridge = sn.findEdge(dn);
							if (bridge != null) {
								seeds.add(bridge);
							}
						}
					}
				}
			}
			monitor.incrementProgress(1);
		}
	}

	/**
	 * Given an accepted FunctionPair and methods for generating neighborhoods:
	 * for each generation method, generate a source neighborhood and a dest neighborhood,
	 * and search, in both directions, for the pair between the two neighborhoods with the
	 * highest score.
	 *
	 * @param pair is the accepted FunctionPair
	 * @param generatorList is the list of neighborhood generators
	 * @return the highest scoring pair across all pairs of neighborhoods, or null if no
	 * pair scores above zero
	 */
	private PotentialPair analyze(FunctionPair pair, NeighborGenerator[] generatorList) {
		FunctionNode sourceNode = pair.getSourceNode();
		FunctionNode destNode = pair.getDestNode();
		double confResult = pair.getConfResult();

		double implicationScore = 0;
		PotentialPair bestImplied = null;

		for (NeighborGenerator generator : generatorList) {
			NeighborhoodPair nPair = generator.generate(sourceNode, destNode);
			PotentialPair srcToDestPair =
				calculateBestNeighbor(nPair.srcNeighbors, nPair.destNeighbors, confResult);
			if (srcToDestPair.getScore() > implicationScore) {
				implicationScore = srcToDestPair.getScore();
				bestImplied = srcToDestPair;
			}
			PotentialPair destToSrcPair =
				calculateBestNeighbor(nPair.destNeighbors, nPair.srcNeighbors, confResult);
			destToSrcPair.swap(); // PotentialPair is returned with opposite from and to nodes
			if (destToSrcPair.getScore() > implicationScore) {
				implicationScore = destToSrcPair.getScore();
				bestImplied = destToSrcPair;
			}
		}
		if (bestImplied != null) {
			bestImplied.setOrigin(pair);
		}
		return bestImplied;
	}

	/**
	 * Among a -range- of pairs with the same score, return a pair that does not conflict with
	 * any other pair in the range, i.e. the source and destination of the pair are not
	 * involved in another pair (with the same score).
	 * @param potentialPairs is the (ordered) set of pairs
	 * @param firstIndex is the start index of the range
	 * @param lastIndex is the last index of the range
	 * @return an unconflicted pair or null if none exists
	 */
	private static PotentialPair unconflictedPair(ArrayList<PotentialPair> potentialPairs,
			int firstIndex, int lastIndex) {
		for (int i = firstIndex; i <= lastIndex; i++) {
			FunctionNode myFrom = potentialPairs.get(i).getSource();
			FunctionNode myTo = potentialPairs.get(i).getDestination();
			boolean useMe = true;
			for (int j = firstIndex; j <= lastIndex; j++) { // Look for conflicts in entries with same score
				if (i == j) {
					continue;
				}
				FunctionNode yourFrom = potentialPairs.get(j).getSource();
				FunctionNode yourTo = potentialPairs.get(j).getDestination();
				if (myFrom == yourFrom || myTo == yourTo) {
					useMe = false; // Conflict found. Can't use this one.
					break;
				}
			}
			if (useMe) { // No conflict found
				return potentialPairs.get(i); // Use this entry
			}
		}
		return null;
}
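
	// Worked example (hypothetical nodes A, B, X, Y, Z): in an equal-score range holding
	// (A -> X), (A -> Y) and (B -> Z), the first two pairs share source A and so conflict
	// with each other, while (B -> Z) shares no node with either; the method returns (B -> Z).
	// If the range held only (A -> X) and (A -> Y), it would return null.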

	/**
	 * Adjust an original confidence score between functions -a- and -b-
	 * based on how closely their numbers of children, numbers of parents, and byte lengths
	 * agree. Each of these is compared as a ratio of at most 1.0 (for child and parent
	 * counts, the ratio is 0 when either count is zero).
	 * @param conf is the original confidence
	 * @param a is one side of the function pair
	 * @param b is the other side
	 * @return the adjusted score
	 */
	private static double adjustConfidenceScore(double conf, FunctionNode a, FunctionNode b) {
		final int childrenSize = b.getChildren().size();
		double ratio = (childrenSize == 0 ? 0 : (double) a.getChildren().size() / childrenSize);
		final double kidRatio = Math.min(ratio, 1 / ratio);
		final int parentsSize = b.getParents().size();
		ratio = (parentsSize == 0 ? 0 : (double) a.getParents().size() / parentsSize);
		final double rentRatio = Math.min(ratio, 1 / ratio);

		ratio = (double) a.getLen() / b.getLen();
		final double lenRatio = Math.min(ratio, 1 / ratio);
		return 0.25 * conf * lenRatio * (1 + kidRatio) * (1 + rentRatio);
}
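
	// Worked example (illustrative values only): conf = 10; a has 2 children, 1 parent and
	// length 100; b has 4 children, 1 parent and length 200. Then kidRatio = 0.5,
	// rentRatio = 1.0 and lenRatio = 0.5, giving
	// 0.25 * 10 * 0.5 * (1 + 0.5) * (1 + 1.0) = 3.75.
	// Since every ratio is at most 1.0, a non-negative conf is never increased.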

	/**
	 * Find the first PotentialPair where there is no conflict.
	 * Sort the pairs based on score, and divide them into ranges of equal score.
	 * Starting with the highest scoring range, look for the first PotentialPair whose source
	 * and dest are not involved with any other pair within its equal score range.
	 * @param potentialPairs is the array of pairs
	 * @return the first (highest scoring) unconflicted pair, or PotentialPair.EMPTY_PAIR if none
	 */
	private static PotentialPair findFirstUnconflictedPair(
			ArrayList<PotentialPair> potentialPairs) {
		Collections.sort(potentialPairs); // Sort pairs based on score; highest scores end up last
		int lastIndex = potentialPairs.size() - 1;
		while (lastIndex >= 0) {
			double score = potentialPairs.get(lastIndex).getScore();
			int firstIndex = lastIndex - 1;
			while (firstIndex >= 0 && potentialPairs.get(firstIndex).getScore() >= score) {
				firstIndex -= 1;
			}
			PotentialPair bestPair = unconflictedPair(potentialPairs, firstIndex + 1, lastIndex);
			if (bestPair != null) {
				return bestPair;
			}
			lastIndex = firstIndex;
		}

		return PotentialPair.EMPTY_PAIR; // No match found. We get here in the case of conflict-only matrices.
	}

	/**
	 * Given matching neighborhoods, look at the "matrix" of scores for pairs across them.
	 * Return the most likely pair.
	 * @param aNeighbors is the first neighborhood
	 * @param bNeighbors is the second neighborhood
	 * @param confResult is the confidence score associated with the accepted match
	 * @return the most likely pair as a PotentialPair, or PotentialPair.EMPTY_PAIR if none
	 */
	private PotentialPair calculateBestNeighbor(Set<FunctionNode> aNeighbors,
			Set<FunctionNode> bNeighbors, double confResult) {
		ArrayList<PotentialPair> potentialPairs = new ArrayList<PotentialPair>();
		PotentialPair bestPair = PotentialPair.EMPTY_PAIR;
		int bestCount = 0; // Number of pairs with the same (currently) best score

		// CRITICAL LOOP
		for (FunctionNode relative : aNeighbors) { // For every function in the first neighborhood
			if (relative.isAcceptedMatch()) {
				continue;
			}
			double bestAdjustedScore = 0; // Best score seen for just this relative
			double relSum = 0; // Sum of relative's scores for associates...for normalizing
			double bestOriginalScore = 0; // So that we can recover the entry without computation
			FunctionNode bestRelAssoc = null; // The highest scoring associate
			// CRITICAL INNER LOOP
			Iterator<Entry<FunctionNode, FunctionPair>> iter = relative.getAssociateIterator();
			while (iter.hasNext()) { // Run through every putative match to -relative-
				Entry<FunctionNode, FunctionPair> entry = iter.next();
				final FunctionNode associate = entry.getKey();
				final double value = entry.getValue().getConfResult();
				if (bNeighbors.contains(associate)) { // Is the associate in the second neighborhood?
					double entryAdjusted = adjustConfidenceScore(value, relative, associate);
					relSum += entryAdjusted; // Keep track of score sum for normalization
					if (entryAdjusted >= bestAdjustedScore) { // Keep track of highest scoring pair
						bestAdjustedScore = entryAdjusted;
						bestRelAssoc = associate;
						bestOriginalScore = value;
					}
				}
			}

			if (relSum > 0) {
				// Compute a final score that takes into account the dimensions of the neighborhoods
				// and scores of other potential pairs across the neighborhoods
				double tempMax = bNeighbors.size() * (bestOriginalScore + confResult) *
bestAdjustedScore / relSum;
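				// Worked example (illustrative values only): with 3 nodes in bNeighbors,
				// bestOriginalScore = 8, confResult = 12, bestAdjustedScore = 6 and relSum = 10,
				// tempMax = 3 * (8 + 12) * 6 / 10 = 36. The factor bestAdjustedScore / relSum is
				// the best associate's share of this relative's total adjusted score, so, all else
				// being equal, a relative with one dominant associate outscores one with many
				// similar candidates.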

				PotentialPair newPair = new PotentialPair(relative, bestRelAssoc, tempMax);
				potentialPairs.add(newPair);
				if (tempMax > bestPair.getScore()) { // We have seen a new maximum.
					bestPair = newPair; // Keep track of the new best
					bestCount = 1; // Restart the counter
				}
				else if (tempMax == bestPair.getScore()) { // A tie score with the current best
					bestCount += 1;
				}
			}
		}

		if (bestCount == 0 || bestPair.getScore() == 0) {
			return PotentialPair.EMPTY_PAIR; // The default null object passed for nothing found.
		}

		if (bestCount == 1) { // There is a unique best entry. Use it.
			return bestPair;
		}

		return findFirstUnconflictedPair(potentialPairs); // The best pair is a tie; go deeper into the list
	}
}
|
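discoverPotentialMatches above and the BinningSystem class that follows both depend on binning. Similar feature vectors should tend to land in the same bin, so that only nodes sharing a bin need the exact VectorCompare test. BinningSystem's own comments describe L binnings, each built from k hyperplanes, which matches the classic random-hyperplane (cosine) LSH scheme.

The Java sketch below illustrates that generic scheme under this assumption. The class HyperplaneBinningSketch and its methods are hypothetical. They do not reproduce Ghidra's actual partitioning, which is driven by LSHMemoryModel, KandL and HashEntry.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of random-hyperplane LSH binning; not Ghidra's BinningSystem.
public class HyperplaneBinningSketch {
	private final int dim;
	// planes[binning][hyperplane][dimension]: numBinnings binnings, each made of k hyperplanes
	private final double[][][] planes;

	public HyperplaneBinningSketch(int dim, int k, int numBinnings, long seed) {
		if (k < 1 || k > 30) {
			throw new IllegalArgumentException("k must be between 1 and 30");
		}
		this.dim = dim;
		Random rng = new Random(seed);
		planes = new double[numBinnings][k][dim];
		for (double[][] binning : planes) {
			for (double[] plane : binning) {
				for (int d = 0; d < dim; d++) {
					plane[d] = rng.nextGaussian(); // i.i.d. Gaussians give a uniformly random direction
				}
			}
		}
	}

	// One bin id per binning: bit j records which side of hyperplane j the vector lies on
	public int[] binIds(double[] v) {
		if (v.length != dim) {
			throw new IllegalArgumentException("expected a vector of length " + dim);
		}
		int[] ids = new int[planes.length];
		for (int b = 0; b < planes.length; b++) {
			int id = 0;
			for (int j = 0; j < planes[b].length; j++) {
				double dot = 0.0;
				for (int d = 0; d < dim; d++) {
					dot += planes[b][j][d] * v[d];
				}
				if (dot >= 0) {
					id |= 1 << j;
				}
			}
			ids[b] = id;
		}
		return ids;
	}

	// Candidate neighbors share a bin in at least one binning; exact comparison happens later
	public boolean shareBin(double[] a, double[] b) {
		int[] idsA = binIds(a);
		int[] idsB = binIds(b);
		for (int i = 0; i < idsA.length; i++) {
			if (idsA[i] == idsB[i]) {
				return true;
			}
		}
		return false;
	}

	public static void main(String[] args) {
		HyperplaneBinningSketch binning = new HyperplaneBinningSketch(4, 8, 5, 42L);
		double[] v = { 1.0, 2.0, 3.0, 4.0 };
		double[] doubled = { 2.0, 4.0, 6.0, 8.0 };
		System.out.println(Arrays.toString(binning.binIds(v)));
		System.out.println("v and 2v share a bin: " + binning.shareBin(v, doubled));
	}
}
```

Only the sign of each dot product matters, so scaling a vector by a positive factor never changes its bins. This is why such bins approximate cosine similarity rather than Euclidean distance. Raising k makes each bin more selective, while raising the number of binnings gives near neighbors more chances to collide.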
@@ -0,0 +1,119 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package ghidra.feature.vt.api;

import java.util.*;

import generic.lsh.*;
import generic.lsh.vector.HashEntry;
import ghidra.util.exception.CancelledException;
import ghidra.util.task.TaskMonitor;

/**
 * Container for FunctionNodes so that nodes that are "near" each other
 * (meaning the nodes' feature vectors have high cosine similarity)
 * can be discovered. As nodes are added, they are distributed across
 * bins, where similar nodes tend to be placed into the same bins.
 */
class BinningSystem {
	private final int L; // Number of distinct binnings

	private int[][] partitionIdentities;
	private TreeMap<Integer, TreeSet<FunctionNode>>[] binSys;

	/**
	 * Construct a container that holds FunctionNodes, indexed by L independent binnings.
	 * @param model is the LSH memory model that determines k (hyperplanes per binning)
	 *        and L (number of binnings)
	 */
	@SuppressWarnings("unchecked")
	public BinningSystem(LSHMemoryModel model) {
		int k = model.getK(); // k = number of hyperplanes comprising each binning
		L = KandL.memoryModelToL(model);
		this.partitionIdentities = new int[L][];
		this.binSys = new TreeMap[L]; // A system of L binnings
		Random random = new Random(23);
		for (int ii = 0; ii < L; ++ii) {
			this.partitionIdentities[ii] = new int[k];
			for (int jj = 0; jj < k; ++jj) {
				this.partitionIdentities[ii][jj] = random.nextInt();
			}
			this.binSys[ii] = new TreeMap<Integer, TreeSet<FunctionNode>>();
		}
	}

	/**
	 * Add {@link FunctionNode} objects into the bins. Nodes without a feature vector are skipped.
	 * @param iter is an iterator over the raw FunctionNodes to add
	 * @param monitor is the TaskMonitor
	 * @throws CancelledException for user cancellation of the correlator
	 */
	public void add(Iterator<FunctionNode> iter, TaskMonitor monitor) throws CancelledException {

		while (iter.hasNext()) {
			FunctionNode node = iter.next();
			monitor.checkCancelled();
			monitor.incrementProgress(1);
			if (node.getVector() == null) {
				continue;
			}
			int[] features = getBinIds(node);
			for (int ii = 0; ii < features.length; ++ii) {
				TreeSet<FunctionNode> list = binSys[ii].get(features[ii]);
				if (list == null) {
					list = new TreeSet<FunctionNode>();
					binSys[ii].put(features[ii], list);
				}
				list.add(node);
			}
		}
	}

	/**
	 * Returns the union of all the bins containing the exemplar FunctionNode.
	 * These nodes are likely to be similar to the exemplar, but need secondary testing.
	 * @param node is the exemplar
	 * @return a set of FunctionNodes (empty if the exemplar has no feature vector)
	 */
	public Set<FunctionNode> lookup(FunctionNode node) {
		TreeSet<FunctionNode> result = new TreeSet<FunctionNode>();
		int[] features = getBinIds(node);
		if (features == null) {
			return result; // No feature vector, so there are no bins to search
		}
		for (int ii = 0; ii < features.length; ++ii) {
			TreeSet<FunctionNode> list = binSys[ii].get(features[ii]);
			if (list != null) {
				result.addAll(list);
			}
		}
		return result;
	}

	/**
	 * Given a node, calculate the binId for each binning in this system
	 * @param node is the FunctionNode to label
	 * @return an array of ids, or null if the node has no feature vector
	 */
	private int[] getBinIds(FunctionNode node) {
		if (node.getVector() == null) {
			return null;
		}
		int[] result = new int[L];
		HashEntry[] entries = node.getVector().getEntries();
		for (int ii = 0; ii < L; ++ii) {
			int hash = Partition.hash(partitionIdentities[ii], entries);
			result[ii] = hash;
		}
		return result;
	}
}
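BinningSystem derives L bin ids per node from seeded partition identities via Partition.hash, whose internals are not part of this diff. As a hedged illustration of the general idea only, the sketch below swaps in textbook random-hyperplane LSH on dense double[] vectors: each of the L binnings draws k Gaussian hyperplanes from a seeded Random, a vector's bin id is its k sign bits, and lookup returns the union of the matching bins. HyperplaneBins and its methods are hypothetical, and this is not how generic.lsh hashes its sparse HashEntry vectors.

```java
import java.util.*;

// Hypothetical sketch of random-hyperplane LSH binning (not generic.lsh's Partition.hash).
final class HyperplaneBins {
	private final double[][][] planes; // [binning][hyperplane][dimension]
	private final List<Map<Integer, Set<Integer>>> tables = new ArrayList<>();

	HyperplaneBins(int numBinnings, int k, int dim, long seed) {
		if (k < 1 || k > 31) {
			throw new IllegalArgumentException("k must fit in the bits of an int bin id");
		}
		Random random = new Random(seed); // fixed seed: reproducible bins, like Random(23) above
		planes = new double[numBinnings][k][dim];
		for (int t = 0; t < numBinnings; t++) {
			for (int j = 0; j < k; j++) {
				for (int d = 0; d < dim; d++) {
					planes[t][j][d] = random.nextGaussian();
				}
			}
			tables.add(new HashMap<>());
		}
	}

	/** Bin id in one binning: bit j is set when v lies on the non-negative side of hyperplane j. */
	int binId(int binning, double[] v) {
		int id = 0;
		for (int j = 0; j < planes[binning].length; j++) {
			double dot = 0.0;
			for (int d = 0; d < v.length; d++) {
				dot += planes[binning][j][d] * v[d];
			}
			if (dot >= 0) {
				id |= 1 << j;
			}
		}
		return id;
	}

	/** File the item under its bin id in every binning. */
	void add(int itemId, double[] v) {
		for (int t = 0; t < tables.size(); t++) {
			tables.get(t).computeIfAbsent(binId(t, v), key -> new TreeSet<>()).add(itemId);
		}
	}

	/** Union of the bins v falls into: candidates that still need an exact comparison. */
	Set<Integer> lookup(double[] v) {
		Set<Integer> result = new TreeSet<>();
		for (int t = 0; t < tables.size(); t++) {
			Set<Integer> bin = tables.get(t).get(binId(t, v));
			if (bin != null) {
				result.addAll(bin);
			}
		}
		return result;
	}
}
```

Vectors pointing in the same direction always share every bin, and opposite vectors never do; anything in between lands together only with a probability that grows with cosine similarity. That is why a lookup result is only a candidate set, matching the note in lookup's Javadoc that its nodes need secondary testing.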
@@ -0,0 +1,200 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package ghidra.feature.vt.api;

import java.util.*;
import java.util.Map.Entry;

import generic.lsh.vector.LSHVector;
import ghidra.program.model.address.Address;
import ghidra.program.model.listing.Function;

/**
 * Information about a single function the correlator is attempting to match
 */
public class FunctionNode implements Comparable<FunctionNode> {

	private final Address addr; // Address of the function represented, also unique identifier
	private final String name; // Name of the function this node represents
	private final LSHVector vec; // Feature vector
	private ArrayList<Address> callAddresses; // Addresses of functions this node calls
	private final Set<FunctionNode> children; // Who do I call in the call graph?
	private final Set<FunctionNode> parents; // Who calls me in the call graph?
	private Map<FunctionNode, FunctionPair> associates; // Potential matches on the other side, with their scores
	private final int len; // Number of addresses in the body of this function
	private boolean acceptedMatch; // Has this node been formally matched with something?

	/**
	 * Allocate a container for FunctionNodes as needed by the NeighborGenerators. These are generally
	 * small sets whose containment is checked constantly.
	 * @return the container
	 */
	public static Set<FunctionNode> neigborhoodAllocate() {
		return new HashSet<FunctionNode>();
	}

	/**
	 * @param function is the function this node represents
	 * @param vector is the function's feature vector (may be null)
	 * @param callAddresses are the raw addresses of the functions called by this function
	 */
	public FunctionNode(Function function, LSHVector vector, ArrayList<Address> callAddresses) {
		this.addr = function.getEntryPoint();
		this.name = function.getName();
		this.vec = vector;
		this.callAddresses = callAddresses; // A second pass through the data resolves how the call graph fits together
		this.associates = new HashMap<FunctionNode, FunctionPair>();
		this.children = neigborhoodAllocate();
		this.parents = neigborhoodAllocate();
		int val = (int) function.getBody().getNumAddresses();
		this.len = (val == 0) ? 1 : val; // Guarantee a non-zero length
		this.acceptedMatch = false;
	}

	@Override
	public int hashCode() {
		return ((addr == null) ? 0 : addr.hashCode());
	}

	@Override
	public boolean equals(Object obj) {
		if (this == obj) {
			return true;
		}
		if (obj == null) {
			return false;
		}
		if (getClass() != obj.getClass()) {
			return false;
		}
		FunctionNode other = (FunctionNode) obj;
		if (addr == null) {
			if (other.addr != null) {
				return false;
			}
		}
		else if (!addr.equals(other.addr)) {
			return false;
		}
		return true;
	}

	@Override
	public int compareTo(FunctionNode other) {
		return addr.compareTo(other.addr); // Compare by address
	}

	@Override
	public String toString() {
		return name;
	}

	/**
	 * @return the Address of the entry point of the Function represented by this node
	 */
	public Address getAddress() {
		return addr;
	}

	/**
	 * @return the feature vector associated with this node (function)
	 */
	public LSHVector getVector() {
		return vec;
	}

	/**
	 * Return the raw call addresses and release this node's reference to them
	 * @return the list of addresses
	 */
	public List<Address> releaseCallAddresses() {
		List<Address> res = callAddresses;
		callAddresses = null; // Release our reference to addresses
		return res;
	}

	/**
	 * @return the set of functions (FunctionNodes) called by this function
	 */
	public Set<FunctionNode> getChildren() {
		return children;
	}

	/**
	 * @return the set of functions (FunctionNodes) that call this function
	 */
	public Set<FunctionNode> getParents() {
		return parents;
	}

	/**
	 * Add a (potential) match for this node. The match
	 * is stored with a FunctionPair object holding similarity information.
	 * @param other is the potentially matching FunctionNode
	 * @param pair is the FunctionPair describing the similarity
	 */
	public void addAssociate(FunctionNode other, FunctionPair pair) {
		associates.put(other, pair);
	}

	/**
	 * Remove what was previously considered a potential match.
	 * @param other is the matching FunctionNode
	 */
	public void removeAssociate(FunctionNode other) {
		associates.remove(other);
	}

	/**
	 * Clear all potential matches.
	 */
	public void clearAssociates() {
		associates.clear();
	}

	/**
	 * @return an iterator over all potential matches for this node
	 */
	public Iterator<Entry<FunctionNode, FunctionPair>> getAssociateIterator() {
		return associates.entrySet().iterator();
	}

	/**
	 * If -other- is a potential match, return the FunctionPair describing the similarity
	 * @param other is the possible potential match
	 * @return the FunctionPair describing the match, or null if -other- is not a potential match
	 */
	public FunctionPair findEdge(FunctionNode other) {
		return associates.get(other);
	}

	/**
	 * @return the number of addresses in the function body represented by this node
	 */
	public int getLen() {
		return len;
	}

	/**
	 * @return true if this node has been formally matched by the correlator
	 */
	public boolean isAcceptedMatch() {
		return acceptedMatch;
	}

	/**
	 * Mark whether this node has been matched by the correlator
	 * @param used is true if this node has been matched
	 */
	public void setAcceptedMatch(boolean used) {
		this.acceptedMatch = used;
	}
}
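FunctionNode's equals, hashCode and compareTo all depend only on the entry-point address, so a renamed function is still the same node, and the TreeSet bins in BinningSystem and the TreeMap in FunctionNodeContainer order nodes by address. The minimal sketch below shows that contract in isolation. AddressKeyedNode is a hypothetical stand-in that uses a long in place of Ghidra's Address and an instanceof check on a final class in place of the getClass() comparison.

```java
// Hypothetical stand-in for FunctionNode: identity and ordering come only from the address.
final class AddressKeyedNode implements Comparable<AddressKeyedNode> {
	final long address; // stands in for ghidra.program.model.address.Address
	final String name;  // display only; ignored by equals, hashCode and compareTo

	AddressKeyedNode(long address, String name) {
		this.address = address;
		this.name = name;
	}

	@Override
	public boolean equals(Object obj) {
		// The class is final, so instanceof is as strict as a getClass() comparison here
		return obj instanceof AddressKeyedNode other && address == other.address;
	}

	@Override
	public int hashCode() {
		return Long.hashCode(address); // consistent with equals
	}

	@Override
	public int compareTo(AddressKeyedNode other) {
		return Long.compare(address, other.address); // consistent with equals
	}

	@Override
	public String toString() {
		return name;
	}
}
```

Keeping hashCode and compareTo consistent with equals matters because the same nodes live in both hash-based neighborhood sets (neigborhoodAllocate) and sorted sets and maps.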
@@ -0,0 +1,101 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package ghidra.feature.vt.api;

import java.util.*;

import ghidra.program.model.address.Address;
import ghidra.program.model.listing.*;

/**
 * Container of FunctionNodes corresponding to functions in a single Program
 */
public class FunctionNodeContainer {
	private Program program; // Program containing all the functions
	private Map<Address, FunctionNode> addrToNode; // Map from Address to FunctionNode representing the function

	public FunctionNodeContainer(Program program, List<FunctionNode> nodeList) {
		this.program = program;
		addrToNode = new TreeMap<Address, FunctionNode>();
		for (FunctionNode node : nodeList) {
			addrToNode.put(node.getAddress(), node);
		}
		generateCallGraph();
	}

	public Program getProgram() {
		return program;
	}

	/**
	 * Get the FunctionNode associated with a specific address
	 * @param addr the Address to search for
	 * @return the corresponding FunctionNode (or null if addr maps to nothing)
	 */
	public FunctionNode get(Address addr) {
		return addrToNode.get(addr);
	}

	/**
	 * @return the number of FunctionNodes held in this container
	 */
	public int size() {
		return addrToNode.size();
	}

	/**
	 * @return an iterator over all FunctionNodes in this container, in address order
	 */
	public Iterator<FunctionNode> iterator() {
		return addrToNode.values().iterator();
	}

	/**
	 * Generate the program call graph in terms of FunctionNodes, using the call addresses
	 * attached to each raw FunctionNode. Calls to thunks are resolved to the thunked function.
	 * Once the edges are built, the original call-address lists are released.
	 */
	private void generateCallGraph() {
		FunctionManager mgr = program.getFunctionManager();
		for (FunctionNode node : addrToNode.values()) { // Addresses are associated to nodes
			if (node != null) {
				List<Address> callAddresses = node.releaseCallAddresses();
				for (Address addr : callAddresses) {
					FunctionNode kid;
					for (;;) {
						kid = addrToNode.get(addr); // These nodes are the vertices in the call graph
						if (kid != null) {
							break;
						}
						Function f = mgr.getFunctionAt(addr); // If addr does not map to a node, it is most likely a thunk
						if (f == null) {
							break;
						}
						if (!f.isThunk()) {
							break;
						}
						addr = f.getThunkedFunction(false).getEntryPoint(); // Replace with address of thunked function
					}
					if (kid != null) {
						node.getChildren().add(kid);
						kid.getParents().add(node);
					}
				}
			}
		}
		return;
	}
}
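generateCallGraph looks through thunks: when a call target has no node, it follows the thunk to the thunked function's entry point and retries until it reaches a node or a non-thunk. The sketch below is a hedged, self-contained version of that resolution step using plain maps. ThunkAwareCallGraph and its inputs are hypothetical (the thunkedBy map stands in for Function.isThunk and getThunkedFunction), and the visited-set cycle guard is an addition of this sketch that the loop above does not have.

```java
import java.util.*;

// Hypothetical sketch of call-graph construction that looks through thunks.
final class ThunkAwareCallGraph {
	final Map<Long, Set<Long>> children = new TreeMap<>();
	final Map<Long, Set<Long>> parents = new TreeMap<>();

	/**
	 * @param calls     caller address -> raw call-target addresses (like the callAddresses lists)
	 * @param nodes     addresses that have a node (functions being correlated)
	 * @param thunkedBy thunk address -> address of the function it forwards to (assumed input)
	 */
	ThunkAwareCallGraph(Map<Long, List<Long>> calls, Set<Long> nodes, Map<Long, Long> thunkedBy) {
		for (Long node : nodes) {
			children.put(node, new TreeSet<>());
			parents.put(node, new TreeSet<>());
		}
		for (Map.Entry<Long, List<Long>> e : calls.entrySet()) {
			if (!nodes.contains(e.getKey())) {
				continue; // only callers with a node get edges
			}
			for (Long target : e.getValue()) {
				Long kid = resolve(target, nodes, thunkedBy);
				if (kid != null) {
					children.get(e.getKey()).add(kid);
					parents.get(kid).add(e.getKey());
				}
			}
		}
	}

	/** Follow thunks until a node is reached; null if the chain ends (or loops) without one. */
	private static Long resolve(Long addr, Set<Long> nodes, Map<Long, Long> thunkedBy) {
		Set<Long> seen = new HashSet<>(); // cycle guard added by this sketch
		while (addr != null && seen.add(addr)) {
			if (nodes.contains(addr)) {
				return addr;
			}
			addr = thunkedBy.get(addr); // null when addr is not a known thunk
		}
		return null;
	}
}
```

A call through a thunk at 0x900 that forwards to 0x200 therefore becomes a direct edge to 0x200, while a call to an address with neither a node nor a thunk is dropped.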
@@ -0,0 +1,133 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package ghidra.feature.vt.api;

import ghidra.feature.vt.api.main.*;

/**
 * A possible match between source and destination.
 */
public class FunctionPair {

	private FunctionNode sourceNode; // Function from the source program
	private FunctionNode destNode; // Function from the destination program
	private double simResult; // Similarity of the pair (0.0 to 1.0)
	private double confResult; // Confidence score of the pair

	/**
	 * Constructor
	 * @param source the source function
	 * @param dest the destination function
	 * @param simRes the computed similarity score
	 * @param confRes the computed confidence score
	 */
	public FunctionPair(FunctionNode source, FunctionNode dest, double simRes, double confRes) {
		this.sourceNode = source;
		this.destNode = dest;
		this.simResult = simRes;
		this.confResult = confRes;
	}

	@Override
	public int hashCode() {
		final int prime = 31;
		int result = 1;
		result = prime * result + ((destNode == null) ? 0 : destNode.hashCode());
		result = prime * result + ((sourceNode == null) ? 0 : sourceNode.hashCode());
		return result;
	}

	@Override
	public boolean equals(Object obj) {
		if (this == obj) {
			return true;
		}
		if (obj == null) {
			return false;
		}
		if (getClass() != obj.getClass()) {
			return false;
		}
		FunctionPair other = (FunctionPair) obj;
		if (destNode == null) {
			if (other.destNode != null) {
				return false;
			}
		}
		else if (!destNode.equals(other.destNode)) {
			return false;
		}
		if (sourceNode == null) {
			if (other.sourceNode != null) {
				return false;
			}
		}
		else if (!sourceNode.equals(other.sourceNode)) {
			return false;
		}
		return true;
	}

	/**
	 * Compute the formal Version Tracking match record corresponding to this pair
	 * @param matchSet is the match set the record should be added to
	 * @return the match record
	 */
	public VTMatchInfo getMatch(VTMatchSet matchSet) {
		VTMatchInfo result = new VTMatchInfo(matchSet);
		result.setSimilarityScore(new VTScore(simResult));
		result.setConfidenceScore(new VTScore(confResult));
		result.setAssociationType(VTAssociationType.FUNCTION);
		result.setSourceAddress(sourceNode.getAddress());
		result.setDestinationAddress(destNode.getAddress());
		result.setSourceLength(sourceNode.getLen());
		result.setDestinationLength(destNode.getLen());
		return result;
	}

	@Override
	public String toString() {
		return sourceNode.toString() + "," + destNode.toString();
	}

	/**
	 * @return info about the source function
	 */
	public FunctionNode getSourceNode() {
		return sourceNode;
	}

	/**
	 * @return info about the destination function
	 */
	public FunctionNode getDestNode() {
		return destNode;
	}

	/**
	 * @return the similarity score of the pair
	 */
	public double getSimResult() {
		return simResult;
	}

	/**
	 * @return the confidence score of the pair
	 */
	public double getConfResult() {
		return confResult;
	}
}
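FunctionPair's identity is the (source, destination) pair alone: its hashCode and equals ignore simResult and confResult, and NeighborGenerator registers one shared pair object on both endpoints (src.addAssociate(dst, newPair) and dst.addAssociate(src, newPair)). The hedged sketch below isolates both behaviors. ScoredPair and register are hypothetical names, long values stand in for the two FunctionNodes, and two separate maps stand in for the per-node associate maps so that equal addresses in different programs cannot collide.

```java
import java.util.*;

// Hypothetical sketch: a candidate pair whose identity is (source, dest), not its scores.
final class ScoredPair {
	final long source; // stands in for the source FunctionNode
	final long dest;   // stands in for the destination FunctionNode
	final double similarity;
	final double confidence;

	ScoredPair(long source, long dest, double similarity, double confidence) {
		this.source = source;
		this.dest = dest;
		this.similarity = similarity;
		this.confidence = confidence;
	}

	@Override
	public boolean equals(Object obj) {
		return obj instanceof ScoredPair other && source == other.source && dest == other.dest;
	}

	@Override
	public int hashCode() {
		return 31 * Long.hashCode(source) + Long.hashCode(dest); // scores deliberately excluded
	}

	/** Register one shared pair object on both endpoints, like the two addAssociate calls. */
	static void register(Map<Long, Map<Long, ScoredPair>> srcAssociates,
			Map<Long, Map<Long, ScoredPair>> destAssociates, ScoredPair pair) {
		srcAssociates.computeIfAbsent(pair.source, k -> new HashMap<>()).put(pair.dest, pair);
		destAssociates.computeIfAbsent(pair.dest, k -> new HashMap<>()).put(pair.source, pair);
	}
}
```

Because direction is part of the identity, (1, 2) and (2, 1) are different pairs, while two scorings of (1, 2) collapse to one entry in a hash set.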
@@ -0,0 +1,136 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package ghidra.feature.vt.api;

import java.util.Set;
import java.util.TreeMap;

import generic.lsh.vector.LSHVectorFactory;
import ghidra.program.model.listing.Function;
import ghidra.program.model.symbol.*;

/**
 * A neighborhood generator that, for a given function, generates all functions
 * in the same namespace. For efficiency, it caches the namespace sets it generates.
 */
public class NamespaceNeighborhood extends NeighborGenerator {

	private FunctionNodeContainer sourceNodes; // Reference to global set of source functions
	private FunctionNodeContainer destNodes; // Reference to global set of destination functions
	private TreeMap<Long, Set<FunctionNode>> sourceSets; // Map from namespace ID to matching set of source functions
	private TreeMap<Long, Set<FunctionNode>> destSets; // Map from namespace ID to matching set of dest functions
	private TreeMap<PairLabel, NeighborhoodPair> namespacePair; // Map from pair of namespace IDs to pair of namespace sets
	private PairLabel cacheKey; // Internal key for quick lookups into namespacePair map

	private static class PairLabel implements Comparable<PairLabel> {
		public Long srcLabel;
		public Long destLabel;

		@Override
		public int compareTo(PairLabel o) {
			int srcCmp = Long.compare(srcLabel.longValue(), o.srcLabel.longValue());
			if (srcCmp != 0) {
				return srcCmp;
			}
			return Long.compare(destLabel.longValue(), o.destLabel.longValue());
		}
	}

	public NamespaceNeighborhood(LSHVectorFactory vectorFactory, double impThreshold,
			FunctionNodeContainer sourceNodes, FunctionNodeContainer destNodes) {
		super(vectorFactory, impThreshold);
		this.sourceNodes = sourceNodes;
		this.destNodes = destNodes;
		sourceSets = new TreeMap<Long, Set<FunctionNode>>();
		destSets = new TreeMap<Long, Set<FunctionNode>>();
		namespacePair = new TreeMap<PairLabel, NeighborhoodPair>();
		cacheKey = new PairLabel();
	}

	private Namespace getNamespace(FunctionNode root, FunctionNodeContainer container) {
		Function function =
			container.getProgram().getFunctionManager().getFunctionAt(root.getAddress());
		if (function == null) {
			return null;
		}
		Namespace namespace = function.getParentNamespace();
		return namespace;
	}

	private Set<FunctionNode> buildNeighborhood(Namespace namespace, Long namespaceKey,
			FunctionNodeContainer container, TreeMap<Long, Set<FunctionNode>> sets) {
		Set<FunctionNode> resultSet = sets.get(namespaceKey);
		if (resultSet == null) {
			resultSet = FunctionNode.neigborhoodAllocate();
			SymbolTable symbolTable = container.getProgram().getSymbolTable();
			SymbolIterator iter = symbolTable.getSymbols(namespace);
			while (iter.hasNext()) {
				Symbol sym = iter.next();
				if (sym.getSymbolType() != SymbolType.FUNCTION) {
					continue;
				}
				FunctionNode node = container.get(sym.getAddress());
				if (node != null) {
					resultSet.add(node);
				}
			}
			sets.put(namespaceKey, resultSet);
		}
		return resultSet;
	}

	private NeighborhoodPair findPair(Long srcKey, Long destKey) {
		cacheKey.srcLabel = srcKey;
		cacheKey.destLabel = destKey;
		return namespacePair.get(cacheKey);
	}

	private void cachePair(Long srcKey, Long destKey, NeighborhoodPair pair) {
		PairLabel newLabel = new PairLabel();
		newLabel.srcLabel = srcKey;
		newLabel.destLabel = destKey;
		namespacePair.put(newLabel, pair);
	}

	@Override
	public NeighborhoodPair generate(FunctionNode srcRoot, FunctionNode destRoot) {
		Namespace srcNamespace = getNamespace(srcRoot, sourceNodes);
		Namespace destNamespace = getNamespace(destRoot, destNodes);
		if (srcNamespace == null || destNamespace == null) {
			NeighborhoodPair pair = new NeighborhoodPair();
			pair.srcNeighbors = FunctionNode.neigborhoodAllocate();
			pair.destNeighbors = FunctionNode.neigborhoodAllocate();
			return pair; // Empty pair
		}
		Long srcNamespaceKey = srcNamespace.getID();
		Long destNamespaceKey = destNamespace.getID();
		NeighborhoodPair pair = findPair(srcNamespaceKey, destNamespaceKey);
		if (pair == null) {
			pair = new NeighborhoodPair();
			pair.srcNeighbors =
				buildNeighborhood(srcNamespace, srcNamespaceKey, sourceNodes, sourceSets);
			pair.destNeighbors =
				buildNeighborhood(destNamespace, destNamespaceKey, destNodes, destSets);
			cachePair(srcNamespaceKey, destNamespaceKey, pair);
		}
		if (!pair.isFilledOut) {
			if (fillOutPairs(pair, 10000)) {
				pair.isFilledOut = true;
			}
		}
		return pair;
	}
}
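NamespaceNeighborhood caches at two levels: one node set per namespace ID on each side, and one NeighborhoodPair per (source ID, destination ID) pair, using a reusable mutable PairLabel for lookups and allocating a fresh key only on insertion. The hedged sketch below shows the same two-level caching with a swapped-in design: an immutable record key allocated per lookup with HashMap.computeIfAbsent, which costs a small allocation but cannot be corrupted if a key ever escapes into the map. NamespacePairCache, PairKey, Neighborhoods and the builder functions are hypothetical, and the fillOutPairs(pair, 10000) step is not reproduced.

```java
import java.util.*;
import java.util.function.LongFunction;

// Hypothetical sketch of two-level caching keyed by namespace IDs.
final class NamespacePairCache {
	record PairKey(long srcNamespace, long destNamespace) {}
	record Neighborhoods(Set<Long> src, Set<Long> dest) {}

	private final Map<Long, Set<Long>> srcSets = new HashMap<>();
	private final Map<Long, Set<Long>> destSets = new HashMap<>();
	private final Map<PairKey, Neighborhoods> pairs = new HashMap<>();
	private final LongFunction<Set<Long>> srcBuilder;  // namespace ID -> member addresses (assumed)
	private final LongFunction<Set<Long>> destBuilder;
	int builds = 0; // number of namespace sets actually built

	NamespacePairCache(LongFunction<Set<Long>> srcBuilder, LongFunction<Set<Long>> destBuilder) {
		this.srcBuilder = srcBuilder;
		this.destBuilder = destBuilder;
	}

	Neighborhoods get(long srcId, long destId) {
		// Each side's set is built once and shared by every pair that mentions that namespace
		return pairs.computeIfAbsent(new PairKey(srcId, destId), key -> new Neighborhoods(
			srcSets.computeIfAbsent(srcId, id -> build(srcBuilder, id)),
			destSets.computeIfAbsent(destId, id -> build(destBuilder, id))));
	}

	private Set<Long> build(LongFunction<Set<Long>> builder, long namespaceId) {
		builds++;
		return builder.apply(namespaceId);
	}
}
```

Requesting (1, 2) twice returns the same cached object, and a later request for (1, 3) reuses the source set for namespace 1 and builds only the destination set for namespace 3.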
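The NeighborGenerator file that follows implements fillOutPairs and searchForNewMatches: drop nodes that are already accepted or have no feature vector, give up if either side is empty or the number of comparisons would exceed maxCompares, then compare every remaining source against every remaining destination and keep pairs whose confidence is at least impThreshold. The hedged sketch below mirrors that control flow. NeighborhoodSearch, Node, Match and fillOut are hypothetical; cosine similarity stands in for LSHVector.compare plus LSHVectorFactory.calculateSignificance, and the skip of already-compared pairs (findEdge) is omitted.

```java
import java.util.*;

// Hypothetical sketch of the budgeted all-pairs search; cosine stands in for BSim significance.
final class NeighborhoodSearch {
	record Node(String name, double[] vector, boolean accepted) {}
	record Match(String src, String dest, double score) {}

	static double cosine(double[] a, double[] b) {
		double dot = 0, na = 0, nb = 0;
		for (int i = 0; i < a.length; i++) {
			dot += a[i] * b[i];
			na += a[i] * a[i];
			nb += b[i] * b[i];
		}
		return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
	}

	/** Empty when the search is skipped: no unmatched nodes on a side, or over the budget. */
	static Optional<List<Match>> fillOut(Collection<Node> srcs, Collection<Node> dests,
			int maxCompares, double threshold) {
		List<Node> s = unmatched(srcs);
		List<Node> d = unmatched(dests);
		if (s.isEmpty() || d.isEmpty() || (long) s.size() * d.size() > maxCompares) {
			return Optional.empty();
		}
		List<Match> found = new ArrayList<>();
		for (Node a : s) {
			for (Node b : d) {
				double score = cosine(a.vector(), b.vector());
				if (score >= threshold) { // pairs below the threshold are discarded
					found.add(new Match(a.name(), b.name(), score));
				}
			}
		}
		return Optional.of(found);
	}

	/** Keep nodes that are not yet matched and have a feature vector. */
	private static List<Node> unmatched(Collection<Node> nodes) {
		List<Node> out = new ArrayList<>();
		for (Node n : nodes) {
			if (!n.accepted() && n.vector() != null) {
				out.add(n);
			}
		}
		return out;
	}
}
```

The budget check on the product of the two set sizes is what keeps the small relative neighborhoods (RELATIVE_COMPARES = 25) cheap, while NamespaceNeighborhood passes a much larger budget of 10000.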
@ -0,0 +1,287 @@
|
|||
/* ###
|
||||
* IP: GHIDRA
|
||||
*
|
||||
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||
* you may not use this file except in compliance with the License.
|
||||
* You may obtain a copy of the License at
|
||||
*
|
||||
* http://www.apache.org/licenses/LICENSE-2.0
|
||||
*
|
||||
* Unless required by applicable law or agreed to in writing, software
|
||||
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
* See the License for the specific language governing permissions and
|
||||
* limitations under the License.
|
||||
*/
|
||||
package ghidra.feature.vt.api;
|
||||
|
||||
import java.util.ArrayList;
|
||||
import java.util.Set;
|
||||
|
||||
import generic.lsh.vector.*;
|
||||
|
||||
/**
|
||||
* Class(es) for constructing a "neighborhood" of functions around a function
|
||||
* that we know has a match. Comparing across neighborhoods provides a large
|
||||
* cut-down in both search time and uncertainty when trying to find additional matches.
|
||||
*/
|
||||
public abstract class NeighborGenerator {
|
||||
|
||||
public static final int RELATIVE_COMPARES = 25; // Maximum number of extra compares between "relative" sets
|
||||
private double impThreshold; // Confidence threshold for extending to additional matches
|
||||
private LSHVectorFactory vectorFactory;
|
||||
|
||||
public static class NeighborhoodPair {
|
||||
public Set<FunctionNode> srcNeighbors;
|
||||
public Set<FunctionNode> destNeighbors;
|
||||
public boolean isFilledOut = false;
|
||||
}
|
||||
|
||||
public NeighborGenerator(LSHVectorFactory vectorFactory, double impThreshold) {
|
||||
this.vectorFactory = vectorFactory;
|
||||
this.impThreshold = impThreshold;
|
||||
}
|
||||
|
||||
/**
|
||||
* Given roots from the source program and the destination program,
|
||||
* generate a neighborhood of functions related to each root.
|
||||
* @param srcRoot is the root from the source program
|
||||
* @param destRoot is the root from the destination program
|
||||
* @return a pair of "neighborhoods" as a set of FunctionNodes
|
||||
*/
|
||||
public abstract NeighborhoodPair generate(FunctionNode srcRoot, FunctionNode destRoot);
|
||||
|
||||
/**
|
||||
* Do the feature vector comparison of every source to every destination and create
|
||||
* new putative matches (associates) if the comparison score exceeds {@link #impThreshold}
|
||||
* @param unmatchedSource is the list of sources
|
||||
* @param unmatchedDest is the list of destinations
|
||||
*/
|
||||
private void searchForNewMatches(ArrayList<FunctionNode> unmatchedSource,
|
||||
ArrayList<FunctionNode> unmatchedDest) {
|
||||
VectorCompare veccompare = new VectorCompare();
|
||||
for (FunctionNode src : unmatchedSource) {
|
||||
LSHVector srcvec = src.getVector();
|
||||
for (FunctionNode dst : unmatchedDest) {
|
||||
if (src.findEdge(dst) != null) {
|
||||
continue; // This pair has already been compared
|
||||
}
|
||||
// Feature vector computations
|
||||
double similarity = srcvec.compare(dst.getVector(), veccompare);
|
||||
double confidence = vectorFactory.calculateSignificance(veccompare);
|
||||
if (confidence < impThreshold) {
|
||||
continue;
|
||||
}
|
||||
FunctionPair newPair = new FunctionPair(src, dst, similarity, confidence);
|
||||
src.addAssociate(dst, newPair);
|
||||
dst.addAssociate(src, newPair);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* If nodes haven't been compared before, compare them and add an associate if it passes threshold
|
||||
* @param pair is the two sets of nodes that we are comparing between
|
||||
* @param maxCompares is the maximum number of comparisons to perform
|
||||
* @return true is comparisons were actually performed
|
||||
*/
|
||||
protected boolean fillOutPairs(NeighborhoodPair pair, int maxCompares) {
|
||||
ArrayList<FunctionNode> unmatchedSource = new ArrayList<FunctionNode>();
|
||||
ArrayList<FunctionNode> unmatchedDest = null;
|
||||
|
||||
for (FunctionNode src : pair.srcNeighbors) {
|
||||
if (src.isAcceptedMatch()) {
|
||||
continue;
|
||||
}
|
||||
if (src.getVector() == null) {
|
||||
continue;
|
||||
}
|
||||
unmatchedSource.add(src);
|
||||
}
|
||||
|
||||
if (unmatchedSource.isEmpty()) {
|
||||
return false;
|
||||
}
|
||||
if (unmatchedSource.size() > maxCompares) {
|
||||
return false;
|
||||
}
|
||||
unmatchedDest = new ArrayList<FunctionNode>();
|
||||
|
||||
for (FunctionNode dst : pair.destNeighbors) {
|
||||
if (dst.isAcceptedMatch()) {
|
||||
continue;
|
||||
}
|
||||
if (dst.getVector() == null) {
|
||||
continue;
|
||||
}
|
||||
unmatchedDest.add(dst);
|
||||
}
|
||||
if (unmatchedDest.isEmpty()) {
|
||||
return false;
|
||||
}
|
||||
if (unmatchedSource.size() * unmatchedDest.size() > maxCompares) {
|
||||
return false;
|
||||
}
|
||||
|
||||
searchForNewMatches(unmatchedSource, unmatchedDest);
|
||||
return true;
|
||||
}
|
||||
|
||||
/**
|
||||
* Parents of -root-
|
||||
*/
|
||||
public static class Parents extends NeighborGenerator {
|
||||
|
||||
public Parents(LSHVectorFactory vectorFactory, double impThreshold) {
|
||||
super(vectorFactory, impThreshold);
|
||||
}
|
||||
|
||||
@Override
|
||||
public NeighborhoodPair generate(FunctionNode srcRoot, FunctionNode destRoot) {
|
||||
NeighborhoodPair pair = new NeighborhoodPair();
|
||||
pair.srcNeighbors = srcRoot.getParents();
|
||||
pair.destNeighbors = destRoot.getParents();
|
||||
fillOutPairs(pair, RELATIVE_COMPARES);
|
||||
return pair;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Children of -root-
|
||||
*/
|
||||
public static class Children extends NeighborGenerator {
|
||||
|
||||
public Children(LSHVectorFactory vectorFactory, double impThreshold) {
|
||||
super(vectorFactory, impThreshold);
|
||||
}
|
||||
|
||||
@Override
|
||||
public NeighborhoodPair generate(FunctionNode srcRoot, FunctionNode destRoot) {
|
||||
NeighborhoodPair pair = new NeighborhoodPair();
|
||||
pair.srcNeighbors = srcRoot.getChildren();
|
||||
pair.destNeighbors = destRoot.getChildren();
|
||||
fillOutPairs(pair, RELATIVE_COMPARES);
|
||||
return pair;
|
||||
}
|
||||
}
|
||||
|
||||
	/**
	 * Grandparents of -root-
	 */
	public static class GrandParents extends NeighborGenerator {

		public GrandParents(LSHVectorFactory vectorFactory, double impThreshold) {
			super(vectorFactory, impThreshold);
		}

		@Override
		public NeighborhoodPair generate(FunctionNode srcRoot, FunctionNode destRoot) {
			NeighborhoodPair pair = new NeighborhoodPair();
			// Collect the parents of every parent of the source root
			Set<FunctionNode> tempRels = srcRoot.getParents();
			pair.srcNeighbors = FunctionNode.neigborhoodAllocate();
			for (FunctionNode rel : tempRels) {
				pair.srcNeighbors.addAll(rel.getParents());
			}
			// Recursion can make the root its own grandparent; exclude it from its own neighborhood
			pair.srcNeighbors.remove(srcRoot);

			// Same walk on the destination side
			tempRels = destRoot.getParents();
			pair.destNeighbors = FunctionNode.neigborhoodAllocate();
			for (FunctionNode rel : tempRels) {
				pair.destNeighbors.addAll(rel.getParents());
			}
			pair.destNeighbors.remove(destRoot);
			fillOutPairs(pair, RELATIVE_COMPARES);
			return pair;
		}
	}

	/**
	 * Grandchildren of -root-
	 */
	public static class GrandChildren extends NeighborGenerator {

		public GrandChildren(LSHVectorFactory vectorFactory, double impThreshold) {
			super(vectorFactory, impThreshold);
		}

		@Override
		public NeighborhoodPair generate(FunctionNode srcRoot, FunctionNode destRoot) {
			NeighborhoodPair pair = new NeighborhoodPair();
			// Collect the children of every child of the source root
			Set<FunctionNode> tempRels = srcRoot.getChildren();
			pair.srcNeighbors = FunctionNode.neigborhoodAllocate();
			for (FunctionNode rel : tempRels) {
				pair.srcNeighbors.addAll(rel.getChildren());
			}
			// Recursion can make the root its own grandchild; exclude it from its own neighborhood
			pair.srcNeighbors.remove(srcRoot);

			// Same walk on the destination side
			tempRels = destRoot.getChildren();
			pair.destNeighbors = FunctionNode.neigborhoodAllocate();
			for (FunctionNode rel : tempRels) {
				pair.destNeighbors.addAll(rel.getChildren());
			}
			pair.destNeighbors.remove(destRoot);
			fillOutPairs(pair, RELATIVE_COMPARES);
			return pair;
		}
	}

	/**
	 * Functions that share a parent with -root-
	 */
	public static class Siblings extends NeighborGenerator {

		public Siblings(LSHVectorFactory vectorFactory, double impThreshold) {
			super(vectorFactory, impThreshold);
		}

		@Override
		public NeighborhoodPair generate(FunctionNode srcRoot, FunctionNode destRoot) {
			NeighborhoodPair pair = new NeighborhoodPair();
			// Collect the children of every parent of the source root
			Set<FunctionNode> tempRels = srcRoot.getParents();
			pair.srcNeighbors = FunctionNode.neigborhoodAllocate();
			for (FunctionNode rel : tempRels) {
				pair.srcNeighbors.addAll(rel.getChildren());
			}
			// The root is always a child of its own parents, so it must be excluded
			pair.srcNeighbors.remove(srcRoot);

			// Same walk on the destination side
			tempRels = destRoot.getParents();
			pair.destNeighbors = FunctionNode.neigborhoodAllocate();
			for (FunctionNode rel : tempRels) {
				pair.destNeighbors.addAll(rel.getChildren());
			}
			pair.destNeighbors.remove(destRoot);
			fillOutPairs(pair, RELATIVE_COMPARES);
			return pair;
		}
	}

	/**
	 * Functions that share a child with -root-
	 */
	public static class Spouses extends NeighborGenerator {

		public Spouses(LSHVectorFactory vectorFactory, double impThreshold) {
			super(vectorFactory, impThreshold);
		}

		@Override
		public NeighborhoodPair generate(FunctionNode srcRoot, FunctionNode destRoot) {
			NeighborhoodPair pair = new NeighborhoodPair();
			// Collect the parents of every child of the source root
			Set<FunctionNode> tempRels = srcRoot.getChildren();
			pair.srcNeighbors = FunctionNode.neigborhoodAllocate();
			for (FunctionNode rel : tempRels) {
				pair.srcNeighbors.addAll(rel.getParents());
			}
			// The root is always a parent of its own children, so it must be excluded
			pair.srcNeighbors.remove(srcRoot);

			// Same walk on the destination side
			tempRels = destRoot.getChildren();
			pair.destNeighbors = FunctionNode.neigborhoodAllocate();
			for (FunctionNode rel : tempRels) {
				pair.destNeighbors.addAll(rel.getParents());
			}
			pair.destNeighbors.remove(destRoot);
			fillOutPairs(pair, RELATIVE_COMPARES);
			return pair;
		}
	}
}
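
The generators above differ only in which call-graph relation they walk from each root: one hop (parents, children) or two hops (grandparents, grandchildren, siblings, spouses), with the root itself removed from every two-hop set. The standalone sketch below reproduces just those relations on a toy call graph, assuming parents are callers and children are callees. `CallNode`, `Neighborhoods`, and the demo graph are hypothetical stand-ins for Ghidra's `FunctionNode`; the LSH similarity scoring performed by `fillOutPairs` is not modeled.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

public class NeighborhoodDemo {
	public static void main(String[] args) {
		// Toy call graph: entry calls a and b; both a and b call c.
		CallNode entry = new CallNode("entry");
		CallNode a = new CallNode("a");
		CallNode b = new CallNode("b");
		CallNode c = new CallNode("c");
		entry.calls(a);
		entry.calls(b);
		a.calls(c);
		b.calls(c);

		System.out.println(Neighborhoods.siblings(a));          // [b] (both called by entry)
		System.out.println(Neighborhoods.spouses(a));           // [b] (both call c)
		System.out.println(Neighborhoods.grandParents(c));      // [entry]
		System.out.println(Neighborhoods.grandChildren(entry)); // [c]
	}
}

/** Hypothetical stand-in for FunctionNode: parents are callers, children are callees. */
class CallNode {
	final String name;
	final Set<CallNode> parents = new LinkedHashSet<>();
	final Set<CallNode> children = new LinkedHashSet<>();

	CallNode(String name) {
		this.name = name;
	}

	/** Records a call edge from this node to {@code callee}. */
	void calls(CallNode callee) {
		children.add(callee);
		callee.parents.add(this);
	}

	@Override
	public String toString() {
		return name;
	}
}

/** Hypothetical helpers mirroring the two-hop relations walked by the generators. */
class Neighborhoods {
	private static final Function<CallNode, Set<CallNode>> UP = n -> n.parents;
	private static final Function<CallNode, Set<CallNode>> DOWN = n -> n.children;

	/** Takes one hop with {@code first}, then one with {@code second}, then drops the root. */
	private static Set<CallNode> twoHop(CallNode root, Function<CallNode, Set<CallNode>> first,
			Function<CallNode, Set<CallNode>> second) {
		Set<CallNode> result = new LinkedHashSet<>();
		for (CallNode rel : first.apply(root)) {
			result.addAll(second.apply(rel));
		}
		result.remove(root);
		return result;
	}

	static Set<CallNode> grandParents(CallNode root) {
		return twoHop(root, UP, UP);
	}

	static Set<CallNode> grandChildren(CallNode root) {
		return twoHop(root, DOWN, DOWN);
	}

	static Set<CallNode> siblings(CallNode root) {
		return twoHop(root, UP, DOWN);
	}

	static Set<CallNode> spouses(CallNode root) {
		return twoHop(root, DOWN, UP);
	}
}
```

Factoring the four two-hop generators into one `twoHop` helper makes it plain that they are the same loop with different edge directions; the removal of the root matters most for siblings and spouses, where the root is always reached.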
|
@ -0,0 +1,66 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package ghidra.feature.vt.api;

/**
 * Given a matching FunctionPair, this object represents a different
 * potential match taken from the neighborhoods of the match endpoints.
 */
public class PotentialPair implements Comparable<PotentialPair> {
	private FunctionPair originBridge;	// Accepted match that induced this potential match
	private FunctionNode fromNode;		// Source node of potential match
	private FunctionNode toNode;		// Destination node of potential match
	private double score;				// Implication score associated with potential match

	/** Placeholder pair with no endpoints and a score of 0.0 */
	public static final PotentialPair EMPTY_PAIR = new PotentialPair(null, null, 0.0);

	public PotentialPair(FunctionNode src, FunctionNode dest, double sc) {
		fromNode = src;
		toNode = dest;
		score = sc;
	}

	public double getScore() {
		return score;
	}

	public FunctionNode getSource() {
		return fromNode;
	}

	public FunctionNode getDestination() {
		return toNode;
	}

	public FunctionPair getOrigin() {
		return originBridge;
	}

	public void setOrigin(FunctionPair pair) {
		originBridge = pair;
	}

	/**
	 * Exchange the source and destination nodes. The score and origin are unchanged.
	 */
	public void swap() {
		FunctionNode tmp = fromNode;
		fromNode = toNode;
		toNode = tmp;
	}

	/**
	 * Orders potential pairs by ascending implication score.
	 */
	@Override
	public int compareTo(PotentialPair o) {
		return Double.compare(score, o.score);
	}
}
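
`PotentialPair.compareTo` delegates to `Double.compare` on the implication score, so the natural ordering is ascending; a caller that wants the strongest candidate first has to reverse it. The sketch below shows one way to do that with `java.util.PriorityQueue`. `Candidate` is a hypothetical record standing in for `PotentialPair` (plain strings replace `FunctionNode`, and there is no origin or `swap()`); records need Java 16 or later. This is an illustration only, not a description of how the BSim correlator consumes `PotentialPair`.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class CandidateQueueDemo {
	/** Hypothetical stand-in for PotentialPair: source/destination names plus a score. */
	record Candidate(String source, String destination, double score)
			implements Comparable<Candidate> {
		@Override
		public int compareTo(Candidate o) {
			return Double.compare(score, o.score); // ascending, like PotentialPair
		}
	}

	public static void main(String[] args) {
		// Reverse the natural (ascending) order so poll() returns the highest score first.
		PriorityQueue<Candidate> queue = new PriorityQueue<>(Comparator.reverseOrder());
		queue.add(new Candidate("f1", "g1", 0.42));
		queue.add(new Candidate("f2", "g7", 0.91));
		queue.add(new Candidate("f3", "g3", 0.65));
		while (!queue.isEmpty()) {
			System.out.println(queue.poll()); // scores 0.91, then 0.65, then 0.42
		}
	}
}
```

`Comparator.reverseOrder()` reuses the existing `compareTo` instead of duplicating the score comparison, and `Double.compare` gives a total order even for NaN and signed zeros, which a plain `<` test would not.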
|
@ -0,0 +1,35 @@
/* ###
 * IP: GHIDRA
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package ghidra.feature.vt.gui.validator;

import ghidra.app.plugin.core.analysis.validator.PostAnalysisValidator;
import ghidra.app.plugin.core.decompiler.validator.DecompilerParameterIDValidator;
import ghidra.feature.vt.api.main.VTSession;
import ghidra.program.model.listing.Program;

public class DecompilerParameterIDVTPreconditionValidator extends
		VTPostAnalysisPreconditionValidatorAdaptor {

	public DecompilerParameterIDVTPreconditionValidator(Program sourceProgram,
			Program destinationProgram, VTSession existingResults) {
		super(sourceProgram, destinationProgram, existingResults);
	}

	@Override
	protected PostAnalysisValidator createPostAnalysisPreconditionValidator(Program program) {
		return new DecompilerParameterIDValidator(program);
	}
}
|
@ -0,0 +1,50 @@
/* ###
 * IP: GHIDRA
 * EXCLUDE: YES
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package ghidra.feature.vt.api;

import ghidra.program.model.address.Address;
import ghidra.program.model.address.AddressFactory;
import ghidra.program.model.address.AddressSet;
import ghidra.program.model.address.AddressSetView;
import ghidra.program.model.address.AddressSpace;

import org.junit.Test;

public class BSimSelfSimilarCorrelatorTest extends AbstractSelfSimilarCorrelatorTest {
	public BSimSelfSimilarCorrelatorTest() {
		super();
	}

	@Test
	public void testFlow() throws Exception {
		exerciseFunctionsForFactory(new BSimProgramCorrelatorFactory(),
			// with default settings these three functions won't get matched
			getSourceMinus(0x010031ee, 0x01003ac0, 0x01004c1d));
	}

	/**
	 * Returns the source program's initialized memory with each of the given
	 * single addresses (in the default address space) removed.
	 */
	private AddressSetView getSourceMinus(long... addresses) {
		AddressFactory addressFactory = sourceProgram.getAddressFactory();
		AddressSpace addressSpace = addressFactory.getDefaultAddressSpace();
		AddressSet set =
			new AddressSet(sourceProgram.getMemory().getInitializedAddressSet());
		for (long l : addresses) {
			Address address = addressSpace.getAddress(l);
			set = set.subtract(new AddressSet(address, address));
		}
		return set;
	}
}