GP-5018: Some updated PyGhidra docs

Ryan Kurtz 2024-12-18 09:11:17 -05:00
parent 7fbf64ea70
commit 66a43cd6ed
2 changed files with 223 additions and 87 deletions

@ -1 +1,11 @@
# PyGhidra # PyGhidra
This module provides the following capabilities:
* The [PyGhidra Python library](src/main/py/README.md) and its dependencies.
* A [Plugin](src/main/java/ghidra/pyghidra/PyGhidraPlugin.java) that provides a CPython interpreter.
* A [ScriptProvider](src/main/java/ghidra/pyghidra/PyGhidraScriptProvider.java) capable of running
GhidraScripts written in native CPython 3.
* An [interactive python script](support/pyghidra_launcher.py) that Ghidra uses to install
and launch PyGhidra. This script handles
[virtual environments](https://docs.python.org/3/tutorial/venv.html) and
[externally managed environments](https://packaging.python.org/en/latest/specifications/externally-managed-environments/).

@ -1,22 +1,59 @@
# PyGhidra # PyGhidra
PyGhidra is a Python library that provides direct access to the Ghidra API within a native CPython interpreter using [jpype](https://jpype.readthedocs.io/en/latest). As well, PyGhidra contains some conveniences for setting up analysis on a given sample and running a Ghidra script locally. It also contains a Ghidra plugin to allow the use of CPython from the Ghidra user interface. The PyGhidra Python library, originally developed by the
[Department of Defense Cyber Crime Center (DC3)](https://www.dc3.mil) under the name "Pyhidra", is a
Python library that provides direct access to the Ghidra API within a native CPython 3 interpreter
using [JPype](https://jpype.readthedocs.io/en/latest). PyGhidra contains some conveniences for
setting up analysis on a given sample and running a Ghidra script locally. It also contains a Ghidra
plugin to allow the use of CPython 3 from the Ghidra GUI.
PyGhidra was initially developed for use with Dragodis and is designed to be installable without requiring Java or Ghidra. This allows other Python projects ## Installation and Setup
to have PyGhidra as a dependency and provide optional Ghidra functionality without requiring all users to install Java and Ghidra. It is recommended that users set the `GHIDRA_INSTALL_DIR` environment variable to simplify locating Ghidra. Ghidra provides an out-of-the-box integration with the PyGhidra Python library, which makes
installation and usage fairly straightforward. This enables the Ghidra GUI and headless Ghidra to run
GhidraScripts written in native CPython 3, as well as interact with the Ghidra GUI through a
built-in REPL. To launch Ghidra in PyGhidra-mode, see Ghidra's latest
[Installation Guide](https://github.com/NationalSecurityAgency/ghidra/blob/master/GhidraDocs/InstallationGuide.md#pyghidra-mode).
It is also possible (and encouraged!) to use PyGhidra as a standalone Python library for usage
in reverse engineering workflows where Ghidra may be one of many components involved. The following
instructions in this document focus on this type of usage.
## Usage To install the PyGhidra Python library:
1. Download and install
[Ghidra 11.3 or later](https://github.com/NationalSecurityAgency/ghidra/releases) to a desired
location.
2. Set the `GHIDRA_INSTALL_DIR` environment variable to point to the directory where Ghidra is
installed.
3. Install PyGhidra:
* Online: `pip install pyghidra`
* Offline: `python3 -m pip install --no-index -f
<GhidraInstallDir>/Ghidra/Features/PyGhidra/pypkg/dist pyghidra`
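
Before running either `pip` command, it can help to confirm that `GHIDRA_INSTALL_DIR` points at a real directory. The following is a minimal sketch using only the standard library. The helper names `ghidra_install_dir` and `offline_dist_dir` are hypothetical and not part of PyGhidra; the offline path simply mirrors the one given in step 3.

```python
import os
from pathlib import Path


def ghidra_install_dir(env=None) -> Path:
    """Return the directory named by GHIDRA_INSTALL_DIR, or raise if unusable."""
    env = os.environ if env is None else env
    value = env.get("GHIDRA_INSTALL_DIR")
    if not value:
        raise RuntimeError("GHIDRA_INSTALL_DIR is not set")
    path = Path(value)
    if not path.is_dir():
        raise RuntimeError(f"GHIDRA_INSTALL_DIR is not a directory: {path}")
    return path


def offline_dist_dir(install_dir: Path) -> Path:
    """Directory to pass to `pip install --no-index -f` for the offline install."""
    return install_dir / "Ghidra" / "Features" / "PyGhidra" / "pypkg" / "dist"
```

Calling `offline_dist_dir(ghidra_install_dir())` yields the value to substitute for the `-f` argument in the offline command.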
## API
The current version of PyGhidra inherits an API from the original "Pyhidra" project that provides an
excellent starting point for interacting with a Ghidra installation. __NOTE:__ These functions are
subject to change in the future as more thought and feedback are collected on PyGhidra's role in the
greater Ghidra ecosystem.
### Raw Connection ### pyghidra.start()
To get a raw connection to Ghidra, use the `start()` function. This will set up a JPype connection and
initialize Ghidra in headless mode, which will allow you to directly import `ghidra` and `java`.
To get a raw connection to Ghidra, use the `start()` function. __NOTE:__ No projects or programs get set up in this mode.
This will set up a JPype connection and initialize Ghidra in headless mode,
which will allow you to directly import `ghidra` and `java`.
*NOTE: No projects or programs get setup in this mode.* ```python
def start(verbose=False, *, install_dir: Path = None) -> "PyGhidraLauncher":
"""
Starts the JVM and fully initializes Ghidra in Headless mode.
:param verbose: Enable verbose output during JVM startup (Defaults to False)
:param install_dir: The path to the Ghidra installation directory.
(Defaults to the GHIDRA_INSTALL_DIR environment variable)
:return: The PyGhidraLauncher used to start the JVM
"""
```
#### Example:
```python ```python
import pyghidra import pyghidra
pyghidra.start() pyghidra.start()
@ -30,78 +67,63 @@ from java.lang import String
# do things # do things
``` ```
### Customizing Java and Ghidra initialization ### pyghidra.started()
To check whether PyGhidra has been started, use the `started()` function.
JVM configuration for the classpath and vmargs may be done through a `PyGhidraLauncher`.
```python ```python
from pyghidra.launcher import HeadlessPyGhidraLauncher def started() -> bool:
"""
launcher = HeadlessPyGhidraLauncher() Whether the PyGhidraLauncher has already started.
launcher.add_classpaths("log4j-core-2.17.1.jar", "log4j-api-2.17.1.jar") """
launcher.add_vmargs("-Dlog4j2.formatMsgNoLookups=true")
launcher.start()
``` ```
### Registering an Entry Point #### Example:
The `PyGhidraLauncher` can also be configured through the use of a registered entry point on your own python project.
This is useful for installing your own Ghidra plugin which uses PyGhidra and self-compiles.
First create an [entry_point](https://setuptools.pypa.io/en/latest/userguide/entry_point.html) for `pyghidra.setup`
pointing to a single argument function which accepts the launcher instance.
```python ```python
# setup.py
from setuptools import setup
setup(
# ...,
entry_points={
'pyghidra.setup': [
'acme_plugin = acme.ghidra_plugin.install:setup',
]
}
)
```
Then we create the target function.
This function will be called every time a user starts a PyGhidra launcher.
In the same fashion, another entry point `pyghidra.pre_launch` may be registered and will be called after Ghidra and all
plugins have been loaded.
```python
# acme/ghidra_plugin/install.py
from pathlib import Path
import pyghidra import pyghidra
def setup(launcher): if pyghidra.started():
""" ...
Run by PyGhidra launcher to install our plugin.
"""
launcher.add_classpaths("log4j-core-2.17.1.jar", "log4j-api-2.17.1.jar")
launcher.add_vmargs("-Dlog4j2.formatMsgNoLookups=true")
# Install our plugin.
source_path = Path(__file__).parent / "java" / "plugin" # path to uncompiled .java code
details = pyghidra.ExtensionDetails(
name="acme_plugin",
description="My Cool Plugin",
author="acme",
plugin_version="1.2",
)
launcher.install_plugin(source_path, details) # install plugin (if not already)
``` ```
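
One way to combine `started()` and `start()` is a small guard that starts PyGhidra at most once per process. This is a sketch, not part of the PyGhidra API: the name `ensure_started` and its `pg` parameter are hypothetical, and `pg` exists only so the helper can be exercised with a stand-in object instead of the real module.

```python
def ensure_started(pg=None, **start_kwargs) -> bool:
    """Start PyGhidra if needed; return True if this call performed the start.

    `pg` defaults to the real `pyghidra` module. Keyword arguments such as
    `verbose` or `install_dir` are forwarded to `start()` unchanged.
    """
    if pg is None:
        import pyghidra as pg  # imported lazily, only when no stand-in is given
    if pg.started():
        return False
    pg.start(**start_kwargs)
    return True
```

Library code that needs Ghidra can then call `ensure_started()` up front without caring whether the caller has already started it.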
### pyghidra.open_program()
### Analyze a File To have PyGhidra set up a binary file for you, use the `open_program()` function. This will set up a
Ghidra project and import the given binary file as a program for you.
To have PyGhidra set up a binary file for you, use the `open_program()` function.
This will set up a Ghidra project and import the given binary file as a program for you.
Again, this will also allow you to import `ghidra` and `java` to perform more advanced processing. Again, this will also allow you to import `ghidra` and `java` to perform more advanced processing.
```python
def open_program(
binary_path: Union[str, Path],
project_location: Union[str, Path] = None,
project_name: str = None,
analyze=True,
language: str = None,
compiler: str = None,
loader: Union[str, JClass] = None
) -> ContextManager["FlatProgramAPI"]: # type: ignore
"""
Opens given binary path in Ghidra and returns FlatProgramAPI object.
:param binary_path: Path to binary file, may be None.
:param project_location: Location of Ghidra project to open/create.
(Defaults to same directory as binary file)
:param project_name: Name of Ghidra project to open/create.
(Defaults to name of binary file suffixed with "_ghidra")
:param analyze: Whether to run analysis before returning.
:param language: The LanguageID to use for the program.
(Defaults to Ghidra's detected LanguageID)
:param compiler: The CompilerSpecID to use for the program. Requires a provided language.
(Defaults to the Language's default compiler)
:param loader: The `ghidra.app.util.opinion.Loader` class to use when importing the program.
This may be either a Java class or its path. (Defaults to None)
:return: A Ghidra FlatProgramAPI object.
:raises ValueError: If the provided language, compiler or loader is invalid.
:raises TypeError: If the provided loader does not implement `ghidra.app.util.opinion.Loader`.
"""
```
#### Example:
```python ```python
import pyghidra import pyghidra
@ -113,11 +135,12 @@ with pyghidra.open_program("binary_file.exe") as flat_api:
# We are also free to import ghidra while in this context to do more advanced things. # We are also free to import ghidra while in this context to do more advanced things.
from ghidra.app.decompiler.flatapi import FlatDecompilerAPI from ghidra.app.decompiler.flatapi import FlatDecompilerAPI
decomp_api = FlatDecompilerAPI(flat_api) decomp_api = FlatDecompilerAPI(flat_api)
# ... ...
decomp_api.dispose() decomp_api.dispose()
``` ```
By default, PyGhidra will run analysis for you. If you would like to do this yourself, set `analyze` to `False`. By default, PyGhidra will run analysis for you. If you would like to do this yourself, set `analyze`
to `False`.
```python ```python
import pyghidra import pyghidra
@ -130,28 +153,65 @@ with pyghidra.open_program("binary_file.exe", analyze=False) as flat_api:
flat_api.analyzeAll(program) flat_api.analyzeAll(program)
``` ```
The `open_program()` function can also accept optional arguments to control the project name and
The `open_program()` function can also accept optional arguments to control the project name and location that gets created. location that gets created (helpful for opening up a sample in an already existing project).
(Helpful for opening up a sample in an already existing project.)
```python ```python
import pyghidra import pyghidra
with pyghidra.open_program("binary_file.exe", project_name="EXAM_231", project_location=r"C:\exams\231") as flat_api: with pyghidra.open_program("binary_file.exe", project_name="MyProject", project_location=r"C:\projects") as flat_api:
... ...
``` ```
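
When `project_location` and `project_name` are omitted, the `open_program()` docstring says they default to the binary's directory and to the binary's name suffixed with `_ghidra`. The sketch below reproduces that rule so the resulting project can be located ahead of time. The function `resolve_project` is hypothetical, and it assumes "name of binary file" means the file name including its extension.

```python
from pathlib import Path


def resolve_project(binary_path, project_location=None, project_name=None):
    """Return (location, name) following the defaults described in the docstring."""
    binary = Path(binary_path)
    # Default location: the directory that contains the binary.
    location = Path(project_location) if project_location is not None else binary.parent
    # Default name: the binary's file name with "_ghidra" appended (assumption:
    # the extension is kept as part of the name).
    name = project_name if project_name is not None else f"{binary.name}_ghidra"
    return location, name
```

Explicit arguments always win, so passing `project_name="MyProject"` leaves only the location to be derived from the binary path.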
### pyghidra.run_script()
### Run a Script PyGhidra can also be used to run an existing Ghidra Python script directly in your native CPython
interpreter using the `run_script()` function. However, while you can technically run an existing
PyGhidra can also be used to run an existing Ghidra Python script directly in your native python interpreter Ghidra script unmodified, you may run into issues due to differences between Jython 2 and
using the `run_script()` command. CPython 3/JPype. Therefore, some modification to the script may be needed.
However, while you can technically run an existing Ghidra script unmodified, you may
run into issues due to differences between Jython 2 and CPython 3.
Therefore, some modification to the script may be needed.
```python ```python
def run_script(
binary_path: Optional[Union[str, Path]],
script_path: Union[str, Path],
project_location: Union[str, Path] = None,
project_name: str = None,
script_args: List[str] = None,
verbose=False,
analyze=True,
lang: str = None,
compiler: str = None,
loader: Union[str, JClass] = None,
*,
install_dir: Path = None
):
"""
Runs a given script on a given binary path.
:param binary_path: Path to binary file, may be None.
:param script_path: Path to script to run.
:param project_location: Location of Ghidra project to open/create.
(Defaults to same directory as binary file if None)
:param project_name: Name of Ghidra project to open/create.
(Defaults to name of binary file suffixed with "_ghidra" if None)
:param script_args: Command line arguments to pass to script.
:param verbose: Enable verbose output during Ghidra initialization.
:param analyze: Whether to run analysis, if a binary_path is provided, before running the script.
:param lang: The LanguageID to use for the program.
(Defaults to Ghidra's detected LanguageID)
:param compiler: The CompilerSpecID to use for the program. Requires a provided language.
(Defaults to the Language's default compiler)
:param loader: The `ghidra.app.util.opinion.Loader` class to use when importing the program.
This may be either a Java class or its path. (Defaults to None)
:param install_dir: The path to the Ghidra installation directory. This parameter is only
used if Ghidra has not been started yet.
(Defaults to the GHIDRA_INSTALL_DIR environment variable)
:raises ValueError: If the provided language, compiler or loader is invalid.
:raises TypeError: If the provided loader does not implement `ghidra.app.util.opinion.Loader`.
"""
```
#### Example:
```python
import pyghidra import pyghidra
pyghidra.run_script(r"C:\input.exe", r"C:\some_ghidra_script.py") pyghidra.run_script(r"C:\input.exe", r"C:\some_ghidra_script.py")
@ -163,11 +223,77 @@ This can also be done on the command line using `pyghidra`.
> pyghidra C:\input.exe C:\some_ghidra_script.py <CLI ARGS PASSED TO SCRIPT> > pyghidra C:\input.exe C:\some_ghidra_script.py <CLI ARGS PASSED TO SCRIPT>
``` ```
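
The command-line form can also be driven from Python when the script should run in a separate process. The following is a sketch that assumes the `pyghidra` executable is on `PATH` and that arguments are positional in the order shown above (binary, script, then script arguments). The helper names are hypothetical.

```python
import subprocess


def pyghidra_command(binary_path, script_path, *script_args):
    """Build the argument list for the `pyghidra` command line shown above."""
    return ["pyghidra", str(binary_path), str(script_path), *map(str, script_args)]


def run_pyghidra_cli(binary_path, script_path, *script_args):
    """Run the script in a child process; raises CalledProcessError on failure."""
    return subprocess.run(pyghidra_command(binary_path, script_path, *script_args), check=True)
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces or backslashes.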
### Handling Package Name Conflicts ### pyghidra.launcher.PyGhidraLauncher()
JVM configuration for the classpath and vmargs may be done through a `PyGhidraLauncher`.
There may be some Python modules and Java packages with the same import path. When this occurs the Python module takes precedence. ```python
While jpype has its own mechanism for handling this situation, PyGhidra automatically makes the Java package accessible by allowing class PyGhidraLauncher:
it to be imported with an underscore appended to the package name. """
Base pyghidra launcher
"""
def add_classpaths(self, *args):
"""
Add additional entries to the classpath when starting the JVM
"""
self.class_path += args
def add_vmargs(self, *args):
"""
Add additional vmargs for launching the JVM
"""
self.vm_args += args
def add_class_files(self, *args):
"""
Add additional entries to the classpath after Ghidra has been fully loaded.
This ensures that all of Ghidra is available so classes depending on it can be properly loaded.
"""
self.class_files += args
def start(self, **jpype_kwargs):
"""
Starts the JPype connection to Ghidra (if not already started).
"""
```
The following `PyGhidraLauncher`s are available:
```python
class HeadlessPyGhidraLauncher(PyGhidraLauncher):
"""
Headless pyghidra launcher
"""
```
```python
class DeferredPyGhidraLauncher(PyGhidraLauncher):
"""
PyGhidraLauncher which allows full Ghidra initialization to be deferred.
initialize_ghidra must be called before all Ghidra classes are fully available.
"""
```
```python
class GuiPyGhidraLauncher(PyGhidraLauncher):
"""
GUI pyghidra launcher
"""
```
#### Example:
```python
from pyghidra.launcher import HeadlessPyGhidraLauncher
launcher = HeadlessPyGhidraLauncher()
launcher.add_classpaths("log4j-core-2.17.1.jar", "log4j-api-2.17.1.jar")
launcher.add_vmargs("-Dlog4j2.formatMsgNoLookups=true")
launcher.start()
```
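
When several JARs live in one directory, listing each by hand becomes tedious. The sketch below gathers them and hands them to `add_classpaths()`. The helpers `jar_paths` and `launcher_with_jars` are hypothetical; only `HeadlessPyGhidraLauncher` and `add_classpaths()` come from the API shown above.

```python
from pathlib import Path


def jar_paths(directory):
    """Return the .jar files directly inside `directory`, sorted for stable ordering."""
    return sorted(str(p) for p in Path(directory).glob("*.jar"))


def launcher_with_jars(directory):
    """Create a headless launcher whose classpath includes every JAR in `directory`."""
    from pyghidra.launcher import HeadlessPyGhidraLauncher  # requires PyGhidra

    launcher = HeadlessPyGhidraLauncher()
    launcher.add_classpaths(*jar_paths(directory))
    return launcher
```

The caller still decides when to call `start()` on the returned launcher, so further `add_vmargs()` calls remain possible.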
## Handling Package Name Conflicts
There may be some Python modules and Java packages with the same import path. When this occurs the
Python module takes precedence. While JPype has its own mechanism for handling this situation,
PyGhidra automatically makes the Java package accessible by allowing it to be imported with an
underscore appended to the package name:
```python ```python
import pdb # imports Python's pdb import pdb # imports Python's pdb