diff --git a/Ghidra/Features/PyGhidra/README.md b/Ghidra/Features/PyGhidra/README.md
index c485eb46a5..3485a6048d 100644
--- a/Ghidra/Features/PyGhidra/README.md
+++ b/Ghidra/Features/PyGhidra/README.md
@@ -1 +1,11 @@
 # PyGhidra
+
+This module provides the following capabilities:
+* The [PyGhidra Python library](src/main/py/README.md) and its dependencies.
+* A [Plugin](src/main/java/ghidra/pyghidra/PyGhidraPlugin.java) that provides a CPython interpreter.
+* A [ScriptProvider](src/main/java/ghidra/pyghidra/PyGhidraScriptProvider.java) capable of running
+  GhidraScripts written in native CPython 3.
+* An [interactive Python script](support/pyghidra_launcher.py) that Ghidra uses to install
+  and launch PyGhidra. This script handles
+  [virtual environments](https://docs.python.org/3/tutorial/venv.html) and
+  [externally managed environments](https://packaging.python.org/en/latest/specifications/externally-managed-environments/).
\ No newline at end of file
diff --git a/Ghidra/Features/PyGhidra/src/main/py/README.md b/Ghidra/Features/PyGhidra/src/main/py/README.md
index 861b6ff5f3..2dc38c4e29 100644
--- a/Ghidra/Features/PyGhidra/src/main/py/README.md
+++ b/Ghidra/Features/PyGhidra/src/main/py/README.md
@@ -1,22 +1,59 @@
 # PyGhidra

-PyGhidra is a Python library that provides direct access to the Ghidra API within a native CPython interpreter using [jpype](https://jpype.readthedocs.io/en/latest). As well, PyGhidra contains some conveniences for setting up analysis on a given sample and running a Ghidra script locally. It also contains a Ghidra plugin to allow the use of CPython from the Ghidra user interface.
+The PyGhidra Python library, originally developed by the
+[Department of Defense Cyber Crime Center (DC3)](https://www.dc3.mil) under the name "Pyhidra", is a
+Python library that provides direct access to the Ghidra API within a native CPython 3 interpreter
+using [JPype](https://jpype.readthedocs.io/en/latest). PyGhidra contains some conveniences for
+setting up analysis on a given sample and running a Ghidra script locally. It also contains a Ghidra
+plugin to allow the use of CPython 3 from the Ghidra GUI.

-PyGhidra was initially developed for use with Dragodis and is designed to be installable without requiring Java or Ghidra. This allows other Python projects
-have PyGhidra as a dependency and provide optional Ghidra functionality without requiring all users to install Java and Ghidra. It is recommended to recommend that users set the `GHIDRA_INSTALL_DIR` environment variable to simplify locating Ghidra.
+## Installation and Setup
+Ghidra provides an out-of-the-box integration with the PyGhidra Python library, which makes
+installation and usage fairly straightforward. This enables the Ghidra GUI and headless Ghidra to run
+GhidraScripts written in native CPython 3, as well as interact with the Ghidra GUI through a
+built-in REPL. To launch Ghidra in PyGhidra-mode, see Ghidra's latest
+[Installation Guide](https://github.com/NationalSecurityAgency/ghidra/blob/master/GhidraDocs/InstallationGuide.md#pyghidra-mode).
+It is also possible (and encouraged!) to use PyGhidra as a standalone Python library for use
+in reverse engineering workflows where Ghidra may be one of many components involved. The following
+instructions in this document focus on this type of usage.

-## Usage
+To install the PyGhidra Python library:
+1. Download and install
+   [Ghidra 11.3 or later](https://github.com/NationalSecurityAgency/ghidra/releases) to a desired
+   location.
+2. Set the `GHIDRA_INSTALL_DIR` environment variable to point to the directory where Ghidra is
+   installed.
+3. Install PyGhidra:
+   * Online: `pip install pyghidra`
+   * Offline: `python3 -m pip install --no-index -f
+     /Ghidra/Features/PyGhidra/pypkg/dist pyghidra`

+## API
+The current version of PyGhidra inherits an API from the original "Pyhidra" project that provides an
+excellent starting point for interacting with a Ghidra installation. __NOTE:__ These functions are
+subject to change in the future as more thought and feedback is collected on PyGhidra's role in the
+greater Ghidra ecosystem:

-### Raw Connection
+### pyghidra.start()
+To get a raw connection to Ghidra, use the `start()` function. This will set up a JPype connection and
+initialize Ghidra in headless mode, which will allow you to directly import `ghidra` and `java`.

-To get a raw connection to Ghidra use the `start()` function.
-This will setup a Jpype connection and initialize Ghidra in headless mode,
-which will allow you to directly import `ghidra` and `java`.
+__NOTE:__ No projects or programs get set up in this mode.

-*NOTE: No projects or programs get setup in this mode.*
+```python
+def start(verbose=False, *, install_dir: Path = None) -> "PyGhidraLauncher":
+    """
+    Starts the JVM and fully initializes Ghidra in Headless mode.
+    :param verbose: Enable verbose output during JVM startup (Defaults to False)
+    :param install_dir: The path to the Ghidra installation directory.
+        (Defaults to the GHIDRA_INSTALL_DIR environment variable)
+    :return: The PyGhidraLauncher used to start the JVM
+    """
+```
+
+#### Example:
 ```python
 import pyghidra
 pyghidra.start()
@@ -30,78 +67,63 @@ from java.lang import String
 # do things
 ```

-### Customizing Java and Ghidra initialization
-
-JVM configuration for the classpath and vmargs may be done through a `PyGhidraLauncher`.
+### pyghidra.started()
+To check whether PyGhidra has been started, use the `started()` function.

 ```python
-from pyghidra.launcher import HeadlessPyGhidraLauncher
-
-launcher = HeadlessPyGhidraLauncher()
-launcher.add_classpaths("log4j-core-2.17.1.jar", "log4j-api-2.17.1.jar")
-launcher.add_vmargs("-Dlog4j2.formatMsgNoLookups=true")
-launcher.start()
+def started() -> bool:
+    """
+    Whether the PyGhidraLauncher has already started.
+    """
 ```

-### Registering an Entry Point
-
-The `PyGhidraLauncher` can also be configured through the use of a registered entry point on your own python project.
-This is useful for installing your own Ghidra plugin which uses PyGhidra and self-compiles.
-
-First create an [entry_point](https://setuptools.pypa.io/en/latest/userguide/entry_point.html) for `pyghidra.setup`
-pointing to a single argument function which accepts the launcher instance.
-
+#### Example:
 ```python
-# setup.py
-from setuptools import setup
-
-setup(
-    # ...,
-    entry_points={
-        'pyghidra.setup': [
-            'acme_plugin = acme.ghidra_plugin.install:setup',
-        ]
-    }
-)
-```
-
-
-Then we create the target function.
-This function will be called every time a user starts a PyGhidra launcher.
-In the same fashion, another entry point `pyghidra.pre_launch` may be registered and will be called after Ghidra and all
-plugins have been loaded.
-
-```python
-# acme/ghidra_plugin/install.py
-from pathlib import Path
 import pyghidra

-def setup(launcher):
-    """
-    Run by PyGhidra launcher to install our plugin.
-    """
-    launcher.add_classpaths("log4j-core-2.17.1.jar", "log4j-api-2.17.1.jar")
-    launcher.add_vmargs("-Dlog4j2.formatMsgNoLookups=true")
-
-    # Install our plugin.
-    source_path = Path(__file__).parent / "java" / "plugin"  # path to uncompiled .java code
-    details = pyghidra.ExtensionDetails(
-        name="acme_plugin",
-        description="My Cool Plugin",
-        author="acme",
-        plugin_version="1.2",
-    )
-    launcher.install_plugin(source_path, details)  # install plugin (if not already)
+if pyghidra.started():
+    ...
 ```

-
-### Analyze a File
-
-To have PyGhidra setup a binary file for you, use the `open_program()` function.
-This will setup a Ghidra project and import the given binary file as a program for you.
+### pyghidra.open_program()
+To have PyGhidra set up a binary file for you, use the `open_program()` function. This will set up a
+Ghidra project and import the given binary file as a program for you.
 Again, this will also allow you to import `ghidra` and `java` to perform more advanced processing.

+```python
+def open_program(
+    binary_path: Union[str, Path],
+    project_location: Union[str, Path] = None,
+    project_name: str = None,
+    analyze=True,
+    language: str = None,
+    compiler: str = None,
+    loader: Union[str, JClass] = None
+) -> ContextManager["FlatProgramAPI"]:  # type: ignore
+    """
+    Opens given binary path in Ghidra and returns FlatProgramAPI object.
+
+    :param binary_path: Path to binary file, may be None.
+    :param project_location: Location of Ghidra project to open/create.
+        (Defaults to same directory as binary file)
+    :param project_name: Name of Ghidra project to open/create.
+        (Defaults to name of binary file suffixed with "_ghidra")
+    :param analyze: Whether to run analysis before returning.
+    :param language: The LanguageID to use for the program.
+        (Defaults to Ghidra's detected LanguageID)
+    :param compiler: The CompilerSpecID to use for the program. Requires a provided language.
+        (Defaults to the Language's default compiler)
+    :param loader: The `ghidra.app.util.opinion.Loader` class to use when importing the program.
+        This may be either a Java class or its path. (Defaults to None)
+    :return: A Ghidra FlatProgramAPI object.
+    :raises ValueError: If the provided language, compiler or loader is invalid.
+    :raises TypeError: If the provided loader does not implement `ghidra.app.util.opinion.Loader`.
+    """
+```
+
+#### Example:
+
 ```python
 import pyghidra
@@ -113,11 +135,12 @@ with pyghidra.open_program("binary_file.exe") as flat_api:
     # We are also free to import ghidra while in this context to do more advanced things.
     from ghidra.app.decompiler.flatapi import FlatDecompilerAPI
     decomp_api = FlatDecompilerAPI(flat_api)
-    # ...
+    ...
     decomp_api.dispose()
 ```

-By default, PyGhidra will run analysis for you. If you would like to do this yourself, set `analyze` to `False`.
+By default, PyGhidra will run analysis for you. If you would like to do this yourself, set `analyze`
+to `False`.

 ```python
 import pyghidra
@@ -130,28 +153,65 @@ with pyghidra.open_program("binary_file.exe", analyze=False) as flat_api:
         flat_api.analyzeAll(program)
 ```

-
-The `open_program()` function can also accept optional arguments to control the project name and location that gets created.
-(Helpful for opening up a sample in an already existing project.)
+The `open_program()` function can also accept optional arguments to control the project name and
+location that gets created (helpful for opening up a sample in an already existing project).

 ```python
 import pyghidra

-with pyghidra.open_program("binary_file.exe", project_name="EXAM_231", project_location=r"C:\exams\231") as flat_api:
+with pyghidra.open_program("binary_file.exe", project_name="MyProject", project_location=r"C:\projects") as flat_api:
     ...
 ```

-
-### Run a Script
-
-PyGhidra can also be used to run an existing Ghidra Python script directly in your native python interpreter
-using the `run_script()` command.
-However, while you can technically run an existing Ghidra script unmodified, you may
-run into issues due to differences between Jython 2 and CPython 3.
-Therefore, some modification to the script may be needed.
+### pyghidra.run_script()
+PyGhidra can also be used to run an existing Ghidra Python script directly in your native CPython
+interpreter using the `run_script()` function. However, while you can technically run an existing
+Ghidra script unmodified, you may run into issues due to differences between Jython 2 and
+CPython 3/JPype. Therefore, some modification to the script may be needed.

 ```python
+def run_script(
+    binary_path: Optional[Union[str, Path]],
+    script_path: Union[str, Path],
+    project_location: Union[str, Path] = None,
+    project_name: str = None,
+    script_args: List[str] = None,
+    verbose=False,
+    analyze=True,
+    lang: str = None,
+    compiler: str = None,
+    loader: Union[str, JClass] = None,
+    *,
+    install_dir: Path = None
+):
+    """
+    Runs a given script on a given binary path.
+    :param binary_path: Path to binary file, may be None.
+    :param script_path: Path to script to run.
+    :param project_location: Location of Ghidra project to open/create.
+        (Defaults to same directory as binary file if None)
+    :param project_name: Name of Ghidra project to open/create.
+        (Defaults to name of binary file suffixed with "_ghidra" if None)
+    :param script_args: Command line arguments to pass to script.
+    :param verbose: Enable verbose output during Ghidra initialization.
+    :param analyze: Whether to run analysis, if a binary_path is provided, before running the script.
+    :param lang: The LanguageID to use for the program.
+        (Defaults to Ghidra's detected LanguageID)
+    :param compiler: The CompilerSpecID to use for the program. Requires a provided language.
+        (Defaults to the Language's default compiler)
+    :param loader: The `ghidra.app.util.opinion.Loader` class to use when importing the program.
+        This may be either a Java class or its path. (Defaults to None)
+    :param install_dir: The path to the Ghidra installation directory. This parameter is only
+        used if Ghidra has not been started yet.
+        (Defaults to the GHIDRA_INSTALL_DIR environment variable)
+    :raises ValueError: If the provided language, compiler or loader is invalid.
+    :raises TypeError: If the provided loader does not implement `ghidra.app.util.opinion.Loader`.
+    """
+```
+
+#### Example:
+```python
 import pyghidra

 pyghidra.run_script(r"C:\input.exe", r"C:\some_ghidra_script.py")
@@ -163,11 +223,77 @@ This can also be done on the command line using `pyghidra`.
 > pyghidra C:\input.exe C:\some_ghidra_script.py
 ```

-### Handling Package Name Conflicts
+### pyghidra.launcher.PyGhidraLauncher()
+JVM configuration for the classpath and vmargs may be done through a `PyGhidraLauncher`.

-There may be some Python modules and Java packages with the same import path. When this occurs the Python module takes precedence.
-While jpype has its own mechanism for handling this situation, PyGhidra automatically makes the Java package accessible by allowing
-it to be imported with an underscore appended to the package name.
+```python
+class PyGhidraLauncher:
+    """
+    Base pyghidra launcher
+    """
+
+    def add_classpaths(self, *args):
+        """
+        Add additional entries to the classpath when starting the JVM
+        """
+        self.class_path += args
+
+    def add_vmargs(self, *args):
+        """
+        Add additional vmargs for launching the JVM
+        """
+        self.vm_args += args
+
+    def add_class_files(self, *args):
+        """
+        Add additional entries to be added to the classpath after Ghidra has been fully loaded.
+        This ensures that all of Ghidra is available so classes depending on it can be properly loaded.
+        """
+        self.class_files += args
+
+    def start(self, **jpype_kwargs):
+        """
+        Starts a JPype connection to Ghidra (if not already started).
+        """
+```
+
+The following `PyGhidraLauncher`s are available:
+
+```python
+class HeadlessPyGhidraLauncher(PyGhidraLauncher):
+    """
+    Headless pyghidra launcher
+    """
+```
+```python
+class DeferredPyGhidraLauncher(PyGhidraLauncher):
+    """
+    PyGhidraLauncher which allows full Ghidra initialization to be deferred.
+    initialize_ghidra must be called before all Ghidra classes are fully available.
+    """
+```
+```python
+class GuiPyGhidraLauncher(PyGhidraLauncher):
+    """
+    GUI pyghidra launcher
+    """
+```
+
+#### Example:
+```python
+from pyghidra.launcher import HeadlessPyGhidraLauncher
+
+launcher = HeadlessPyGhidraLauncher()
+launcher.add_classpaths("log4j-core-2.17.1.jar", "log4j-api-2.17.1.jar")
+launcher.add_vmargs("-Dlog4j2.formatMsgNoLookups=true")
+launcher.start()
+```
+
+## Handling Package Name Conflicts
+There may be some Python modules and Java packages with the same import path. When this occurs, the
+Python module takes precedence. While JPype has its own mechanism for handling this situation,
+PyGhidra automatically makes the Java package accessible by allowing it to be imported with an
+underscore appended to the package name:

 ```python
 import pdb  # imports Python's pdb