Commit 716b57a

Create LLM.txt
Introduced LLM.txt containing a comprehensive summary of the Pattern Analyzer framework, including its architecture, tech stack, project structure, key modules, usage instructions, contribution guidelines, and plugin development steps.
1 parent 3b00c58 commit 716b57a

1 file changed, +227 -0 lines changed

LLM.txt

Lines changed: 227 additions & 0 deletions
@@ -0,0 +1,227 @@
# Pattern Analyzer

## Excerpt / Summary

Pattern Analyzer is a plugin-based framework written in Python for binary data analysis. Its core purpose is to apply a wide range of analytical techniques to any binary data source to detect non-random patterns, identify data structures, and uncover cryptographic properties. The framework is extensible: developers can add new statistical tests, data transformations, and visualizers as plugins. It offers multiple user interfaces: a command-line interface (CLI) for automation, a web UI (Streamlit) for interactive analysis, a text-based UI (TUI) for terminal use, and a REST API for integration into other services.

## Core Concepts

- **Plugin Architecture**: The framework's strength lies in its extensibility. The core `engine.py` discovers and runs plugins. New functionality (tests, transforms, visuals) can be added without modifying the core engine.
- **Separation of Concerns**: The analysis engine (`engine.py`) is decoupled from the user interfaces (`cli.py`, `app.py`, `tui.py`, `api.py`).
- **Multiple Interfaces**: The tool is designed to be used in various environments: automated scripts (CLI), interactive sessions (Web UI, TUI), or as part of a larger system (Python API, REST API).
- **Data Abstraction**: The `BytesView` class provides a memory-efficient wrapper around binary data, offering unified access methods like `.bit_view()` to plugins.
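
To make the data-abstraction idea concrete, here is a minimal sketch of a `BytesView`-style wrapper. It is not the real class from `plugin_api.py`, whose internals are not shown in this file. Only the method names `.bit_view()` and `.to_bytes()` come from this document; the class name `BytesViewSketch`, the use of `memoryview`, and the MSB-first bit order are assumptions.

```python
class BytesViewSketch:
    """Hypothetical stand-in for BytesView (not the project's real class)."""

    def __init__(self, data: bytes) -> None:
        # A memoryview references the buffer without copying it.
        self._view = memoryview(data)

    def __len__(self) -> int:
        return len(self._view)

    def to_bytes(self) -> bytes:
        # A copy is made only when a plugin explicitly asks for bytes.
        return self._view.tobytes()

    def bit_view(self) -> list[int]:
        # Expand every byte into 8 bits, most significant bit first (assumed order).
        return [(byte >> shift) & 1
                for byte in self._view
                for shift in range(7, -1, -1)]


view = BytesViewSketch(b"\xa5\x0f")
bits = view.bit_view()
```

A real implementation would more likely return a `numpy` array for speed; a plain list keeps the sketch dependency-free.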

## Tech Stack

- **Core Language**: Python (>=3.10)
- **CLI**: `click`
- **Core Libraries**: `numpy`, `scipy` for statistical computations.
- **Web UI**: `streamlit`
- **TUI**: `textual`
- **REST API**: `fastapi`
- **Machine Learning Plugins (`[ml]` extra)**: `tensorflow`, `scikit-learn`, `pandas`
- **Packaging & Dependencies**: `setuptools`, `pyproject.toml`

## Project Structure

- `pattern-analyzer/`
  - `patternanalyzer/`: Main source code.
    - `__init__.py`: Package definition.
    - `engine.py`: **The core analysis engine**. Discovers plugins, applies transforms, runs tests, and generates reports.
    - `plugin_api.py`: Defines the base classes for plugins: `TestPlugin`, `TransformPlugin`, `VisualPlugin`, `BytesView`, and `TestResult`. This is the contract for extensibility.
    - `plugins/`: **Directory for all built-in analysis plugins**. Each `.py` file typically contains one `TestPlugin`. This is the library of analytical tools.
    - `cli.py`: The `click`-based Command-Line Interface. Entry point is `patternanalyzer`.
    - `tui.py`: The `textual`-based Terminal User Interface.
    - `api.py`: The `fastapi`-based REST API for programmatic access over HTTP.
    - `discovery.py`: Implements the "discover" mode logic (beam search for transforms).
    - `sandbox_runner.py`: A script to run plugins in isolated subprocesses for stability and security.
  - `app.py`: The `streamlit`-based Web User Interface.
  - `docs/`: Project documentation.
  - `tests/`: Unit and integration tests for `pytest`.
  - `pyproject.toml`: Project metadata, dependencies, and plugin entry points.
  - `README.md`: Project overview and quick start guide.

## Key Modules and Functionality

### `patternanalyzer.engine.Engine`

This is the central orchestrator. Its main methods are:
- `analyze()`: Runs a full analysis pipeline on a `bytes` object based on a configuration dictionary. It applies transforms, runs selected tests (sequentially or in parallel), performs False Discovery Rate (FDR) correction, and generates a final report dictionary.
- `analyze_stream()`: Performs analysis on a stream of data for large files. Only plugins that support the streaming API (`update`/`finalize`) will run.
- `discover()`: Instead of running specific tests, it applies a beam search to find likely transformation chains (e.g., single-byte XOR, base64 decode) that make the data look more like plaintext.
- `_discover_plugins()`: Automatically finds and registers all available plugins defined in `pyproject.toml` under the `patternanalyzer.plugins` entry point.
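
The streaming contract mentioned for `analyze_stream()` can be illustrated with a small sketch. Only the method names `update` and `finalize` come from this document; their signatures, the class name `StreamingMonobitSketch`, and the returned dictionary are assumptions. The statistic is the frequency (monobit) test from NIST SP 800-22, chosen because it needs only two running counters.

```python
import math


class StreamingMonobitSketch:
    """Hypothetical streaming test; not the project's real plugin class."""

    def __init__(self) -> None:
        self.ones = 0
        self.total_bits = 0

    def update(self, chunk: bytes) -> None:
        # Only counters are kept, so memory use is independent of input size.
        self.ones += sum(bin(byte).count("1") for byte in chunk)
        self.total_bits += 8 * len(chunk)

    def finalize(self) -> dict:
        if self.total_bits == 0:
            return {"p_value": None, "ones": 0, "total_bits": 0}
        # S = (#ones) - (#zeros); p = erfc(|S| / sqrt(2n)) per NIST SP 800-22.
        s = 2 * self.ones - self.total_bits
        p_value = math.erfc(abs(s) / math.sqrt(2 * self.total_bits))
        return {"p_value": p_value, "ones": self.ones,
                "total_bits": self.total_bits}


test = StreamingMonobitSketch()
for chunk in (b"\xaa", b"\x55"):  # two chunks with perfectly balanced bits
    test.update(chunk)
result = test.finalize()
```

Because the state is additive, feeding the data in one chunk or many yields the same result, which is the property a streaming plugin needs.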

### `patternanalyzer.plugins/`

This directory contains dozens of plugins, categorized as:
- **Statistical Tests**: NIST-like tests (`monobit`, `runs`, `block_frequency`), Dieharder-inspired tests (`diehard_birthday_spacings`), and others (`approximate_entropy`).
- **Cryptographic Analysis**: `ecb_detector`, `frequency_pattern` (for repeating-key XOR), `known_constants_search` (finds AES S-boxes, etc.).
- **Structural Analysis**: Parsers for common formats like `png_structure`, `pdf_structure`, `zip_structure`.
- **Machine Learning**: `autoencoder_anomaly`, `lstm_gru_anomaly`, and `classifier_labeler` for advanced anomaly detection and classification.
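
As an illustration of the cryptographic category, the sketch below shows the usual idea behind ECB detection: a block cipher in ECB mode maps identical plaintext blocks to identical ciphertext blocks, so repeated aligned blocks are suspicious. This is not the code of the `ecb_detector` plugin, which is not shown here; the function name and the 16-byte default block size are assumptions.

```python
from collections import Counter


def count_repeated_blocks(data: bytes, block_size: int = 16) -> int:
    """Count aligned blocks that duplicate an earlier block."""
    # Split into full, aligned blocks; a trailing partial block is ignored.
    blocks = [data[i:i + block_size]
              for i in range(0, len(data) - block_size + 1, block_size)]
    counts = Counter(blocks)
    # Each extra occurrence beyond the first counts as one repeat.
    return sum(n - 1 for n in counts.values() if n > 1)


sample = b"A" * 16 + b"B" * 16 + b"A" * 16
repeats = count_repeated_blocks(sample)
```

On random or well-encrypted data of moderate size the count is expected to be zero, so any repeat is a strong signal.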

## Usage Guide

The application can be used in four main ways: CLI, Web UI, TUI, and Python API.

### 1. Command-Line Interface (CLI)

The primary interface for scripting and automation. The main command is `patternanalyzer`.

**Key Command:** `patternanalyzer analyze <input_file> [options]`

**Modes of Operation:**
1. **Standard Analysis**: Runs a set of tests against the input file.
2. **Discovery Mode**: Uses the `--discover` flag to automatically search for simple transformations (like single-byte XOR) that might reveal hidden plaintext.
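
A single step of discovery mode can be sketched as a brute-force search over all 256 single-byte XOR keys, scoring each candidate by how much it looks like text. This is a simplification: the document says the real `discover()` uses a beam search over chains of transforms, and its scoring function is not described. The function names and the letters-and-spaces score below are assumptions.

```python
import string

# Bytes treated as "text-like" for scoring (an assumed heuristic).
TEXT_BYTES = frozenset((string.ascii_letters + " ").encode("ascii"))


def text_score(data: bytes) -> float:
    """Fraction of bytes that are ASCII letters or spaces."""
    if not data:
        return 0.0
    return sum(b in TEXT_BYTES for b in data) / len(data)


def best_single_byte_xor(data: bytes) -> tuple[int, float]:
    """Try every key 0..255 and return (key, score) for the best candidate."""
    best_key, best_score = 0, -1.0
    for key in range(256):
        score = text_score(bytes(b ^ key for b in data))
        if score > best_score:
            best_key, best_score = key, score
    return best_key, best_score


ciphertext = bytes(b ^ 0x5A for b in b"attack at dawn")
key, score = best_single_byte_xor(ciphertext)
```

A beam search generalizes this by keeping the top few candidates after each transform and extending each of them with further transforms.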

**Analysis Profiles (`--profile <name>`):**
Profiles are pre-defined sets of tests for specific use cases.
- `quick`: A very small, fast set of basic tests (e.g., `monobit`, `runs`).
- `nist`: A comprehensive suite of statistical tests inspired by the NIST SP 800-22 randomness test suite.
- `crypto`: A set of tests focused on cryptographic analysis (e.g., `ecb_detector`, `linear_complexity`, `frequency_pattern`).
- `full`: Runs every single test plugin available.

**Terminal Usage Examples:**

- **Basic analysis with default tests:**
  ```bash
  patternanalyzer analyze suspicious.bin -o report.json
  ```

- **Run a specific profile and generate an HTML report:**
  ```bash
  patternanalyzer analyze encrypted.dat --profile crypto --html-report crypto_report.html
  ```

- **Use discovery mode to find a potential single-byte XOR key:**
  ```bash
  patternanalyzer analyze mystery_file.txt --discover --out discovery.json
  ```

- **Use a custom YAML configuration file for full control:**
  ```bash
  # config.yml might define a transform and a specific test
  patternanalyzer analyze data.bin --config config.yml
  ```

- **Run tests in isolated, sandboxed processes for stability:**
  ```bash
  patternanalyzer analyze large_file.bin --profile full --sandbox-mode
  ```
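
The stability benefit of running work in a separate process can be sketched with the standard library alone. This is not the project's `sandbox_runner.py`, whose implementation is not shown here; the helper name `run_isolated` and its timeout default are assumptions. A child interpreter that crashes or hangs cannot take the parent down with it.

```python
import subprocess
import sys
from typing import Optional


def run_isolated(code: str, timeout_s: float = 5.0) -> tuple[Optional[int], str]:
    """Run a Python snippet in a child interpreter; return (exit code, stdout)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child was killed after exceeding the time budget.
        return None, ""
    return proc.returncode, proc.stdout


ok = run_isolated("print(6 * 7)")
crashed = run_isolated("import os; os._exit(3)")
```

Note that a plain subprocess gives crash and timeout isolation only. It is not a security boundary by itself, since the child still runs with the parent's privileges.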

### 2. Web User Interface (Web UI)

An interactive interface for easy analysis.
- **How to launch:**
  ```bash
  patternanalyzer serve-ui
  ```
- **Functionality:**
  - Upload files or paste Base64-encoded data.
  - Select tests and transforms from a checklist.
  - Adjust analysis settings like the FDR significance level.
  - View results in a clean, tabulated format, including a scorecard and visualizations.
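
For readers unfamiliar with the FDR significance level, the sketch below implements the Benjamini-Hochberg procedure, the most common FDR correction. The document does not say which procedure the engine uses, so treating it as Benjamini-Hochberg is an assumption, as is the function name. The parameter `q` plays the role of the significance level.

```python
def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[bool]:
    """Return one reject flag per p-value, controlling the FDR at level q."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], q=0.05)
```

Here only the first p-value survives correction, even though three of the four are below 0.05 on their own. That is the point of the correction when many tests run on the same input.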

### 3. Terminal User Interface (TUI)

A terminal-based interface for interactive analysis without leaving the console.
- **How to launch:**
  ```bash
  patternanalyzer tui
  ```
- **Functionality:**
  - Navigate the file system to select an input file.
  - Select tests to run using checkboxes.
  - View a summary of results directly in the terminal.

### 4. Python API

For integration into other Python applications.

```python
import json

from patternanalyzer.engine import Engine

# 1. Initialize the engine
engine = Engine()

# 2. Load data
with open("test.bin", "rb") as f:
    data_bytes = f.read()

# 3. Define the analysis configuration
config = {
    "transforms": [{"name": "xor_const", "params": {"xor_value": 127}}],
    "tests": [{"name": "monobit"}, {"name": "runs"}],
    "fdr_q": 0.05
}

# 4. Run the analysis
output = engine.analyze(data_bytes, config)

# 5. Process the results
print(json.dumps(output['scorecard'], indent=2))
```
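
The `xor_const` transform in the configuration above presumably XORs every byte with the given constant before the tests run. The plugin's code is not shown in this file, so the standalone function below is only a sketch of that assumed behavior; the range check is also an assumption.

```python
def xor_const(data: bytes, xor_value: int) -> bytes:
    """XOR every byte with a constant; applying it twice restores the input."""
    if not 0 <= xor_value <= 255:
        raise ValueError("xor_value must fit in one byte")
    return bytes(b ^ xor_value for b in data)


masked = xor_const(b"hello", 127)
restored = xor_const(masked, 127)
```

Because XOR is its own inverse, the same transform both applies and removes the mask.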

## How to Contribute

1. **Fork and Clone** the repository.
2. **Set up the environment**:
   ```bash
   python -m venv .venv
   source .venv/bin/activate  # or .\.venv\Scripts\activate on Windows
   pip install -e ".[test,ml,ui]"  # quoted so shells such as zsh do not expand the brackets
   ```
3. **Create a new branch** for your feature or bug fix.
4. **Make your changes**. Add or modify plugins in the `patternanalyzer/plugins/` directory.
5. **Add tests** for your changes in the `tests/` directory.
6. **Run the test suite**:
   ```bash
   pytest
   ```
7. **Submit a Pull Request**.

## Plugin Development

Creating a new plugin is the primary way to extend the framework.

1. **Choose a Plugin Type** (from `plugin_api.py`):
   - `TestPlugin`: Analyzes data and returns a `TestResult`. This is the most common type.
   - `TransformPlugin`: Modifies data before it's passed to tests (e.g., decryption, decoding).
   - `VisualPlugin`: Generates a visualization (e.g., an SVG image) from a `TestResult`.

2. **Create the Plugin File**:
   - Create a new file, e.g., `patternanalyzer/plugins/my_new_test.py`.
   - Create a class that inherits from the chosen base class (e.g., `TestPlugin`).
   - Implement the required methods, primarily `run()`. The `run` method takes `data: BytesView` and `params: dict` and must return a `TestResult` object.

   **Example `TestPlugin`:**
   ```python
   from patternanalyzer.plugin_api import TestPlugin, TestResult, BytesView

   class MyNewTest(TestPlugin):
       def describe(self) -> str:
           return "A new test that checks for the byte 0x42."

       def run(self, data: BytesView, params: dict) -> TestResult:
           input_bytes = data.to_bytes()
           found = b'\x42' in input_bytes
           return TestResult(
               test_name="my_new_test",
               passed=not found,  # Fails if the byte is found
               p_value=None,      # This is a diagnostic, not statistical, test
               category="diagnostic",
               metrics={"found_0x42": found}
           )
   ```

3. **Register the Plugin**:
   - Add an entry point for your new plugin in `pyproject.toml` under the `[project.entry-points."patternanalyzer.plugins"]` section.
   ```toml
   [project.entry-points."patternanalyzer.plugins"]
   # ... other plugins
   my_new_test = "patternanalyzer.plugins.my_new_test:MyNewTest"
   ```

4. **Re-install**:
   - Run `pip install -e .` again to make your new plugin discoverable by the engine.