EXIA GHOST Results: F1 71.99% — First independent evaluation (1 user)

Hello HaluMem Team,

We are submitting results for **EXIA GHOST V5**, a bio-inspired AI memory system with 5 specialized stores (contextual, semantic, episodic, procedural, prospective).

This is the **first independent evaluation** on HaluMem: all other scores in the paper were produced by the HaluMem team.

## Results (1 user, GPT-4o judge)

| Metric | Score |
|---|---|
| **F1 (Extraction)** | **71.99%** |
| Precision | 92.90% |
| Recall | 58.66% |

| Metric | Score |
|---|---|
| **F1 (Update)** | **62.47%** |
| Precision | 73.00% |
| Recall | 54.64% |

| Metric | Score |
|---|---|
| **QA Accuracy** | **68.07%** |

**Note**: This evaluation was run on **1 user** (HaluMem-Medium). A full 20-user evaluation is planned.
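
For anyone cross-checking the tables, here is a minimal sketch of the standard F1 definition (the harmonic mean of precision and recall) applied to the precision and recall values reported above. The function name `f1_score` is ours, for illustration only, and is not taken from the HaluMem code or from the adapter. We assume precision and recall are given as fractions between 0 and 1.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1].

    Illustrative helper only; not the function used by the evaluation script.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the tables above, converted to fractions.
extraction_f1 = f1_score(0.9290, 0.5866)
update_f1 = f1_score(0.7300, 0.5464)

print(f"Extraction F1 (harmonic mean): {extraction_f1:.2%}")  # 71.91%
print(f"Update F1 (harmonic mean): {update_f1:.2%}")          # 62.50%
```

The harmonic means come out near 71.9% and 62.5%, slightly different from the 71.99% and 62.47% in the tables. We assume the gap comes from how the evaluation aggregates or rounds intermediate values. The figures in the tables are the ones produced by the run.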

## Verification

- **Repository**: https://github.com/francisdu53/exia-ghost-benchmarks
- **Full results & methodology**: [halumem/RESULTS.md](https://github.com/francisdu53/exia-ghost-benchmarks/blob/main/halumem/RESULTS.md)
- **Adapter code**: [halumem/eval_exiaghost.py](https://github.com/francisdu53/exia-ghost-benchmarks/blob/main/halumem/eval_exiaghost.py)
- **Judge**: GPT-4o (official HaluMem standard)
- **Raw results**: included in the repository (`scores.json` and `eval_results.jsonl`)

Author: Francis BABIN (Solo Developer)

Thank you for this important benchmark. We look forward to your feedback.
