TernaryCore

An open-source FPGA accelerator for BitNet ternary neural network inference.

BitNet b1.58 encodes every model weight as {-1, 0, +1}. That collapses matrix multiplication — the core operation of every transformer layer — into additions, subtractions, and conditional skips. No multiplies. TernaryCore is hardware built to match that arithmetic natively.
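To make the arithmetic concrete, here is a minimal Python sketch (not code from this repository) of a dot product against ternary weights that uses only add, subtract and skip, checked against an ordinary multiply-accumulate. The function name `ternary_dot_ref` is illustrative only.

```python
def ternary_dot_ref(activations, weights):
    """Dot product with weights in {-1, 0, +1}, computed without multiplication."""
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have the same length")
    acc = 0
    for a, w in zip(activations, weights):
        if w == 1:
            acc += a        # +1: add the activation
        elif w == -1:
            acc -= a        # -1: subtract the activation
        elif w != 0:        # 0: skip, contributes nothing
            raise ValueError(f"weight {w} is not ternary")
    return acc


acts = [12, -7, 3, 100, -128, 5]
wts = [1, 0, -1, 1, -1, 0]

# Identical to a conventional multiply-accumulate dot product.
assert ternary_dot_ref(acts, wts) == sum(a * w for a, w in zip(acts, wts))
print(ternary_dot_ref(acts, wts))  # 237
```

Every weight either selects the activation, negates it, or drops it, which is why the hardware only needs an adder and some selection logic.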

Simulation Status

Module	Tests	Status
`ternary_mac`	8/8	✅ All passing
`ternary_dot`	7/7	✅ All passing
`ternary_gemm`	16/16 (4×4)	✅ All passing

All tests pass: for every test vector above, RTL simulation output matches the Python reference implementation. The recent fixes below addressed timing bugs in ternary_dot.v and race conditions in the testbenches.

Recent Fixes (April 2026)

Fixed ternary_dot.v timing bugs:
- valid_out now pulses correctly one cycle after last element
- Fixed vector_done logic to persist through valid_in=0
- Removed debug statements for cleaner output
Fixed testbench race conditions:
- Added #1 delays before clock edges
- Added extra cycle after reset for signal stabilization
Added platform-agnostic documentation:
- Support for macOS, Linux, and Windows (WSL)
- Multiple waveform viewer options
- Simplified GEMM verification script (verify_gemm_simple.py) with no numpy dependency

Waveforms

ternary_mac — 8 test vectors, all passing:

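acc_out updates exactly one clock after each valid_in pulse. Sign extension and two's-complement negation are implemented with adders and mux logic only, with no multiplier, so the cell is designed to need no DSP blocks; a synthesis report confirming this is still on the Roadmap.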
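The no-multiplier datapath can be modeled at the bit level in Python. This is a hedged sketch, not the RTL: it assumes an 8-bit activation (int8, as in the architecture diagram) sign-extended into a 32-bit accumulator (the int32 output width), and it performs negation as invert-and-add-one. The helper names (`sign_extend_8`, `negate`, `mac_step`, `to_signed`) are hypothetical.

```python
ACC_BITS = 32                        # assumed accumulator width (int32 output)
ACC_MASK = (1 << ACC_BITS) - 1


def sign_extend_8(byte):
    """Copy bit 7 of an 8-bit activation into bits 8..31."""
    byte &= 0xFF
    if byte & 0x80:
        return byte | (ACC_MASK & ~0xFF)   # fill the upper 24 bits with ones
    return byte


def negate(x):
    """Two's-complement negation: invert every bit, then add one."""
    return (~x + 1) & ACC_MASK


def mac_step(acc_in, activation_byte, weight_enc):
    """One ternary_mac update on raw 32-bit patterns (behavioral sketch)."""
    a = sign_extend_8(activation_byte)
    if weight_enc == 0b01:                   # +1: add
        return (acc_in + a) & ACC_MASK
    if weight_enc == 0b10:                   # -1: add the negated activation
        return (acc_in + negate(a)) & ACC_MASK
    if weight_enc == 0b00:                   # 0: hold the accumulator
        return acc_in
    raise ValueError("2'b11 is not a defined weight encoding")


def to_signed(x):
    """Interpret a 32-bit pattern as a signed integer for printing."""
    return x - (1 << ACC_BITS) if x & (1 << (ACC_BITS - 1)) else x


acc = 0
for act, enc in [(0x05, 0b01), (0xF6, 0b10), (0x80, 0b01), (0x7F, 0b00)]:
    acc = mac_step(acc, act, enc)
    print(to_signed(acc))   # prints 5, 15, -113, -113 on successive lines
```

The only arithmetic is a 32-bit addition; subtraction is an addition of the inverted-plus-one pattern, which is the kind of logic that maps onto plain LUTs and carry chains.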

ternary_dot — streaming dot product, 7/7 tests passing (VLEN=8 shown):

Eight activations stream in one per clock with weight=+1. acc_out holds zero while the MAC cell accumulates internally, then the final result (36) appears in the same cycle valid_out pulses.
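The streaming timing can be sketched as a cycle-by-cycle Python model. It encodes the behavior as this README states it (acc_out reads zero until the cycle after the last element, when the result and a one-cycle valid_out pulse appear together, followed by an automatic reset), not the actual RTL. The input values are an assumption: the activations are not listed here, and 1 through 8 is simply one sequence that sums to 36 with weight=+1. `simulate_dot` is a hypothetical name.

```python
def simulate_dot(cycles, vlen):
    """Behavioral sketch of ternary_dot, one list entry per clock cycle.

    cycles: list of (valid_in, activation, weight) with weight in {-1, 0, +1}.
    Returns [(acc_out, valid_out), ...] as seen during each cycle.
    """
    acc = count = 0
    acc_out, valid_out = 0, 0              # registered outputs
    trace = []
    for valid_in, act, w in cycles:
        trace.append((acc_out, valid_out))   # value visible during this cycle
        # Clock edge: compute the next register values.
        acc_out, valid_out = 0, 0
        if valid_in:
            acc += act if w == 1 else -act if w == -1 else 0
            count += 1
            if count == vlen:              # last element: present result next cycle
                acc_out, valid_out = acc, 1
                acc = count = 0            # auto-reset for the next vector
    return trace


VLEN = 8
vector_a = [(1, a, +1) for a in range(1, VLEN + 1)]   # 1..8, weight +1
vector_b = [(1, a, -1) for a in range(1, VLEN + 1)]   # same values, weight -1
trace = simulate_dot(vector_a + vector_b + [(0, 0, 0)], VLEN)

for cycle, (acc_out, valid_out) in enumerate(trace):
    if valid_out:
        print(f"cycle {cycle}: acc_out={acc_out}")
# cycle 8: acc_out=36
# cycle 16: acc_out=-36
```

Feeding two vectors back to back shows the automatic reset: the first result is presented while the second vector's first element is already being consumed.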

ternary_gemm — 4×4 matrix multiply, 16/16 tests passing:

Four parallel ternary_dot instances (col_0–col_3) receive the same activation broadcast per clock, each with its own weight encoding. One result row lands simultaneously across all four columns when valid_out pulses.

Architecture

```mermaid
graph TD
    subgraph inputs["Inputs (per cycle)"]
        A["activation\n(int8)"]
        W["weight_enc\n(2-bit: 00=0, 01=+1, 10=−1)"]
    end

    subgraph mac["ternary_mac — atomic cell"]
        MUX["2:1 mux\n(add / sub / zero)"]
        REG1["acc register"]
        A --> MUX
        W --> MUX
        MUX --> REG1
    end

    subgraph dot["ternary_dot — streaming dot product"]
        LOOP["one mac cell\n× VECTOR_LEN cycles"]
        VREG["result register\n(valid_out pulse)"]
        REG1 --> LOOP
        LOOP --> VREG
    end

    subgraph gemm["ternary_gemm — matrix multiply"]
        PAR["× COLS\nparallel dot units"]
        OUT["output row\n(int32 × COLS)"]
        VREG --> PAR
        PAR --> OUT
    end
```

Three layers, each building on the last:

ternary_mac — the atomic cell. Takes one activation, one 2-bit weight, and a running accumulator. Outputs acc_in ± activation or acc_in (zero weight), registered on the clock edge. No multiplier.

ternary_dot — streaming dot product over VECTOR_LEN elements (default 64). Resets automatically between vectors; asserts valid_out for one cycle when the result is ready.

ternary_gemm — matrix multiply using COLS parallel ternary_dot instances. One activation is broadcast per cycle to all column dots, each receiving its own weight encoding. Produces one output row of COLS results every DEPTH cycles, where DEPTH is the number of activations streamed per row (the shared inner dimension).
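A Python reference for the broadcast dataflow might look like the sketch below. It is a behavioral model under stated assumptions, not the RTL: activations arrive as rows of length DEPTH, ternary weights form a DEPTH × COLS matrix, one activation per cycle is broadcast to every column, and the cycle count ignores pipeline latency. `ternary_gemm_ref` is a hypothetical name.

```python
def ternary_gemm_ref(A, W):
    """out[r][c] = sum_k A[r][k] * W[k][c] with W in {-1, 0, +1}, no multiplies.

    A: rows x DEPTH activations; W: DEPTH x COLS ternary weights.
    Returns (output rows, number of broadcast cycles).
    """
    depth, cols = len(W), len(W[0])
    out, cycles = [], 0
    for row in A:
        if len(row) != depth:
            raise ValueError("each activation row must have DEPTH elements")
        accs = [0] * cols                  # one accumulator per ternary_dot column
        for k in range(depth):             # one clock per activation
            a = row[k]                     # same activation broadcast to all columns
            for c in range(cols):
                w = W[k][c]                # each column uses its own weight
                if w == 1:
                    accs[c] += a
                elif w == -1:
                    accs[c] -= a
                elif w != 0:
                    raise ValueError(f"weight {w} is not ternary")
            cycles += 1
        out.append(accs)                   # a full row of COLS results
    return out, cycles


A = [[1, 2, 3, 4], [5, 6, 7, 8], [-1, -2, -3, -4], [10, 0, -10, 3]]
W = [[1, 0, -1, 1], [0, 1, 1, -1], [-1, -1, 0, 1], [1, 1, 1, 0]]
out, cycles = ternary_gemm_ref(A, W)
print(out[0], cycles)  # [2, 3, 5, 2] 16
```

With four rows and DEPTH=4, the model uses 16 broadcast cycles, consistent with one output row every DEPTH cycles.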

Weight Encoding

`weight_enc`	Ternary value	Operation
`2'b00`	0	No contribution (skip)
`2'b01`	+1	`acc_out = acc_in + activation`
`2'b10`	-1	`acc_out = acc_in - activation`
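When preparing weight vectors for a testbench, a small Python helper can translate ternary values to the 2-bit codes in the table and back. This is a sketch: `encode_weights` and `decode_weights` are hypothetical helpers, and treating `2'b11` as an error is an assumption, since the table does not define it.

```python
# weight_enc codes from the table above; 2'b11 is left undefined there.
ENCODE = {0: 0b00, +1: 0b01, -1: 0b10}
DECODE = {code: value for value, code in ENCODE.items()}


def encode_weights(weights):
    """Map ternary weights to 2-bit weight_enc codes."""
    try:
        return [ENCODE[w] for w in weights]
    except KeyError as err:
        raise ValueError(f"weight {err.args[0]} is not ternary") from None


def decode_weights(codes):
    """Map 2-bit codes back to ternary values, rejecting 2'b11."""
    try:
        return [DECODE[c] for c in codes]
    except KeyError as err:
        raise ValueError(f"code {err.args[0]:02b} is not a defined encoding") from None


codes = encode_weights([+1, 0, -1, -1])
print([format(c, "02b") for c in codes])   # ['01', '00', '10', '10']
assert decode_weights(codes) == [+1, 0, -1, -1]
```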

Getting Started

Prerequisites

All platforms:

Python 3 (for verification scripts)

Verilog Simulator (choose one):

Icarus Verilog (recommended, open source)
- macOS: brew install icarus-verilog
- Ubuntu/Debian: sudo apt-get install iverilog
- Fedora/RHEL: sudo dnf install iverilog
- Windows (WSL2): Use Ubuntu/Debian commands above
- Windows (native): Install from Icarus Verilog Windows builds
Verilator (alternative, faster simulation)
- macOS: brew install verilator
- Ubuntu/Debian: sudo apt-get install verilator
- See verilator.org for other platforms

Setup

```bash
git clone https://github.com/shepherdscientific/ternarycore.git
cd ternarycore/sim
```

Run simulations

```bash
make tb_ternary_mac    # ternary_mac — 8 tests
make tb_ternary_dot    # ternary_dot — 7 tests (VLEN=8)
make tb_ternary_gemm   # ternary_gemm — 4×4 matrix multiply
make all               # run all three
```

Cross-verify with Python

```bash
make verify
# or individually:
python3 verify/verify_mac.py
python3 verify/verify_dot.py
python3 verify/verify_gemm_simple.py  # No numpy dependency
```
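The cross-check amounts to comparing RTL results against a software golden model. A minimal sketch of that comparison step follows; it is not the contents of the repository's verify scripts, and the `RESULT col=<n> value=<v>` log line format is hypothetical, so the pattern would need to match whatever the testbenches actually print.

```python
import re

# Hypothetical testbench log line, e.g. "RESULT col=2 value=-17".
RESULT_LINE = re.compile(r"RESULT col=(\d+) value=(-?\d+)")


def parse_results(log_text):
    """Extract (column, value) pairs from simulator output."""
    return [(int(m.group(1)), int(m.group(2)))
            for m in RESULT_LINE.finditer(log_text)]


def compare(rtl, expected):
    """Return human-readable mismatches; an empty list means RTL matches."""
    errors = []
    if len(rtl) != len(expected):
        errors.append(f"count mismatch: RTL {len(rtl)}, reference {len(expected)}")
    for i, (got, want) in enumerate(zip(rtl, expected)):
        if got != want:
            errors.append(f"result {i}: RTL {got} != reference {want}")
    return errors


log = "simulation started\nRESULT col=0 value=36\nRESULT col=1 value=-36\n"
rtl = [value for _, value in parse_results(log)]
print(compare(rtl, [36, -36]) or "PASS")   # PASS
```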

View waveforms

For debugging waveforms (.vcd files):

GTKWave (cross-platform, open source)
- macOS: brew install gtkwave
- Ubuntu/Debian: sudo apt-get install gtkwave
- Windows: Available via MSYS2 or WSL
Alternative options:
- WaveTrace (macOS app, free) - Recommended for macOS users
- Verilog HDL VSCode Extension (VSCode plugin with waveform viewer)
- Scansion (macOS, paid)
- ModelSim/QuestaSim (commercial, university licenses available)

Note for macOS users: GTKWave may have issues on newer macOS versions. Consider WaveTrace or Verilog HDL VSCode Extension as alternatives.

Repository Layout

```text
ternarycore/
├── rtl/
│   ├── ternary_mac.v       # single MAC cell
│   ├── ternary_dot.v       # streaming dot product
│   └── ternary_gemm.v      # matrix multiply
├── tb/
│   ├── tb_ternary_mac.v
│   ├── tb_ternary_dot.v
│   └── tb_ternary_gemm.v
├── sim/
│   ├── Makefile
│   └── verify/
│       ├── verify_mac.py
│       ├── verify_dot.py
│       └── verify_gemm.py
├── docs/
│   └── waveform_mac.svg
└── LICENSE                 # CERN-OHL-S v2 (RTL) + MIT (scripts)
```

Roadmap

Done:
- ternary_mac: single cell, all tests passing
- ternary_dot: streaming vector dot product (VECTOR_LEN defaults to 64), all tests passing in simulation at VLEN=8
- ternary_gemm: 4×4 matrix multiply, all tests passing

Next:
- Deploy to Xilinx Artix-7 (Arty A7-100T); board ordered
- ternary_dot at 64-element depth on real silicon
- Timing closure and resource utilisation report
- Head-to-head benchmark: tokens/s and power (W) vs CPU/GPU baselines
- Full transformer layer pipeline

License

RTL source files (rtl/, tb/) are licensed under the CERN Open Hardware Licence v2 — Strongly Reciprocal (CERN-OHL-S v2). Derivative hardware designs must remain open under the same terms.

Software tools and verification scripts (sim/verify/*.py) are licensed under the MIT License.

See LICENSE for full terms.

Acknowledgements

Concept: David Adebiyi and Abu Mohammed — the conversations that sharpened the idea.

The spark: A comment by @Xcc313r4n7 on the llama.cpp thread arguing that biological neurons are themselves ternary — selected by evolution for exactly the same reason we're building this. Contested by the community, but it lodged.

Family & background: Mr Niyi Olowoyo, Mr Fisayo Bejide, my uncle Tayo Oladapo, my mother, my wife, and my daughters, each of whom contributed something, knowingly or not, to making this possible.

Full credits in the launch article.

Related Work

Benchmark repo (KV cache / local LLM inference): github.com/shepherdscientific/llama-server-tuning
BitNet b1.58: arxiv.org/abs/2402.17764
CERN-OHL-S v2: ohwr.org/cern_ohl_s_v2.txt
