ifcb-inference


ONNX-based inference system for IFCB (Imaging FlowCytobot) bin data. This tool performs automated plankton classification on IFCB bin files using pre-trained ONNX models.

Features

  • Flexible model support: Works with both static and dynamic batch size ONNX models
  • Multiple data loading backends: Supports both PyTorch and non-PyTorch data loading
  • Configurable output organization: Choose between run-date or model-name subfolder organization
  • Directory structure preservation: Maintains input directory hierarchies in output
  • Containerized deployment: Docker/Podman support for consistent environments
  • GPU acceleration: CUDA support for faster inference (automatic when available)

Installation

Extra         Installs                        Use when
[cpu]         onnxruntime (CPU)               Lightweight/constrained environments; no GPU
[cuda]        onnxruntime-gpu                 GPU inference via CUDA
[torch]       PyTorch + torchvision           Faster/more flexible data loading, but more dependencies
[cuda,torch]  Both of the above               Full-featured install
[dev]         pytest, black, isort, flake8    Development and testing
  • One of [cpu] or [cuda] must be used to install the appropriate onnxruntime; they are mutually exclusive. If neither is included at install time, ifcb-infer will be unable to run. If in doubt, use [cuda].
  • [torch] is optional. Without it, a basic data loader is used, suitable for constrained environments where installing PyTorch is impractical (e.g. small containers, edge deployments). Otherwise the [torch] data loader is recommended: it supports more image formats and is generally faster.
# Full featured install
pip install "ifcb-infer[cuda,torch] @ git+https://github.com/WHOIGit/ifcb-inference.git"

# GPU-enabled, but without PyTorch dependencies
pip install "ifcb-infer[cuda] @ git+https://github.com/WHOIGit/ifcb-inference.git"
export LD_LIBRARY_PATH=$(pip show nvidia-cudnn-cu12 | grep Location | awk '{print $2}')/nvidia/cudnn/lib:$LD_LIBRARY_PATH
# see "cuDNN requirement for `[cuda]` without `[torch]`" LD_LIBRARY_PATH note below

# Lightest install
pip install "ifcb-infer[cpu] @ git+https://github.com/WHOIGit/ifcb-inference.git"

If cloning the repo and developing locally:

# Full-featured install (gpu/CUDA + PyTorch)
pip install -e ".[cuda,torch,dev]"

cuDNN requirement for [cuda] without [torch]

[cuda,torch] works out of the box — PyTorch bundles its own cuDNN libraries and ORT finds them automatically.

[cuda] alone installs nvidia-cudnn-cu12 via pip, but ORT cannot find it without help because the libraries land in site-packages, not a standard system path. If you don't have libcudnn9-cuda-12 installed globally or in a standard location, the cuDNN location must be set explicitly with LD_LIBRARY_PATH.

Setting LD_LIBRARY_PATH to point to the pip-installed cuDNN:

export LD_LIBRARY_PATH=$(pip show nvidia-cudnn-cu12 | grep Location | awk '{print $2}')/nvidia/cudnn/lib:$LD_LIBRARY_PATH

Add this to your environment profile (.bashrc, .bash_profile, venv/bin/activate script) to make it persistent.
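
To confirm that ONNX Runtime can actually see the GPU after setting this, a quick sanity check (a minimal sketch; it only assumes onnxruntime-gpu is installed in the active environment):

# 'CUDAExecutionProvider' should appear in the printed list; if only
# 'CPUExecutionProvider' shows up, cuDNN is still not being found.
python -c "import onnxruntime; print(onnxruntime.get_available_providers())"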

Usage

ifcb-infer [OPTIONS] MODEL BINS [BINS ...]

MODEL is the path to an ONNX model file. BINS can be a directory, a bin path, or a .txt/.list file of bin paths.
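
For example (the model and data paths below are hypothetical):

# Classify every bin found under example-data/bins/
# --batch is only needed for models without a fixed input batch size (see Options)
ifcb-infer --batch 32 my_classifier.onnx example-data/bins/

# Or pass an explicit list of bin paths
ifcb-infer --batch 32 my_classifier.onnx runs_to_process.list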

Options

--classes FILE                         Class list file; adds column headers to output CSVs.
                                       Accepts a line-delimited .txt or an index-keyed .json
                                       (e.g. {"0": "class_a", "1": "class_b"})
--batch N                              Required for models without a fixed input batch size
--outdir DIRPATH                       Output directory. Default: ./outputs
--outfile PATTERN                      Output filename pattern. Default: {MODEL_NAME}/{SUBPATH}/{BIN}.csv
                                       Tokens: {MODEL_NAME}, {RUN_DATE}, {SUBPATH} (relative dir), {BIN} (bin name)
--cpuonly                              Force CPU inference even if CUDA is available
--notorch                              Use non-PyTorch data loader even if torch is installed
  • By default, CUDA is used automatically when available and installed; otherwise inference falls back to the CPU.
  • By default, the PyTorch data loader is used automatically when torch is installed; otherwise a simpler implementation is used.
  • For the output CSVs to have column names that correspond to human-readable class names, use the --classes option.
  • If a model has a predefined input batch size, that batch size is automatically used and --batch is ignored.
  • If a model does NOT have a predefined input batch size, --batch must be specified.
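
Putting several options together (the model, class list, and batch size below are hypothetical):

# Writes CSVs with class-name column headers under results/my_classifier/
ifcb-infer --classes classes.txt --batch 64 \
           --outdir results --outfile "{MODEL_NAME}/{BIN}.csv" \
           my_classifier.onnx example-data/bins/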

Output Organization Examples

The output path for each bin is controlled by the --outfile PATTERN option (default: {MODEL_NAME}/{SUBPATH}/{BIN}.csv), resolved relative to --outdir. The available tokens are:

Token Value
{BIN} Bin name (e.g. D20230108T145350_IFCB127)
{SUBPATH} Directory of the bin relative to the input folder
{MODEL_NAME} Model filename without extension
{RUN_DATE} Date the command was run (YYYY-MM-DD)

{SUBPATH} mirrors the input directory hierarchy, so outputs reflect the same structure as the source data. Given:

example-data/bins/
├── MVCO/
│   ├── 2006/
│   │   └── IFCB1_2006_157/
│   │       ├── IFCB1_2006_157_181359   ← bin
│   │       ├── IFCB1_2006_157_183432   ← bin
│   │       └── IFCB1_2006_157_185616   ← bin
│   └── 2023/
│       └── D20230108/
│           ├── D20230108T145350_IFCB127   ← bin
│           ├── D20230108T151529_IFCB127   ← bin
│           └── D20230108T153615_IFCB127   ← bin
└── OTZ/
    └── 2019/
        ├── D20190722/
        │   └── D20190722T155753_IFCB127   ← bin
        └── D20190723/
            ├── D20190723T161602_IFCB127   ← bin
            └── D20190723T171832_IFCB127   ← bin

Default ({MODEL_NAME}/{SUBPATH}/{BIN}.csv):

ifcb-infer my_classifier.onnx example-data/bins/
outputs/
└── my_classifier/
    ├── MVCO/2006/IFCB1_2006_157/IFCB1_2006_157_181359.csv
    ├── MVCO/2006/IFCB1_2006_157/IFCB1_2006_157_183432.csv
    ├── MVCO/2006/IFCB1_2006_157/IFCB1_2006_157_185616.csv
    ├── MVCO/2023/D20230108/D20230108T145350_IFCB127.csv
    ├── MVCO/2023/D20230108/D20230108T151529_IFCB127.csv
    ├── MVCO/2023/D20230108/D20230108T153615_IFCB127.csv
    ├── OTZ/2019/D20190722/D20190722T155753_IFCB127.csv
    ├── OTZ/2019/D20190723/D20190723T161602_IFCB127.csv
    └── OTZ/2019/D20190723/D20190723T171832_IFCB127.csv

Flat output — one folder, all bins (--outfile "{BIN}.csv"):

ifcb-infer --outdir "my/custom/output" --outfile "{BIN}.csv" my_classifier.onnx example-data/bins/
my/custom/output/
├── IFCB1_2006_157_181359.csv
├── IFCB1_2006_157_183432.csv
├── IFCB1_2006_157_185616.csv
├── D20230108T145350_IFCB127.csv
├── D20230108T151529_IFCB127.csv
├── D20230108T153615_IFCB127.csv
├── D20190722T155753_IFCB127.csv
├── D20190723T161602_IFCB127.csv
└── D20190723T171832_IFCB127.csv

Run-date prefix (--outfile "{RUN_DATE}/{SUBPATH}/{BIN}.csv"):

ifcb-infer --outfile "{RUN_DATE}/{SUBPATH}/{BIN}.csv" my_classifier.onnx example-data/bins/
outputs/
└── 2025-01-15/
    ├── MVCO/2006/IFCB1_2006_157/IFCB1_2006_157_181359.csv
    ├── MVCO/2006/IFCB1_2006_157/IFCB1_2006_157_183432.csv
    ├── MVCO/2006/IFCB1_2006_157/IFCB1_2006_157_185616.csv
    ├── MVCO/2023/D20230108/D20230108T145350_IFCB127.csv
    ├── MVCO/2023/D20230108/D20230108T151529_IFCB127.csv
    ├── MVCO/2023/D20230108/D20230108T153615_IFCB127.csv
    ├── OTZ/2019/D20190722/D20190722T155753_IFCB127.csv
    ├── OTZ/2019/D20190723/D20190723T161602_IFCB127.csv
    └── OTZ/2019/D20190723/D20190723T171832_IFCB127.csv

Container Use

The Dockerfile installs with [cuda,torch] for full GPU support.

Build and run:

# Podman
podman build . -t ifcb-infer:latest
podman run -it --rm -e CUDA_VISIBLE_DEVICES=1 \
       --device nvidia.com/gpu=all \
       -v $(pwd)/models:/app/models \
       -v $(pwd)/inputs:/app/inputs \
       -v $(pwd)/outputs:/app/outputs \
       ifcb-infer:latest models/classifier.onnx inputs/

To select a specific GPU, set CUDA_VISIBLE_DEVICES as in the run command above.

All ifcb-infer options can be appended after the image name. MODEL and BINS paths must refer to paths inside the container as mapped by -v.
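
With Docker instead of Podman, the equivalent build and run might look like this (a sketch; it assumes the NVIDIA Container Toolkit is installed so that --gpus works):

docker build . -t ifcb-infer:latest
docker run -it --rm --gpus all -e CUDA_VISIBLE_DEVICES=1 \
       -v $(pwd)/models:/app/models \
       -v $(pwd)/inputs:/app/inputs \
       -v $(pwd)/outputs:/app/outputs \
       ifcb-infer:latest models/classifier.onnx inputs/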

Development

Running Tests

First install with the [dev] extra:

pip install -e ".[dev]"

Then run:

# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=term-missing

Continuous Integration

The project includes GitHub Actions workflows that automatically:

  • Run tests on Python 3.10, 3.11, and 3.12 when code is pushed or PRs are opened
  • Check code quality with linting tools (flake8, black, isort)

Tests run automatically on pushes to the main branch and on all pull requests.
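
To run the same lint checks locally before pushing (a sketch; src/ matches the coverage command above, and tests/ is an assumed test directory):

black --check src/ tests/
isort --check-only src/ tests/
flake8 src/ tests/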
