ifcb-inference


ONNX-based inference system for IFCB (Imaging FlowCytobot) bin data. This tool performs automated plankton classification on IFCB bin files using pre-trained ONNX models.

Features

  • Flexible model support: Works with both static and dynamic batch size ONNX models
  • Multiple data loading backends: Supports both PyTorch and non-PyTorch data loading
  • Configurable output organization: Choose between run-date or model-name subfolder organization
  • Directory structure preservation: Maintains input directory hierarchies in output
  • Containerized deployment: Docker/Podman support for consistent environments
  • GPU acceleration: CUDA support for faster inference (automatic when available)

Installation

Extra         Installs                        Use when
[cpu]         onnxruntime (CPU)               Lightweight/constrained environments; no GPU
[cuda]        onnxruntime-gpu                 GPU inference via CUDA
[torch]       PyTorch + torchvision           Faster/more flexible data loading, but more dependencies
[cuda,torch]  Both of the above               Full-featured install
[dev]         pytest, black, isort, flake8    Development and testing
  • One of [cpu] or [cuda] must be used to install the appropriate onnxruntime; they are mutually exclusive. If neither is included at install time, ifcb-infer will be unable to run. If in doubt, use [cuda].
  • [torch] is optional. Without it, a basic data loader is used, suitable for constrained environments where installing PyTorch is impractical (e.g. small containers, edge deployments). Otherwise the [torch] data loader is recommended: it supports more image formats and is generally faster.
# Full featured install
pip install "ifcb-infer[cuda,torch] @ git+https://github.com/WHOIGit/ifcb-inference.git"

# GPU-enabled, but without PyTorch dependencies
pip install "ifcb-infer[cuda] @ git+https://github.com/WHOIGit/ifcb-inference.git"
export LD_LIBRARY_PATH=$(pip show nvidia-cudnn-cu12 | grep Location | awk '{print $2}')/nvidia/cudnn/lib:$LD_LIBRARY_PATH
# see "cuDNN requirement for `[cuda]` without `[torch]`" LD_LIBRARY_PATH note below

# Lightest install
pip install "ifcb-infer[cpu] @ git+https://github.com/WHOIGit/ifcb-inference.git"

If cloning the repo and developing locally:

# Full-featured install (gpu/CUDA + PyTorch)
pip install -e ".[cuda,torch,dev]"

cuDNN requirement for [cuda] without [torch]

[cuda,torch] works out of the box — PyTorch bundles its own cuDNN libraries and ORT finds them automatically.

[cuda] alone installs nvidia-cudnn-cu12 via pip, but ORT cannot find it without help because the libraries land in site-packages, not a standard system path. If you don't have libcudnn9-cuda-12 installed globally or in a standard location, the cuDNN location must be set explicitly with LD_LIBRARY_PATH.

Setting LD_LIBRARY_PATH to point to the pip-installed cuDNN:

export LD_LIBRARY_PATH=$(pip show nvidia-cudnn-cu12 | grep Location | awk '{print $2}')/nvidia/cudnn/lib:$LD_LIBRARY_PATH

Add this to your environment profile (.bashrc, .bash_profile, venv/bin/activate script) to make it persistent.
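
To confirm that ONNX Runtime can actually see the GPU after setting this, a quick sanity check (a minimal sketch; it only assumes onnxruntime-gpu is installed in the active environment):

# 'CUDAExecutionProvider' should appear in the printed list; if only
# 'CPUExecutionProvider' shows up, cuDNN is still not being found.
python -c "import onnxruntime; print(onnxruntime.get_available_providers())"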

Usage

ifcb-infer [OPTIONS] MODEL BINS [BINS ...]

MODEL is the path to an ONNX model file. BINS can be a directory, a bin path, or a .txt/.list file of bin paths.
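
For example (the model and data paths below are hypothetical):

# Classify every bin found under example-data/bins/
# --batch is only needed for models without a fixed input batch size (see Options)
ifcb-infer --batch 32 my_classifier.onnx example-data/bins/

# Or pass an explicit list of bin paths
ifcb-infer --batch 32 my_classifier.onnx runs_to_process.list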

Options

--classes FILE                         Class list file; adds column headers to output CSVs.
                                       Accepts a line-delimited .txt or an index-keyed .json
                                       (e.g. {"0": "class_a", "1": "class_b"})
--batch N                              Required for models without a fixed input batch size
--outdir DIRPATH                       Output directory. Default: ./outputs
--outfile PATTERN                      Output filename pattern. Default: {MODEL_NAME}/{SUBPATH}/{BIN}.csv
                                       Tokens: {MODEL_NAME}, {RUN_DATE}, {SUBPATH} (relative dir), {BIN} (bin name)
--cpuonly                              Force CPU inference even if CUDA is available
--notorch                              Use non-PyTorch data loader even if torch is installed
  • By default, CUDA is used automatically when available and installed; otherwise inference falls back to the CPU.
  • By default, the PyTorch data loader is used automatically when torch is installed; otherwise a simpler implementation is used.
  • For the output CSVs to have column names that correspond to human-readable class names, use the --classes option.
  • If a model has a predefined input batch size, that batch size is automatically used and --batch is ignored.
  • If a model does NOT have a predefined input batch size, --batch must be specified.
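
Putting several options together (the model, class list, and batch size below are hypothetical):

# Writes CSVs with class-name column headers under results/my_classifier/
ifcb-infer --classes classes.txt --batch 64 \
           --outdir results --outfile "{MODEL_NAME}/{BIN}.csv" \
           my_classifier.onnx example-data/bins/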

Output Organization Examples

The output path for each bin is controlled by the --outfile PATTERN option (default: {MODEL_NAME}/{SUBPATH}/{BIN}.csv), resolved relative to --outdir. The available tokens are:

Token Value
{BIN} Bin name (e.g. D20230108T145350_IFCB127)
{SUBPATH} Directory of the bin relative to the input folder
{MODEL_NAME} Model filename without extension
{RUN_DATE} Date the command was run (YYYY-MM-DD)

{SUBPATH} mirrors the input directory hierarchy, so outputs reflect the same structure as the source data. Given:

example-data/bins/
├── MVCO/
│   ├── 2006/
│   │   └── IFCB1_2006_157/
│   │       ├── IFCB1_2006_157_181359   ← bin
│   │       ├── IFCB1_2006_157_183432   ← bin
│   │       └── IFCB1_2006_157_185616   ← bin
│   └── 2023/
│       └── D20230108/
│           ├── D20230108T145350_IFCB127   ← bin
│           ├── D20230108T151529_IFCB127   ← bin
│           └── D20230108T153615_IFCB127   ← bin
└── OTZ/
    └── 2019/
        ├── D20190722/
        │   └── D20190722T155753_IFCB127   ← bin
        └── D20190723/
            ├── D20190723T161602_IFCB127   ← bin
            └── D20190723T171832_IFCB127   ← bin

Default ({MODEL_NAME}/{SUBPATH}/{BIN}.csv):

ifcb-infer my_classifier.onnx example-data/bins/
outputs/
└── my_classifier/
    ├── MVCO/2006/IFCB1_2006_157/IFCB1_2006_157_181359.csv
    ├── MVCO/2006/IFCB1_2006_157/IFCB1_2006_157_183432.csv
    ├── MVCO/2006/IFCB1_2006_157/IFCB1_2006_157_185616.csv
    ├── MVCO/2023/D20230108/D20230108T145350_IFCB127.csv
    ├── MVCO/2023/D20230108/D20230108T151529_IFCB127.csv
    ├── MVCO/2023/D20230108/D20230108T153615_IFCB127.csv
    ├── OTZ/2019/D20190722/D20190722T155753_IFCB127.csv
    ├── OTZ/2019/D20190723/D20190723T161602_IFCB127.csv
    └── OTZ/2019/D20190723/D20190723T171832_IFCB127.csv

Flat output — one folder, all bins (--outfile "{BIN}.csv"):

ifcb-infer --outdir "my/custom/output" --outfile "{BIN}.csv" my_classifier.onnx example-data/bins/
my/custom/output/
├── IFCB1_2006_157_181359.csv
├── IFCB1_2006_157_183432.csv
├── IFCB1_2006_157_185616.csv
├── D20230108T145350_IFCB127.csv
├── D20230108T151529_IFCB127.csv
├── D20230108T153615_IFCB127.csv
├── D20190722T155753_IFCB127.csv
├── D20190723T161602_IFCB127.csv
└── D20190723T171832_IFCB127.csv

Run-date prefix (--outfile "{RUN_DATE}/{SUBPATH}/{BIN}.csv"):

ifcb-infer --outfile "{RUN_DATE}/{SUBPATH}/{BIN}.csv" my_classifier.onnx example-data/bins/
outputs/
└── 2025-01-15/
    ├── MVCO/2006/IFCB1_2006_157/IFCB1_2006_157_181359.csv
    ├── MVCO/2006/IFCB1_2006_157/IFCB1_2006_157_183432.csv
    ├── MVCO/2006/IFCB1_2006_157/IFCB1_2006_157_185616.csv
    ├── MVCO/2023/D20230108/D20230108T145350_IFCB127.csv
    ├── MVCO/2023/D20230108/D20230108T151529_IFCB127.csv
    ├── MVCO/2023/D20230108/D20230108T153615_IFCB127.csv
    ├── OTZ/2019/D20190722/D20190722T155753_IFCB127.csv
    ├── OTZ/2019/D20190723/D20190723T161602_IFCB127.csv
    └── OTZ/2019/D20190723/D20190723T171832_IFCB127.csv

Container Use

The Dockerfile installs with [cuda,torch] for full GPU support.

Build and run:

# Podman
podman build . -t ifcb-infer:latest
podman run -it --rm -e CUDA_VISIBLE_DEVICES=1 \
       --device nvidia.com/gpu=all \
       -v $(pwd)/models:/app/models \
       -v $(pwd)/inputs:/app/inputs \
       -v $(pwd)/outputs:/app/outputs \
       ifcb-infer:latest models/classifier.onnx inputs/

To select a specific GPU, set CUDA_VISIBLE_DEVICES as in the run command above.

All ifcb-infer options can be appended after the image name. MODEL and BINS paths must refer to paths inside the container as mapped by -v.
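
With Docker instead of Podman, the equivalent build and run might look like this (a sketch; it assumes the NVIDIA Container Toolkit is installed so that --gpus works):

docker build . -t ifcb-infer:latest
docker run -it --rm --gpus all -e CUDA_VISIBLE_DEVICES=1 \
       -v $(pwd)/models:/app/models \
       -v $(pwd)/inputs:/app/inputs \
       -v $(pwd)/outputs:/app/outputs \
       ifcb-infer:latest models/classifier.onnx inputs/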

Development

Running Tests

First install with the [dev] extra:

pip install -e ".[dev]"

Then run:

# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=term-missing

Continuous Integration

The project includes GitHub Actions workflows that automatically:

  • Run tests on Python 3.10, 3.11, and 3.12 when code is pushed or PRs are opened
  • Check code quality with linting tools (flake8, black, isort)

Tests run automatically on pushes to the main branch and on all pull requests.
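
To run the same lint checks locally before pushing (a sketch; src/ matches the coverage command above, and tests/ is an assumed test directory):

black --check src/ tests/
isort --check-only src/ tests/
flake8 src/ tests/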
