lab-rasool/HoneyBee

HoneyBee

A Scalable Modular Framework for Multimodal AI in Oncology

Documentation & Examples | Paper

Publication

HoneyBee has been published in npj Digital Medicine:

Tripathi, A., Waqas, A., Schabath, M.B. et al. HONeYBEE: enabling scalable multimodal AI in oncology through foundation model-driven embeddings. npj Digit. Med. 8, 622 (2025). https://doi.org/10.1038/s41746-025-02003-4

Overview

HoneyBee is a multimodal AI framework for oncology research and clinical applications. It integrates and processes four kinds of medical data (clinical text, radiology images, pathology slides, and molecular data) through a unified, modular architecture. Designed for scalability and extensibility, it helps researchers build AI models for cancer diagnosis, prognosis, and treatment planning.

Warning

Alpha Release: This framework is currently in alpha. APIs may change, and some features are still under development.

Key Features

  • Multimodal data support: clinical text, radiology (DICOM/NIfTI), pathology (WSI), and molecular data
  • 3-layer modular architecture: clean separation between loaders, processors, and embedding models
  • Clinical NLP pipeline: OCR, cancer entity extraction, temporal parsing, and medical ontology mapping
  • Whole Slide Image processing: tissue detection, patch extraction, stain normalization, and quality filtering
  • State-of-the-art embedding models: GatorTron, BioBERT, PubMedBERT, UNI, REMEDIS, RadImageNet, and more
  • Cross-modal integration: unified patient-level representations from multiple data modalities
  • Survival analysis: Cox PH, Random Survival Forest, and DeepSurv
  • Similar patient retrieval: find patients with matching clinical profiles
  • Interactive visualization: t-SNE dashboards for embedding exploration
  • GPU-accelerated: cuCIM backend for WSI processing with OpenSlide fallback
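
To make the cross-modal integration and similar patient retrieval features concrete, here is a minimal NumPy sketch. It is not HoneyBee's API: the function names `fuse_modalities` and `top_k_similar`, the embedding sizes, and the fusion strategy (L2-normalize each modality, then concatenate) are all assumptions chosen for illustration, and the embeddings are random placeholders rather than model outputs.

```python
import numpy as np


def fuse_modalities(*modality_embeddings):
    """Concatenate L2-normalized per-modality embeddings (hypothetical fusion).

    Each argument is an array of shape (n_patients, dim_for_that_modality).
    Normalizing first keeps one modality from dominating by scale alone.
    """
    normalized = []
    for emb in modality_embeddings:
        emb = np.asarray(emb, dtype=float)
        norms = np.linalg.norm(emb, axis=1, keepdims=True)
        normalized.append(emb / np.clip(norms, 1e-12, None))
    return np.concatenate(normalized, axis=1)


def top_k_similar(patients, query_index, k=3):
    """Return indices of the k patients most cosine-similar to the query."""
    patients = np.asarray(patients, dtype=float)
    norms = np.linalg.norm(patients, axis=1, keepdims=True)
    unit = patients / np.clip(norms, 1e-12, None)
    scores = unit @ unit[query_index]
    scores[query_index] = -np.inf  # never return the query patient itself
    return np.argsort(-scores)[:k]


# Placeholder data: 5 patients, two modalities with made-up dimensions.
rng = np.random.default_rng(0)
clinical = rng.normal(size=(5, 8))
pathology = rng.normal(size=(5, 16))

patients = fuse_modalities(clinical, pathology)
neighbors = top_k_similar(patients, query_index=0, k=2)
print(patients.shape, neighbors)
```

Concatenation is only one possible fusion choice; averaging or a learned projection would fit the same retrieval step, since `top_k_similar` only needs one vector per patient.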

Quick Start

System Dependencies

# Ubuntu/Debian
sudo apt-get install -y openslide-tools tesseract-ocr

# macOS
brew install openslide tesseract

Installation

pip install honeybee-ml
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"
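
After running the commands above, a quick check that the prerequisites are discoverable can save debugging time later. The sketch below uses only the Python standard library; `check_environment` is a hypothetical helper, not part of HoneyBee, and `ctypes.util.find_library` may miss libraries installed in non-standard locations, so a "MISSING" line is a hint rather than proof.

```python
import ctypes.util
import importlib.util
import shutil


def check_environment():
    """Report which prerequisites look discoverable on this machine."""
    return {
        # The tesseract-ocr / tesseract packages install a `tesseract` binary.
        "tesseract binary on PATH": shutil.which("tesseract") is not None,
        # OpenSlide is a shared library; this lookup is best-effort.
        "openslide shared library": ctypes.util.find_library("openslide") is not None,
        # nltk is needed for the punkt tokenizer download step.
        "nltk importable": importlib.util.find_spec("nltk") is not None,
    }


status = check_environment()
for name, ok in status.items():
    print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```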

Optional Extras

Extra      Command                              Includes
Clinical   pip install honeybee-ml[clinical]    NLP, OCR, and text processing dependencies
Pathology  pip install honeybee-ml[pathology]   WSI loading and image processing
Molecular  pip install honeybee-ml[molecular]   Genomics and expression data support
All        pip install honeybee-ml[all]         Everything above

Research Applications

HoneyBee has been successfully applied to:

  • Cancer Subtype Classification: Automated identification of cancer subtypes from multimodal data
  • Survival Prediction: Risk stratification and outcome prediction for treatment planning
  • Similar Patient Retrieval: Finding patients with similar clinical profiles for precision medicine
  • Biomarker Discovery: Identifying multimodal patterns associated with treatment response
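
Survival prediction models such as those listed above are commonly evaluated with the concordance index, which measures how often a model ranks patient risk in the same order as the observed outcomes. The sketch below is a plain NumPy version of Harrell's C-index written for illustration; it is not taken from HoneyBee, the function name `concordance_index` is an assumption, and the toy follow-up times and risk scores are invented.

```python
import numpy as np


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event and
    a shorter follow-up time than patient j. It is concordant when the
    model gave patient i the higher risk score; ties count as half.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    risk = np.asarray(risk_scores, dtype=float)

    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy cohort: the third patient is censored (event flag 0).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.1]  # higher score = higher predicted risk

c_index = concordance_index(times, events, risk)
print(c_index)  # → 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The double loop is quadratic in the number of patients, which is fine for a sketch but slow for large cohorts.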

License

See the LICENSE file for details.

Citation

If you use HoneyBee in your research, please cite our paper:

Tripathi, A., Waqas, A., Schabath, M.B. et al. HONeYBEE: enabling scalable multimodal AI in
oncology through foundation model-driven embeddings. npj Digit. Med. 8, 622 (2025).
https://doi.org/10.1038/s41746-025-02003-4
