Homomorphic Encryption for Private ML Inference

Project Overview

This project demonstrates Homomorphic Encryption (HE) applied to a Machine Learning inference task. A Logistic Regression model classifies breast cancer diagnoses (malignant vs. benign) entirely on encrypted patient data — the compute server never sees the raw input at any point.

Library used: TenSEAL (Python wrapper over Microsoft SEAL)
HE Scheme: CKKS (Cheon-Kim-Kim-Song)
Dataset: Wisconsin Breast Cancer Dataset (sklearn), 569 samples, 30 features, binary label
Language: Python 3.12

Motivation — Why Homomorphic Encryption?

In traditional cloud computing, a hospital wanting to run an ML model on patient data must send the raw, unencrypted data to the server. The server then sees every patient record in the clear.

Homomorphic Encryption solves this:

Patient data is encrypted locally before being sent
The server computes the ML inference directly on ciphertext
Only the encrypted result is returned — the server learns nothing
The patient/hospital decrypts the result locally

This enables privacy-preserving machine learning — a critical need in healthcare, finance, and any domain with sensitive data.

What is Homomorphic Encryption?

Homomorphic Encryption is a form of encryption that allows computation on ciphertexts. The result, when decrypted, matches what you would have gotten by computing on the plaintext directly (exactly for integer schemes, and up to a small approximation error for CKKS).

Mathematically:

Decrypt( F(Encrypt(x)) ) = F(x)

Where F is some function (in our case: a dot product).
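To make the identity concrete, here is a minimal sketch using a toy Paillier cryptosystem in pure Python. This is a deliberate stand-in: Paillier is not CKKS, it is not what TenSEAL does internally, and the tiny primes make it completely insecure. It is used here only because it fits in a few lines and supports exactly the operations a dot product with plaintext weights needs (adding ciphertexts and scaling them by plaintext integers). All names (`encrypt`, `decrypt`, `encrypted_dot`) are hypothetical and do not appear in the project scripts.

```python
import math
import secrets

# Toy Paillier key with tiny primes: insecure, for illustration only.
P, Q = 1789, 1861
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)   # private key
MU = pow(LAM, -1, N)           # valid because the generator is g = N + 1


def encrypt(m: int) -> int:
    """Encrypt an integer (negatives are mapped into the range mod N)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m % N, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Decrypt and map the result back to a signed integer."""
    m = ((pow(c, LAM, N2) - 1) // N * MU) % N
    return m - N if m > N // 2 else m


def encrypted_dot(enc_x, weights, bias):
    """Server side: computes Enc(x . w + b) without ever seeing x."""
    acc = encrypt(bias)
    for c, w in zip(enc_x, weights):
        # Raising a ciphertext to w multiplies the hidden value by w;
        # multiplying two ciphertexts adds their hidden values.
        acc = (acc * pow(c, w % N, N2)) % N2
    return acc


x = [3, -1, 4]          # private input, encrypted by the client
weights = [2, 5, -7]    # plaintext model weights held by the server
bias = 10

enc_z = encrypted_dot([encrypt(v) for v in x], weights, bias)
z = decrypt(enc_z)
expected = sum(a * b for a, b in zip(x, weights)) + bias
print(z, expected)
```

Both printed values agree, which is the statement Decrypt( F(Encrypt(x)) ) = F(x) for F a dot product plus bias. With CKKS the same identity holds for real-valued vectors, up to a small approximation error.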

Why CKKS?

There are several HE schemes. We use CKKS because:

Scheme	Supports	Used For
BFV / BGV	Exact integers	Counting, voting
CKKS	Approximate real numbers	ML, statistics

Machine learning models work with floating-point numbers (weights, features). CKKS is the standard HE scheme for this kind of workload: it operates on approximate real numbers directly and introduces a small, controlled approximation error (on the order of 1e-6 in our results), which is negligible for classification.

CKKS Parameters — Technical Explanation

poly_modulus_degree = 8192
coeff_mod_bit_sizes = [60, 40, 60]
scale               = 2^40

poly_modulus_degree: Defines the size of the polynomial ring that ciphertexts live in. Larger values allow deeper computation (more sequential operations) but are slower and use more memory. 8192 is standard for shallow inference tasks.

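coeff_mod_bit_sizes: Defines the coefficient modulus chain, which determines how many multiplications a ciphertext can go through. Each multiplication (with rescaling) consumes one level. [60, 40, 60] gives 1 usable multiplication level (the middle 40-bit prime; SEAL reserves the last prime for key switching), which is sufficient for our dot product: one element-wise multiply with plaintext weights, followed only by additions.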

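scale = 2^40: Controls the precision of encoded real numbers. Values are encoded as fixed-point numbers with 40 fractional bits; encryption noise lowers the effective precision (errors were on the order of 1e-6 in our results), which is still more than enough for ML inference.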

Galois keys: Required for the vector rotation operations used internally by TenSEAL when computing dot products on encrypted vectors.
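Put together, these parameters might be wired into TenSEAL roughly as follows. This is a sketch based on TenSEAL's public API, not a copy of the project's scripts: the function names `build_context` and `encrypted_score` are hypothetical, and TenSEAL is imported inside the functions so the snippet can be loaded even where the library is not installed.

```python
# Parameters copied from this README.
CKKS_PARAMS = {
    "poly_modulus_degree": 8192,
    "coeff_mod_bit_sizes": [60, 40, 60],
}
GLOBAL_SCALE = 2 ** 40


def build_context():
    """Create a CKKS context with Galois keys (needs `pip install tenseal`)."""
    import tenseal as ts  # lazy import: only required when actually called

    context = ts.context(ts.SCHEME_TYPE.CKKS, **CKKS_PARAMS)
    context.global_scale = GLOBAL_SCALE
    context.generate_galois_keys()  # needed for the rotations inside dot()
    return context


def encrypted_score(context, features, weights, bias):
    """Encrypt the features, compute z = features . weights + bias, decrypt z."""
    import tenseal as ts

    enc_x = ts.ckks_vector(context, list(features))
    # Weights and bias stay in plaintext; only the features are encrypted.
    enc_z = enc_x.dot(list(weights)) + [bias]
    return enc_z.decrypt()[0]
```

Calling `encrypted_score(build_context(), x, w, b)` should return a value close to the plaintext `x . w + b`. In a real deployment the context sent to the server would be stripped of the secret key, so that only the client can decrypt.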

Noise Budget — A Key HE Concept

Every HE operation introduces a small amount of noise into the ciphertext. CKKS manages this through the coefficient modulus chain: each multiplication is followed by a rescaling step that "consumes" one level of the chain. Once the levels are exhausted, further multiplications overflow the remaining modulus and the decrypted results become unreliable.

For our project:

We need a multiplicative depth of only 1 (the element-wise multiply inside the dot product)
Our modulus chain [60, 40, 60] provides exactly 1 usable level
Average noise observed: 1.14 × 10⁻⁶ — negligible for classification
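Why one multiplication uses up one level can be imitated with plain integers. The sketch below is only an analogy under simplifying assumptions: there is no encryption and no noise, and rescaling divides by exactly 2^40 rather than by a 40-bit prime. The helper names are made up for illustration.

```python
SCALE_BITS = 40                      # scale = 2^40, as in this README
COEFF_MOD_BIT_SIZES = [60, 40, 60]   # modulus chain from this README


def usable_levels(bit_sizes):
    """Count the middle primes: the last prime is SEAL's special prime for
    key switching, and the first one holds the final result."""
    return max(len(bit_sizes) - 2, 0)


def encode(x, scale_bits=SCALE_BITS):
    """Fixed-point encode a real number as an integer at scale 2^scale_bits."""
    return round(x * 2 ** scale_bits)


def decode(value, scale_bits=SCALE_BITS):
    """Map a scaled integer back to a real number."""
    return value / 2 ** scale_bits


def rescale(value, drop_bits=SCALE_BITS):
    """Divide by 2^drop_bits with rounding, as rescaling does after a multiply."""
    return (value + (1 << (drop_bits - 1))) >> drop_bits


feature, weight = 1.5, -0.25
product = encode(feature) * encode(weight)    # scale has grown to 2^80
levels_left = usable_levels(COEFF_MOD_BIT_SIZES) - 1
z = decode(rescale(product))                  # rescaling brings it back to 2^40
print(z, levels_left)
```

After the single multiply and rescale, no levels remain. A second multiplication would push the scale to 2^80 again with nothing left to divide by, which is the situation described under "CKKS scale overflow" in the challenges section.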

Bootstrapping is an advanced technique that can refresh the noise budget, allowing arbitrarily deep computation. It was not needed here but is a major active research area for deep neural network inference under HE.

Why We Don't Use Encrypted Sigmoid

A full logistic regression pipeline computes:

z = X · weights + bias
p = sigmoid(z) = 1 / (1 + exp(-z))
label = 1 if p >= 0.5 else 0

The problem: HE only supports addition and multiplication. sigmoid(z) requires exp() and division — impossible to compute on ciphertexts directly.

The standard solution: Approximate sigmoid with a polynomial:

sigmoid(z) ≈ 0.5 + 0.197z - 0.004z³

This only uses multiplications and additions — HE-compatible.

Why we still don't use it: Degree-3 polynomial approximations are only accurate for |z| ≤ 5. Our logistic regression on 30 standardized features regularly produces |z| > 10, causing the approximation to diverge badly.
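The divergence is easy to check in plaintext. The short sketch below evaluates the exact sigmoid and the degree-3 polynomial quoted above at a few values of z. The function names are illustrative and nothing here is encrypted.

```python
import math


def sigmoid(z):
    """Exact logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def poly_sigmoid(z):
    """Degree-3 approximation quoted above: 0.5 + 0.197z - 0.004z^3."""
    return 0.5 + 0.197 * z - 0.004 * z ** 3


for z in (0.0, 2.0, 5.0, 10.0, -10.0):
    print(f"z={z:6.1f}  sigmoid={sigmoid(z):.4f}  poly={poly_sigmoid(z):.4f}")
```

At z = 10 the polynomial evaluates to about -1.53 while the true sigmoid is just under 1, so thresholding the polynomial at 0.5 would flip the predicted class for a sample the model is very confident about.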

Our correct solution: For binary classification, sigmoid is unnecessary. The decision rule sigmoid(z) >= 0.5 is mathematically identical to z >= 0. We compute the encrypted dot product z, decrypt only z, and threshold at 0. The server never sees the features — only the encrypted z is transmitted back.
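The equivalence of the two decision rules follows directly from the definition of the sigmoid, since every step below is reversible:

```latex
\sigma(z) \ge \tfrac{1}{2}
\iff \frac{1}{1 + e^{-z}} \ge \frac{1}{2}
\iff 1 + e^{-z} \le 2
\iff e^{-z} \le 1
\iff z \ge 0
```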

Real-world solutions to this challenge (future work):

Minimax polynomial approximation (higher degree, requires more CKKS levels)
Composite polynomial approximation (piecewise, more stable over wider range)
Training the model to constrain z to a small range
Using a different activation function that approximates more easily

Results

Accuracy

	Plaintext	HE Encrypted
Malignant	97.6%	97.6%
Benign	98.6%	98.6%
Overall	98.25%	98.25%

Prediction agreement: 114/114 (100%) — HE inference matched plaintext on every sample
CKKS noise on z: avg 1.14 × 10⁻⁶, max 7.94 × 10⁻⁶ — never affected a prediction
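The agreement and noise figures above can be computed from two lists of scores, one from the plaintext model and one decrypted from the HE pipeline. The following is a minimal sketch of that comparison. The function name `compare_scores` is hypothetical, and the sample values are made up for illustration; they are not the project's data.

```python
def compare_scores(z_plain, z_he):
    """Return (agreeing predictions, mean abs error, max abs error)."""
    if not z_plain or len(z_plain) != len(z_he):
        raise ValueError("score lists must be non-empty and of equal length")
    errors = [abs(p - h) for p, h in zip(z_plain, z_he)]
    # A prediction agrees when both scores fall on the same side of z = 0.
    agree = sum((p >= 0) == (h >= 0) for p, h in zip(z_plain, z_he))
    return agree, sum(errors) / len(errors), max(errors)


# Made-up scores for illustration only.
z_plain = [2.5, -1.25, 0.75, -3.0]
z_he = [2.5 + 1e-6, -1.25 - 2e-6, 0.75, -3.0 + 4e-6]

agree, mean_err, max_err = compare_scores(z_plain, z_he)
print(f"agreement: {agree}/{len(z_plain)}  mean: {mean_err:.2e}  max: {max_err:.2e}")
```

A prediction can only flip when |z| is smaller than the noise, which is why errors around 1e-6 left all 114 predictions unchanged.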

Performance

	Total (114 samples)	Per Sample
Plaintext (sklearn)	0.35 ms	0.003 ms
HE Encrypted	1,711 ms	15 ms
Slowdown	—	~4,893×

Per-operation breakdown (HE, per sample):

Encryption: 3.86 ms
Dot product: 10.80 ms ← dominant cost (Galois key rotations)
Decryption: 0.36 ms

The dot product dominates because it requires expensive Galois key rotation operations internally — this cost scales with the number of features (30 in our case).
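To see where the rotations come from, the sketch below imitates a packed dot product on an ordinary Python list: one slot-wise multiply, then a "rotate and add" loop that folds all slots into a single sum. This is a common way to sum packed slots and is shown as an assumption about the general technique; TenSEAL's internal implementation may differ in detail, and the function names are illustrative.

```python
def rotate(slots, k):
    """Cyclically rotate the slot vector left by k positions."""
    return slots[k:] + slots[:k]


def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def packed_dot(features, weights):
    """Return (dot product, number of rotations used)."""
    size = next_power_of_two(len(features))
    pad = [0] * (size - len(features))
    # One slot-wise multiplication: the only multiplicative level needed.
    slots = [f * w for f, w in zip(list(features) + pad, list(weights) + pad)]
    rotations = 0
    step = size // 2
    while step >= 1:
        # Each pass adds a rotated copy, halving the number of distinct partial sums.
        slots = [a + b for a, b in zip(slots, rotate(slots, step))]
        rotations += 1
        step //= 2
    return slots[0], rotations


features = list(range(1, 31))   # 30 features, as in this project
weights = [2] * 30
print(packed_dot(features, weights))
```

With 30 features padded to 32 slots, this scheme needs 5 rotations. On real ciphertexts each rotation is a key-switching operation using the Galois keys, which is far more expensive than the additions around it.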

Technical Challenges & Solutions

Challenge 1: Polynomial sigmoid divergence

Problem: Standard degree-3 sigmoid approximation breaks down for |z| > 5, which logistic regression on high-dimensional data commonly produces.
Solution: Classified directly using sign(z), which is mathematically equivalent to the sigmoid threshold for binary classification.

Challenge 2: CKKS scale overflow

Problem: Initial parameters (poly_modulus_degree=8192, [60,40,60]) caused scale overflow when attempting polynomial sigmoid due to insufficient multiplication levels.
Solution: Removed the polynomial sigmoid (see Challenge 1). Parameters were kept minimal since only 1 multiplication level was ultimately needed.

Challenge 3: TenSEAL on Windows

Problem: TenSEAL (built on Microsoft SEAL) requires a C++ build toolchain that is difficult to configure on Windows.
Solution: Developed on Ubuntu (WSL) where pip install tenseal works out of the box.

File Structure

he_project/
├── train_model.py       # Train LR model, extract weights, save artifacts
├── encrypt_infer.py     # HE inference on a single sample (demo)
├── benchmark.py         # HE vs plaintext across all 114 test samples
├── model_params.json    # Extracted weights + bias (generated)
├── model.pkl            # Sklearn model object (generated)
├── scaler.pkl           # StandardScaler (generated)
├── X_test_scaled.npy    # Scaled test features (generated)
└── y_test.npy           # Test labels (generated)

Run order:

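python3 -m venv venv                      # create environment
source venv/bin/activate                  # activate environment
pip install tenseal scikit-learn numpy    # install dependencies (list assumed from the libraries named in this README)
python3 train_model.py                    # generates model_params.json and the .pkl and .npy files
python3 encrypt_infer.py                  # single-sample demo
python3 benchmark.py                      # full 114-sample benchmark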

Workload Distribution

Member	Responsibilities
Member A	Software architecture, encryption pipeline, ML model training, benchmarking, technical analysis
Member B	Presentation design, written report, objectives and findings documentation

References

Microsoft SEAL: https://www.microsoft.com/en-us/research/project/microsoft-seal/
TenSEAL: https://github.com/OpenMined/TenSEAL
CKKS Scheme: Cheon, J.H., Kim, A., Kim, M., Song, Y. (2017). Homomorphic Encryption for Arithmetic of Approximate Numbers.
Dataset: Breast Cancer Wisconsin (Diagnostic) — UCI ML Repository / sklearn
