Support to express Q8_0 tensors as Tornado ByteArray #754

orionpapadakis · 2025-12-04T19:24:21Z

Description

This patch adds HalfFloat read/write support to the ByteArray class. It also adds Q8_0 matrix-vector multiplication kernels to the MatrixVectorRowMajor example, and these kernels express each Q8_0 tensor as a single unified ByteArray. The new ByteArray methods make loading quantized transformer models with TornadoVM more efficient, and make inference faster and simpler.

Key Features:

ByteArray HalfFloat Support: Added getHalfFloat() and setHalfFloat() methods to ByteArray for efficient HalfFloat data access with proper TornadoVM header offset handling.
Q8_0 Quantized Matrix-Vector Kernels: New GPU kernels for Q8_0 quantized matrix-vector multiplication that use a unified ByteArray memory layout.
Performance Improvement: The new kernels are slightly faster (1.05x to 1.11x speedup) than the previous approach on a ROG laptop.
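As a rough illustration of what typed HalfFloat access on a byte array involves, here is a plain-Java sketch. It is not TornadoVM's ByteArray. The class name `HalfBytes` is hypothetical, and the sketch uses a heap `ByteBuffer` instead of a memory segment. It also assumes little-endian byte order and does not model the TornadoVM header offset. The 2-byte alignment check mirrors the validation described in the review summary. The binary16 conversion relies on `Float.floatToFloat16`/`Float.float16ToFloat`, which are available since Java 20.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for a byte-backed array with typed HalfFloat access.
// Not TornadoVM's ByteArray: no object header, plain heap ByteBuffer.
final class HalfBytes {
    private final ByteBuffer buf;

    HalfBytes(int numBytes) {
        // Little-endian is an assumption of this sketch.
        this.buf = ByteBuffer.allocate(numBytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    byte get(int byteOffset) { return buf.get(byteOffset); }

    void set(int byteOffset, byte value) { buf.put(byteOffset, value); }

    // Reads 2 bytes at byteOffset and decodes them as IEEE 754 binary16.
    float getHalfFloat(int byteOffset) {
        checkHalfAligned(byteOffset);
        return Float.float16ToFloat(buf.getShort(byteOffset));
    }

    // Encodes value as binary16 (round to nearest even) and writes 2 bytes.
    void setHalfFloat(int byteOffset, float value) {
        checkHalfAligned(byteOffset);
        buf.putShort(byteOffset, Float.floatToFloat16(value));
    }

    // Rejects odd byte offsets for 2-byte HalfFloat accesses.
    private void checkHalfAligned(int byteOffset) {
        if ((byteOffset & 1) != 0) {
            throw new IllegalArgumentException("HalfFloat offset must be 2-byte aligned: " + byteOffset);
        }
    }

    public static void main(String[] args) {
        HalfBytes a = new HalfBytes(34);                  // one Q8_0 block: 2-byte scale + 32 quants
        a.setHalfFloat(0, 0.5f);                          // scale at offset 0
        a.set(2, (byte) -7);                              // first quant right after the scale
        System.out.println(a.getHalfFloat(0) * a.get(2)); // prints -3.5
    }
}
```

In this sketch every Q8_0 block is 34 bytes, so block-relative scale offsets are always even and the alignment check never rejects a well-formed row-major layout.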

Problem description

The existing ByteArray class lacked support for typed multi-byte accesses such as HalfFloat reads and writes. This made it difficult to efficiently process quantized model weights that mix data types, such as the Q8_0 quantization format, which uses HalfFloat for scales and Int8 for quants.

Technical Details:

The Q8_0 format stores each 32-element block as a 2-byte HalfFloat scale plus 32 one-byte (Int8) quantized values, i.e. 34 bytes per block.
The previous approach required separate arrays for the scales and the quants.
The unified ByteArray approach keeps both in one contiguous array, which reduces memory overhead and improves cache locality.
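The block layout can be made concrete with a short sketch that packs floats into 34-byte blocks and reads them back. It assumes the common GGML-style Q8_0 scheme, where scale = max|x| / 127 and quant = round(x / scale). It also stores the fp16 scale before the 32 int8 quants of each block. The class `Q8Blocks` is hypothetical, and the PR's quantizeWeightsToQ8() may order or round values differently.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of Q8_0 packing into one contiguous byte buffer.
// Assumed layout per block: [scale: fp16, little-endian][q0..q31: int8].
final class Q8Blocks {
    static final int BLOCK = 32;              // elements per block
    static final int BLOCK_BYTES = 2 + BLOCK; // 34 bytes per block

    // Quantizes x (length a multiple of 32) into (x.length / 32) * 34 bytes.
    static ByteBuffer quantize(float[] x) {
        if (x.length % BLOCK != 0) {
            throw new IllegalArgumentException("length must be a multiple of 32");
        }
        int blocks = x.length / BLOCK;
        ByteBuffer out = ByteBuffer.allocate(blocks * BLOCK_BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int b = 0; b < blocks; b++) {
            float amax = 0f;
            for (int i = 0; i < BLOCK; i++) {
                amax = Math.max(amax, Math.abs(x[b * BLOCK + i]));
            }
            float scale = amax / 127f;                  // maps [-amax, amax] onto [-127, 127]
            float inv = scale != 0f ? 1f / scale : 0f;  // all-zero block -> all-zero quants
            int base = b * BLOCK_BYTES;
            out.putShort(base, Float.floatToFloat16(scale));
            for (int i = 0; i < BLOCK; i++) {
                out.put(base + 2 + i, (byte) Math.round(x[b * BLOCK + i] * inv));
            }
        }
        return out;
    }

    // Recovers element i as (fp16 scale of its block) * (its int8 quant).
    static float dequantize(ByteBuffer q, int i) {
        int base = (i / BLOCK) * BLOCK_BYTES;
        float scale = Float.float16ToFloat(q.getShort(base));
        return scale * q.get(base + 2 + i % BLOCK);
    }

    public static void main(String[] args) {
        float[] w = new float[64];
        for (int i = 0; i < w.length; i++) {
            w[i] = (float) Math.sin(i);
        }
        ByteBuffer q = quantize(w);
        System.out.println(q.capacity() + " bytes for " + w.length + " floats"); // 68 bytes for 64 floats
        System.out.printf("w[5]=%.4f  dequantized=%.4f%n", w[5], dequantize(q, 5));
    }
}
```

With this layout a whole weight matrix lives in a single buffer, rather than in one array for scales and another for quants.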

Generated Native Code Verification

The generated code confirms that accesses through the new HalfFloat methods compile to native half-precision loads for the scales, while the quantized values are still read with single-byte loads:

OpenCL Kernel Snippet:

// HalfFloat scale loading - generates native half load
half_29  =  *((__global half *) ul_28);

// Quantized byte values - generates char loads  
ch_35  =  *((__global char *) ul_34);
ch_39  =  *((__global char *) ul_38);
ch_43  =  *((__global char *) ul_42);
ch_47  =  *((__global char *) ul_46);

PTX Kernel Snippet:

// HalfFloat scale loading - generates native b16 load
ld.global.b16	rfh0, [rud4];

// Quantized byte values - generates s8 loads
ld.global.s8	rsb0, [rud5];
ld.global.s8	rsb1, [rud6];
ld.global.s8	rsb2, [rud7];
ld.global.s8	rsb3, [rud8];
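For context, the OpenCL and PTX snippets above correspond to the inner loop of a Q8_0 row dot product. Each block needs one 16-bit load for the scale and then one signed byte load per quant. The plain-Java sketch below shows that access pattern on a row-major matrix stored in a single `byte[]`. It does not use the TornadoVM API, and the class and method names are hypothetical. It assumes each 32-element block is a little-endian fp16 scale followed by 32 int8 quants. In a TornadoVM kernel, the outer loop over rows is the natural candidate for parallelization (for example with `@Parallel`).

```java
import java.util.Arrays;

// Sketch: row-major Q8_0 matrix-vector product over one unified byte[].
// Assumed layout per 32-element block: [fp16 scale, little-endian][32 x int8 quants].
final class Q8MatVec {
    static final int BLOCK = 32;
    static final int BLOCK_BYTES = 34;

    // Assembles the little-endian fp16 at offset (the "half load" in the kernels above).
    static float scaleAt(byte[] w, int offset) {
        short bits = (short) ((w[offset] & 0xFF) | (w[offset + 1] << 8));
        return Float.float16ToFloat(bits);
    }

    // y[row] = sum over blocks of scale * sum(q[i] * x[i]); cols must be a multiple of 32.
    static void matVec(byte[] w, float[] x, float[] y, int rows, int cols) {
        int blocksPerRow = cols / BLOCK;
        for (int row = 0; row < rows; row++) { // the loop a GPU kernel would parallelize
            float sum = 0f;
            for (int b = 0; b < blocksPerRow; b++) {
                int base = (row * blocksPerRow + b) * BLOCK_BYTES;
                float scale = scaleAt(w, base);
                float blockSum = 0f;
                for (int i = 0; i < BLOCK; i++) {
                    blockSum += w[base + 2 + i] * x[b * BLOCK + i]; // signed int8 quant load
                }
                sum += scale * blockSum; // apply the block scale once per block
            }
            y[row] = sum;
        }
    }

    public static void main(String[] args) {
        int rows = 2, cols = 32;
        byte[] w = new byte[rows * BLOCK_BYTES];
        float[] x = new float[cols];
        Arrays.fill(x, 1f);
        for (int r = 0; r < rows; r++) {
            int base = r * BLOCK_BYTES;
            short s = Float.floatToFloat16(0.25f * (r + 1)); // scales 0.25 and 0.5
            w[base] = (byte) s;
            w[base + 1] = (byte) (s >> 8);
            for (int i = 0; i < BLOCK; i++) {
                w[base + 2 + i] = 2;                          // every quant = 2
            }
        }
        float[] y = new float[rows];
        matVec(w, x, y, rows, cols);
        System.out.println(Arrays.toString(y)); // [16.0, 32.0]
    }
}
```

Applying the scale once per block, rather than per element, keeps the inner loop to byte loads and multiply-adds.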

Backend/s tested

Mark the backends affected by this PR.

OpenCL
PTX
SPIRV

OS tested

Mark the OS where this PR is tested.

Linux
OSx
Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

Yes
No

How to test the new patch?

Run the MatrixVectorRowMajor benchmark:

tornado -m tornado.examples/uk.ac.manchester.tornado.examples.compute.MatrixVectorRowMajor

The expected output should include:

Validation PASSED ✓
Q8 ByteArray performance > Q8 Vectorized performance

mikepapadim · 2025-12-04T19:43:46Z

\rerun help

mikepapadim · 2025-12-04T19:44:05Z

/rerun help

github-actions · 2025-12-04T19:44:23Z

🔄 Rerun Workflow Commands

Command	Description
`/rerun`	Rerun only failed/cancelled/timed-out workflows
`/rerun all`	Rerun all workflows for this PR
`/rerun failed`	Same as `/rerun`
`/rerun <name>`	Rerun workflows matching `<name>` (e.g. `/rerun ci`, `/rerun build`)
`/rerun help`	Show this help message

Note: Only completed workflows can be rerun. In-progress workflows are skipped.

mikepapadim · 2025-12-04T19:44:59Z

/rerun

github-actions · 2025-12-04T19:45:10Z

🚀 Workflow rerun started

Mode: failed
Triggered by: @mikepapadim

View Actions

github-actions · 2025-12-04T19:45:11Z

✅ Workflow rerun success

View Actions

tornado-api/src/main/java/uk/ac/manchester/tornado/api/types/arrays/ByteArray.java

mikepapadim

Please also add a unit test, in addition to the example, so that this is tested in CI.

Copilot

Pull request overview

This PR enhances TornadoVM's quantization support by adding HalfFloat operations to the ByteArray class and implementing Q8_0 matrix-vector multiplication kernels that use a unified ByteArray memory layout. This eliminates the need for separate arrays for scales and quantized values, improving memory efficiency and cache utilization for quantized transformer models.

Key Changes:

Added getHalf() and setHalf() methods to ByteArray for reading/writing HalfFloat values at byte-aligned offsets
Implemented new Q8_0 ByteArray kernel (matrixVectorGenericQ8Byte) that stores scales and quantized values in a single contiguous array
Extended the quantizeWeightsToQ8() method to populate both the existing vectorized format and the new unified ByteArray format

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 13 comments.

File	Description
`tornado-api/src/main/java/uk/ac/manchester/tornado/api/types/arrays/ByteArray.java`	Adds HalfFloat support with `getHalf()`/`setHalf()` methods including 2-byte alignment validation and proper memory segment indexing
`tornado-examples/src/main/java/uk/ac/manchester/tornado/examples/compute/MatrixVectorRowMajor.java`	Implements Q8_0 ByteArray kernels, updates quantization function to support both formats, adds benchmark setup and validation for the new approach

stratika · 2025-12-05T10:29:53Z

I tested the PR on macOS and Linux (OpenCL, PTX) and it works. Can we add a unit test that evaluates the new set/get methods?

orionpapadakis · 2025-12-05T11:31:31Z

I added some unit tests, which pass on OpenCL, PTX and SPIR-V.

To reproduce:

tornado-test -V uk.ac.manchester.tornado.unittests.api.TestByteArrayTypedAccess

orionpapadakis added 3 commits December 4, 2025 20:27

Add setHalf and getHalf methods to ByteArray for half-float support and alignments

1613fa7

Add unified Q8 ByteArray implementation for matrix-vector quantization and benchmarks

3a97ae5

Add JAVA_SHORT import and HalfFloat type association in ByteArray

7812fc4

mikepapadim requested review from Copilot, mairooni and mikepapadim and removed request for Copilot and mairooni December 4, 2025 19:45

mikepapadim requested review from Copilot and stratika December 4, 2025 19:45

mikepapadim self-assigned this Dec 4, 2025

Copilot started reviewing on behalf of mikepapadim December 4, 2025 19:45

mikepapadim reviewed Dec 4, 2025

tornado-api/src/main/java/uk/ac/manchester/tornado/api/types/arrays/ByteArray.java (outdated)

mikepapadim reviewed Dec 4, 2025

tornado-api/src/main/java/uk/ac/manchester/tornado/api/types/arrays/ByteArray.java (outdated)

mikepapadim reviewed Dec 4, 2025

Copilot finished reviewing on behalf of mikepapadim December 4, 2025 19:48

Copilot AI reviewed Dec 4, 2025

orionpapadakis added 3 commits December 5, 2025 10:47

Cleanup

af03a86

Add javadoc

615bb9b

Rename setHalf to setHalfFloat in ByteArray for consistency.

70ab86b

stratika added the enhancement New feature or request label Dec 5, 2025

Rename getHalf to getHalfFloat in ByteArray for consistency.

a7f9564

stratika reviewed Dec 5, 2025

...o-examples/src/main/java/uk/ac/manchester/tornado/examples/compute/MatrixVectorRowMajor.java (outdated)

tornado-api/src/main/java/uk/ac/manchester/tornado/api/types/arrays/ByteArray.java (outdated)

orionpapadakis added 4 commits December 5, 2025 13:04

Add unit tests for HalfFloat typed operations in ByteArray

eb7999e

Unfold imported packages

41917f1

Add license header to TestByteArrayTypedAccess

6cf592c

Register TestByteArrayTypedAccess to test suite

cb7bf28

stratika reviewed Dec 5, 2025

...unittests/src/main/java/uk/ac/manchester/tornado/unittests/api/TestByteArrayTypedAccess.java (outdated)

mikepapadim approved these changes Dec 5, 2025


Update tornado-unittests/src/main/java/uk/ac/manchester/tornado/unittests/api/TestByteArrayTypedAccess.java

c52f3b8

stratika approved these changes Dec 5, 2025


mikepapadim merged commit f3e6694 into beehive-lab:develop Dec 5, 2025
5 checks passed
