@orionpapadakis
Contributor

@orionpapadakis orionpapadakis commented Dec 4, 2025

Description

This patch adds HalfFloat support to the ByteArray class and adds Q8_0 matrix-vector multiplication kernels to the MatrixVectorRowMajor example. The kernels represent Q8_0 data as a single unified ByteArray. The new ByteArray methods make quantized transformer models easier to handle in TornadoVM: weights load more efficiently, and inference is faster and simpler.

Key Features:
  1. ByteArray HalfFloat Support: Added getHalfFloat() and setHalfFloat() methods to ByteArray for efficient HalfFloat data access with proper TornadoVM header offset handling.
  2. Q8_0 Quantized Matrix-Vector Kernels: New GPU kernels for Q8_0 quantized matrix-vector multiplication with a unified ByteArray memory layout.
  3. Performance Improvement: The new kernels perform slightly better (1.05x-1.11x) than the previous approach on a ROG laptop.

Problem description

The existing ByteArray class lacked support for HalfFloat operations, which made it difficult to efficiently process quantized model weights stored in mixed-precision formats such as Q8_0, which uses HalfFloat for scales and Int8 for quantized values.

Technical Details:

  • Q8_0 format stores one 2-byte HalfFloat scale plus 32 one-byte quant values per 32-element block (34 bytes per block).
  • The previous approach required separate arrays for scales and quants.
  • Unified ByteArray approach reduces memory overhead and improves cache efficiency.
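
The byte arithmetic behind the unified layout can be sketched in plain Java. This is only an illustration of the block layout described in the list above, not code from the patch: the class and method names (`Q8BlockLayout`, `scaleOffset`, `quantOffset`, `unifiedBytes`) are hypothetical, the scale is assumed to sit first in each block, and offsets are relative to the start of the payload, so any TornadoVM array header is deliberately ignored.

```java
// Hypothetical helper illustrating Q8_0 offsets inside one unified byte array.
// Assumption: each block is [2-byte HalfFloat scale][32 x int8 quants].
final class Q8BlockLayout {
    static final int BLOCK_ELEMS = 32;                         // elements per Q8_0 block
    static final int SCALE_BYTES = 2;                          // one HalfFloat scale
    static final int BLOCK_BYTES = SCALE_BYTES + BLOCK_ELEMS;  // 34 bytes per block

    // Byte offset of the scale of the given block (payload-relative).
    static int scaleOffset(int block) {
        return block * BLOCK_BYTES;
    }

    // Byte offset of the i-th quant value (0..31) of the given block.
    static int quantOffset(int block, int i) {
        return block * BLOCK_BYTES + SCALE_BYTES + i;
    }

    // Total payload bytes for a tensor whose element count is a multiple of 32.
    static int unifiedBytes(int elements) {
        return (elements / BLOCK_ELEMS) * BLOCK_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(scaleOffset(1));     // 34
        System.out.println(quantOffset(1, 0));  // 36
        System.out.println(unifiedBytes(4096)); // 4352
    }
}
```

Because the block size (34) is even, every scale offset computed this way is 2-byte aligned relative to the payload start.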

Generated Native Code Verification

The new getHalfFloat() and setHalfFloat() methods correctly generate native half-precision loads for scales while maintaining efficient byte loads for quantized values:

OpenCL Kernel Snippet:
// HalfFloat scale loading - generates native half load
half_29  =  *((__global half *) ul_28);

// Quantized byte values - generates char loads  
ch_35  =  *((__global char *) ul_34);
ch_39  =  *((__global char *) ul_38);
ch_43  =  *((__global char *) ul_42);
ch_47  =  *((__global char *) ul_46);

PTX Kernel Snippet:
// HalfFloat scale loading - generates native b16 load
ld.global.b16	rfh0, [rud4];

// Quantized byte values - generates s8 loads
ld.global.s8	rsb0, [rud5];
ld.global.s8	rsb1, [rud6];
ld.global.s8	rsb2, [rud7];
ld.global.s8	rsb3, [rud8];
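
For readers who want a host-side reference for what these loads compute, here is a minimal plain-Java sketch of one row of a Q8_0 dot product over a unified byte array: one 16-bit read for the scale, then 32 signed byte reads for the quants. It is not the kernel from this patch. The names `Q8UnifiedReference`, `halfToFloat`, `rowDot`, and `demoRow` are hypothetical, the layout is assumed to be a little-endian 2-byte scale followed by 32 quants, and the half-to-float decoding is written out by hand so the sketch does not depend on any TornadoVM type.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical CPU reference for a Q8_0 row stored as [half scale][32 x int8] blocks.
final class Q8UnifiedReference {
    static final int BLOCK_ELEMS = 32;
    static final int BLOCK_BYTES = 2 + BLOCK_ELEMS;

    // Decode IEEE 754 binary16 bits into a float.
    static float halfToFloat(short h) {
        int bits = h & 0xFFFF;
        int sign = (bits & 0x8000) << 16;
        int exp = (bits >>> 10) & 0x1F;
        int mant = bits & 0x3FF;
        if (exp == 0) {                       // zero or subnormal: mant * 2^-24
            float v = mant * (1.0f / (1 << 24));
            return sign != 0 ? -v : v;
        }
        if (exp == 31) {                      // infinity or NaN
            return Float.intBitsToFloat(sign | 0x7F800000 | (mant << 13));
        }
        return Float.intBitsToFloat(sign | ((exp + 112) << 23) | (mant << 13));
    }

    // Dot product of one quantized row (starting at rowByteOffset) with x.
    static float rowDot(byte[] q8, int rowByteOffset, float[] x) {
        float sum = 0f;
        int blocks = x.length / BLOCK_ELEMS;
        for (int b = 0; b < blocks; b++) {
            int base = rowByteOffset + b * BLOCK_BYTES;
            // One 16-bit scale read (little-endian assumed).
            short scaleBits = (short) ((q8[base] & 0xFF) | ((q8[base + 1] & 0xFF) << 8));
            float scale = halfToFloat(scaleBits);
            float partial = 0f;
            for (int i = 0; i < BLOCK_ELEMS; i++) {
                // Java bytes are signed, matching the signed 8-bit quant loads.
                partial += q8[base + 2 + i] * x[b * BLOCK_ELEMS + i];
            }
            sum += scale * partial;
        }
        return sum;
    }

    // One block with scale 0.5 (half bits 0x3800) and all quants equal to 2.
    static byte[] demoRow() {
        ByteBuffer buf = ByteBuffer.allocate(BLOCK_BYTES).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) 0x3800);
        for (int i = 0; i < BLOCK_ELEMS; i++) {
            buf.put((byte) 2);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        float[] x = new float[BLOCK_ELEMS];
        java.util.Arrays.fill(x, 1.0f);
        System.out.println(rowDot(demoRow(), 0, x)); // 32.0
    }
}
```

A reference like this can be useful for validating GPU results, since it applies the scale once per block rather than once per element.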

Backend/s tested

Mark the backends affected by this PR.

  • OpenCL
  • PTX
  • SPIRV

OS tested

Mark the OS where this PR is tested.

  • Linux
  • OSx
  • Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

  • Yes
  • No

How to test the new patch?

  1. Run MatrixVectorRowMajor Benchmark:
tornado -m tornado.examples/uk.ac.manchester.tornado.examples.compute.MatrixVectorRowMajor

Expected output should show:

  • Validation PASSED ✓
  • Q8 ByteArray performance > Q8 Vectorized performance

@mikepapadim
Member

\rerun help

@mikepapadim
Member

/rerun help

@github-actions

github-actions bot commented Dec 4, 2025

🔄 Rerun Workflow Commands

Command          Description
/rerun           Rerun only failed/cancelled/timed-out workflows
/rerun all       Rerun all workflows for this PR
/rerun failed    Same as /rerun
/rerun <name>    Rerun workflows matching <name> (e.g. /rerun ci, /rerun build)
/rerun help      Show this help message

Note: Only completed workflows can be rerun. In-progress workflows are skipped.

@mikepapadim
Member

/rerun

@mikepapadim mikepapadim requested review from Copilot, mairooni and mikepapadim and removed request for Copilot and mairooni December 4, 2025 19:45
@github-actions

github-actions bot commented Dec 4, 2025

🚀 Workflow rerun started

Mode: failed
Triggered by: @mikepapadim

View Actions

@github-actions

github-actions bot commented Dec 4, 2025

Workflow rerun success

View Actions

Member

@mikepapadim mikepapadim left a comment

Please also add a unit test, in addition to the example, so that this is tested in CI.

Copilot finished reviewing on behalf of mikepapadim December 4, 2025 19:48
Contributor

Copilot AI left a comment

Pull request overview

This PR enhances TornadoVM's quantization support by adding HalfFloat operations to the ByteArray class and implementing Q8_0 matrix-vector multiplication kernels that use a unified ByteArray memory layout. This eliminates the need for separate arrays for scales and quantized values, improving memory efficiency and cache utilization for quantized transformer models.

Key Changes:

  • Added getHalf() and setHalf() methods to ByteArray for reading/writing HalfFloat values at byte-aligned offsets
  • Implemented new Q8_0 ByteArray kernel (matrixVectorGenericQ8Byte) that stores scales and quantized values in a single contiguous array
  • Extended the quantizeWeightsToQ8() method to populate both the existing vectorized format and the new unified ByteArray format

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 13 comments.

File and description:
  • tornado-api/src/main/java/uk/ac/manchester/tornado/api/types/arrays/ByteArray.java: Adds HalfFloat support with getHalf()/setHalf() methods, including 2-byte alignment validation and proper memory segment indexing
  • tornado-examples/src/main/java/uk/ac/manchester/tornado/examples/compute/MatrixVectorRowMajor.java: Implements Q8_0 ByteArray kernels, updates the quantization function to support both formats, and adds benchmark setup and validation for the new approach

@stratika stratika added the enhancement New feature or request label Dec 5, 2025
@stratika
Collaborator

stratika commented Dec 5, 2025

I tested the PR on macOS and Linux (OpenCL, PTX) and it works. Can we add a unit test that evaluates the new set/get methods?

@orionpapadakis
Contributor Author

I added some unit tests, which pass on all three backends: OpenCL, PTX, and SPIR-V.

To reproduce:

tornado-test -V uk.ac.manchester.tornado.unittests.api.TestByteArrayTypedAccess

@mikepapadim mikepapadim merged commit f3e6694 into beehive-lab:develop Dec 5, 2025
5 checks passed