Support to express Q8_0 tensors as Tornado ByteArray #754
mikepapadim
left a comment
Also add a unit test, in addition to the example, so that this is tested in CI.
Pull request overview
This PR enhances TornadoVM's quantization support by adding HalfFloat operations to the ByteArray class and implementing Q8_0 matrix-vector multiplication kernels that use a unified ByteArray memory layout. This eliminates the need for separate arrays for scales and quantized values, improving memory efficiency and cache utilization for quantized transformer models.
Key Changes:
- Added `getHalf()` and `setHalf()` methods to `ByteArray` for reading/writing HalfFloat values at byte-aligned offsets
- Implemented new Q8_0 ByteArray kernel (`matrixVectorGenericQ8Byte`) that stores scales and quantized values in a single contiguous array
- Extended the `quantizeWeightsToQ8()` method to populate both the existing vectorized format and the new unified ByteArray format
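To make the unified layout concrete, here is a minimal plain-Java sketch, not the PR's kernel. It assumes each 32-element block is stored as a 2-byte half-precision scale followed by 32 signed bytes, in little-endian order. The class `Q8UnifiedLayoutSketch`, its methods, and the simplified truncating half conversion are all hypothetical and use `java.nio.ByteBuffer` in place of TornadoVM's `ByteArray`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration; not TornadoVM code and not the PR's kernel.
class Q8UnifiedLayoutSketch {
    static final int BLOCK = 32;               // elements per block
    static final int BLOCK_BYTES = 2 + BLOCK;  // assumed: 2-byte scale + 32 quants

    // Simplified float -> binary16: truncates the mantissa, flushes tiny values to zero.
    static short toHalf(float f) {
        int bits = Float.floatToIntBits(f);
        int sign = (bits >>> 16) & 0x8000;
        int exp = ((bits >>> 23) & 0xFF) - 127 + 15;
        int mant = (bits >>> 13) & 0x3FF;
        if (exp <= 0) return (short) sign;
        if (exp >= 31) return (short) (sign | 0x7C00); // overflow (and NaN) become infinity
        return (short) (sign | (exp << 10) | mant);
    }

    // Simplified binary16 -> float: subnormals are treated as zero.
    static float fromHalf(short h) {
        int sign = (h & 0x8000) << 16;
        int exp = (h >>> 10) & 0x1F;
        int mant = h & 0x3FF;
        if (exp == 0) return Float.intBitsToFloat(sign);
        if (exp == 31) return Float.intBitsToFloat(sign | 0x7F800000 | (mant << 13));
        return Float.intBitsToFloat(sign | ((exp - 15 + 127) << 23) | (mant << 13));
    }

    // Packs weights (length must be a multiple of 32) as [scale | 32 quants] blocks.
    static ByteBuffer quantize(float[] w) {
        int blocks = w.length / BLOCK;
        ByteBuffer out = ByteBuffer.allocate(blocks * BLOCK_BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int b = 0; b < blocks; b++) {
            float maxAbs = 0f;
            for (int i = 0; i < BLOCK; i++) {
                maxAbs = Math.max(maxAbs, Math.abs(w[b * BLOCK + i]));
            }
            int base = b * BLOCK_BYTES;
            short halfScale = toHalf(maxAbs / 127f);
            out.putShort(base, halfScale);
            float scale = fromHalf(halfScale); // quantize against the stored scale
            for (int i = 0; i < BLOCK; i++) {
                int q = scale == 0f ? 0 : Math.round(w[b * BLOCK + i] / scale);
                out.put(base + 2 + i, (byte) Math.max(-127, Math.min(127, q)));
            }
        }
        return out;
    }

    // y = W * x, reading each block's scale and quants from the same buffer.
    static float[] matVec(ByteBuffer q, float[] x, int rows, int cols) {
        float[] y = new float[rows];
        int blocksPerRow = cols / BLOCK;
        for (int r = 0; r < rows; r++) {
            float sum = 0f;
            for (int b = 0; b < blocksPerRow; b++) {
                int base = (r * blocksPerRow + b) * BLOCK_BYTES;
                float scale = fromHalf(q.getShort(base));
                float partial = 0f;
                for (int i = 0; i < BLOCK; i++) {
                    partial += q.get(base + 2 + i) * x[b * BLOCK + i];
                }
                sum += scale * partial;
            }
            y[r] = sum;
        }
        return y;
    }

    public static void main(String[] args) {
        float[] w = new float[2 * BLOCK];
        float[] x = new float[BLOCK];
        for (int i = 0; i < w.length; i++) w[i] = (i % 7) - 3;
        java.util.Arrays.fill(x, 0.5f);
        System.out.println(java.util.Arrays.toString(matVec(quantize(w), x, 2, BLOCK)));
    }
}
```

Keeping the scale next to its quants means one base offset per block addresses both, which is the property the single contiguous array is meant to provide.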
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| `tornado-api/src/main/java/uk/ac/manchester/tornado/api/types/arrays/ByteArray.java` | Adds HalfFloat support with `getHalf()`/`setHalf()` methods including 2-byte alignment validation and proper memory segment indexing |
| `tornado-examples/src/main/java/uk/ac/manchester/tornado/examples/compute/MatrixVectorRowMajor.java` | Implements Q8_0 ByteArray kernels, updates quantization function to support both formats, adds benchmark setup and validation for the new approach |
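The alignment validation and header-relative indexing mentioned for `ByteArray.java` can be illustrated with a small stand-in. This is a sketch under stated assumptions, not TornadoVM's implementation: the class `HalfAccessSketch`, the method names `getHalfBits`/`setHalfBits`, the 16-byte header size, and the little-endian byte order are all hypothetical. It stores the raw 16-bit pattern and leaves the conversion to and from `float` out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative stand-in only; this is NOT TornadoVM's ByteArray.
class HalfAccessSketch {
    // Hypothetical header size, chosen for illustration only.
    static final int HEADER_BYTES = 16;

    private final ByteBuffer storage;
    private final int numBytes;

    HalfAccessSketch(int numBytes) {
        this.numBytes = numBytes;
        // Byte order is an assumption of this sketch.
        this.storage = ByteBuffer.allocate(HEADER_BYTES + numBytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    private void checkByteIndex(int byteIndex, int width) {
        if (byteIndex < 0 || byteIndex + width > numBytes) {
            throw new IndexOutOfBoundsException("byte index " + byteIndex + " is out of range");
        }
    }

    private void checkHalfIndex(int byteIndex) {
        checkByteIndex(byteIndex, 2);
        if ((byteIndex & 1) != 0) {
            throw new IllegalArgumentException("byte index " + byteIndex + " is not 2-byte aligned");
        }
    }

    // Writes a 16-bit half-precision bit pattern at a data-relative byte index.
    void setHalfBits(int byteIndex, short bits) {
        checkHalfIndex(byteIndex);
        storage.putShort(HEADER_BYTES + byteIndex, bits);
    }

    // Reads a 16-bit half-precision bit pattern at a data-relative byte index.
    short getHalfBits(int byteIndex) {
        checkHalfIndex(byteIndex);
        return storage.getShort(HEADER_BYTES + byteIndex);
    }

    void setByte(int byteIndex, byte value) {
        checkByteIndex(byteIndex, 1);
        storage.put(HEADER_BYTES + byteIndex, value);
    }

    byte getByte(int byteIndex) {
        checkByteIndex(byteIndex, 1);
        return storage.get(HEADER_BYTES + byteIndex);
    }

    public static void main(String[] args) {
        HalfAccessSketch block = new HalfAccessSketch(34);
        block.setHalfBits(0, (short) 0x3C00); // 0x3C00 encodes 1.0 in IEEE 754 binary16
        block.setByte(2, (byte) -5);          // first quantized value after the scale
        System.out.printf("scale bits=0x%04X, first quant=%d%n",
                block.getHalfBits(0) & 0xFFFF, block.getByte(2));
    }
}
```

Callers pass data-relative byte indices and the accessor adds the header offset itself, so a 2-byte scale and the bytes that follow it can share one array without the caller tracking the header.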
I tested the PR in
I added some unit tests which are passing for both
To reproduce: `tornado-test -V uk.ac.manchester.tornado.unittests.api.TestByteArrayTypedAccess`
Description
This patch enhances the `ByteArray` class with support for `HalfFloat` operations and adds, to the `MatrixVectorRowMajor` example, Q8_0 matrix-vector multiplication kernels that express the Q8_0 type as a unified `ByteArray`. The new `ByteArray` methods improve the handling of quantized transformer models with TornadoVM (efficient loading, faster and simpler inference).

Key Features:
- `getHalfFloat()` and `setHalfFloat()` methods in `ByteArray` for efficient `HalfFloat` data access with proper TornadoVM header offset handling.
- Q8_0 matrix-vector multiplication kernels that use a unified `ByteArray` memory layout.

Problem description
The existing `ByteArray` class lacked support for `HalfFloat` operations, making it difficult to efficiently process quantized model weights that use mixed-precision attributes, such as the Q8_0 quantization format, which uses `HalfFloat` for scales and `Int8` for quants.

Technical Details:
- `HalfFloat` scales + 32-byte quant values per 32-element block.

Generated Native Code Verification
The new `getHalfFloat()` and `setHalfFloat()` methods correctly generate native half-precision loads for scales while maintaining efficient byte loads for quantized values:
OpenCL Kernel Snippet:
PTX Kernel Snippet:
Backend/s tested
Mark the backends affected by this PR.
OS tested
Mark the OS where this PR is tested.
Did you check on FPGAs?
If it is applicable, check your changes on FPGAs.
How to test the new patch?
Expected output should show: