MXFP4 Quantization

MEDIUM

Quantize an input FP32 matrix into MXFP4 (Microscaling FP4) using TorchAO's MXTensor reference path.

The quantization contract uses:

  • Block size of 32 elements along the K dimension.
  • Per-block E8M0 scales.
  • FP4 E2M1 data bytes.
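
The three items above can be illustrated with a small host-side C++ sketch. It is not the TorchAO implementation. It assumes a floor-based scale rule (the block scale is the power of two given by the exponent of the block's maximum absolute value, minus 2, the largest E2M1 exponent), and it assumes round-to-nearest with ties to even and saturation at 6.0 when mapping a scaled value to E2M1. The function names are hypothetical, and the exact rounding should be checked against the reference.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Biased E8M0 scale byte for one 32-element block, given the block's max |x|.
// Assumption: floor-based rule, scale = 2^(floor(log2(amax)) - 2), where 2 is
// the largest E2M1 exponent (6.0 = 1.5 * 2^2).
inline std::uint8_t e8m0_scale_floor(float amax) {
    if (std::isnan(amax)) return 0xFF;       // E8M0 reserves 255 for NaN
    std::uint32_t bits;
    std::memcpy(&bits, &amax, sizeof bits);  // read the fp32 bit pattern
    int exp = static_cast<int>((bits >> 23) & 0xFF) - 127;  // unbiased fp32 exponent
    int unbiased = exp - 2;
    if (unbiased < -127) unbiased = -127;    // clamp to the E8M0 range
    if (unbiased > 127) unbiased = 127;
    return static_cast<std::uint8_t>(unbiased + 127);
}

// 4-bit E2M1 code (sign bit 3, exponent bits 2..1, mantissa bit 0) for an
// already-scaled value. Assumption: round to nearest, ties to even, saturating.
// NaN inputs fall through to code 7; NaN handling is left out of this sketch.
inline std::uint8_t encode_e2m1(float x) {
    const float a = std::fabs(x);
    std::uint8_t mag;
    if      (a <= 0.25f) mag = 0;  // 0.0
    else if (a <  0.75f) mag = 1;  // 0.5
    else if (a <= 1.25f) mag = 2;  // 1.0
    else if (a <  1.75f) mag = 3;  // 1.5
    else if (a <= 2.5f)  mag = 4;  // 2.0
    else if (a <  3.5f)  mag = 5;  // 3.0
    else if (a <= 5.0f)  mag = 6;  // 4.0
    else                 mag = 7;  // 6.0 (saturating)
    return static_cast<std::uint8_t>((std::signbit(x) ? 0x8 : 0x0) | mag);
}

// Divide by the block scale 2^(scale_byte - 127), then encode.
inline std::uint8_t quantize_element(float x, std::uint8_t scale_byte) {
    return encode_e2m1(std::ldexp(x, 127 - static_cast<int>(scale_byte)));
}
```

In a kernel, the same steps would typically run once per block: reduce the 32 values to their maximum absolute value, derive the scale byte, then encode each element against that scale.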

For more information regarding the MXFP4 format, check out the MXFP4 specification.

Input

  • a: fp32 pointer to a row-major tensor of shape M × K
  • M, K: dimensions of a (with K divisible by 32)

Output

  • q: uint8 pointer, MXFP4 payload bytes (packed E2M1 values) of shape M × K/2
  • scale: uint8 pointer, per-block E8M0 scale bytes in row-major layout of shape M × K/32
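
The output shapes imply a simple mapping from an input element at (row, col) to its payload byte and scale byte, sketched below in host-side C++. The helper names are hypothetical. The nibble order in `pack_pair` (even column in the low nibble, odd column in the high nibble) is an assumption that this page does not state, so it should be confirmed against the TorchAO reference before relying on it.

```cpp
#include <cstddef>
#include <cstdint>

// Index of the payload byte holding element (row, col); q has shape M x K/2.
inline std::size_t q_index(std::size_t row, std::size_t col, std::size_t K) {
    return row * (K / 2) + col / 2;
}

// Index of the scale byte for the 32-wide block containing (row, col);
// scale has shape M x K/32.
inline std::size_t scale_index(std::size_t row, std::size_t col, std::size_t K) {
    return row * (K / 32) + col / 32;
}

// Pack two 4-bit E2M1 codes into one byte.
// ASSUMPTION: even column goes in the low nibble, odd column in the high nibble.
inline std::uint8_t pack_pair(std::uint8_t even_code, std::uint8_t odd_code) {
    return static_cast<std::uint8_t>((even_code & 0x0F) | ((odd_code & 0x0F) << 4));
}
```

For example, with K = 64 the element at row 1, column 2 lands in payload byte 33, and the element at row 1, column 40 uses scale byte 3.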

Notes

  • The required layout is row-major blocked order (no additional swizzle).
  • Verification dequantizes both reference and submitted outputs via TorchAO MXTensor dequantization and checks closeness.
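
Because verification compares dequantized values, it can help to dequantize locally when debugging. The sketch below is a minimal host-side C++ version, not TorchAO's dequantization code: it decodes a 4-bit E2M1 code using the format's eight magnitudes and multiplies by 2^(scale_byte - 127). The function names are hypothetical, and the NaN scale byte (255) is not handled.

```cpp
#include <cmath>
#include <cstdint>

// Decode a 4-bit E2M1 code: bit 3 is the sign, bits 2..0 select the magnitude.
inline float decode_e2m1(std::uint8_t code) {
    static const float kMagnitudes[8] = {0.0f, 0.5f, 1.0f, 1.5f,
                                         2.0f, 3.0f, 4.0f, 6.0f};
    const float mag = kMagnitudes[code & 0x7];
    return (code & 0x8) ? -mag : mag;
}

// Reconstruct one value from its code and its block's biased E8M0 scale byte.
// The scale byte 255 (NaN in E8M0) is not treated specially in this sketch.
inline float dequantize(std::uint8_t code, std::uint8_t scale_byte) {
    return std::ldexp(decode_e2m1(code), static_cast<int>(scale_byte) - 127);
}
```

A round trip through quantization and this function gives a quick local estimate of the error a submission would show under a closeness check.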

Test Case Sizes

  • 1024 x 1024
  • 2048 x 2048
  • 4096 x 8192
  • 8192 x 4096

CUDA C++ environment
