name: onnx-tensor-math description: "Reference guide and workflow for highly optimized image-to-tensor preprocessing for ONNX models (specifically RT-DETR) in .NET."
onnx-tensor-math Skill
Context
Slice 7 requires extracting a PDF page as an image and converting it into a normalized [1, 3, H, W] float32 tensor for ONNX inference (RT-DETR). Image processing in .NET can be a massive performance bottleneck if done naively (e.g., GetPixel()). This skill defines the strict, high-performance workflow for creating these tensors.
Trigger
Use this skill when implementing the apply_layout_inference pipeline stage, specifically when building the image extraction and tensor normalization logic for OnnxLayoutProvider.
The RT-DETR Preprocessing Specification
According to the upstream preprocessor_config.json for docling-layout-heron-onnx:
- Target Size:
H = 640,W = 640 - Format: RGB (Channels First:
[Batch, Channel, Height, Width]) - Rescale Factor:
1/255.0(Scale pixel values from[0, 255]to[0.0, 1.0]) - Mean (RGB):
[0.485, 0.456, 0.406] - Std (RGB):
[0.229, 0.224, 0.225] - Padding: None (
do_pad = false) - Formula:
tensor_val = ((pixel_val / 255.0) - mean) / std
Workflow
1. Choose the Image Library
Do NOT use System.Drawing.Common (it is Windows-only and slow).
DO use SkiaSharp. It is cross-platform, fast, and provides direct memory access.
2. The Vectorized Extraction Pattern
To convert a SKBitmap to a DenseTensor<float> efficiently, you must use unsafe pointers to bypass bounds checking and object allocation overhead.
// Example high-performance pattern
using SkiaSharp;
using Microsoft.ML.OnnxRuntime.Tensors;
public static DenseTensor<float> CreateNormalizedTensor(SKBitmap bitmap)
{
// 1. Resize to target (640x640)
using var resized = bitmap.Resize(new SKImageInfo(640, 640), SKFilterQuality.Medium);
// Ensure format is exactly 8888 (RGBA or BGRA)
// ...
var tensor = new DenseTensor<float>(new[] { 1, 3, 640, 640 });
// Pre-calculate constants
const float scale = 1f / 255f;
float meanR = 0.485f, meanG = 0.456f, meanB = 0.406f;
float stdR = 0.229f, stdG = 0.224f, stdB = 0.225f;
int width = 640;
int height = 640;
int channelStride = width * height;
unsafe
{
// 2. Get direct pointer to pixel memory
byte* srcPtr = (byte*)resized.GetPixels().ToPointer();
// 3. Get direct pointer to tensor memory (using Span/MemoryMarshal or indexing)
// ... (Implementation specific)
for (int y = 0; y < height; y++)
{
for (int x = 0; x < width; x++)
{
// Calculate source index (assuming 4 bytes per pixel: R, G, B, A)
int srcIdx = (y * width + x) * 4;
// Extract R, G, B (Handling BGRA vs RGBA depending on Skia platform defaults!)
byte r = srcPtr[srcIdx]; // (Or srcIdx+2 if BGRA)
byte g = srcPtr[srcIdx + 1];
byte b = srcPtr[srcIdx + 2]; // (Or srcIdx if BGRA)
// Normalize and assign to contiguous channel planes
int destIdxR = 0 * channelStride + (y * width + x);
int destIdxG = 1 * channelStride + (y * width + x);
int destIdxB = 2 * channelStride + (y * width + x);
// Assuming tensor is backed by a 1D array we can index into
// tensorBuffer[destIdxR] = ((r * scale) - meanR) / stdR;
}
}
}
return tensor;
}
3. Critical Edge Cases to Validate
- Color Type (BGRA vs RGBA):
SkiaSharpdefaults toSKColorType.Bgra8888on Windows butRgba8888on Android/Linux. Always explicitly checkbitmap.ColorTypeor convert it to a known type before blindly indexing[srcIdx + 2]. - Memory Leaks: Always
Dispose()or useusingstatements forSKBitmap,SKImage, andSKData. - Bounding Box Descaling: The ONNX model will output bounding boxes relative to the
[640, 640]space. You must multiply these coordinates by(OriginalWidth / 640.0)and(OriginalHeight / 640.0)to map them back to the PDF's native coordinate system before passing them to theLayoutPostprocessor.