apache-arrow-js

star 0

This skill should be used when working with Apache Arrow in JavaScript/TypeScript — Table, Vector, Schema, IPC serialization, columnar data, RecordBatch, Flight SQL, Flechette, Arquero, Mosaic, DuckDB-WASM, or imports from 'apache-arrow', '@uwdata/flechette', '@uwdata/vgplot', 'arquero'. Covers zero-copy patterns, typed arrays, builder API, Arrow Flight, and the Arrow JS ecosystem.

AI-Riksarkivet By AI-Riksarkivet schedule Updated 3/24/2026

name: Apache Arrow JS description: This skill should be used when working with Apache Arrow in JavaScript/TypeScript — Table, Vector, Schema, IPC serialization, columnar data, RecordBatch, Flight SQL, Flechette, Arquero, Mosaic, DuckDB-WASM, or imports from 'apache-arrow', '@uwdata/flechette', '@uwdata/vgplot', 'arquero'. Covers zero-copy patterns, typed arrays, builder API, Arrow Flight, and the Arrow JS ecosystem. user-invocable: true context: current

Apache Arrow JS (v21+)

Columnar in-memory data format for high-performance analytics in JavaScript/TypeScript.

Core Concepts

Arrow stores data in columns (Vectors), not rows. A Table is a collection of named Vectors sharing a Schema. Data is stored as flat TypedArrays — no per-row object allocation, zero-copy slicing, and direct GPU/WASM interop.

Table (Schema + RecordBatches)
  ├── Vector<Int32>     "id"      → Int32Array
  ├── Vector<Utf8>      "name"    → Uint8Array (UTF-8 bytes) + offsets
  └── Vector<Float32>   "score"   → Float32Array

Creating Tables

From JS arrays (copies data, infers types)

import { tableFromArrays } from "apache-arrow"

const table = tableFromArrays({
  id: [1, 2, 3],
  name: ["Alice", "Bob", "Carol"],
  score: new Float32Array([0.9, 0.7, 0.85]),
})

From typed arrays (zero-copy)

import { makeTable } from "apache-arrow"

const table = makeTable({
  x: new Float32Array([10, 20, 30]),
  y: new Float32Array([40, 50, 60]),
})

From JSON objects

import { tableFromJSON } from "apache-arrow"

const table = tableFromJSON([
  { a: "foo", b: 42 },
  { a: "bar", b: 12 },
])

From IPC binary (deserialization)

import { tableFromIPC } from "apache-arrow"

// From Uint8Array
const table = tableFromIPC(bytes)

// From fetch Response (async)
const table = await tableFromIPC(fetch("/data.arrow"))

// From multiple IPC chunks
const table = tableFromIPC([schemaBytes, recordBytes])

With explicit Schema

import { Table, Schema, Field, Float32, Utf8, tableFromArrays } from "apache-arrow"

const schema = new Schema([
  new Field("x", new Float32()),
  new Field("label", new Utf8()),
])

// Schema can carry metadata (Map<string, string>)
const schemaWithMeta = new Schema(fields, new Map([
  ["image_base64", base64String],
  ["page_id", "page-001"],
]))

Serializing Tables

import { tableToIPC } from "apache-arrow"

// Stream format (default — for network transfer)
const bytes: Uint8Array = tableToIPC(table, "stream")

// File format (for storage — has random access footer)
const bytes: Uint8Array = tableToIPC(table, "file")

Reading Tables

Column access

const col = table.getChild("name")       // Vector<Utf8> | null
const col2 = table.getChildAt(0)         // Vector by index
const arr = col.toArray()                // TypedArray (zero-copy for primitives)

Row access

const row = table.get(0)                 // StructRowProxy (object-like)
const row2 = table.at(-1)               // Negative index support
console.log(row.name, row.score)        // Property access

// Iterate all rows
for (const row of table) {
  console.log(row.id, row.name)
}

Table properties

table.numRows     // number of rows
table.numCols     // number of columns
table.schema      // Schema with fields + metadata
table.batches     // RecordBatch[] chunks
table.nullCount   // total nulls across all columns

Slicing & selecting (zero-copy)

const subset = table.slice(10, 20)              // rows 10-19
const projected = table.select(["name", "score"]) // column subset
const combined = table.concat(otherTable)        // append rows
const merged = table.assign(extraColumns)        // add columns

Modifying columns

// Returns NEW table (immutable)
const updated = table.setChild("name", newNameVector)
const replaced = table.setChildAt(0, newVector)

Vectors

Creating Vectors

import { makeVector, vectorFromArray } from "apache-arrow"

// Zero-copy from TypedArray
const v = makeVector(new Float32Array([1, 2, 3]))

// From JS array (copies, infers type)
const nums = vectorFromArray([1, 2, 3])           // Float64
const ints = vectorFromArray([1, 2, 3], new Int8)  // Int8
const strs = vectorFromArray(["a", "b", "c"])      // Dictionary<Utf8, Int32>
const bools = vectorFromArray([true, false])       // Bool
const dates = vectorFromArray([new Date()])        // TimestampMillisecond
const structs = vectorFromArray([{x: 1}, {x: 2}]) // Struct

Vector API

vector.get(0)           // T["TValue"] | null
vector.at(-1)           // negative index
vector.set(0, value)    // mutate in place
vector.isValid(0)       // null check
vector.length           // element count
vector.nullCount        // null count
vector.type             // DataType instance
vector.toArray()        // TypedArray or JS Array
vector.slice(0, 10)     // zero-copy sub-vector
vector.indexOf(value)   // find index
vector.includes(value)  // containment check

// Memoize decoded values (useful for Utf8 — caches string decode)
const memo = vector.memoize()
const original = memo.unmemoize()

Dictionary Vectors (string encoding)

import { makeVector, vectorFromArray, Dictionary, Utf8, Uint8 } from "apache-arrow"

// Auto-dictionary from string array
const dict = vectorFromArray(["foo", "bar", "foo", "baz"])

// Manual dictionary construction
const keys = vectorFromArray(["foo", "bar", "baz"], new Utf8)
const dictVec = makeVector({
  data: [0, 1, 2, 0, 1],  // indices into dictionary
  dictionary: keys,
  type: new Dictionary(new Utf8, new Uint8),
})

Builder API

For constructing Vectors incrementally:

import { makeBuilder, Utf8 } from "apache-arrow"

const builder = makeBuilder({
  type: new Utf8(),
  nullValues: [null, "n/a"],  // values treated as null
})

builder.append("hello")
builder.append("n/a")    // becomes null
builder.append("world")
builder.append(null)      // null

const vector = builder.finish().toVector()
// ["hello", null, "world", null]

Chunked building with iterables

import { builderThroughIterable, Float64 } from "apache-arrow"

const through = builderThroughIterable({
  type: new Float64(),
  highWaterMark: 1000,
  queueingStrategy: "count",
})

// Feed values, get chunked Vectors
for (const vector of through(valueIterator)) {
  // each vector has up to 1000 elements
}

Schema & Metadata

import { Schema, Field, Float32, Utf8 } from "apache-arrow"

// Schema with metadata
const schema = new Schema(
  [new Field("x", new Float32()), new Field("label", new Utf8())],
  new Map([["version", "1"], ["author", "system"]])
)

// Read metadata
const meta = table.schema.metadata  // Map<string, string>
const version = meta.get("version")

// Preserve metadata when rebuilding tables
const newSchema = new Schema(newFields, existingTable.schema.metadata)
const newTable = new Table(newSchema, newBatches)

RecordBatch Reader/Writer

Reading IPC streams

import { RecordBatchReader } from "apache-arrow"

// From various sources
const reader = await RecordBatchReader.from(uint8Array)
const reader = await RecordBatchReader.from(fetch("/data.arrow"))
const reader = await RecordBatchReader.from(readableStream)

for (const batch of reader) {
  console.log(batch.numRows, batch.schema)
}

Writing IPC streams

import { RecordBatchWriter } from "apache-arrow"

const writer = new RecordBatchWriter()
writer.write(table)
writer.finish()

// Get bytes
const bytes = writer.toUint8Array(true)  // sync

// Stream to DOM ReadableStream
const stream = writer.toDOMStream()

Data Types Reference

Arrow Type JS Type TypedArray
Int8/16/32 number Int8/16/32Array
Uint8/16/32 number Uint8/16/32Array
Int64/Uint64 bigint BigInt64/BigUint64Array
Float16/32/64 number Uint16/Float32/Float64Array
Bool boolean Uint8Array (bit-packed)
Utf8 string Uint8Array + offsets
Binary Uint8Array Uint8Array + offsets
List<T> T[] nested Vector
Struct<{...}> StructRowProxy nested Vectors
Dictionary<V,K> V["TValue"] key array + dictionary Vector
Timestamp* number Int32Array (pair)
Date* number Int32Array
Null null (no buffer)

Type guards

import { DataType } from "apache-arrow"

DataType.isFloat(field.type)   // true for Float16/32/64
DataType.isInt(field.type)     // true for all Int types
DataType.isUtf8(field.type)
DataType.isList(field.type)
DataType.isStruct(field.type)
DataType.isDictionary(field.type)
// ... etc for all types

Performance Patterns

Zero-copy column access

// FAST: Direct typed array view (zero allocation for single-chunk primitives)
const xArr = table.getChild("x")!.toArray() as Float32Array

// SLOW: Row iteration (allocates proxy objects)
for (const row of table) { /* avoid in hot loops */ }

Avoid per-row allocation

// SLOW
const results = []
for (let i = 0; i < table.numRows; i++) {
  results.push(table.get(i))  // allocates StructRowProxy each time
}

// FAST: Column-oriented access
const xArr = table.getChild("x")!.toArray() as Float32Array
const yArr = table.getChild("y")!.toArray() as Float32Array
for (let i = 0; i < table.numRows; i++) {
  process(xArr[i], yArr[i])  // no allocation
}

Memoize string decoding

// Utf8 Vector decodes UTF-8 bytes on every .get() call
const col = table.getChild("text")!
const memo = col.memoize()  // cache decoded strings
for (let i = 0; i < memo.length; i++) {
  memo.get(i)  // cached after first access
}

Rebuild table without full copy

// Build column arrays, then tableFromArrays (single rebuild)
const cols: Record<string, unknown[]> = {}
for (const field of table.schema.fields) {
  const col = table.getChild(field.name)!
  const arr = new Array(table.numRows)
  for (let i = 0; i < table.numRows; i++) arr[i] = col.get(i)
  cols[field.name] = arr
}
const rebuilt = tableFromArrays(cols)

// Preserve schema metadata
const withMeta = new Table(
  new Schema(rebuilt.schema.fields, originalTable.schema.metadata),
  rebuilt.batches,
)

Reference Documents

  • ./references/zero-copy.md — What is/isn't zero-copy, performance decision tree, golden rules
  • ./references/reading-writing.md — Column access, row access, rebuilding tables, add/delete/update rows, streaming
  • ./references/types.md — Type hierarchy, constructors, type guards, Field/Schema, vectorFromArray inference
  • ./references/anti-patterns.md — 10 common mistakes (row iteration, metadata loss, List gotchas, etc.)
  • ./references/ipc-patterns.md — IPC streaming, chunked loading, HTTP transfer, delta updates
  • ./references/flight-sql.md — Flight SQL client, gRPC, query execution
  • ./references/flechette.md — Flechette: faster/smaller Arrow alternative, useProxy, toColumns, compression
  • ./references/arquero.md — Arquero: dplyr-style data wrangling on Arrow (derive, filter, join, rollup)
  • ./references/mosaic.md — Mosaic: scalable interactive viz framework (Coordinator, vgplot, SQL, cross-filtering)
  • ./references/duckdb-wasm.md — DuckDB-WASM + Arrow in SvelteKit (SQL on Arrow in browser)
Install via CLI
npx skills add https://github.com/AI-Riksarkivet/ra-anno --skill apache-arrow-js
Repository Details
star Stars 0
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator
AI-Riksarkivet
AI-Riksarkivet Explore all skills →