matlab-optimize-performance


Read BEFORE optimizing any MATLAB code for speed. Without this workflow, agents commonly optimize the wrong target, fabricate speedup claims without measurement, or introduce regressions. Guides the 7-step workflow: baseline, profile, identify, optimize, measure, verify, report.

By matlab. Updated 6/4/2026

name: matlab-optimize-performance
description: "Read BEFORE optimizing any MATLAB code for speed. Without this workflow, agents commonly optimize the wrong target, fabricate speedup claims without measurement, or introduce regressions. Guides the 7-step workflow: baseline, profile, identify, optimize, measure, verify, report."
license: MathWorks BSD-3-Clause
metadata:
  author: MathWorks
  version: "1.0"

MATLAB Performance Optimization Workflow

Systematic 7-step workflow for finding and fixing performance bottlenecks in MATLAB code.

When to Use

  • User asks to speed up or optimize MATLAB code
  • User wants to find why their MATLAB code is slow
  • User has a function or script that takes too long to run
  • User asks to benchmark or time MATLAB code
  • User wants to compare performance before and after a change
  • User asks about MATLAB performance best practices

When NOT to Use

  • Optimizing Simulink model simulation speed (use Simulink Profiler)
  • The bottleneck is in compiled C/MEX code that can't be changed at the M-code level
  • The performance issue is purely I/O-bound (file reads, network, database)
  • User wants to write performance tests (use the writing-matlab-perf-tests skill)

The 7-Step Workflow

Step 1: Establish Baseline

Measure current performance so you have a number to improve against.

For a single function:

f = @() targetFunction(input1, input2);
baseline = timeit(f);
fprintf('Baseline: %.4f s\n', baseline);

For GPU code:

f = @() gpuFunction(gpuInput);
baseline = gputimeit(f);

For a script or multi-step workflow:

% Warmup run (first call includes JIT compilation)
myWorkflow(inputs);

% Timed run
tic;
myWorkflow(inputs);
baseline = toc;
fprintf('Baseline: %.4f s\n', baseline);

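timeit is preferred because it handles warmup, calls the function multiple times, and returns the median of the measurements. It expects a function handle that takes no input arguments, which is why the examples above wrap the call in an anonymous function.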
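When timeit does not fit (for example, a script that cannot easily be wrapped in a function handle), a single tic/toc reading can be made more robust by repeating the run and taking the median. The following is a minimal sketch, assuming a hypothetical stand-in workload named work in place of the real workflow; the run count of 5 is also an assumption.

```matlab
% Hypothetical stand-in for the real workflow; replace with your own call.
data = rand(1e6, 1);
work = @() sum(sqrt(data));

nRuns = 5;                 % number of timed repetitions (assumed value)
times = zeros(1, nRuns);   % preallocate the timing results

work();                    % warmup run, excluded from the measurements

for k = 1:nRuns
    t = tic;               % per-run timer handle avoids clashing with other tic calls
    work();
    times(k) = toc(t);
end

baseline = median(times);  % the median is less sensitive to outliers than the mean
fprintf('Baseline: %.4f s (median of %d runs, min %.4f, max %.4f)\n', ...
    baseline, nRuns, min(times), max(times));
```

Printing the minimum and maximum next to the median gives a rough sense of run-to-run spread, which is useful later when judging whether a speedup is real.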

Step 2: Profile and Analyze

Find where the time is actually spent. Do NOT guess — always profile.

profile on;
targetFunction(input1, input2);
profile off;
profile viewer;

Reading profiler results:

  1. Function summary — shows total time and self-time per function. Self-time is time spent in that function, not its callees. Start with the highest self-time.
  2. Per-line detail — click a function name to see time spent on each line. This reveals the exact bottleneck lines.
  3. Call count — functions called thousands/millions of times are prime optimization targets.

Tips:

  • Run the profiled code multiple times (in a loop) if it's very fast, so the recorded times are large enough to be meaningful
  • Look at self-time, not total time, to find the true bottleneck
  • Drill into functions — the summary page only tells part of the story
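When no interactive viewer is available, the same information can be read programmatically from profile('info'). The sketch below is illustrative only: the workload is a hypothetical stand-in, and self-time is approximated as each function's TotalTime minus the TotalTime of its children, which may be inaccurate for recursive functions.

```matlab
% Hypothetical workload; replace with the code you want to profile.
profile on
for k = 1:20
    sortrows(rand(2000, 3));
end
profile off

info = profile('info');          % struct containing the collected statistics
ft = info.FunctionTable;         % one entry per profiled function

% Approximate self-time: total time minus time attributed to child calls.
selfTime = zeros(numel(ft), 1);
for k = 1:numel(ft)
    childTime = 0;
    if ~isempty(ft(k).Children)
        childTime = sum([ft(k).Children.TotalTime]);
    end
    selfTime(k) = ft(k).TotalTime - childTime;
end

% Print the five functions with the highest approximate self-time.
[~, order] = sort(selfTime, 'descend');
for r = 1:min(5, numel(order))
    k = order(r);
    fprintf('%-40s calls=%-8d total=%.4f s  self=%.4f s\n', ...
        ft(k).FunctionName, ft(k).NumCalls, ft(k).TotalTime, selfTime(k));
end
```

A text summary like this is easy to paste into a report and makes the choice of optimization target auditable.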

Step 3: Identify Optimization Opportunities

Based on profiling results, identify which patterns apply. Read references/optimization-patterns.md for the full catalog.

High-impact patterns:

| Pattern | Typical Speedup | Look For |
|---|---|---|
| Vectorization | 2–200x | Loops doing element-wise math on arrays |
| Preallocation | 2–100x | Arrays growing inside loops (x = [x; newRow]) |
| Unnecessary recomputation | 2–50x | Same expensive expression computed multiple times |
| discretize/histcounts | 2–50x | Loops binning or classifying data |
| Persistent caching | 1.5–95x | Repeated load() or expensive object creation |
| Logical indexing | 1.2–5x | Using find() just to index into an array |
| arguments block | 1.1–1.8x | Functions using inputParser |
| Algebraic simplification | 1.5–3x | Redundant sqrt, abs, or matrix ops |
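To make a few of these patterns concrete, here is a small sketch of what the "look for" code and its replacement might look like for preallocation, vectorization, logical indexing, and discretize. The data and variable names are hypothetical, and no speedup is claimed for these toy sizes; the assertions only confirm that each rewrite produces the same result.

```matlab
n = 1e4;

% Preallocation: growing an array inside a loop versus allocating it up front.
grown = [];
for k = 1:n
    grown(end+1) = k^2; %#ok<AGROW>  % pattern to look for
end
pre = zeros(1, n);                   % allocate once
for k = 1:n
    pre(k) = k^2;
end

% Vectorization: the same computation as a single array operation.
vec = (1:n).^2;
assert(isequal(grown, pre, vec));

% Logical indexing: find() used only to index can usually be dropped.
data = randn(1, n);
viaFind = data(find(data > 0)); %#ok<FNDSB>  % pattern to look for
viaMask = data(data > 0);
assert(isequal(viaFind, viaMask));

% discretize: replaces a loop that assigns each value to a bin.
values = rand(1, 1000);              % values strictly between 0 and 1
edges = 0:0.1:1;
loopBins = zeros(size(values));
for k = 1:numel(values)
    loopBins(k) = find(values(k) >= edges, 1, 'last');  % hand-written binning
end
fastBins = discretize(values, edges);
assert(isequal(loopBins, fastBins));
```

Each pair should still be timed on realistic inputs before the rewrite is reported as an improvement.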

Before optimizing, verify the target is worth it:

  • Is self-time > 10% of total? If not, optimizing it won't matter much.
  • Is it called in a tight loop? High call count × small time = big total.
  • Is it M-code or a built-in? You can't make a built-in faster, but you can often call it fewer times (e.g., pass a matrix to filtfilt/filter instead of looping over columns).
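The reasoning behind the 10% check can be worked through with Amdahl's law: if a target accounts for a fraction p of total runtime and is made s times faster, the end-to-end speedup is 1 / ((1 - p) + p / s). The fractions and the local speedup below are illustrative inputs, not measurements.

```matlab
% Illustrative inputs (not measured): share of total runtime spent in the
% target, and the speedup achieved on the target alone.
p = [0.05 0.10 0.50 0.80];   % fraction of total time spent in the target
s = 10;                      % local speedup of the target

% Amdahl's law: end-to-end speedup for each fraction.
overall = 1 ./ ((1 - p) + p ./ s);

for k = 1:numel(p)
    fprintf('Target is %2.0f%% of runtime, %gx faster locally -> %.2fx overall\n', ...
        100 * p(k), s, overall(k));
end
```

Even a 10x local improvement yields only about 1.10x end to end when the target is 10% of the runtime, which is below the 1.2x significance threshold used in this workflow.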

Step 4: Implement Optimizations

Apply the patterns identified in Step 3. See references/optimization-patterns.md for the full catalog with before/after code examples.

General principles:

  • Start with the highest-impact pattern from profiling
  • Move invariant work out of loops (object creation, option parsing, constant expressions)
  • Replace element-wise loops with array operations where possible
  • Use purpose-built functions (discretize, cumsum, hypot) instead of hand-written equivalents
  • For large data, batch the vectorization to control memory (see Pattern 9 in catalog)

Example — move invariant work out of loops:

% Before: repeated expensive setup
for i = 1:n
    opts = optimoptions('fminunc', 'Display', 'off');
    result(i) = fminunc(@(x) cost(x, data(i)), x0, opts);
end

% After: setup once
opts = optimoptions('fminunc', 'Display', 'off');
for i = 1:n
    result(i) = fminunc(@(x) cost(x, data(i)), x0, opts);
end
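The batching principle listed above can be sketched in the same spirit. Fully vectorizing a pairwise computation creates an intermediate matrix whose size is the product of the two input lengths; processing the rows in chunks keeps that intermediate bounded. The data, the chunk size of 500, and the nearest-distance task are all hypothetical choices for illustration, and the code assumes implicit expansion (R2016b or later).

```matlab
x = rand(2000, 1);            % hypothetical query points
y = rand(3000, 1);            % hypothetical reference points
chunk = 500;                  % rows per batch (assumed; tune to available memory)

nearest = zeros(size(x));     % preallocate the result
for s = 1:chunk:numel(x)
    idx = s:min(s + chunk - 1, numel(x));
    d = abs(x(idx) - y.');    % at most chunk-by-numel(y), via implicit expansion
    nearest(idx) = min(d, [], 2);
end

% Reference: the fully vectorized version builds a 2000-by-3000 matrix at once.
full = min(abs(x - y.'), [], 2);
assert(isequal(nearest, full));
```

The chunk size trades memory for loop overhead, so it is worth timing a few values rather than assuming one.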

Step 5: Measure Optimized Performance

Re-measure using the same method as Step 1:

f = @() optimizedFunction(input1, input2);
optimized = timeit(f);
speedup = baseline / optimized;
fprintf('Optimized: %.4f s (%.2fx speedup)\n', optimized, speedup);

As a rule of thumb, treat a speedup of 1.2x or more as significant. Below that, run-to-run measurement noise makes it hard to be confident the change helped.
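One way to judge whether a measured speedup clears the noise is to repeat each timeit measurement a few times and compare the worst case: the fastest baseline reading against the slowest optimized reading. This is a sketch under assumptions: slowFcn and fastFcn are hypothetical stand-ins for the original and optimized code, and three repetitions is an arbitrary choice.

```matlab
x = rand(1e5, 1);
slowFcn = @() sum(arrayfun(@(v) v^2, x));   % hypothetical original version
fastFcn = @() sum(x.^2);                    % hypothetical optimized version

nReps = 3;                                  % repeated timeit measurements (assumed)
tSlow = zeros(1, nReps);
tFast = zeros(1, nReps);
for k = 1:nReps
    tSlow(k) = timeit(slowFcn);
    tFast(k) = timeit(fastFcn);
end

typical = median(tSlow) / median(tFast);    % headline speedup
conservative = min(tSlow) / max(tFast);     % worst case across repetitions

fprintf('Typical speedup:      %.2fx\n', typical);
fprintf('Conservative speedup: %.2fx\n', conservative);
if conservative >= 1.2
    fprintf('The improvement holds even in the worst-case pairing.\n');
else
    fprintf('The improvement is within measurement noise; do not report it as a win.\n');
end
```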

Step 6: Verify Correctness

Every optimization must produce the same results as the original:

original = originalFunction(input1, input2);
fast = optimizedFunction(input1, input2);

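% Outputs must have the same size before comparing element-wise
assert(isequal(size(original), size(fast)), 'Output sizes differ!');

% Numeric comparison (allows floating-point tolerance)
% Note: 1e-10 is an absolute tolerance; scale it to the magnitude of your data
maxErr = max(abs(original(:) - fast(:)));
fprintf('Max error: %.2e\n', maxErr);
assert(maxErr < 1e-10, 'Results differ beyond tolerance!');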

For non-numeric outputs:

assert(isequal(original, fast), 'Results differ!');

If results differ slightly due to floating-point reordering (e.g., summing in a different order), that's usually acceptable. Document the expected tolerance.
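An absolute tolerance can be too strict for large values, and max ignores NaN, so a NaN that appears in only one output can go unnoticed. The sketch below shows one possible comparison that checks NaN positions explicitly and uses a relative error for values larger than 1. The sample vectors and the 1e-10 threshold are illustrative assumptions, not part of the original workflow.

```matlab
% Illustrative outputs: 'fast' differs from 'original' by tiny relative amounts.
original = [1e6, 2.5, NaN, 0];
fast = original .* (1 + [1e-13, -1e-13, 0, 0]);

% NaN must appear in the same positions in both outputs.
assert(isequal(isnan(original), isnan(fast)), 'NaN patterns differ!');

% Compare only the non-NaN entries.
valid = ~isnan(original);
absErr = abs(original(valid) - fast(valid));

% Relative error for large values, absolute error for values below 1.
scale = max(abs(original(valid)), 1);
relErr = max(absErr ./ scale);

fprintf('Max absolute error: %.2e\n', max(absErr));
fprintf('Max scaled error:   %.2e\n', relErr);
assert(relErr < 1e-10, 'Results differ beyond tolerance!');
```

In this example the absolute error is far above 1e-10 because of the 1e6 entry, yet the scaled error is tiny, which is the situation floating-point reordering typically produces. For an exact comparison that treats NaN values as equal, isequaln can be used in place of isequal.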

Step 7: Report Results

Summarize what was done and the improvement achieved:

fprintf('\n=== Performance Optimization Report ===\n');
fprintf('Target: %s\n', funcName);
fprintf('Baseline: %.4f s\n', baseline);
fprintf('Optimized: %.4f s\n', optimized);
fprintf('Speedup: %.2fx\n', speedup);
fprintf('Correctness: max error = %.2e\n', maxErr);
fprintf('Pattern applied: %s\n', patternName);

For multiple optimizations, report each speedup individually and the overall end-to-end improvement.
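A report covering several optimizations might be laid out as follows. The step names and timings are placeholders chosen purely to show the arithmetic; in practice each entry would come from a measurement taken after that change was applied.

```matlab
% Placeholder values for illustration only; substitute measured timings.
stepNames = {'Baseline', 'Preallocation', 'Vectorization'};
times = [2.0, 1.0, 0.8];      % end-to-end seconds after each step (placeholders)

fprintf('\n=== Performance Optimization Report ===\n');
fprintf('%-16s %10s %12s %12s\n', 'Step', 'Time (s)', 'Step gain', 'Cumulative');
fprintf('%-16s %10.4f %12s %12s\n', stepNames{1}, times(1), '-', '-');

for k = 2:numel(times)
    stepGain = times(k-1) / times(k);    % gain from this change alone
    cumulative = times(1) / times(k);    % gain relative to the baseline
    fprintf('%-16s %10.4f %11.2fx %11.2fx\n', ...
        stepNames{k}, times(k), stepGain, cumulative);
end

overall = times(1) / times(end);
fprintf('Overall end-to-end speedup: %.2fx\n', overall);
```

Because every row uses end-to-end time, the per-step gains multiply to the overall figure, which keeps the individual and total numbers consistent.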

Key Rules

  1. Always profile before optimizing: never guess where the bottleneck is
  2. One change at a time: measure after each optimization to know what helped
  3. Verify correctness: every optimization must produce equivalent output
  4. 1.2x threshold: speedups below 1.2x are not reliably distinguishable from noise
  5. GPU timing: use gputimeit, or with tic/toc call wait(gpuDevice) before starting the timer and again before stopping it
  6. Use timeit: it handles warmup and reports the median of repeated runs; avoid single tic/toc readings for benchmarks
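The GPU timing rule can be illustrated with a short sketch. It assumes Parallel Computing Toolbox and a supported GPU, and it skips the measurement otherwise; the matrix size and the multiplication are hypothetical stand-ins for real GPU work.

```matlab
if canUseGPU
    dev = gpuDevice;
    A = rand(2000, 'gpuArray');   % hypothetical GPU workload input

    B = A * A;                    % warmup run
    wait(dev);                    % make sure pending GPU work has finished

    t = tic;
    B = A * A;                    % GPU call returns before the work completes
    wait(dev);                    % block until the GPU is done, then stop the timer
    elapsed = toc(t);
    fprintf('tic/toc with sync: %.4f s\n', elapsed);

    % gputimeit performs the synchronization and repetition itself.
    tRef = gputimeit(@() A * A);
    fprintf('gputimeit:         %.4f s\n', tRef);
else
    fprintf('No supported GPU available; skipping GPU timing.\n');
end
```

Without the second wait(dev), toc would measure little more than the time to launch the operation.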

Common Mistakes

| Mistake | Why It's Wrong | Do This Instead |
|---|---|---|
| Optimizing without profiling | You'll fix the wrong thing | Profile first (Step 2) |
| Single tic/toc without warmup | Includes JIT compilation time | Use timeit or add a warmup call |
| Timing GPU code without sync | GPU ops are async; toc fires early | Use gputimeit, or call wait(gpuDevice) before tic and before toc |
| Growing arrays in loops | Each append copies the entire array | Preallocate before the loop |
| Vectorizing huge arrays blindly | May exceed memory | Use chunked processing for large data |
| Reporting only subfunction speedup | Misleading if subfunction is 5% of total | Always report end-to-end timing |
| Assuming faster = correct | Bugs can make code fast (by skipping work) | Always verify results match (Step 6) |

Reference Files

  • references/optimization-patterns.md — Full catalog of optimization patterns with code examples and measured speedups
  • references/measurement-templates.md — Ready-to-use MATLAB script templates for each workflow step

Copyright 2026 The MathWorks, Inc.

Install via CLI
npx skills add https://github.com/matlab/matlab-agentic-toolkit --skill matlab-optimize-performance