
name: "latent-chain-of-thought-as-planning"
description: "Decouple reasoning from verbalization using PLaT-inspired latent planning. Maintains a broad solution space through parallel latent trajectories before committing to a single answer. Use when: 'explore multiple solution paths', 'latent reasoning', 'plan before answering', 'search diverse solutions', 'PLaT reasoning', 'breadth-first problem solving'."

Latent Chain-of-Thought as Planning

This skill applies the core insight from PLaT (Planning with Latent Thoughts) to Claude's reasoning process: decouple planning from verbalization. Instead of committing to a single chain-of-thought narrative early, Claude maintains multiple candidate reasoning trajectories internally, probes each for viability, and only verbalizes the final grounded answer. This produces more robust solutions for complex problems by exploring a broader solution space before collapsing to a single output — trading premature precision for recall over valid approaches.

When to Use

When the user has a multi-step reasoning problem (math, logic, algorithm design) where the first plausible approach may not be optimal
When asked to find multiple valid solutions or explore a design space before committing
When debugging a problem where the root cause is ambiguous and several hypotheses need parallel investigation
When implementing an algorithm where there are competing design trade-offs (e.g., time vs. space, consistency vs. availability)
When the user says "explore options," "think through alternatives," "what are the different ways to solve this," or "don't just give me the first answer"
When building search or planning systems (e.g., game AI, route planning, test generation) that benefit from diverse candidate evaluation

Key Technique

PLaT's central insight is that reasoning and verbalization are distinct processes that should not be fused. Standard chain-of-thought forces the model to commit to a single narrative token by token — each word narrows the solution space irreversibly. PLaT instead models reasoning as a trajectory through continuous latent states (the Planner), and only converts those states to text through a separate Decoder when a decision point is reached. A "lazy decoding" probe checks whether the current latent state is ready to produce a final answer; if not, planning continues without generating intermediate text.
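The Planner/Decoder split with a lazy-decoding probe can be illustrated with a toy loop. This is a minimal sketch and not PLaT's architecture: `plan_then_decode`, `planner_step`, `probe_ready`, and `decode` are hypothetical names, and the "latent state" here is just a float refined by Newton's method.

```python
from typing import Callable

def plan_then_decode(
    state: float,
    planner_step: Callable[[float], float],
    probe_ready: Callable[[float], bool],
    decode: Callable[[float], str],
    max_steps: int = 50,
) -> tuple[str, int]:
    """Advance a latent state silently; decode to text only once the probe fires."""
    steps = 0
    while steps < max_steps and not probe_ready(state):
        state = planner_step(state)  # no text is produced for intermediate states
        steps += 1
    return decode(state), steps

# Toy instance (hypothetical): the state is an estimate of sqrt(2).
TARGET = 2.0
answer, steps = plan_then_decode(
    state=1.0,
    planner_step=lambda x: 0.5 * (x + TARGET / x),      # one Newton update
    probe_ready=lambda x: abs(x * x - TARGET) < 1e-9,   # "ready to answer?"
    decode=lambda x: f"sqrt({TARGET:g}) is approximately {x:.6f}",
)
print(answer, f"(decoded after {steps} silent steps)")
```

The point of the sketch is the control flow: text is generated exactly once, after the probe reports that the state is ready, rather than after every intermediate step.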

The practical consequence is reasoning diversity: PLaT maintains a higher branching factor throughout the reasoning process. While its single-best (greedy) accuracy can be lower than standard CoT, its Pass@k performance is stronger, meaning the correct answer is more likely to be among the explored paths even when it isn't the first one generated. On GSM8K with 128 samples, PLaT reaches 74.2% versus Coconut's 66.7%, suggesting that breadth of exploration can compensate for lower accuracy on any single trajectory.
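For readers unfamiliar with Pass@k, the sketch below shows the widely used unbiased estimator: given n samples of which c are correct, it is one minus the probability that a random size-k subset contains no correct sample. Whether the PLaT paper computes its numbers with exactly this estimator is an assumption; `pass_at_k` is an illustrative name.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset contains no correct sample.
    # math.comb returns 0 when k > n - c, so the estimate is then exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical counts: 8 correct samples out of 128.
for k in (1, 8, 32, 128):
    print(k, round(pass_at_k(128, 8, k), 4))
```

With k = 1 the estimate reduces to c/n, which is why a method with modest single-sample accuracy can still score well once k grows.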

For Claude (which cannot literally run latent states), we operationalize this by simulating the Planner-Decoder separation: reason through multiple parallel trajectories silently, evaluate each against a viability probe, and only verbalize the trajectory that best satisfies the user's constraints. This gives the user a well-explored answer without the noise of showing every dead end.

Step-by-Step Workflow

1. Parse the problem into a structured representation. Extract the goal, constraints, inputs, and expected output format. Identify whether the problem has a single correct answer or a solution space.
2. Identify the branching dimensions. Determine where the reasoning could fork: which data structure, which algorithm, which decomposition strategy, which API. List 2-4 candidate approaches explicitly (the "latent trajectories").
3. Advance each trajectory one step silently. For each candidate approach, mentally execute the next reasoning step without verbalizing. Check: does this step lead to a valid continuation, a dead end, or a constraint violation?
4. Probe for termination. At each step, ask: "Does the current state contain enough information to produce a correct final answer?" If yes for any trajectory, flag it as a candidate solution. If no trajectory is ready, continue advancing all surviving trajectories.
5. Prune infeasible trajectories. Drop any trajectory that violates a hard constraint, hits a known impossibility, or requires resources unavailable in the user's context (e.g., a library they don't have, a runtime they can't use).
6. Evaluate surviving trajectories against user priorities. Score remaining candidates on the user's implicit or explicit criteria: simplicity, performance, readability, compatibility, correctness confidence.
7. Verbalize only the selected trajectory. Present the winning approach as a clean, linear chain-of-thought. Do not dump the exploration process; the user sees a polished answer, not a search log.
8. Surface alternatives as options, not noise. If 2+ trajectories are comparably strong, briefly mention the runner-up with a one-sentence rationale for why the primary was chosen. Offer to elaborate on the alternative if the user wants.
9. Validate the verbalized solution. Before finalizing, run the chosen solution through a consistency check: does the output match the input constraints? Does the code compile/run mentally? Are edge cases handled?
10. If validation fails, backtrack to step 6. Promote the next-best trajectory and re-verbalize. Do not patch a broken trajectory; switch to a clean one.
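The advance, probe, prune, and evaluate steps above can be sketched as a small loop. This is only an illustration under stated assumptions: `Trajectory`, `plan`, and the callback names are hypothetical, the toy numbers are made up, and this variant keeps advancing unready trajectories until each one is either ready or pruned.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Trajectory:
    name: str
    state: dict = field(default_factory=dict)
    ready: bool = False  # set once the termination probe passes

def plan(
    candidates: list[Trajectory],
    advance: Callable[[Trajectory], None],
    is_feasible: Callable[[Trajectory], bool],
    probe: Callable[[Trajectory], bool],
    score: Callable[[Trajectory], float],
    max_steps: int = 10,
) -> list[Trajectory]:
    """Return ready trajectories ranked best-first (workflow steps 3 to 6)."""
    alive = list(candidates)
    for _ in range(max_steps):
        pending = [t for t in alive if not t.ready]
        if not pending:
            break
        for t in pending:
            advance(t)            # advance one step silently
            t.ready = probe(t)    # termination probe
        alive = [t for t in alive if is_feasible(t)]  # prune on hard constraints
    ready = [t for t in alive if t.ready]
    return sorted(ready, key=score, reverse=True)     # rank on soft criteria

# Toy usage with hypothetical numbers: each approach needs `steps_needed`
# planning steps, and one of them relies on a library the user lacks.
cands = [
    Trajectory("sort", {"steps_needed": 1, "cost": 3.0, "needs_lib": False}),
    Trajectory("heap", {"steps_needed": 2, "cost": 2.0, "needs_lib": False}),
    Trajectory("gpu-select", {"steps_needed": 2, "cost": 1.0, "needs_lib": True}),
]

def advance_one(t: Trajectory) -> None:
    t.state["done"] = t.state.get("done", 0) + 1

ranked = plan(
    cands,
    advance=advance_one,
    is_feasible=lambda t: not t.state["needs_lib"],
    probe=lambda t: t.state["done"] >= t.state["steps_needed"],
    score=lambda t: -t.state["cost"],  # lower cost ranks higher
)
print([t.name for t in ranked])
```

Only the first element of `ranked` would be verbalized; the rest stay available as runner-ups or as fallbacks if validation fails.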

Concrete Examples

Example 1: Algorithm selection for a coding problem

User: Write a function that finds the k-th largest element in an unsorted array.

Internal Planning (not shown to user):
  Trajectory A: Sort the array, return arr[n-k]. O(n log n) time, O(1) extra space.
  Trajectory B: Use a min-heap of size k. O(n log k) time, O(k) space.
  Trajectory C: Quickselect (partition-based). O(n) average, O(n^2) worst, O(1) space.
  Trajectory D: Use Python's heapq.nlargest. O(n log k), but single-line.

Probe: All trajectories viable. User didn't specify language or constraints.
Evaluate: Trajectory C is optimal for interviews/performance. Trajectory D is
  optimal for production Python. Trajectory A is simplest to understand.

Verbalized output (for a general audience):
  Primary: Quickselect approach with O(n) average time.
  Mention: "For production Python, `heapq.nlargest(k, arr)[-1]` is a clean
  one-liner. For guaranteed O(n log k), use a min-heap of size k."

Output:

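import random

def find_kth_largest(nums: list[int], k: int) -> int:
    """Quickselect: O(n) average time, O(1) extra space. Reorders nums in place."""
    if not 1 <= k <= len(nums):
        raise ValueError("k must be between 1 and len(nums)")

    def partition(left: int, right: int, pivot_idx: int) -> int:
        # Partition in descending order: values greater than the pivot go first.
        pivot = nums[pivot_idx]
        nums[pivot_idx], nums[right] = nums[right], nums[pivot_idx]
        store = left
        for i in range(left, right):
            if nums[i] > pivot:
                nums[store], nums[i] = nums[i], nums[store]
                store += 1
        nums[right], nums[store] = nums[store], nums[right]
        return store  # final index of the pivot

    left, right = 0, len(nums) - 1
    target = k - 1  # index of the k-th largest in descending order
    while left <= right:
        pivot_idx = random.randint(left, right)  # random pivot avoids adversarial inputs
        pos = partition(left, right, pivot_idx)
        if pos == target:
            return nums[pos]
        elif pos < target:
            left = pos + 1
        else:
            right = pos - 1
    raise AssertionError("unreachable when 1 <= k <= len(nums)")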

For production Python, heapq.nlargest(k, nums)[-1] is a clean one-liner. For guaranteed O(n log k) worst-case, use a size-k min-heap.
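The size-k min-heap alternative mentioned above can be sketched with the standard-library `heapq` module. The function name `find_kth_largest_heap` is illustrative; unlike the quickselect version, this variant does not reorder the input.

```python
import heapq

def find_kth_largest_heap(nums: list[int], k: int) -> int:
    """Size-k min-heap: O(n log k) worst-case time, O(k) extra space."""
    if not 1 <= k <= len(nums):
        raise ValueError("k must be between 1 and len(nums)")
    heap = nums[:k]
    heapq.heapify(heap)  # holds the k largest values seen so far
    for i in range(k, len(nums)):
        if nums[i] > heap[0]:
            heapq.heapreplace(heap, nums[i])  # pop the smallest, push the new value
    return heap[0]  # smallest of the k largest is the k-th largest

data = [3, 2, 1, 5, 6, 4]
print(find_kth_largest_heap(data, 2))
print(heapq.nlargest(2, data)[-1])  # the one-liner gives the same value
```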

Example 2: Debugging with multiple hypotheses

User: My Flask app returns 500 on POST /api/users but GET works fine.
The endpoint just inserts into a Postgres database.

Internal Planning (not shown to user):
  Trajectory A: Missing CSRF token or Content-Type header mismatch.
  Trajectory B: Database constraint violation (unique, not-null, FK).
  Trajectory C: Request body parsing failure (malformed JSON, wrong field names).
  Trajectory D: Connection pool exhaustion under concurrent writes.

Probe: Need more info. But can rank by frequency.
  Most common for "POST fails, GET works": B > C > A > D.
  Prune D (unlikely for single-user debugging).

Verbalized output:
  "Three likely causes, in order of probability:
   1. Database constraint violation: check your Postgres logs for the
      exact error (unique constraint, NOT NULL, or FK violation).
   2. Request body parsing failure: confirm the POST sends valid JSON,
      the field names match your model, and the route reads it with
      request.get_json() rather than request.form.
   3. Header or CSRF mismatch: confirm the request sends
      Content-Type: application/json and, if CSRF protection is
      enabled, a valid token.
   Run: `flask run --debug` and POST with `curl -X POST -H 'Content-Type:
   application/json' -d '{"name":"test"}' http://localhost:5000/api/users`
   to see the full traceback."
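If curl is not available, the same probe request can be sketched with the Python standard library. The helper names `build_json_post` and `send` are hypothetical; the URL and payload are the ones from the curl command above.

```python
import json
import urllib.error
import urllib.request

def build_json_post(url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request with a JSON body and a matching Content-Type."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(req: urllib.request.Request) -> tuple[int, str]:
    """Send the request; return (status, body) even for 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object, so the error body is readable.
        return err.code, err.read().decode("utf-8", "replace")

req = build_json_post("http://localhost:5000/api/users", {"name": "test"})
# With the app running locally, inspect the failing response like this:
#   status, body = send(req)
#   print(status, body)
```

Reading the body of the 500 response, alongside the server-side traceback, helps separate a database error from a parsing error.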

Example 3: Designing a search/planning system

User: I need to build a test case generator for a compiler that maximizes
code coverage across different optimization passes.

Internal Planning (not shown to user):
  Trajectory A: Random fuzzing with coverage-guided feedback (AFL-style).
  Trajectory B: Grammar-based generation with weighted production rules.
  Trajectory C: Mutation-based approach starting from existing test corpus.
  Trajectory D: Symbolic execution to enumerate paths through opt passes.

Probe: All viable but different trade-offs.
  A: Good coverage but slow convergence for deep paths.
  B: Structured but needs grammar maintenance.
  C: Quick wins but limited novelty.
  D: Complete in theory but doesn't scale.

Evaluate: User said "maximizes coverage", so combine B + A.
  Generate structurally valid programs (B), then mutate with
  coverage feedback (A). This resembles pairing a Csmith-style
  generator with libFuzzer-style coverage feedback.

Verbalized output:
  Recommend grammar-guided fuzzing with coverage feedback loop.
  Provide architecture: Generator -> Compiler -> Coverage Diff -> Mutator.
  Mention symbolic execution as a complement for critical paths only.
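The Generator -> Compiler -> Coverage Diff -> Mutator loop can be sketched on a toy expression grammar. Everything here is a stand-in: `compile_and_cover` is a hypothetical placeholder for running a real compiler built with coverage instrumentation, and the "coverage features" are invented for illustration.

```python
import random

OPS = ["+", "-", "*"]

def generate(rng: random.Random, depth: int = 3) -> str:
    """Grammar-based generator: expr := digit | '(' expr op expr ')'."""
    if depth == 0 or rng.random() < 0.3:
        return str(rng.randint(0, 9))
    left, right = generate(rng, depth - 1), generate(rng, depth - 1)
    return f"({left} {rng.choice(OPS)} {right})"

def mutate(rng: random.Random, program: str) -> str:
    """Mutator: swap one operator so the program stays grammatical."""
    positions = [i for i, ch in enumerate(program) if ch in OPS]
    if not positions:
        return generate(rng)
    i = rng.choice(positions)
    return program[:i] + rng.choice(OPS) + program[i + 1:]

def max_depth(program: str) -> int:
    depth = best = 0
    for ch in program:
        if ch == "(":
            depth += 1
            best = max(best, depth)
        elif ch == ")":
            depth -= 1
    return best

def compile_and_cover(program: str) -> set[str]:
    """Hypothetical stand-in for an instrumented compiler run."""
    covered = {f"op:{ch}" for ch in program if ch in OPS}
    covered.add(f"depth:{max_depth(program)}")
    if "* 0)" in program or "(0 *" in program:
        covered.add("pass:mul-by-zero-fold")
    return covered

def fuzz(iterations: int, seed: int = 0) -> tuple[list[str], set[str]]:
    """Keep only programs that reach coverage not seen before."""
    rng = random.Random(seed)
    corpus: list[str] = []
    seen: set[str] = set()
    for _ in range(iterations):
        if corpus and rng.random() < 0.5:
            program = mutate(rng, rng.choice(corpus))  # Mutator
        else:
            program = generate(rng)                    # Generator
        new = compile_and_cover(program) - seen        # Coverage Diff
        if new:
            corpus.append(program)
            seen |= new
    return corpus, seen

corpus, seen = fuzz(200)
print(len(corpus), "programs kept;", len(seen), "features covered")
```

The design choice worth noting is that the corpus grows only when coverage grows, so mutation effort concentrates on inputs that already reach interesting code.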

Best Practices

Do:

Explicitly identify 2-4 candidate trajectories before committing to any single approach. Even a brief breadth pass helps prevent premature lock-in.
Use a termination probe: ask "do I have enough information to answer correctly?" at each step. Don't verbalize until the answer is yes.
Prune trajectories on hard constraints first (correctness, feasibility), then rank on soft criteria (performance, simplicity).
Surface the runner-up approach when it's comparably strong — the user may have context that changes the ranking.

Avoid:

Verbalizing every trajectory to the user. The point of latent planning is that exploration happens internally; the user sees a clean result.
Committing to the first plausible approach. PLaT's results suggest that sampling several paths often recovers a correct answer that the first path misses.
Over-exploring when the problem has an obvious canonical solution. If there's one standard approach (e.g., "reverse a linked list"), skip the multi-trajectory dance.
Using this workflow for simple factual lookups or single-step operations. It's designed for multi-step reasoning where branching matters.

Error Handling

Failure Mode	Detection	Recovery
All trajectories pruned	No surviving candidate after constraint check	Relax soft constraints, or ask the user to clarify which constraints are negotiable
Trajectories converge to same answer	All paths produce identical output	Good — high confidence. Verbalize directly with no alternatives section
Top trajectory fails validation	Output contradicts input constraints or doesn't compile	Promote next-best trajectory; do not patch the failed one
User rejects the chosen trajectory	User says "I don't want X, try something else"	Surface the previously-suppressed alternatives; re-rank with new constraint
Problem is under-specified	Multiple trajectories are viable but incomparable	Ask the user one targeted clarifying question to break the tie
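The "promote the next-best trajectory, do not patch" recovery in the table can be sketched as a validate-and-fall-back helper. The names `select_validated`, `passes_checks`, `buggy_kth`, and `correct_kth` are hypothetical and exist only for this illustration.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

def select_validated(ranked: Sequence[T], validate: Callable[[T], bool]) -> Optional[T]:
    """Return the best-ranked candidate that passes validation, else None."""
    for candidate in ranked:
        if validate(candidate):
            return candidate  # promote as-is; failed candidates are never patched
    return None  # all pruned: relax soft constraints or ask the user

def buggy_kth(nums, k):
    return sorted(nums, reverse=True)[k]      # off by one

def correct_kth(nums, k):
    return sorted(nums, reverse=True)[k - 1]

def passes_checks(fn) -> bool:
    """Validation step: small known cases, treating any exception as failure."""
    cases = [([3, 1, 2], 1, 3), ([3, 1, 2], 3, 1), ([5, 5, 4], 2, 5)]
    try:
        return all(fn(list(nums), k) == want for nums, k, want in cases)
    except Exception:
        return False

chosen = select_validated([buggy_kth, correct_kth], passes_checks)
print(chosen.__name__ if chosen else "no candidate survived")
```

A `None` result corresponds to the "All trajectories pruned" row: the caller should relax constraints or ask a clarifying question rather than return a broken answer.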

Limitations

Not useful for single-path problems. If there's exactly one way to solve the problem (e.g., "what's 2+2"), multi-trajectory exploration adds overhead with no benefit.
Claude cannot literally run latent states. This skill simulates PLaT's architecture through disciplined internal reasoning. Any diversity gains are likely smaller than those of a true continuous-space planner and have not been measured.
Greedy accuracy trade-off applies. Spending reasoning budget on breadth can leave any single trajectory less polished than a focused single-trajectory approach would. Overall solution quality improves only if the evaluation step (step 6) is done carefully.
Scales poorly beyond 4-5 trajectories. Tracking more than ~5 parallel approaches in a single reasoning pass leads to confusion and cross-contamination between trajectories. For larger search spaces, use explicit tree-of-thought with external state tracking.
Mathematical benchmarks only in the original paper. PLaT was validated on GSM8K, SVAMP, and MultiArith. Generalization to code generation, system design, or natural language tasks is extrapolated, not proven.

Reference

PLaT: Planning with Latent Thoughts — Wang, Peng, Liu (2026). Key insight: decoupling reasoning from verbalization yields a broader solution space with superior Pass@k scaling, at the cost of lower single-shot greedy accuracy. Code: github.com/yunsaijc/PLaT.