debug - SKILL.md Agent Skill

name: debug
description: Debug container agent issues. Use when things aren't working, container fails, authentication problems, or to understand how the container system works. Covers logs, environment variables, mounts, and common issues.

NanoClaw Container Debugging

This guide covers debugging the containerized agent execution system.

Architecture Overview

Host (macOS)                          Container (Linux VM)
─────────────────────────────────────────────────────────────
src/container-runner.ts               container/agent-runner/
    │                                      │
    │ spawns container                     │ runs iFlow SDK
    │ with volume mounts                   │ with MCP servers
    │                                      │
    ├── data/env/env ──────────────> /workspace/env-dir/env
    ├── groups/{folder} ───────────> /workspace/group
    ├── data/ipc/{folder} ────────> /workspace/ipc
    ├── data/sessions/{folder}/.claude/ ──> /home/node/.claude/ (isolated per-group)
    └── (main only) project root ──> /workspace/project

Important: The container runs as user node with HOME=/home/node. Session files must be mounted to /home/node/.claude/ (not /root/.claude/) for session resumption to work.
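
As a rough illustration of the diagram above, the mounts could be written as docker -v flags. This is only a sketch: the helper name mount_flags and the /srv/nanoclaw path are hypothetical, and the real flag list is assembled in src/container-runner.ts, which may differ (for example in which mounts are read-only).

```shell
# Print one docker -v flag per mount shown in the architecture diagram.
# Usage: mount_flags <project_root> <group_folder> <is_main: true|false>
# Hypothetical helper; paths are copied from the diagram, nothing else is implied.
mount_flags() (
  root=$1; folder=$2; is_main=$3
  printf '%s\n' "-v $root/data/env/env:/workspace/env-dir/env"
  printf '%s\n' "-v $root/groups/$folder:/workspace/group"
  printf '%s\n' "-v $root/data/ipc/$folder:/workspace/ipc"
  # Sessions must land under the container user's HOME (/home/node).
  printf '%s\n' "-v $root/data/sessions/$folder/.claude:/home/node/.claude"
  # Only the main channel also gets the project root.
  if [ "$is_main" = "true" ]; then
    printf '%s\n' "-v $root:/workspace/project"
  fi
)

# Example call (hypothetical root path):
#   mount_flags /srv/nanoclaw family false
```

Comparing this list with the mount configuration printed at debug level is a quick way to spot a missing or misdirected mount.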

Log Locations

Log	Location	Content
Main app logs	`logs/nanoclaw.log`	Host-side WhatsApp, routing, container spawning
Main app errors	`logs/nanoclaw.error.log`	Host-side errors
Container run logs	`groups/{folder}/logs/container-*.log`	Per-run: input, mounts, stderr, stdout
Agent sessions	`~/.claude/projects/` or `~/.iflow/`	Agent session history

Enabling Debug Logging

Set LOG_LEVEL=debug for verbose output:

# For development
LOG_LEVEL=debug npm run dev

# For launchd service (macOS), add to plist EnvironmentVariables:
<key>LOG_LEVEL</key>
<string>debug</string>
# For systemd service (Linux), add to unit [Service] section:
# Environment=LOG_LEVEL=debug

Debug level shows:

Full mount configurations
Container command arguments
Real-time container stderr

Common Issues

1. "agent process exited with code 1"

Check the container log file in groups/{folder}/logs/container-*.log
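
One way to find the log for the most recent run, sketched as a small shell helper. The function name latest_container_log is made up for illustration; it relies only on the groups/{folder}/logs/container-*.log layout described above.

```shell
# Print the path of the newest container log for a group.
# Returns non-zero when the group has no container logs.
# Usage: latest_container_log <groups_dir> <folder>
latest_container_log() (
  dir=$1/$2/logs
  # ls -t sorts newest first; errors are silenced when the glob matches nothing.
  newest=$(ls -t "$dir"/container-*.log 2>/dev/null | head -n 1)
  [ -n "$newest" ] && printf '%s\n' "$newest"
)

# Example: show the end of the latest log for the "main" group.
#   tail -n 50 "$(latest_container_log groups main)"
```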

Common causes:

Missing Authentication

Invalid API key · Please run /login

Fix: Ensure the .env file exists and contains iFlow credentials (or legacy Claude credentials):

cat .env  # Should show one of:
# IFLOW_API_KEY=sk-...                    (iFlow - recommended)
# IFLOW_OAUTH_TOKEN=...                   (iFlow OAuth)
# CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-... (legacy - deprecated)
# ANTHROPIC_API_KEY=sk-ant-api03-...       (legacy - deprecated)

Root User Restriction

--dangerously-skip-permissions cannot be used with root/sudo privileges

Fix: The container must run as a non-root user. Check that the Dockerfile contains USER node.
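
A quick way to check this without opening the file is to print the last USER instruction. This is a sketch: the helper name dockerfile_user is hypothetical, and the container/Dockerfile path in the usage line is an assumption about where the Dockerfile lives. In a multi-stage build only the final stage matters, which this simple check does not distinguish.

```shell
# Print the user named by the last USER instruction in a Dockerfile.
# Prints nothing when there is no USER instruction (Docker then runs as root).
# Usage: dockerfile_user <path-to-Dockerfile>
dockerfile_user() (
  # Dockerfile instructions are case-insensitive, so compare in upper case.
  awk 'toupper($1) == "USER" { user = $2 } END { if (user != "") print user }' "$1"
)

# Example (assumed path):
#   [ "$(dockerfile_user container/Dockerfile)" = "node" ] || echo "container would not run as node"
```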

2. Environment Variables Not Passing

Runtime note: Environment variables passed via -e may be lost when using -i (interactive/piped stdin).

Workaround: The system extracts authentication variables from .env and passes them via stdin (secrets field) for security. The container then sets them in the SDK environment.

Supported credentials (in order of precedence):

IFLOW_API_KEY - iFlow API key (recommended)
IFLOW_OAUTH_TOKEN - iFlow OAuth token
CLAUDE_CODE_OAUTH_TOKEN - Legacy Claude OAuth (deprecated)
ANTHROPIC_API_KEY - Legacy Anthropic API key (deprecated)
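
The precedence order above can be sketched as a lookup over the .env file. The helper name pick_credential is hypothetical and the real selection happens inside the runner; this version only recognizes plain NAME=value lines (no export prefix, no quoting rules) and prints the variable name, never its value.

```shell
# Print the name of the highest-precedence credential that has a non-empty
# value in an env file. Returns non-zero when none is set.
# Usage: pick_credential <env-file>
pick_credential() (
  for name in IFLOW_API_KEY IFLOW_OAUTH_TOKEN CLAUDE_CODE_OAUTH_TOKEN ANTHROPIC_API_KEY; do
    # Match "NAME=" followed by at least one character at the start of a line.
    if grep -q "^${name}=." "$1" 2>/dev/null; then
      printf '%s\n' "$name"
      exit 0
    fi
  done
  exit 1
)

# Example:
#   pick_credential .env || echo "no credentials found in .env"
```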

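Because secrets arrive on stdin rather than in the environment, a separate docker run with a shell entrypoint will not show them. What can be checked from outside is that the SDK that consumes them is installed:

# Confirm the iFlow SDK is installed in the image (secrets are passed via stdin, not env)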
docker run --rm --entrypoint /bin/bash nanoclaw-agent:latest -c '
  echo "Checking SDK availability..."
  npm list @iflow-ai/iflow-cli-sdk 2>/dev/null || echo "iFlow SDK not found"
'

3. Mount Issues

Container mount notes:

Docker supports both -v and --mount syntax

Use :ro suffix for readonly mounts:

# Readonly
-v /path:/container/path:ro

# Read-write
-v /path:/container/path

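To check what's visible under /workspace/ inside a container (this command passes no -v flags, so it shows only what the image provides; add the mounts you want to inspect):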

docker run --rm --entrypoint /bin/bash nanoclaw-agent:latest -c 'ls -la /workspace/'

Expected structure:

/workspace/
├── env-dir/env           # Environment file (iFlow or legacy credentials)
├── group/                # Current group folder (cwd)
├── project/              # Project root (main channel only)
├── global/               # Global CLAUDE.md (non-main only)
├── ipc/                  # Inter-process communication
│   ├── messages/         # Outgoing WhatsApp messages
│   ├── tasks/            # Scheduled task commands
│   ├── current_tasks.json    # Read-only: scheduled tasks visible to this group
│   └── available_groups.json # Read-only: WhatsApp groups for activation (main only)
└── extra/                # Additional custom mounts

4. Permission Issues

The container runs as user node (uid 1000). Check ownership:

docker run --rm --entrypoint /bin/bash nanoclaw-agent:latest -c '
  whoami
  ls -la /workspace/
  ls -la /app/
'

All of /workspace/ and /app/ should be owned by node.
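
On a Linux host, bind mounts keep the host's numeric ownership, so a host directory owned by a different uid can be unwritable for node (uid 1000) inside the container. The check below is a sketch with hypothetical helper names; Docker Desktop on macOS handles ownership differently, so it mainly applies to Linux hosts.

```shell
# Print the numeric owner uid of a path (third field of `ls -ldn`).
owner_uid() (
  ls -ldn "$1" | awk '{ print $3 }'
)

# Report directories whose owner uid differs from the expected one.
# Returns non-zero if any mismatch is found.
# Usage: check_mount_owners <expected_uid> <dir>...
check_mount_owners() (
  want=$1; shift
  status=0
  for dir in "$@"; do
    got=$(owner_uid "$dir")
    if [ "$got" != "$want" ]; then
      printf 'MISMATCH %s owner=%s expected=%s\n' "$dir" "$got" "$want"
      status=1
    fi
  done
  exit "$status"
)

# Example (replace {folder} with a real group folder):
#   check_mount_owners 1000 groups/{folder} data/ipc/{folder} data/sessions/{folder}/.claude
```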

5. Session Not Resuming / "agent process exited with code 1"

If sessions aren't being resumed (new session ID every time), or the agent exits with code 1 when resuming:

Root cause: The SDK looks for sessions at $HOME/.claude/projects/ (or $HOME/.iflow/ for iFlow). Inside the container, HOME=/home/node, so it looks at /home/node/.claude/projects/.

Check the mount path:

# In container-runner.ts, verify mount is to /home/node/.claude/, NOT /root/.claude/
grep -A3 "Claude sessions" src/container-runner.ts

Verify sessions are accessible:

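# Replace {groupFolder} with the group's folder name (sessions are stored per group)
docker run --rm --entrypoint /bin/bash \
  -v "$(pwd)/data/sessions/{groupFolder}/.claude:/home/node/.claude" \
  nanoclaw-agent:latest -c '
  echo "HOME=$HOME"
  ls -la "$HOME/.claude/projects/" 2>&1 | head -5
'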

Fix: Ensure container-runner.ts mounts to /home/node/.claude/:

mounts.push({
  hostPath: claudeDir,
  containerPath: '/home/node/.claude',  // NOT /root/.claude
  readonly: false
});

6. MCP Server Failures

If an MCP server fails to start, the agent may exit. Check the container logs for MCP initialization errors.
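
The exact wording of MCP initialization errors is not specified here, so a broad case-insensitive search is a reasonable first pass. A minimal sketch, with a hypothetical helper name:

```shell
# Print every line mentioning "mcp" (any case) in a group's container logs,
# prefixed with file name and line number.
# Usage: mcp_log_lines <logs_dir>
mcp_log_lines() (
  # The extra /dev/null argument makes grep print file names even for one log.
  grep -i -n 'mcp' "$1"/container-*.log /dev/null 2>/dev/null
)

# Example (replace {folder} with a real group folder):
#   mcp_log_lines groups/{folder}/logs | tail -n 20
```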

Manual Container Testing

Test the full agent flow:

# Set up test environment (create the IPC directory first so Docker does not create it as root)
mkdir -p data/env groups/test data/ipc/test

# Run test query (credentials passed via stdin secrets field)
echo '{"prompt":"What is 2+2?","groupFolder":"test","chatJid":"test@g.us","isMain":false,"secrets":{"IFLOW_API_KEY":"sk-test-key"}}' | \
  docker run --rm -i \
  -v "$(pwd)/groups/test:/workspace/group" \
  -v "$(pwd)/data/ipc/test:/workspace/ipc" \
  nanoclaw-agent:latest

Test iFlow SDK directly:

docker run --rm --entrypoint /bin/bash nanoclaw-agent:latest -c '
  echo "=== iFlow SDK version ==="
  npm list @iflow-ai/iflow-cli-sdk
'

Interactive shell in container:

docker run --rm -it --entrypoint /bin/bash nanoclaw-agent:latest

SDK Options Reference

The agent-runner uses these SDK options (via the SDK adapter layer):

// SDK factory creates the appropriate SDK instance
const sdk = await createAgentSDK({
  type: 'iflow',  // or 'claude' for legacy
  credentials: {
    apiKey: process.env.IFLOW_API_KEY,
    oauthToken: process.env.IFLOW_OAUTH_TOKEN,
    modelName: process.env.IFLOW_MODEL_NAME || 'minimax-m2.5',
  },
  agentConfig: {
    cwd: '/workspace/group',
    allowedTools: ['Bash', 'Read', 'Write', 'Edit', 'Glob', 'Grep', ...],
    permissionMode: 'bypassPermissions',
    mcpServers: { nanoclaw: { command: 'node', args: [...], env: {...} } },
  }
});

// Query with streaming
for await (const message of sdk.query({ prompt, options })) {
  // Handle messages: assistant, result, system/init, etc.
}

Key differences from Claude SDK:

iFlow SDK uses IFlowClient with connect()/sendMessage()/receiveMessages()/disconnect() pattern
Permission mode yolo maps to bypassPermissions
MCP servers are passed as structured configs, not CLI flags

Rebuilding After Changes

# Rebuild main app
npm run build

# Rebuild container (use --no-cache for clean rebuild)
./container/build.sh

# Or force a full rebuild (note: this clears Docker's build cache for every project, not just NanoClaw)
docker builder prune -af
./container/build.sh

Checking Container Image

# List images
docker images

# Check what's in the image
docker run --rm --entrypoint /bin/bash nanoclaw-agent:latest -c '
  echo "=== Node version ==="
  node --version

  echo "=== iFlow SDK version ==="
  npm list @iflow-ai/iflow-cli-sdk 2>/dev/null || echo "Not installed"

  echo "=== Installed packages ==="
  ls /app/node_modules/ | head -10
'

Session Persistence

Claude sessions are stored per-group in data/sessions/{group}/.claude/ for security isolation. Each group has its own session directory, preventing cross-group access to conversation history.

Critical: The mount path must match the container user's HOME directory:

Container user: node
Container HOME: /home/node
Mount target: /home/node/.claude/ (NOT /root/.claude/)

To clear sessions:

# Clear all sessions for all groups
rm -rf data/sessions/

# Clear sessions for a specific group
rm -rf data/sessions/{groupFolder}/.claude/

# Also clear the session ID from NanoClaw's tracking (stored in SQLite)
sqlite3 store/messages.db "DELETE FROM sessions WHERE group_folder = '{groupFolder}'"
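
Since these commands use rm -rf with a substituted folder name, a guard against an empty or path-like name can prevent deleting more than intended. A sketch with a hypothetical helper name; it removes only the on-disk session directory, so the SQLite row still needs to be deleted as shown above.

```shell
# Delete the stored session directory for one group, refusing unsafe names.
# Usage: clear_group_sessions <sessions_root> <group_folder>
clear_group_sessions() (
  root=$1; group=$2
  case $group in
    # Reject empty names, "." and "..", and anything containing a slash.
    ''|.|..|*/*) echo "refusing unsafe group name: '$group'" >&2; exit 1 ;;
  esac
  rm -rf "$root/$group/.claude"
)

# Example (replace {groupFolder} with a real group folder):
#   clear_group_sessions data/sessions {groupFolder}
```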

To verify session resumption is working, check the logs for the same session ID across messages:

grep "Session initialized" logs/nanoclaw.log | tail -5
# Should show the SAME session ID for consecutive messages in the same group

IPC Debugging

The container communicates back to the host via files in /workspace/ipc/:

# Check pending messages (IPC directories are per group)
ls -la data/ipc/{groupFolder}/messages/

# Check pending task operations
ls -la data/ipc/{groupFolder}/tasks/

# Read pending message files
cat data/ipc/{groupFolder}/messages/*.json

# Check available groups (main channel only)
cat data/ipc/main/available_groups.json

# Check current tasks snapshot
cat data/ipc/{groupFolder}/current_tasks.json

IPC file types:

messages/*.json - Agent writes: outgoing WhatsApp messages
tasks/*.json - Agent writes: task operations (schedule, pause, resume, cancel, refresh_groups)
current_tasks.json - Host writes: read-only snapshot of scheduled tasks
available_groups.json - Host writes: read-only list of WhatsApp groups (main only)
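
To see at a glance which groups have agent-written files waiting for the host, the two writable directories can be counted per group. This is a sketch: the helper name ipc_pending is hypothetical, and it assumes the data/ipc/{folder}/messages and data/ipc/{folder}/tasks layout implied by the mount table and directory tree earlier in this guide.

```shell
# For each group folder under the IPC root, print how many JSON files are
# waiting, in the form "<folder> messages=<n> tasks=<n>".
# Usage: ipc_pending <ipc_root>
ipc_pending() (
  for dir in "$1"/*/; do
    # Skip the unexpanded glob when the root has no subdirectories.
    [ -d "$dir" ] || continue
    folder=$(basename "$dir")
    # A missing messages/ or tasks/ directory simply counts as zero.
    msgs=$(find "$dir/messages" -name '*.json' -type f 2>/dev/null | wc -l | tr -d ' ')
    tasks=$(find "$dir/tasks" -name '*.json' -type f 2>/dev/null | wc -l | tr -d ' ')
    printf '%s messages=%s tasks=%s\n' "$folder" "$msgs" "$tasks"
  done
)

# Example:
#   ipc_pending data/ipc
```

Counts that stay above zero for a long time may mean the host is not picking up the files, which is worth cross-checking against logs/nanoclaw.log.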

Quick Diagnostic Script

Run this to check common issues:

echo "=== Checking NanoClaw Container Setup ==="

echo -e "\n1. iFlow credentials configured?"
[ -f .env ] && grep -q "IFLOW_" .env && echo "OK (iFlow)" || \
  ([ -f .env ] && grep -q "CLAUDE_CODE_OAUTH_TOKEN\|ANTHROPIC_API_KEY" .env && echo "OK (legacy - consider migrating to IFLOW_ vars)" || \
  echo "MISSING - add IFLOW_API_KEY or IFLOW_OAUTH_TOKEN to .env")

echo -e "\n2. Container runtime running?"
docker info &>/dev/null && echo "OK" || echo "NOT RUNNING - start Docker Desktop (macOS) or sudo systemctl start docker (Linux)"

echo -e "\n3. Container image exists?"
docker image inspect nanoclaw-agent:latest &>/dev/null && echo "OK" || echo "MISSING - run ./container/build.sh"

echo -e "\n4. iFlow SDK in container?"
docker run --rm --entrypoint /bin/bash nanoclaw-agent:latest -c 'npm list @iflow-ai/iflow-cli-sdk 2>/dev/null | grep -q @iflow && echo OK || echo MISSING' 2>/dev/null || echo "MISSING - rebuild container"

echo -e "\n5. Session mount path correct?"
grep -q "/home/node/.claude" src/container-runner.ts 2>/dev/null && echo "OK" || echo "WRONG - should mount to /home/node/.claude/, not /root/.claude/"

echo -e "\n6. Groups directory?"
ls -la groups/ 2>/dev/null || echo "MISSING - run setup"

echo -e "\n7. Recent container logs?"
LOGS=$(ls -t groups/*/logs/container-*.log 2>/dev/null | head -3)
[ -n "$LOGS" ] && echo "$LOGS" || echo "No container logs yet"

echo -e "\n8. Session continuity working?"
SESSIONS=$(grep "Session initialized" logs/nanoclaw.log 2>/dev/null | tail -5 | grep -o 'session_[a-f0-9]*' | sort -u | wc -l)
[ "$SESSIONS" -le 2 ] && echo "OK (recent sessions reusing IDs)" || echo "CHECK - multiple different session IDs, may indicate resumption issues"