kg-research-workflow - SKILL.md Agent Skill

name: kg-research-workflow
description: "End-to-end academic research workflow using knowledge graphs. Searches papers from arxiv/web, imports to KG database, generates embeddings, runs graph algorithms (PageRank, Louvain, vector search), and extracts patterns for skill creation. Use for: automated research workflows, paper analysis pipelines, KG-based literature review."

KG Research Workflow

Complete workflow for academic research using knowledge graphs with sqlite-knowledge-graph.

Features

Paper Acquisition: Search arxiv, web sources, Anthropic research
KG Import: Import papers as entities with keyword relations
Embedding Generation: Create vector embeddings for similarity search
Graph Algorithms: PageRank for importance, community detection for clusters (Louvain-style; kg_tool v2.0 uses Union-Find connected components)
Pattern Extraction: Identify skill patterns from research papers
Skill Creation: Transform patterns into reusable skills

Activation Keywords

kg research
knowledge graph workflow
paper analysis workflow
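学术研究知识图谱 (academic research knowledge graph)
KG研究流程 (KG research workflow)
知识图谱研究 (knowledge graph research)
automated literature review
研究自动化 (research automation)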

Tools Used

web_search: Search arxiv and other sources for papers
exec: Run Python scripts for KG operations
read: Read paper abstracts and skill templates
write: Create import scripts and skill files
sqlite3: Direct database operations via exec

Prerequisites

# Required files
- kg.db: SQLite knowledge graph database at /Users/hiyenwong/wiki/kg.db (symlink to workspace kg.db)
- kg_tool: Rust binary at scripts/kg_tool/target/release/kg_tool

# Weekly topics (for scheduled research)
- scripts/weekly_topics.py — outputs daily topic and keywords
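
A quick pre-flight check can confirm that the required files exist before any step runs. This is a minimal sketch; the two paths are the ones listed above, and check_prerequisites is a hypothetical helper name, not part of kg_tool.

```python
from pathlib import Path

# Paths taken from the Prerequisites list; adjust if your layout differs.
KG_DB = Path("/Users/hiyenwong/wiki/kg.db")
KG_TOOL = Path("/Users/hiyenwong/.openclaw/workspace/scripts/kg_tool/target/release/kg_tool")


def check_prerequisites(paths=(KG_DB, KG_TOOL)):
    """Return the required paths that do not exist (hypothetical helper)."""
    # Path.exists() follows symlinks, so a dangling kg.db symlink is reported as missing.
    return [Path(p) for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    for missing in check_prerequisites():
        print(f"missing: {missing}")
```

Because exists() follows symlinks, this also catches a kg.db symlink whose target has moved.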

Actual Schema & Tool Reference

See references/operational-notes.md for the current database schema, kg_tool commands, arxiv access patterns, and operational details. This file is kept up-to-date with each session's findings.

Usage Patterns

Pattern 1: Full Research Pipeline

Complete automated workflow from search to skill creation:

Run the KG research workflow: search arxiv for SNN papers, import them into the knowledge graph, generate embeddings, and extract skill patterns

Pattern 2: Paper Import Only

Import papers to KG without full analysis:

Import these papers into the knowledge graph: [paper list]

Pattern 3: KG Analysis Only

Run algorithms on existing KG data:

Analyze the knowledge graph: run PageRank and vector search to find related papers

Instructions for Agents

First: Read references/operational-notes.md for current DB schema, kg_tool commands, and access patterns.

Step 1: Get Today's Topic

cd /Users/hiyenwong/.openclaw/workspace && python3 scripts/weekly_topics.py

The output gives the weekday number, the topic name, and keywords for targeted search.

Step 2: Paper Acquisition

Search papers from multiple sources:

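# web_search is an agent tool call, not a Python library function.
# Use it for arxiv papers: the arxiv API rate-limits (429) and web_extract is blocked on arxiv.org.
keywords = ["quantum computing", "machine learning", "distributed systems"]  # e.g. from weekly_topics.py
results = {}
for kw in keywords:
    results[kw] = web_search(f"arxiv {kw} 2025", count=5)  # keep every keyword's results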

IMPORTANT (2026-05): web_extract blocks ALL arxiv URLs. Use browser_navigate + browser_snapshot to read paper abstracts from arxiv pages.
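
Because each keyword search returns its own list, the same paper often appears more than once. The sketch below merges per-keyword results and drops duplicates by URL, recording which keywords matched each paper (useful later as keyword relations). It assumes each result is a dict with at least a "url" key; the real web_search result shape may differ, and merge_results is a hypothetical helper.

```python
def merge_results(results_by_keyword):
    """Merge {keyword: [result, ...]} into one list; the first occurrence of a URL wins.

    Assumes each result is a dict with a "url" key (hypothetical shape).
    """
    merged = {}
    for kw, results in results_by_keyword.items():
        for r in results:
            # Drop fragments, query strings and trailing slashes so trivial URL variants collide.
            key = r["url"].split("#", 1)[0].split("?", 1)[0].rstrip("/")
            if key not in merged:
                merged[key] = {**r, "keywords": []}
            if kw not in merged[key]["keywords"]:
                merged[key]["keywords"].append(kw)
    # dicts keep insertion order, so papers come back in first-seen order
    return list(merged.values())
```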

Step 3: Import to KG

Two options:

Option A — kg_tool:

cd /Users/hiyenwong/.openclaw/workspace
scripts/kg_tool/target/release/kg_tool import-paper --title "Paper Title" --url "https://arxiv.org/abs/XXXX.XXXXX" --abstract "..." --authors "Name1, Name2"

Option B — Direct SQL (when more control is needed):

sqlite3 /Users/hiyenwong/wiki/kg.db "INSERT OR IGNORE INTO kg_entities (title, url, content, authors, published_date, category, source) VALUES ('Title', 'URL', 'abstract', 'Authors', 'date', 'category', 'arxiv');"
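
Building the SQL string by hand breaks as soon as an abstract contains a single quote. A sketch using Python's built-in sqlite3 module with ? placeholders avoids that. The column list mirrors the Option B statement above, which may not match the current schema (check references/operational-notes.md), and insert_paper is a hypothetical helper. INSERT OR IGNORE only skips duplicates when a UNIQUE constraint (for example on url) exists.

```python
import sqlite3

# Columns copied from the Option B statement; verify them against the live schema first.
INSERT_SQL = (
    "INSERT OR IGNORE INTO kg_entities "
    "(title, url, content, authors, published_date, category, source) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)"
)


def insert_paper(conn, paper):
    """Insert one paper dict; returns True if a new row was added (hypothetical helper)."""
    cur = conn.execute(
        INSERT_SQL,
        (
            paper["title"],
            paper["url"],
            paper["abstract"],  # stored in the content column
            paper.get("authors", ""),
            paper.get("published_date", ""),
            paper.get("category", ""),
            "arxiv",
        ),
    )
    # rowcount is 0 when OR IGNORE skipped the row because of a constraint conflict
    return cur.rowcount == 1
```

Open the connection with sqlite3.connect("/Users/hiyenwong/wiki/kg.db") and call conn.commit() (or use the connection as a context manager) so the insert is persisted.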

Step 4: Generate Embeddings

scripts/kg_tool/target/release/kg_tool generate-embeddings

Generates embeddings for any entities that do not yet have vectors. No parameters are needed.
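
To see how much work generate-embeddings has before running it, you can list entities that have no vector yet. This sketch assumes kg_vectors has an entity_id column referencing kg_entities.id; those column names are guesses, so confirm them in references/operational-notes.md.

```python
import sqlite3

# Hypothetical join columns: kg_vectors.entity_id -> kg_entities.id
MISSING_SQL = """
SELECT e.id
FROM kg_entities AS e
LEFT JOIN kg_vectors AS v ON v.entity_id = e.id
WHERE v.entity_id IS NULL
ORDER BY e.id
"""


def entities_missing_vectors(conn):
    """Return ids of entities with no row in kg_vectors (anti-join via LEFT JOIN ... IS NULL)."""
    return [row[0] for row in conn.execute(MISSING_SQL)]
```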

Step 5: Run Graph Algorithms

# PageRank - find important papers
scripts/kg_tool/target/release/kg_tool pagerank --limit 15

# Vector search
scripts/kg_tool/target/release/kg_tool search --query "quantum machine learning" --limit 10

# Community detection (Louvain)
scripts/kg_tool/target/release/kg_tool communities --limit 10

# Stats
scripts/kg_tool/target/release/kg_tool stats
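
For intuition about what the pagerank command ranks, here is plain power-iteration PageRank over weighted edges. It is an illustration only, not kg_tool's Rust implementation; the damping factor 0.85 and the 50 iterations are conventional defaults, not values taken from kg_tool.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration.

    edges: iterable of (source, target, weight) tuples.
    Returns {node: score}; the scores sum to 1.
    """
    edges = list(edges)  # iterated on every pass, so materialize generators once
    nodes = {v for s, t, _ in edges for v in (s, t)}
    if not nodes:
        return {}
    n = len(nodes)
    out_weight = {v: 0.0 for v in nodes}
    for s, _, w in edges:
        out_weight[s] += w
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no positive out-weight) is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if out_weight[v] <= 0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for s, t, w in edges:
            if out_weight[s] > 0:
                # each source passes its rank along out-edges in proportion to edge weight
                new[t] += damping * rank[s] * w / out_weight[s]
        rank = new
    return rank
```

To mirror --limit 15, sort the result: sorted(ranks.items(), key=lambda kv: kv[1], reverse=True)[:15].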

Step 6: Add Relationships

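sqlite3 /Users/hiyenwong/wiki/kg.db "INSERT OR IGNORE INTO kg_relationships (source_id, target_id, relationship_type, weight) VALUES (source_id, target_id, 'related_to', 0.9);"

Replace source_id and target_id in VALUES with the integer IDs of the two entities. The Louvain troubleshooting notes refer to this table as kg_relations; confirm the current table name in references/operational-notes.md.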
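
The Error Handling section notes that some relationship rows store weight as a BLOB rather than REAL. Binding ids as ints and weights as floats through placeholders keeps the column types consistent. This sketch reuses the kg_relationships table and columns from the command above; add_relationship is a hypothetical helper.

```python
import sqlite3

REL_SQL = (
    "INSERT OR IGNORE INTO kg_relationships "
    "(source_id, target_id, relationship_type, weight) VALUES (?, ?, ?, ?)"
)


def add_relationship(conn, source_id, target_id, rel_type="related_to", weight=0.9):
    """Insert one edge; int()/float() make SQLite store INTEGER ids and a REAL weight."""
    conn.execute(REL_SQL, (int(source_id), int(target_id), rel_type, float(weight)))
```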

Step 7: Pattern Analysis & Skill Creation

Analyze top papers from PageRank and vector search. Extract reusable patterns. Create skills using skill_manage(action='create') or write SKILL.md directly.

Step 8: Record Results

Save summary to memory/YYYY-MM-DD.md.
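
A small helper for this step appends a summary to the dated memory file. The memory/ directory and YYYY-MM-DD naming come from the step above; the function name and the append format are assumptions.

```python
from datetime import date
from pathlib import Path


def record_results(summary, memory_dir="memory", day=None):
    """Append summary to memory/YYYY-MM-DD.md and return the file path (hypothetical helper)."""
    day = day or date.today()
    path = Path(memory_dir) / f"{day.isoformat()}.md"  # isoformat() gives YYYY-MM-DD
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        # append rather than overwrite, so several sessions on one day accumulate
        f.write(summary.rstrip() + "\n\n")
    return path
```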

Database Schema

See references/operational-notes.md. The schema previously documented in this file was based on an older version and is no longer accurate; the operational notes file contains the current schema, verified against the running database.

Example Papers to Import

Typical research paper structure:

{
    "arxiv_id": "2603.27589",
    "title": "An Energy-Efficient Spiking Neural Network Architecture",
    "abstract": "Spiking Neural Networks offer energy-efficient alternative...",
    "category": "cs.NE",
    "keywords": ["spiking neural network", "energy-efficient", "SNN"]
}
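
Before importing, a record in this shape can be checked against the Best Practices guidance of 3 to 5 keywords per paper. The required-field list is inferred from the example above, only new-style arxiv IDs (YYMM.NNNN or YYMM.NNNNN) are accepted, and validate_paper is a hypothetical helper.

```python
import re

REQUIRED = ("arxiv_id", "title", "abstract", "category", "keywords")
# New-style arxiv identifiers with an optional version suffix; old-style IDs are rejected.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def validate_paper(paper):
    """Return a list of problems; an empty list means the record looks importable."""
    problems = [f"missing {k}" for k in REQUIRED if not paper.get(k)]
    if paper.get("arxiv_id") and not ARXIV_ID.match(paper["arxiv_id"]):
        problems.append(f"unrecognized arxiv_id {paper['arxiv_id']!r}")
    kws = paper.get("keywords") or []
    if kws and not 3 <= len(kws) <= 5:
        problems.append(f"expected 3-5 keywords, got {len(kws)}")
    return problems
```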

Error Handling

Critical Operational Pitfalls

See references/pitfalls.md for confirmed issues: arxiv API 429 rate limiting, web_extract blocked on arxiv.org, kg_tool DB path, and import command details.

Embedding Dimension Mismatch

If embeddings have different dimensions:
1. Check dimension with: SELECT dimension, COUNT(*) FROM kg_vectors GROUP BY dimension;
2. Regenerate all embeddings with a consistent dimension
3. Use scripts/regenerate_embeddings.py
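
Similarity search only makes sense between vectors of the same length, which is why mixed dimensions have to be fixed rather than ignored. Below is a minimal cosine-similarity top-k in plain Python that refuses to compare mismatched vectors. It is an illustration, not a description of what kg_tool search does internally; cosine and top_k are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity; raises ValueError on a dimension mismatch.

    The explicit length check matters because zip() would silently truncate the longer vector.
    """
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, vectors, k=10):
    """Rank {entity_id: vector} by cosine similarity to query, highest first."""
    scored = [(eid, cosine(query, v)) for eid, v in vectors.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]
```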

Louvain Algorithm Failure

If Louvain/community detection fails:
1. kg_tool v2.0 uses Union-Find connected components (not true Louvain)
2. Check the type of the weight column in kg_relations: some rows store BLOB data rather than REAL.
   kg_tool now handles this by converting non-float weights to 1.0.
3. Use the communities command as a fallback: kg_tool communities --limit 10
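
Items 1 and 2 above describe what the communities command does: connected components via Union-Find, with non-float weights treated as 1.0. The Python sketch below follows that description so results can be checked by hand; it is not kg_tool's source, and the min_weight threshold is a hypothetical knob added here so the weights have an effect.

```python
def coerce_weight(w):
    """Return w as a float; BLOBs (bytes) and other non-numeric values become 1.0."""
    if isinstance(w, (int, float)) and not isinstance(w, bool):
        return float(w)
    return 1.0


def connected_components(edges, min_weight=0.0):
    """Group nodes with union-find (path halving + union by size).

    edges: (source_id, target_id, weight) rows, e.g. from kg_relations.
    Edges whose coerced weight is below min_weight are skipped (hypothetical knob).
    Returns components sorted by size, largest first.
    """
    parent, size = {}, {}

    def find(x):
        parent.setdefault(x, x)
        size.setdefault(x, 1)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra  # attach the smaller tree under the larger one
        size[ra] += size[rb]

    for s, t, w in edges:
        find(s)
        find(t)  # register both endpoints even if the edge is filtered out
        if coerce_weight(w) >= min_weight:
            union(s, t)

    groups = {}
    for node in parent:
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values(), key=len, reverse=True)
```

Slicing the result with [:10] corresponds to --limit 10.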

Arxiv API Timeout

If arxiv API fails:
1. Use web_search instead of direct API
2. Search "arxiv [keyword] 2026"
3. Extract paper IDs from URLs
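
Item 3 above can be a single regex over the result URLs. This sketch handles new-style identifiers in /abs/ and /pdf/ links and strips the version suffix, so v1 and v2 of a paper collapse to one ID. Old-style IDs such as hep-th/9901001 are deliberately not matched, and extract_arxiv_ids is a hypothetical helper.

```python
import re

# Matches arxiv.org/abs/2603.27589, arxiv.org/pdf/2603.27589v2, arxiv.org/pdf/2603.27589.pdf
ARXIV_URL = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?")


def extract_arxiv_ids(urls):
    """Return unique arxiv IDs in first-seen order, with any version suffix removed."""
    seen = []
    for url in urls:
        m = ARXIV_URL.search(url)
        if m and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen
```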

Best Practices

Batch Import: Import multiple papers at once, not one-by-one
Consistent Dimensions: Always use the same embedding dimension (256)
Keyword Extraction: Include 3-5 keywords per paper for better search
Regular Stats: Run kg_tool stats after each import batch
Memory Update: Always record results in memory/YYYY-MM-DD.md
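
The Batch Import and Regular Stats practices combine naturally: insert a whole batch inside one transaction with executemany, then read back a row count. This is a sketch with Python's sqlite3; the kg_entities columns are a subset of the Step 3 statement and should be checked against the current schema, and import_batch is a hypothetical helper. kg_tool stats remains the authoritative summary.

```python
import sqlite3


def import_batch(conn, papers):
    """Insert many papers in a single transaction; returns (newly_inserted, total_rows)."""
    rows = [(p["title"], p["url"], p["abstract"], "arxiv") for p in papers]
    before = conn.execute("SELECT COUNT(*) FROM kg_entities").fetchone()[0]
    with conn:  # commits on success, rolls back the whole batch on error
        conn.executemany(
            "INSERT OR IGNORE INTO kg_entities (title, url, content, source) VALUES (?, ?, ?, ?)",
            rows,
        )
    total = conn.execute("SELECT COUNT(*) FROM kg_entities").fetchone()[0]
    return total - before, total
```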

Resources

kg_tool: /Users/hiyenwong/.openclaw/workspace/scripts/kg_tool/target/release/kg_tool (v2.0)
kg.db: /Users/hiyenwong/wiki/kg.db ← CORRECT PATH (not ~/.openclaw/workspace/kg.db)
skill-extractor: Use for pattern extraction
skill-creator: Use for skill creation

Related Skills

arxiv-search: For detailed arxiv searching
skill-extractor: Extract patterns from conversations
skill-creator: Create new skills
memory-retrieval: For storing research results

Notes

Empty tables removed (2026-05-04): kg_hyperedges, kg_hyperedge_entities, and kg_turboquant_cache all had 0 rows, so they were dropped and the database was compacted with VACUUM
Embeddings: Hash-based (SHA-256 seeded PRNG), deterministic but not semantic. For production, use sentence-transformers.
kg_tool v2.0: Full rewrite (previously a placeholder). Implements real PageRank, Union-Find communities, FTS search, and auto-embedding.
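
The note on hash-based embeddings can be made concrete. The sketch below seeds a PRNG from the SHA-256 digest of the text and draws a 256-dimensional unit vector (the dimension given under Best Practices). It approximates the described approach and will not reproduce kg_tool's actual vectors, since the PRNG, distribution, and normalization here are assumptions. It also shows why such vectors are not semantic: changing a single character yields an unrelated vector.

```python
import hashlib
import math
import random


def hash_embedding(text, dim=256):
    """Deterministic pseudo-embedding: same text gives the same vector, but no semantic meaning."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")
    rng = random.Random(seed)  # Python's Mersenne Twister, seeded from the digest
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]  # unit length, so a dot product equals cosine similarity
```

Switching to sentence-transformers would likely change the vector dimension, so existing vectors would need regenerating (see Embedding Dimension Mismatch).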