project-learner - SKILL.md Agent Skill

name: project-learner description: "Interactive project learning coach via interview-style Q&A. Reads codebase and docs, dynamically generates interview questions per knowledge domain and sub-topic, conducts up to 4 follow-up rounds, scores answers, provides learning guidance with code/doc references, and persists progress. 10 domains × 3-5 sub-topics = 45 knowledge points for comprehensive interview coverage. Use when user says '学习项目', '了解项目', '检验项目', '项目学习', '面试准备', 'learn project', 'study project', 'review project', 'interview prep', 'knowledge check', or wants to understand/master the project through guided Q&A."

Project Learner

Interactive interview-coach that helps users master this project through guided Q&A.

All user-facing interaction in 中文. Internal instructions in English.

Pipeline Overview

Discovery → Check History → User Intent → Select Domain → Select Sub-topic
→ Generate Question → Interactive Q&A (≤4 follow-ups) → Evaluate
→ Learning Guide → Persist Progress → Continue or End

Phase 1: Project Discovery

Autonomously build project understanding. Do NOT ask user anything yet.

Read DEV_SPEC.md — project goals, architecture, tech stack, module design
Read config/settings.yaml — configuration system
List src/ directory tree — module structure (core/, ingestion/, libs/, mcp_server/, observability/)
Read key entry points: main.py, scripts/ingest.py, scripts/query.py
List tests/ — testing strategy overview

Build an internal mental model covering these 10 Knowledge Domains, each containing 3-5 Sub-topics (知识点), totaling 45 interview knowledge points:

Domain & Sub-topic Map

ID	知识域 / 知识点	Key Code Areas
D1	RAG Pipeline 整体架构
D1.1	端到端数据流：从文档上传到生成回答的完整链路	`DEV_SPEC.md`, `main.py`, `scripts/`
D1.2	三层架构设计：core/ingestion/libs 各层职责与依赖方向	`src/core/`, `src/ingestion/`, `src/libs/`
D1.3	Pipeline 组装：配置驱动的组件组合机制	`main.py`, `config/settings.yaml`, `src/core/settings.py`
D1.4	核心数据类型：Document、Chunk、QueryResult 等类型系统	`src/core/types.py`
D1.5	入口脚本设计：CLI 脚本的职责划分与参数传递	`scripts/ingest.py`, `scripts/query.py`, `scripts/evaluate.py`
D2	Ingestion Pipeline
D2.1	Pipeline 整体流程：从文档加载到向量存储的阶段设计	`src/ingestion/pipeline.py`
D2.2	Chunking 策略：RecursiveSplitter 的分割逻辑与参数调优	`src/ingestion/chunking/`, `src/libs/splitter/`
D2.3	Transform 链：ChunkRefiner、MetadataEnricher 的职责与执行顺序	`src/ingestion/transform/`
D2.4	Embedding 编码：Dense/Sparse 双编码与 BatchProcessor 批处理	`src/ingestion/embedding/`
D2.5	存储层：VectorUpserter、BM25Indexer、ImageStorage 三类存储协同	`src/ingestion/storage/`
D3	Hybrid Search & Retrieval
D3.1	Dense Retrieval：向量检索原理与 DenseRetriever 实现	`src/core/query_engine/dense_retriever.py`
D3.2	Sparse Retrieval：BM25 稀疏检索与 SparseRetriever 实现	`src/core/query_engine/sparse_retriever.py`
D3.3	Hybrid Search 融合：RRF 算法与 Fusion 模块设计	`src/core/query_engine/hybrid_search.py`, `fusion.py`
D3.4	QueryProcessor：查询预处理与查询扩展机制	`src/core/query_engine/query_processor.py`
D3.5	Response 构建：ResponseBuilder、CitationGenerator、MultimodalAssembler	`src/core/response/`
D4	Rerank 机制
D4.1	Reranker 抽象与工厂模式：BaseReranker 与 RerankerFactory 设计	`src/libs/reranker/base_reranker.py`, `reranker_factory.py`
D4.2	CrossEncoder Reranker：模型原理与实现细节	`src/libs/reranker/cross_encoder_reranker.py`
D4.3	LLM Reranker：基于大语言模型的重排序方案与 Prompt 设计	`src/libs/reranker/llm_reranker.py`
D4.4	Rerank 在检索 Pipeline 中的集成位置与效果分析	`src/core/query_engine/reranker.py`
D5	MCP Server 协议
D5.1	MCP 协议概述：JSON-RPC 交互模型与标准规范	`src/mcp_server/server.py`
D5.2	Tool 注册机制：三个工具的定义、参数与执行逻辑	`src/mcp_server/tools/`
D5.3	ProtocolHandler：请求路由、分发与能力协商	`src/mcp_server/protocol_handler.py`
D5.4	Server 生命周期管理与异常处理	`src/mcp_server/server.py`, `protocol_handler.py`
D6	可插拔架构 & 配置系统
D6.1	工厂模式全景：LLM/Embedding/Reranker/VectorStore/Evaluator 五大工厂	`src/libs//factory.py`
D6.2	settings.yaml 配置结构与 Settings 类加载机制	`config/settings.yaml`, `src/core/settings.py`
D6.3	LLM Provider 多厂商支持：Azure/OpenAI/DeepSeek/Ollama 切换逻辑	`src/libs/llm/`
D6.4	Embedding Provider 抽象：多后端实现对比与选型策略	`src/libs/embedding/`
D6.5	Base 类设计哲学：接口抽象、继承层次与扩展点	`src/libs//base_.py`
D7	多模态处理
D7.1	PDF 解析：PDFLoader 实现与 FileIntegrity 文件校验	`src/libs/loader/`
D7.2	Vision LLM：Azure/OpenAI Vision 图片理解能力集成	`src/libs/llm/azure_vision_llm.py`, `openai_vision_llm.py`
D7.3	ImageCaptioner：图片描述生成流程与 Prompt 模板设计	`src/ingestion/transform/image_captioner.py`, `config/prompts/`
D7.4	多模态 Chunk 存储与检索：ImageStorage 与 MultimodalAssembler 协同	`src/ingestion/storage/image_storage.py`, `src/core/response/multimodal_assembler.py`
D8	可观测性 & 评估体系
D8.1	Trace 系统：TraceCollector 与 TraceContext 的采集与关联设计	`src/core/trace/`
D8.2	Dashboard 架构：Streamlit App 分页、Services 层数据流	`src/observability/dashboard/`
D8.3	评估指标体系：Recall、Precision、MRR 等核心指标定义与计算	`src/observability/evaluation/`, `scripts/evaluate.py`
D8.4	评估框架：CompositeEvaluator、CustomEvaluator、RAGAS 集成架构	`src/libs/evaluator/`, `src/observability/evaluation/`
D8.5	日志系统：Logger 设计、日志分级与调试支持	`src/observability/logger.py`
D9	测试策略 & 工程化
D9.1	测试分层策略：Unit/Integration/E2E 各层覆盖范围与边界	`tests/unit/`, `tests/integration/`, `tests/e2e/`
D9.2	Test Fixtures 与 conftest.py：Mock 策略与测试数据管理	`tests/conftest.py`, `tests/fixtures/`
D9.3	pyproject.toml 工程配置：依赖管理、构建配置、工具链集成	`pyproject.toml`
D9.4	脚本入口设计：四大脚本的职责边界与参数化设计	`scripts/`
D10	Document Manager & 幂等性
D10.1	文档去重：Hash 计算与重复检测机制	`src/ingestion/document_manager.py`, `src/libs/loader/file_integrity.py`
D10.2	增量 Ingestion：幂等性保证与文档更新策略	`src/ingestion/document_manager.py`, `pipeline.py`
D10.3	Collection 管理：集合元数据关联与文档生命周期	`src/ingestion/document_manager.py`
D10.4	文档状态追踪：已入库/待更新/已删除的状态流转	`src/ingestion/document_manager.py`

Total: 10 domains × 3-5 sub-topics = 45 knowledge points Each sub-topic can be studied multiple times with different questions, providing 100+ possible interview questions.

Phase 2: Check Learning History

Try reading .github/skills/project-learner/references/LEARNING_PROGRESS.md
File missing → first-time learner, proceed to Phase 3
File exists → parse BOTH tables:
- Domain Summary: which domains are ⬜/🔴/🔶/✅
- Sub-topic Progress: which sub-topics are ⬜ (unlearned), 🔴 (weak ≤3), 🔶 (learning 4-6), ✅ (mastered ≥7)
- Count: total sub-topics mastered / 45
- Identify lowest-scoring sub-topics for review recommendation

Phase 3: User Intent

Use ask_questions (中文) to determine what the user wants:

Question 1 — 学习模式 (single-select):

Option	Description
🆕 学习新知识点	Pick from unlearned/weak sub-topics
📖 复习已学内容	Review previously learned low-score sub-topics
📋 查看学习进度	Display progress table, then end
🎯 Agent 推荐	Auto-pick the best next sub-topic to study

If user picks 📋 → display the full progress table from LEARNING_PROGRESS.md and stop.

If user picks 🎯 → Agent auto-selects the optimal sub-topic (prioritize: ⬜ unlearned in weakest domain → 🔴 weak → 🔶 lowest score). Skip Question 2 & 3, go directly to Phase 4.

Question 2 — 知识域选择 (single-select, only for 🆕 or 📖):

List all 10 domains with current status + completion rate. Example format:

D1 RAG Pipeline 整体架构 [2/5 ✅] 🔶
D2 Ingestion Pipeline [0/5 ✅] ⬜

For 📖 mode: only show domains with previous scores. For 🆕 mode: prioritize domains with most ⬜ sub-topics.

Question 3 — 知识点选择 (single-select, only after Question 2):

List all sub-topics under the selected domain with their status:

D2.1 Pipeline 整体流程 ⬜ 未学习
D2.2 Chunking 策略 🔶 6/10
D2.3 Transform 链 ✅ 8/10

Include option:

🎯 Agent 推荐 — auto-pick the weakest/unlearned sub-topic in this domain

Phase 4: Generate Interview Question

Based on the selected sub-topic (not just domain):

Deep-read the sub-topic's specific source code — read actual class definitions, key functions, config sections listed in the Sub-topic Map
Dynamically generate ONE main interview question (中文) grounded in this sub-topic's real code
Internally prepare up to 4 progressive follow-up questions (do NOT show these yet)
Avoid repeating questions from previous sessions — check Detailed History for this sub-topic and generate a different angle

Question Design Principles

Questions MUST reference real code/architecture from THIS project, never generic
Questions should be specific to the sub-topic, not the whole domain
Difficulty progression for follow-ups:
- Follow-up 1: "为什么这样设计？" (design rationale)
- Follow-up 2: "和替代方案对比有什么优劣？" (trade-offs)
- Follow-up 3: "边界条件/异常情况怎么处理？" (edge cases)
- Follow-up 4: "如果让你重新设计，会怎么做？" (redesign thinking)
Adjust follow-ups dynamically based on what the user actually answers

Question Angle Variety

Each sub-topic can be asked from multiple angles. When a sub-topic is revisited, pick a DIFFERENT angle:

What: 描述这个模块/机制做了什么
How: 具体实现细节，代码层面怎么做的
Why: 为什么选择这种设计方案
Compare: 和替代方案的对比
Debug: 如果出了问题怎么排查
Extend: 如果要扩展功能怎么做

Question Format

Present to user:

## 🎯 面试问题

**知识域**: [Domain Name] > **知识点**: [Sub-topic Name]

**面试官问**: [Question text — specific to this sub-topic, referencing project components]

请回答：

Phase 5: Interactive Q&A (≤4 Follow-up Rounds)

Round 0: Main question → User answers
Round 1-4: Brief feedback on previous answer + follow-up question → User answers
Early exit: User says "结束"/"pass"/"跳过" OR answer is sufficiently comprehensive

Per-Round Behavior

Acknowledge what the user got right (1-2 sentences, 中文)
Hint at what was missed without giving away the answer (1 sentence)
Ask follow-up that digs deeper based on their answer direction

Follow-up Output Format

### 第 N 轮追问

✅ **答得好**: [What they got right]
💡 **提示**: [What they could explore further]

**追问**: [Follow-up question]

If user's answer already covers the planned follow-up, skip to a harder one or end early.

Phase 6: Evaluation

After Q&A ends, output a structured evaluation report (中文):

## 📊 评价报告

**知识域**: [Domain] > **知识点**: [Sub-topic ID & Name] — [Question summary]
**追问轮数**: N/4

### ✅ 回答亮点
- [Strength 1 — specific to what they said]
- [Strength 2]

### ⚠️ 需要加强
- [Gap 1 — what was missed or inaccurate]
- [Gap 2]

### 📈 评分明细

| 维度 | 分数 | 说明 |
|------|------|------|
| 准确性 | X/10 | [Factual correctness of answers] |
| 深度 | X/10 | [How deep they went beyond surface] |
| 代码关联 | X/10 | [Did they reference actual code/config] |
| 设计思维 | X/10 | [Trade-off analysis, architecture reasoning] |

### 🏆 综合评分: X/10

### 📊 学习进度: [mastered count]/45 知识点已掌握

Scoring rules:

Average of 4 dimensions, rounded to nearest 0.5
9-10: Expert level, can explain design decisions and trade-offs
7-8: Solid understanding, knows how and why
4-6: Basic understanding, knows what but not deep why
1-3: Surface level, needs significant study

Phase 7: Learning Guide

Immediately after evaluation, provide targeted study resources (中文):

## 📚 学习指南

### 📂 相关代码
- [file_path](file_path#LX-LY) — 说明这段代码的作用和关键逻辑

### 📄 相关文档
- [DEV_SPEC.md 对应章节](DEV_SPEC.md) — 设计原理
- [config/settings.yaml](config/settings.yaml) — 相关配置项

### 🔗 参考资料
- [External concept name] — 1-sentence explanation of relevance

### 💡 建议学习路径
1. 先阅读 [file] 理解 [what]
2. 再看 [file] 掌握 [implementation detail]
3. 运行 `[command]` 实际体验效果
4. 尝试修改 [config/code] 观察变化

Guidelines:

Code references MUST use actual file paths with line numbers where relevant
Only recommend reading 3-5 key files, not entire codebase
Include at least one hands-on command the user can run
External references only for concepts not explained in the codebase (e.g., RRF algorithm, BM25)

Phase 8: Persist Progress

Update .github/skills/project-learner/references/LEARNING_PROGRESS.md.

If file doesn't exist, create it from the template in references/LEARNING_PROGRESS.md. If it exists, update it.

Update Rules

Append one row to the Detailed History table (include Sub-topic ID)
Update the Sub-topic Progress table for the affected sub-topic:
- 已学 = count of sessions for that sub-topic
- 最高分 = max score across all sessions for this sub-topic
- 最近分 = score from this session
- Status: ≥7 → ✅ 掌握, 4-6 → 🔶 学习中, ≤3 → 🔴 薄弱, 0 sessions → ⬜ 未学习
Recalculate the Domain Summary table:
- 已掌握 = count of ✅ sub-topics in that domain / total sub-topics in domain
- 已学习 = count of non-⬜ sub-topics / total sub-topics
- 平均分 = average score of all studied sub-topics in domain
- Domain status: all sub-topics ✅ → ✅ 掌握, any studied → 🔶 学习中 or 🔴 薄弱 (based on avg), none → ⬜ 未学习
Update the Last updated timestamp
Update the session counter # (auto-increment)
Update the overall progress line: 总进度: X/45 知识点已掌握

Phase 9: Continue or End

After persisting, ask the user (中文):

Option	Action
🔄 继续学习下一个知识点	Loop back to Phase 3
🎯 Agent 推荐下一个	Auto-pick optimal next sub-topic, go to Phase 4
📋 查看当前学习进度	Display full progress table
🏁 结束本次学习	Show session summary, stop

Session Summary (on 🏁 end)

## 📝 本次学习总结

- 完成知识点: N 个
- 平均得分: X/10
- 最强知识点: [sub-topic] (X/10)
- 需加强知识点: [sub-topic] (X/10)
- 总进度: X/45 知识点已掌握 (XX%)

继续加油！下次建议学习: [recommended sub-topic name]

Key Paths

File	Purpose
`.github/skills/project-learner/references/LEARNING_PROGRESS.md`	Persistent learning state (45 sub-topics)
`DEV_SPEC.md`	Project specification & architecture
`config/settings.yaml`	Configuration reference
`src/`	All source code modules
`tests/`	Test suite for understanding test strategy
`scripts/`	CLI entry points (ingest/query/evaluate)