study-index

star 26

整理课程材料为带目录索引的速查手册。适用于开卷考试复习资料、课程笔记整理、 知识收藏整理。输入多个 PPT/PDF/文档,输出结构化 Markdown(全部内容+索引)。 Use when the user mentions 开卷考试, 复习资料, 考试速查, 知识整理, 整理笔记, 资料汇总, 速查手册, or asks to organize scattered course materials into an indexed handbook for studying or open-book exams. Proactively suggest this skill when the user has multiple course files (PPTs, PDFs, documents) and needs to consolidate them for review or exam preparation.

Piaoxuemoli By Piaoxuemoli schedule Updated 5/5/2026

name: study-index description: > 整理课程材料为带目录索引的速查手册。适用于开卷考试复习资料、课程笔记整理、 知识收藏整理。输入多个 PPT/PDF/文档,输出结构化 Markdown(全部内容+索引)。 Use when the user mentions 开卷考试, 复习资料, 考试速查, 知识整理, 整理笔记, 资料汇总, 速查手册, or asks to organize scattered course materials into an indexed handbook for studying or open-book exams. Proactively suggest this skill when the user has multiple course files (PPTs, PDFs, documents) and needs to consolidate them for review or exam preparation.

Study Index Skill

将散乱的课程材料(PPT、PDF、文档、笔记)整理成一本包含全部原文内容的速查手册。

核心原则:不丢失信息——源材料中的所有文字内容必须完整保留(不总结、不删减), 关键图片、公式、图表必须提取并放到手册对应位置。

关键:使用脚本自动拼接所有文本内容,而不是让 AI 来"写"手册。AI 只负责生成大纲。

Architecture

User provides course materials
  -> Extract: pull text + images from each source file (extract_content.py)
  -> Filter: remove decorative images (filter_images.py)
  -> Organize: AI groups sources by chapter, writes outline (knowledge-organizer)
  -> Compile: script concatenates ALL text into handbook (compile_handbook.py)

Startup Layer

1. Intake

Infer from user input:

  • course_name: course or topic name
  • source_files: list of PPT/PDF/DOCX/MD/TXT files or directories
  • purpose: exam prep (default), review, reference
  • language: Chinese if prompt is Chinese, English if English

Default assumptions:

Missing item Default
Language Chinese if prompt is Chinese
Output format Markdown
Organization By chapter/topic (auto-detected from source structure)
Images Extract all, keep key images in final handbook

2. Initialize Workspace

python study-index/scripts/init_output_dir.py <course-name> \
  --source-files "<path1>|<path2>" \
  --delivery-formats "md"

Creates:

study-index/outputs/<course-name>/
├── 00_admin/
│   ├── study_context.json
│   └── extract_manifest.json
├── 01_extracted/          # text + images by source (all images)
│   ├── <source-1>/
│   │   ├── slide_01.txt
│   │   ├── slide_01_img_01.png
│   │   └── ...
│   └── <source-2>/
│       ├── page_01.txt
│       └── ...
├── 01_filtered/           # filtered images (decorative removed)
│   └── <source-1>/
│       └── slide_01_img_01.png
├── 02_outline/
│   └── outline.md
├── 04_final/
│   └── final_handbook.md  # complete handbook with ALL content + index
└── 06_qa/
    └── qa_report.md

3. Check Official Skills

Ensure document reading skills are available:

File type Skill
.pdf pdf
.doc, .docx docx
.ppt, .pptx pptx
.xls, .xlsx xlsx

Workflow

Step 1: Extract Content

Read agents/material-extractor.md.

python study-index/scripts/extract_content.py \
  --inputs "<source1>|<source2>|..." \
  --output "<output_dir>/01_extracted/" \
  --manifest "<output_dir>/00_admin/extract_manifest.json"

Step 2: Filter Images

python study-index/scripts/filter_images.py \
  --input-dir "<output_dir>/01_extracted/" \
  --output-dir "<output_dir>/01_filtered/" \
  --report "<output_dir>/00_admin/filter_report.json"

Step 3: Organize Knowledge (AI)

Read agents/knowledge-organizer.md.

  • Read extracted text to understand course structure
  • Group source directories by chapter/topic
  • Write 02_outline/outline.md with the mapping:
    ## Chapter 1: <title>
    - 来源: source_dir_1, source_dir_2
    

Step 4: Compile Handbook (Script)

python study-index/scripts/compile_handbook.py \
  --outline "<output_dir>/02_outline/outline.md" \
  --extracted "<output_dir>/01_extracted/" \
  --filtered "<output_dir>/01_filtered/" \
  --output "<output_dir>/04_final/final_handbook.md" \
  --course-name "<course name>"

This script:

  • Reads the outline to get chapter-to-source mapping
  • For each chapter, reads ALL text files from the source directories
  • Inserts corresponding images at the right positions
  • Adds table of contents and keyword index
  • Preserves every character from the source files — no summarization

Quality Bar

A good study handbook should:

  • Contain ALL text content from source materials — not a summary, but a compilation
  • Every source file's full text appears in the handbook
  • Include key images, formulas, and diagrams from the sources
  • Have a clear table of contents for quick navigation
  • Have a keyword index for quick lookup during exams
  • Test: A student reading only the handbook should have ≥ the information in the originals

Safety and Honesty

  • Do not fabricate content not present in source materials.
  • If a source file cannot be read, record the failure in extract_manifest.json and continue.
  • If an image cannot be extracted, note it in the output with a [图片提取失败] placeholder.

Dependencies

  • Python + python-pptx for PPTX text/image extraction
  • Python + PyMuPDF (fitz) for PDF text/image extraction
  • Python + python-docx for DOCX text/image extraction
  • Python + Pillow for image filtering
  • Official pdf, docx, pptx skills for reading files
Install via CLI
npx skills add https://github.com/Piaoxuemoli/Auto-college --skill study-index
Repository Details
star Stars 26
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator