name: obsidian-course-vault
description: Build and maintain a semester-long Obsidian course vault with course overviews, lesson notes, concept pages, graph hubs, replay-sync trackers, and replay-to-note workflows. Use when Codex needs to manage course knowledge in Obsidian, especially when BUAA replay artifacts should become structured lesson notes rather than raw transcripts.
Obsidian Course Vault
Use this skill when the user wants long-term course notes in Obsidian.
The commands below assume the current directory is this skill's root. Otherwise, use the absolute path to `scripts/`.
Core Boundary
- Let scripts handle vault structure, replay sync, replay diagnosis, cache refresh, tracker maintenance, and file writes.
- Let the agent handle course alignment, concept confirmation, terminology correction, graph interpretation, and final note prose.
- Do not let deterministic seed notes masquerade as finished course notes.
Main Commands
Initialize the vault:
python scripts\init_obsidian_course_vault.py --obsidian-dir "<obsidian-install-dir>" --vault-dir "<vault-dir>"
Add a course:
python scripts\add_course.py --vault-dir "<vault-dir>" --course-name "<course-name>"
Maintain a course or sync BUAA replay state:
python scripts\maintain_obsidian_course.py --vault-dir "<vault-dir>" --course-name "<course-name>" --course-page-url "<coursedetail-url>" --replay-output-dir "<replay-output-dir>"
Course Identity And Placement
Before adding or maintaining a BUAA course in a vault:
- Resolve `course-name` from the extracted course title whenever possible. The normalized title is the course identity.
- If the resolved title already exists in the vault, reuse that course folder. The same title means the same course, even if the new `coursedetail` URL has a different `course_id`, lecturer, schedule, classroom, or `sub_id` range.
- If the title is missing or unreliable, use the `course_id` only as a provisional course name such as `course-136278`, or ask the user for the intended title before writing formal notes. Do not infer the course from existing vault folders, teacher names, class times, or old extraction directories.
- Interpret "use the previous vault" as selecting the vault root. It does not automatically select an existing course folder inside it.
- Keep each source `course_id` and URL in sync metadata so that multiple classroom sources can feed the same titled course without overwriting provenance.
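The identity rule can be sketched in Python. This is a minimal illustration, not the skill's actual code. `normalize_title` and `resolve_course_name` are hypothetical names, and the normalization (NFKC plus whitespace collapsing) is an assumed choice. Detecting an unreliable title is left to the caller, which passes `None` in that case.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    # Hypothetical normalization: NFKC (folds full-width forms), then collapse whitespace.
    return re.sub(r"\s+", " ", unicodedata.normalize("NFKC", title)).strip()


def resolve_course_name(extracted_title, course_id, existing_courses):
    """Return (course_name, is_provisional).

    Lecturer, schedule, and classroom are deliberately not parameters;
    course_id is used only for the provisional fallback name.
    """
    title = normalize_title(extracted_title) if extracted_title else ""
    if title:
        # Same title means same course: reuse the existing folder name if present.
        for existing in existing_courses:
            if normalize_title(existing) == title:
                return existing, False
        return title, False
    # Missing or unreliable title: provisional name only; confirm before formal notes.
    return f"course-{course_id}", True
```

Because lecturer, schedule, and classroom are absent from the signature, they cannot influence which course folder gets reused.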
Build or rebuild one replay note:
python scripts\maintain_obsidian_course.py --vault-dir "<vault-dir>" --course-name "<course-name>" --draft-replay-sub-ids "<sub-id>" --replay-note-mode "final-explained"
Required Replay Diagnosis
Before any replay note is built, scripts must compute one replay_diagnosis and route the lesson into exactly one of:
- `waiting_transcript`
- `partial_transcript`
- `transcript_only`
All downstream note generation should consume this diagnosis instead of recomputing route decisions independently.
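Below is a minimal sketch of computing the diagnosis once. It assumes the router only needs the transcript text and the covered duration. The three route names come from this skill. The thresholds, the `ReplayDiagnosis` type, and `diagnose_replay` are hypothetical and may not match the real scripts.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the real scripts may use different signals entirely.
NEAR_EMPTY_CHARS = 200
FULL_COVERAGE_RATIO = 0.9


@dataclass(frozen=True)
class ReplayDiagnosis:
    route: str  # "waiting_transcript", "partial_transcript", or "transcript_only"
    transcript_chars: int
    coverage_ratio: float


def diagnose_replay(transcript: str, covered_seconds: float, replay_seconds: float) -> ReplayDiagnosis:
    chars = len(transcript.strip())
    ratio = covered_seconds / replay_seconds if replay_seconds > 0 else 0.0
    if chars < NEAR_EMPTY_CHARS:
        route = "waiting_transcript"  # missing, empty, or near-empty transcript
    elif ratio < FULL_COVERAGE_RATIO:
        route = "partial_transcript"
    else:
        route = "transcript_only"
    return ReplayDiagnosis(route, chars, ratio)
```

Downstream steps would read `diagnosis.route` instead of re-deriving it. This keeps every consumer on the same decision.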
Recommended Replay Modes
Treat these as the normal user-facing modes:
- `final-lite`
- `final-explained`
The legacy `final` mode is treated as a semantic-packet preparation path. It is not permission to write final prose directly. Use `draft` only for placeholders or backlog notes.
In semantic modes, scripts must write only:
- `semantic_rebuild/semantic_rebuild_input.json`
- `semantic_rebuild/semantic_rebuild_prompt.md`
Do not write a seed lesson note into the vault by default. The agent must read that packet and produce the first user-visible lesson note only after semantic rebuild completes.
Before a rebuilt note is counted as a formal lesson page, run:
python scripts\validate_final_note.py "<lesson-note.md>"
If the validator fails, mark the note as needs_review / quality rejected. Do not include it in course trackers, overview completion counts, or graph growth.
Then create a reviewer packet:
python scripts\review_final_note.py --note "<lesson-note.md>" --semantic-input "<semantic_rebuild_input.json>" --output-dir "<review-dir>"
The reviewer packet directory must be outside the Obsidian vault. If --note is inside a vault and --output-dir is omitted, review_final_note.py writes to $HOME/.codex/course-vault-work/final_note_review/.... Use final_note_review_prompt.md with an independent reviewer agent only when the active system/developer instructions allow spawning one. If subagents are unavailable or not allowed, run a separate reviewer pass yourself with the same prompt, write the result as final_note_review_result.json, and do not edit the note during review.
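The rule that reviewer packets live outside the vault can be expressed as a small path check. This sketch assumes the vault root is already known. `choose_review_dir` is a hypothetical name. The per-note subfolder (the note's file stem) is also hypothetical, because the documented default path is elided after `final_note_review/`.

```python
from pathlib import Path


def choose_review_dir(note, vault_root, output_dir=None, home=None) -> Path:
    """Pick a reviewer-packet directory that is guaranteed to be outside the vault."""
    vault = Path(vault_root).resolve()
    if output_dir is None:
        base = Path(home) if home is not None else Path.home()
        # Documented default root; the per-note subfolder (file stem) is hypothetical.
        return base / ".codex" / "course-vault-work" / "final_note_review" / Path(note).stem
    out = Path(output_dir).resolve()
    if out == vault or vault in out.parents:
        raise ValueError(f"reviewer packets must live outside the vault: {out}")
    return out
```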
Efficient Batch Workflow
When the user explicitly asks to organize all pending BUAA replays for a course:
- Reuse the existing replay extraction directory and semantic packets when present.
- Build any missing semantic packets first. Do not rerun browser extraction for lessons that already have a current `transcript.txt` and `semantic_rebuild_input.json`.
- Filter candidates before writing: skip finalized lessons, skip future lessons, and keep lessons with missing, empty, or near-empty transcripts in waiting/backlog.
- Before writing each lesson, decide the admitted formal concept set from transcript-stable teaching objects and the existing course graph. Existing concept pages may be reused immediately. A genuinely new concept must be accepted for graph growth and have a planned concept page and hub placement before it can appear as a wiki link or a `concepts` frontmatter item.
- For each eligible lesson, read the full transcript, write the formal note using only the admitted concept set, run validation, create the review packet, and record a passing review for the current note hash.
- Materialize admitted new concepts during the same lesson/batch authoring pass, before the note is considered finalized. Do not let a finalized lesson contain concept links that are merely promises to be backfilled later.
- Run `maintain_obsidian_course.py` once after the batch to refresh the overview, trackers, backlog, and sync notes. Run it earlier only when you need a checkpoint or need it to create missing directories or packets.
- Do not regenerate concept pages from weak transcript-only hints during the batch. Defer graph growth to transcript-stable concepts and the normal maintenance pass.
This is a batching optimization, not a relaxation of the semantic gates.
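The candidate filter in the batch workflow might look like this. `ReplayCandidate`, `split_batch`, and the near-empty threshold are assumptions for illustration. Lessons routed to `waiting` stay in the backlog and are not written.

```python
from dataclasses import dataclass
from datetime import date

NEAR_EMPTY_CHARS = 200  # hypothetical threshold for a near-empty transcript


@dataclass
class ReplayCandidate:
    sub_id: str
    lesson_date: date
    finalized: bool
    transcript_chars: int


def split_batch(candidates, today: date):
    """Return (to_write, waiting) sub_ids; finalized and future lessons are skipped."""
    to_write, waiting = [], []
    for c in candidates:
        if c.finalized or c.lesson_date > today:
            continue
        if c.transcript_chars < NEAR_EMPTY_CHARS:
            waiting.append(c.sub_id)  # stays in waiting/backlog
        else:
            to_write.append(c.sub_id)
    return to_write, waiting
```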
Mandatory Semantic Workflow
For semantic modes:
- Build the semantic packet from replay artifacts plus recent course context.
- Run a course-alignment check before accepting the note.
- Decide the admitted concept set before drafting prose. Separate existing concept pages, admitted new concepts, and rejected/weak hints.
- Materialize admitted new concept pages and connect them to an existing or new hub before using them as wiki links or lesson frontmatter concepts.
- Rewrite the note semantically using only admitted concepts.
- Run `scripts\validate_final_note.py` on the rewritten Markdown.
- Generate a reviewer packet with `scripts\review_final_note.py` into a vault-external work directory.
- Run an independent reviewer pass against the current note hash.
- Mark the lesson as finished only after semantic rebuild completes, concept admission and materialization are consistent, the hard gate passes, and the reviewer returns `pass`.
- Only allow a formal lesson page when transcript coverage and transcript-based summary coverage both pass.
If semantic rebuild is still pending, do not count the lesson as finished in course trackers.
For a formal Obsidian replay note, the frontmatter must include at least:
- `type: lesson`
- `course`
- `title`
- `date`
- `replay_sub_id`
- `source: buaa-replay-semantic-rebuild`
- `replay_diagnosis`
- `has_semantic_rebuild_packet: true`
- `semantic_rebuild_completed: true`
- `semantic_rebuild_status: completed`
- `concepts`
Without these fields, maintenance may keep the lesson in pending semantic rebuild or exclude it from 已整理课次.md.
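Here is a rough pre-check for the required frontmatter. It assumes standard `---`-delimited frontmatter with simple `key: value` lines. The reader below is a deliberately minimal stand-in for a real YAML parser. `missing_formal_fields` is a hypothetical helper and is not part of the skill's scripts.

```python
REQUIRED_KEYS = [
    "type", "course", "title", "date", "replay_sub_id", "source", "replay_diagnosis",
    "has_semantic_rebuild_packet", "semantic_rebuild_completed", "semantic_rebuild_status", "concepts",
]
REQUIRED_VALUES = {
    "type": "lesson",
    "source": "buaa-replay-semantic-rebuild",
    "has_semantic_rebuild_packet": "true",
    "semantic_rebuild_completed": "true",
    "semantic_rebuild_status": "completed",
}


def read_frontmatter_keys(markdown: str) -> dict:
    """Read top-level `key: value` lines between the opening and closing `---`."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Skip indented lines and list items (e.g. the entries under `concepts:`).
        if line and not line[0].isspace() and not line.startswith("-") and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip('"')
    return fields


def missing_formal_fields(markdown: str) -> list:
    fields = read_frontmatter_keys(markdown)
    problems = [k for k in REQUIRED_KEYS if k not in fields]
    problems += [k for k, v in REQUIRED_VALUES.items() if k in fields and fields[k] != v]
    return problems
```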
Course Affairs Maintenance
Treat course affairs as a first-class output, not incidental prose.
When a transcript contains supported logistics, write them in the lesson note under ## 课程事务 with these categories when applicable:
- `### 作业` (homework)
- `### 考试` (exams)
- `### 课程安排` (course schedule)
- `### 通知` (notices)
Existing notes may use ## 课堂事务; maintenance treats it as the same rollup source. Prefer ## 课程事务 for newly written notes so the structure is explicit.
Only write transcript-supported affairs as confident bullets. Put uncertain due dates, weights, submission formats, exam scope, or policy details under `待核对` (to be verified) instead of promoting them to firm affairs.
During maintenance, scripts may refresh:
- course-internal `.course-internal/affairs-candidates.md`, from finished lesson notes
- course-level `事务.md`, only when it is still an unreviewed auto-generated placeholder
- vault-level `03-Admin/作业总表.md` and `03-Admin/考试与通知.md`, only after an agent affairs review has condensed the candidates
Course-level affairs are a reviewed digest, not a keyword dump. Keep only items that change what a student should do, check, submit, read, attend, or expect in assessment. Compress repeated or vague mentions into one short entry per date. Exclude ordinary teaching content, general encouragement, study advice without a concrete deliverable, concept-review suggestions, and broad course narration. 课程安排 may remain inside lesson notes for local context, but do not roll it up to course-level 事务.md unless it contains a concrete schedule/location/session change that belongs under 通知.
Do not let keyword extraction write final affairs directly. Use this flow: generate .course-internal/affairs-candidates.md; run an agent affairs review in the main agent or an allowed independent reviewer; then write concise reviewed entries into 事务.md and the Admin tables. Do not require human review, and do not overwrite an agent-reviewed 事务.md during routine maintenance.
Do not expose affairs candidates in user-facing notes. Do not link .course-internal/affairs-candidates.md from 事务.md, 00-课程总览.md, trackers, or Admin pages.
Agent affairs review must explicitly reject or merge:
- ordinary teaching content or concept-review suggestions
- general encouragement, learning methods, or motivational remarks
- repeated mentions of the same assignment/exam/notice
- vague “maybe useful for homework” notes without a concrete deliverable
- exam-like keyword hits caused only by words such as `分数` (score), `分类` (classification), or model scores
Do not roll up affairs from waiting-transcript notes, partial-transcript notes, quality-rejected notes, or notes still pending semantic rebuild. Placeholder sentences such as “当前未从转写中识别出稳定...” ("no stable ... identified from the transcript yet") are not affairs and must not appear in `事务.md` or the Admin tables.
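The rules for which notes can feed the affairs rollup, and the removal of placeholder lines, can be sketched as below. The sketch covers only these mechanical exclusions. Merging duplicates and rejecting teaching content still belong to the agent affairs review. `rollup_affair_lines` is hypothetical, and frontmatter values are assumed to be read as strings.

```python
PLACEHOLDER_PREFIXES = ("当前未从转写中识别出稳定",)  # placeholder wording quoted in the rule above
INELIGIBLE_ROUTES = {"waiting_transcript", "partial_transcript"}


def rollup_affair_lines(frontmatter: dict, affair_lines, quality_rejected: bool = False) -> list:
    """Return affair bullets that may feed 事务.md, or [] if the note is ineligible."""
    if quality_rejected or frontmatter.get("replay_diagnosis") in INELIGIBLE_ROUTES:
        return []
    if frontmatter.get("semantic_rebuild_completed") != "true":
        return []  # still pending semantic rebuild
    kept = []
    for line in affair_lines:
        text = line.lstrip("-* ").strip()
        if text and not text.startswith(PLACEHOLDER_PREFIXES):
            kept.append(text)
    return kept
```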
Final Note Quality Gate
Do not write or count a lesson as finished if the note is only a decorated transcript segment list. Reject the note and keep it as needs_review if it contains:
- raw ASR/OCR snippets presented as "代表性表达" or representative lines
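- headings such as `课堂讲解与主题推进 1` ("classroom explanation and topic progression 1")
- repeated generic advice like `整理时建议不要把这一段只当作...` ("when organizing, do not treat this segment merely as...")
- section bodies that could fit almost any course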
- misrecognized mathematical symbols copied into final prose without correction
- a course tracker or overview marking diagnostics or weak drafts as formal notes
For math-heavy courses, the final note must reconstruct concrete mathematical objects, assumptions, equations, proof ideas, examples, and their relationships. If the agent cannot do that from the transcript, write a review-gated draft rather than a formal lesson page.
The semantic packet must not contain user-facing seed prose such as seed_bullets, raw sample_lines, or transcript_excerpt. It may contain time windows and paths to the transcript; the agent must read the transcript itself and reconstruct the note semantically.
Reviewer Gate
Finalization requires both gates on the current Markdown bytes:
- `scripts\validate_final_note.py` passes.
- The independent reviewer returns `decision=pass`, `finalization_allowed=true`, and a `reviewed_note_sha256` equal to `note.sha256` in `final_note_review_input.json`.
If the note changes after either gate, both gate results are invalid and must be rerun. Do not update course overview, trackers, or graph growth from an outdated review.
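The hash binding between the note and both gates can be sketched as follows. The sketch assumes that `final_note_review_input.json` stores the digest at `note.sha256`, as described, and that both JSON files have already been parsed into dicts. `finalization_allowed` is a hypothetical name. `validator_passed` must come from running `validate_final_note.py` on the same bytes.

```python
import hashlib


def note_sha256(note_bytes: bytes) -> str:
    # Hash the exact Markdown bytes; any later edit changes the digest.
    return hashlib.sha256(note_bytes).hexdigest()


def finalization_allowed(note_bytes: bytes, review_input: dict, review_result: dict,
                         validator_passed: bool) -> bool:
    """review_input / review_result: parsed final_note_review_input.json / _result.json."""
    current = note_sha256(note_bytes)
    return (
        validator_passed  # must come from validate_final_note.py on these same bytes
        and review_input.get("note", {}).get("sha256") == current
        and review_result.get("decision") == "pass"
        and review_result.get("finalization_allowed") is True
        and review_result.get("reviewed_note_sha256") == current
    )
```

Editing the note after review changes `current`, so an old passing result no longer satisfies the check. This matches the rule that both gates must be rerun.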
Reviewer implementation detail:
- If subagents are permitted, use an independent reviewer agent.
- If subagents are not permitted by active instructions, run a separate reviewer pass in the main agent, write `final_note_review_result.json`, and ensure `reviewed_note_sha256` matches `final_note_review_input.json`.
- Do not rerun review for an unchanged note when an existing `final_note_review_result.json` already passes for the same hash.
Reviewer decisions:
- `pass`: the note faithfully covers the transcript, handles course-domain substance, preserves supported affairs and emphasis, and is safe to present as final.
- `needs_revision`: the transcript can support a final note, but the current note misses supported content, is too generic, or needs correction. Revise, rerun the hard gate, then rerun the reviewer.
- `reject`: the current source material or note is not fit for finalization. Keep the extraction artifacts and semantic packet; do not present a final note or grow the graph.
Absence is not failure. Missing homework, exam, grading, or deadline information is only a problem when the transcript contains evidence for it and the note omits, distorts, or invents it. If the transcript shows early dismissal, in-class exercise, student presentation, discussion, or a logistics-only class, the note may be short but must faithfully describe what happened.
Authoring Contract
When writing the formal Obsidian lesson note from a semantic packet:
- You are writing the finished note, not a seed note, diagnostic note, or instruction to a future organizer.
- Read the full `transcript.txt` before writing. Use `semantic_rebuild_input.json` only as metadata, time anchors, and an artifact index.
- Do not expose evidence snippets, candidate phrases, OCR fragments, raw ASR lines, or internal workflow notes.
- Before drafting the note body, form the admitted concept set. A concept may enter `concepts` frontmatter or visible wiki links only if it already has a finished page or is being materialized as a finished page in the same pass.
- Reject weak concept hints before writing: section labels, generic nouns, one-off examples, OCR/PPT-only terms, transcript noise, and concepts that cannot be explained from the lesson transcript plus course context.
- If a useful term is not yet strong enough for a concept page, mention it as plain text in the lesson rather than as a concept wiki link or frontmatter concept.
- Every major time block should explain what teaching move happened: definition, model, argument, proof, example, comparison, case discussion, policy explanation, teacher comment, assignment, exam arrangement, or class logistics.
- Capture high-value classroom signals: exams, homework, deadlines, submission format, grading weight, reading requirements, teacher-emphasized key points, repeatedly stressed phrases, formulas, theorems, definitions, examples, and common mistakes.
- If the teacher explicitly says something is important, likely to be tested, easy to confuse, often wrong, or needs review after class, preserve it in the note.
- If transcript evidence is weak, write the item under `待核对` (to be verified) instead of turning it into a confident conclusion.
- The final note must address the student reader directly. Avoid phrases such as “整理时应...” ("when organizing, one should..."), “后续重写...” ("rewrite later..."), “这一段主要在...” ("this segment is mainly about..."), and other process commentary.
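Concept admission and the matching link check can be sketched as set operations. `admit_concepts` and `unadmitted_links` are hypothetical helpers. The wiki-link pattern assumes Obsidian's `[[target|alias]]` and `[[target#heading]]` forms.

```python
import re

# Capture the link target before any "|alias" or "#heading" suffix.
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def admit_concepts(candidates, existing_pages, accepted_new):
    """Split candidate terms into (reuse, materialize, plain_text)."""
    reuse, materialize, plain_text = [], [], []
    for name in candidates:
        if name in existing_pages:
            reuse.append(name)        # existing finished page: link immediately
        elif name in accepted_new:
            materialize.append(name)  # write its concept page and hub link in this pass
        else:
            plain_text.append(name)   # weak hint: mention as plain text only
    return reuse, materialize, plain_text


def unadmitted_links(note_body, admitted):
    """Wiki-link targets in the note body that fall outside the admitted set."""
    return sorted({m.strip() for m in WIKI_LINK.findall(note_body)} - set(admitted))
```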
Course-domain reconstruction guidance:
- Math and statistics: reconstruct objects, definitions, assumptions, equations, theorems, proof ideas, examples, counterexamples, symbol meanings, and links between results.
- Engineering and computer science: reconstruct system components, algorithms, design constraints, implementation steps, experiment setup, failure cases, trade-offs, and how formulas or code relate to the design.
- Humanities and social sciences: reconstruct concepts, arguments, historical or institutional background, author positions, evidence, comparisons, cases, and the teacher's evaluative emphasis.
- Ideological and political courses: reconstruct policy concepts, theoretical claims, historical context, named documents or events, value judgments, exam-oriented formulations, and examples used to explain abstract claims.
- Language, writing, and communication courses: reconstruct vocabulary, rhetorical patterns, text structure, examples, correction points, practice requirements, and teacher feedback.
- Lab, design, or project courses: reconstruct task goals, deliverables, tools, operation steps, data requirements, safety or format constraints, grading criteria, and troubleshooting advice.
Course Alignment Rules
Before accepting a semantic rewrite, judge whether the replay interpretation is:
- `match`
- `weak_match`
- `mismatch`
Use at least:
- the declared course name
- recent lesson notes for the same course
- existing concept pages and chapter hubs
- the current replay transcript and semantic packet
Course alignment checks content fit, not administrative sameness. A different lecturer, weekday, section time, classroom, or course_id is not a mismatch when the confirmed course title is the same. When the title is unavailable, keep alignment provisional and avoid updating formal trackers until the course identity is confirmed.
If the result is mismatch, keep only seed artifacts or a draft and do not write a formal final lesson note.
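The alignment result gates two separate actions. In this hypothetical sketch, only `mismatch` blocks a formal note, and an unconfirmed title additionally blocks tracker updates. Treating `weak_match` like `match` is an assumption, since the rules above do not spell out its consequences.

```python
ALIGNMENT_RESULTS = {"match", "weak_match", "mismatch"}


def alignment_gates(alignment: str, title_confirmed: bool):
    """Return (may_write_formal_note, may_update_formal_trackers)."""
    if alignment not in ALIGNMENT_RESULTS:
        raise ValueError(f"unknown alignment result: {alignment!r}")
    if alignment == "mismatch":
        return False, False  # keep only seed artifacts or a draft
    # Assumption: weak_match is gated like match here.
    # Without a confirmed title, alignment stays provisional: no tracker updates.
    return True, title_confirmed
```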
Transcript-Only Rule
When replay_diagnosis=transcript_only:
- do not produce fake generic headings
- let scripts provide only time segments, representative transcript lines, and `transcript_overview` from the course transcript
- let the agent infer the real teaching structure from the course transcript plus course context
- do not ask scripts to pre-confirm transcript-only concepts or create concept pages from weak transcript hints
- do not let seed notes count as finished notes by default
PPT Rule
- Treat PPT as supplementary only.
- Do not require PPT to proceed with final rebuild.
- PPT may only help with term spelling, page or book titles, formula symbols, and logistics screenshots.
- PPT must not decide lesson structure, concept growth, or completion state.
Waiting and Partial Transcript Rules
- `waiting_transcript`: create only a waiting placeholder. Do not invent a summary.
- An empty or near-empty `transcript.txt` counts as waiting material, even if a tracker currently lists the replay under backlog instead of `waiting_transcript`.
- `partial_transcript`: create only a diagnostic draft. Do not treat it as a final lesson note.
- `needs_review`: create a review-gated note when the course transcript exists but the current transcript-based summary still leaves large uncovered ranges.
Upgraded Source Review
If the platform later adds stronger replay materials such as PPT streams, ppt_outline, or fuller transcripts, surface that lesson in 回放同步.md as a review candidate.
Semi-automatic rebuild:
python scripts\maintain_obsidian_course.py --vault-dir "<vault-dir>" --course-name "<course-name>" --rebuild-upgraded-replays --replay-note-mode "final-explained"
Protect lessons already marked as semantic rebuild completions from silent overwrite.
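One way to express the overwrite protection: upgraded sources make a lesson a review candidate, and a lesson already marked as a completed semantic rebuild is rebuilt only after an explicit decision. `plan_upgraded_replay` and `overwrite_confirmed` are hypothetical.

```python
def plan_upgraded_replay(frontmatter: dict, has_upgraded_sources: bool,
                         overwrite_confirmed: bool = False) -> str:
    """Return "skip", "review_candidate", or "rebuild" for one lesson."""
    if not has_upgraded_sources:
        return "skip"
    completed = frontmatter.get("semantic_rebuild_completed") == "true"
    if completed and not overwrite_confirmed:
        # Surface in 回放同步.md, but never overwrite a completed note silently.
        return "review_candidate"
    return "rebuild"
```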
Output Rules
- Public outputs are only finished products: `00-课程总览.md`, `事务.md`, `章节完成度.md`, `已整理课次.md`, `待回看问题.md`, `回放同步.md`, `待整理回放.md`, `03-Admin/*.md`, formal lesson notes, and formal concept pages.
- Internal artifacts are only for workflow use. Do not put reviewer packets, final-note review results, draft packets, or other diagnostic notes inside the Obsidian vault. `.course-internal/*` and `semantic_rebuild/*` are legacy/internal script artifacts only; avoid creating new user-visible workflow material in the vault.
- Never link internal artifacts from public outputs. User-facing placeholder text must also avoid process language such as "agent review", "candidate", or "semantic rebuild".
- Ensure Obsidian ignores legacy internal workflow paths such as `.course-internal`, `semantic_rebuild`, and `final_note_review`, but do not rely on ignore filters as the boundary. New `final_note_review` packets must live outside the vault.
- Keep concept links visible in the note body, not only in frontmatter.
- Keep visible time references in final lesson sections. Time references are a hard gate. Write replay-locatable timestamp ranges such as 时间参考:约 `03:29-18:58` ("time reference: approx.") or, after the first hour, 时间参考:约 `01:05:31-01:23:21`. Never write `01:20-01:39` to mean the 80th to 99th classroom minute; after one hour, use `HH:MM:SS`. Time ranges must be monotone in note order and should be long enough to represent a real major lesson section.
- Keep math as `$...$` or `$$...$$` only.
- Keep graph-growth rules internal to the semantic packet. Do not expose helper rule notes as vault content.
- Keep course pages concept-centric. Lesson pages support the graph; they should not become the graph itself.
- Grow concept pages from transcript-stable concepts only. Do not let PPT or OCR noise create concept pages.
- Do not create concept pages from low-quality transcript snippets, representative expressions, or generic section labels.
- Treat missing-page checks as a regression guard, not the main workflow. The main workflow is concept admission before authoring: no accepted formal concept may enter a lesson until its concept page either already exists or is being written in the same pass.
- Before declaring a course batch complete, still run a missing-page check. Collect the `concepts` of finalized lessons plus their visible concept wiki links, ignore course trackers and admin pages, and verify that each formal concept has a corresponding finished concept page under `02-Concepts/<course>/`.
- If the check finds missing pages, treat it as a process failure. Either remove the weak concept from the lesson, or write the course-supported concept card and hub connection before refreshing `章节完成度.md`.
- Keep concept pages substantive but compact: define the concept in this course's context, link prerequisite/related/contrasting concepts, list lesson references, and include only transcript-supported formulas or examples.
- Do not add a lesson to course trackers or graph growth if `validate_final_note.py` rejects it.
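The time-reference rule can be checked mechanically. This sketch formats ranges as `MM:SS` before the first hour and `HH:MM:SS` after it, and verifies that backticked ranges increase in note order. The helper names are hypothetical. The sketch assumes that a range ending past the first hour uses `HH:MM:SS` on both ends. It does not check minimum section length, because no threshold is specified.

```python
import re

# Backticked ranges: MM:SS-MM:SS, or HH:MM:SS-HH:MM:SS after the first hour.
RANGE = re.compile(r"`(\d{2}:\d{2}(?::\d{2})?)-(\d{2}:\d{2}(?::\d{2})?)`")


def fmt_ts(seconds: int, with_hours: bool) -> str:
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}" if with_hours else f"{m:02d}:{s:02d}"


def time_reference(start: int, end: int) -> str:
    # Assumption: a range that ends past the first hour uses HH:MM:SS on both sides.
    with_hours = end >= 3600
    return f"时间参考:约 `{fmt_ts(start, with_hours)}-{fmt_ts(end, with_hours)}`"


def parse_ts(text: str) -> int:
    parts = [int(p) for p in text.split(":")]
    if len(parts) == 2:
        m, s = parts
        if m >= 60:
            raise ValueError(f"{text}: use HH:MM:SS after the first hour")
        return m * 60 + s
    h, m, s = parts
    return h * 3600 + m * 60 + s


def ranges_monotone(note_text: str) -> bool:
    """True if every range is non-empty and none starts before the previous one ends."""
    last_end = -1
    for a, b in RANGE.findall(note_text):
        start, end = parse_ts(a), parse_ts(b)
        if end <= start or start < last_end:
            return False
        last_end = end
    return True
```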
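The missing-page regression check might be sketched like this. It assumes one `<name>.md` file per finished concept under `02-Concepts/<course>/`. Every wiki link that does not point to a tracker or a `03-Admin/` page is treated as a concept link, which is a simplification. All function names are hypothetical.

```python
import re
from pathlib import Path

WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")
# Public tracker pages from the output rules; links to them are not concept links.
TRACKER_PAGES = {"00-课程总览", "事务", "章节完成度", "已整理课次", "待回看问题", "回放同步", "待整理回放"}


def list_concept_pages(vault: Path, course: str) -> set:
    """Names of finished concept pages, assuming one <name>.md file per concept."""
    return {p.stem for p in (vault / "02-Concepts" / course).glob("*.md")}


def concept_link_targets(body: str) -> set:
    targets = set()
    for raw in WIKI_LINK.findall(body):
        target = raw.strip()
        if target.startswith("03-Admin/"):
            continue  # admin pages are not concept pages
        name = target.split("/")[-1]
        if name not in TRACKER_PAGES:
            targets.add(name)
    return targets


def missing_concept_pages(lessons: dict, existing_pages: set) -> dict:
    """lessons maps lesson name -> (frontmatter concepts, note body)."""
    missing = {}
    for lesson, (fm_concepts, body) in lessons.items():
        absent = sorted((set(fm_concepts) | concept_link_targets(body)) - existing_pages)
        if absent:
            missing[lesson] = absent
    return missing
```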
On Windows, prefer a UTF-8 shell when validating generated files. If needed, set [Console]::InputEncoding and [Console]::OutputEncoding to UTF-8 before manual Get-Content or other console inspection.
For inline Python in PowerShell, use:
@'
print("hello")
'@ | python -
Do not use Bash heredoc syntax such as python - <<'PY' in PowerShell.