name: data-science-post
description: Create a notebook-style data science blog post for carlos-mendez.org. Use when the user wants to write a tutorial in Python, Stata, or R — covering data science, econometrics, causal inference, or statistical methods. Produces a Hugo page bundle with case-study framing, sandwich-pattern code blocks, figures, and interpretation paragraphs. Confirms topic scope and design choices before writing.
argument-hint: " dataset: [references: <URLs, papers, or notes>]"
disable-model-invocation: true
user-invocable: true
Data Science Post: Notebook-Style Blog Post Generator
Create a self-contained, pedagogical data science blog post framed as a case study with a clear motivating problem. The user specifies the topic, dataset, and (optionally) reference materials such as library documentation URLs. The skill produces a Hugo blog post with conceptual explanations before every code block and interpretation of results after every code block.
The post inherits notebook-style CSS styling (teal-accented code blocks, blue
headings, styled tables, left-side TOC) from assets/scss/custom.scss.
Supported languages: Python (default), Stata, R. The language determines code fence tags, companion deliverables, and output verification steps.
Example invocations
/project:data-science-post double machine learning dataset: DS4Bolivia references: https://docs.doubleml.org/stable/intro/intro.html
/project:data-science-post k-means clustering dataset: https://archive.ics.uci.edu/ml/datasets/Iris
/project:data-science-post spatial regression dataset: PySAL example data references: https://pysal.org/spreg/
/project:data-science-post RCT evaluation with panel data in Stata dataset: dataSIM4RCT.dta references: causal.pdf
Site color palette
These colors are used throughout the site. Use them consistently in matplotlib plots; the CSS styling below references them too.
| Name | Hex | Use in plots |
|---|---|---|
| Steel blue | #6a9bcc | Primary data (bars, scatter, histograms) |
| Warm orange | #d97757 | Reference lines, secondary series |
| Near black | #141413 | Tertiary elements, text annotations |
| Teal | #00d4c8 | Highlights (use sparingly) |
| Heading blue | #1a3a8a | CSS only -- headings, titles |
Dark theme palette
For posts that use dark-background figures (e.g., to match the site's dark
navbar/footer), use this extended palette for figure backgrounds, grid lines,
and text. Reference post: content/post/python_fwl/script.py.
| Name | Hex | Use in plots |
|---|---|---|
| Dark navy | #0f1729 | Figure + axes background (facecolor) |
| Grid line | #1f2b5e | Grid lines (subtle contrast on dark bg) |
| Light text | #c8d0e0 | Axis labels, tick labels, legend text |
| White text | #e8ecf2 | Titles, bold annotations |
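As a minimal sketch, the two palettes can be kept as Python constants so that every plot pulls colors from one place. The hex values are copied from the tables above; the dictionary names and the mapping onto matplotlib rcParams keys are assumptions for illustration, not the setup documented in references/figure-conventions.md.

```python
# Light-theme colors, copied from the site palette table.
SITE_COLORS = {
    "steel_blue": "#6a9bcc",   # primary data
    "warm_orange": "#d97757",  # reference lines, secondary series
    "near_black": "#141413",   # tertiary elements, text annotations
    "teal": "#00d4c8",         # highlights (use sparingly)
}

# Dark-theme colors, copied from the dark palette table.
DARK_COLORS = {
    "dark_navy": "#0f1729",
    "grid_line": "#1f2b5e",
    "light_text": "#c8d0e0",
    "white_text": "#e8ecf2",
}

# Assumed mapping of the dark palette onto matplotlib rcParams keys.
DARK_RC = {
    "figure.facecolor": DARK_COLORS["dark_navy"],
    "axes.facecolor": DARK_COLORS["dark_navy"],
    "savefig.facecolor": DARK_COLORS["dark_navy"],
    "grid.color": DARK_COLORS["grid_line"],
    "axes.labelcolor": DARK_COLORS["light_text"],
    "xtick.color": DARK_COLORS["light_text"],
    "ytick.color": DARK_COLORS["light_text"],
    "text.color": DARK_COLORS["light_text"],
    "axes.titlecolor": DARK_COLORS["white_text"],
}
```

In a plotting script, the dark settings would typically be applied once with plt.rcParams.update(DARK_RC) after importing matplotlib.pyplot as plt.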
Deliverables
Python posts
| Output | Path |
|---|---|
| Blog post | content/post/python_<topic-slug>/index.md |
| Python script (optional) | content/post/python_<topic-slug>/script.py |
| Jupyter notebook (optional) | content/post/python_<topic-slug>/notebook.ipynb |
| Figures (>= 3) | content/post/python_<topic-slug>/<slug>_*.png |
Stata posts
| Output | Path |
|---|---|
| Blog post | content/post/stata_<topic-slug>/index.md |
| Stata do-file | content/post/stata_<topic-slug>/analysis.do |
| Stata log | content/post/stata_<topic-slug>/analysis.log |
| Figures | content/post/stata_<topic-slug>/stata_<slug>_*.png |
Stata-specific conventions:
- Code blocks use the `stata` fence tag (not `python`)
- `categories: [Stata]` in front matter (not `[Python]`)
- Figures are generated by running Stata commands and exported via `graph export`
- The do-file uses `preserve`/`restore` to manage data scope; the post may reload data per section for self-contained readability
- Use the `capture` prefix on `ssc install` and `net install` commands
- The Stata log (`analysis.log`) is linked in the front matter `links:` section so readers can download it
- `featured.png` or `featured.webp` is added manually by the user
Pre-flight
- Parse `$ARGUMENTS`:
  - Topic -- everything before `dataset:`
  - Dataset -- everything between `dataset:` and `references:` (or end). Can be a URL, a dataset name, or a description
  - References -- everything after `references:` (optional). Can be URLs, paper titles, or filenames of PDFs/documents in the post folder
  - Topic slug -- lowercase, underscores (e.g., "double machine learning" -> `doubleml`)
  - Language -- infer from topic, dataset extension, or explicit mention. Default: Python. Use Stata if topic mentions Stata, `.do` files, or `.dta` datasets. Use R if topic mentions R or `.R` files
- Fetch reference URLs -- use WebFetch to read each URL and understand the library's API, key classes/functions, and recommended usage patterns. Critical for producing accurate, idiomatic code
- Fetch dataset information -- if the dataset is a URL, use WebFetch to understand its structure. If it's a named dataset, look up the standard loading pattern. Read `references/data-sources.md` for data loading patterns (URL download with cache, named datasets, DS4Bolivia joins, simulated DGP for method tutorials, user-described data)
- Check for pre-existing materials -- if the post folder already exists, check for reference materials (markdown notes, PDFs, datasets, code files). These are the foundation for the tutorial. Read them to understand the intended content and scope
- Identify dependencies -- determine which packages the topic requires:
  - Python: note `pip install` commands in the setup code block
  - Stata: note `ssc install` / `net install` commands with the `capture` prefix
- Read reference post -- read an existing post in the same language to confirm current conventions:
  - Python: `content/post/python_ml_random_forest/index.md`
  - Stata: `content/post/stata_rct/index.md`
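The argument-parsing and language-inference rules above can be sketched as follows. This is an illustrative reading of those rules, not the skill's actual parser: the function names and the regular expressions are assumptions. The topic slug is left out because the example (`doubleml`) is a hand-picked abbreviation rather than a mechanical transformation.

```python
import re


def parse_arguments(raw):
    """Split '<topic> dataset: <dataset> [references: <refs>]' into its parts."""
    topic, _, rest = raw.partition("dataset:")
    dataset, _, references = rest.partition("references:")
    return {
        "topic": topic.strip(),
        "dataset": dataset.strip(),
        "references": references.strip() or None,  # references are optional
    }


def infer_language(topic, dataset):
    """Default to Python; switch on Stata or R cues in the topic or dataset."""
    text = f"{topic} {dataset}"
    if re.search(r"\bstata\b|\.do\b|\.dta\b", text, flags=re.IGNORECASE):
        return "Stata"
    if re.search(r"\bR\b|\.R\b", text):  # case-sensitive: a standalone "R" or ".R"
        return "R"
    return "Python"


args = parse_arguments(
    "RCT evaluation with panel data in Stata "
    "dataset: dataSIM4RCT.dta references: causal.pdf"
)
language = infer_language(args["topic"], args["dataset"])
```

For the Stata invocation shown in the example list, this yields the dataset `dataSIM4RCT.dta`, the reference `causal.pdf`, and the language Stata. An explicit language mention by the user should still override the inferred value.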
Handling PDF reference materials
PDFs (academic papers, software manuals) are common reference inputs but can be very large. Mishandling them will exhaust the context window. Follow these rules strictly:
Never read an entire PDF into the main conversation. Large PDFs (>50 pages) can consume 50,000+ tokens of context. Instead, delegate PDF reading to an Agent (subagent), which has its own context window.
Targeted extraction only. When reading a PDF (either directly or via agent), extract only the specific sections needed:
- Use the table of contents / outline to find relevant page numbers first
- Read only those pages (typically 5--15 pages, not the full document)
- Search for specific keywords (e.g., "ATE estimator", "doubly robust") to locate the right pages
Preferred workflow for PDFs:
Step 1: Extract PDF outline/TOC (pypdf, ~10 lines of code)
Step 2: Search for keyword matches across pages (~20 lines)
Step 3: Read only the 5-10 relevant pages
Step 4: Summarize key equations, definitions, and explanations
Use agents for PDF-heavy research. If the user provides a large PDF (e.g., a 700-page Stata manual), launch an Explore agent with a specific extraction task: "Read pages 560--580 of causal.pdf and extract the formal equations for RA, IPW, and AIPW estimators." The agent processes the PDF in its own context and returns only the distilled findings.
Clean up PDFs before committing. Reference PDFs should NOT be committed to the git repository --- they bloat the repo and may have copyright restrictions. Delete them from the post directory before committing, or add them to `.gitignore`. The post's References section provides links to the original sources.
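The keyword-search step of the PDF workflow might look like the sketch below. It assumes the pypdf package is installed; the helper names are hypothetical. The point is that text extraction runs inside the script and only a short list of page numbers comes back into the conversation.

```python
def load_page_texts(pdf_path):
    """Return the extracted text of every page (requires the pypdf package)."""
    from pypdf import PdfReader  # imported lazily so find_pages works without pypdf

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def find_pages(page_texts, keywords):
    """Map each keyword to the 1-based page numbers whose text mentions it."""
    hits = {}
    for keyword in keywords:
        needle = keyword.lower()
        hits[keyword] = [
            number
            for number, text in enumerate(page_texts, start=1)
            if needle in text.lower()
        ]
    return hits
```

A typical call would be find_pages(load_page_texts("causal.pdf"), ["ATE estimator", "doubly robust"]), after which only the matching pages are read in full.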
Step 0.5: Confirm scope and design choices
Before creating any files, present the user with a confirmation summary and wait for their response. This step prevents rework by aligning on the post's framing, scope, and deliverables upfront. Display all items in a single formatted block:
Topic understanding: "I'll create a tutorial about [TOPIC] using the [DATASET]. The case study question will be: '[QUESTION]'. Does this capture your intent?"
Post type: "[Causal inference / ML prediction / Exploratory analysis / Statistical method tutorial] -- based on [brief reasoning]. Change?"
Figure theme: "Light background (default) or dark navy background? Dark theme matches the site's navbar/footer aesthetic and works well for scatter plots and line charts."
Post scope -- ask about optional sections based on the topic:
- "Should I include a baseline comparison before the main method?"
- "Should I include a robustness/sensitivity analysis section?"
- "Should I include exercises for self-study?"
Language: "This will be a [Python / Stata / R] tutorial based on [reasoning]. Change?" Determine from topic keywords, dataset extension, or explicit user mention.
Companion deliverables: Depends on language:
- Python: "index.md (always), script.py, notebook.ipynb"
- Stata: "index.md (always), analysis.do (do-file), analysis.log (Stata log)"
- R: "index.md (always), analysis.R (R script)"
Ambiguity resolution (conditional) -- if the topic could be framed as either causal or predictive, ask: "This topic could be framed as [causal estimation of X] or [prediction of Y]. Which framing do you prefer?"
Handling responses:
- "Looks good" / "proceed" / no changes: continue with stated defaults
- Specific adjustments: incorporate them and proceed
- Major reframing requested: revise the scope and re-present the summary
Plan archival: When a plan is created and approved (e.g., via plan mode),
save the approved plan as plan.md in the post directory
(e.g., content/post/python_<topic-slug>/plan.md). This documents the
design decisions and outline that guided the post's creation. The plan file
is not referenced in the post itself --- it serves as internal documentation
for future reference and revisions.
Step 1: Create the post
Create content/post/<lang>_<topic-slug>/index.md with YAML front matter.
The folder prefix matches the language: python_, stata_, or r_.
Python front matter
---
authors:
  - admin
categories:
  - Python
draft: false
featured: false
date: "<YYYY-MM-DDT00:00:00Z>" # use yesterday's date to avoid future-post exclusion
external_link: ""
image:
  caption: ""
  focal_point: Smart
  placement: 3
links:
  - icon: open-data
    icon_pack: ai
    name: "[Python] Google Colab"
    url: <colab-url-if-available>
  - icon: code
    icon_pack: fas
    name: "Python script"
    url: script.py
slides:
summary: <One sentence -- method, dataset, and finding>
tags:
  - python
  - <additional-relevant-tags>
title: "<Tutorial Title>"
url_code: ""
url_pdf: ""
url_slides: ""
url_video: ""
toc: true
---
Stata front matter
---
authors:
  - admin
categories:
  - Stata
  - <additional-category>
draft: false
featured: false
date: "<YYYY-MM-DDT00:00:00Z>" # use yesterday's date to avoid future-post exclusion
external_link: ""
image:
  caption: ""
  focal_point: Smart
  placement: 3
links:
  - icon: file-code
    icon_pack: fas
    name: "Stata do-file"
    url: analysis.do
  - icon: database
    icon_pack: fas
    name: "Dataset (.dta)"
    url: <dataset-url-or-filename>
  - icon: file-alt
    icon_pack: fas
    name: "Stata log"
    url: analysis.log
slides:
summary: <One sentence -- method, dataset, and finding>
tags:
  - stata
  - <additional-relevant-tags>
title: "<Tutorial Title>"
url_code: ""
url_pdf: ""
url_slides: ""
url_video: ""
toc: true
diagram: true
---
Common front matter notes:
- Set `date` to yesterday's date (e.g., if today is 2026-04-01, use `2026-03-31T00:00:00Z`). Netlify production builds exclude future-dated posts (`--buildFuture` is only used for deploy previews), and timezone differences between the local machine and the UTC build server can cause "today's" date to be treated as future content
- `image.placement: 3` enables full-width featured image above the title
- Omit links that are not applicable
- `toc: true` enables the left-side table of contents sidebar
- `diagram: true` enables Mermaid diagram rendering (add when using Mermaid)
- Featured image: user adds `featured.png` or `featured.webp` manually
- Choose tags from: `python`, `stata`, `r`, `spatial`, `regional`, `causal`, `causal inference`, `rct`, `panel`, `world`, `gee`
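The "yesterday's date" rule for the `date` field can be computed rather than typed by hand. A minimal sketch, assuming the date is taken in UTC to match the build server (the function name is hypothetical):

```python
from datetime import datetime, timedelta, timezone


def front_matter_date(today=None):
    """Return yesterday's date (UTC by default) in the front matter format."""
    if today is None:
        today = datetime.now(timezone.utc).date()
    yesterday = today - timedelta(days=1)
    return f"{yesterday.isoformat()}T00:00:00Z"
```

Passing datetime.date(2026, 4, 1) reproduces the example in the notes above, `2026-03-31T00:00:00Z`.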
Step 2: Write the post body
The post is a case-study tutorial that tells a coherent story: a real-world
question motivates the analysis, the method addresses that question, and the
results answer it. Every section uses ## headings (subsections ###) and
fenced code blocks tagged with the post's language (Python by default).
2.1 Post structure
| Section | Content | Required |
|---|---|---|
| Colab badge | <a href="..." target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> (if applicable) | If exists |
| Overview | 1-2 paragraphs: What question are we answering? Why does this method matter? Frame as: "We want to know X. Method Y can help because Z." | Yes |
| Learning objectives | 3-5 bullet points using strong action verbs (Understand, Implement, Estimate, Assess, Compare). Avoid vague verbs like "explore" or "see" | Yes |
| Setup & imports | Imports, config variables, seed, data URLs | Yes |
| Data loading | Load dataset, explain structure, print shape/stats | Yes |
| EDA | At least 1 figure, connected to case study question | Yes |
| Data preparation | Scaling, encoding, train/test split as needed | If needed |
| Baseline | Simple approach first (naive OLS, difference in means, basic model) to establish a benchmark and motivate the full method | Encouraged |
| Core method (1-3 sections) | Main technique with conceptual explanations. Each major step gets its own ##. At least 1 figure | Yes |
| Evaluation & results | Metrics, comparison tables, at least 1 figure | Yes |
| Validation & robustness | At least one robustness check: learner comparison, refutation test, residual analysis, or sensitivity analysis | Encouraged |
| Discussion | What findings mean for the case study question. Connect to real-world context | Yes |
| Summary and next steps | Takeaways (concrete insights with numbers) + limitations + next steps | Yes |
| Exercises | 2-3 self-study challenges for the reader | Encouraged |
| References | Numbered list of clickable links to all sources | Yes |
Narrative flow rules:
- Transitions. End each section with a sentence that previews the next step or links back to the case study question. The reader should never wonder "why are we doing this now?"
- Question-answer arc. The Overview poses a question. The Discussion must explicitly answer it with specific findings. Check that these two sections mirror each other.
- Result ordering. Present the most important finding first in the Evaluation & Results section. Do not bury the key result after preliminary diagnostics.
- "So what?" moment. At least one paragraph (typically in Discussion) must state a clear practical implication -- what a policymaker, analyst, or practitioner would do with this finding.
Takeaways requirements:
- Takeaways must be concrete insights with numbers, not generic summaries. Bad: "We learned about Random Forest." Good: "Satellite embeddings explain 23% of development variation (R^2 = 0.23), with embedding A05 contributing 3x more than the median feature."
- Cover at least 4 dimensions: (1) a method insight (when to use this method), (2) a data insight (what the data revealed), (3) a practical limitation (when this approach fails), (4) a next step (what to try next).
- Do not restate section headings. Each takeaway should stand alone as something the reader remembers a week later.
Narrative arc:
Structure the post as a journey that follows this arc:
- Question (Overview) -- pose the case study question
- Intuition (EDA) -- show the data, build understanding
- Simple baseline -- establish a benchmark with the simplest approach
- Full method -- introduce and apply the main technique
- Validation -- test whether results hold (robustness checks)
- Takeaways -- answer the original question with specific findings
Subsection structure:
When a method has multiple conceptual steps (e.g., model fitting,
cross-validation, test evaluation), use ### subsections to break them
up. This creates visual structure in the TOC and helps readers navigate.
2.2 The sandwich pattern
Every code block that produces output must follow a four-layer pattern:
1. Explanation paragraph (before):
- What technique/step this is and why it matters
- How it connects to the case study question
- Written generically (no output values -- they haven't been computed yet)
Beginner accessibility rules (apply to all explanation paragraphs):
- Define jargon on first use. The first time a technical term appears (e.g., "cross-fitting", "confounders", "regularization"), follow it immediately with a plain-language definition in the same sentence or the next sentence. Example: "...using cross-fitting -- a procedure that splits the data into folds so that the model never predicts on the same data it was trained on."
- Explain why, not just what. Every code block needs a sentence explaining why this step is needed for the analysis, not just what it does mechanically. Bad: "Next we scale the features." Good: "Next we scale the features so that variables with larger ranges do not dominate the distance calculations."
- No complexity jumps. If a code block is substantially more complex than the previous one, add a bridging paragraph that previews the new concept before the code.
- Concrete before abstract. When introducing a concept, give a real-world analogy or concrete example first, then the formal definition. Example: "Think of cross-validation as a rotating exam: the model takes turns training on different subsets and testing on the remainder, so no single lucky split determines the score."
2. Code block -- focused, well-commented, one logical step:
```python
# Clear comments explaining *why*, not just *what*
result = method(data)
print(result)
```
3. Output block -- shows expected printed output (required when code calls
print(), .describe(), .head(), .summary(), or displays a DataFrame repr).
Use a fenced code block with the text language tag (prevents highlight.js
from auto-detecting a language and applying unwanted syntax coloring):
```text
Dataset shape: (5099, 26)
Treatment groups:
0 3354
1 1745
Name: tg, dtype: int64
```
Output blocks are not needed for code that only saves figures (the figure image reference serves as the visible output).
4. Interpretation paragraph (see Step 3 for detailed guidance):
- Quotes specific numbers from the output
- Explains what they mean in plain language
- Connects to the case study question
- 2-4 sentences, single continuous paragraph
2.2b Function documentation
The first time a Python function is used in the post, introduce it properly so the reader understands what it does and can learn more.
For each key function on first use:
- Link to docs. In the explanation paragraph before the code block, link the function name to its official documentation. Use markdown: `[train_test_split()](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)`.
- State its purpose. One plain-language sentence explaining what the function does. Example: "We use `DoubleMLPLR()` to estimate the causal effect of the treatment while controlling for confounders using machine learning."
- Explain key arguments. Briefly describe 2-3 arguments the reader needs to understand. Example: "`ml_l` sets the learner for the outcome model, `ml_m` sets the learner for the treatment model, and `n_folds` controls how many cross-fitting folds are used."
Not every function needs this treatment -- focus on the functions that
are central to the analysis (model constructors, key sklearn/statsmodels
calls, data transformation functions). Standard pandas/numpy operations
like pd.read_csv() or np.mean() can be skipped unless used in a
non-obvious way.
2.3 Code block conventions
- Use `python` fenced blocks (rendered by highlight.js with custom colors in `assets/scss/custom.scss`)
- One logical step per block -- keep focused
- First code block: imports + configuration
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
# Reproducibility
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
# Configuration
TARGET = "<target-variable>"
FEATURE_COLS = [...]
DATA_URL = "<url-or-path>"
If the post has a companion notebook in the claude4data repo, use this instead:
import sys
if "google.colab" in sys.modules:
    !git clone --depth 1 https://github.com/cmg777/claude4data.git /content/claude4data 2>/dev/null || true
    %cd /content/claude4data/notebooks
sys.path.insert(0, "..")
from config import set_seeds, RANDOM_SEED, IMAGES_DIR, TABLES_DIR, DATA_DIR
set_seeds()
2.4 LaTeX math and equations
For LaTeX math in Hugo, escape subscripts as \_ and LaTeX punctuation
commands as \\,, \\;, etc. Letter commands like \theta need no
escaping. Currency dollar signs use \\$ in index.md. Read
references/latex-escaping.md for the complete escaping guide, worked
examples, and equation requirements (plain-language explanations, variable
mapping, notation consistency). Minimum 2 display-math equations for
quantitative method posts.
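The two escaping rules stated above (subscripts and punctuation commands) can be sketched as a small helper. This covers only `\,` and `\;` plus underscores; the full rule set lives in references/latex-escaping.md, so treat the function name and its scope as assumptions. It should be applied to math spans only, not to whole paragraphs.

```python
import re


def escape_math_for_hugo(tex):
    """Escape a LaTeX math string for index.md (sketch of two rules only)."""
    # Punctuation commands: \, and \; need a doubled backslash.
    tex = re.sub(r"\\([,;])", r"\\\\\1", tex)
    # Subscripts: escape every underscore that is not already escaped.
    tex = re.sub(r"(?<!\\)_", r"\\_", tex)
    return tex


escaped = escape_math_for_hugo(r"\beta_1 \, x_i")
```

Letter commands such as `\beta` pass through unchanged, which matches the note that they need no escaping.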
2.5 Figures, diagrams, and color families
Save figures with dpi=300 and bbox_inches="tight", and reference each one
with a Markdown image link. Place figure references immediately after the
generating code block, before the interpretation paragraph. At least 3
figures total. Read references/figure-conventions.md for dark theme
setup (rcParams, savefig options), Mermaid diagram guidance, and color
family design for multi-method comparison charts.
2.6 Tables
Write Markdown tables directly (styled by CSS section 11C -- teal header underlines, hover effects):
| Metric | Baseline | Tuned |
|--------|----------|-------|
| R^2 | 0.231 | 0.230 |
| RMSE | 6.52 | 6.52 |
When comparing multiple estimation approaches or model configurations, use a Markdown table to display key metrics side-by-side. This is one of the most effective pedagogical tools -- readers can scan a table faster than re-reading multiple paragraphs of results.
2.7 References section
Final section of every post:
## References
1. [Library Name -- Documentation Title](https://full-url-to-docs)
2. [Author(s) (Year). Paper Title. Journal.](https://doi-or-url)
3. [Dataset Name -- Source](https://dataset-url)
Include at minimum:
- Original method paper -- cite the academic paper that introduced the method (not just library documentation). Example: for DoubleML, cite Chernozhukov et al. (2018), not just the Python package docs.
- Dataset source -- cite with author, year, and title (not just a raw URL). Example: `[DS4Bolivia -- QUARCS Lab (2021)](https://...)`.
- Library documentation -- link to the main library docs page.
- Order: Number references in order of first mention in the post.
2.8 Causal inference posts
For causal inference posts, explicitly state which estimand (ATE/ATT) each
method targets and use correct framing for randomized vs observational data.
Read references/causal-inference.md for full requirements on estimand
precision, confounding language, and framing guidelines.
2.9 Writing clarity
- Sentence length. Keep sentences under ~40 words. If a sentence needs re-reading, split it. Target ~25 words average per paragraph.
- Active voice. Prefer "We estimate the model" over "The model is estimated." Active voice is clearer and more engaging.
- Analogies for complex concepts. For each new technical concept, provide a real-world analogy or concrete example before the formal definition. At least 2 analogies per post.
- Consistent terminology. Pick one term for each concept and use it throughout. Do not alternate between "treatment variable", "intervention", and "policy" for the same concept without explanation.
2.10 Academic integrity and originality
Respecting copyrights and avoiding plagiarism is non-negotiable. Every sentence in the post must be original writing.
- Always paraphrase. Never copy-paste text from references, documentation, tutorials, or any external source. Read the source, understand the idea, then rewrite it entirely in your own words while preserving the original meaning. Even short phrases should be rephrased -- do not reproduce verbatim passages, even with attribution.
- Proper attribution. When using ideas, methods, results, or data from another source, cite it explicitly in the text (e.g., "Following the approach introduced by Chernozhukov et al. (2018)...") and include the full reference in the References section.
- Code attribution. If code is adapted from documentation, tutorials, or other sources, add a comment in the code crediting the original source (e.g., `# Adapted from scikit-learn docs: <URL>`) and include the source in the References section.
- Images and figures. Only use self-generated figures produced by `script.py`. Never include images from external sources without explicit permission and proper attribution.
- When in doubt, cite. If you are unsure whether an idea is common knowledge or attributable to a specific source, err on the side of citing.
Step 3: Interpret results -- THIS IS THE MOST IMPORTANT STEP
The interpretation paragraphs are what transform this post from a code demo into a genuine case-study tutorial. Without them, a beginner sees numbers and plots but has no idea what they mean. Every code block that produces output needs a paragraph immediately after it that explains the result in plain language and connects it back to the case study question.
How to do it
After writing the full post draft (Step 2), review every code block that produces output. For each one, ensure an interpretation paragraph follows.
If the post has a companion notebook or script that has been executed, read the actual output and use the real numbers. If not, write realistic values based on the dataset and method, and mark them for the user to verify.
What good interpretation looks like
Each interpretation paragraph must:
- Quote specific numbers (e.g., "R^2 = 0.23", "339 observations", "mean of 51.05")
- Explain what those numbers mean in plain language a beginner can understand
- Translate to domain meaning -- convert abstract metrics into real-world statements. Bad: "MAE = 4.72." Good: "Predictions are typically off by about 4.7 IMDS points -- meaningful uncertainty that would make targeting aid to specific municipalities difficult."
- Connect findings to the case study question and real-world context
- Be a single continuous paragraph (no bullet points, no hard line breaks)
- Be 2-4 sentences -- concise but substantive
Good vs bad
Bad (vague, no numbers):
The model performed reasonably well on the test set.
Good (specific, contextual):
The tuned model achieves R^2 = 0.2297, RMSE = 6.52, and MAE = 4.72 on the test set -- essentially identical to the baseline. In practical terms, predictions are typically off by about 4.7 IMDS points on a scale where most values fall between 47 and 55. Satellite embeddings capture real but limited predictive signal for municipal development.
Bad (restates output):
The dataset has 339 rows and 88 columns.
Good (adds meaning):
All 339 Bolivian municipalities loaded with no missing values, providing complete national coverage. The merged data has 88 columns: 64 satellite embedding features, SDG indices, and region identifiers. IMDS scores range from 35.70 to 80.20 with a mean of 51.05, meaning most municipalities cluster within about 7 points of the national average.
Verification
Count interpretation paragraphs. There must be at least 8 that reference specific numeric values. Typical locations:
- After data loading (shape, basic stats)
- After EDA figure (distribution patterns, correlations)
- After train/test split (set sizes, implications)
- After baseline model (initial metrics)
- After tuning/main method (changed metrics)
- After evaluation figure (visual patterns)
- After feature importance / key results
- After summary table (overall takeaways)
If fewer than 8, go back and add more.
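A rough first pass at this count can be automated. The sketch below is a heuristic, not the skill's checker: it drops fenced blocks, headings, tables, lists, and image lines, then counts the remaining paragraphs that contain at least one digit. It can only flag posts that are clearly short of 8, so it does not replace reading each paragraph. The function names are hypothetical.

```python
import re

FENCE = "`" * 3  # the code fence marker, built this way to keep this block self-contained


def strip_fenced_blocks(markdown):
    """Drop every fenced block (code and output) and keep the prose lines."""
    kept, in_fence = [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if not in_fence:
            kept.append(line)
    return "\n".join(kept)


def count_numeric_paragraphs(markdown):
    """Count prose paragraphs that mention at least one number."""
    count = 0
    for block in re.split(r"\n\s*\n", strip_fenced_blocks(markdown)):
        block = block.strip()
        # Skip headings, tables, list items, and image references.
        if not block or block.startswith(("#", "|", "-", "![")):
            continue
        if re.search(r"\d", block):
            count += 1
    return count
```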
Step 4: Create companion deliverables
Python
If script.py or notebook.ipynb were confirmed in Step 0.5, read
references/companion-deliverables.md for the templates and conventions.
Key points: script.py includes a docstring header and mirrors the post's
analysis pipeline; notebook.ipynb uses raw LaTeX (no Goldmark escaping);
do NOT generate featured.png or featured.webp -- the user adds it manually.
Stata
Create analysis.do containing all Stata commands organized by section with
comments. Key conventions:
- Use `preserve`/`restore` to manage data scope between sections
- Use the `capture` prefix on `ssc install` and `net install` commands
- Include `graph export` commands for all figures
- Add a header comment block with title, description, usage, and dataset info
- After creating the do-file, run it to generate `analysis.log` and all PNG figures: `"/Applications/Stata 18.0/StataMP.app/Contents/MacOS/stata-mp" -b do analysis.do`
- Check the log for errors (`grep "^r(" analysis.log`)
- Save the log in the post directory and link it in the front matter
- Reference post: `content/post/stata_rct/analysis.do`
Step 5: Verify
Check deliverables:
- `content/post/<lang>_<slug>/index.md` with complete front matter
- `toc: true` in front matter
- `image.placement: 3` in front matter
- At least 3 figure references
- At least 8 interpretation paragraphs with specific numbers
- References section at the end
- Colab badge (if applicable, Python only)
Run the code and verify output:
Python:
- If `script.py` exists, run it: `cd content/post/python_<slug> && python3 script.py`
- If not, assemble code blocks from `index.md` into a temporary script and run it
- Compare actual printed output against the output blocks in `index.md`
Stata:
- Run the do-file: `"/Applications/Stata 18.0/StataMP.app/Contents/MacOS/stata-mp" -b do analysis.do`
- Check for errors: `grep "^r(" analysis.log` (should return nothing)
- Compare actual Stata log output against the `text` output blocks in `index.md`
- Verify all `graph export` commands produced the expected PNG files
- Save the clean `analysis.log` in the post directory
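Where grep is not available, the same error check on the Stata log can be done in Python. This sketch mirrors the grep pattern by returning every line that starts with `r(`; the function name is hypothetical.

```python
def stata_error_lines(log_text):
    """Return log lines that start with 'r(', i.e. Stata return-code lines."""
    return [line for line in log_text.splitlines() if line.startswith("r(")]


example_log = ". regress y x\nvariable x not found\nr(111);\n\nend of do-file\n"
errors = stata_error_lines(example_log)
```

An empty list corresponds to grep returning nothing. In practice the text would come from reading analysis.log in the post directory.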
Common checks (all languages):
- Flag any discrepancies: different numbers, errors, deprecation warnings
- If any output differs, update the output blocks in the post to match actual results
- Image freshness: After ANY code or color change, re-run to regenerate ALL images
- Orphaned images: List all PNGs in the directory, cross-reference with `index.md`, and delete unreferenced PNGs
- Floating-point drift: Always use values from the LATEST run in output blocks
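The orphaned-image check can be sketched with the standard library. The helper below only lists candidates and leaves deletion to a manual step. It assumes figures are referenced by bare filename in index.md, and it skips featured.png because the user adds that file by hand and the post body may not reference it. The function name is hypothetical.

```python
import re
from pathlib import Path


def find_orphaned_pngs(post_dir):
    """List PNG files in post_dir that index.md never mentions."""
    post_dir = Path(post_dir)
    text = (post_dir / "index.md").read_text(encoding="utf-8")
    referenced = set(re.findall(r"[\w.\-]+\.png", text))
    return sorted(
        path.name
        for path in post_dir.glob("*.png")
        if path.name not in referenced and path.name != "featured.png"
    )
```

Review the returned names before deleting anything, since a figure may be referenced from a companion file rather than from index.md.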
Clean up before committing:
- Remove reference PDFs from the post directory (do not commit to git)
- Remove temporary do-files, scratch logs, or intermediate files
- Verify the remaining file list contains only deliverables
Run Hugo dev server:
"$HOME/Library/Application Support/Hugo/0.84.2/hugo" server --disableFastRender
Visual checks:
- Post renders at `http://localhost:1313/post/<lang>_<slug>/`
- Left-side TOC shows sections and subsections
- Code blocks have syntax highlighting
- Headings are blue (#1a3a8a)
- Tables render with clean styling
- All LaTeX math renders correctly (no raw text, no wrong symbols)
- Mermaid diagrams render (if `diagram: true` is set)
- Output blocks appear after code that prints results
- Featured image displays in post header and listings
Run the full quality checklist from `references/quality-checklist.md`. Key checks: sandwich pattern on every output block, at least 8 interpretations with numbers, at least 3 figures, LaTeX rendering, no orphaned PNGs, original writing with proper attribution.
Report to user: what was created + local preview URL
Step 5.5: Follow-up
After delivering the post and reporting results, offer the user next steps:
"The post is ready at content/post/<lang>_<slug>/. Want me to:
- Adjust any section or add more figures?
- Run `/project:proofread-post` for a final QA check?
- Run `/project:referee-post` for a deep expert review?
- Create the infographic prompt with `/project:infographic-instructions`?"