stata-skill

name: stata-skill
description: |
  A packaged Stata runner skill built on the official MCP-for-Stata server, providing the stata_do, ado_package_install, help, read_log, and get_data_info tools. Use it when you (1) need to execute a Stata do-file; (2) are missing ado packages; (3) hit an error caused by Stata syntax; (4) want to read an SMCL or text log file with rich-text output; or (5) first encounter a data file and want to understand its structure and content.
metadata:
  version: "1.0.8"

MCP-for-Stata

Official plugin for MCP-for-Stata maintained in collaboration with SepineTam.

MCP-for-Stata is an MCP (Model Context Protocol) server that exposes Stata's statistical and econometric capabilities to LLMs. This toolset supports executing do-files, querying data file structures, installing ado packages, reading Stata logs, and looking up command documentation.

Prerequisites

This skill requires the MCP-for-Stata server to be installed and running. If you have not configured it yet, follow @references/installation.md. After installation, verify the setup by running uvx stata-mcp doctor, then restart your AI client (required after first-time setup).

When to Use

Trigger this skill when the user mentions any of the following scenarios:

Needs to execute a Stata do-file for regression or statistical analysis
Encounters a Stata syntax error and needs to troubleshoot or verify command usage
Needs to install third-party ado packages (e.g., outreg2, reghdfe, estout)
First encounters a data file and wants to quickly understand its variable structure and distribution
Needs to read a Stata execution log (.log or .smcl)
Needs to look up the official documentation and syntax for a Stata command

Workflow

1. First Encounter with Data → `get_data_info`

When the user provides a data file path (.dta, .csv, .xlsx, .sav) or says "look at this data," call get_data_info first.

Key points:

data_path: absolute path to the data file
vars_list: if the user only cares about specific variables, pass a list of variable names; otherwise omit (defaults to all)
head: defaults to 0 (no preview rows). Only set to a positive integer when the user explicitly asks to see rows

Return value includes: data source, number of observations, variable list, variable types, and descriptive statistics (mean, standard error, min, max) for each variable.

Note: results are cached based on MD5 hash of file content. Repeated queries on the same file hit the cache.
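The caching note above can be pictured with a minimal Python sketch. It assumes a plain in-memory dictionary keyed by the MD5 digest of the file bytes; the names `content_key`, `cached_info`, and `_cache` are hypothetical, and the server's real cache layout is not described in this document.

```python
import hashlib
from pathlib import Path

# Hypothetical in-memory cache keyed by content hash (an assumption, not the server's storage).
_cache: dict[str, dict] = {}


def content_key(path: Path) -> str:
    """Return the MD5 hex digest of the file's bytes, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_info(path: Path, compute) -> dict:
    """Call compute(path) only when this exact file content has not been seen before."""
    key = content_key(path)
    if key not in _cache:
        _cache[key] = compute(path)
    return _cache[key]
```

Because the key depends on content rather than on the path or modification time, renaming a file still hits the cache, while changing a single byte forces a fresh computation.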

2. Execute Stata Code → `stata_do`

When the user asks to run Stata commands, perform regression analysis, generate graphs, process data, etc., call stata_do.

Steps:

Write Stata code into a .do file (using the Write tool)
Confirm the file path is within an allowed directory (<WORKING_DIR>/.statamcp/stata-mcp-dofile/ or <WORKING_DIR>)
Call stata_do(dofile_path=..., log_file_name=...)

Key parameters:

dofile_path: absolute path to the do-file
log_file_name: optional custom log filename, without a timestamp
read_log_when_error: defaults to false. When true, returns log content only when Stata returns an error code (e.g., r(198))
enable_smcl: defaults to true; also generates an .smcl log (Unix only)

Return value: log_file_path (text and smcl paths); may contain log_content on error.

Notes:

Security guard is enabled by default and blocks dangerous commands (shell, erase, rm, !, etc.)
Do-file must be within a whitelisted directory; otherwise execution is rejected
SMCL log preserves hyperlinks from commands like findsj and getiref (Unix only)
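To make the security guard note concrete, here is a rough Python sketch of a pre-flight scan an agent could run on do-file text before calling stata_do. It is illustrative only and is not the server's actual guard: the function name `blocked_lines` and the first-token matching rule are assumptions, and a real guard would also need to handle prefixes, abbreviations, and line continuations.

```python
# Commands named in the note above; "!" is handled separately as a line prefix.
BLOCKED = {"shell", "erase", "rm"}


def blocked_lines(dofile_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs whose first token is a blocked command."""
    hits = []
    for number, raw in enumerate(dofile_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and whole-line Stata comments.
        if not line or line.startswith("*") or line.startswith("//"):
            continue
        first = line.split()[0].lower()
        if line.startswith("!") or first in BLOCKED:
            hits.append((number, line))
    return hits
```

Running such a check locally lets the agent rewrite the do-file before submission instead of discovering the rejection afterwards.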

3. Install Third-Party Packages → `ado_package_install`

Treat ado_package_install as a high-risk, opt-in tool. Do not call it merely because a command is missing. First identify the exact package and source, then ask the user to approve that package and source. The MCP tool is available only when the operator starts the MCP server with the unsafe profile.

Key parameters:

package: package name. For GitHub source, use "user/repo" format
source: "ssc" (default), "github", or "net"
is_replace: defaults to false
package_source_from: required only when source="net", specifies a validated HTTPS URL

Authorization and validation: SSC and net package names may contain only ASCII letters and numbers. GitHub repositories must use owner/repository format and match the exact repository allowlist. Unknown sources, local paths, IP hosts, credentials, queries, fragments, and non-default ports are rejected. The MCP client will ask the user to approve the exact request during the tool call; do not attempt to bypass or pre-answer it. The Python API does not require caller confirmation. The CLI prompts unless -y or --yes is supplied.
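The validation rules above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the server's code: `valid_name` and `valid_net_url` are hypothetical names, an explicit port 443 is treated as the default port, and the GitHub allowlist check is omitted because its contents are not given here.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# SSC and net package names: ASCII letters and digits only.
NAME_RE = re.compile(r"[A-Za-z0-9]+")


def valid_name(package: str) -> bool:
    return NAME_RE.fullmatch(package) is not None


def valid_net_url(url: str) -> bool:
    """Accept only plain HTTPS URLs on a named host with the default port."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    if parts.scheme != "https" or not parts.hostname:
        return False
    # Reject embedded credentials, query strings, and fragments.
    if "@" in parts.netloc or parts.query or parts.fragment:
        return False
    if port not in (None, 443):
        return False
    try:
        ipaddress.ip_address(parts.hostname)
    except ValueError:
        return True  # not an IP literal, so it is a host name
    return False
```

A client-side check like this only helps phrase a request that is likely to pass; the approval prompt and the server's own validation remain the authority.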

Examples:

ado_package_install("outreg2") — request approval to install an SSC package
ado_package_install("SepineTam/TexIV", source="github") — request approval to install an allowlisted GitHub repository

Note: GitHub repository contents receive no security protection. Inspect the repository before installation. The tool never installs the GitHub helper. Successful installs automatically attempt help(..., replace=true) for the likely command name; if the package exposes other commands, refresh those commands explicitly.

4. Look Up Command Documentation → `help`

When the user asks about the syntax, options, or usage of a Stata command, or wants to verify a command before troubleshooting an error, call help.

Key parameters:

cmd: Stata command name (e.g., "regress", "describe", "xtset")
replace: defaults to false. When true, bypasses cached help and refreshes it from Stata

Return value: Stata help text string. A cache hit is prefixed with Saved result for {cmd} or Cached result for {cmd}.

Notes:

Unix only (macOS/Linux), not available on Windows
Enabled project and global caches are considered, and the newest non-empty result is returned
If cached content seems stale or incorrect, call help(cmd=..., replace=true) to refresh it
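The rule that the newest non-empty cached result wins can be sketched as follows. The `CacheEntry` type, its fields, and the `pick_cached` helper are hypothetical; how the server actually stores and timestamps cached help text is not specified in this document.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CacheEntry:
    scope: str       # e.g. "project" or "global" (labels assumed for illustration)
    text: str        # cached help text, possibly empty
    modified: float  # last-modified time as a Unix timestamp


def pick_cached(entries: Iterable[CacheEntry]) -> Optional[CacheEntry]:
    """Return the most recently modified entry with non-blank text, or None."""
    usable = [entry for entry in entries if entry.text.strip()]
    return max(usable, key=lambda entry: entry.modified, default=None)
```

When no usable entry exists the helper returns None, which corresponds to the case where help has to be fetched from Stata.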

5. Read Execution Log → `read_log`

When the user asks to view a Stata execution log, analyze output results, or wants to inspect the full log after stata_do execution, call read_log.

Key parameters:

file_path: absolute path to the log file
output_format: "dict" (recommended, structured command-result pairs), "full" (all original content), "core" (removes framework lines)
is_beta: defaults to false. When true, uses structured parsing (Unix only, recommended for .smcl + dict format)
lines: content truncation. 0 returns all; positive returns first N items; negative returns last |N| items

Notes:

File must be within the stata-mcp-folder directory (security boundary)
Beta mode uses the StataLog parser and may contain parsing errors
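The sign convention of the lines parameter maps directly onto list slicing. The helper below is a minimal sketch of that convention, assuming the log has already been split into items; `truncate` is a hypothetical name, not part of the tool.

```python
def truncate(items: list, lines: int = 0) -> list:
    """Apply the read_log `lines` convention to a list of log items."""
    if lines == 0:
        return list(items)   # 0 returns everything
    if lines > 0:
        return items[:lines]  # positive: first N items
    return items[lines:]      # negative: last |N| items
```

For a long log, a negative value such as lines=-20 is usually the cheapest way to see the final error message.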

6. Write Do-File (Deprecated)

write_dofile is marked as deprecated. Modern AI agents have native file-writing capabilities. Use the Write tool to create do-files instead of calling this MCP tool.

This tool is disabled by default and only available when STATA_MCP__ENABLE_WRITE_DOFILE=true. It will be removed in a future version.

Typical Workflow

Scenario A: Full Data Analysis Pipeline

get_data_info — explore data structure
Write do-file based on data characteristics (Write tool)
If the do-file needs a package that is not installed, ask for approval and use ado_package_install; inspect GitHub repositories first
stata_do — execute the do-file
read_log — inspect execution results if needed

Scenario B: Troubleshooting Syntax Errors

help — look up official command documentation (Unix only)
Fix the code in the do-file
stata_do — re-execute to verify

Scenario C: Install and Use a New Package

Confirm the exact package and source with the user
ado_package_install("pkgname") — request approval and install the package
help(cmd="pkgname", replace=true) — explicitly refresh and check package usage
Use the package commands in the do-file
stata_do — execute

Edge Cases

help Unix limitation: not available on Windows; guide users to alternative documentation methods
write_dofile deprecated: do not use this tool to write do-files; use the Write tool instead
security guard enabled by default: dangerous commands (shell, erase, rm, !) in do-files are blocked. To disable, set STATA_MCP__IS_GUARD=false (not recommended)
RAM monitoring disabled by default: to monitor Stata process memory, set STATA_MCP__IS_MONITOR=true and STATA_MCP__RAM_LIMIT
path boundary check: do-files and log files must be within whitelisted directories; otherwise execution is rejected
SSC installation slow: ado_package_install from SSC source may take time; skip if the package is already installed
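The path boundary check listed above can be anticipated on the client side with a short Python sketch. It assumes the allowed roots are known to the caller and uses `Path.is_relative_to`, which requires Python 3.9 or later; `within_allowed` is a hypothetical helper, not the server's implementation.

```python
from pathlib import Path


def within_allowed(target: str, allowed_roots: list[str]) -> bool:
    """Return True if target, after resolving '..' and symlinks, lies under any root."""
    resolved = Path(target).resolve()
    return any(
        resolved.is_relative_to(Path(root).resolve()) for root in allowed_roots
    )
```

Resolving before comparing matters: a path such as <WORKING_DIR>/../other/run.do starts with the working directory textually but resolves outside it.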

References

| Name | Location | Description |
| --- | --- | --- |
| Installation | `@references/installation.md` | Installation and configuration guide for MCP-for-Stata |
| stata_do | `@references/stata_do.md` | Detailed guide for the execution tool |
| get_data_info | `@references/get_data_info.md` | Detailed guide for the data exploration tool |
| help | `@references/help.md` | Detailed guide for the documentation tool |
| read_log | `@references/read_log.md` | Detailed guide for the log reader tool |
| ado_package_install | `@references/ado_package_install.md` | Detailed guide for the package installer tool |
| Documentation | docs.statamcp.com | Full user documentation |
| Homepage | statamcp.com | Project homepage |
| Source Code | github.com/sepinetam/mcp-for-stata | GitHub repository |
