deep-research

star 347

Use this skill for any task that requires in-depth research, investigation, or comprehensive report generation on a topic. This includes: producing industry analysis, market research, technology surveys, competitive intelligence, trend reports, or any deliverable that synthesizes information from multiple sources into a structured long-form document; answering complex questions that require gathering evidence from the web, analyzing data, and presenting findings with charts and citations; any request where the user explicitly asks for a 'report', 'research', 'survey', 'white paper', or 'deep dive'. Trigger especially when the task cannot be answered from memory alone and requires active information gathering. Do NOT trigger for simple factual Q&A, code-only tasks, single-file data analysis (use data-analysis skill), or creative writing without research backing.

opencmit By opencmit schedule Updated 2/26/2026

name: deep-research description: "Use this skill for any task that requires in-depth research, investigation, or comprehensive report generation on a topic. This includes: producing industry analysis, market research, technology surveys, competitive intelligence, trend reports, or any deliverable that synthesizes information from multiple sources into a structured long-form document; answering complex questions that require gathering evidence from the web, analyzing data, and presenting findings with charts and citations; any request where the user explicitly asks for a 'report', 'research', 'survey', 'white paper', or 'deep dive'. Trigger especially when the task cannot be answered from memory alone and requires active information gathering. Do NOT trigger for simple factual Q&A, code-only tasks, single-file data analysis (use data-analysis skill), or creative writing without research backing." license: Apache-2.0 metadata: author: alphora-team version: "1.0" tags: ["research", "report", "web-search", "data-analysis", "markdown"]

Requirements for Outputs

Final Report

Structure

  • Title page: Topic, date, author line
  • Executive summary: 3-5 sentence overview of key findings (written last)
  • Table of contents: Auto-generated from headings
  • Body sections: Logically organized with H2/H3 headings
  • Conclusion & recommendations: Actionable takeaways
  • References: Numbered list of all sources with URLs

Quality Standards

  • Every factual claim MUST cite its source with [n] notation linking to the references section
  • Charts and images MUST have captions explaining what they show
  • Data tables MUST include units and time periods
  • Minimum 3 distinct sources for any major conclusion
  • No hallucinated statistics — every number must trace to collected evidence or computed code output

Visual Requirements

  • Include at least one data-driven chart per major section (bar, line, pie, etc.)
  • Reference images should be downloaded locally and embedded via relative paths
  • All images saved under /mnt/workspace/report/assets/
  • Image references in markdown: ![Caption](assets/filename.png)

Formatting

  • Use consistent heading hierarchy (H1 for title, H2 for sections, H3 for subsections)
  • Use tables for structured comparisons
  • Use blockquotes for key findings or direct quotes
  • Number formatting: thousands separator for large numbers, 1 decimal for percentages

Intermediate Artifacts

All research materials MUST be persisted in the workspace so findings are not lost between iterations:

Directory Purpose
/mnt/workspace/research/sources/ Extracted web page content (.txt files)
/mnt/workspace/research/data/ Downloaded datasets and raw data
/mnt/workspace/research/images/ Downloaded reference images
/mnt/workspace/research/notes/ Research notes and outlines
/mnt/workspace/report/ Final report markdown
/mnt/workspace/report/assets/ Report images (charts + reference images)

Sandbox Environment

Path Purpose Access
/mnt/workspace/ Working directory for all research materials and outputs Read/Write
/mnt/skills/deep-research/ Skill scripts and references Read-only

The sandbox has network access for web searches and downloads.

Available Scripts

Script Purpose Usage
scripts/web_search.py Search the web and return structured results python /mnt/skills/deep-research/scripts/web_search.py "query" --max-results 10
scripts/fetch_page.py Fetch a URL and extract clean text content python /mnt/skills/deep-research/scripts/fetch_page.py "https://example.com" --output /mnt/workspace/research/sources/page.txt
scripts/download_file.py Download files (images, data, PDFs) to workspace python /mnt/skills/deep-research/scripts/download_file.py "https://example.com/chart.png" --output /mnt/workspace/research/images/chart.png
scripts/compile_report.py Validate and compile the final report python /mnt/skills/deep-research/scripts/compile_report.py /mnt/workspace/report/report.md --validate --to-html

Reference Documentation

Document Content
references/REPORT_TEMPLATE.md Report structure template — use as starting skeleton

Dependency Installation

If dependencies are missing, install them first:

pip install requests beautifulsoup4 duckduckgo-search pandas matplotlib -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

Research Workflow

CRITICAL: Research is Iterative, Not Linear

Deep research follows a collect → analyze → identify gaps → collect more loop. Do NOT attempt to write the final report in one pass. Instead, accumulate materials over multiple iterations, then synthesize.

Phase 1: Planning

Before any searching, create a research plan:

plan = """
# Research Plan: [Topic]

## Core Questions
1. [Primary question to answer]
2. [Secondary question]
3. [...]

## Information Needs
- Market data: [what numbers/trends are needed]
- Expert opinions: [whose perspectives matter]
- Case studies: [specific examples to find]
- Comparisons: [what to compare against]

## Planned Sections
1. [Section title] — sources needed: [type]
2. [Section title] — sources needed: [type]
3. [...]
"""
with open('/mnt/workspace/research/notes/plan.md', 'w') as f:
    f.write(plan)

Save the plan to /mnt/workspace/research/notes/plan.md so it persists across iterations.

Phase 2: Material Collection

Web Search

Search broadly first, then narrow down:

# Broad search
python /mnt/skills/deep-research/scripts/web_search.py "topic overview" --max-results 10

# Targeted follow-up
python /mnt/skills/deep-research/scripts/web_search.py "topic specific aspect 2024 data" --max-results 5

Content Extraction

For promising URLs, fetch the full content:

python /mnt/skills/deep-research/scripts/fetch_page.py "https://example.com/article" \
  --output /mnt/workspace/research/sources/article_name.txt

Image and Data Collection

Download relevant images, datasets, or charts:

python /mnt/skills/deep-research/scripts/download_file.py "https://example.com/chart.png" \
  --output /mnt/workspace/research/images/market_share.png

Collection Strategy

  • Search in multiple languages if the topic is international
  • Use different query angles for the same topic (statistics, trends, opinions, case studies)
  • Save every useful source — you can filter later
  • Record the source URL in each saved file for citation

Phase 3: Data Analysis

When you have quantitative data, analyze it with Python:

import pandas as pd
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['Source Han Sans CN', 'WenQuanYi Micro Hei', 'SimHei', 'DejaVu Sans']
plt.rcParams['axes.unicode_minus'] = False

# Load collected data
df = pd.read_csv('/mnt/workspace/research/data/market_data.csv')

# Analyze
summary = df.groupby('category')['revenue'].sum().sort_values(ascending=False)
print(summary)

# Generate chart for the report
fig, ax = plt.subplots(figsize=(10, 6))
summary.plot(kind='bar', ax=ax)
ax.set_title('Revenue by Category')
ax.set_ylabel('Revenue (¥)')
plt.tight_layout()
plt.savefig('/mnt/workspace/report/assets/revenue_by_category.png', dpi=150, bbox_inches='tight')
plt.close()
print("Chart saved.")

For predictive analysis, use appropriate statistical methods:

  • Trend extrapolation: Linear/polynomial regression with numpy.polyfit
  • Growth rates: Compound annual growth rate (CAGR) calculations
  • Comparisons: Percentage differences, ratio analysis

Always state the methodology and limitations of any prediction.

Phase 4: Report Drafting

Step 1: Load the template

Read references/REPORT_TEMPLATE.md for the structural skeleton.

Step 2: Write section by section

Do NOT write the entire report at once. Write each section as a separate operation, referencing your collected materials:

section = """
## Market Overview

The global market for [X] reached ¥{value} billion in 2024, 
representing a {growth}% year-over-year increase [1]. 

![Market Size Trend](assets/market_trend.png)
*Figure 1: Market size evolution from 2020 to 2024. Source: [1]*

Key drivers include:
- **Factor A**: Description with evidence [2]
- **Factor B**: Description with evidence [3]

| Region | Market Share | Growth Rate |
|--------|-------------|-------------|
| Asia   | 45.2%       | 12.3%       |
| Europe | 28.1%       | 8.7%        |
| Americas | 26.7%     | 10.1%       |

*Table 1: Regional market distribution. Source: [1][4]*
"""

Step 3: Assemble the full report

Once all sections are written, assemble them into the final document and save to /mnt/workspace/report/report.md.

Step 4: Write executive summary last

After the full report is written, compose the executive summary based on the actual findings.

Phase 5: Validation and Polish

Compile and validate

python /mnt/skills/deep-research/scripts/compile_report.py \
  /mnt/workspace/report/report.md --validate --to-html

This checks:

  • All image references resolve to existing files
  • Citation numbers [n] match entries in the references section
  • Report structure is complete (title, TOC, body, references)

Review checklist

  • Every factual claim has a [n] citation
  • Every chart has a caption and source attribution
  • Numbers are formatted consistently (thousands separator, decimal places)
  • All images exist at their referenced paths
  • Executive summary accurately reflects the report content
  • References section lists all cited sources with URLs
  • No placeholder text remains ([TODO], [TBD], xxx)

Verification Checklist

Research Quality

  • At least 5 distinct sources consulted
  • Data from multiple independent sources for key claims
  • Recency check: primary data is from the last 1-2 years
  • Conflicting viewpoints acknowledged where they exist

Report Completeness

  • Executive summary present
  • All planned sections written
  • At least one chart/figure per major section
  • References section with numbered entries
  • Conclusion with actionable recommendations

Technical Accuracy

  • All code-generated numbers match the report text
  • Charts accurately represent the underlying data
  • Predictions clearly labeled as estimates with methodology stated
  • Units and time periods specified for all data

Error Handling

  • Search returns no results: Try alternative queries, different keywords, or broader terms
  • Page fetch fails: Try with different headers, or note the URL for manual review
  • Image download fails: Skip and note in report, or find alternative image
  • Missing dependencies: pip install requests beautifulsoup4 duckduckgo-search pandas matplotlib -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
  • Network timeout: Retry with longer timeout, or move to next source
  • Encoding issues: Try encoding='utf-8', then 'gbk', then 'latin1'
Install via CLI
npx skills add https://github.com/opencmit/alphora --skill deep-research
Repository Details
star Stars 347
call_split Forks 33
navigation Branch main
article Path SKILL.md
More from Creator