name: sentiment-analysis
description: Collect multi-channel public-opinion/feedback data (web search + Feishu exports), run hybrid statistical + LLM analysis, and publish an interactive HTML report to S3 behind a presigned URL. Use when the user asks to analyze public opinion (舆情), feedback, or reviews for a product or topic, or mentions "舆情分析", "feedback report", or analyzing collected chat/social data.
Public-Opinion Analysis (舆情分析) Skill
Hybrid pipeline: deterministic stats in Python, semantic labeling and synthesis via fanned-out subagents, and a self-contained interactive HTML report published to S3.
Run all scripts from the skill directory so that modules resolve as scripts.<module>, for example python3 -m scripts.<module> (this environment's binary is python3). Runtime artifacts go under data/.
Step 0 — Preflight (gate; do not proceed unless all pass)
- Config: run
  python3 -c "from scripts.config import load_config; print(load_config())"
  - If None (first use): ask the user for bucket name, key prefix, region, and presign expiry (default 604800 s = 7 days). Save via scripts.config.save_config.
  - If
- Python deps: run
  python3 -c "from scripts.preflight import ensure_python_deps; print(ensure_python_deps(['boto3','jieba','jsonschema']))"
  It must print True.
- AWS creds: run
  python3 -c "from scripts.preflight import check_aws_credentials; print(check_aws_credentials())"
  It must print True; otherwise tell the user to configure AWS credentials and stop.
- S3 bucket + write: check that check_s3_bucket(bucket, region) returns ok and s3_write_probe(bucket, prefix, region) returns True. On missing/forbidden/region_mismatch, report the specific cause and stop.
- MCP: confirm kiro-web-search is available if the data sources include web. If it is unavailable and web was requested, warn and fall back to Feishu-only.
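The gate behaviour described above (run the checks in order, stop at the first failure, report the specific cause) can be sketched as follows. This is a minimal illustration, not the skill's actual code: `run_preflight` is a hypothetical helper, and the lambdas stand in for the `scripts.preflight` functions.

```python
from typing import Callable, List, Tuple


def run_preflight(checks: List[Tuple[str, Callable[[], bool]]]) -> Tuple[bool, str]:
    """Run named checks in order and stop at the first failure.

    Hypothetical helper: each callable stands in for one of the
    scripts.preflight checks and must return a truthy value to pass.
    """
    for name, check in checks:
        try:
            passed = bool(check())
        except Exception as exc:  # a crashing check counts as a failure
            return False, f"{name}: {exc}"
        if not passed:
            return False, f"{name}: check failed"
    return True, "all checks passed"


if __name__ == "__main__":
    demo = [
        ("python deps", lambda: True),
        ("aws credentials", lambda: False),  # simulated failure
        ("s3 write probe", lambda: True),    # never reached
    ]
    print(run_preflight(demo))
```

Because check_s3_bucket returns a status string rather than a boolean, it would need a small wrapper in such a gate, for example `lambda: check_s3_bucket(bucket, region) == "ok"`.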
Step 1 — Parse intent
Determine: monitoring subject, time range, data sources (web/feishu), Feishu file path if any.
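One way to hold the parsed intent is a small record that can also report what is still missing before collection starts. The `Intent` class and its field names below are assumptions made for illustration; the skill itself does not prescribe a structure.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

KNOWN_SOURCES = {"web", "feishu"}


@dataclass
class Intent:
    """Hypothetical container for the four items Step 1 determines."""
    subject: str
    start: Optional[date] = None
    end: Optional[date] = None
    sources: List[str] = field(default_factory=lambda: ["web"])
    feishu_path: Optional[str] = None

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means ready to collect."""
        issues = []
        if not self.subject.strip():
            issues.append("subject is empty")
        unknown = sorted(set(self.sources) - KNOWN_SOURCES)
        if unknown:
            issues.append(f"unknown sources: {', '.join(unknown)}")
        if "feishu" in self.sources and not self.feishu_path:
            issues.append("feishu source requested but no file path given")
        if self.start and self.end and self.start > self.end:
            issues.append("time range start is after end")
        return issues
```

Any non-empty result from `problems()` is a natural prompt to go back to the user before Step 2.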
Step 2 — Collect → data/raw_records.json
- Web: derive several query variants from the subject; call kiro-web-search; for promising hits, fetch full text with WebFetch. Collect items as dicts (title/content/url/site/author/date) and run them through scripts.normalize.normalize(items, "web").
- Feishu: scripts.feishu_adapter.parse_feishu(path).
- Merge both lists; write to data/raw_records.json. Record dropped/failed items for the coverage note.
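The merge step, including the drop counts needed for the coverage note, might look like the sketch below. It assumes records are plain dicts with `content` and an optional `url`; `merge_records` is a hypothetical helper, and deduplicating by URL is an added assumption rather than something the skill specifies.

```python
from typing import Dict, Iterable, List, Tuple


def merge_records(*sources: Iterable[Dict]) -> Tuple[List[Dict], Dict[str, int]]:
    """Merge record lists, skipping empty items and repeated URLs.

    Returns the merged list plus drop counts for the coverage note.
    Records without a URL (e.g. chat messages) are never deduplicated,
    since identical short messages can be legitimate.
    """
    merged: List[Dict] = []
    seen_urls = set()
    dropped = {"empty_content": 0, "duplicate_url": 0}
    for source in sources:
        for rec in source:
            if not (rec.get("content") or "").strip():
                dropped["empty_content"] += 1
                continue
            url = rec.get("url")
            if url and url in seen_urls:
                dropped["duplicate_url"] += 1
                continue
            if url:
                seen_urls.add(url)
            merged.append(rec)
    return merged, dropped
```

When writing the merged list to data/raw_records.json, `json.dump(..., ensure_ascii=False)` keeps Chinese text readable in the file.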
Step 3 — Deterministic stats
Run scripts.stats.add_tokens(records), then compute the full-range term_frequency / daily_volume / pareto_by_author / mau / lifespan. Persist tokens back into the records. (The frontend recomputes these per filter; the full-range values feed the synthesis context.)
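To make two of these statistics concrete, here is a stdlib-only sketch of a daily volume count and an author Pareto table. It is not the code in scripts.stats: the function bodies, the ISO-formatted `date` field, and the `author` field are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

UNKNOWN = "未知"  # bucket for records with no usable date or author


def daily_volume(records: List[Dict]) -> Dict[str, int]:
    """Count records per calendar day, assuming ISO dates (YYYY-MM-DD...)."""
    counts = Counter((r.get("date") or "")[:10] or UNKNOWN for r in records)
    return dict(sorted(counts.items()))


def pareto_by_author(records: List[Dict]) -> List[Tuple[str, int, float]]:
    """Return (author, count, cumulative share) rows, most active first."""
    counts = Counter(r.get("author") or UNKNOWN for r in records)
    total = sum(counts.values())
    rows, running = [], 0
    for author, n in counts.most_common():
        running += n
        rows.append((author, n, round(running / total, 4)))
    return rows
```

The cumulative share column shows how concentrated the discussion is: a first row already near 0.8 means one account dominates the corpus.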
Step 4 — B0: fixed rubric (one subagent)
Dispatch one subagent over a representative sample (~50–100 records) to produce a fixed taxonomy: topic categories, Kano candidate features, JTBD candidate jobs. This rubric is passed verbatim to all B2 subagents to keep labels consistent.
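How the representative sample is drawn is left open above. One possible approach, shown here purely as an assumption, is a seeded round-robin over sources so that a small channel is not crowded out by a large one. `representative_sample` and the `source` field are hypothetical names.

```python
import random
from collections import defaultdict
from typing import Dict, List


def representative_sample(records: List[Dict], size: int = 80, seed: int = 0) -> List[Dict]:
    """Draw up to `size` records, alternating between sources.

    Note this balances sources rather than sampling proportionally;
    a seeded RNG keeps the sample reproducible across runs.
    """
    rng = random.Random(seed)
    by_source = defaultdict(list)
    for rec in records:
        by_source[rec.get("source", "unknown")].append(rec)
    for group in by_source.values():
        rng.shuffle(group)

    sample: List[Dict] = []
    # Take one record per source per round until full or exhausted.
    while len(sample) < size and any(by_source.values()):
        for group in by_source.values():
            if group and len(sample) < size:
                sample.append(group.pop())
    return sample
```

If the rubric should instead reflect the true channel mix, a proportional draw would be the better fit; the sampling choice belongs in meta either way.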
Step 5 — B2: per-record labeling (parallel fan-out)
Use the dispatching-parallel-agents pattern. Split records into batches (~50–100 each). Each subagent receives its batch + the fixed rubric and returns strict JSON matching rubric/label.schema.json: [{id, sentiment{label,score}, topic, painpoint{flag,severity,type}}]. Validate each batch with scripts.models.validate_labels; on failure, retry once, then degrade that batch's labels to topic:"其他", painpoint.flag:false, sentiment:{label:"neu",score:0}. Subagents return only labels (not original text). Merge labels into records by id.
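The batch, validate, retry-once, degrade flow above can be sketched as follows. The `labeler` callable stands in for a subagent call, and `is_valid` is a deliberately simplified stand-in for scripts.models.validate_labels (it checks ids and required keys only, not the full JSON schema). All names here are hypothetical.

```python
import copy
from typing import Callable, Dict, Iterator, List

# Degraded labels from the text; severity/type are omitted because the
# schema's requirements for them are not stated here.
FALLBACK = {
    "topic": "其他",
    "painpoint": {"flag": False},
    "sentiment": {"label": "neu", "score": 0},
}
REQUIRED = {"id", "sentiment", "topic", "painpoint"}


def batches(records: List, size: int = 50) -> Iterator[List]:
    """Yield consecutive slices of at most `size` records."""
    for i in range(0, len(records), size):
        yield records[i:i + size]


def is_valid(labels, batch: List[Dict]) -> bool:
    """Simplified validation: a list of dicts covering exactly the batch ids."""
    if not isinstance(labels, list) or not all(isinstance(l, dict) for l in labels):
        return False
    if any(not REQUIRED.issubset(l) for l in labels):
        return False
    return sorted(l["id"] for l in labels) == sorted(r["id"] for r in batch)


def label_batch(batch: List[Dict], rubric: Dict, labeler: Callable) -> List[Dict]:
    """Try the labeler twice (first attempt plus one retry), then degrade."""
    for _attempt in range(2):
        try:
            labels = labeler(batch, rubric)
        except Exception:
            continue
        if is_valid(labels, batch):
            return labels
    return [{"id": r["id"], **copy.deepcopy(FALLBACK)} for r in batch]


def merge_labels(records: List[Dict], labels: List[Dict]) -> List[Dict]:
    """Attach each label to its record by id under a 'labels' key."""
    by_id = {l["id"]: l for l in labels}
    return [{**r, "labels": by_id.get(r["id"])} for r in records]
```

Degrading a whole batch rather than failing the run keeps the report buildable; counting degraded batches in meta makes the resulting loss of label quality visible.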
Step 6 — B3: synthesis (1–3 subagents)
Dispatch subagent(s) over the aggregated label stats + sampled quotes to produce corpus-level JTBD jobs / Kano categorization / topic-cluster naming, returning JSON matching rubric/synth.schema.json. Validate with scripts.models.validate_synthesis.
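The "aggregated label stats + sampled quotes" handed to the synthesis subagents could be assembled roughly like this. The output shape, the `labels` key on each record, and the 200-character quote cap are all assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List


def aggregate_labels(records: List[Dict], quotes_per_topic: int = 2) -> Dict:
    """Summarise per-record labels into a compact synthesis context."""
    sentiment, topics, painpoints = Counter(), Counter(), Counter()
    quotes: Dict[str, List[str]] = {}
    labeled = 0
    for rec in records:
        lab = rec.get("labels")
        if not lab:
            continue  # unlabeled records are left out of the context
        labeled += 1
        topic = lab["topic"]
        sentiment[lab["sentiment"]["label"]] += 1
        topics[topic] += 1
        if lab["painpoint"].get("flag"):
            painpoints[topic] += 1
        bucket = quotes.setdefault(topic, [])
        if len(bucket) < quotes_per_topic and rec.get("content"):
            bucket.append(rec["content"][:200])  # cap quote length
    return {
        "labeled": labeled,
        "sentiment": dict(sentiment),
        "topics": dict(topics),
        "painpoints": dict(painpoints),
        "quotes": quotes,
    }
```

Passing counts and a few short quotes, rather than the full corpus, keeps the synthesis prompt small while still grounding the JTBD and Kano output in real text.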
Step 7 — Assemble + build report
Write data/enriched_dataset.json = {meta, records (with tokens+labels), stats, synthesis}. meta includes subject, range, source mix, coverage, and any sampling caveats, including the "MAU/retention — 按账号近似" note (MAU and retention are approximated per account). Then:
python3 -c "from scripts.build_report import build; build('data/enriched_dataset.json','templates/report.html.tmpl','data/report.html')"
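The assembly step before the build can be sketched as below. `build_dataset` and `dump_dataset` are hypothetical helpers, and the set of required meta keys is an assumption chosen to mirror the fields listed above.

```python
import json
from typing import Any, Dict, List

# Assumed minimum for meta; the skill may require more or fewer keys.
REQUIRED_META = ("subject", "range", "coverage")


def build_dataset(meta: Dict[str, Any], records: List[Dict],
                  stats: Dict[str, Any], synthesis: Dict[str, Any]) -> Dict[str, Any]:
    """Combine the four parts, refusing to proceed on incomplete meta."""
    missing = [k for k in REQUIRED_META if k not in meta]
    if missing:
        raise ValueError(f"meta is missing: {', '.join(missing)}")
    return {"meta": meta, "records": records, "stats": stats, "synthesis": synthesis}


def dump_dataset(dataset: Dict[str, Any], path: str) -> None:
    """Write UTF-8 JSON; ensure_ascii=False keeps Chinese text readable."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(dataset, fh, ensure_ascii=False, indent=2)
```

Failing early on missing meta is cheaper than discovering a blank coverage note in the published report.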
Step 8 — Publish
python3 -c "from scripts.publish_s3 import publish; from scripts.config import load_config; import datetime; print(publish('data/report.html', load_config(), '<subject>', datetime.datetime.now().strftime('%Y%m%d-%H%M%S')))"
Return the printed presigned URL to the user.
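For orientation, publishing comes down to an upload followed by a presigned GET. The sketch below is not scripts.publish_s3: the key layout, `report_key`, and `publish_report` are assumptions, while `upload_file` and `generate_presigned_url` are standard boto3 S3 client calls.

```python
import re


def report_key(prefix: str, subject: str, stamp: str) -> str:
    """Build an object key like <prefix>/<subject-slug>/<stamp>/report.html.

    Hypothetical layout. Runs of characters other than word characters
    and hyphens become a single hyphen; CJK subjects are kept.
    """
    slug = re.sub(r"[^\w\-]+", "-", subject.strip()).strip("-") or "report"
    return f"{prefix.strip('/')}/{slug}/{stamp}/report.html".lstrip("/")


def publish_report(path: str, bucket: str, key: str, region: str,
                   expiry: int = 604800) -> str:
    """Upload the HTML file and return a presigned GET URL."""
    import boto3  # imported lazily so report_key works without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.upload_file(
        path, bucket, key,
        ExtraArgs={"ContentType": "text/html; charset=utf-8"},
    )
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expiry,
    )
```

The 604800 s default is the maximum lifetime of a SigV4 presigned URL. A URL signed with temporary credentials stops working when those credentials expire, even if that happens sooner.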
Edge cases
- Web zero-results / fetch failures → skip + count; note coverage in meta.
- Missing timestamps → the trend and MAU views bucket them as "未知" (unknown).
- Oversized corpus → cap/sample batches; note the sampling basis in meta.
- Topic not in fixed taxonomy → "其他" (other).