name: fb-competitor-collector
description: Use when a business user asks Codex to collect, resume, import, filter, or sync visible Facebook competitor/internal-page posts to the Feishu sheet FB竞品帖子链接. Use OpenCLI Browser Bridge only for live Facebook capture, with SQLite dedupe/enrichment and strict Feishu output gating.
FB Competitor Collector
This skill turns business-language requests into the collector workflow:
request scope
-> preflight and automatic recovery where safe
-> OpenCLI Browser Bridge account/post capture
-> SQLite dedupe and enrichment queue
-> strict quality gate
-> Feishu output
-> Codex business summary
Keep user-facing interaction natural-language first. Do not ask business users to run shell commands unless they explicitly want commands.
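The "SQLite dedupe and enrichment queue" stage of the workflow can be pictured with a minimal sketch. The table name, column names, and the `needs_enrichment` flag are assumptions made for illustration; the collector's real schema is defined in its own scripts and docs/data-contract.md.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a store with a hypothetical posts table keyed by post URL."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS posts (
            post_url TEXT PRIMARY KEY,          -- dedupe key (assumed)
            account_url TEXT NOT NULL,
            post_time TEXT,                     -- may be filled in later
            needs_enrichment INTEGER NOT NULL DEFAULT 1
        )
        """
    )
    return conn


def add_candidates(conn, candidates):
    """Insert captured candidates, skipping URLs already stored.

    Returns the number of rows that were actually new.
    """
    before = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO posts (post_url, account_url, post_time) "
        "VALUES (?, ?, ?)",
        [(c["post_url"], c["account_url"], c.get("post_time")) for c in candidates],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    return after - before
```

In this sketch every new row starts with `needs_enrichment = 1`, so a capture that only saw a URL still enters the queue and is completed later instead of being dropped.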
References
Read only when needed:
- README_FOR_OPERATOR.md: business-facing explanations and sample prompts.
- docs/architecture.md: browser/runtime architecture and account flow.
- docs/data-contract.md: fields, Feishu A-K output, status meanings, and quality gate.
- docs/script-reference.md: full command reference and validation commands.
- docs/troubleshooting.md: recovery order for auth, OpenCLI, Facebook login, and incomplete jobs.
Standard Flow
For every capture/write request unless the user asks for a read-only check:
- Interpret scope: all configured accounts, one account, direct account URL, direct post URL, existing SQLite rows, or local JSON/CSV import.
- If scope is ambiguous and cannot be inferred from config or user text, ask one concise question.
- Run preflight before live capture, import with real sync, or Feishu write:
python3 scripts/check_env.py --config config/settings.yaml --fix-auth --fix-opencli
- Silently recover what Codex can recover: lark-cli user-mode settings, refreshable user token, bounded OpenCLI daemon startup.
- Stop only for human blockers: Facebook logout, visitor preview, CAPTCHA/risk control, wrong Chrome profile, disconnected Browser Bridge after bounded recovery, or page without visible real posts.
- Use the correct business entrypoint. Do not manually stitch stages unless debugging.
- Follow emitted next_commands first when a job is interrupted, incomplete, or recoverable.
- Finish with account-level counts, final usable/synced count, completeness, blockers, and special posts with almost no fields.
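The split between silent recovery and stopping for a human can be sketched as a small triage function. The issue identifiers, the `Action` enum, and the retry bound of 2 are hypothetical names chosen for this example; they only mirror the categories listed in the flow above and are not the collector's actual codes.

```python
from enum import Enum


class Action(Enum):
    AUTO_RECOVER = "auto_recover"
    STOP_FOR_HUMAN = "stop_for_human"


# Hypothetical identifiers for the issues Codex may fix silently.
AUTO_RECOVERABLE = frozenset(
    {"lark_cli_user_mode", "refreshable_user_token", "opencli_daemon_down"}
)

# Hypothetical identifiers for the blockers that need a person.
HUMAN_BLOCKERS = frozenset(
    {
        "facebook_logged_out",
        "visitor_preview",
        "captcha_or_risk_control",
        "wrong_chrome_profile",
        "browser_bridge_disconnected",
        "no_visible_real_posts",
    }
)


def triage(issue, attempts=0, max_attempts=2):
    """Decide whether to retry silently or stop and tell the user."""
    if issue in HUMAN_BLOCKERS:
        return Action.STOP_FOR_HUMAN
    if issue in AUTO_RECOVERABLE and attempts < max_attempts:
        return Action.AUTO_RECOVER
    # Unknown issues and exhausted retries are escalated (an assumption:
    # this sketch prefers stopping over guessing).
    return Action.STOP_FOR_HUMAN
```

The `attempts` counter keeps recovery bounded, so a daemon that never starts eventually surfaces as a blocker instead of looping.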
Request Routing
Check readiness:
python3 scripts/check_env.py --config config/settings.yaml --fix-auth --fix-opencli
Read configured accounts:
python3 scripts/read_accounts.py --config config/settings.yaml
Capture all configured accounts and sync complete rows:
python3 scripts/run_accounts_job.py --config config/settings.yaml --last-hours 24 --sync
Capture all configured accounts for a date:
python3 scripts/run_accounts_job.py --config config/settings.yaml --target-date YYMMDD --sync
Capture one account:
python3 scripts/run_account_job.py --config config/settings.yaml --account-url <url> --last-hours 24 --sync
Capture one account for a date:
python3 scripts/run_account_job.py --config config/settings.yaml --account-url <url> --target-date YYMMDD --sync
Include visible coverage expectations when the user provides them: add --expected-post-count <n> and/or --expected-labels "38m,1h,2h".
Resume interrupted account work: use the emitted next_commands; if needed, rerun the scoped account job with --resume-only --force-recover-running --sync.
Capture or re-capture a direct post URL: infer account/date from SQLite when possible; otherwise ask for the account URL or visible account name. Import as a candidate if missing, then run the account-scoped resume/enrichment path before sync.
Import existing JSON/CSV:
python3 scripts/import_existing_result.py --config config/settings.yaml --input <file> --no-sync
Sync existing/imported rows: prefer scoped run_account_job.py --resume-only --force-recover-running --sync when rows belong to an account/date. Direct sync/import/filter commands write only strict complete rows unless explicit audit/ledger mode is requested.
Filter local library:
python3 scripts/filter_posts.py --config config/settings.yaml ...
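As an illustration of the routing above, a request scope can be mapped to a capture command roughly as follows. The helper name `build_capture_command` and its parameters are hypothetical; the script paths and flags are the ones listed in this section. Treating --last-hours and --target-date as mutually exclusive, and passing the coverage flags to either script, are assumptions of this sketch.

```python
def build_capture_command(
    account_url=None,
    last_hours=None,
    target_date=None,
    expected_post_count=None,
    expected_labels=None,
    sync=True,
    config="config/settings.yaml",
):
    """Return an argv list for an all-accounts or single-account capture job."""
    # Assumption: a request names either a rolling window or a date, not both.
    if (last_hours is None) == (target_date is None):
        raise ValueError("give exactly one of last_hours or target_date")

    # One account uses run_account_job.py; all accounts use run_accounts_job.py.
    script = "scripts/run_account_job.py" if account_url else "scripts/run_accounts_job.py"
    argv = ["python3", script, "--config", config]
    if account_url:
        argv += ["--account-url", account_url]
    if last_hours is not None:
        argv += ["--last-hours", str(last_hours)]
    else:
        argv += ["--target-date", target_date]

    # Optional coverage expectations supplied by the user.
    if expected_post_count is not None:
        argv += ["--expected-post-count", str(expected_post_count)]
    if expected_labels:
        argv += ["--expected-labels", ",".join(expected_labels)]
    if sync:
        argv.append("--sync")
    return argv
```

Returning an argv list instead of a shell string means the result can be handed to `subprocess.run` without quoting an account URL by hand.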
Hard Rules
- Live Facebook capture uses OpenCLI Browser Bridge only.
- The job must operate on a tab/page it opened or explicitly matched; do not bind or occupy the user's active tab unconditionally.
- Do not import or sync logged-out, visitor-preview, empty-shell, or one-preview-post pages.
- Start account capture from the homepage top.
- Keep valid media/share candidates even if exact time, parent post, lead link, post type, engagement, article material, or summary is missing.
- Missing fields are enrichment work, not capture-time deletion.
- Normal --sync writes only rows passing the current strict quality gate.
- Use --sync-audit/--ledger-sync only when the operator explicitly asks for incomplete audit output.
- Never write to the Feishu account source workbook.
- Never invent metrics, article summaries, lead links, or post times.
- Never store passwords, cookies, API keys, or tokens.
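The rule that missing fields are enrichment work and not a reason to delete a row can be sketched as a gate that only sorts rows. The field names in `REQUIRED_FIELDS` are assumptions made for this example; the real required set is whatever docs/data-contract.md defines.

```python
# Hypothetical required fields; the actual gate is defined in docs/data-contract.md.
REQUIRED_FIELDS = (
    "post_url",
    "post_time",
    "post_type",
    "lead_link",
    "engagement",
    "summary",
)


def missing_fields(row):
    """List required fields that are absent or empty in a row."""
    return [name for name in REQUIRED_FIELDS if row.get(name) in (None, "")]


def split_by_gate(rows):
    """Separate rows that may be synced from rows that still need enrichment.

    No row is discarded: incomplete rows are returned with their gaps so a
    later enrichment pass knows what to fill in.
    """
    ready, enrichment_queue = [], []
    for row in rows:
        gaps = missing_fields(row)
        if gaps:
            enrichment_queue.append((row, gaps))
        else:
            ready.append(row)
    return ready, enrichment_queue
```

Under this sketch a normal sync would write only `ready`, while `enrichment_queue` stays local until its gaps are filled from real page data, never from invented values.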
Completion And Reporting
A capture job is complete only when run_status=complete.
Report these fields in plain business language:
- accounts attempted
- candidates found per account
- final usable/synced rows per account
- whether required fields are complete
- top blockers or next_commands
- stage pressure when useful: coverage, exact time, lead link, engagement, post type, article material, summary, or Feishu sync
- extreme special cases such as a post with almost no fields captured
Treat a high local/ledger candidate count with a low final usable count as a state that needs re-capture, not as success.
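The completion rules can be summarised in a small classifier. The function name, the outcome labels, and the 0.5 ratio used to decide what counts as "low" are assumptions for illustration; only the `run_status=complete` condition comes from this document.

```python
def job_outcome(run_status, candidates, usable, min_usable_ratio=0.5):
    """Classify a finished job for the business summary.

    min_usable_ratio is a hypothetical threshold for "low final usable count".
    """
    if run_status != "complete":
        return "incomplete"
    # Many candidates but few usable rows: report as needing re-capture.
    if candidates > 0 and usable / candidates < min_usable_ratio:
        return "needs_recapture"
    return "complete"
```

For example, `job_outcome("complete", 40, 5)` yields `"needs_recapture"` in this sketch, because most candidates never became usable rows.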