content-discovery - SKILL.md Agent Skill

name: content-discovery
description: Debug and troubleshoot ettametta's content discovery and trend scanning system. Use when investigating scan failures, scoring anomalies, platform scanner issues, data normalization problems, or the discovery-to-video pipeline flow.

Content Discovery Debugging

Architecture

Dual-language: Python orchestrator + Go high-performance scanner.

Python (src/services/discovery/service.py): DiscoveryService — scanning, caching, filtering, persistence, AI ranking, recursive expansion.

Go (src/services/discovery-go/): Gin HTTP service (port 8080) with YouTube API + DuckDuckGo fallback and a worker pool (10 goroutines). Results flow to Python via AIBridge.

Quick Diagnostics

# Trigger scan
curl -X POST http://localhost:8000/api/v1/discovery/scan \
  -H "Content-Type: application/json" \
  -d '{"niche": "tech", "platforms": ["youtube"]}'

# Go service health
curl http://discovery-go:8080/health

# Monitored niches
curl http://localhost:8000/api/v1/discovery/niches

Scan Pipeline

1. Redis cache check (skipped on deep scans)
2. Parallel multi-platform scanning (asyncio.gather)
3. DB fallback if no live results
4. Scraper swarm fallback (video_lead_scanner)
5. Quality auditing per candidate
6. Monetization-mode filtering
7. Batch persistence to PostgreSQL
8. Viral score recalculation (70% velocity / 30% original)
9. Recursive AI expansion (Groq identifies sub-niches)
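
When one platform fails and the others do not, it helps to know how the parallel scan step treats exceptions. The sketch below is a minimal illustration of the asyncio.gather pattern, not the actual DiscoveryService code: scan_platform and scan_all are hypothetical names, and the simulated TikTok failure is only there to show the behaviour.

```python
import asyncio


async def scan_platform(name: str, niche: str) -> list[dict]:
    """Hypothetical stand-in for a single platform scanner."""
    if name == "tiktok":
        raise RuntimeError("scanner unavailable")  # simulate one failing platform
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [{"platform": name, "niche": niche}]


async def scan_all(niche: str, platforms: list[str]) -> list[dict]:
    # return_exceptions=True keeps one failing scanner from aborting the others;
    # failures come back as exception objects in the result list.
    outcomes = await asyncio.gather(
        *(scan_platform(p, niche) for p in platforms),
        return_exceptions=True,
    )
    candidates: list[dict] = []
    for outcome in outcomes:
        if isinstance(outcome, Exception):
            continue  # a real service would log this before moving on
        candidates.extend(outcome)
    return candidates


candidates = asyncio.run(scan_all("tech", ["youtube", "tiktok", "reddit"]))
print([c["platform"] for c in candidates])
```

If the real orchestrator calls gather without return_exceptions=True, a single scanner exception would surface as a failed scan for every platform, which is worth checking when all results come back empty.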

Platform Scanners (19 total)

Primary (every scan)

Scanner	Method
YouTube Shorts	YouTube Data API v3
YouTube Long	YouTube Data API v3
CloakBrowser YouTube	Stealth Playwright via discovery-scraper
CloakTikTok	Cloak + httpx fallback
DuckDuckGo	Free web scraping fallback

Secondary (deep scans / premium)

Scanner	Method
Reddit	JSON API
CloakX/Instagram/Facebook/LinkedIn	Cloak + httpx
Twitch, Pinterest, Snapchat, Bilibili, Rumble, Skool	Web scrape
Google Trends, Google Search	API/scrape
Public Domain (Pexels, Archive.org)	Stock content

Viral Scoring

Base viral score (0-100)

Velocity (max 50): min(velocity/10, 50)
Engagement (max 25): min(engagement_score/10, 25)
Duration bonus (max 15): +15 for 15-60s, +10 for 60-180s
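
To sanity-check a suspicious score by hand, the three components above can be reproduced as a small function. This is a sketch under assumptions: the function names are hypothetical, a 60s video is assumed to fall in the 15-60s bucket, and the 70/30 blend is assumed to combine a velocity-derived score with the previously stored score (the pipeline step only states the ratio). Note that the three listed components sum to at most 90.

```python
def base_viral_score(velocity: float, engagement_score: float, duration_s: float) -> float:
    """Sum of the three listed components (hypothetical helper)."""
    velocity_part = min(velocity / 10, 50)            # capped at 50
    engagement_part = min(engagement_score / 10, 25)  # capped at 25
    if 15 <= duration_s <= 60:
        duration_bonus = 15
    elif 60 < duration_s <= 180:
        duration_bonus = 10
    else:
        duration_bonus = 0
    return velocity_part + engagement_part + duration_bonus


def recalculated_score(velocity_score: float, original_score: float) -> float:
    """Assumed reading of '70% velocity / 30% original' from the scan pipeline."""
    return 0.7 * velocity_score + 0.3 * original_score


print(base_viral_score(velocity=300, engagement_score=120, duration_s=30))
print(recalculated_score(velocity_score=80, original_score=40))
```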

Platform-specific

YouTube: (velocity/100) * (1 + engagement*10), clamped 1-99
TikTok: (views/5000) * (1 + engagement*10), clamped 1-95
Go service: 35% VPH + 35% engagement + 20% momentum + 10% recency
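
The platform formulas can be checked the same way. The following sketch assumes engagement is a ratio (for example 0.05 for 5%) and that the Go components are already normalized to a 0-100 scale; neither is stated above, and the function names are hypothetical.

```python
def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def youtube_score(velocity: float, engagement: float) -> float:
    return clamp((velocity / 100) * (1 + engagement * 10), 1, 99)


def tiktok_score(views: float, engagement: float) -> float:
    return clamp((views / 5000) * (1 + engagement * 10), 1, 95)


# Weights taken from the Go service line above; component keys are assumed names.
GO_WEIGHTS = {"vph": 0.35, "engagement": 0.35, "momentum": 0.20, "recency": 0.10}


def go_score(components: dict[str, float]) -> float:
    return sum(weight * components[key] for key, weight in GO_WEIGHTS.items())


print(youtube_score(velocity=2000, engagement=0.05))
print(tiktok_score(views=1_000_000, engagement=0.1))
print(go_score({"vph": 80, "engagement": 60, "momentum": 50, "recency": 100}))
```

Because each platform clamps to a different ceiling (99 vs 95), scores that sit exactly at a ceiling usually indicate a saturated formula rather than a genuine tie.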

AI pattern deconstruction (deconstructor.py)

Groq llama3-70b extracts: hook_score, retention_estimate, pacing_bpm, style_keywords, emotional_triggers.

Quality auditing (eligibility.py)

30+ indicators with penalty scoring. Below 0.6 = flagged low quality. Freshness: 1-30 days = "viral sweet spot".
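
A rough model of penalty scoring can help when reasoning about why a candidate was flagged. The sketch below assumes the score starts at 1.0 and each triggered indicator subtracts a fixed penalty; the indicator names and penalty values are invented for illustration and are not taken from eligibility.py. Only the 0.6 threshold and the 1-30 day window come from the description above.

```python
LOW_QUALITY_THRESHOLD = 0.6  # from the description above

# Hypothetical indicators and penalty sizes, for illustration only.
PENALTIES = {"clickbait_title": 0.25, "reupload": 0.25, "low_resolution": 0.125}


def audit(flags: list[str], age_days: float) -> dict:
    score = 1.0
    for flag in flags:
        score -= PENALTIES.get(flag, 0.0)  # unknown flags carry no penalty here
    score = max(score, 0.0)
    return {
        "quality_score": score,
        "quality_flags": list(flags),
        "low_quality": score < LOW_QUALITY_THRESHOLD,
        "viral_sweet_spot": 1 <= age_days <= 30,
    }


print(audit(["clickbait_title"], age_days=7))
print(audit(["clickbait_title", "reupload"], age_days=45))
```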

Data Normalization

All platforms normalize to ContentCandidate (models.py):

id (prefixed: "yt_", "tt_", "cloak_yt_"), platform, source_uri
view_count, like_count, comment_count, share_count
velocity (views/hour), engagement_score, viral_score (0-100)
quality_score, quality_flags, analysis_results
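
For orientation, the field list above could map onto a dataclass like the one below. This is a sketch, not the contents of models.py: the field types and defaults are assumptions, normalize_youtube is a hypothetical helper, and the flattened raw dict shape (videoId, viewCount, likeCount, commentCount at the top level) is assumed for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ContentCandidate:
    id: str
    platform: str
    source_uri: str
    view_count: int = 0
    like_count: int = 0
    comment_count: int = 0
    share_count: int = 0
    velocity: float = 0.0          # views per hour
    engagement_score: float = 0.0
    viral_score: float = 0.0       # 0-100
    quality_score: float = 0.0
    quality_flags: list[str] = field(default_factory=list)
    analysis_results: dict = field(default_factory=dict)


def normalize_youtube(raw: dict, hours_since_publish: float) -> ContentCandidate:
    """Hypothetical normalizer: maps a flattened YouTube record to a candidate."""
    views = int(raw.get("viewCount", 0))  # the API returns counts as strings
    return ContentCandidate(
        id=f"yt_{raw['videoId']}",
        platform="youtube",
        source_uri=f"https://www.youtube.com/watch?v={raw['videoId']}",
        view_count=views,
        like_count=int(raw.get("likeCount", 0)),
        comment_count=int(raw.get("commentCount", 0)),
        # Guard against division by zero for videos published minutes ago.
        velocity=views / max(hours_since_publish, 1.0),
    )


candidate = normalize_youtube({"videoId": "abc", "viewCount": "4800", "likeCount": "10"}, 24)
print(candidate.id, candidate.velocity)
```

When debugging normalization, missing counters defaulting silently to 0 (as in this sketch) is a common source of unexpectedly low engagement scores.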

Discovery-to-Video Pipeline

Path	Trigger	Flow
ViralContentPipeline	API call	Discover -> Analyze -> AI Video -> Compile
Nexus Trigger	Every 1h (Celery)	High-potential candidates -> NexusJob -> cinema_video
Sentinel Auto-Pilot	Every 4h (Celery)	Full Viral Loop: Discover -> Pick -> Render -> Publish
Batch Download	Manual/trigger	yt-dlp -> Stock fallback -> Safety asset

Periodic Tasks

Task	Schedule	Purpose
discovery.sentinel_watcher	4h	Full viral loop or trend scan per niche
scan_trending_content	2h	ScannerService with circuit breakers
discovery.process_high_potential	1h	High-potential -> Nexus video
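
The schedules in the table could be expressed as a Celery beat_schedule mapping along these lines. This is a config sketch only: the task names and intervals come from the table, while the entry keys are hypothetical and the project's actual Celery configuration may use crontab expressions instead of timedelta.

```python
from datetime import timedelta

# Shape follows Celery's beat_schedule setting: entry name -> task + schedule.
beat_schedule = {
    "sentinel-watcher": {
        "task": "discovery.sentinel_watcher",
        "schedule": timedelta(hours=4),
    },
    "scan-trending-content": {
        "task": "scan_trending_content",
        "schedule": timedelta(hours=2),
    },
    "process-high-potential": {
        "task": "discovery.process_high_potential",
        "schedule": timedelta(hours=1),
    },
}

for name, entry in beat_schedule.items():
    print(name, entry["task"], entry["schedule"])
```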

Key Files

File	Purpose
src/services/discovery/service.py	Main DiscoveryService orchestrator
src/services/discovery/scanner_base.py	ABC with velocity/score methods
src/services/discovery/models.py	ContentCandidate, ViralPattern
src/services/discovery/tasks.py	Celery tasks
src/services/discovery/scanner_service.py	Periodic scan with circuit breakers
src/services/discovery/analysis_service.py	AI content analysis
src/services/discovery/deconstructor.py	Pattern deconstruction
src/services/discovery/eligibility.py	Quality auditing
src/services/discovery/video_content_pipeline.py	End-to-end pipeline
src/services/discovery-go/scanner.go	Go YouTube scanner + DDG fallback
src/services/discovery-go/bridge.go	AIBridge to Python API

Common Issues

All platforms returning empty

Scraper service down? Circuit breakers open? Check discovery-scraper connectivity.

Stale discovery cache

discovery:trends:* keys have no TTL. Clear manually:

redis-cli -p 7204 -a "$REDIS_PASSWORD" --scan --pattern "discovery:trends:*" | xargs redis-cli -p 7204 -a "$REDIS_PASSWORD" del

Recursive expansion causes scan storm

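Groq identifies 3 sub-niches per scan, and each one triggers a background Celery task. If those scans expand in turn, the number of queued scans can grow roughly threefold with every level of recursion.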
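
The arithmetic behind the snowball effect is easy to check. The sketch below assumes every sub-niche scan expands again with the same fan-out of 3; should_expand is a hypothetical depth guard, not something the codebase is known to contain.

```python
SUB_NICHES_PER_SCAN = 3  # fan-out stated above


def scans_at_depth(depth: int) -> int:
    """Number of scans triggered at a given recursion depth (0 = the original scan)."""
    return SUB_NICHES_PER_SCAN ** depth


def total_scans(max_depth: int) -> int:
    """Total scans from one initial scan when expansion stops at max_depth."""
    return sum(scans_at_depth(d) for d in range(max_depth + 1))


def should_expand(depth: int, max_depth: int = 2) -> bool:
    """Hypothetical guard: refuse to expand once max_depth is reached."""
    return depth < max_depth


for depth in range(5):
    print(depth, scans_at_depth(depth), total_scans(depth))
```

Comparing the count of distinct niches in content_candidates against these totals gives a quick estimate of how deep the expansion has gone.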

Scanner circuit breaker open

Breakers are per platform: each opens after 3 failures and recovers after 600s. Check the API logs: docker compose logs api | grep circuit
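
For readers unfamiliar with the pattern, a minimal circuit breaker with those two parameters might look like this. It is a sketch, not the scanner_service.py implementation: the class and method names are hypothetical, failures are assumed to be counted consecutively, and the breaker is assumed to close fully once the recovery window has elapsed.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, recovery_seconds: float = 600.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self._clock = clock          # injectable so the breaker can be tested
        self._failures = 0
        self._opened_at = None       # None means the breaker is closed

    def allow(self) -> bool:
        """Return True if a scan may proceed for this platform."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.recovery_seconds:
            # Recovery window elapsed: close the breaker and allow a retry.
            self._opened_at = None
            self._failures = 0
            return True
        return False

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None


breakers = {"youtube": CircuitBreaker(), "tiktok": CircuitBreaker()}
for _ in range(3):
    breakers["tiktok"].record_failure()
print({name: b.allow() for name, b in breakers.items()})
```

With these defaults, a platform that fails three times stays skipped for ten minutes, so an "empty results" report shortly after an outage may simply reflect a breaker that has not yet recovered.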

Quality audit too aggressive

The audit applies 30+ penalty indicators and flags anything scoring below 0.6. Check the thresholds in eligibility.py.

Debugging Checklist

Is discovery-scraper reachable?
Circuit breakers open? docker compose logs api | grep circuit
Redis cache: redis-cli -p 7204 -a "$REDIS_PASSWORD" --scan --pattern "discovery:*"
Go service: curl http://discovery-go:8080/health
Celery tasks active: celery -A src.api.utils.celery inspect active
DB candidates: SELECT count(*) FROM content_candidates;
Niche expansion: SELECT distinct niche FROM content_candidates;