roblox-game-data-scraper - SKILL.md Agent Skill

name: roblox-game-data-scraper description: 自动化抓取 Roblox 游戏数据的完整工具链。支持从 Trello、Discord、Reddit 和游戏内 API 收集代码、物品、角色、交易价值等结构化数据。 keywords: roblox, data scraping, trello, discord, reddit, game data, automation

Roblox Game Data Scraper - 游戏数据自动化抓取器

这个 skill 帮助你自动化收集 Roblox 游戏的所有核心数据，是内容生成的数据基础。

核心价值主张

手动从 Trello、Discord、Reddit 收集游戏数据需要数小时。这个 skill 能在 10 分钟内自动抓取并结构化所有数据，直接生成可用的 TypeScript 类型定义。

支持的数据源

1. Trello 看板 (官方数据)

游戏机制说明
物品属性和掉落率
角色技能数据
更新日志

2. Discord 频道 (实时更新)

最新代码发布
开发者公告
社区反馈
Bug 报告

3. Reddit 社区 (玩家智慧)

交易价值共识
隐藏机制发现
最佳构建分享
常见问题解答

4. 游戏内 API (如果可用)

实时在线人数
服务器状态
排行榜数据

数据输出格式

代码数据 (Codes)

interface GameCode {
  code: string;
  reward: string;
  status: 'Active' | 'Expired';
  expiryDate?: string;
  requirements?: string;
  source: 'Discord' | 'Twitter' | 'Trello';
  verifiedDate: string;
}

// 示例输出
export const jujutsuInfiniteCodes: GameCode[] = [
  {
    code: 'LUNAR_FAREWELL',
    reward: '50 Spins',
    status: 'Active',
    expiryDate: '2026-03-22',
    source: 'Discord',
    verifiedDate: '2026-03-17'
  }
];

物品数据 (Items)

interface GameItem {
  id: string;
  name: string;
  category: 'Weapon' | 'Accessory' | 'Material' | 'Consumable';
  rarity: 'Common' | 'Rare' | 'Legendary' | 'Mythical';
  stats: Record<string, number>;
  dropRate?: number;
  obtainMethod: string[];
  tradingValue?: string;
  description: string;
}

// 示例输出
export const items: GameItem[] = [
  {
    id: 'split-soul-katana',
    name: 'Split Soul Katana',
    category: 'Weapon',
    rarity: 'Mythical',
    stats: {
      damage: 150,
      speed: 1.2,
      range: 5
    },
    dropRate: 0.0025,
    obtainMethod: ['Soul Curse Raid', 'Craft at Zen Forest'],
    tradingValue: '15 DF',
    description: 'Legendary katana with soul-splitting abilities'
  }
];

角色/技能数据 (Techniques)

interface Technique {
  id: string;
  name: string;
  tier: 'S' | 'A' | 'B' | 'C' | 'D';
  rarity: number;  // 掉落率百分比
  moves: Move[];
  domain?: DomainExpansion;
  awakening?: Awakening;
  pvpRating: number;
  pveRating: number;
}

interface Move {
  name: string;
  damage: number;
  cooldown: number;
  ceConsumption: number;
  description: string;
}

交易价值数据 (Trading Values)

interface TradeValue {
  itemId: string;
  itemName: string;
  value: string;  // "15 DF" 或 "2,200 DF"
  demand: 'Hyped' | 'High' | 'Medium' | 'Low';
  trend: 'Rising' | 'Stable' | 'Falling';
  lastUpdated: string;
}

使用方法

场景 1: 为新游戏创建完整数据库

# 1. 配置游戏信息
python resources/scraper_config.py --game "Your Bizarre Adventure" \
  --trello "https://trello.com/b/..." \
  --discord "https://discord.gg/..." \
  --reddit "r/YourBizarreAdventure"

# 2. 运行完整抓取
python resources/full_scraper.py --game yba --output ./data/yba/

# 3. 生成 TypeScript 类型
python resources/generate_types.py --input ./data/yba/ --output ./src/data/yba/

# 输出文件:
# - ./src/data/yba/codes.ts
# - ./src/data/yba/items.ts
# - ./src/data/yba/techniques.ts
# - ./src/data/yba/trading_values.ts

场景 2: 监控代码更新

# 设置 Discord webhook 监控
python resources/discord_monitor.py \
  --channel-id "123456789" \
  --keywords "code,codes,redeem" \
  --webhook "https://your-site.com/api/new-code"

# 当检测到新代码时，自动:
# 1. 提取代码和奖励
# 2. 更新 codes.ts 文件
# 3. 触发网站重新部署
# 4. 发送通知

场景 3: 抓取 Reddit 交易价值

# 从 Reddit 帖子提取交易价值
python resources/reddit_value_scraper.py \
  --subreddit "QMGames" \
  --keywords "trading,value,worth" \
  --output ./data/trading_values.json

# 自动识别:
# - "Split Soul = 15 DF"
# - "Heian Gauntlets worth 2.2K DF"
# - "LF: Dragon Bone (1.8K DF)"

关键文件说明

`resources/trello_scraper.py`

从 Trello 看板抓取结构化数据：

def scrape_trello_board(board_url: str) -> dict:
    """
    抓取 Trello 看板的所有卡片和列表
    
    返回:
    {
        'items': [...],
        'techniques': [...],
        'mechanics': [...],
        'changelog': [...]
    }
    """
    # 使用 Trello API 或 Selenium
    # 解析卡片标题、描述、标签
    # 提取数值数据 (掉落率、伤害等)

`resources/discord_monitor.py`

实时监控 Discord 频道：

import discord

class CodeMonitor(discord.Client):
    async def on_message(self, message):
        if 'code' in message.content.lower():
            # 提取代码
            code = extract_code(message.content)
            # 保存到数据库
            save_code(code)
            # 触发网站更新
            trigger_deploy()

`resources/reddit_crawler.py`

爬取 Reddit 帖子和评论：

import praw

def scrape_subreddit(subreddit_name: str, keywords: list) -> list:
    """
    搜索 Reddit 帖子并提取相关信息
    
    返回:
    [
        {
            'title': '...',
            'content': '...',
            'upvotes': 123,
            'comments': [...]
        }
    ]
    """
    reddit = praw.Reddit(...)
    subreddit = reddit.subreddit(subreddit_name)
    # 搜索关键词
    # 提取数据

`resources/data_schemas/`

TypeScript 类型定义模板：

data_schemas/
├── codes.schema.ts
├── items.schema.ts
├── techniques.schema.ts
├── trading_values.schema.ts
└── game_config.schema.ts

数据验证和清洗

自动验证规则

def validate_code_data(code: dict) -> bool:
    """验证代码数据的完整性"""
    required_fields = ['code', 'reward', 'status']
    return all(field in code for field in required_fields)

def validate_item_data(item: dict) -> bool:
    """验证物品数据的完整性"""
    # 检查必填字段
    # 验证数值范围
    # 确保枚举值有效

数据去重

def deduplicate_codes(codes: list) -> list:
    """
    去除重复的代码
    保留最新的验证日期
    """
    seen = {}
    for code in codes:
        if code['code'] not in seen:
            seen[code['code']] = code
        elif code['verifiedDate'] > seen[code['code']]['verifiedDate']:
            seen[code['code']] = code
    return list(seen.values())

最佳实践

1. 设置定时任务

# cron 配置
0 */6 * * * python resources/full_scraper.py --game jujutsu-infinite
0 */1 * * * python resources/discord_monitor.py --check-codes
0 0 * * 0 python resources/reddit_value_scraper.py --weekly-update

2. 数据版本控制

# 每次抓取保存快照
./data/
├── jujutsu-infinite/
│   ├── 2026-03-17/
│   │   ├── codes.json
│   │   ├── items.json
│   │   └── techniques.json
│   └── latest/  # 符号链接到最新版本

3. 错误处理

try:
    data = scrape_trello_board(url)
except TrelloAPIError as e:
    # 记录错误
    logger.error(f"Trello API 失败: {e}")
    # 使用缓存数据
    data = load_cached_data()
except Exception as e:
    # 发送告警
    send_alert(f"抓取失败: {e}")

与现有项目集成

在 Next.js 项目中使用

// 1. 运行抓取器
// npm run scrape-data

// 2. 导入生成的数据
import { jujutsuInfiniteCodes } from '@/data/jujutsu-infinite/codes';
import { items } from '@/data/jujutsu-infinite/items';

// 3. 在页面中使用
export default function CodesPage() {
  return (
    <div>
      {jujutsuInfiniteCodes.map(code => (
        <CodeCard key={code.code} {...code} />
      ))}
    </div>
  );
}

自动化部署流程

# .github/workflows/data-update.yml
name: Update Game Data
on:
  schedule:
    - cron: '0 */6 * * *'  # 每 6 小时
jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - name: Scrape Data
        run: python resources/full_scraper.py
      - name: Generate Types
        run: python resources/generate_types.py
      - name: Commit Changes
        run: |
          git add data/
          git commit -m "chore: update game data"
          git push
      - name: Trigger Deploy
        run: curl -X POST ${{ secrets.DEPLOY_WEBHOOK }}

成功案例

案例 1: Jujutsu Infinite 完整数据库

数据源: Trello + Discord + Reddit
抓取时间: 8 分钟
数据量:
- 25 个代码
- 150+ 物品
- 30 个技能
- 200+ 交易价值
更新频率: 每 6 小时自动更新

案例 2: 实时代码监控

监控频道: 5 个 Discord 服务器
检测延迟: < 2 分钟
准确率: 98% (自动过滤假代码)
网站更新: 自动触发重新部署

扩展功能

1. 图片抓取

def scrape_item_images(item_name: str) -> str:
    """
    从游戏 Wiki 或 Discord 抓取物品图片
    自动下载并优化为 WebP 格式
    """

2. 视频教程提取

def extract_youtube_guides(keyword: str) -> list:
    """
    搜索 YouTube 相关教程视频
    提取视频 ID、标题、观看量
    """

3. 社区情绪分析

def analyze_community_sentiment(item_name: str) -> dict:
    """
    分析 Reddit/Discord 对某物品的讨论
    返回:
    {
        'sentiment': 'Positive' | 'Neutral' | 'Negative',
        'hype_level': 0-100,
        'common_opinions': [...]
    }
    """

注意事项

法律和道德

✅ 遵守网站的 robots.txt
✅ 使用合理的请求频率 (避免 DDoS)
✅ 尊重 API 使用限制
✅ 标注数据来源
❌ 不要抓取付费内容
❌ 不要绕过登录验证

技术限制

Discord: 需要 Bot Token (申请官方 Bot)
Reddit: 需要 API Key (免费申请)
Trello: 公开看板可直接抓取
动态内容: 可能需要 Selenium/Playwright

故障排查

问题 1: Discord Bot 无法连接

# 检查 Token 是否有效
python -c "import discord; print(discord.__version__)"

# 确认 Bot 权限
# 需要: Read Messages, Read Message History

问题 2: Reddit API 限流

# 使用 PRAW 的内置限流处理
reddit = praw.Reddit(
    client_id='...',
    client_secret='...',
    user_agent='...',
    ratelimit_seconds=600  # 限流时等待 10 分钟
)

问题 3: Trello 数据格式变化

# 使用灵活的解析器
def parse_trello_card(card):
    try:
        return extract_structured_data(card)
    except Exception:
        # 回退到模糊匹配
        return fuzzy_extract(card)