tikhub-youtube-search

name: tikhub-youtube-search description: Lightweight TikHub YouTube search and video-detail workflow. Prioritizes single-request usage with curl or minimal Python, saves raw API JSON by default, and includes a small stdlib post-processor for CSV and simplified JSON. Use when the user wants YouTube comprehensive search results, continuation-token pagination, or structured video metadata from TikHub without a heavy wrapper.

TikHub YouTube Search

What this skill gives you

This skill is optimized for the common case: one search or one video detail request.

It provides:

Minimal request patterns
- curl for quickest validation
- tiny httpx example for people who prefer Python
Raw JSON saving
- save the full TikHub response after each request
- useful for audit, replay, and later post-processing
One optional post-processor
- postprocess_youtube_raw.py
- reads one raw file or a directory of raw files
- writes youtube_search_summary.csv and youtube_search_summary.json
Optional batch guidance
- enough information for concurrent use later
- intentionally brief, not the main path

Does not import TikHub-Multi-Functional-Downloader or any other project package.

API key requirement

This skill intentionally does not contain any API key.

Use one of these:

environment variable: TIKHUB_API_KEY
ask the user to provide an API key explicitly

If the key is missing, stop and ask for it instead of hardcoding one into scripts.

Install

pip install httpx

Post-processor: no extra packages.

API (for reference)

Search: GET https://api.tikhub.io/api/v1/youtube/web_v2/get_general_search_v2?keyword=...
Video details: GET https://api.tikhub.io/api/v1/youtube/web_v2/get_video_info?video_id=...&language_code=zh-CN&need_format=true
Header: Authorization: Bearer <API_KEY>

Notes from real requests

keyword should be passed through request params or URL-encoded.
need_format should be sent as true or false; a blank parameter may fail boolean parsing.
Search responses usually contain data.videos, data.shorts, data.channels, data.playlists, and sometimes continuation_token.
Detail responses may have empty structured fields while display fields are populated; view_count_text, like_count_text, and date_text are often more reliable than view_count, like_count, and upload_date.

Preferred path: single request

1. Quickest: `curl`

Search:

curl --location --request GET "https://api.tikhub.io/api/v1/youtube/web_v2/get_general_search_v2?keyword=Python%20tutorial" \
--header "Authorization: Bearer $TIKHUB_API_KEY"

Video detail:

curl --location --request GET "https://api.tikhub.io/api/v1/youtube/web_v2/get_video_info?video_id=_uQrJ0TkZlc&language_code=zh-CN&need_format=true" \
--header "Authorization: Bearer $TIKHUB_API_KEY"

With optional search filters:

curl --location --request GET "https://api.tikhub.io/api/v1/youtube/web_v2/get_general_search_v2?keyword=cute%20cats&type=video&sort_by=relevance" \
--header "Authorization: Bearer $TIKHUB_API_KEY"

2. Preferred Python pattern: tiny `httpx`

If the user wants Python, prefer a small request snippet, not a framework.

Search and save raw JSON:

import json
import os
import httpx

api_key = os.getenv("TIKHUB_API_KEY", "").strip()
if not api_key:
    raise SystemExit("Missing TIKHUB_API_KEY")

url = "https://api.tikhub.io/api/v1/youtube/web_v2/get_general_search_v2"
params = {"keyword": "Python tutorial"}
headers = {"Authorization": f"Bearer {api_key}", "Accept": "*/*"}

with httpx.Client(timeout=30.0, follow_redirects=True) as client:
    raw = client.get(url, params=params, headers=headers).json()

with open("youtube_search_raw.json", "w", encoding="utf-8") as f:
    json.dump(raw, f, ensure_ascii=False, indent=2)

for item in raw.get("data", {}).get("videos", [])[:5]:
    print(item.get("title", ""))
    print(item.get("url", ""))

Detail and save raw JSON:

import json
import os
import httpx

api_key = os.getenv("TIKHUB_API_KEY", "").strip()
if not api_key:
    raise SystemExit("Missing TIKHUB_API_KEY")

url = "https://api.tikhub.io/api/v1/youtube/web_v2/get_video_info"
params = {
    "video_id": "_uQrJ0TkZlc",
    "language_code": "zh-CN",
    "need_format": "true",
}
headers = {"Authorization": f"Bearer {api_key}", "Accept": "*/*"}

with httpx.Client(timeout=30.0, follow_redirects=True) as client:
    raw = client.get(url, params=params, headers=headers).json()

with open("youtube_detail_raw.json", "w", encoding="utf-8") as f:
    json.dump(raw, f, ensure_ascii=False, indent=2)

data = raw.get("data", {})
print("title:", data.get("title", ""))
print("author:", data.get("author", ""))
print("views:", data.get("view_count_text", "") or data.get("view_count", ""))
print("video_url:", data.get("video_url", ""))

Save raw JSON by default

For this workflow, the recommended default is:

request the API
save the full raw JSON immediately
print only a few useful fields for quick inspection
optionally run the post-processor later

Suggested file naming:

search raw: search_<keyword>_<request_id>.json
detail raw: detail_<video_id>_<request_id>.json

If request_id is unavailable, hash the keyword or video ID.

Post-process raw JSON

Save as postprocess_youtube_raw.py (stdlib only).

Input:

one raw search/detail JSON file
or a directory containing multiple raw JSON files

Output:

youtube_search_summary.csv
youtube_search_summary.json

#!/usr/bin/env python3
from __future__ import annotations

import argparse
import csv
import json
import os
import sys
from glob import glob
from typing import Any, Dict, List


def collect_inputs(path: str) -> List[str]:
    if os.path.isfile(path):
        return [path]
    if os.path.isdir(path):
        return sorted(glob(os.path.join(path, "*.json")))
    raise FileNotFoundError(path)


def as_list(value: Any) -> List[dict]:
    return value if isinstance(value, list) else []


def flatten_for_csv(row: Dict[str, Any]) -> Dict[str, Any]:
    out: Dict[str, Any] = {}
    for k, v in row.items():
        if v is None:
            out[k] = ""
        elif isinstance(v, (dict, list)):
            out[k] = json.dumps(v, ensure_ascii=False)
        else:
            out[k] = v
    return out


def simplify_search(raw: dict, source_file: str) -> Dict[str, Any]:
    data = raw.get("data") or {}
    videos = as_list(data.get("videos"))
    first_video = videos[0] if videos else {}
    return {
        "source_file": os.path.basename(source_file),
        "record_type": "search",
        "request_id": raw.get("request_id"),
        "api_code": raw.get("code"),
        "router": raw.get("router"),
        "keyword": data.get("keyword") or (raw.get("params") or {}).get("keyword", ""),
        "video_count": len(videos),
        "short_count": len(as_list(data.get("shorts"))),
        "channel_count": len(as_list(data.get("channels"))),
        "playlist_count": len(as_list(data.get("playlists"))),
        "continuation_token": data.get("continuation_token", ""),
        "top_video_id": first_video.get("video_id", ""),
        "top_video_title": first_video.get("title", ""),
        "top_video_author": first_video.get("author", ""),
        "top_video_url": first_video.get("url", ""),
        "top_video_views": first_video.get("view_count", ""),
    }


def simplify_detail(raw: dict, source_file: str) -> Dict[str, Any]:
    data = raw.get("data") or {}
    return {
        "source_file": os.path.basename(source_file),
        "record_type": "detail",
        "request_id": raw.get("request_id"),
        "api_code": raw.get("code"),
        "router": raw.get("router"),
        "video_id": data.get("video_id", ""),
        "title": data.get("title", ""),
        "author": data.get("author", ""),
        "channel_id": data.get("channel_id", ""),
        "channel_handle": data.get("channel_handle", ""),
        "channel_url": data.get("channel_url", ""),
        "is_verified": data.get("is_verified"),
        "view_count": data.get("view_count", ""),
        "view_count_text": data.get("view_count_text", ""),
        "like_count": data.get("like_count", ""),
        "like_count_text": data.get("like_count_text", ""),
        "comment_count": data.get("comment_count", ""),
        "date_text": data.get("date_text", ""),
        "relative_date_text": data.get("relative_date_text", ""),
        "length_seconds": data.get("length_seconds", ""),
        "video_url": data.get("video_url", ""),
        "playability_status": data.get("playability_status", ""),
        "chapter_count": len(as_list(data.get("chapters"))),
    }


def simplify_raw(raw: dict, source_file: str) -> Dict[str, Any]:
    router = raw.get("router") or ""
    if "get_general_search_v2" in router:
        return simplify_search(raw, source_file)
    if "get_video_info" in router:
        return simplify_detail(raw, source_file)
    return {
        "source_file": os.path.basename(source_file),
        "record_type": "unknown",
        "request_id": raw.get("request_id"),
        "api_code": raw.get("code"),
        "router": router,
        "error": "unsupported router for this post-processor",
    }


def main() -> int:
    ap = argparse.ArgumentParser(description="Raw TikHub YouTube JSON -> CSV + simplified JSON")
    ap.add_argument("--input", "-i", required=True, help="One .json file or a directory of .json")
    ap.add_argument("--out-dir", "-o", default=".", help="Output directory (default: current working directory)")
    args = ap.parse_args()

    try:
        files = collect_inputs(args.input)
    except FileNotFoundError as e:
        print("Input not found:", e, file=sys.stderr)
        return 2

    if not files:
        print("No JSON files found.", file=sys.stderr)
        return 2

    out_dir = os.path.abspath(args.out_dir)
    os.makedirs(out_dir, exist_ok=True)
    csv_path = os.path.join(out_dir, "youtube_search_summary.csv")
    json_path = os.path.join(out_dir, "youtube_search_summary.json")

    rows: List[Dict[str, Any]] = []
    for fp in files:
        try:
            with open(fp, "r", encoding="utf-8") as f:
                raw = json.load(f)
        except Exception as ex:
            rows.append({"source_file": os.path.basename(fp), "record_type": "broken_json", "error": f"json load: {ex}"})
            continue
        rows.append(simplify_raw(raw, fp))

    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(
            {
                "generated_from": os.path.abspath(args.input),
                "record_count": len(rows),
                "records": rows,
            },
            f,
            ensure_ascii=False,
            indent=2,
        )

    flat = [flatten_for_csv(r) for r in rows]
    fieldnames = sorted({k for row in flat for k in row.keys()})
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for row in flat:
            writer.writerow({k: row.get(k, "") for k in fieldnames})

    print("Wrote:", csv_path)
    print("Wrote:", json_path)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

Commands:

python postprocess_youtube_raw.py --input ./youtube_api_raw
python postprocess_youtube_raw.py --input ./youtube_detail_raw.json --out-dir .

Optional: concurrent or multi-request usage

Only use this when the user clearly needs many keywords, many video IDs, or pagination at scale.

Keep the batching layer thin:

accept a text file of keywords or video IDs
call the same two endpoints in a loop or thread pool
save one raw JSON per request
reuse postprocess_youtube_raw.py afterward

Recommended limits:

start with max_workers=3 to 5
reduce concurrency if you hit 429
keep filenames stable and collision-safe

Do not lead with a big wrapper if the task is only one search or one detail lookup.

End-to-end workflow

Provide TIKHUB_API_KEY.
Make a single search or detail request with curl or a tiny httpx snippet.
Save the full raw response JSON.
Inspect a few important fields directly.
If needed, run postprocess_youtube_raw.py on one file or a directory of raw files.

Troubleshooting

401/403: invalid API key or missing YouTube scopes.
429: rate limit; retry later or reduce concurrency.
need_format error: pass true or false, not a blank query parameter.
Empty structured fields in detail: prefer view_count_text, like_count_text, and date_text.
No search results: keyword too narrow, region-dependent results, or temporary upstream changes.

What this skill does not cover

downloading YouTube media streams
comment crawling
transcript extraction
channel-wide crawling beyond what the search response already includes