name: tiktok_download
description: Single-file TikTok/Douyin video download and traffic metrics via TikHub API using only httpx; optional persisted raw API JSON plus a stdlib post-processor emitting CSV and simplified JSON. Supports one URL or concurrent batch (max 10 workers). No dependency on any project codebase.
TikHub Download Independent
What this skill gives you
Two small artifacts (both copy-pasteable from this file):
- tikhub_independent.py (single file, httpx only)
  - Single video or batch (parallel, max 10 workers): download MP4 + print metrics
  - Raw API JSON: by default every successful API response is written to disk as a full JSON file (same shape as e.g. raw_api_response.json in this repo: top-level code, request_id, params, data.aweme_detail, etc.). Use --no-save-raw to skip.
- postprocess_tikhub_raw.py (stdlib only: json, csv, argparse, glob)
  - Reads one raw file or a directory of raw JSON files
  - Writes into the current working directory (or --out-dir):
    - tikhub_videos_summary.csv
    - tikhub_videos_summary.json (an object whose videos list holds one simplified record per video)
Structured-data reference (field meanings, nesting): see the project doc TikHub_API_数据格式说明.md ("TikHub API data format description"; same repo, path from project root). Example raw payload: raw_api_response.json.
Does not import TikHub-Multi-Functional-Downloader or any other project package.
Install
pip install httpx
Post-processor: no extra packages.
API (for reference)
- TikTok: GET https://api.tikhub.io/api/v1/tiktok/app/v3/fetch_one_video_by_share_url?share_url=...
- Douyin: GET https://api.tikhub.io/api/v1/douyin/app/v3/fetch_one_video_by_share_url?share_url=...
- Header: Authorization: Bearer <API_KEY>
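As a minimal sketch of how the request described above can be assembled, the snippet below builds the URL and header without sending anything. The helper name build_request is hypothetical; the endpoint path and header format are taken from the list above. The point it illustrates is that the share URL has to be percent-encoded when passed as a query parameter.

```python
from urllib.parse import urlencode

API_BASE = "https://api.tikhub.io/api/v1"  # base of the endpoints listed above


def build_request(platform: str, share_url: str, api_key: str):
    """Return (url, headers) for fetch_one_video_by_share_url (hypothetical helper)."""
    if platform not in ("tiktok", "douyin"):
        raise ValueError(f"unsupported platform: {platform}")
    # urlencode percent-encodes ':', '/' and '@' in the share URL.
    query = urlencode({"share_url": share_url})
    url = f"{API_BASE}/{platform}/app/v3/fetch_one_video_by_share_url?{query}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers


url, headers = build_request(
    "tiktok", "https://www.tiktok.com/@user/video/123", "<API_KEY>"
)
print(url)
```

The script in Part A gets the same encoding for free by passing params={"share_url": ...} to httpx.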
Mandatory prerequisite: valid TikHub API key
This skill may be used only when a valid TikHub API key is available.
Agent rule:
- First check whether this skill file already contains a usable API key value in the API_KEY default section.
- If the key is missing, empty, placeholder-only, or otherwise unavailable, stop the current task immediately.
- The agent must then tell the human user:
  - they need to go to https://tikhub.io/ and apply for their own paid personal API key
  - then fill that key into this file: /AWorld/aworld-skills/tikhub_download/SKILL.md
- Until the human has provided and filled in a valid key, do not continue with download, data fetching, retry, workaround, or fabricated-output behavior.
Concrete example:
- If this file no longer contains a real key in the relevant API_KEY location, treat the key as unavailable and follow the stop-and-notify rule above.
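The stop-and-notify rule can be sketched as a small pre-flight check. This is an illustration only: the placeholder strings and the helper names key_is_usable and require_key are assumptions, not part of the scripts below, and the environment variable name mirrors the one read in Part A.

```python
import os

# Hypothetical placeholder markers; adjust to whatever your copy of the file uses.
PLACEHOLDERS = {"", "<API_KEY>", "YOUR_API_KEY", "CHANGEME"}


def key_is_usable(value) -> bool:
    """Return True only for a non-empty, non-placeholder key string."""
    if not isinstance(value, str):
        return False
    return value.strip() not in PLACEHOLDERS


def require_key(env_name: str = "TIKHUB_API_KEY") -> str:
    """Return the key, or stop the task with a message for the human user."""
    key = os.getenv(env_name, "")
    if not key_is_usable(key):
        raise SystemExit(
            "No valid TikHub API key. Apply for one at https://tikhub.io/ "
            "and fill it in before retrying."
        )
    return key.strip()
```

A caller would invoke require_key() once before any fetch, so a missing key ends the run before any network request is attempted.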
Part A — tikhub_independent.py (download + optional raw JSON)
Save the following as tikhub_independent.py.
Behavior note: After each fetch_video_info call, if saving is enabled (default), the entire parsed JSON object is written with json.dump(..., indent=2, ensure_ascii=False) — this is the audit / replay artifact for downstream tooling, not the simplified extract.
#!/usr/bin/env python3
"""
TikTok/Douyin: download MP4 + metrics via TikHub API. Optional: save full raw API JSON per request.
Requires: pip install httpx
Usage:
python tikhub_independent.py one "https://www.tiktok.com/@user/video/123"
python tikhub_independent.py one "URL" --no-save-raw
python tikhub_independent.py batch urls.txt
python tikhub_independent.py batch urls.txt --raw-dir my_raw_dir --max-workers 4
Raw JSON default directory (relative to current working directory): ./tikhub_api_raw
"""
from __future__ import annotations
import argparse
import concurrent.futures
import hashlib
import json
import os
import re
import sys
from typing import Any, Dict, List
from urllib.parse import urlparse
import httpx
API_KEY = os.getenv(
"TIKHUB_API_KEY",
"",
).strip()
MAX_WORKERS_CAP = 10
DEFAULT_OUT = os.path.expanduser("~/Downloads/tikhub_independent")
DEFAULT_RAW_DIR = "tikhub_api_raw"
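def clean_name(name: str, max_len: int = 60) -> str:
    """Make a string safe to use as part of a filename."""
    name = re.sub(r'[\\/:*?"<>|]+', "_", (name or "").strip())
    name = re.sub(r"\s+", " ", name).strip()
    # Strip first, then fall back, so names made only of dots/underscores still yield "video".
    return name[:max_len].strip(" ._") or "video"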
def platform_from_url(url: str) -> str:
    host = (urlparse(url).netloc or "").lower()
    if "douyin.com" in host:
        return "douyin"
    return "tiktok"


def fetch_video_info(api_key: str, share_url: str) -> dict:
    platform = platform_from_url(share_url)
    endpoint = f"https://api.tikhub.io/api/v1/{platform}/app/v3/fetch_one_video_by_share_url"
    headers = {"Authorization": f"Bearer {api_key}", "Accept": "*/*"}
    params = {"share_url": share_url}
    with httpx.Client(timeout=30.0, follow_redirects=True) as client:
        resp = client.get(endpoint, headers=headers, params=params)
        resp.raise_for_status()
        return resp.json()


def safe_raw_filename(raw: dict, share_url: str) -> str:
    data = raw.get("data") or {}
    detail = data.get("aweme_detail")
    if not detail and data.get("aweme_details"):
        detail = (data.get("aweme_details") or [None])[0]
    aid = (detail or {}).get("aweme_id") or "unknown"
    rid = (raw.get("request_id") or "noreq").replace("-", "")[:16]
    if aid == "unknown":
        # No video id available: derive a stable name from the share URL instead.
        h = hashlib.sha256(share_url.encode("utf-8")).hexdigest()[:10]
        return f"raw_unknown_{h}_{rid}.json"
    return f"raw_{aid}_{rid}.json"


def save_raw_json(raw: dict, share_url: str, raw_dir: str) -> str:
    os.makedirs(raw_dir, exist_ok=True)
    path = os.path.join(raw_dir, safe_raw_filename(raw, share_url))
    with open(path, "w", encoding="utf-8") as f:
        json.dump(raw, f, ensure_ascii=False, indent=2)
    return path
def extract_clean_data(raw: dict) -> dict:
    # "or {}" guards against keys that are present but null in the API response.
    data = raw.get("data") or {}
    detail = data.get("aweme_detail")
    if not detail and data.get("aweme_details"):
        detail = data["aweme_details"][0]
    if not detail:
        return {}
    video = detail.get("video") or {}
    play = video.get("play_addr") or {}
    url_list = play.get("url_list") or []
    video_url = url_list[0] if url_list else ""
    author = detail.get("author") or {}
    stats = detail.get("statistics") or {}
    return {
        "id": detail.get("aweme_id", ""),
        "desc": detail.get("desc", ""),
        "author_name": author.get("nickname", ""),
        "create_time": detail.get("create_time", 0),
        "video_url": video_url,
        "like_count": stats.get("digg_count", 0),
        "comment_count": stats.get("comment_count", 0),
        "share_count": stats.get("share_count", 0),
        "play_count": stats.get("play_count", 0),
        "duration": video.get("duration", 0),
        "width": play.get("width", 0),
        "height": play.get("height", 0),
    }
def download_file(url: str, output_path: str) -> None:
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
        )
    }
    # Stream the body to disk in chunks so large videos are not held in memory.
    with httpx.Client(timeout=httpx.Timeout(60.0, read=300.0), follow_redirects=True) as client:
        with client.stream("GET", url, headers=headers) as r:
            r.raise_for_status()
            with open(output_path, "wb") as f:
                for chunk in r.iter_bytes(chunk_size=8192):
                    if chunk:
                        f.write(chunk)


def metrics_dict(info: dict) -> Dict[str, Any]:
    return {
        "video_id": info["id"],
        "author": info.get("author_name", ""),
        "likes": int(info.get("like_count") or 0),
        "comments": int(info.get("comment_count") or 0),
        "shares": int(info.get("share_count") or 0),
        "plays": int(info.get("play_count") or 0),
        "duration_ms": int(info.get("duration") or 0),
        "resolution": f"{int(info.get('width') or 0)}x{int(info.get('height') or 0)}",
    }
def run_one(
    share_url: str,
    out_dir: str,
    raw_dir: str,
    save_raw: bool,
) -> int:
    if not API_KEY:
        print(
            "Error: no valid TikHub API key is available. Stop this task and ask the human user "
            "to apply for a paid personal API key at https://tikhub.io/ and fill it into "
            "/AWorld/aworld-skills/tikhub_download/SKILL.md.",
            file=sys.stderr,
        )
        return 3
    os.makedirs(out_dir, exist_ok=True)
    raw = fetch_video_info(API_KEY, share_url)
    raw_path = None
    if save_raw:
        raw_path = save_raw_json(raw, share_url, raw_dir)
        print("Raw API JSON:", raw_path)
    info = extract_clean_data(raw)
    if not info or not info.get("id") or not info.get("video_url"):
        print("Error: parse failed or video_url missing.", file=sys.stderr)
        print("Raw code/message:", raw.get("code"), raw.get("message"), file=sys.stderr)
        return 4
    base = f"{platform_from_url(share_url)}_{clean_name(info.get('author_name'))}_{info['id']}"
    output_path = os.path.join(out_dir, f"{base}.mp4")
    download_file(info["video_url"], output_path)
    print("Download OK")
    print("Saved:", output_path)
    print("\n=== Metrics ===")
    m = metrics_dict(info)
    for k, v in m.items():
        print(f"{k}: {v}")
    return 0
def process_one_job(
    share_url: str,
    out_dir: str,
    raw_dir: str,
    save_raw: bool,
) -> Dict[str, Any]:
    # Set before the try so a later failure (e.g. download) still reports the saved raw file.
    raw_path = None
    try:
        raw = fetch_video_info(API_KEY, share_url)
        if save_raw:
            raw_path = save_raw_json(raw, share_url, raw_dir)
        info = extract_clean_data(raw)
        if not info or not info.get("id") or not info.get("video_url"):
            return {
                "ok": False,
                "url": share_url,
                "raw_json_path": raw_path,
                "error": f"parse failed code={raw.get('code')} msg={raw.get('message')}",
            }
        base = f"{platform_from_url(share_url)}_{clean_name(info.get('author_name'))}_{info['id']}"
        output_path = os.path.join(out_dir, f"{base}.mp4")
        download_file(info["video_url"], output_path)
        return {
            "ok": True,
            "url": share_url,
            "path": output_path,
            "raw_json_path": raw_path,
            "metrics": metrics_dict(info),
        }
    except Exception as e:
        return {"ok": False, "url": share_url, "error": str(e), "raw_json_path": raw_path}
def read_urls(path: str) -> List[str]:
    """Read one URL per line, skipping blank lines and '#' comments."""
    urls: List[str] = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            u = line.strip()
            if u and not u.startswith("#"):
                urls.append(u)
    return urls


def run_batch(
    urls_file: str,
    out_dir: str,
    max_workers: int,
    raw_dir: str,
    save_raw: bool,
) -> int:
    if not API_KEY:
        print(
            "Error: no valid TikHub API key is available. Stop this task and ask the human user "
            "to apply for a paid personal API key at https://tikhub.io/ and fill it into "
            "/AWorld/aworld-skills/tikhub_download/SKILL.md.",
            file=sys.stderr,
        )
        return 3
    urls = read_urls(urls_file)
    if not urls:
        print("Error: no URLs in file.", file=sys.stderr)
        return 2
    os.makedirs(out_dir, exist_ok=True)
    # Never use more threads than the cap, the requested count, or the number of URLs.
    workers = max(1, min(MAX_WORKERS_CAP, max_workers, len(urls)))
    results: List[Dict[str, Any]] = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as ex:
        futs = [
            ex.submit(process_one_job, u, out_dir, raw_dir, save_raw) for u in urls
        ]
        for fut in concurrent.futures.as_completed(futs):
            results.append(fut.result())
    ok = sum(1 for r in results if r.get("ok"))
    fail = len(results) - ok
    print(f"\n=== Batch summary (workers={workers}, cap={MAX_WORKERS_CAP}) ===")
    print(f"success={ok}, failed={fail}, total={len(urls)}")
    if save_raw:
        print(f"Raw API JSON directory: {raw_dir}")
    for r in sorted(results, key=lambda x: x.get("url", "")):
        if r.get("ok"):
            print(f"\nOK {r['url']}")
            print(f" video: {r['path']}")
            if r.get("raw_json_path"):
                print(f" raw: {r['raw_json_path']}")
            print(f" metrics: {r['metrics']}")
        else:
            print(f"\nFAIL {r['url']}")
            if r.get("raw_json_path"):
                print(f" raw (if any): {r['raw_json_path']}")
            print(f" error: {r.get('error', 'unknown')}")
    return 0 if fail == 0 else 1
def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(
        description="TikHub single-file downloader (one URL or batch file)."
    )
    sub = p.add_subparsers(dest="command", required=True)

    p_one = sub.add_parser("one", help="Download one video by URL")
    p_one.add_argument("url", help="TikTok or Douyin share/video URL")
    p_one.add_argument(
        "-o",
        "--output-dir",
        default=DEFAULT_OUT,
        help=f"Video output directory (default: {DEFAULT_OUT})",
    )
    p_one.add_argument(
        "--no-save-raw",
        action="store_true",
        help="Do not write full raw API response JSON to disk",
    )
    p_one.add_argument(
        "--raw-dir",
        default=DEFAULT_RAW_DIR,
        help=f"Directory for raw API JSON, relative to cwd (default: {DEFAULT_RAW_DIR})",
    )

    p_batch = sub.add_parser("batch", help="Batch download from urls.txt (concurrent, max 10)")
    p_batch.add_argument("urls_file", help="Text file: one URL per line")
    p_batch.add_argument(
        "-o",
        "--output-dir",
        default=DEFAULT_OUT,
        help=f"Video output directory (default: {DEFAULT_OUT})",
    )
    p_batch.add_argument(
        "--no-save-raw",
        action="store_true",
        help="Do not write full raw API response JSON to disk",
    )
    p_batch.add_argument(
        "--raw-dir",
        default=DEFAULT_RAW_DIR,
        help=f"Directory for raw API JSON, relative to cwd (default: {DEFAULT_RAW_DIR})",
    )
    p_batch.add_argument(
        "--max-workers",
        type=int,
        default=10,
        help="Parallel jobs, 1-10 (default 10; hard cap 10)",
    )
    return p


def main() -> int:
    parser = build_parser()
    args = parser.parse_args()
    save_raw = not args.no_save_raw
    raw_dir = os.path.abspath(os.path.join(os.getcwd(), args.raw_dir))
    if args.command == "one":
        return run_one(
            args.url.strip(),
            os.path.expanduser(args.output_dir),
            raw_dir,
            save_raw,
        )
    if args.command == "batch":
        mw = max(1, min(MAX_WORKERS_CAP, int(args.max_workers)))
        return run_batch(
            args.urls_file,
            os.path.expanduser(args.output_dir),
            mw,
            raw_dir,
            save_raw,
        )
    return 2


if __name__ == "__main__":
    raise SystemExit(main())
Commands (tikhub_independent.py)
# one video; raw JSON -> ./tikhub_api_raw/raw_<aweme_id>_....json
python tikhub_independent.py one "https://www.tiktok.com/@mumumelon67/video/7484120063369415978"
python tikhub_independent.py one "URL" --no-save-raw
python tikhub_independent.py one "URL" --raw-dir ./my_raw_exports
python tikhub_independent.py batch urls.txt
python tikhub_independent.py batch urls.txt --max-workers 4 --raw-dir ./batch_raw
Concurrent batch: API call + optional raw write + download all run in parallel threads (cap 10). Raw files are written under --raw-dir (absolute path resolved from cwd).
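The effective thread count in batch mode is the smallest of the hard cap, the requested --max-workers value, and the number of URLs, with a floor of 1. A minimal sketch of that clamp, using the same expression as run_batch (the function name effective_workers is introduced here for illustration only):

```python
MAX_WORKERS_CAP = 10  # same cap as in tikhub_independent.py


def effective_workers(requested: int, url_count: int, cap: int = MAX_WORKERS_CAP) -> int:
    """Clamp the requested worker count to [1, min(cap, url_count)]."""
    return max(1, min(cap, requested, url_count))


for requested, url_count in [(10, 3), (4, 25), (50, 25), (0, 5)]:
    print(requested, url_count, "->", effective_workers(requested, url_count))
```

So a file with three URLs never starts more than three threads, and a request for 50 workers is silently reduced to 10.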
Part B — postprocess_tikhub_raw.py (CSV + simplified JSON)
Save as postprocess_tikhub_raw.py (stdlib only).
Input: one raw file like raw_api_response.json, or a directory containing multiple raw_*.json / any *.json from this workflow.
Output in --out-dir (default: current working directory):
| File | Purpose |
|---|---|
| tikhub_videos_summary.csv | Flat columns for spreadsheets |
| tikhub_videos_summary.json | {"generated_from": ..., "video_count": ..., "videos": [ {...}, ... ]} simplified records |
Field selection follows TikHub_API_数据格式说明.md: ids, text, type, time, author strip, statistics, video basics, music/commerce/AIGC flags where present.
#!/usr/bin/env python3
"""
Post-process TikHub raw API JSON files -> CSV + simplified JSON (stdlib only).
Usage:
python postprocess_tikhub_raw.py --input raw_api_response.json
python postprocess_tikhub_raw.py --input ./tikhub_api_raw --out-dir .
See TikHub_API_数据格式说明.md for full field documentation.
"""
from __future__ import annotations
import argparse
import csv
import json
import os
import sys
from glob import glob
from typing import Any, Dict, List, Optional
def get_aweme_detail(raw: dict) -> Optional[dict]:
    """Return data.aweme_detail, or the first entry of data.aweme_details, or None."""
    data = raw.get("data") or {}
    d = data.get("aweme_detail")
    if d:
        return d
    ads = data.get("aweme_details")
    if isinstance(ads, list) and ads:
        return ads[0]
    return None


def first_url(obj: Any) -> str:
    """Return the first entry of obj["url_list"], or "" if unavailable."""
    if not obj or not isinstance(obj, dict):
        return ""
    lst = obj.get("url_list")
    if isinstance(lst, list) and lst:
        return str(lst[0])
    return ""
def simplify_raw(raw: dict, source_file: str) -> Dict[str, Any]:
    params = raw.get("params") or {}
    share_url = params.get("share_url") or ""
    detail = get_aweme_detail(raw)
    if not detail:
        # Keep the same identifying columns as successful rows (basename, api_message).
        return {
            "source_file": os.path.basename(source_file),
            "request_id": raw.get("request_id"),
            "api_code": raw.get("code"),
            "api_message": raw.get("message"),
            "router": raw.get("router"),
            "share_url": share_url,
            "parse_ok": False,
            "error": "missing data.aweme_detail / aweme_details",
        }
    stats = detail.get("statistics") or {}
    author = detail.get("author") or {}
    video = detail.get("video") or {}
    play = video.get("play_addr") or {}
    music = detail.get("music") or {}
    aigc = detail.get("aigc_info") or {}
    hashtag_names = []
    for x in detail.get("text_extra") or []:
        if isinstance(x, dict) and x.get("hashtag_name"):
            hashtag_names.append(x["hashtag_name"])
    commerce = detail.get("commerce_info")
    commerce_min = None
    if isinstance(commerce, dict):
        commerce_min = {
            k: commerce.get(k)
            for k in ("auction_ad_invited", "ad_source", "adv_promotable")
            if k in commerce
        }
        if not commerce_min:
            # commerce_info exists but has none of the selected keys.
            commerce_min = {"_present": True}
    row: Dict[str, Any] = {
        "source_file": os.path.basename(source_file),
        "parse_ok": True,
        "request_id": raw.get("request_id"),
        "api_code": raw.get("code"),
        "api_message": raw.get("message"),
        "router": raw.get("router"),
        "share_url": share_url,
        "cache_url": raw.get("cache_url"),
        "aweme_id": detail.get("aweme_id"),
        "desc": (detail.get("desc") or "")[:2000],
        "create_time_unix": detail.get("create_time"),
        "aweme_type": detail.get("aweme_type"),
        "content_desc": detail.get("content_desc") or "",
        "author_nickname": author.get("nickname"),
        "author_unique_id": author.get("unique_id"),
        "author_sec_uid": author.get("sec_uid"),
        "author_uid": author.get("uid"),
        "author_follower_count": author.get("follower_count"),
        "author_following_count": author.get("following_count"),
        "author_aweme_count": author.get("aweme_count"),
        "author_verification_type": author.get("verification_type"),
        "author_account_region": author.get("account_region"),
        "author_commerce_user_level": author.get("commerce_user_level"),
        "digg_count": stats.get("digg_count"),
        "comment_count": stats.get("comment_count"),
        "share_count": stats.get("share_count"),
        "play_count": stats.get("play_count"),
        "collect_count": stats.get("collect_count"),
        "download_count": stats.get("download_count"),
        "forward_count": stats.get("forward_count"),
        "whatsapp_share_count": stats.get("whatsapp_share_count"),
        "video_duration_ms": video.get("duration"),
        "video_width": play.get("width"),
        "video_height": play.get("height"),
        "video_ratio": video.get("ratio"),
        "video_has_watermark": video.get("has_watermark"),
        "cover_url": first_url(video.get("cover")),
        "play_url_sample": first_url(play),
        "music_id": music.get("id_str") or music.get("mid"),
        "music_title": music.get("title"),
        "music_author": music.get("owner_nickname") or music.get("author"),
        "music_is_original_sound": music.get("is_original_sound"),
        "music_is_commerce_music": music.get("is_commerce_music"),
        "music_commercial_right_type": music.get("commercial_right_type"),
        "music_user_count": music.get("user_count"),
        "is_ads": detail.get("is_ads"),
        "commerce_info_min": commerce_min,
        "aigc_created_by_ai": aigc.get("created_by_ai"),
        "aigc_label_type": aigc.get("aigc_label_type"),
        "hashtags": "|".join(hashtag_names) if hashtag_names else "",
    }
    return row
def collect_inputs(path: str) -> List[str]:
    """Return [path] for a file, or the sorted *.json files of a directory."""
    if os.path.isfile(path):
        return [path]
    if os.path.isdir(path):
        files = sorted(glob(os.path.join(path, "*.json")))
        return files
    raise FileNotFoundError(path)


def flatten_for_csv(row: Dict[str, Any]) -> Dict[str, Any]:
    """Turn None into "" and nested dicts/lists into JSON strings for CSV cells."""
    out = {}
    for k, v in row.items():
        if v is None:
            out[k] = ""
        elif isinstance(v, (dict, list)):
            out[k] = json.dumps(v, ensure_ascii=False)
        else:
            out[k] = v
    return out
def main() -> int:
    ap = argparse.ArgumentParser(description="Raw TikHub JSON -> CSV + simplified JSON")
    ap.add_argument("--input", "-i", required=True, help="One .json file or a directory of .json")
    ap.add_argument(
        "--out-dir",
        "-o",
        default=".",
        help="Output directory (default: current working directory)",
    )
    args = ap.parse_args()
    try:
        files = collect_inputs(args.input)
    except FileNotFoundError as e:
        print("Input not found:", e, file=sys.stderr)
        return 2
    if not files:
        print("No JSON files found.", file=sys.stderr)
        return 2
    out_dir = os.path.abspath(args.out_dir)
    os.makedirs(out_dir, exist_ok=True)
    csv_path = os.path.join(out_dir, "tikhub_videos_summary.csv")
    json_path = os.path.join(out_dir, "tikhub_videos_summary.json")
    videos: List[Dict[str, Any]] = []
    for fp in files:
        try:
            with open(fp, "r", encoding="utf-8") as f:
                raw = json.load(f)
        except Exception as ex:
            videos.append(
                {
                    "source_file": os.path.basename(fp),
                    "parse_ok": False,
                    "error": f"json load: {ex}",
                }
            )
            continue
        if not isinstance(raw, dict):
            # Valid JSON but not an object (e.g. a list): record it instead of crashing.
            videos.append(
                {
                    "source_file": os.path.basename(fp),
                    "parse_ok": False,
                    "error": "top-level JSON value is not an object",
                }
            )
            continue
        videos.append(simplify_raw(raw, fp))
    payload = {
        "generated_from": os.path.abspath(args.input),
        "video_count": len(videos),
        "videos": videos,
    }
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False, indent=2)
    if videos:
        flat = [flatten_for_csv(v) for v in videos]
        # Union of all keys, so error rows and full rows share one header.
        fieldnames: List[str] = sorted({k for row in flat for k in row.keys()})
        with open(csv_path, "w", encoding="utf-8", newline="") as f:
            w = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
            w.writeheader()
            for row in flat:
                w.writerow({k: row.get(k, "") for k in fieldnames})
        print("Wrote:", csv_path)
    print("Wrote:", json_path)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
Commands (postprocess_tikhub_raw.py)
From the directory where you want tikhub_videos_summary.* (e.g. project root or a report folder):
python postprocess_tikhub_raw.py --input ./tikhub_api_raw
python postprocess_tikhub_raw.py --input /path/to/raw_api_response.json --out-dir .
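Once the summary CSV exists, it can be consumed with the standard library alone. The sketch below is an illustration under stated assumptions: the sample rows are made up and show only a few of the real columns, and the helper name load_ok_rows is hypothetical. It relies on the fact that the post-processor writes parse_ok as the text True or False and leaves missing values as empty cells.

```python
import csv
import io

# Made-up sample in the shape of tikhub_videos_summary.csv (only a few columns shown).
SAMPLE = """aweme_id,parse_ok,play_count,digg_count
111,True,1500,120
222,False,,
333,True,90000,4100
"""


def load_ok_rows(fileobj):
    """Return rows whose parse_ok cell is 'True', with the two counts converted to int."""
    rows = []
    for row in csv.DictReader(fileobj):
        if row.get("parse_ok") != "True":
            continue  # skip files that failed to parse
        for col in ("play_count", "digg_count"):
            row[col] = int(row[col] or 0)  # empty cell -> 0
        rows.append(row)
    return rows


rows = load_ok_rows(io.StringIO(SAMPLE))
top = max(rows, key=lambda r: r["play_count"])
print(top["aweme_id"], top["play_count"])
```

With a real file, replace io.StringIO(SAMPLE) with open("tikhub_videos_summary.csv", newline="", encoding="utf-8").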
End-to-end workflow (for others)
- pip install httpx
- Copy Part A and Part B scripts next to each other (any folder; no dependency on this repo’s Python packages).
- Run tikhub_independent.py (one or batch) so tikhub_api_raw/ (or --raw-dir) contains full API responses, in the same shape as raw_api_response.json.
- Run postprocess_tikhub_raw.py --input <raw file or dir> --out-dir . to get tikhub_videos_summary.csv and tikhub_videos_summary.json in the chosen output directory.
- For field-level meaning of nested keys, open TikHub_API_数据格式说明.md in this repository (project root).
Troubleshooting
- 401/403: invalid key or missing scopes.
- No valid API key configured in this skill: stop immediately and tell the human user to apply for a paid personal API key at https://tikhub.io/, then fill it into /AWorld/aworld-skills/tikhub_download/SKILL.md before retrying.
- 429: rate limit; in batch, reduce --max-workers or retry later.
- No video_url / parse fail: video private, removed, or bad URL; a raw file may still be written if the HTTP response was JSON but its content was incomplete. Check api_code / parse_ok in the post-process output.
- Post-process parse_ok: false: the file is not a TikHub fetch_one_video payload, or the JSON is damaged.
- Mainland China: reaching TikTok may require a proxy (not handled by these scripts).
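For the 429 case, "retry later" can be automated with exponential backoff. The scripts above do not do this; what follows is a sketch under assumptions, with a hypothetical RateLimited exception and call_with_backoff helper, and a fake fetch function standing in for the real API call so the example runs without network access.

```python
import time


class RateLimited(Exception):
    """Hypothetical marker for an HTTP 429 response."""


def call_with_backoff(fn, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on RateLimited, wait base_delay * 2**attempt and try again."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except RateLimited:
            if attempt == retries:
                raise  # out of retries: let the caller see the failure
            sleep(base_delay * (2 ** attempt))


# Demo with a fake call that is rate-limited twice, then succeeds.
calls = {"n": 0}
delays = []


def fake_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return {"code": 200}


result = call_with_backoff(fake_fetch, sleep=delays.append)
print(result, delays)
```

In a real run, the wrapped function would call fetch_video_info and raise RateLimited when it catches httpx.HTTPStatusError with response.status_code equal to 429; the sleep parameter is injectable only so the demo does not actually wait.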
What this skill does not cover
User-profile crawling, non-URL workflows, image-only carousels as first-class exports, and any endpoint other than fetch_one_video_by_share_url.