x-article-getxapi-fallback - SKILL.md Agent Skill

name: x-article-getxapi-fallback description: Fallback approach for retrieving X/Twitter long-form article bodies (x.com/i/article/...) using GetXAPI when xurl cannot access them. GetXAPI bypasses the X auth wall and returns full article content as structured JSON. trigger: | When xurl read returns metadata-only or "Could not find tweet" for an X article URL (x.com/i/article/... or x.com/user/status/NNN where the post is just an article link). environment: env_var: GETXAPI_KEY api_base: https://api.getxapi.com

GetXAPI Fallback for X Articles

X Articles (long-form posts at x.com/i/article/<id>) are not regular tweets from the X API's perspective. The X API v2 /tweets endpoint cannot resolve article IDs, and xurl returns either:

"Could not find tweet with id" if you query the article ID directly
Metadata-only (title, author, engagement) without the body if you query the parent tweet

GetXAPI (api.getxapi.com) provides a dedicated endpoint that returns the full article body as structured JSON blocks (headings, paragraphs, lists, images, inline styles).

Command

curl -s -H"Authorization: Bearer $GETXAPI_KEY" \
  "https://api.getxapi.com/twitter/tweet/article?id=<TWEET_ID>"

Where <TWEET_ID> is the parent tweet's ID (the tweet that links to/contains the article), NOT the article ID.

Response Structure

{
  "status": "success",
  "msg": "success",
  "article": {
    "id": "ArticleEntity:...",
    "author": {
      "userName": "willccbb",
      "name": "will brown",
      "id": "3064259332",
      "isBlueVerified": true,
      "followers": 43172,
      "description": "reward hacking @primeintellect"
    },
    "replyCount": 41,
    "likeCount": 1805,
    "viewCount": 441884,
    "createdAt": "Fri May 01 02:23:07 +0000 2026",
    "title": "On SFT, RL, and on-policy distillation",
    "preview_text": "...",
    "cover_media_img_url": "https://pbs.twimg.com/media/...",
    "contents": [
      {"type": "header-two", "text": "§1 — Section title"},
      {"type": "unstyled", "text": "Paragraph text"},
      {"type": "unordered-list-item", "text": "List item"},
      {"type": "image", "url": "...", "width": 1574, "height": 1306}
    ]
  }
}

Content Block Types

type	Meaning	Usage
`header-two`	Section heading	`## <text>` in markdown
`unstyled`	Paragraph text	Plain paragraph
`unordered-list-item`	Bullet point	`- <text>` with raw inline styles
`image`	Embedded image	Omit in text version (include URL in frontmatter)
`atomic`	Entity block (tables, embeds)	Usually empty text, skip

Inline Style Ranges

Each unstyled block may have inlineStyleRanges:

"inlineStyleRanges": [
  {"length": 49, "offset": 0, "style": "Bold"},
  {"length": 103, "offset": 0, "style": "Italic"}
]

Available styles: "Bold", "Italic".

Retrieval Strategy — Five Tiers

When encountering an X Article URL (x.com/user/status/NNN linking to x.com/i/article/...):

Tier 1: xurl metadata (fast, free)

xurl read <TWEET_ID>

Returns: article title, author info, engagement metrics (likes, retweets, bookmarks, impressions). Does NOT return body text. The article field will have {title} only — no plain_text. Useful for quick triage and raw article frontmatter.

CRITICAL: xurl read ≠ xurl --auth oauth2 "/2/tweets/ID?tweet.fields=article". Tier 1 (this) only gets metadata. Tier 1.5 (below) gets the full body. The difference is --auth oauth2 and explicit ?tweet.fields=article.

Tier 1.5: xurl with article fields (fast, full body via OAuth2) ⭐ PREFERRED

xurl --auth oauth2 "/2/tweets/<TWEET_ID>?tweet.fields=article"

Returns: article title + full plain_text body + preview_text + cover_media + entities (referenced tweets, URLs). This is the most reliable and fastest way to get the complete article body without a third-party API. Requires OAuth2 user authentication.

The response structure:

{
  "data": {
    "article": {
      "title": "...",
      "plain_text": "Full article body text here...",
      "preview_text": "I asked a simple question...",
      "cover_media": "3_...",
      "entities": { "tweets": [...], "urls": [...] }
    },
    "text": "https://t.co/...",
    "id": "2056598291316634079"
  }
}

Key distinction: This Tier is --auth oauth2 + explicit tweet.fields=article. The simpler xurl read <ID> (Tier 1) returns article: {title} without plain_text. The fetch_x_bookmarks.py script at ~/.hermes/scripts/fetch_x_bookmarks.py uses this exact pattern in its fetch_article_body() function (line 103-129).

Tier 2: web_extract preview (medium effort, partial body)

web_extract(url="https://x.com/user/status/<TWEET_ID>")

Returns: first several paragraphs of the article text plus metadata. Good enough for wiki summaries and concept extraction when full text isn't critical. Note: This Tier is a fallback when Tier 1.5 fails (e.g., OAuth2 auth not working, or the article endpoint returns HTTP 500). Tier 1.5 is preferred as it returns the complete body.

Tier 3: GetXAPI full body (complete, requires API key)

Use when you need the full structured article content with section headings, inline styles, and image URLs. See the Workflow section below.

If GETXAPI_KEY is not set, skip Tier 3 and go directly to Tier 4.

Tier 4: Secondary source discovery (when GetXAPI unavailable)

When direct retrieval fails (no GETXAPI key, browser unavailable, cookie wall, nitter down), use web_search to find news outlets or blogs that published summaries or translations of the article:

web_search(query="<author name> \"<article title>\" <year>")

Effective search patterns:

"<exact article title>" <author handle> — finds exact matches
<author name> <topic keywords> — finds broader coverage
Non-English outlets often publish comprehensive summaries (e.g., Chinese tech media like ABMedia 鏈新聞 for high-profile AI articles)

What to look for in search results:

News outlet summaries that reproduce key quotes and structure
Blog posts that analyze/reference the article with detailed excerpts
GitHub issues/discussions that quote substantial portions
Forum threads (Reddit, HN) with user summaries

Known secondary-source domains for coding-agent articles:

mem0.ai/blog — Mem0 regularly publishes detailed analysis of agent memory tools (Codex, LangChain, LlamaIndex). When an X Article covers agent memory/harness features, check mem0.ai first — they often republish with full technical depth and working code.
blog.langchain.com — Common mirror for agent engineering articles
zylos.ai/research — Deep architectural analysis of coding agent harnesses
abmedia.io — Chinese tech media with comprehensive English article summaries

After finding a secondary source:

web_extract the secondary article for full content
Mark the raw article with getxapi: false and source_fallback: secondary in frontmatter
Add summary_source: <url> and summary_source_name: <outlet name> to document where the reconstructed content came from
Note in the raw article body that content is reconstructed from secondary sources

Full retrieval chain (tried in order): Tier 1 (xurl metadata) → Tier 2 (web_extract tweet URL) → Tier 3 (GetXAPI) → browser_navigate → web_extract article URL → xcancel/nitter → Tier 4 (web_search → secondary source) ✅

Real example (Garry Tan "Meta-Meta-Prompting"):

Tiers 1-3 + browser + nitter all failed
web_search('Garry Tan "Meta-Meta-Prompting" AI Agents 2026') → found ABMedia (abmedia.io) Chinese summary
web_extract on ABMedia → comprehensive article with all key concepts, quotes, and architecture details
Result: produced a complete raw article and concept page from secondary source alone

Workflow

Step 0: Quick triage

xurl read <TWEET_ID>

If the response contains article.title, it's an X Article (not a regular tweet). Extract the title, author, and engagement metrics from this response.

Step 1: Extract the tweet ID from the URL

# URL pattern: https://x.com/user/status/2050038277454143918
tweet_id = url.split("/status/")[1].split("?")[0]

Step 2: Fetch article via curl

⚠️ CRITICAL: Use terminal tool with shell variable expansion, NOT execute_code. The execute_code tool does NOT inherit environment variables from ${HERMES_HOME}/.env — os.environ.get("GETXAPI_KEY") will return an empty string. The terminal tool does inherit shell environment.

Method A — Terminal (preferred, works reliably):

curl -s -H "Authorization: Bearer $GETXAPI_KEY" \
  "https://api.getxapi.com/twitter/tweet/article?id=<TWEET_ID>" \
  > /tmp/hermes_article.json

Then process the saved JSON with execute_code (no env var needed for file I/O):

import json
with open("/tmp/hermes_article.json") as f:
    data = json.load(f)
article = data["article"]

Method B — execute_code (only if you've confirmed the env var is available): First verify: echo ${GETXAPI_KEY:-(not set)} in terminal. If set, the key is in ${HERMES_HOME}/.env but execute_code still won't see it — use Method A.

Step 3: Convert to markdown for raw article

lines = []
for block in article["contents"]:
    t = block["type"]
    text = block.get("text", "").strip()
    if t == "header-two" and text:
        lines.append(f"\n## {text}\n")
    elif t == "unstyled" and text:
        lines.append(f"{text}\n")
    elif t == "unordered-list-item" and text:
        lines.append(f"- {text}")

markdown = "\n".join(lines)

Step 4: Update frontmatter

Mark the article with getxapi: true and source_fallback: false (to indicate the body WAS retrieved, unlike a pure metadata-only fallback).

Before/After Comparison

Aspect	xurl read (T1)	xurl + article fields (T1.5)	xurl + GetXAPI (T3+)
Article title	✅ `article.title`	✅ Same	✅ Same
Article body	❌ Not available	✅ Full `plain_text`	✅ With section structure
Author metadata	✅ Basic	✅ Basic	✅ Rich (followers, bio, blue check)
Engagement metrics	✅ like/RT/reply/bookmark	✅ Same	✅ Same + view count
Inline formatting	N/A	❌ Plain text only	✅ Bold/italic ranges
Embedded images	N/A	✅ Cover media ID	✅ URLs with dimensions
Cost	Free (X API credits)	Free (X API credits)	Paid (GetXAPI)
Reliability	Highest	High	Medium (depends on 3rd party)

Pitfalls

GetXAPI endpoint requires Authorization — As of 2026-05-19, api.getxapi.com/twitter/tweet/article returns {"error":"Missing Authorization header"} (HTTP 200, not 404). The endpoint is LIVE but requires Authorization: Bearer $GETXAPI_KEY. If GETXAPI_KEY is not set in environment, skip to Tier 4. If the endpoint returns 404 or 500, also skip to Tier 4. The GitHub repo xhuisme/getxapi may have updates on key provisioning.
HTTP 500 from X servers — When the X article itself returns HTTP 500 (X server-side error), ALL direct retrieval methods (xurl, web_extract, GetXAPI, browser) will fail. Go straight to Tier 4 (secondary source discovery via web_search).
GetXAPI KEY must be in environment — Use $GETXAPI_KEY which is stored in ${HERMES_HOME}/.env. Verify it's set before calling: echo ${GETXAPI_KEY:-(not set)}. If not set, skip Tier 3 and go directly to Tier 4 (secondary source discovery).
execute_code does NOT inherit .env variables — os.environ.get("GETXAPI_KEY") in Python returns "" because the execute_code sandbox has a clean environment. Always use terminal with shell variable expansion ($GETXAPI_KEY) for API calls. Save the JSON output to a temp file, then process it with execute_code for the conversion step.
Secondary sources vary in quality — Machine-translated summaries may miss nuance. Prefer English-language tech media or bilingual outlets. Always note summary_source and source_fallback: secondary in frontmatter.
Rate limits — GetXAPI has its own rate limits (typically generous but not unlimited). Cache results on first fetch.
Article vs tweet ID confusion — The endpoint takes the tweet ID (the status post that links to the article), NOT the article entity ID. If you pass the article ID, it will fail.
Image-only content — type: "image" blocks contain image URLs but no alt text. Just note the URL in the markdown or skip inline rendering.
Inline styles need manual reconstruction — inlineStyleRanges are offsets into the raw text. The most reliable approach is to skip complex style reconstruction in the saved raw article and just save the text as-is — the raw article is for reference, not publication.
Preview text duplication — The preview_text field often duplicates the first unstyled block. Don't double-include it.
Compliance note — GetXAPI is a third-party data broker. Content retrieved this way may have different ToS constraints than direct X API access.
Secondary sources vary in quality — Machine-translated summaries may miss nuance. Prefer English-language tech media or bilingual outlets. Always note summary_source and source_fallback: secondary in frontmatter.

Supporting References

references/canonical-blog-url-fallback.md — When an X Article bookmark links to content originally published on the author's blog (Substack, personal site, Medium), skip the X article wrapper and web_search → web_extract the canonical URL instead.
references/garry-tan-case-study.md — Full walkthrough of the 8-step retrieval chain that succeeded via Tier 4 secondary source discovery, with the Garry Tan "Meta-Meta-Prompting" article as a worked example.
references/mem0-secondary-source-case-study.md — Mem0 as a secondary source mirror for coding-agent memory X Articles, with the Codex Memory Pipeline article as a worked example.
references/thariq-claude-code-skills-case-study.md — Clean Tier 3 GetXAPI success path: web_extract → GetXAPI full body → raw article → concept page with mechanism + role taxonomy. Thariq Shihipar's "Lessons from Building Claude Code: How We Use Skills" as a worked example.