name: "beyond-translation-cross-cultural-meme"
description: "Cross-cultural meme transcreation using a three-stage hybrid pipeline (cultural analysis, visual generation, assembly) that preserves humor and communicative intent while adapting culture-specific references between languages. Triggers: 'transcreate this meme', 'adapt meme for Chinese audience', 'convert meme to US culture', 'cross-cultural meme adaptation', 'localize this meme for another culture', 'meme cultural translation'"

Cross-Cultural Meme Transcreation

This skill enables Claude to guide the implementation of a hybrid meme transcreation pipeline that adapts memes between cultures -- not by translating text, but by replacing culture-specific humor, visual references, and linguistic conventions with equivalents that resonate in the target culture. Based on the MemeXGen framework, the pipeline separates culture-invariant elements (core emotion, humor mechanism, communicative intent) from culture-specific elements (idioms, visual symbols, pop culture references, text style) and handles each independently across three stages: cultural analysis with caption generation, visual template generation, and final meme assembly.

When to Use

When a user wants to adapt a meme from one culture to another (e.g., Chinese to US, US to Chinese, or other culture pairs)
When building a pipeline that takes a meme image as input and produces a culturally adapted version for a different audience
When implementing automated meme localization for social media platforms serving multilingual communities
When evaluating whether a meme's humor and intent survived cross-cultural adaptation using structured quality metrics
When a user asks to analyze what makes a meme culture-specific and identify which elements need adaptation vs. preservation
When building a dataset of paired memes across cultures for research or content moderation

Key Technique

The MemeXGen framework treats meme adaptation as transcreation rather than translation. Translation assumes a one-to-one mapping of words; transcreation acknowledges that humor, cultural references, and visual conventions must be recreated in the target culture. The core insight is a strict separation: identify what is universal (the emotional payload, the humor type -- sarcasm, exaggeration, irony) and what is culture-bound (specific idioms, celebrity references, visual templates, text density conventions). Only culture-bound elements get replaced; universal elements are preserved as anchors.

The framework operates in three sequential stages. Stage 1 (Cultural Analysis) uses a vision-language model (LLaVA 1.6 13B in the paper) to analyze the source meme, extract its communicative intent, identify culture-specific references, map those references to target-culture equivalents, and generate a transcreated caption plus visual recommendations. Stage 2 (Visual Generation) feeds those visual recommendations to an image generation model (FLUX.1 Schnell) to produce a culture-appropriate visual template. Stage 3 (Assembly) composites the transcreated caption onto the generated visual using culture-aware typography -- Chinese memes use denser text layouts, while English memes use more spaced, larger fonts with Impact-style conventions.
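
The three stages can be wired together as a thin orchestration layer. The following is a minimal sketch, not the MemeXGen implementation: the names `transcreate`, `AnalysisResult`, and `Meme` are hypothetical, and each stage is passed in as a plain callable so that any VLM or image model can sit behind it. The stub functions at the end only stand in for real models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AnalysisResult:
    caption: str        # transcreated caption for the target culture
    visual_prompt: str  # visual recommendations for the image model


@dataclass
class Meme:
    image: bytes
    caption: str
    culture: str


# Hypothetical stage signatures; each stage is injected by the caller.
Analyzer = Callable[[bytes, str, str], AnalysisResult]  # (image, source, target)
Generator = Callable[[str], bytes]                       # visual prompt -> image
Assembler = Callable[[bytes, str, str], bytes]           # (image, caption, target)


def transcreate(image: bytes, source: str, target: str,
                analyze: Analyzer, generate: Generator,
                assemble: Assembler) -> Meme:
    analysis = analyze(image, source, target)            # Stage 1: cultural analysis
    visual = generate(analysis.visual_prompt)            # Stage 2: visual generation
    final = assemble(visual, analysis.caption, target)   # Stage 3: assembly
    return Meme(image=final, caption=analysis.caption, culture=target)


# Stub stages for illustration only; real stages would call a VLM,
# an image generator, and a compositing routine.
def fake_analyze(image: bytes, source: str, target: str) -> AnalysisResult:
    return AnalysisResult(caption="me during finals week",
                          visual_prompt="crying student, dorm")


def fake_generate(prompt: str) -> bytes:
    return prompt.encode("utf-8")


def fake_assemble(visual: bytes, caption: str, target: str) -> bytes:
    return visual + b"|" + caption.encode("utf-8")


result = transcreate(b"raw", "Chinese", "US",
                     fake_analyze, fake_generate, fake_assemble)
```

Keeping the stages as injected callables makes it straightforward to re-run a single stage when evaluation flags a problem.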

A critical finding is directional asymmetry: US-to-Chinese transcreation scores 4.48/5.0 while Chinese-to-US scores 3.93/5.0. This gap arises because US memes rely on globally recognizable templates while Chinese memes depend on context-specific wordplay and implicit cultural concepts that lack direct Western equivalents. Any implementation must account for this asymmetry by investing more effort in the cultural analysis stage when adapting from high-context cultures.

Step-by-Step Workflow

Ingest the source meme -- Accept a meme image and its source culture label (e.g., "US", "Chinese"). If the meme contains text, run OCR or use a VLM to extract it. Store the raw image, extracted text, and culture label as structured input.
Perform cultural analysis -- Prompt a vision-language model with the meme image and instruct it to output: (a) a description of all culture-specific references (idioms, celebrities, visual symbols, meme templates), (b) the core communicative intent in one sentence, (c) the humor mechanism (sarcasm, wordplay, exaggeration, irony, absurdism), (d) the emotional tone (joy, anger, sadness, fear, disgust, surprise) with intensity 1-5, and (e) which elements are universal vs. culture-bound. Use temperature 0.7 for balanced creativity.
Map culture-specific elements to target equivalents -- For each culture-bound element identified in the cultural analysis, prompt the model to propose a target-culture equivalent. For example: a reference to "996 work culture" maps to "hustle culture" or "quiet quitting"; a Weibo-style reaction maps to a Twitter/Reddit-style reaction. Require the model to justify each mapping with a brief rationale.
Generate the transcreated caption -- Using the intent, humor mechanism, emotional tone, and mapped equivalents, prompt the model to write a meme caption in the target language that: preserves the original emotional effect, uses natural meme language conventions of the target culture (not formal/literary register), and incorporates the mapped cultural references. Avoid literal translation at all costs.
Produce visual recommendations -- From the cultural analysis, generate a structured description of what the target meme's visual should depict: character types, scene setting, composition style, and any culture-specific visual conventions (e.g., Chinese memes often use cartoon/anime-derived characters; US memes favor reaction photos and established templates).
Generate the target visual -- Feed the visual recommendations into an image generation model (e.g., FLUX, SDXL, or DALL-E). Specify 1024x1024 resolution and meme-appropriate style (bold, clear subjects, simple backgrounds). If the source meme uses a well-known template with a target-culture equivalent, reference that template explicitly.
Assemble the final meme -- Composite the transcreated caption onto the generated visual. Apply culture-aware typography: for Chinese targets, use denser text with appropriate CJK fonts (e.g., Noto Sans CJK); for English targets, use Impact or similar bold sans-serif with stroke outlines. Position text according to target culture conventions (top/bottom for US Impact-style, overlaid or sidebar for many Chinese formats).
Evaluate transcreation quality -- Score the output on six dimensions (each 1-5): five component dimensions, namely Caption Quality (clarity, tone, meme-appropriateness), Image Quality (visual clarity, composition), Synergy (image-text coherence), Cultural Fit (relatability for target audience), and Intent Preservation (fidelity to original message), plus an Overall Score computed as the average of those five. Use a VLM evaluator (Qwen-VL-Max shows the strongest human correlation at r=0.926) or human reviewers.
Iterate on failures -- If any dimension scores below 3.0, diagnose the failure mode: formal speech dampening humor (rewrite caption in casual register), visual disconnect from caption (regenerate visual with more specific prompt), or humor mechanism lost (re-analyze source and try alternative cultural mapping). Re-run from the failed stage.
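
The cultural analysis step in the workflow above asks the model for five pieces of information, (a) through (e). One way to make that output usable downstream is to request JSON and validate it before moving on. The sketch below assumes a hypothetical set of JSON keys (`references`, `intent`, `humor_mechanism`, `emotion`, `intensity`, `universal`, `culture_bound`); the source does not prescribe a schema, and the sample reply is illustrative.

```python
import json
from dataclasses import dataclass

HUMOR_MECHANISMS = {"sarcasm", "wordplay", "exaggeration", "irony", "absurdism"}
EMOTIONS = {"joy", "anger", "sadness", "fear", "disgust", "surprise"}


@dataclass
class CulturalAnalysis:
    references: list[str]     # (a) culture-specific references
    intent: str               # (b) one-sentence communicative intent
    humor_mechanism: str      # (c) one of HUMOR_MECHANISMS
    emotion: str              # (d) one of EMOTIONS
    intensity: int            # (d) emotional intensity, 1-5
    universal: list[str]      # (e) elements to preserve as anchors
    culture_bound: list[str]  # (e) elements to replace


def parse_analysis(raw: str) -> CulturalAnalysis:
    """Parse and validate a JSON reply; raise ValueError if it is unusable."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    try:
        analysis = CulturalAnalysis(**data)
    except TypeError as exc:  # missing or unexpected keys, or not an object
        raise ValueError(f"unexpected analysis shape: {exc}") from exc
    if analysis.humor_mechanism not in HUMOR_MECHANISMS:
        raise ValueError(f"unknown humor mechanism: {analysis.humor_mechanism!r}")
    if analysis.emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {analysis.emotion!r}")
    if not isinstance(analysis.intensity, int) or not 1 <= analysis.intensity <= 5:
        raise ValueError("intensity must be an integer from 1 to 5")
    return analysis


# Illustrative reply, written by hand rather than produced by a model.
sample_reply = json.dumps({
    "references": ["viral catchphrase", "exam culture"],
    "intent": "commiserating about academic pressure",
    "humor_mechanism": "exaggeration",
    "emotion": "sadness",
    "intensity": 3,
    "universal": ["academic stress"],
    "culture_bound": ["viral catchphrase", "exam culture"],
})
analysis = parse_analysis(sample_reply)
```

Rejecting malformed replies here keeps a bad analysis from propagating into caption and visual generation.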

Concrete Examples

Example 1: US meme to Chinese adaptation

User: "Adapt this 'distracted boyfriend' meme about choosing pizza over salad for a Chinese audience."

Approach:

Analyze source: US "distracted boyfriend" template; humor mechanism = irony/relatable desire; intent = "we all prefer unhealthy comfort food despite knowing better"; culture-bound elements = the specific template, pizza vs. salad dichotomy
Map equivalents: Pizza/salad -> hotpot/diet food or milk tea/plain water (common Chinese meme dichotomy). Template -> Chinese equivalent could use the "choosing between two buttons" meme format popular on Weibo, or adapt the boyfriend template since it has some global recognition
Generate caption: "明知道该喝白开水，还是选择了奶茶" (Knowing I should drink plain water, still chose milk tea)
Generate visual: Two-button meme format with exaggerated sweating character, one button labeled "健康饮食" (healthy eating), other labeled "奶茶外卖" (milk tea delivery)
Assemble with dense Chinese text overlay, rounded font style common in Chinese meme culture

Output: A meme using a familiar Chinese format with the milk tea vs. healthy choice framing that preserves the ironic self-awareness humor of the original while using references that resonate with Chinese internet culture.
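
The mapping step in this example can be recorded as data so that missing justifications are easy to spot. This is a small sketch under assumed names (`Mapping`, `unmapped`); it checks only that every culture-bound element has a mapping with a non-empty rationale, and the second mapping is left unjustified on purpose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source: str     # culture-bound element in the source meme
    target: str     # proposed target-culture equivalent
    rationale: str  # brief justification required from the model


def unmapped(culture_bound: list[str], mappings: list[Mapping]) -> list[str]:
    """Return culture-bound elements that lack a mapping with a rationale."""
    justified = {m.source for m in mappings if m.rationale.strip()}
    return [element for element in culture_bound if element not in justified]


culture_bound = ["distracted boyfriend template", "pizza vs. salad"]
mappings = [
    Mapping("pizza vs. salad", "milk tea vs. plain water",
            "common Chinese meme dichotomy"),
    # Rationale deliberately left empty to show the check firing.
    Mapping("distracted boyfriend template", "two-button format", ""),
]
missing = unmapped(culture_bound, mappings)
```

Any element returned by `unmapped` would be sent back to the model for a justified mapping before caption generation.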

Example 2: Chinese meme to US adaptation

User: "This Chinese meme shows a student saying '我太难了' (roughly 'it's all too hard for me') with a crying face about exam season. Adapt it for US audiences."

Approach:

Analyze source: "我太难了" is a viral Chinese internet phrase expressing exaggerated exhaustion; humor mechanism = hyperbolic self-deprecation; intent = commiserating about academic pressure; culture-bound = the specific catchphrase, Chinese exam culture (gaokao associations)
Map equivalents: "我太难了" -> "This is fine" or "I'm in danger" energy; gaokao pressure -> finals week; the crying face aesthetic -> established US reaction image (e.g., crying cat, "it's fine" dog in fire)
Generate caption: "me during finals week pretending everything is fine" or "nobody: / me at 3am the night before the exam:"
Generate visual: "This is fine" dog-in-fire style scene or crying reaction face in a library/dorm setting with textbooks
Assemble with Impact font, top-bottom text placement

Output: A US-format meme using "this is fine" energy with finals week context that preserves the hyperbolic self-deprecation while using formats and references native to US meme culture.
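
The assembly steps in both examples depend on culture-aware text layout. The sketch below covers only the layout decision (font name, placement, line breaking), not the actual image compositing. The fonts and placements come from the workflow description; the line widths and the all-caps choice for Impact-style captions are assumptions made for illustration.

```python
import textwrap
from dataclasses import dataclass


@dataclass(frozen=True)
class TextLayout:
    font: str
    placement: str
    uppercase: bool
    chars_per_line: int  # assumed values, tune per template


LAYOUTS = {
    "US": TextLayout(font="Impact", placement="top-bottom",
                     uppercase=True, chars_per_line=24),
    "Chinese": TextLayout(font="Noto Sans CJK", placement="overlay",
                          uppercase=False, chars_per_line=12),
}


def layout_caption(caption: str, culture: str) -> list[str]:
    """Split a caption into lines using the target culture's layout."""
    layout = LAYOUTS[culture]
    if culture == "Chinese":
        # Chinese text has no spaces to break on, so wrap by character count.
        n = layout.chars_per_line
        return [caption[i:i + n] for i in range(0, len(caption), n)]
    text = caption.upper() if layout.uppercase else caption
    return textwrap.wrap(text, width=layout.chars_per_line)


us_lines = layout_caption(
    "me during finals week pretending everything is fine", "US")
zh_lines = layout_caption("明知道该喝白开水，还是选择了奶茶", "Chinese")
```

A rendering step would then draw these lines with the chosen font, adding stroke outlines for the Impact-style case.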

Example 3: Implementing the evaluation pipeline

User: "I need to evaluate whether my meme transcreations are good. Build me an evaluation script."

Approach:

Define the six evaluation dimensions as a structured schema
For each meme pair (source + transcreation), prompt a VLM with both images and a rubric for each dimension
Collect scores on the 1-5 Likert scale per dimension
Compute overall score as the mean across dimensions
Flag any dimension below 3.0 for manual review
Track directional asymmetry by comparing mean scores grouped by transcreation direction

Output:

import json
from dataclasses import dataclass

@dataclass
class TranscreationEval:
    caption_quality: float  # Clarity, tone, meme-appropriateness
    image_quality: float    # Visual clarity, composition
    synergy: float          # Image-text coherence
    cultural_fit: float     # Relatability for target audience
    intent_preservation: float  # Fidelity to original message

    @property
    def overall(self) -> float:
        dims = [self.caption_quality, self.image_quality,
                self.synergy, self.cultural_fit, self.intent_preservation]
        return sum(dims) / len(dims)

    @property
    def failure_dimensions(self) -> list[str]:
        return [name for name, val in {
            "caption_quality": self.caption_quality,
            "image_quality": self.image_quality,
            "synergy": self.synergy,
            "cultural_fit": self.cultural_fit,
            "intent_preservation": self.intent_preservation,
        }.items() if val < 3.0]

EVAL_PROMPT = """Rate this meme transcreation on a 1-5 scale for each dimension.
Source culture: {source_culture} | Target culture: {target_culture}

Dimensions:
1. Caption Quality: Is the caption clear, natural meme language, appropriate tone?
2. Image Quality: Is the visual clear, well-composed, recognizable?
3. Synergy: Do image and text work together to convey humor/emotion?
4. Cultural Fit: Would the target audience find this relatable and natural?
5. Intent Preservation: Does this preserve the original meme's message and feeling?

Return JSON: {{"caption_quality": N, "image_quality": N, "synergy": N,
"cultural_fit": N, "intent_preservation": N}}"""

def evaluate_pair(vlm_client, source_img, target_img,
                  source_culture: str, target_culture: str) -> TranscreationEval:
    prompt = EVAL_PROMPT.format(
        source_culture=source_culture, target_culture=target_culture
    )
    response = vlm_client.analyze(
        images=[source_img, target_img], prompt=prompt
    )
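    # Assumes `response` is the model's raw JSON text; a reply wrapped in prose
    # or code fences would need to be stripped before json.loads.
    scores = json.loads(response)
    # Keep only the five rubric keys so extra fields do not break construction.
    wanted = TranscreationEval.__dataclass_fields__
    return TranscreationEval(**{k: float(v) for k, v in scores.items() if k in wanted})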
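
The last item in the approach, tracking directional asymmetry, is not covered by the script above. A minimal sketch follows, assuming results are collected as `(source_culture, target_culture, overall_score)` tuples; the function name `mean_by_direction` is hypothetical and the sample scores are made up for illustration, not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def mean_by_direction(results):
    """Group overall scores by transcreation direction and average them.

    `results` is an iterable of (source_culture, target_culture, score).
    """
    grouped = defaultdict(list)
    for source, target, score in results:
        grouped[f"{source}->{target}"].append(score)
    return {direction: mean(scores) for direction, scores in grouped.items()}


# Illustrative scores only.
sample = [
    ("US", "Chinese", 4.5),
    ("US", "Chinese", 4.0),
    ("Chinese", "US", 3.5),
    ("Chinese", "US", 4.0),
]
by_direction = mean_by_direction(sample)
gap = by_direction["US->Chinese"] - by_direction["Chinese->US"]
```

A persistent positive `gap` would suggest the pipeline needs more work on the cultural mapping step for the weaker direction.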

Best Practices

Do: Separate culture-invariant elements (emotion, humor type, intent) from culture-specific elements (idioms, references, visual templates) before attempting any adaptation. This separation is the foundation of quality transcreation.
Do: Use natural meme register in the target language. Memes that sound formal or literary fail immediately. Test captions by asking: "Would someone actually post this unironically on social media?"
Do: Account for directional asymmetry. When transcreating from high-context cultures (Chinese, Japanese, Korean) to lower-context cultures (US, Western European), allocate extra effort to the cultural mapping step since implicit cultural concepts often lack direct equivalents.
Do: Evaluate every dimension independently rather than relying on the Overall Score alone. A meme can have a perfect caption but fail on synergy if the visual doesn't match, or score well on image quality but fail on cultural fit.
Avoid: Literal translation of meme text. "我太难了" is not "I am too difficult." Transcreation means recreating the communicative effect, not translating words.
Avoid: Using the same visual template across cultures without checking whether it resonates. The "distracted boyfriend" template has global recognition, but many templates are culture-specific (e.g., Chinese "内涵图" implicit humor images have no US equivalent format).
Avoid: Assuming VLM evaluation is sufficient alone. Qwen-VL-Max correlates well with humans (r=0.926), but most VLMs show weak correlation (r<=0.33) and exhibit conservative scoring bias (averaging 0.54 points below human scores). Always validate with human reviewers for high-stakes applications.
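
Before trusting a VLM evaluator, it helps to check it against a small set of human-scored memes. The sketch below computes the Pearson correlation and the mean scoring bias between the two sets of scores. The helper names are hypothetical and the scores are invented for illustration; the sample is built so the VLM ranks memes exactly like the humans but scores each one half a point lower.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


def scoring_bias(vlm_scores, human_scores):
    """Mean VLM score minus mean human score; negative means conservative."""
    return mean(vlm_scores) - mean(human_scores)


# Illustrative scores only.
human = [4.0, 3.0, 5.0, 2.0]
vlm = [3.5, 2.5, 4.5, 1.5]

r = pearson_r(human, vlm)
bias = scoring_bias(vlm, human)
```

High correlation with a consistent negative bias suggests the evaluator's rankings are usable even if its absolute scores need recalibrating; low correlation means human review is still required.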

Error Handling

OCR/text extraction fails on source meme: Fall back to VLM-based text extraction by prompting the model to read all text in the image. If text is heavily stylized, ask the user to provide the text manually.
No clear cultural equivalent exists: When a culture-specific reference has no natural mapping (e.g., socially loaded Chinese terms such as "内卷", involution), adapt by feeling rather than by reference: find a target-culture concept that captures the same sentiment even if the reference itself differs (e.g., "rat race" or "hustle culture" for 内卷).
Generated visual doesn't match meme aesthetics: Memes have a distinct low-fi, bold aesthetic. If the image generator produces photorealistic or overly polished output, add style modifiers to the prompt: "internet meme style, bold simple composition, reaction image format."
Humor mechanism is untranslatable: Wordplay and puns rarely survive transcreation. When the source humor depends entirely on linguistic wordplay, pivot to the underlying emotion and find a target-culture joke format that evokes the same feeling, even if the joke structure changes completely.
Synergy score is low despite good caption and image: This usually means the text and image were generated independently without enough cross-referencing. Re-run stage 1 with the generated image as additional context, asking the model to refine the caption to better match the visual.
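
The remedies above can be turned into a simple routing table that decides which stage to re-run when a dimension scores below 3.0. This is a sketch under a loud assumption: the source describes failure modes and fixes, but the pairing of each evaluation dimension with one specific remedy below is an illustrative choice, as are the names `REMEDIES` and `plan_reruns`.

```python
# Style modifiers suggested for visuals that look too polished.
MEME_STYLE = "internet meme style, bold simple composition, reaction image format"

# Assumed pairing of dimension -> (stage to re-run, remedy).
REMEDIES = {
    "caption_quality": (1, "rewrite caption in casual register"),
    "cultural_fit": (1, "try an alternative cultural mapping"),
    "intent_preservation": (1, "re-analyze source and remap references"),
    "synergy": (1, "refine caption using the generated image as context"),
    "image_quality": (2, f"regenerate visual with modifiers: {MEME_STYLE}"),
}


def plan_reruns(scores: dict[str, float],
                threshold: float = 3.0) -> list[tuple[int, str]]:
    """Return (stage, remedy) pairs for failing dimensions, earliest stage first."""
    actions = {REMEDIES[dim] for dim, score in scores.items()
               if dim in REMEDIES and score < threshold}
    return sorted(actions)


# Illustrative scores only.
scores = {
    "caption_quality": 4.0,
    "image_quality": 2.5,
    "synergy": 2.0,
    "cultural_fit": 4.0,
    "intent_preservation": 4.5,
}
plan = plan_reruns(scores)
```

Sorting by stage means an early-stage fix runs first, since its output feeds the later stages anyway.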

Limitations

Language pair coverage: The MemeXGen framework and dataset focus on Chinese-US meme pairs. Adaptation to other culture pairs (e.g., Japanese-Brazilian, Arabic-French) requires new cultural mapping knowledge and potentially different visual generation styles.
Rapidly evolving meme culture: Meme formats, references, and slang change weekly. A pipeline trained or prompted with static cultural knowledge will produce dated output. Any production system needs regular updates to its cultural reference mappings.
Implicit/philosophical humor: Memes relying on deep cultural context, historical references, or philosophical concepts (common in Chinese internet humor) remain the hardest category: Chinese-to-US transcreation scores 3.93/5.0 versus 4.48/5.0 in the reverse direction.
Visual generation fidelity: Current image generators struggle to reproduce the specific lo-fi, template-based aesthetic of internet memes. Generated visuals may look too polished or miss the intentionally crude style that is part of meme humor.
Evaluation subjectivity: Even among bilingual, bicultural evaluators, inter-annotator correlation ranges from r=0.58 to r=0.81. Humor is inherently subjective, and what counts as "culturally appropriate" varies between individuals within the same culture.
VLM bias: Vision-language models have stronger training data exposure to US/Western internet culture, contributing to the directional asymmetry. This bias is structural and won't be resolved by prompt engineering alone.

Reference

Beyond Translation: Cross-Cultural Meme Transcreation with Vision-Language Models (Zhao, Zhang, Ignat, 2026). Look for: the three-stage hybrid pipeline architecture (Section 3), the six-dimension evaluation rubric (Section 4), directional asymmetry analysis (Section 5), and success/failure pattern taxonomy (Section 6). Code and dataset: github.com/AIM-SCU/MemeXGen.