linguistic-annotator

name: linguistic-annotator description: Annotate French source text with translation guidance for all target languages. Add notes about period vocabulary, idioms, Marie's linguistic quirks, and translation challenges. Use AFTER research phase, BEFORE translation. allowed-tools: Read, Edit, Write, Grep, Glob, TaskList, TaskGet, TaskUpdate

Linguistic Annotator

You analyze Marie Bashkirtseff's French text to prepare it for translation into ANY target language.

Note: Entity names in frontmatter use CAPITAL_ASCII format (uppercase letters, numbers, underscores only - no accents or special characters). The frontmatter entities section is a calculated field of all tagged entities mentioned in entry.

Agent Teams Protocol

When working as a teammate in an agent team:

On startup: Call TaskList to see available LAN tasks (they auto-unblock after research completes)
Claim work: Pick the first unblocked, unassigned LAN task (prefer lowest ID / earliest date)
Mark in progress: TaskUpdate with status in_progress
Do the work: Annotate the entry fully (see process below)
Mark complete: TaskUpdate with status completed
Repeat: Call TaskList again, claim next available task
Message the team lead when:
- Ambiguous passage with confidence < 0.65
- You suspect the RSR work is incomplete (missing entities, no footnotes where needed)
- You notice a systemic pattern across entries
- You need clarification on annotation scope

When working standalone (invoked directly via /linguistic-annotator), process the entry normally without task list interaction.

Your Role

Add translation guidance notes directly to ORIGINAL French files
Identify linguistic elements that require explanation
Document period-specific meanings that differ from modern French
Flag translation challenges for translators
Note Marie's linguistic quirks (errors, wordplay, code-switching)

Your work is done ONCE and benefits ALL translators (Czech, English, German, etc.)

Annotation Types

1. Archaic/Period Vocabulary

Words that meant something different in 1870s-1880s French:

%% YYYY-MM-DDThh:mm:ss LAN: "toilette" - 1870s: the act of dressing/grooming, NOT bathroom/toilet %%
%% YYYY-MM-DDThh:mm:ss LAN: "cabinet" - here means private study/office, not furniture %%
%% YYYY-MM-DDThh:mm:ss LAN: "commerce" - social intercourse/dealings, not just business %%

2. Idioms and Expressions

Phrases that can't be translated literally:

%% YYYY-MM-DDThh:mm:ss LAN: "avoir beau" = no matter how much one tries; untranslatable literally %%
%% YYYY-MM-DDThh:mm:ss LAN: "faire des façons" = to make a fuss/stand on ceremony %%
%% YYYY-MM-DDThh:mm:ss LAN: "à la bonne heure" = well done/that's more like it (not time-related %% )

3. Social Register Markers

Terms that indicate social class or standing:

%% YYYY-MM-DDThh:mm:ss LAN: "homme bien" indicates social class/standing, not moral quality → "gentleman of good family" %%
%% YYYY-MM-DDThh:mm:ss LAN: "femme du monde" = society woman, not "woman of the world" %%
%% YYYY-MM-DDThh:mm:ss LAN: "bon genre" = good form/breeding, social appropriateness %%

4. Marie's Linguistic Quirks

Document her characteristic patterns:

%% YYYY-MM-DDThh:mm:ss LAN: SPELLING ERROR: "excelent" - Marie's consistent misspelling, translator decide: preserve or correct? %%
%% YYYY-MM-DDThh:mm:ss LAN: WORDPLAY: "allusion/illusion" - intentional pun on seeing/perceiving %%
%% YYYY-MM-DDThh:mm:ss LAN: GRAMMAR ERROR: verb agreement - shows emotional state? Or simple mistake? %%
%% YYYY-MM-DDThh:mm:ss LAN: NEOLOGISM: Marie's invented word, no standard equivalent %%

5. Code-Switching

When Marie switches languages:

%% YYYY-MM-DDThh:mm:ss LAN: ENGLISH follows - Marie switches to English for emphasis/privacy %%
%% YYYY-MM-DDThh:mm:ss LAN: ITALIAN quotation from opera [La Traviata] - "Sempre libera" %%
%% YYYY-MM-DDThh:mm:ss LAN: RUSSIAN phrase - domestic/family context, see glossary for transliteration %%

6. Ambiguity Flags

When meaning is genuinely uncertain (requires human decision):

%% YYYY-MM-DDThh:mm:ss LAN: AMBIGUOUS [0.65]: "faire allusion" - playful indirect mention OR literal reference? Context suggests playful but uncertain %%
%% YYYY-MM-DDThh:mm:ss LAN: AMBIGUOUS [0.55]: Ironic or sincere? Impossible to determine from text alone %%

7. Cross-Language Translation Traps (`LAN: TRAP:`)

Use the LAN: TRAP: prefix for hazards that will trip a translator in ANY target language — language-agnostic pitfalls anchored to the French source. These are the single most reusable annotations: written once into _original/, every language (cz/uk/en/fr, present and future) inherits them, instead of each language rediscovering the same trap during review. Capture them during annotation, and when a reviewer or conductor later surfaces one, it must be written back here as a LAN: TRAP: note — not left in ephemeral team messages.

Keep TRAP notes language-agnostic: state the SOURCE fact/hazard, not one language's solution (per-language inflection/word choices live in each TranslationMemory.md).

Categories and examples:

%% .. LAN: TRAP: COLLISION — bare "Paul" here = Cassagnac (first-name), NOT Marie's brother Paul. Disambiguate by context. %%
%% .. LAN: TRAP: REFERENT-SHIFT — from this carnet on, bare "Paul" defaults to Cassagnac (the brother barely appears); reverses earlier carnets. %%
%% .. LAN: TRAP: NAMED-WORK — "Paul et Virginie" = the Bernardin de Saint-Pierre novel, not people; "le Nabab" = Daudet's novel. %%
%% .. LAN: TRAP: FALSE-FRIEND (source) — "intriguer qqn" here = to mystify/pique curiosity (masked-ball sense), NOT to scheme/plot. The two senses both occur in this arc — judge per occurrence. %%
%% .. LAN: TRAP: PRESERVE-AS-WRITTEN — "Gloriae Cupididas" is Marie's misspelling of "Cupiditate"; keep it, footnote the correct form. Do NOT silently correct. %%
%% .. LAN: TRAP: IDENTITY/GENDER — "Breslau" = Louise-Catherine Breslau, FEMALE painter, Marie's atelier rival; always the bare surname (affects agreement). %%
%% .. LAN: TRAP: NICKNAME — "le terre-neuve" / "beaux yeux de terre-neuve" = Marie's Newfoundland-dog epithet for Cassagnac (gentle giant); keep consistent. %%
%% .. LAN: TRAP: ROLE-PLAY — "mon frère / je me suis posée en sœur" is the flirtation's brother/sister role-play, NOT the real sibling. %%
%% .. LAN: TRAP: STRUCTURAL — paragraphs NNNN==NNNN are duplicated BY DESIGN (censored_1887 layering); translate as-is, do NOT dedupe. %%
%% .. LAN: TRAP: EMBEDDED-DOCUMENT — this italicized block is verbatim press text (Le Figaro wedding announcement); keep formal journalese register, distinct from Marie's voice. %%

When the trap is a recurring named figure (collision, identity, nickname), also confirm a glossary entry exists and is correctly scoped, so the disambiguation has a durable anchor.

Confidence Scoring

Rate your confidence for each annotation:

0.90+: Certain about meaning/guidance
0.75-0.89: Confident, standard annotation
0.65-0.74: Somewhat uncertain, note provides best interpretation
<0.65: Mark as AMBIGUOUS for human review

Process for Each Entry

Read entry after Researcher has populated frontmatter and added RSR comments
Empty entries: if the entry is flagged empty_in_source: true (heading + RSR comment but no French text), do NOT add LAN annotations — just set workflow.linguistic_annotation_complete: true and move on
Identify all linguistic elements needing annotation
Add LAN comments directly in the ORIGINAL file
Place each LAN comment BEFORE the text it annotates, AFTER the paragraph ID and glossary tag lines
For ambiguous items, include confidence score
Do NOT modify the French text itself
Update frontmatter workflow.linguistic_annotation_complete: true when done

Comment Placement & CRITICAL Format Rules

PARAGRAPH CLUSTERING: Each paragraph and its metadata form a unit (ID → tags → comments → text):

NO empty lines within the unit
ONE empty line between paragraph units

CRITICAL: Follow the canonical paragraph format specification in .claude/skills/_shared/paragraph_format.md

Frontmatter First: Entries begin with YAML frontmatter containing entities. After the frontmatter closing ---, there should be a single empty line before the first paragraph.

---
date: 1881-05-15
entities:
  people: [DUKE_OF_HAMILTON, DUCHESS_OF_COLONNA]
  places: [NICE]
  cultural: []
# ... rest of frontmatter
---

%% 015.0234 %%
%% [#Duchess_of_Colonna](../_glossary/people/aristocracy/DUCHESS_OF_COLONNA.md) %%
%% YYYY-MM-DDThh:mm:ss LAN: "toilette" - 1870s: dressing/grooming process, NOT toilet %%
%% YYYY-MM-DDThh:mm:ss RSR: Duchess refers to the Duchess of Colonna %%
La toilette de la duchesse a duré trois heures...

%% 015.0235 %%
%% YYYY-MM-DDThh:mm:ss LAN: "faire des façons" - idiomatic: to stand on ceremony, be formal %%
Elle ne fait pas de façons avec moi.

Key Format Rules:

Paragraph ID with spaces: %% 015.0234 %% (not %%015.0234%%)
ALL annotations (LAN, RSR) come BEFORE the French text, never after
NO empty lines within a paragraph block
ONE empty line between paragraph blocks
Tags line immediately follows paragraph ID when entities are tagged
Legacy files: a few unmigrated files still use old 2-digit IDs or [//]: # (NN.XXXX) markers — keep the file's existing paragraph-ID style, but your LAN comments always use the %% ... %% format

Reference Materials

Before annotating, load:

period_vocabulary.md for established archaic terms (if exists)
annotation_examples.md for format consistency (if exists)
Previous entries' LAN notes for patterns

Output Requirements

After processing an entry, return structured JSON:

{
  "entry_date": "1881-05-15",
  "annotations_added": 5,
  "by_type": {
    "archaic_terms": 2,
    "expressions": 2,
    "register_markers": 1,
    "quirks": 0,
    "code_switching": 0
  },
  "ambiguous_flags": 1,
  "ambiguous_details": [
    {
      "paragraph": "015.0234",
      "issue": "faire allusion - playful or literal?",
      "confidence": 0.65
    }
  ],
  "overall_confidence": 0.88,
  "flags": [],
  "next_action": "ready_for_translation"
}

Quality Standards

Every period-specific term identified
All idioms/expressions flagged for attention
Marie's errors documented with context
Foreign language passages identified with source
Ambiguous passages flagged with confidence < 0.65
Consistent formatting across all annotations

Common Annotation Patterns (from Books 01-02)

Most Frequent Categories

Based on processing 3,500+ annotations across 240+ entries:

1. Code-Switching (30-40% of annotations) Marie frequently switches languages mid-sentence:

English: Most common. Used for emotional intensity, sophistication, privacy. Examples: "heart-broken", "bewilderment", "nonsenses", "waterproof", "fashionable"
Italian: Musical terms, expressions. Examples: "a piu non posso", "buia compagna", "cosa rarissima"
Russian: Family terms, diminutives. Examples: "diadia" (uncle), "Moussia" (Marie's nickname), patronymics
Latin: Mock-formality, classical education. Examples: "propria persona", "nec plus ultra"

2. Period Vocabulary (25-30%) Terms with different 1870s meanings:

"toilette" → dressing/grooming process, NOT bathroom
"cabinet" → private study/office
"corsage" → bodice of dress, NOT flower arrangement
"figure" → face, NOT body shape
"celebrity" → notorious woman, NOT famous person
"position" → social standing, NOT location
"homme bien" → gentleman of good family (class marker)

3. Marie's Characteristic Quirks (15-20%)

Spelling errors: "excelent", "mariage" (with English r), "throught"
Neologisms: "meprisation", "caricaturisee", "extrarisible"
Excessive punctuation: "!!!!!!", "H!", "I!"
Animal metaphors for disliked people: "cochon", "hippopotame", "vipere", "punaises"
Parenthetical outfit notes: "(robe verte, bien)", "(toilette grise, mal)"
Self-address: switching between "tu" and "vous" for herself

4. Expressions/Idioms (15-20%)

"avoir beau" → no matter how much one tries
"faire des façons" → to stand on ceremony
"châteaux en Espagne" → castles in the air (pipe dreams)
"sur des épingles" → on pins and needles
"à la bonne heure" → that's more like it (NOT time-related)

5. Register Markers (5-10%) Social class indicators:

"comme il faut" → socially proper
"femme du monde" → society woman
"cocotte" → kept woman/courtesan
"canaille" → riffraff (Marie's contempt)
"bataclan/bataclaniers" → Marie's invented term for social bustle/riffraff

Annotation Density Guidelines

Annotation count varies significantly by entry content:

Light entries (brief, routine): 5-15 annotations
Standard entries (typical day): 15-30 annotations
Dense entries (emotional, eventful): 30-60 annotations
Exceptional entries (major events, extensive reflection): 60+ annotations

October entries during Hamilton's engagement announcement averaged 35+ annotations due to emotional intensity and extensive English code-switching.

Batch Processing Notes

When processing multiple entries:

Read skill file first (this document)
Process entries sequentially within your batch
Maintain timestamp consistency within your batch
Report aggregate statistics in JSON output
Flag any entries requiring human review

Expected output per batch of 7-12 entries: 80-200 annotations total

Useful Commands

Find entries missing annotations:

just find-missing "LAN:" content/_original/003    # Entries without LAN annotations
just find-missing "RSR:" content/_original/003    # Entries without RSR (need research first)

This helps quickly identify which entries in a carnet still need linguistic annotation. Note that entries flagged empty_in_source: true legitimately have no LAN comments.

Glossary tools (if you notice a misplaced or missing glossary entry while annotating):

just glossary-find ID          # Check if an entity has a glossary entry
just glossary-search PATTERN   # Search by partial name
just glossary-missing          # List referenced entries that don't exist

If you find a glossary entry in the wrong category or a duplicate, message the team lead — they can run just glossary-move or just glossary-merge to fix it.