image-translation-enhancer - SKILL.md Agent Skill

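name: image-translation-enhancer
description: Quality-enhancement tool for translating the text in images. Use it when the user asks to translate text inside an image, especially charts, infographics, presentation screenshots, and other images whose layout must be preserved at high quality. Supports multilingual translation, quality verification, and text extraction with re-rendering. Accepts image files directly (PNG/JPG), images extracted from PDFs, and scanned images that need preprocessing and correction. Use cases: (1) translating images that contain charts while keeping the original layout, (2) running quality checks to confirm readability after translation, (3) multilingual image localization, (4) extracting and translating images from PDFs, (5) preprocessing and translating scanned documents.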

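Image Translation Quality Enhancer

This skill provides a high-quality workflow for translating text in images. It targets problems common in traditional OCR-based translation tools, such as remnants of the original text, mismatched fonts, and translated text that overflows its area or loses contrast.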

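Supported file formats

1. Image files

PNG, JPG, JPEG
Images attached in the conversation
Local image files

2. Images in PDFs

Extract specific pages from a PDF
Batch extraction from multi-page PDFs
Adjustable output resolution

3. Scanned images

Image preprocessing (denoising, correction, enhancement)
Automatic skew correction
Contrast enhancement
Automatic cropping of blank margins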

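Workflow

Standard workflow (image files)

Step 1: Prepare the environment and confirm requirements

Confirm the target language (passed as a parameter; default: Traditional Chinese, zh-TW)
Check for and install the required Python packages
Identify the image type (chart, poster, slide deck, etc.)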
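Step 1's package check can be done without importing anything heavy, by asking Python's import system whether each module can be found. This is a minimal sketch; the skill does not list its dependencies, so the module names in `REQUIRED_MODULES` (OpenCV as `cv2`, Pillow as `PIL`, `numpy`) are assumptions to replace with the real requirements.

```python
import importlib.util

# Hypothetical dependency list: the skill does not document its packages.
# Import names differ from pip names (cv2 -> opencv-python, PIL -> Pillow).
REQUIRED_MODULES = ["cv2", "PIL", "numpy"]


def missing_modules(modules):
    """Return the top-level modules that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```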

Step 2: Text extraction

Run scripts/extract_text.py to extract the text in the image:

python scripts/extract_text.py <input_image> --output <text_data.json>

The output JSON contains:

Text content
Bounding-box coordinates (x, y, width, height)
OCR confidence score
Text orientation and angle
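One record in text_data.json might be modeled as the dataclass below. The field names (`text`, `bbox`, `confidence`, `angle`) are assumptions drawn from the four items listed above, not the script's documented schema. The helper shows how low-confidence blocks could be flagged for review before translation; the 0.6 threshold is likewise hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TextBlock:
    """One OCR result. Field names are hypothetical, mirroring the list above."""
    text: str
    bbox: tuple        # (x, y, width, height) in pixels
    confidence: float  # OCR confidence, assumed to lie in [0, 1]
    angle: float       # text rotation in degrees


def low_confidence(blocks, threshold=0.6):
    """Return the blocks whose OCR confidence is below the (hypothetical) threshold."""
    return [b for b in blocks if b.confidence < threshold]


blocks = [
    TextBlock("Mechanization", (40, 20, 180, 32), 0.97, 0.0),
    TextBlock("Electrificat1on", (40, 70, 190, 32), 0.41, 0.0),
]

# Serialize in a shape similar to what extract_text.py might write.
print(json.dumps([asdict(b) for b in blocks], indent=2))
print([b.text for b in low_confidence(blocks)])
```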

Step 3: Translation

Run scripts/translate_text.py to translate the extracted text:

python scripts/translate_text.py <text_data.json> --target-lang zh-TW --output <translated_data.json>

Parameters:

--target-lang: target language code (zh-TW, zh-CN, en, ja, ko, etc.)
--glossary: optional path to a glossary of domain-specific terms
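Conceptually, the translation step maps each block's text to the target language while carrying its geometry through unchanged, so the renderer can place the result back in the original box. The sketch below assumes each record is a dict with `text` and `bbox` keys and takes the translator as a plain callable; the real script's translation backend is not documented, and `demo_translate` is only a stand-in lookup table.

```python
import copy


def translate_blocks(blocks, translate, target_lang="zh-TW"):
    """Return new blocks with a "translated" field; geometry is copied untouched.

    `blocks` is a list of dicts (hypothetical layout with "text" and "bbox" keys);
    `translate(text, target_lang)` is any callable returning the translated string.
    """
    out = []
    for block in blocks:
        new_block = copy.deepcopy(block)  # keep bbox, confidence, angle as-is
        new_block["translated"] = translate(block["text"], target_lang)
        new_block["target_lang"] = target_lang
        out.append(new_block)
    return out


# Stand-in translator for illustration only: a fixed lookup table.
DEMO = {"Mechanization": "機械化", "Electrification": "電氣化"}


def demo_translate(text, lang):
    return DEMO.get(text, text)


result = translate_blocks([{"text": "Mechanization", "bbox": [40, 20, 180, 32]}], demo_translate)
print(result[0]["translated"], result[0]["bbox"])
```

Copying rather than mutating the input keeps text_data.json reusable, for example when re-running step 3 with a different `--target-lang`.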

Step 4: Re-render the image

Run scripts/render_image.py to generate the translated image:

python scripts/render_image.py <input_image> <translated_data.json> --output <output_image.png>

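Rendering:

Removes the original text with inpainting
Chooses a suitable font automatically (see references/font_guidelines.md)
Scales the text to fit its bounding box
Preserves color contrast and readability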
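Fitting translated text to its box is essentially a search for the largest font size whose rendered width and height still fit. The sketch below binary-searches integer sizes against a measuring function. The crude width model (CJK ideographs one em wide, other characters about 0.55 em, line height 1.2 em) is an assumption for illustration; a real renderer would measure with the actual font, for example via Pillow's `ImageFont.FreeTypeFont.getbbox`.

```python
def approx_text_size(text, size):
    """Rough (width, height) in pixels for one line of text (assumed metrics)."""
    width = 0.0
    for ch in text:
        # CJK Unified Ideographs are treated as full-width, everything else as ~0.55 em.
        width += size if "\u4e00" <= ch <= "\u9fff" else 0.55 * size
    return width, 1.2 * size


def fit_font_size(text, box_w, box_h, measure=approx_text_size, lo=6, hi=200):
    """Largest integer size in [lo, hi] whose measured text fits the box.

    If even the smallest size overflows, the lower bound is returned so the
    quality check can flag the block.
    """
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        w, h = measure(text, mid)
        if w <= box_w and h <= box_h:
            best = mid    # fits: try a larger size
            lo = mid + 1
        else:
            hi = mid - 1  # too big: try a smaller size
    return best


print(fit_font_size("機械化", box_w=180, box_h=32))  # 26
```

Binary search works here because "fits" is monotone in the font size: if a size fits, every smaller size fits too.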

Step 5: Quality verification

Run scripts/quality_check.py to verify the output quality:

python scripts/quality_check.py <output_image.png> <translated_data.json> --report <quality_report.json>

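Verification criteria:

Readability
- Minimum text size ≥ 12 pt
- Text-to-background contrast ratio ≥ 4.5:1 (WCAG AA)
- Crisp glyph edges with no blurring or jagged aliasing
Position accuracy
- Text center offset ≤ 5%
- Alignment matches the original (left, center, or right)
Completeness
- All original text is translated
- No truncated or overlapping text
- No missing text blocks
Output report
- Overall status: PASS/FAIL
- List of issues with their severity
- Suggested improvements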
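The contrast criterion follows the WCAG 2.x definition: each sRGB channel is linearized, the channels are combined into relative luminance L = 0.2126 R + 0.7152 G + 0.0722 B, and the ratio is (L1 + 0.05) / (L2 + 0.05) with L1 the lighter color. The sketch below implements that formula, a point-to-pixel conversion for the 12 pt rule (1 pt = 1/72 inch), and a center-offset check. Measuring the offset as a fraction of the original box's width and height is an assumption, since the criterion does not say what the 5% is relative to.

```python
def _linearize(channel):
    """sRGB 0-255 channel value to linear light, per WCAG 2.x."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio between two RGB colors, from 1.0 to 21.0."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)


def center_offset(orig_box, new_box):
    """Center shift as a fraction of the original box size (assumed basis).

    Boxes are (x, y, width, height); returns the larger of the x and y fractions.
    """
    ox, oy, ow, oh = orig_box
    nx, ny, nw, nh = new_box
    dx = abs((nx + nw / 2) - (ox + ow / 2)) / ow
    dy = abs((ny + nh / 2) - (oy + oh / 2)) / oh
    return max(dx, dy)


def pt_to_px(points, dpi):
    """Convert a point size to pixels at a given resolution."""
    return points * dpi / 72


print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))  # 21.0
print(contrast_ratio((118, 118, 118), (255, 255, 255)) >= 4.5)
print(center_offset((40, 20, 180, 32), (45, 20, 180, 32)) <= 0.05)
print(pt_to_px(12, 96))  # 16.0
```

Gray #767676 (118) on white is roughly the darkest-to-lightest boundary for AA body text: it passes at about 4.54:1, while #777777 (119) falls just short.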

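Step 6: Fix issues (if needed)

If the quality check returns FAIL, adjust according to the report:

Adjust the font size or typeface
Fine-tune text positions
Improve the background inpainting
Re-run rendering and verification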
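Step 6 amounts to a bounded render-check-adjust loop. The sketch below uses hypothetical `render` and `check` callables standing in for render_image.py and quality_check.py, and assumes a report shaped like `{"status": ..., "issues": [{"type": ...}]}`. It shows one adjustment policy (shrinking the font after an overflow) and stops after a fixed number of attempts, so a block that never passes is handed back for manual fine-tuning.

```python
def render_until_pass(render, check, params, max_attempts=3, shrink=0.9):
    """Re-render with adjusted parameters until the quality report passes.

    Returns (output, report, attempts); output is None if no attempt passed.
    The callables and the report layout are assumptions for illustration.
    """
    report = None
    for attempt in range(1, max_attempts + 1):
        output = render(params)
        report = check(output)
        if report["status"] == "PASS":
            return output, report, attempt
        # Example policy: shrink the font when text overflows or is truncated.
        if any(issue.get("type") == "overflow" for issue in report["issues"]):
            params = {**params, "font_scale": params.get("font_scale", 1.0) * shrink}
    return None, report, max_attempts


def fake_render(params):
    return params["font_scale"]  # pretend the "image" is just the scale used


def fake_check(scale):
    if scale < 0.85:
        return {"status": "PASS", "issues": []}
    return {"status": "FAIL", "issues": [{"type": "overflow"}]}


output, report, attempts = render_until_pass(fake_render, fake_check, {"font_scale": 1.0})
print(report["status"], attempts)  # PASS 3
```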

PDF workflow

Step 0: Extract images from the PDF

Use scripts/pdf_and_preprocessing.py to extract PDF pages:

python scripts/pdf_and_preprocessing.py extract-pdf <input.pdf> --output <pdf_images/> --dpi 300

Parameters:

--dpi: output resolution; 300 DPI is recommended for the best OCR results

After extraction, run the standard workflow (steps 2-6) on each image.
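The DPI setting determines the pixel size of each extracted page: PDF page sizes are given in points (1/72 inch), so pixels = points / 72 × DPI. An A4 page (595.28 × 841.89 pt) therefore comes out at about 2480 × 3508 px at 300 DPI. The sketch below computes that and builds an equivalent command line for poppler's `pdftoppm` (`-png`, `-r` for resolution, `-f`/`-l` for the first and last page). Whether pdf_and_preprocessing.py uses pdftoppm internally is an assumption, suggested only by the poppler-utils note under troubleshooting.

```python
def page_pixels(width_pt, height_pt, dpi=300):
    """Pixel dimensions of a PDF page rendered at `dpi` (1 pt = 1/72 inch)."""
    return round(width_pt / 72 * dpi), round(height_pt / 72 * dpi)


def pdftoppm_command(pdf_path, out_prefix, dpi=300, first=None, last=None):
    """Argument list for poppler's pdftoppm, rendering pages to PNG files
    whose names start with `out_prefix`. Run it with
    subprocess.run(cmd, check=True).
    """
    cmd = ["pdftoppm", "-png", "-r", str(dpi)]
    if first is not None:
        cmd += ["-f", str(first)]
    if last is not None:
        cmd += ["-l", str(last)]
    return cmd + [pdf_path, out_prefix]


print(page_pixels(595.28, 841.89))  # (2480, 3508)
print(pdftoppm_command("document.pdf", "pdf_pages/page", first=1, last=3))
```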

Scanned-image workflow

Step 0: Image preprocessing

Preprocess scanned or photographed images:

# Full preprocessing (denoise + deskew + enhance)
python scripts/pdf_and_preprocessing.py preprocess <scanned_image.jpg> \
  --output <preprocessed.png> \
  --operations denoise deskew enhance

# Deskew only
python scripts/pdf_and_preprocessing.py preprocess <image.jpg> \
  --output <corrected.png> \
  --operations deskew

# Binarization (suited to black-and-white documents)
python scripts/pdf_and_preprocessing.py preprocess <scan.jpg> \
  --output <binary.png> \
  --operations denoise binarize sharpen

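Available preprocessing operations:

denoise: noise removal
deskew: automatic skew correction
enhance: contrast enhancement (CLAHE)
binarize: binarization (suited to scanned documents)
sharpen: sharpening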
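For the binarize operation, a common way to pick the threshold is Otsu's method, which chooses the gray level that maximizes the between-class variance of the image histogram. Whether the script uses Otsu is an assumption; this pure-Python sketch only illustrates the technique on a flat list of 0-255 gray values. An OpenCV-based pipeline would typically call `cv2.threshold` with `cv2.THRESH_BINARY + cv2.THRESH_OTSU` instead.

```python
def otsu_threshold(pixels):
    """Otsu threshold t for 8-bit gray values; pixels > t are foreground."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels <= t
    sum_bg = 0  # sum of intensities <= t
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels):
    """Map each pixel to 0 or 255 using the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]


data = [10] * 50 + [200] * 50
print(otsu_threshold(data), set(binarize(data)))
```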

After preprocessing, run the standard workflow (steps 2-6).

Step 0.5: Automatically crop blank margins (optional)

python scripts/pdf_and_preprocessing.py crop <preprocessed.png> \
  --output <cropped.png> \
  --margin 10
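Cropping blank margins comes down to finding the bounding box of non-background pixels, padding it by the margin, and clamping it to the image edges. In this sketch the image is a list of rows of gray values, and a pixel counts as blank when it is at least a hypothetical `white_level` of 245; the script's actual background test is not documented. The result uses the (left, top, right, bottom) convention with exclusive right and bottom edges, which is what Pillow's `Image.crop` expects.

```python
def content_box(gray_rows, margin=10, white_level=245):
    """Bounding box (left, top, right, bottom) of non-blank content plus a margin.

    `gray_rows` is a list of equal-length rows of 0-255 values. Pixels with
    value >= white_level count as blank (an assumed rule). Returns None if
    the whole image is blank.
    """
    height, width = len(gray_rows), len(gray_rows[0])
    rows = [y for y, row in enumerate(gray_rows) if any(v < white_level for v in row)]
    if not rows:
        return None
    cols = [x for x in range(width) if any(row[x] < white_level for row in gray_rows)]
    left = max(cols[0] - margin, 0)
    top = max(rows[0] - margin, 0)
    right = min(cols[-1] + 1 + margin, width)
    bottom = min(rows[-1] + 1 + margin, height)
    return left, top, right, bottom


# 20x20 white page with a dark patch in rows 5-7, columns 8-9.
img = [[255] * 20 for _ in range(20)]
for y in range(5, 8):
    for x in range(8, 10):
        img[y][x] = 0

print(content_box(img, margin=2))  # (6, 3, 12, 10)
```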

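Font selection guide

When choosing a font, consider:

Scene type: sans-serif fonts for charts (Source Han Sans / Noto Sans CJK), serif fonts for documents
Weight matching: Bold/Heavy for bold text in the original, Light/Regular for thin text
Style consistency: geometric typefaces for modern designs, Song/Ming typefaces for traditional content

See references/font_guidelines.md for the full font list and examples.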
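The first two rules can be expressed as a small lookup. Noto Sans CJK comes from this guide; Noto Serif CJK for the serif case is an assumed choice, as is the SC fallback for languages without a regional variant. The stroke-to-height ratio and its 0.08 and 0.12 cut-offs are a hypothetical heuristic for "how bold the original looks", not taken from references/font_guidelines.md.

```python
# Regional variants of the Noto CJK families; the SC fallback is an assumption.
NOTO_REGION = {"zh-TW": "TC", "zh-CN": "SC", "ja": "JP", "ko": "KR"}


def choose_font(scene, stroke_ratio, lang="zh-TW"):
    """Pick a (family, weight) pair for the translated text.

    scene: "chart" selects sans-serif; anything else selects serif.
    stroke_ratio: average stroke width / text height measured on the original
    text, a hypothetical boldness measure with hypothetical cut-offs.
    """
    base = "Noto Sans CJK" if scene == "chart" else "Noto Serif CJK"
    family = f"{base} {NOTO_REGION.get(lang, 'SC')}"
    if stroke_ratio >= 0.12:
        weight = "Bold"     # bold source text -> Bold/Heavy
    elif stroke_ratio >= 0.08:
        weight = "Regular"
    else:
        weight = "Light"    # thin source text -> Light/Regular
    return family, weight


print(choose_font("chart", 0.15))  # ('Noto Sans CJK TC', 'Bold')
```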

Advanced features

Batch processing

Process multiple images:

python scripts/batch_translate.py <input_folder> --target-lang zh-TW --output <output_folder> --quality-check
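A batch run first has to decide which files in the input folder to process and where each result goes. The sketch below collects PNG/JPG/JPEG files case-insensitively and derives an output path for each image. The `<name>_<lang>.png` naming scheme is hypothetical, since the skill does not document how batch_translate.py names its outputs.

```python
from pathlib import Path

SUPPORTED = {".png", ".jpg", ".jpeg"}


def plan_batch(input_folder, output_folder, target_lang="zh-TW"):
    """Return sorted (input_path, output_path) pairs for supported images."""
    out_dir = Path(output_folder)
    pairs = []
    for path in sorted(Path(input_folder).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            # Hypothetical naming scheme: keep the stem, tag the target language.
            pairs.append((path, out_dir / f"{path.stem}_{target_lang}.png"))
    return pairs
```

Each pair can then go through steps 2-6; skipping unsupported files up front keeps one stray document from failing the whole batch.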

End-to-end PDF translation workflow

# 1. Extract the PDF pages
python scripts/pdf_and_preprocessing.py extract-pdf document.pdf --output pdf_pages/

# 2. Batch-translate the pages
python scripts/batch_translate.py pdf_pages/ --target-lang zh-TW --output translated_pages/ --quality-check

# Output: a translated image and a quality report for each page

Scanned-document translation workflow

# 1. Preprocess the scanned image
python scripts/pdf_and_preprocessing.py preprocess scan.jpg \
  --output preprocessed.png \
  --operations denoise deskew enhance

# 2. Extract, translate, render, and verify
python scripts/extract_text.py preprocessed.png --output text.json
python scripts/translate_text.py text.json --target-lang zh-TW --output translated.json
python scripts/render_image.py preprocessed.png translated.json --output final.png
python scripts/quality_check.py final.png translated.json --report report.json
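The five commands above can also be chained from Python so the run stops at the first failing step. This sketch uses only the scripts and flags shown in this document; the working-directory layout and file names are illustrative.

```python
import subprocess
import sys
from pathlib import Path


def scan_pipeline(scan, workdir, target_lang="zh-TW"):
    """Command lines for the scanned-document workflow, in execution order."""
    w = Path(workdir)
    pre, text = str(w / "preprocessed.png"), str(w / "text.json")
    translated, final = str(w / "translated.json"), str(w / "final.png")
    report = str(w / "report.json")
    py = sys.executable  # same interpreter that runs this driver
    return [
        [py, "scripts/pdf_and_preprocessing.py", "preprocess", str(scan),
         "--output", pre, "--operations", "denoise", "deskew", "enhance"],
        [py, "scripts/extract_text.py", pre, "--output", text],
        [py, "scripts/translate_text.py", text,
         "--target-lang", target_lang, "--output", translated],
        [py, "scripts/render_image.py", pre, translated, "--output", final],
        [py, "scripts/quality_check.py", final, translated, "--report", report],
    ]


def run_all(commands):
    """Run the steps in order; check=True raises CalledProcessError on failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Call `run_all(scan_pipeline("scan.jpg", "work"))` from the skill's root directory after creating `work/`; the relative `scripts/` paths assume that working directory.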

Custom translation rules

Edit references/translation_rules.json:

{
  "glossary": {
    "Mechanization": "機械化",
    "Electrification": "電氣化"
  },
  "preserve_terms": ["AI", "API", "CPU"],
  "domain": "technical"
}
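One way to honor these rules around machine translation is to shield `preserve_terms` behind placeholders and substitute glossary entries on whole-word matches, longest term first so multi-word entries win over their parts. This is a sketch of one possible strategy, not the documented behavior of translate_text.py; the `__KEEPn__` placeholder format is hypothetical and the `domain` field is left unused.

```python
import json
import re

RULES = json.loads("""
{
  "glossary": {"Mechanization": "機械化", "Electrification": "電氣化"},
  "preserve_terms": ["AI", "API", "CPU"],
  "domain": "technical"
}
""")


def protect_and_substitute(text, rules):
    """Replace preserve_terms with placeholders, then apply the glossary.

    Returns (text, placeholders); pass the text to machine translation and
    call restore_terms() on the result afterwards.
    """
    placeholders = {}
    for i, term in enumerate(rules.get("preserve_terms", [])):
        token = f"__KEEP{i}__"  # hypothetical placeholder format
        new_text, count = re.subn(rf"\b{re.escape(term)}\b", token, text)
        if count:
            text = new_text
            placeholders[token] = term
    glossary = rules.get("glossary", {})
    for src in sorted(glossary, key=len, reverse=True):
        # A function replacement avoids backslash handling in the target text.
        text = re.sub(rf"\b{re.escape(src)}\b", lambda m, dst=glossary[src]: dst, text)
    return text, placeholders


def restore_terms(text, placeholders):
    """Put the preserved terms back after translation."""
    for token, term in placeholders.items():
        text = text.replace(token, term)
    return text


s, ph = protect_and_substitute("AI drives Mechanization and CPU design", RULES)
print(s)                     # __KEEP0__ drives 機械化 and __KEEP2__ design
print(restore_terms(s, ph))  # AI drives 機械化 and CPU design
```

The word boundaries keep "AI" from matching inside "API", so both entries in `preserve_terms` can coexist.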

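Troubleshooting

See references/troubleshooting.md:

Low OCR accuracy → increase the image resolution or apply preprocessing
Font rendering fails → check that the required fonts are installed on the system
Quality check fails → adjust the rendering parameters or fine-tune manually
PDF extraction fails → check that poppler-utils is installed
Skewed scanned image → use the deskew operation to correct it automatically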
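For the poppler-utils check, it is usually enough to confirm that its command-line tools are on PATH. The sketch below looks for `pdftoppm` and `pdfinfo`, two programs that poppler-utils provides; which of them pdf_and_preprocessing.py actually needs is not documented, so treat the list as an assumption.

```python
import shutil

# Assumed set of poppler tools; adjust to what the extraction script calls.
POPPLER_TOOLS = ["pdftoppm", "pdfinfo"]


def missing_tools(tools=POPPLER_TOOLS):
    """Return the command-line tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing), "(install poppler-utils)")
else:
    print("poppler-utils appears to be installed.")
```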