qwen3-embedding-1-0-0 - SKILL.md Agent Skill

name: qwen3-embedding-1-0-0
description: Complete toolkit for Qwen3 Embedding models (0.6B, 4B, 8B) and Qwen3 Reranker models providing state-of-the-art text embedding, semantic search, reranking, and multilingual retrieval with support for Sentence Transformers, raw Transformers, vLLM, and TEI inference engines. Use when generating text embeddings, building semantic search or RAG pipelines, performing cross-lingual retrieval, code retrieval, text classification/clustering, or reranking search results with Qwen3-Embedding or Qwen3-Reranker models.

Qwen3 Embedding

Overview

Qwen3 Embedding is a family of text embedding and reranking models built on the Qwen3 foundation models, released by Alibaba's Qwen team under the Apache 2.0 license. The series targets text retrieval, code retrieval, text classification, text clustering, and bitext mining. As of June 2025, the 8B embedding model ranked No. 1 on the MTEB multilingual leaderboard with a score of 70.58.

The series includes three sizes for both embedding and reranking:

Qwen3-Embedding-0.6B — 28 layers, 1024-dim embeddings, 32K context
Qwen3-Embedding-4B — 36 layers, 2560-dim embeddings, 32K context
Qwen3-Embedding-8B — 36 layers, 4096-dim embeddings, 32K context
Qwen3-Reranker-0.6B/4B/8B — cross-encoder reranking models, 32K context

Key features include Matryoshka Representation Learning (MRL) for flexible output dimensions, instruction-aware prompting for task-specific optimization, and support for 100+ languages including programming languages.

When to Use

Building semantic search or RAG pipelines requiring high-quality text embeddings
Performing cross-lingual or multilingual retrieval across 100+ languages
Retrieving relevant code from natural language queries (code retrieval)
Running text classification, clustering, or bitext mining
Setting up two-stage retrieval: dense embeddings for candidate selection, then a reranker for precision
Trading embedding size against accuracy via Matryoshka Representation Learning
Deploying embedding models via Sentence Transformers, raw Transformers, vLLM, or TEI

Core Concepts

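Dual-Encoder Architecture (Embedding): The embedding model processes a single text input and extracts the semantic representation from the hidden state of the final [EOS] token. Because queries and documents are encoded independently, document embeddings can be computed once and indexed ahead of time, which enables efficient approximate nearest neighbor search at query time.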
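The retrieval step this architecture enables can be sketched as a brute-force cosine-similarity search. This is a minimal illustration in plain Python, not the model's own API: the 3-dimensional vectors are made-up stand-ins for real embeddings, and `cosine_similarity` and `top_k` are hypothetical helper names. A production system would typically replace the linear scan with an approximate nearest neighbor index.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return (doc_index, score) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors standing in for precomputed document embeddings (illustrative only).
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
]
# Toy query embedding, computed independently of the documents.
query_vec = [0.8, 0.6, 0.0]

results = top_k(query_vec, doc_vecs, k=2)
for index, score in results:
    print(index, round(score, 3))
```

Only the query has to be embedded at search time; the document vectors are reused across queries.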

Cross-Encoder Architecture (Reranking): The reranker takes query-document pairs as input and outputs a relevance score using a cross-encoder structure. It is used after initial dense retrieval to re-rank top candidates with higher precision.
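The reranking stage can be sketched as follows. The `rerank` helper and the `toy_score` function are hypothetical: `toy_score` is a word-overlap stand-in for a real Qwen3-Reranker call, used only so the example runs without a model. The point is the shape of the pipeline, where each (query, document) pair is scored jointly and the candidates are re-sorted.

```python
def rerank(query, candidates, score_fn, top_n=2):
    """Score each (query, document) pair jointly and keep the top_n documents."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

def toy_score(query, doc):
    """Stand-in relevance score: fraction of query words that appear in the document.
    A real pipeline would call the reranker model here instead."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

# Candidates as they might come back from the dense retrieval stage.
candidates = [
    "Beijing is the capital of China",
    "Gravity pulls objects together",
    "China has many large cities",
]

reranked = rerank("capital of China", candidates, toy_score, top_n=2)
for doc, score in reranked:
    print(round(score, 2), doc)
```

Because the scorer sees the query and document together, it costs one model call per pair, which is why it is applied only to a short candidate list.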

Instruction-Aware Embedding: Both embedding and reranking models support user-defined instructions. For embedding, format each query as "Instruct: {task_description}\nQuery:{query}" (where \n denotes a newline) and embed documents as-is, with no prefix. Using an instruction typically improves results by 1-5% compared with omitting it. Write instructions in English even for multilingual contexts, as training instructions were primarily in English.
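The query template can be applied with a small helper. This is a minimal sketch: the function name `get_detailed_instruct` and the example task description are illustrative choices, and only the template string itself comes from the text above.

```python
def get_detailed_instruct(task_description, query):
    """Wrap a query in the instruction template; documents are left unchanged."""
    return f"Instruct: {task_description}\nQuery:{query}"

# Example task description (an assumption; write one that matches your use case).
task = "Given a web search query, retrieve relevant passages that answer the query"

queries = ["What is the capital of China?", "Explain gravity"]
documents = ["The capital of China is Beijing."]

# Only queries receive the instruction prefix.
formatted_queries = [get_detailed_instruct(task, q) for q in queries]
print(formatted_queries[0])
```

The formatted queries and the raw documents are then passed to whichever inference engine produces the embeddings.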

Matryoshka Representation Learning (MRL): Embedding models support user-defined output dimensions by truncating the embedding vector. For example, Qwen3-Embedding-8B produces 4096-dim embeddings by default but can output any dimension from 32 to 4096. Smaller dimensions trade accuracy for storage and compute efficiency.
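Truncation can be sketched in a few lines of plain Python. Two things here are assumptions rather than statements from the text: the helper names `truncate_embedding` and `bytes_per_vector` are hypothetical, and re-normalizing to unit length after truncation is a common practice for cosine-similarity search, not a documented requirement. The 4-dimensional vector is a toy stand-in for a real embedding.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components, then rescale to unit L2 norm (assumed step)."""
    if not 1 <= dim <= len(vec):
        raise ValueError("dim must be between 1 and the full embedding size")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        return head
    return [x / norm for x in head]

def bytes_per_vector(dim, bytes_per_value=4):
    """Storage for one embedding, assuming float32 values by default."""
    return dim * bytes_per_value

full = [0.5, 0.5, 0.5, 0.5]          # toy unit-length "full" embedding
small = truncate_embedding(full, 2)  # keep the leading 2 dimensions

print(small)
print(bytes_per_vector(4096), bytes_per_vector(1024))
```

Under the float32 assumption, going from 4096 to 1024 dimensions cuts per-vector storage from 16384 to 4096 bytes.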

Last-Token Pooling: Embeddings are extracted using last-token pooling: the hidden state corresponding to the final non-padding token in each sequence, as determined by the attention mask. This differs from the mean pooling used by some other embedding models, which averages the hidden states of all tokens.
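The selection logic can be illustrated without any tensor library. This is a sketch over nested Python lists, with a hypothetical `last_token_pool` helper and toy 2-dimensional hidden states; real code would do the same indexing on framework tensors. It handles both left-padded and right-padded sequences.

```python
def last_token_pool(hidden_states, attention_mask):
    """Pick the hidden state of the last real (non-padding) token per sequence.

    hidden_states:  list of sequences, each a list of per-token vectors
    attention_mask: list of sequences of 0/1 flags (1 = real token)
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        if mask[-1] == 1:
            # Left-padded or unpadded: the last position is a real token.
            last_index = len(mask) - 1
        else:
            # Right-padded: the last real token sits just before the padding.
            last_index = sum(mask) - 1
        pooled.append(states[last_index])
    return pooled

# Toy batch: three sequences of length 3 with 2-dim hidden states.
hidden_states = [
    [[1, 1], [2, 2], [3, 3]],  # no padding
    [[4, 4], [5, 5], [0, 0]],  # right-padded
    [[0, 0], [6, 6], [7, 7]],  # left-padded
]
attention_mask = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
]

pooled = last_token_pool(hidden_states, attention_mask)
print(pooled)
```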

Advanced Topics

Model Comparison & Benchmarks: MTEB, C-MTEB, and reranking benchmark results → Model Comparison

Usage Examples: Code for Sentence Transformers, Transformers, vLLM, TEI, and reranker usage → Usage Examples

Architecture & Training: Model architecture details, three-stage training pipeline, LoRA fine-tuning → Architecture & Training