name: rag-documents description: Analyze uploaded documents (PDF, Word, Excel, PPT) using local RAG system at http://localhost:9000. Use when user uploads a file and asks to analyze, summarize, or extract information from it.
RAG Document Analysis
Analyze uploaded documents using the local RAG system.
When to Use
- User uploads a file and asks to analyze/summarize it
- User asks to extract data from Excel/PDF
- User asks questions about document content
Supported Formats
- Excel (.xlsx, .xls)
- PDF (.pdf)
- Word (.docx)
- PowerPoint (.pptx)
Prerequisites
1. Get API Token
First, get a token from the RAG system:
# Register a user (one-time)
curl -X POST "http://localhost:9000/v1/users/register" \
-H "Content-Type: application/json" \
-d '{"username": "testuser", "email": "test@test.com", "password": "testpass123"}'
# Or login to get token
curl -X POST "http://localhost:9000/v1/users/login" \
-H "Content-Type: application/json" \
-d '{"username": "testuser", "password": "testpass123"}'
Save the access_token from response.
2. Create Project
curl -X POST "http://localhost:9000/v1/projects" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{"name": "my-project", "description": "For document analysis"}'
Save the id from response as PROJECT_ID.
Usage Steps
1. Upload Document
curl -X POST "http://localhost:9000/v1/documents/upload?project_id=$PROJECT_ID" \
-H "Authorization: Bearer $TOKEN" \
-F "file=@/path/to/file.xlsx"
Returns: {"document_id": "xxx", "status": "processing"}
2. Wait for Processing
# Poll until status is "completed"
curl "http://localhost:9000/v1/documents/{document_id}"
Wait 2-3 seconds between polls. Status: processing → completed
3. Analyze with RAG
curl -X POST "http://localhost:9000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5-14b",
"messages": [{"role": "user", "content": "Summarize this document"}],
"project_id": "temp",
"enable_rag": true,
"top_k": 5
}'
4. Cleanup (Optional)
curl -X DELETE "http://localhost:9000/v1/documents/{document_id}"
Complete Example
# Upload file
RESPONSE=$(curl -s -X POST "http://localhost:9000/v1/documents/upload?project_id=temp" \
-F "file=@uploaded_file.xlsx")
DOC_ID=$(echo $RESPONSE | python3 -c "import sys,json; print(json.load(sys.stdin)['document_id'])")
# Wait for processing
for i in {1..30}; do
STATUS=$(curl -s "http://localhost:9000/v1/documents/$DOC_ID" | \
python3 -c "import sys,json; print(json.load(sys.stdin)['status'])")
[ "$STATUS" = "completed" ] && break
sleep 2
done
# Query document
curl -s -X POST "http://localhost:9000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{
\"model\": \"qwen2.5-14b\",
\"messages\": [{\"role\": \"user\", \"content\": \"分析这份文档的主要内容\"}],
\"project_id\": \"temp\",
\"enable_rag\": true
}"
# Delete document
curl -s -X DELETE "http://localhost:9000/v1/documents/$DOC_ID"
Important Notes
- No authentication required - Local service, no tokens needed
- Always use project_id: "temp" for temporary analysis
- Wait for processing - Documents need 5-30 seconds for OCR/chunking
- Check chunk_count - If 0, the file may be empty or image-based
- Always enable_rag: Set to
truefor document-based answers
Troubleshooting
| Issue | Solution |
|---|---|
| Connection refused | Check: docker ps | grep lihuo |
| Processing timeout | Large files take longer, wait up to 60s |
| Empty response | Check if chunk_count > 0 |
| Parse error | File may be corrupted or password-protected |