name: lucy-ng:dereplicate
description: Dereplication only - match observed 13C shifts against reference databases to identify known compounds. Use for quick checks if a compound is already known before deciding whether full CASE is needed.
lucy-ng:dereplicate
Dereplication only - match observed 13C shifts against reference databases to identify known compounds.
Purpose
This skill performs ONLY dereplication (database matching). It does NOT proceed to full structure elucidation. Use this when you want to:
Quickly check if a compound is already known
Identify a natural product from a database
Get candidate matches before deciding whether full CASE is needed
Domain Knowledge: For dereplication score interpretation and ranking thresholds, see skill/SKILL.md, Section 4 (Dereplication).
Prerequisites
lucy --version || pip install lucy-ng
Database: lucy-ng-derep.db
The SQLite compound database lucy-ng-derep.db contains 928K compounds:
COCONUT: 895,099 natural products with predicted 13C shifts
NMRShiftDB: 33,344 compounds with experimental 13C shifts
111,493 unique molecular formulas indexed for fast lookup
The CLI automatically discovers the database by searching:
LUCY_DATABASE environment variable
data/reference/lucy-ng-derep.db (project location)
Common paths (~/.lucy/, ~/lucy-ng/, ~/.local/share/lucy-ng/)
macOS Spotlight (mdfind) for fast discovery
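The search locations above can be sketched in Python as follows. This is only an illustration, not the CLI's actual implementation: the helper name `find_database` and its parameters are hypothetical, it assumes the locations are tried in the order listed, and it leaves out the macOS Spotlight step.

```python
import os
from pathlib import Path
from typing import Optional

DB_NAME = "lucy-ng-derep.db"


def find_database(project_root: Path = Path("."),
                  home: Optional[Path] = None) -> Optional[Path]:
    """Hypothetical helper: return the first existing database path, or None."""
    home = home if home is not None else Path.home()
    candidates = []

    # 1. Explicit override via the LUCY_DATABASE environment variable
    env_value = os.environ.get("LUCY_DATABASE")
    if env_value:
        candidates.append(Path(env_value))

    # 2. Project location
    candidates.append(project_root / "data" / "reference" / DB_NAME)

    # 3. Common per-user paths
    for folder in (".lucy", "lucy-ng", ".local/share/lucy-ng"):
        candidates.append(home / folder / DB_NAME)

    # Assumed order: the first candidate that exists wins
    for path in candidates:
        if path.is_file():
            return path
    return None
```

If the function returns None, the database is not in any of these locations and can be fetched with the download command below.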
If not found, download with:
lucy database download
Workflow
Step 1: Identify Available Data
Check what NMR experiments are available:
# List experiments and identify 13C spectrum
for dir in */; do
    if [ -f "$dir/acqus" ]; then
        nuc=$(grep "##\$NUC1=" "$dir/acqus" | head -1)
        echo "Experiment $dir: $nuc"
    fi
done
Dereplication requires:
13C spectrum (essential) - either binary data or peak list
Molecular formula (essential) - from user (simulating HRMS)
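When the check has to run from Python rather than the shell, the same scan could look like the sketch below. It assumes each acqus file stores the nucleus as a line of the form `##$NUC1= <13C>`; the function names `list_nuclei` and `find_c13_experiments` are hypothetical and not part of lucy-ng.

```python
import re
from pathlib import Path
from typing import Dict, List

# Assumed acqus line format: ##$NUC1= <13C>
NUC1_PATTERN = re.compile(r"^##\$NUC1=\s*<([^>]*)>", re.MULTILINE)


def list_nuclei(dataset_root: Path) -> Dict[str, str]:
    """Map each experiment folder name to the NUC1 value in its acqus file."""
    nuclei = {}
    for acqus in sorted(dataset_root.glob("*/acqus")):
        match = NUC1_PATTERN.search(acqus.read_text(errors="replace"))
        if match:
            nuclei[acqus.parent.name] = match.group(1)
    return nuclei


def find_c13_experiments(dataset_root: Path) -> List[str]:
    """Return the folders whose observed nucleus is 13C."""
    return [name for name, nuc in list_nuclei(dataset_root).items()
            if nuc == "13C"]
```

Other carbon-detected experiments (DEPT, for example) also report 13C as NUC1, so the result is a list of candidates to inspect rather than a definitive pick.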
Step 2: Request Molecular Formula
Always ask the user for the molecular formula. Do not extract it from metadata.
"Please provide the molecular formula for this compound (typically from HRMS)."
Step 3: Run Dereplication
From Bruker spectrum (preferred)
lucy dereplicate c13 <bruker_13c_path> <formula> -n 10
The CLI automatically discovers and uses lucy-ng-derep.db.
From peak list (Python API)
from lucy_ng.database import DatabaseQueryService
from lucy_ng.dereplication import DereplicationService

# Observed 13C shifts in ppm
shifts = [187.81, 152.55, 135.73, 123.41, 120.68, 120.09, 118.99, 113.45]

# Set db_path explicitly (here the project location); adjust to where the database lives
db_path = "data/reference/lucy-ng-derep.db"

with DatabaseQueryService(db_path) as query:
    service = DereplicationService(query)
    result = service.dereplicate_from_shifts(shifts, "C16H10N2O2", top_n=10)
    for match in result.top_matches:
        print(f"{match.entry.name}: score={match.score:.3f}")
Step 4: Interpret Results
For dereplication score interpretation thresholds and recommendations, see skill/SKILL.md Section 4 (Dereplication).
Step 5: Report Results
For strong/possible matches:
## Dereplication Results
**Molecular Formula:** C16H10N2O2
**Database:** lucy-ng-derep.db (928K compounds)
### Top Matches
| Rank | Compound | Score | Avg Deviation |
|------|----------|-------|---------------|
| 1 | [Name] | 0.XX | X.X ppm |
| 2 | [Name] | 0.XX | X.X ppm |
### Assessment
[Strong/Possible/Weak/No] match found.
**Top candidate:** [Name]
**Confidence:** [High/Medium/Low]
**SMILES:** [if available]
### Recommendation
[Either "Compound likely identified as X" or "Consider full CASE for confirmation"]
For no matches:
## Dereplication Results
**Molecular Formula:** C16H10N2O2
**Database:** lucy-ng-derep.db (928K compounds)
No strong matches found (best score: 0.XX)
This suggests:
1. Novel compound not in database
2. Known compound with different stereochemistry
3. Compound not yet added to reference database
### Recommendation
Proceed to full CASE: `/lucy-ng:CASE`
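The Top Matches table in the report template can be filled in programmatically. The sketch below is one possible way to do it; the function `format_matches_table` is hypothetical, and it takes plain (name, score, average deviation) tuples because the attribute that holds the average deviation on a match object is not documented here.

```python
from typing import Iterable, Tuple


def format_matches_table(matches: Iterable[Tuple[str, float, float]]) -> str:
    """Render (name, score, avg_deviation_ppm) tuples as the Top Matches table."""
    lines = [
        "| Rank | Compound | Score | Avg Deviation |",
        "|------|----------|-------|---------------|",
    ]
    # Rank follows the order of the input, which is assumed to be best-first
    for rank, (name, score, avg_dev) in enumerate(matches, start=1):
        lines.append(f"| {rank} | {name} | {score:.2f} | {avg_dev:.1f} ppm |")
    return "\n".join(lines)
```

The input is assumed to be sorted best match first, as it would be when taken from the top matches in order.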
Important Notes
This skill does NOT perform full structure elucidation
Molecular formula must come from user, not metadata
Symmetry affects matching - if the formula has 16 carbons but only 8 signals are observed, the compound likely has symmetry (or some signals overlap)
Database lookup is ~100x faster than SD file scanning due to formula-based indexing
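The symmetry note can be turned into a quick pre-check that compares the carbon count in the formula with the number of observed signals. This is a minimal sketch, assuming a simple formula without brackets, charges or isotope labels; `count_carbons` and `missing_signals` are hypothetical helpers, not lucy-ng functions.

```python
import re
from typing import Sequence


def count_carbons(formula: str) -> int:
    """Count carbon atoms in a simple formula such as C16H10N2O2."""
    total = 0
    # Each match is (element symbol, optional count); "Cl" is not read as carbon
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element == "C":
            total += int(count) if count else 1
    return total


def missing_signals(formula: str, shifts: Sequence[float]) -> int:
    """Return how many carbons have no separate signal in the peak list."""
    return count_carbons(formula) - len(shifts)


shifts = [187.81, 152.55, 135.73, 123.41, 120.68, 120.09, 118.99, 113.45]
gap = missing_signals("C16H10N2O2", shifts)
if gap > 0:
    print(f"{gap} carbons without a separate signal: symmetry or overlap is likely")
```

A positive gap does not distinguish symmetry from accidental overlap; it only flags that the peak list is shorter than the formula implies.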
Quick Reference
# Quick dereplication with CLI (auto-discovers database)
lucy dereplicate c13 ./2 C16H10N2O2 -n 10
# Check database status
lucy database info lucy-ng-derep.db
# Download database if missing
lucy database download