name: lucy-ng:CASE
description: Full de novo structure elucidation - skip dereplication and solve the structure from NMR correlations. Use when dereplication returned no matches, the compound is known to be novel, or you want to solve the structure from first principles.
lucy-ng:CASE
Full de novo structure elucidation - skip dereplication and solve the structure from NMR correlations.
Purpose
This skill performs FULL Computer-Assisted Structure Elucidation (CASE) without dereplication. Use this when:
Dereplication already returned no matches
You know the compound is novel/not in databases
You want to solve the structure from first principles
You're evaluating AI-based CASE methodology
Domain Knowledge
Reference: For NMR background, peak picking strategy, symmetry detection,
dereplication scoring, LSD reference, and ranking interpretation,
see the main skill document:
skill/SKILL.md
This skill focuses on the CASE procedure (step-by-step execution). The main skill
document contains all shared domain knowledge.
Prerequisites
lucy --version || pip install lucy-ng
lucy lsd check # Must show LSD and outlsd available
Required Data
| Data | Essential? | Purpose |
|------|-----------|---------|
| Molecular formula | YES | From user (HRMS) |
| 13C spectrum | YES | All carbon positions |
| HSQC | YES | Direct C-H correlations |
| HMBC | YES | Long-range correlations |
| DEPT-135 | Recommended | Multiplicities (CH, CH2, CH3) |
| COSY | Optional | H-H correlations |
Workflow
Supervisor integration: When running under supervisor control, write CASE-PROGRESS.md after each LSD iteration (see Step 7c). This enables the supervisor to detect loops and provide diagnostic guidance.
Step 0: Setup Documentation
mkdir -p analysis
Document all steps in analysis/ as you proceed.
Step 1: Request Molecular Formula
Always ask the user:
"Please provide the molecular formula for this unknown compound (typically from HRMS)."
Calculate key values from formula:
Total carbons
Total hydrogens
Heteroatoms (N, O, S, etc.)
Degree of unsaturation: DBE = (2C + 2 + N - H) / 2
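The formula values above can be computed rather than counted by hand. The following is a minimal sketch, not part of lucy-ng: the helper names `parse_formula` and `degrees_of_unsaturation` are hypothetical, the parser assumes a flat formula without brackets or charges, and the halogen term is the standard extension of the DBE formula (halogens count like hydrogens), which the source formula does not spell out.

```python
import re


def parse_formula(formula: str) -> dict[str, int]:
    """Count atoms in a flat formula such as 'C16H10N2O2' (no brackets, no charges)."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def degrees_of_unsaturation(formula: str) -> float:
    """DBE = (2C + 2 + N - H - X) / 2, where X is the number of halogens."""
    atoms = parse_formula(formula)
    halogens = sum(atoms.get(x, 0) for x in ("F", "Cl", "Br", "I"))
    return (2 * atoms.get("C", 0) + 2 + atoms.get("N", 0)
            - atoms.get("H", 0) - halogens) / 2


# Formula taken from the Quick Reference example
print(degrees_of_unsaturation("C16H10N2O2"))  # 13.0
```

A half-integer result usually means the formula is wrong or describes an ion, so it is worth confirming the formula with the user before continuing.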
Step 2: Identify Available Experiments
for dir in */; do
  if [ -f "$dir/acqus" ]; then
    nuc=$(grep "##\$NUC1=" "$dir/acqus" | head -1)
    pp=$(grep "##\$PULPROG=" "$dir/acqus" | head -1)
    echo "Exp $dir: $nuc | $pp"
  fi
done
Map experiments:
1H: zg*
13C: zgdc*, zgpg*
DEPT: dept*
HSQC: hsqc*
HMBC: hmbc*
COSY: cosy*
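The pulse-program mapping can also be applied programmatically. This is a sketch under stated assumptions: `classify_pulprog` and `PULPROG_PATTERNS` are hypothetical names, and the input is the value part of the `##$PULPROG=` line. Because `zg*` also matches `zgdc*` and `zgpg*`, the more specific 13C patterns are tested first. A plain `zg*` program does not prove the nucleus is 1H, so the `NUC1` value should still be checked.

```python
from fnmatch import fnmatchcase

# Order matters: specific patterns first, the generic zg* last.
PULPROG_PATTERNS = [
    ("13C", ("zgdc*", "zgpg*")),
    ("DEPT", ("dept*",)),
    ("HSQC", ("hsqc*",)),
    ("HMBC", ("hmbc*",)),
    ("COSY", ("cosy*",)),
    ("1H", ("zg*",)),
]


def classify_pulprog(pulprog: str) -> str:
    """Map a pulse program name (e.g. '<zgpg30>') to an experiment label."""
    name = pulprog.strip("<> \n").lower()
    for label, patterns in PULPROG_PATTERNS:
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            return label
    return "unknown"


print(classify_pulprog("<zgpg30>"))  # 13C
```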
Step 3: Analyze Symmetry
Compare expected vs observed signals:
lucy analyze symmetry <data_dir> <formula>
Or manually:
Count peaks in 13C spectrum
Compare to carbons in formula
If observed < expected → molecule has symmetry
Document:
## Symmetry Analysis
- Expected carbons (from formula): X
- Observed 13C signals: Y
- Interpretation: [No symmetry / C2 symmetry / etc.]
Step 4: Pick 13C Peaks
lucy pick 1d <13c_experiment>
Or from peaklist.xml if binary data is poor:
Extract F1 values from <Peak1D F1="..."/> tags
List all carbon shifts
Document all peaks with proposed assignments:
| # | Shift (ppm) | Type (if known) |
|---|-------------|-----------------|
| 1 | 187.8 | Carbonyl? |
| 2 | 152.5 | C-N? |
| ... | ... | ... |
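When falling back to peaklist.xml, the F1 values can be pulled out with the standard library. This is a minimal sketch: `carbon_shifts_from_peaklist` is a hypothetical helper, and the wrapper elements in the sample (`PeakList`, `PeakList1D`) are assumptions for illustration. Only the `Peak1D` tag and its `F1` attribute come from the text above.

```python
import xml.etree.ElementTree as ET


def carbon_shifts_from_peaklist(xml_text: str) -> list[float]:
    """Return all Peak1D F1 values, sorted from high to low ppm."""
    root = ET.fromstring(xml_text)
    shifts = [float(peak.attrib["F1"])
              for peak in root.iter("Peak1D")
              if "F1" in peak.attrib]
    return sorted(shifts, reverse=True)


# Illustrative sample; a real file would be read with Path(...).read_text()
sample = """<PeakList><PeakList1D>
  <Peak1D F1="152.5"/>
  <Peak1D F1="187.8"/>
</PeakList1D></PeakList>"""

print(carbon_shifts_from_peaklist(sample))  # [187.8, 152.5]
```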
Step 5: Pick HSQC Peaks
Get raw HSQC peaks:
lucy pick hsqc <hsqc_exp> --format json
Apply DEPT-guided filtering (see skill/SKILL.md Section 3):
Pick DEPT-135 peaks:
lucy pick 1d <dept135_exp> --format json
Match HSQC carbon positions to DEPT carbons within ±1.5 ppm
Extract multiplicities from DEPT sign (positive = CH/CH3, negative = CH2)
If DEPT-90 available, disambiguate CH vs CH3
Document:
Which carbons are protonated (have HSQC signals)
Which are quaternary (no HSQC signal)
Multiplicities (CH, CH2, CH3)
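The DEPT matching rule can be sketched as follows. This is an illustration of the ±1.5 ppm matching and the sign convention described above, not the lucy implementation. `assign_dept_phase` is a hypothetical name, and the inputs are assumed to be plain lists already extracted from the JSON output.

```python
def assign_dept_phase(hsqc_carbons, dept_peaks, tol=1.5):
    """Label each HSQC carbon from the sign of its nearest DEPT-135 peak.

    hsqc_carbons: list of 13C shifts (ppm) from HSQC
    dept_peaks:   list of (shift_ppm, intensity) tuples from DEPT-135
    """
    labels = {}
    for carbon in hsqc_carbons:
        # Keep only DEPT peaks inside the tolerance window
        in_window = [(abs(carbon - shift), intensity)
                     for shift, intensity in dept_peaks
                     if abs(carbon - shift) <= tol]
        if not in_window:
            labels[carbon] = "unmatched"
            continue
        _, intensity = min(in_window, key=lambda item: item[0])
        # Positive phase: CH or CH3; negative phase: CH2
        labels[carbon] = "CH or CH3" if intensity > 0 else "CH2"
    return labels


print(assign_dept_phase([128.4, 35.2], [(128.1, 5.0), (35.9, -3.0)]))
```

Carbons labelled "CH or CH3" still need DEPT-90 or chemical-shift reasoning to be resolved, and "unmatched" entries are candidates for HSQC noise.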
Step 6: Pick HMBC Peaks
Get raw HMBC peaks:
lucy pick hmbc <hmbc_exp> --format json
Apply cross-validation filtering (see skill/SKILL.md Section 3):
Validate each HMBC peak:
Carbon position exists in 13C peaks (±1.5 ppm)
Proton position exists in HSQC peaks (±0.1 ppm)
Retain only validated correlations
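The validation rule can be sketched as a small filter. The tolerances are the ones stated above. The function name `validate_hmbc` and the tuple layout are assumptions for illustration and do not describe how lucy stores peaks.

```python
def validate_hmbc(hmbc_peaks, c13_shifts, hsqc_protons, c_tol=1.5, h_tol=0.1):
    """Keep HMBC peaks whose carbon and proton positions are both confirmed.

    hmbc_peaks:   list of (carbon_ppm, proton_ppm) tuples
    c13_shifts:   list of picked 13C shifts (ppm)
    hsqc_protons: list of proton shifts (ppm) seen in HSQC
    """
    kept = []
    for carbon, proton in hmbc_peaks:
        carbon_ok = any(abs(carbon - s) <= c_tol for s in c13_shifts)
        proton_ok = any(abs(proton - s) <= h_tol for s in hsqc_protons)
        if carbon_ok and proton_ok:
            kept.append((carbon, proton))
    return kept


raw = [(187.5, 7.52), (100.0, 7.50), (152.0, 3.00)]
print(validate_hmbc(raw, [187.8, 152.5], [7.5, 6.9]))  # [(187.5, 7.52)]
```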
Document all HMBC correlations:
| Carbon (ppm) | Proton (ppm) | Notes |
|--------------|--------------|-------|
| 187.8 | 7.5 | Carbonyl to aromatic H |
| ... | ... | ... |
Step 7: Generate LSD Input
Write the LSD file directly using skill knowledge:
Reference:
skill/SKILL.md Section 5 (LSD Reference)
skill/diagnostic/SKILL.md Section 1 (LSD Command Reference)
Build the LSD file manually:
; LSD input for <FORMULA>
; Atom definitions (MULT atom# element hybridization H-count)
MULT 1 C 2 0 ; Carbonyl carbon, sp2, 0H (quaternary)
MULT 2 C 2 1 ; Aromatic CH, sp2, 1H
MULT 3 N 3 1 ; Amine nitrogen, sp3, 1H (NH)
MULT 4 O 2 0 ; Carbonyl oxygen, sp2, 0H
...
; HSQC correlations (MUST come before HMBC)
HSQC 2 2 ; C2 has H2 attached
HSQC 5 5 ; C5 has H5 attached
...
; HMBC correlations
HMBC 1 2 ; C1 correlates to H2
HMBC 1 5 ; C1 correlates to H5
...
; Heteroatom constraints (optional but helpful)
BOND 1 4 ; C1 bonded to O4 (carbonyl)
Critical checks before running:
sp2 count is EVEN
Hydrogen count matches formula
All HSQC commands before HMBC commands
NO ELIM command on first run
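The four checks can be automated before each run. This is a rough sketch, assuming the simple `MULT atom# element hybridization H-count` form shown in the example, with single integers for hybridization and H-count. `check_lsd_input` is a hypothetical helper, not a lucy command.

```python
def check_lsd_input(lsd_text: str, expected_h: int) -> list[str]:
    """Return a list of problems found in an LSD input file (empty if none)."""
    sp2 = h_total = 0
    seen_hmbc = False
    problems = []
    for raw in lsd_text.splitlines():
        fields = raw.split(";", 1)[0].split()  # drop ';' comments
        if not fields:
            continue
        command = fields[0].upper()
        if command == "MULT" and len(fields) >= 5:
            if fields[3] == "2":
                sp2 += 1
            h_total += int(fields[4])
        elif command == "HMBC":
            seen_hmbc = True
        elif command == "HSQC" and seen_hmbc:
            problems.append("HSQC command found after an HMBC command")
        elif command == "ELIM":
            problems.append("ELIM present (avoid on first run)")
    if sp2 % 2:
        problems.append(f"sp2 count is odd ({sp2})")
    if h_total != expected_h:
        problems.append(f"hydrogen count {h_total} does not match formula ({expected_h})")
    return problems


example = """
MULT 1 C 2 0 ; quaternary sp2
MULT 2 C 2 1 ; sp2 CH
MULT 3 O 3 1 ; OH
HSQC 2 2
HMBC 1 2
"""
print(check_lsd_input(example, expected_h=2))  # []
```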
Step 7b: Iterative HMBC Addition (Minimize Solutions)
CRITICAL: Do NOT add all HMBC correlations at once!
Adding too many HMBC correlations often leads to 0 solutions (over-constrained) due to:
Noise artifacts in the HMBC spectrum
Long-range correlations (⁴J+) that exceed LSD's default 2-3 bond assumption
Overlapping or incorrectly assigned peaks
Strategy: Gradually add HMBC correlations until solutions reach a minimum > 0
Start with high-confidence correlations only (5-7 strongest peaks)
Run LSD and check solution count
Add 1-2 more correlations at a time
Re-run LSD after each addition
Stop when solutions are minimized but still > 0
Workflow example:
# Start with base correlations
cp compound_base.lsd compound_test.lsd
LSD compound_test.lsd 2>&1 | grep solution
# → "47 solutions found"
# Add HMBC 4 9
echo "HMBC 4 9" >> compound_test.lsd
LSD compound_test.lsd 2>&1 | grep solution
# → "12 solutions found"
# Add HMBC 5 9
echo "HMBC 5 9" >> compound_test.lsd
LSD compound_test.lsd 2>&1 | grep solution
# → "1 solution found" ✓ IDEAL!
# If we add one more and get 0 solutions, remove it!
Tracking table (recommended):
| HMBC Count | Correlations Added | Solutions | Action |
|------------|-------------------|-----------|--------|
| 5 | Base set | 47 | Add more |
| 7 | + C1→H7, C2→H10 | 12 | Add more |
| 8 | + C8→H10 | 6 | Add more |
| 9 | + C6→H9 | 6 | Add more |
| 10 | + C4→H9 | 5 | Add more |
| 11 | + C5→H9 | 1 | STOP - Ideal! |
| 12 | + C3→H4 | 0 | Remove last |
Key principles:
Ideal: 1 solution — uniquely determined structure
Acceptable: 2-10 solutions — can rank by 13C prediction
0 solutions — over-constrained, remove last correlation(s)
Never use ELIM to "fix" 0 solutions — it masks the real problem
Prioritize correlations by:
Intensity (stronger peaks are more reliable)
Proximity to known fragment assignments
Correlations that connect unassigned regions
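The iterative strategy can be sketched as a loop. This is an illustration only: `add_hmbc_iteratively` is a hypothetical helper, and `count_solutions` is a caller-supplied function (for example one that writes the LSD file, runs the solver and parses the solution count), so no solver interface is assumed here. The sketch adds one correlation at a time and drops any correlation that produces 0 solutions, which is one reasonable reading of "remove last correlation(s)".

```python
def add_hmbc_iteratively(base, candidates, count_solutions):
    """Add HMBC correlations one at a time while keeping solutions > 0.

    base:            high-confidence correlations to start from
    candidates:      further correlations, ordered strongest first
    count_solutions: callable(list_of_correlations) -> int
    Returns (accepted correlations, solution count, history of (n_hmbc, solutions)).
    """
    accepted = list(base)
    best = count_solutions(accepted)
    history = [(len(accepted), best)]
    if best == 0:
        return accepted, best, history  # base set itself is over-constrained
    for correlation in candidates:
        trial = accepted + [correlation]
        solutions = count_solutions(trial)
        history.append((len(trial), solutions))
        if solutions == 0:
            continue  # over-constrained: leave this correlation out
        accepted, best = trial, solutions
        if solutions == 1:
            break  # uniquely determined structure
    return accepted, best, history


def toy_solver(correlations):
    """Stand-in solver used only to demonstrate the loop."""
    if (3, 4) in correlations:
        return 0
    return {1: 47, 2: 12, 3: 1}[len(correlations)]


print(add_hmbc_iteratively([(1, 7)], [(4, 9), (3, 4), (5, 9)], toy_solver))
```

The returned history maps directly onto the tracking table and onto the per-iteration entries in CASE-PROGRESS.md.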
Step 7c: Write Progress Checkpoint (CASE-PROGRESS.md)
After EVERY LSD iteration (including the baseline run), append an iteration entry to CASE-PROGRESS.md in the compound's working directory. This file is read by the supervisor agent to monitor progress, detect loops, and provide diagnostic guidance.
First iteration: Create the file with header section:
# CASE Progress Log
**Compound:** <compound_path>
**Formula:** <molecular_formula>
**Started:** <timestamp>
Each iteration: Append a new section:
---
## Iteration N: <brief description>
**Time:** <timestamp>
**LSD file:** <filename>.lsd
**Solution count:** <count>
**Constraints added:**
- <constraint and reasoning>
**Constraints removed:**
- <constraint and reasoning> (or "None")
**Why:** <natural language explanation of strategy for this iteration>
**Constraint effectiveness:** <% reduction from previous, or "baseline", or "over-constrained (0 solutions)">
**Confidence:** <qualitative assessment: too many solutions / converging / stuck / etc.>
**HMBC correlations used:** X/Y
**Notes:**
- sp2 count: <N> (<even/odd>) <check/warning>
- H budget: <matches/mismatch>
- <other observations>
Rules:
NEVER overwrite the file — always append new iteration sections
Include ALL required fields in every iteration entry
The "Why" field must explain reasoning, not just state what was done
The "Constraints added/removed" must list each constraint individually with reasoning
If recovering from 0 solutions, document which correlations were removed and why
For the complete format specification with examples, see skill/supervisor/SKILL.md Section 7.
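A small helper can keep the entries consistent and enforce the append-only rule. This is a sketch under assumptions: the function names and argument list are hypothetical, the field layout follows the template above, and the authoritative format remains the one in skill/supervisor/SKILL.md.

```python
from datetime import datetime
from pathlib import Path


def format_iteration(n, description, lsd_file, solutions, added, removed, why,
                     effectiveness, confidence, hmbc_used, hmbc_total, notes,
                     timestamp=None):
    """Return one iteration section as text, following the template above."""
    timestamp = timestamp or datetime.now().isoformat(timespec="seconds")

    def bullets(items):
        return "\n".join(f"- {item}" for item in items) or "- None"

    return "\n".join([
        "---",
        f"## Iteration {n}: {description}",
        f"**Time:** {timestamp}",
        f"**LSD file:** {lsd_file}",
        f"**Solution count:** {solutions}",
        "**Constraints added:**",
        bullets(added),
        "**Constraints removed:**",
        bullets(removed),
        f"**Why:** {why}",
        f"**Constraint effectiveness:** {effectiveness}",
        f"**Confidence:** {confidence}",
        f"**HMBC correlations used:** {hmbc_used}/{hmbc_total}",
        "**Notes:**",
        bullets(notes),
    ]) + "\n"


def append_progress(path, entry, compound="", formula=""):
    """Create the log with its header on first use, then append (never overwrite)."""
    path = Path(path)
    if not path.exists():
        started = datetime.now().isoformat(timespec="seconds")
        path.write_text(
            f"# CASE Progress Log\n**Compound:** {compound}\n"
            f"**Formula:** {formula}\n**Started:** {started}\n",
            encoding="utf-8",
        )
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
```

Typical use is `append_progress("CASE-PROGRESS.md", format_iteration(...), compound, formula)` straight after each solver run.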
Step 8: Run LSD Solver
lucy lsd run compound.lsd
Or directly:
LSD compound.lsd
For solution count interpretation and troubleshooting, see skill/SKILL.md Section 5 (LSD Reference).
Step 9: Convert to SMILES
outlsd 5 < compound.sol > solutions.smi
Step 10: Rank Solutions
lucy lsd rank solutions.smi --spectrum <13c_exp>
# Or with shift list:
lucy lsd rank solutions.smi --shifts "187.8,152.5,135.7,..."
For MAE score interpretation and ranking guidance, see skill/SKILL.md Section 6 (Ranking and Prediction).
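To make the MAE score concrete, here is a minimal sketch of how a mean absolute error between observed and predicted 13C shifts can be computed. It assumes predicted shifts are already available for each candidate and pairs them with observed shifts by sorted order, which is a simplification. How `lucy lsd rank` predicts and matches shifts is not described here, so this should not be read as its algorithm.

```python
def mean_absolute_error(observed, predicted):
    """MAE in ppm between two equally long shift lists, paired by sorted order."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    pairs = zip(sorted(observed), sorted(predicted))
    return sum(abs(o - p) for o, p in pairs) / len(observed)


def rank_candidates(observed, predictions):
    """Sort candidates by MAE; predictions maps a candidate label to its shifts."""
    scored = [(mean_absolute_error(observed, shifts), label)
              for label, shifts in predictions.items()]
    return sorted(scored)


observed = [187.8, 152.5, 135.7]
predictions = {  # hypothetical predicted shifts for two candidates
    "candidate_A": [186.8, 153.5, 135.7],
    "candidate_B": [180.8, 150.5, 130.7],
}
for mae, label in rank_candidates(observed, predictions):
    print(f"{label}: MAE = {mae:.2f} ppm")
```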
Step 11: Analyze J-Coupling Path Lengths
After solving, use lucy lsd analyze to compute the actual J-coupling path lengths for all HMBC correlations:
lucy lsd analyze compound.sol compound.lsd
This command:
Parses the OUTLSD section of the .sol file to extract molecular connectivity
Builds a graph from atom neighbors
Uses BFS shortest path to compute bonds between carbon and proton-bearing carbon
Reports nJ = path_length + 1 for each HMBC correlation
Example output:
Solution 2: 9× ²J 11× ³J (all ²J/³J, no ELIM needed)
HMBC Correlations:
-------------------------------------------------------
C# H# C (ppm) Path J-coupling
-------------------------------------------------------
1 7 131.29 1 ²J_CH
1 10 131.29 1 ²J_CH
2 7 124.71 2 ³J_CH
...
Interpretation:
All ²J/³J correlations: Structure is consistent with standard HMBC without ELIM
Contains ⁴J+ correlations: May explain why ELIM was needed
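The path-length calculation described above can be sketched with a plain breadth-first search. This is an illustration of the idea (bond count between the correlating carbon and the proton-bearing carbon, plus one for the C-H bond), not the code behind `lucy lsd analyze`. The adjacency dictionary and function names are hypothetical.

```python
from collections import deque


def bond_distance(adjacency, start, goal):
    """Shortest number of bonds between two heavy atoms, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        atom, distance = queue.popleft()
        for neighbour in adjacency.get(atom, ()):
            if neighbour == goal:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


def hmbc_coupling_order(adjacency, carbon, proton_carbon):
    """nJ = path_length + 1, counting the C-H bond of the proton-bearing carbon."""
    path = bond_distance(adjacency, carbon, proton_carbon)
    return None if path is None else path + 1


# Toy connectivity: C1-C7, C7-C2, C2-C3
toy = {1: [7], 7: [1, 2], 2: [7, 3], 3: [2]}
print(hmbc_coupling_order(toy, 1, 7))  # 2  (a 2J correlation)
print(hmbc_coupling_order(toy, 2, 1))  # 3  (a 3J correlation)
print(hmbc_coupling_order(toy, 1, 3))  # 4  (a 4J correlation)
```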
JSON output for PDF generation:
lucy lsd analyze compound.sol compound.lsd --format json > analysis/j_coupling.json
Generate structure images with LSD atom numbering:
lucy lsd analyze compound.sol compound.lsd --draw solution_{n}.png
This generates a 2D structure image where each atom is labeled with its LSD index (C1, C2, ..., O11), making the HMBC table directly readable against the structure.
Generate publication-quality correlation diagrams with arrows:
For visualizing HMBC correlations directly on the structure with curved arrows and J-coupling labels:
# Generate correlation diagram with atom numbers and J-coupling labels
lucy visualize correlations \
--sol compound.sol \
--lsd-file compound.lsd \
--show-atom-numbers \
--show-j-coupling \
-o analysis/hmbc_diagram.svg
This creates a publication-quality SVG diagram showing:
Clean 2D structure (from the solved .sol file)
Red atom number annotations positioned away from the structure
Curved HMBC arrows connecting correlating atoms
²J/³J labels on arrows indicating coupling path length
Include the correlation diagram next to the HMBC table in your PDF report - it provides an immediate visual representation of how the HMBC correlations connect the molecular fragments.
Step 12: Report Results
## CASE Results
**Molecular Formula:** [formula]
**Degree of Unsaturation:** [DBE]
### Data Used
- 13C: [X] signals
- HSQC: [Y] correlations (Z protonated carbons)
- HMBC: [N] correlations
- Symmetry: [description]
### LSD Results
- Solutions found: [count]
- ELIM used: [Yes/No]
### Top Candidates
**Rank 1:** MAE = X.XX ppm ([Quality])
[SMILES]
- Key features: [description]
**Rank 2:** MAE = X.XX ppm ([Quality])
[SMILES]
- Differs from #1 in: [description]
### Confidence Assessment
[High/Medium/Low] - [reasoning]
### Recommendation
[Final structure proposal or need for additional data]
Step 13: Generate PDF Report
Always generate a PDF report with rendered structures and formatted tables at the end of every CASE analysis.
# Generate PDF report with structures and tables
python3 << 'EOF'
from rdkit import Chem
from rdkit.Chem import Draw, AllChem
from reportlab.lib import colors
from reportlab.lib.pagesizes import A4
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.units import inch
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Image, Table, TableStyle
from reportlab.lib.enums import TA_CENTER
import io
# Create the PDF document
doc = SimpleDocTemplate(
"analysis/CASE_Report.pdf",
pagesize=A4,
rightMargin=0.75*inch,
leftMargin=0.75*inch,
topMargin=0.75*inch,
bottomMargin=0.75*inch
)
# Styles
styles = getSampleStyleSheet()
title_style = ParagraphStyle('CustomTitle', parent=styles['Heading1'],
fontSize=20, spaceAfter=30, alignment=TA_CENTER)
heading_style = ParagraphStyle('CustomHeading', parent=styles['Heading2'],
fontSize=14, spaceBefore=20, spaceAfter=10)
normal_style = styles['Normal']
story = []
# Title
story.append(Paragraph("CASE Structure Elucidation Report", title_style))
story.append(Spacer(1, 0.25*inch))
# Summary table
story.append(Paragraph("Summary", heading_style))
summary_data = [
["Molecular Formula", "<FORMULA>"],
["Molecular Weight", "<MW> Da"],
["Degree of Unsaturation (DBE)", "<DBE>"],
["LSD Solutions Found", "<COUNT>"],
]
summary_table = Table(summary_data, colWidths=[2.5*inch, 3*inch])
summary_table.setStyle(TableStyle([
('BACKGROUND', (0, 0), (0, -1), colors.lightgrey),
('FONTNAME', (0, 0), (0, -1), 'Helvetica-Bold'),
('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
('PADDING', (0, 0), (-1, -1), 8),
]))
story.append(summary_table)
story.append(Spacer(1, 0.3*inch))
# 13C NMR Data Table
story.append(Paragraph("13C NMR Data", heading_style))
c13_data = [
["#", "Shift (ppm)", "Multiplicity", "Assignment"],
# Add rows for each carbon signal:
# ["1", "131.29", "C (quat)", "=C< olefinic"],
]
c13_table = Table(c13_data, colWidths=[0.4*inch, 1.2*inch, 1.2*inch, 2.5*inch])
c13_table.setStyle(TableStyle([
('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#4472C4')),
('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
('ALIGN', (0, 0), (-1, -1), 'CENTER'),
('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
('PADDING', (0, 0), (-1, -1), 6),
]))
story.append(c13_table)
story.append(Spacer(1, 0.3*inch))
# Structure rendering function
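def smiles_to_image(smiles, size=(400, 300)):
    """Render a SMILES string to an in-memory PNG for ReportLab."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    AllChem.Compute2DCoords(mol)
    img = Draw.MolToImage(mol, size=size)
    img_buffer = io.BytesIO()
    img.save(img_buffer, format='PNG')
    img_buffer.seek(0)
    return img_buffer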
# For each candidate structure:
story.append(Paragraph("Structure Candidates", heading_style))
# candidate_smiles = ["SMILES1", "SMILES2", ...]
# for i, smi in enumerate(candidate_smiles, 1):
# story.append(Paragraph(f"<b>Rank {i}:</b> {name}", normal_style))
# story.append(Paragraph(f"MAE: {mae} ppm | SMILES: {smi}", normal_style))
# img = smiles_to_image(smi)
# story.append(Image(img, width=3*inch, height=2.25*inch))
# story.append(Spacer(1, 0.2*inch))
# Ranking comparison table
story.append(Paragraph("Ranking Comparison", heading_style))
rank_data = [
["Rank", "Structure", "MAE (ppm)", "Quality", "Within 3ppm"],
# ["1", "Name", "2.69", "Good", "6/10"],
]
rank_table = Table(rank_data, colWidths=[0.5*inch, 2.5*inch, 1*inch, 0.8*inch, 1*inch])
rank_table.setStyle(TableStyle([
('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#4472C4')),
('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
('ALIGN', (0, 0), (-1, -1), 'CENTER'),
('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
('PADDING', (0, 0), (-1, -1), 6),
]))
story.append(rank_table)
# Build PDF
doc.build(story)
print("PDF report generated: analysis/CASE_Report.pdf")
EOF
CRITICAL: Use data from the successful analysis
Do NOT re-pick peaks for the PDF. Extract all data directly from the LSD file that produced successful solutions. The LSD file contains the exact peaks and correlations that were used.
The PDF report must include complete tables of ALL data used:
Summary table — formula, MW, DBE, solution count, recommended structure
Complete 13C NMR table — ALL carbons used in the LSD file:
Carbon number (C1, C2, ...)
Chemical shift (ppm)
Multiplicity (C, CH, CH2, CH3) from DEPT
Hybridization (sp2/sp3)
H-count
Assignment/interpretation
Complete HSQC table — ALL direct C-H correlations from the LSD file:
Every HSQC command in the LSD file becomes a row
Include carbon identity, shift, multiplicity, and proton chemical shift if known
HMBC Correlation Diagram (placed ABOVE the HMBC table):
Generate the diagram FIRST before the HMBC table:
lucy visualize correlations --sol compound.sol --lsd-file compound.lsd \
--show-atom-numbers -o analysis/hmbc_diagram.svg
Convert SVG to PNG for ReportLab embedding:
import cairosvg
cairosvg.svg2png(url='analysis/hmbc_diagram.svg', write_to='analysis/hmbc_diagram.png', scale=2.0)
The diagram shows:
Clean 2D structure with explicit atom labels (C, H, O)
Red curved arrows connecting HMBC-correlating atoms
Atom numbers matching the LSD file numbering
Optimized layout to avoid overlaps between arrows and labels
Include as a centered Image in the PDF, full page width (~6 inches)
Complete HMBC table (placed BELOW the diagram) — ALL long-range correlations from the LSD file:
Every HMBC command in the LSD file becomes a row
Columns: "From Carbon", "To Proton", "nJCH", "Structural Information"
The J-coupling column shows path length using spectroscopist notation:
²JCH = 2-bond (C directly bonded to C bearing H)
³JCH = 3-bond (most common in HMBC)
⁴JCH = 4-bond (W-pathway, rare in HMBC)
CRITICAL: Use lucy lsd analyze to calculate path lengths, do NOT guess!
lucy lsd analyze compound.sol compound.lsd --format json > analysis/j_coupling.json
This parses the OUTLSD section and uses BFS to compute actual bond distances.
All HMBC correlations should be ²J or ³J. If you find ⁴J+, the CASE likely required ELIM.
ReportLab note: Use Paragraph() objects for cells with super/subscript. Use <super> and <sub> tags.
Note: Reciprocal correlations (e.g., C1→H7 and C7→H2) appear as separate entries because they provide independent constraints.
Excluded signals section — Document WHY certain peaks were not used:
Solvent peaks (e.g., CDCl3 at 77 ppm)
Noise/artifacts
Duplicate signals from overlapping peaks
Signals that couldn't be assigned confidently
Structure candidates — Rendered 2D images (RDKit) with SMILES and MAE scores
Ranking comparison table — All candidates with MAE, quality rating, carbons within tolerance
Recommended structure — Larger image with SMILES and InChI, plus reasoning if not Rank #1
Required dependencies:
CRITICAL: Install missing dependencies - do NOT fall back to suboptimal solutions (like text placeholders instead of images).
# Core PDF generation (RDKit should already be installed)
pip install reportlab
# SVG to PNG conversion for embedding diagrams in PDF
pip install cairosvg
# cairosvg requires the Cairo system library - install if not present:
# macOS:
brew install cairo
# Then run Python with the library path if needed:
# DYLD_LIBRARY_PATH=/opt/homebrew/opt/cairo/lib:$DYLD_LIBRARY_PATH python3 script.py
# Linux (Debian/Ubuntu):
# sudo apt-get install libcairo2-dev
# Linux (RHEL/CentOS):
# sudo yum install cairo-devel
Before generating the PDF, verify all dependencies are working:
# Test imports - if any fail, install the missing package
from reportlab.platypus import SimpleDocTemplate
from rdkit import Chem
from rdkit.Chem import Draw
import cairosvg # For SVG→PNG conversion
If cairosvg import fails with "no library called cairo", install the system Cairo library as shown above.
Troubleshooting
For detailed troubleshooting guidance, see skill/SKILL.md Section 5 (LSD Reference) and Section 6 (Ranking and Prediction).
Quick checklist for 0 solutions: sp2 count is EVEN, hydrogen count matches formula, HMBC correlations correct, only then try ELIM 1 0.
Quick Reference
# Full workflow
mkdir -p analysis
lucy pick 1d ./2 # 13C peaks
lucy pick hsqc ./5 ./3 --dept90 ./4 # HSQC + multiplicities
lucy pick hmbc ./6 ./2 ./5 --dept135 ./3 # HMBC correlations
lucy lsd generate . C16H10N2O2 -o analysis/compound.lsd # Generate LSD input
cd analysis && LSD compound.lsd # Solve
outlsd 5 < compound.sol > solutions.smi # Convert to SMILES
lucy lsd rank solutions.smi --spectrum ../2 # Rank by 13C prediction
lucy lsd analyze compound.sol compound.lsd --draw structure_{n}.png # Analyze with numbered structures
# Generate PDF report (see Step 13 for full template)
IMPORTANT: Always generate a PDF report at the end of every CASE analysis (Step 13).