lucy-ngcase - SKILL.md Agent Skill

name: lucy-ng:CASE

description: Full de novo structure elucidation - skip dereplication and solve the structure from NMR correlations. Use when dereplication returned no matches, the compound is known to be novel, or you want to solve the structure from first principles.

lucy-ng:CASE

Full de novo structure elucidation - skip dereplication and solve the structure from NMR correlations.

Purpose

This skill performs FULL Computer-Assisted Structure Elucidation (CASE) without dereplication. Use this when:

Dereplication already returned no matches
You know the compound is novel/not in databases
You want to solve the structure from first principles
You're evaluating AI-based CASE methodology

Domain Knowledge

Reference: For NMR background, peak picking strategy, symmetry detection,

dereplication scoring, LSD reference, and ranking interpretation,

see the main skill document: skill/SKILL.md

This skill focuses on the CASE procedure (step-by-step execution). The main skill

document contains all shared domain knowledge.

Prerequisites


lucy --version || pip install lucy-ng

lucy lsd check  # Must show LSD and outlsd available

Required Data

| Data | Essential? | Purpose |

|------|-----------|---------|

| Molecular formula | YES | From user (HRMS) |

| 13C spectrum | YES | All carbon positions |

| HSQC | YES | Direct C-H correlations |

| HMBC | YES | Long-range correlations |

| DEPT-135 | Recommended | Multiplicities (CH, CH2, CH3) |

| COSY | Optional | H-H correlations |

Workflow

Supervisor integration: When running under supervisor control, write CASE-PROGRESS.md after each LSD iteration (see Step 7c). This enables the supervisor to detect loops and provide diagnostic guidance.

Step 0: Setup Documentation


mkdir -p analysis

Document all steps in analysis/ as you proceed.

Step 1: Request Molecular Formula

Always ask the user:


"Please provide the molecular formula for this unknown compound (typically from HRMS)."

Calculate key values from formula:

Total carbons
Total hydrogens
Heteroatoms (N, O, S, etc.)
Degree of unsaturation: DBE = (2C + 2 + N - H) / 2

Step 2: Identify Available Experiments


for dir in */; do

    if [ -f "$dir/acqus" ]; then

        nuc=$(grep "##\$NUC1=" "$dir/acqus" | head -1)

        pp=$(grep "##\$PULPROG=" "$dir/acqus" | head -1)

        echo "Exp $dir: $nuc | $pp"

    fi

done

Map experiments:

1H: zg*
13C: zgdc*, zgpg*
DEPT: dept*
HSQC: hsqc*
HMBC: hmbc*
COSY: cosy*

Step 3: Analyze Symmetry

Compare expected vs observed signals:


lucy analyze symmetry <data_dir> <formula>

Or manually:

Count peaks in 13C spectrum
Compare to carbons in formula
If observed < expected → molecule has symmetry

Document:


## Symmetry Analysis

- Expected carbons (from formula): X

- Observed 13C signals: Y

- Interpretation: [No symmetry / C2 symmetry / etc.]

Step 4: Pick 13C Peaks


lucy pick 1d <13c_experiment>

Or from peaklist.xml if binary data is poor:

Extract F1 values from <Peak1D F1="..."/> tags
List all carbon shifts

Document all peaks with proposed assignments:

| # | Shift (ppm) | Type (if known) |

|---|-------------|-----------------|

| 1 | 187.8 | Carbonyl? |

| 2 | 152.5 | C-N? |

| ... | ... | ... |

Step 5: Pick HSQC Peaks

Get raw HSQC peaks:


lucy pick hsqc <hsqc_exp> --format json

Apply DEPT-guided filtering (see skill/SKILL.md Section 3):

Pick DEPT-135 peaks:


lucy pick 1d <dept135_exp> --format json

Match HSQC carbon positions to DEPT carbons within ±1.5 ppm
Extract multiplicities from DEPT sign (positive = CH/CH3, negative = CH2)
If DEPT-90 available, disambiguate CH vs CH3

Document:

Which carbons are protonated (have HSQC signals)
Which are quaternary (no HSQC signal)
Multiplicities (CH, CH2, CH3)

Step 6: Pick HMBC Peaks

Get raw HMBC peaks:


lucy pick hmbc <hmbc_exp> --format json

Apply cross-validation filtering (see skill/SKILL.md Section 3):

Validate each HMBC peak:
- Carbon position exists in 13C peaks (±1.5 ppm)
- Proton position exists in HSQC peaks (±0.1 ppm)
Retain only validated correlations

Document all HMBC correlations:

| Carbon (ppm) | Proton (ppm) | Notes |

|--------------|--------------|-------|

| 187.8 | 7.5 | Carbonyl to aromatic H |

| ... | ... | ... |

Step 7: Generate LSD Input

Write the LSD file directly using skill knowledge:

Reference:

skill/SKILL.md Section 6 (LSD Reference)
skill/diagnostic/SKILL.md Section 1 (LSD Command Reference)

Build the LSD file manually:


; LSD input for <FORMULA>



; Atom definitions (MULT atom# element hybridization H-count)

MULT 1 C 2 0    ; Carbonyl carbon, sp2, 0H (quaternary)

MULT 2 C 2 1    ; Aromatic CH, sp2, 1H

MULT 3 N 3 1    ; Amine nitrogen, sp3, 1H (NH)

MULT 4 O 2 0    ; Carbonyl oxygen, sp2, 0H

...



; HSQC correlations (MUST come before HMBC)

HSQC 2 2        ; C2 has H2 attached

HSQC 5 5        ; C5 has H5 attached

...



; HMBC correlations

HMBC 1 2        ; C1 correlates to H2

HMBC 1 5        ; C1 correlates to H5

...



; Heteroatom constraints (optional but helpful)

BOND 1 4        ; C1 bonded to O4 (carbonyl)

Critical checks before running:

sp2 count is EVEN
Hydrogen count matches formula
All HSQC commands before HMBC commands
NO ELIM command on first run

Step 7b: Iterative HMBC Addition (Minimize Solutions)

CRITICAL: Do NOT add all HMBC correlations at once!

Adding too many HMBC correlations often leads to 0 solutions (over-constrained) due to:

Noise artifacts in the HMBC spectrum
Long-range correlations (⁴J+) that exceed LSD's default 2-3 bond assumption
Overlapping or incorrectly assigned peaks

Strategy: Gradually add HMBC correlations until solutions reach a minimum > 0

Start with high-confidence correlations only (5-7 strongest peaks)
Run LSD and check solution count
Add 1-2 more correlations at a time
Re-run LSD after each addition
Stop when solutions are minimized but still > 0

Workflow example:


# Start with base correlations

cp compound_base.lsd compound_test.lsd

lsd compound_test.lsd 2>&1 | grep solution

# → "47 solutions found"



# Add HMBC 4 9

echo "HMBC 4 9" >> compound_test.lsd

lsd compound_test.lsd 2>&1 | grep solution

# → "12 solutions found"



# Add HMBC 5 9

echo "HMBC 5 9" >> compound_test.lsd

lsd compound_test.lsd 2>&1 | grep solution

# → "1 solution found" ✓ IDEAL!



# If we add one more and get 0 solutions, remove it!

Tracking table (recommended):

|------------|-------------------|-----------|--------|

| 5 | Base set | 47 | Add more |

| 7 | + C1→H7, C2→H10 | 12 | Add more |

| 8 | + C8→H10 | 6 | Add more |

| 9 | + C6→H9 | 6 | Add more |

| 10 | + C4→H9 | 5 | Add more |

| 11 | + C5→H9 | 1 | STOP - Ideal! |

| 12 | + C3→H4 | 0 | Remove last |

Key principles:

Ideal: 1 solution — uniquely determined structure
Acceptable: 2-10 solutions — can rank by 13C prediction
0 solutions — over-constrained, remove last correlation(s)
Never use ELIM to "fix" 0 solutions — it masks the real problem

Prioritize correlations by:

Intensity (stronger peaks are more reliable)
Proximity to known fragment assignments
Correlations that connect unassigned regions

Step 7c: Write Progress Checkpoint (CASE-PROGRESS.md)

After EVERY LSD iteration (including the baseline run), append an iteration entry to CASE-PROGRESS.md in the compound's working directory. This file is read by the supervisor agent to monitor progress, detect loops, and provide diagnostic guidance.

First iteration: Create the file with header section:


# CASE Progress Log



**Compound:** <compound_path>

**Formula:** <molecular_formula>

**Started:** <timestamp>

Each iteration: Append a new section:


---



## Iteration N: <brief description>



**Time:** <timestamp>

**LSD file:** <filename>.lsd

**Solution count:** <count>



**Constraints added:**

- <constraint and reasoning>



**Constraints removed:**

- <constraint and reasoning> (or "None")



**Why:** <natural language explanation of strategy for this iteration>



**Constraint effectiveness:** <% reduction from previous, or "baseline", or "over-constrained (0 solutions)">

**Confidence:** <qualitative assessment: too many solutions / converging / stuck / etc.>

**HMBC correlations used:** X/Y



**Notes:**

- sp2 count: <N> (<even/odd>) <check/warning>

- H budget: <matches/mismatch>

- <other observations>

Rules:

NEVER overwrite the file — always append new iteration sections
Include ALL required fields in every iteration entry
The "Why" field must explain reasoning, not just state what was done
The "Constraints added/removed" must list each constraint individually with reasoning
If recovering from 0 solutions, document which correlations were removed and why

For the complete format specification with examples, see skill/supervisor/SKILL.md Section 7.

Step 8: Run LSD Solver


lucy lsd run compound.lsd

Or directly:


LSD compound.lsd

For solution count interpretation and troubleshooting, see skill/SKILL.md Section 5 (LSD Reference).

Step 9: Convert to SMILES


outlsd 5 < compound.sol > solutions.smi

Step 10: Rank Solutions


lucy lsd rank solutions.smi --spectrum <13c_exp>

# Or with shift list:

lucy lsd rank solutions.smi --shifts "187.8,152.5,135.7,..."

For MAE score interpretation and ranking guidance, see skill/SKILL.md Section 6 (Ranking and Prediction).

Step 11: Analyze J-Coupling Path Lengths

After solving, use lucy lsd analyze to compute the actual J-coupling path lengths for all HMBC correlations:


lucy lsd analyze compound.sol compound.lsd

This command:

Parses the OUTLSD section of the .sol file to extract molecular connectivity
Builds a graph from atom neighbors
Uses BFS shortest path to compute bonds between carbon and proton-bearing carbon
Reports nJ = path_length + 1 for each HMBC correlation

Example output:


Solution 2: 9× ²J 11× ³J (all ²J/³J, no ELIM needed)



HMBC Correlations:

-------------------------------------------------------

  C#   H#    C (ppm)   Path   J-coupling

-------------------------------------------------------

   1    7     131.29      1        ²J_CH

   1   10     131.29      1        ²J_CH

   2    7     124.71      2        ³J_CH

   ...

Interpretation:

All ²J/³J correlations: Structure is consistent with standard HMBC without ELIM
Contains ⁴J+ correlations: May explain why ELIM was needed

JSON output for PDF generation:


lucy lsd analyze compound.sol compound.lsd --format json > analysis/j_coupling.json

Generate structure images with LSD atom numbering:


lucy lsd analyze compound.sol compound.lsd --draw solution_{n}.png

This generates a 2D structure image where each atom is labeled with its LSD index (C1, C2, ..., O11), making the HMBC table directly readable against the structure.

Generate publication-quality correlation diagrams with arrows:

For visualizing HMBC correlations directly on the structure with curved arrows and J-coupling labels:


# Generate correlation diagram with atom numbers and J-coupling labels

lucy visualize correlations \

    --sol compound.sol \

    --lsd-file compound.lsd \

    --show-atom-numbers \

    --show-j-coupling \

    -o analysis/hmbc_diagram.svg

This creates a publication-quality SVG diagram showing:

Clean 2D structure (from the solved .sol file)
Red atom number annotations positioned away from the structure
Curved HMBC arrows connecting correlating atoms
²J/³J labels on arrows indicating coupling path length

Include the correlation diagram next to the HMBC table in your PDF report - it provides an immediate visual representation of how the HMBC correlations connect the molecular fragments.

Step 12: Report Results


## CASE Results



**Molecular Formula:** [formula]

**Degree of Unsaturation:** [DBE]



### Data Used

- 13C: [X] signals

- HSQC: [Y] correlations (Z protonated carbons)

- HMBC: [N] correlations

- Symmetry: [description]



### LSD Results

- Solutions found: [count]

- ELIM used: [Yes/No]



### Top Candidates



**Rank 1:** MAE = X.XX ppm ([Quality])

[SMILES]


- Key features: [description]



**Rank 2:** MAE = X.XX ppm ([Quality])

[SMILES]


- Differs from #1 in: [description]



### Confidence Assessment

[High/Medium/Low] - [reasoning]



### Recommendation

[Final structure proposal or need for additional data]

Step 13: Generate PDF Report

Always generate a PDF report with rendered structures and formatted tables at the end of every CASE analysis.


# Generate PDF report with structures and tables

python3 << 'EOF'

from rdkit import Chem

from rdkit.Chem import Draw, AllChem

from reportlab.lib import colors

from reportlab.lib.pagesizes import A4

from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle

from reportlab.lib.units import inch

from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Image, Table, TableStyle

from reportlab.lib.enums import TA_CENTER

import io



# Create the PDF document

doc = SimpleDocTemplate(

    "analysis/CASE_Report.pdf",

    pagesize=A4,

    rightMargin=0.75*inch,

    leftMargin=0.75*inch,

    topMargin=0.75*inch,

    bottomMargin=0.75*inch

)



# Styles

styles = getSampleStyleSheet()

title_style = ParagraphStyle('CustomTitle', parent=styles['Heading1'],

    fontSize=20, spaceAfter=30, alignment=TA_CENTER)

heading_style = ParagraphStyle('CustomHeading', parent=styles['Heading2'],

    fontSize=14, spaceBefore=20, spaceAfter=10)

normal_style = styles['Normal']



story = []



# Title

story.append(Paragraph("CASE Structure Elucidation Report", title_style))

story.append(Spacer(1, 0.25*inch))



# Summary table

story.append(Paragraph("Summary", heading_style))

summary_data = [

    ["Molecular Formula", "<FORMULA>"],

    ["Molecular Weight", "<MW> Da"],

    ["Degree of Unsaturation (DBE)", "<DBE>"],

    ["LSD Solutions Found", "<COUNT>"],

]

summary_table = Table(summary_data, colWidths=[2.5*inch, 3*inch])

summary_table.setStyle(TableStyle([

    ('BACKGROUND', (0, 0), (0, -1), colors.lightgrey),

    ('FONTNAME', (0, 0), (0, -1), 'Helvetica-Bold'),

    ('GRID', (0, 0), (-1, -1), 0.5, colors.grey),

    ('PADDING', (0, 0), (-1, -1), 8),

]))

story.append(summary_table)

story.append(Spacer(1, 0.3*inch))



# 13C NMR Data Table

story.append(Paragraph("13C NMR Data", heading_style))

c13_data = [

    ["#", "Shift (ppm)", "Multiplicity", "Assignment"],

    # Add rows for each carbon signal:

    # ["1", "131.29", "C (quat)", "=C< olefinic"],

]

c13_table = Table(c13_data, colWidths=[0.4*inch, 1.2*inch, 1.2*inch, 2.5*inch])

c13_table.setStyle(TableStyle([

    ('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#4472C4')),

    ('TEXTCOLOR', (0, 0), (-1, 0), colors.white),

    ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),

    ('ALIGN', (0, 0), (-1, -1), 'CENTER'),

    ('GRID', (0, 0), (-1, -1), 0.5, colors.grey),

    ('PADDING', (0, 0), (-1, -1), 6),

]))

story.append(c13_table)

story.append(Spacer(1, 0.3*inch))



# Structure rendering function

def smiles_to_image(smiles, size=(400, 300)):

    mol = Chem.MolFromSmiles(smiles)

    AllChem.Compute2DCoords(mol)

    img = Draw.MolToImage(mol, size=size)

    img_buffer = io.BytesIO()

    img.save(img_buffer, format='PNG')

    img_buffer.seek(0)

    return img_buffer



# For each candidate structure:

story.append(Paragraph("Structure Candidates", heading_style))

# candidate_smiles = ["SMILES1", "SMILES2", ...]

# for i, smi in enumerate(candidate_smiles, 1):

#     story.append(Paragraph(f"<b>Rank {i}:</b> {name}", normal_style))

#     story.append(Paragraph(f"MAE: {mae} ppm | SMILES: {smi}", normal_style))

#     img = smiles_to_image(smi)

#     story.append(Image(img, width=3*inch, height=2.25*inch))

#     story.append(Spacer(1, 0.2*inch))



# Ranking comparison table

story.append(Paragraph("Ranking Comparison", heading_style))

rank_data = [

    ["Rank", "Structure", "MAE (ppm)", "Quality", "Within 3ppm"],

    # ["1", "Name", "2.69", "Good", "6/10"],

]

rank_table = Table(rank_data, colWidths=[0.5*inch, 2.5*inch, 1*inch, 0.8*inch, 1*inch])

rank_table.setStyle(TableStyle([

    ('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#4472C4')),

    ('TEXTCOLOR', (0, 0), (-1, 0), colors.white),

    ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),

    ('ALIGN', (0, 0), (-1, -1), 'CENTER'),

    ('GRID', (0, 0), (-1, -1), 0.5, colors.grey),

    ('PADDING', (0, 0), (-1, -1), 6),

]))

story.append(rank_table)



# Build PDF

doc.build(story)

print("PDF report generated: analysis/CASE_Report.pdf")

EOF

CRITICAL: Use data from the successful analysis

Do NOT re-pick peaks for the PDF. Extract all data directly from the LSD file that produced successful solutions. The LSD file contains the exact peaks and correlations that were used.

The PDF report must include complete tables of ALL data used:

Summary table — formula, MW, DBE, solution count, recommended structure
Complete 13C NMR table — ALL carbons used in the LSD file:
- Carbon number (C1, C2, ...)
- Chemical shift (ppm)
- Multiplicity (C, CH, CH2, CH3) from DEPT
- Hybridization (sp2/sp3)
- H-count
- Assignment/interpretation
Complete HSQC table — ALL direct C-H correlations from the LSD file:
- Every HSQC command in the LSD file becomes a row
- Include carbon identity, shift, multiplicity, and proton chemical shift if known
HMBC Correlation Diagram (placed ABOVE the HMBC table):
- Generate the diagram FIRST before the HMBC table:
```
lucy visualize correlations --sol compound.sol --lsd-file compound.lsd \

    --show-atom-numbers -o analysis/hmbc_diagram.svg
```
- Convert SVG to PNG for ReportLab embedding:
```
import cairosvg

cairosvg.svg2png(url='analysis/hmbc_diagram.svg',

                 write_to='analysis/hmbc_diagram.png', scale=2.0)
```
- The diagram shows:
  - Clean 2D structure with explicit atom labels (C, H, O)
  - Red curved arrows connecting HMBC-correlating atoms
  - Atom numbers matching the LSD file numbering
  - Optimized layout to avoid overlaps between arrows and labels
- Include as a centered Image in the PDF, full page width (~6 inches)
Complete HMBC table (placed BELOW the diagram) — ALL long-range correlations from the LSD file:
- Every HMBC command in the LSD file becomes a row
- Columns: "From Carbon", "To Proton", "ⁿJ_CH", "Structural Information"
- The J-coupling column shows path length using spectroscopist notation:
  - ²J_CH = 2-bond (C directly bonded to C bearing H)
  - ³J_CH = 3-bond (most common in HMBC)
  - ⁴J_CH = 4-bond (W-pathway, rare in HMBC)
- CRITICAL: Use lucy lsd analyze to calculate path lengths, do NOT guess!
```
lucy lsd analyze compound.sol compound.lsd --format json > analysis/j_coupling.json
```
  This parses the OUTLSD section and uses BFS to compute actual bond distances.
- All HMBC correlations should be ²J or ³J. If you find ⁴J+, the CASE likely required ELIM.
- ReportLab note: Use Paragraph() objects for cells with super/subscript. Use <super> and <sub> tags.
- Note: Reciprocal correlations (e.g., C1→H7 and C7→H2) appear as separate entries because they provide independent constraints
Excluded signals section — Document WHY certain peaks were not used:
- Solvent peaks (e.g., CDCl3 at 77 ppm)
- Noise/artifacts
- Duplicate signals from overlapping peaks
- Signals that couldn't be assigned confidently
Structure candidates — Rendered 2D images (RDKit) with SMILES and MAE scores
Ranking comparison table — All candidates with MAE, quality rating, carbons within tolerance
Recommended structure — Larger image with SMILES and InChI, plus reasoning if not Rank #1

Required dependencies:

CRITICAL: Install missing dependencies - do NOT fall back to suboptimal solutions (like text placeholders instead of images).


# Core PDF generation (RDKit should already be installed)

pip install reportlab



# SVG to PNG conversion for embedding diagrams in PDF

pip install cairosvg



# cairosvg requires the Cairo system library - install if not present:

# macOS:

brew install cairo

# Then run Python with the library path if needed:

# DYLD_LIBRARY_PATH=/opt/homebrew/opt/cairo/lib:$DYLD_LIBRARY_PATH python3 script.py



# Linux (Debian/Ubuntu):

# sudo apt-get install libcairo2-dev



# Linux (RHEL/CentOS):

# sudo yum install cairo-devel

Before generating the PDF, verify all dependencies are working:


# Test imports - if any fail, install the missing package

from reportlab.platypus import SimpleDocTemplate

from rdkit import Chem

from rdkit.Chem import Draw

import cairosvg  # For SVG→PNG conversion

If cairosvg import fails with "no library called cairo", install the system Cairo library as shown above.

Troubleshooting

For detailed troubleshooting guidance, see skill/SKILL.md Section 5 (LSD Reference) and Section 6 (Ranking and Prediction).

Quick checklist for 0 solutions: sp2 count is EVEN, hydrogen count matches formula, HMBC correlations correct, only then try ELIM 1 0.

Quick Reference


# Full workflow

mkdir -p analysis

lucy pick 1d ./2                                    # 13C peaks

lucy pick hsqc ./5 ./3 --dept90 ./4                # HSQC + multiplicities

lucy pick hmbc ./6 ./2 ./5 --dept135 ./3           # HMBC correlations

lucy lsd generate . C16H10N2O2 -o analysis/compound.lsd  # Generate LSD input

cd analysis && LSD compound.lsd                     # Solve

outlsd 5 < compound.sol > solutions.smi            # Convert to SMILES

lucy lsd rank solutions.smi --spectrum ../2        # Rank by 13C prediction

lucy lsd analyze compound.sol compound.lsd --draw structure_{n}.png  # Analyze with numbered structures

# Generate PDF report (see Step 13 for full template)

IMPORTANT: Always generate a PDF report at the end of every CASE analysis (Step 13).