tooluniverse-drug-target-validation - SKILL.md Agent Skill

name: tooluniverse-drug-target-validation description: Comprehensive computational validation of drug targets for early-stage drug discovery. Evaluates targets across 10 dimensions (disambiguation, disease association, druggability, chemical matter, clinical precedent, safety, pathway context, validation evidence, structural insights, validation roadmap) using 60+ ToolUniverse tools. Produces a quantitative Target Validation Score (0-100) with GO/NO-GO recommendation. Use when users ask about target validation, druggability assessment, target prioritization, or "is X a good drug target for Y?"

Drug Target Validation Pipeline

Validate drug target hypotheses using multi-dimensional computational evidence before committing to wet-lab work. Produces a quantitative Target Validation Score (0-100) with priority tier classification and GO/NO-GO recommendation.

KEY PRINCIPLES:

Report-first approach - Create report file FIRST, then populate progressively
Target disambiguation FIRST - Resolve all identifiers before analysis
Evidence grading - Grade all evidence as T1 (experimental) to T4 (computational)
Disease-specific - Tailor analysis to disease context when provided
Modality-aware - Consider small molecule vs biologics tractability
Safety-first - Prominently flag safety concerns early
Quantitative scoring - Every dimension scored numerically (0-100 composite)
Negative results documented - "No data" is data; empty sections are failures
Source references - Every statement must cite tool/database
Completeness checklist - Mandatory section showing analysis coverage
English-first queries - Always use English terms in tool calls. Respond in user's language

When to Use This Skill

Apply when users:

Ask "Is [target] a good drug target for [disease]?"
Need target validation or druggability assessment
Want to compare targets for drug discovery prioritization
Ask about safety risks of modulating a target
Need chemical starting points for target validation
Ask about pathway context for a target
Need a GO/NO-GO recommendation for a target
Want a comprehensive target dossier for investment decisions

NOT for (use other skills instead):

General target biology overview -> Use tooluniverse-target-research
Drug compound profiling -> Use tooluniverse-drug-research
Variant interpretation -> Use tooluniverse-variant-interpretation
Disease research -> Use tooluniverse-disease-research

Input Parameters

Parameter	Required	Description	Example
target	Yes	Gene symbol, protein name, or UniProt ID	`EGFR`, `P00533`, `Epidermal growth factor receptor`
disease	No	Disease/indication for context	`Non-small cell lung cancer`, `Pancreatic cancer`
modality	No	Preferred therapeutic modality	`small molecule`, `antibody`, `protein therapeutic`, `PROTAC`

Target Validation Scoring System

Score Components (Total: 0-100)

Disease Association (0-30 points):

Genetic evidence: 0-10 (GWAS, rare variants, somatic mutations)
Literature evidence: 0-10 (publications, clinical studies)
Pathway evidence: 0-10 (disease pathway involvement)

Druggability (0-25 points):

Structural tractability: 0-10 (structure quality, binding pockets)
Chemical matter: 0-10 (known compounds, bioactivity data)
Target class: 0-5 (validated target family bonus)

Safety Profile (0-20 points):

Tissue expression selectivity: 0-5 (expression in critical tissues)
Genetic validation: 0-10 (knockout phenotypes, human genetics)
Known adverse events: 0-5 (safety signals from modulators)

Clinical Precedent (0-15 points):

Approved drugs: 15 (strong precedent, validated target)
Clinical trials: 10 (moderate precedent)
Preclinical compounds: 5 (weak precedent)
None: 0 (novel target)

Validation Evidence (0-10 points):

Functional studies: 0-5 (CRISPR, siRNA, biochemical)
Disease models: 0-5 (animal models, patient data)

Priority Tiers

Score	Tier	Recommendation
80-100	Tier 1	Highly validated - proceed with confidence
60-79	Tier 2	Good target - needs focused validation
40-59	Tier 3	Moderate risk - significant validation needed
0-39	Tier 4	High risk - consider alternatives

Evidence Grading System

Tier	Symbol	Criteria	Examples
T1	[T1]	Direct mechanistic, human clinical proof	FDA-approved drug, crystal structure with mechanism, patient mutation
T2	[T2]	Functional studies, model organism	siRNA phenotype, mouse KO, biochemical assay, CRISPR screen
T3	[T3]	Association, screen hits, computational	GWAS hit, DepMap essentiality, expression correlation
T4	[T4]	Mention, review, text-mined, predicted	Review article, database annotation, AlphaFold prediction

Phase 0: Target Disambiguation & ID Resolution (ALWAYS FIRST)

Objective: Resolve target to ALL needed identifiers before any analysis.

Resolution Strategy

# Step 1: Determine input type and get initial identifiers
# If gene symbol (e.g., "EGFR"):
mygene = tu.tools.MyGene_query_genes(query="EGFR", species="human", fields="symbol,name,ensembl.gene,uniprot.Swiss-Prot,entrezgene")
# Extract: ensembl_id, uniprot_id, entrez_id, symbol, name

# If UniProt ID (e.g., "P00533"):
uniprot = tu.tools.UniProt_get_entry_by_accession(accession="P00533")
# Extract: gene names, Ensembl xrefs, function

# Step 2: Resolve Ensembl ID and get versioned ID for GTEx
ensembl = tu.tools.ensembl_lookup_gene(gene_id=ensembl_id, species="homo_sapiens")
# CRITICAL: species parameter is REQUIRED
# CRITICAL: Response is wrapped in {status, data, url, content_type} - access via ensembl['data']
ensembl_data = ensembl.get('data', ensembl) if isinstance(ensembl, dict) else ensembl
# Extract: version for versioned_id (e.g., "ENSG00000146648.18")

# Step 3: Get Ensembl cross-references
xrefs = tu.tools.ensembl_get_xrefs(id=ensembl_id)
# Extract: HGNC, UniProt, EntrezGene mappings

# Step 4: Get OpenTargets target info
ot_target = tu.tools.OpenTargets_get_target_id_description_by_name(targetName="EGFR")
# Verify ensemblId matches

# Step 5: Get ChEMBL target ID
chembl_targets = tu.tools.ChEMBL_search_targets(pref_name__contains="EGFR", organism="Homo sapiens", limit=5)
# Extract: target_chembl_id for later use

# Step 6: Get UniProt function summary
function_info = tu.tools.UniProt_get_function_by_accession(accession=uniprot_id)
# Returns list of strings (NOT dict)

# Step 7: Get alternative names for collision detection
alt_names = tu.tools.UniProt_get_alternative_names_by_accession(accession=uniprot_id)

Identifier Resolution Output

## 1. Target Identity

| Database | Identifier | Verified |
|----------|-----------|----------|
| Gene Symbol | EGFR | Yes |
| Full Name | Epidermal growth factor receptor | Yes |
| Ensembl | ENSG00000146648 | Yes |
| Ensembl (versioned) | ENSG00000146648.18 | Yes |
| UniProt | P00533 | Yes |
| Entrez Gene | 1956 | Yes |
| ChEMBL | CHEMBL203 | Yes |
| HGNC | HGNC:3236 | Yes |

**Protein Function**: [from UniProt_get_function_by_accession]
**Subcellular Location**: [from UniProt_get_subcellular_location_by_accession]
**Target Class**: [from OpenTargets_get_target_classes_by_ensemblID]

Known Parameter Corrections

Tool	WRONG Parameter	CORRECT Parameter
`ensembl_lookup_gene`	`id`	`gene_id` (+ `species="homo_sapiens"` REQUIRED)
`Reactome_map_uniprot_to_pathways`	`uniprot_id`	`id`
`ensembl_get_xrefs`	`gene_id`	`id`
`GTEx_get_median_gene_expression`	`gencode_id` only	`gencode_id` + `operation="median"`
`OpenTargets_*`	`ensemblID` (uppercase)	`ensemblId` (camelCase)
`OpenTargets_get_publications_*`	`ensemblId`	`entityId`
`OpenTargets_get_associated_drugs_by_target_ensemblID`	`ensemblId` only	`ensemblId` + `size` (REQUIRED)
`MyGene_query_genes`	`q`	`query`
`PubMed_search_articles`	returns `{articles: [...]}`	returns plain list of dicts
`UniProt_get_function_by_accession`	returns dict	returns list of strings
`HPA_get_rna_expression_by_source`	`ensembl_id`	`gene_name` + `source_type` + `source_name` (ALL required)
`alphafold_get_prediction`	`uniprot_accession`	`qualifier`
`drugbank_get_safety_*`	simple params	`query`, `case_sensitive`, `exact_match`, `limit` (ALL required)

Phase 1: Disease Association Evidence (0-30 points)

Objective: Quantify the strength of target-disease association from genetic, literature, and pathway evidence.

1A. OpenTargets Disease Associations (Primary)

# Get ALL disease associations for target
diseases = tu.tools.OpenTargets_get_diseases_phenotypes_by_target_ensembl(ensemblId=ensembl_id)

# If specific disease provided, get detailed evidence
if disease_name:
    disease_info = tu.tools.OpenTargets_get_disease_id_description_by_name(diseaseName=disease_name)
    efo_id = disease_info.get('id')  # e.g., "EFO_0003060"

    evidence = tu.tools.OpenTargets_target_disease_evidence(
        efoId=efo_id, ensemblId=ensembl_id
    )

    # Get evidence by data source for detailed breakdown
    datasource_evidence = tu.tools.OpenTargets_get_evidence_by_datasource(
        efoId=efo_id, ensemblId=ensembl_id,
        datasourceIds=["ot_genetics_portal", "eva", "gene2phenotype", "genomics_england", "uniprot_literature"],
        size=100
    )

1B. GWAS Genetic Evidence

# GWAS associations for target gene
gwas_snps = tu.tools.gwas_get_snps_for_gene(mapped_gene=gene_symbol, size=50)

# If specific disease, search for trait-specific associations
if disease_name:
    gwas_studies = tu.tools.gwas_search_studies(query=disease_name, size=20)

1C. Constraint Scores (gnomAD)

# Genetic constraint - intolerance to loss of function
constraints = tu.tools.gnomad_get_gene_constraints(gene_symbol=gene_symbol)
# Extract: pLI, LOEUF, missense_z, pRec
# High pLI (>0.9) = highly intolerant to LoF = likely essential

1D. Literature Evidence

# PubMed for target-disease association
articles = tu.tools.PubMed_search_articles(
    query=f'"{gene_symbol}" AND "{disease_name}" AND (target OR therapeutic OR inhibitor)',
    limit=50
)
# PubMed_search_articles returns a plain list of dicts

# OpenTargets publications
pubs = tu.tools.OpenTargets_get_publications_by_target_ensemblID(entityId=ensembl_id)

Scoring Logic - Disease Association

Genetic Evidence (0-10):
  - GWAS hits for specific disease: +3 per significant locus (max 6)
  - Rare variant evidence (ClinVar pathogenic): +2
  - Somatic mutations in disease: +2
  - pLI > 0.9 (essential gene): +2

Literature Evidence (0-10):
  - >100 publications on target+disease: 10
  - 50-100 publications: 7
  - 10-50 publications: 5
  - 1-10 publications: 3
  - 0 publications: 0

Pathway Evidence (0-10):
  - OpenTargets overall score > 0.8: 10
  - Score 0.5-0.8: 7
  - Score 0.2-0.5: 4
  - Score < 0.2: 1

Phase 2: Druggability Assessment (0-25 points)

Objective: Assess whether the target is amenable to therapeutic intervention.

2A. OpenTargets Tractability

# Tractability assessment across modalities
tractability = tu.tools.OpenTargets_get_target_tractability_by_ensemblID(ensemblId=ensembl_id)
# Returns: label, modality (SM, AB, PR, OC), value (boolean/score)
# Modalities: Small Molecule, Antibody, PROTAC, Other Clinical

2B. Target Class & Family

# Target classification (kinase, GPCR, ion channel, etc.)
target_classes = tu.tools.OpenTargets_get_target_classes_by_ensemblID(ensemblId=ensembl_id)

# Pharos target development level
pharos = tu.tools.Pharos_get_target(gene=gene_symbol)
# TDL: Tclin (approved drug) > Tchem (compounds) > Tbio (biology) > Tdark (unknown)

# DGIdb druggability categories
druggability = tu.tools.DGIdb_get_gene_druggability(genes=[gene_symbol])

2C. Structural Tractability

# PDB structures available
if uniprot_id:
    uniprot_entry = tu.tools.UniProt_get_entry_by_accession(accession=uniprot_id)
    # Extract PDB cross-references from entry

# AlphaFold prediction
alphafold = tu.tools.alphafold_get_prediction(qualifier=uniprot_id)
alphafold_summary = tu.tools.alphafold_get_summary(qualifier=uniprot_id)

# For top PDB structures, analyze binding pockets
# ProteinsPlus DoGSiteScorer for pocket detection
for pdb_id in top_pdb_ids[:3]:
    pockets = tu.tools.ProteinsPlus_predict_binding_sites(pdb_id=pdb_id)
    # Returns predicted druggable pockets with scores

2D. Chemical Probes & Enabling Packages

# Chemical probes (validated tool compounds)
probes = tu.tools.OpenTargets_get_chemical_probes_by_target_ensemblID(ensemblId=ensembl_id)

# Target Enabling Packages (TEPs)
teps = tu.tools.OpenTargets_get_target_enabling_packages_by_ensemblID(ensemblId=ensembl_id)

Scoring Logic - Druggability

Structural Tractability (0-10):
  - High-res co-crystal structure with ligand: 10
  - PDB structure available, pockets detected: 7
  - AlphaFold only, confident pocket prediction: 5
  - AlphaFold low confidence / no structure: 2
  - No structural data: 0

Chemical Matter (0-10):
  - Known drug-like compounds (IC50 < 100nM): 10
  - Tool compounds (IC50 < 1uM): 7
  - HTS hits only (IC50 > 1uM): 4
  - No known ligands: 0

Target Class Bonus (0-5):
  - Validated druggable family (kinase, GPCR, nuclear receptor): 5
  - Enzyme, ion channel: 4
  - Protein-protein interaction, transporter: 2
  - Novel/unknown class: 0

Phase 3: Known Modulators & Chemical Matter (Feeds into Phase 2 scoring)

Objective: Identify existing chemical starting points for target validation.

3A. ChEMBL Bioactivity

# Search for ChEMBL target
chembl_targets = tu.tools.ChEMBL_search_targets(
    pref_name__contains=gene_symbol, organism="Homo sapiens", limit=10
)

# Get activities for best matching target
target_chembl_id = chembl_targets[0]['target_chembl_id']
activities = tu.tools.ChEMBL_get_target_activities(
    target_chembl_id__exact=target_chembl_id, limit=100
)
# Parse: compound IDs, pChEMBL values, activity types (IC50, Ki, Kd)
# Filter: potent compounds (pChEMBL >= 6.0 = IC50 <= 1uM)

3B. BindingDB Ligands

# Experimental binding data
ligands = tu.tools.BindingDB_get_ligands_by_uniprot(
    uniprot=uniprot_id, affinity_cutoff=10000  # nM
)
# Returns: SMILES, affinity_type (Ki/IC50/Kd), affinity value, PMID

3C. PubChem Bioassays

# HTS screening data
assays = tu.tools.PubChem_search_assays_by_target_gene(gene_symbol=gene_symbol)
# Get details for top assays
for aid in assay_ids[:5]:
    summary = tu.tools.PubChem_get_assay_summary(aid=str(aid))
    targets = tu.tools.PubChem_get_assay_targets(aid=str(aid))
    actives = tu.tools.PubChem_get_assay_active_compounds(aid=str(aid))

3D. Known Drugs Targeting This Protein

# OpenTargets known drugs
drugs = tu.tools.OpenTargets_get_associated_drugs_by_target_ensemblID(
    ensemblId=ensembl_id, size=25
)

# ChEMBL drug mechanisms
drug_mechanisms = tu.tools.ChEMBL_search_mechanisms(
    target_chembl_id=target_chembl_id, limit=50
)

# Drug interaction databases
dgidb = tu.tools.DGIdb_get_gene_info(genes=[gene_symbol])

Report Format - Chemical Matter

### 4. Known Modulators & Chemical Matter

#### 4.1 Approved Drugs
| Drug | ChEMBL ID | Mechanism | Phase | Indication | Source |
|------|-----------|-----------|-------|------------|--------|
| Erlotinib | CHEMBL553 | Inhibitor | 4 | NSCLC | [T1] OpenTargets |
| Gefitinib | CHEMBL939 | Inhibitor | 4 | NSCLC | [T1] OpenTargets |

#### 4.2 ChEMBL Bioactivity Summary
**Total Activities**: 12,456 datapoints across 2,341 assays
**Most Potent Compound**: CHEMBL413456 (IC50 = 0.3 nM) [T1]
**Chemical Series**: 8 distinct scaffolds with pChEMBL >= 7.0
**Selectivity Data**: Available for 45 compounds (kinase panel)

#### 4.3 BindingDB Ligands
**Total Ligands**: 856 with measured affinity
**Best Affinity**: 0.1 nM (Ki)
**Affinity Distribution**: <1nM: 23, 1-10nM: 89, 10-100nM: 234, 100nM-1uM: 510

#### 4.4 Chemical Probes
| Probe | Source | Potency | Selectivity | Use |
|-------|--------|---------|-------------|-----|
| SGC-1234 | SGC | IC50=5nM | >100x | In vitro |

Phase 4: Clinical Precedent (0-15 points)

Objective: Assess clinical validation from approved drugs and clinical trials.

4A. FDA-Approved Drugs

# FDA label information
fda_moa = tu.tools.FDA_get_mechanism_of_action_by_drug_name(drug_name=gene_symbol)
fda_indications = tu.tools.FDA_get_indications_by_drug_name(drug_name=known_drug_name)

# DrugBank pharmacology
drugbank_targets = tu.tools.drugbank_get_targets_by_drug_name_or_drugbank_id(
    query=known_drug_name, case_sensitive=False, exact_match=False, limit=10
)

# DrugBank safety info
drugbank_safety = tu.tools.drugbank_get_safety_by_drug_name_or_drugbank_id(
    query=known_drug_name, case_sensitive=False, exact_match=False, limit=10
)

4B. Clinical Trials

# Active clinical trials targeting this protein
trials = tu.tools.search_clinical_trials(
    query_term=gene_symbol,
    intervention=gene_symbol,
    pageSize=50
)

# If specific disease context
if disease_name:
    disease_trials = tu.tools.search_clinical_trials(
        query_term=gene_symbol,
        condition=disease_name,
        pageSize=50
    )

4C. Failed Programs (Learn from Failures)

# Drug warnings and withdrawals
for drug_chembl_id in known_drug_ids:
    warnings = tu.tools.OpenTargets_get_drug_warnings_by_chemblId(chemblId=drug_chembl_id)
    adverse = tu.tools.OpenTargets_get_drug_adverse_events_by_chemblId(chemblId=drug_chembl_id)

Scoring Logic - Clinical Precedent

Clinical Precedent (0-15):
  - FDA-approved drug for SAME disease: 15
  - FDA-approved drug for DIFFERENT disease: 12
  - Phase 3 clinical trial: 10
  - Phase 2 clinical trial: 7
  - Phase 1 clinical trial: 5
  - Preclinical compounds only: 3
  - No clinical development: 0

Adjustment factors:
  - Failed clinical program for safety: -3
  - Drug withdrawal: -5
  - Multiple approved drugs (validated class): +2

Phase 5: Safety & Toxicity Considerations (0-20 points)

Objective: Identify safety risks from expression, genetics, and known adverse events.

5A. OpenTargets Safety Profile

safety = tu.tools.OpenTargets_get_target_safety_profile_by_ensemblID(ensemblId=ensembl_id)
# Returns: safety liabilities, adverse effects, experimental toxicity

5B. Expression in Critical Tissues

# GTEx tissue expression (identifies essential organ expression)
gtex = tu.tools.GTEx_get_median_gene_expression(
    operation="median", gencode_id=ensembl_versioned_id
)
# If empty, try unversioned ID

# HPA expression
# NOTE: HPA_get_rna_expression_by_source requires gene_name, source_type, source_name
hpa = tu.tools.HPA_search_genes_by_query(search_query=gene_symbol)
hpa_details = tu.tools.HPA_get_comprehensive_gene_details_by_ensembl_id(ensembl_id=ensembl_id)

# Check expression in safety-critical tissues
# Heart, liver, kidney, brain, bone marrow = high risk if target is expressed

5C. Knockout Phenotypes

# Mouse model phenotypes
mouse_models = tu.tools.OpenTargets_get_biological_mouse_models_by_ensemblID(ensemblId=ensembl_id)

# Genetic constraint (proxy for essentiality)
constraints = tu.tools.gnomad_get_gene_constraints(gene_symbol=gene_symbol)
# High pLI = essential gene = potential safety concern

5D. Known Adverse Events from Target Modulation

# For known drugs targeting this protein
for drug_name in known_drug_names:
    fda_adr = tu.tools.FDA_get_adverse_reactions_by_drug_name(drug_name=drug_name)
    fda_warnings = tu.tools.FDA_get_warnings_and_cautions_by_drug_name(drug_name=drug_name)
    fda_boxed = tu.tools.FDA_get_boxed_warning_info_by_drug_name(drug_name=drug_name)
    fda_contraindications = tu.tools.FDA_get_contraindications_by_drug_name(drug_name=drug_name)

5E. Homologs & Off-Target Risks

# Paralogs (close family members that might be hit)
homologs = tu.tools.OpenTargets_get_target_homologues_by_ensemblID(ensemblId=ensembl_id)
# Paralogs with high sequence identity = selectivity challenge

Scoring Logic - Safety

Tissue Expression Selectivity (0-5):
  - Target restricted to disease tissue: 5
  - Low expression in heart/liver/kidney/brain: 4
  - Moderate expression in 1-2 critical tissues: 2
  - High expression in multiple critical tissues: 0

Genetic Validation (0-10):
  - Mouse KO viable, no severe phenotype: 10
  - Mouse KO viable with mild phenotype: 7
  - Mouse KO has concerning phenotype: 3
  - Mouse KO lethal: 0
  - No KO data, low pLI (<0.5): 5
  - No KO data, high pLI (>0.9): 2

Known Adverse Events (0-5):
  - No known safety signals: 5
  - Mild, manageable ADRs: 3
  - Serious ADRs reported: 1
  - Black box warning or drug withdrawal: 0

Phase 6: Pathway Context & Network Analysis

Objective: Understand the target's role in biological networks and disease pathways.

6A. Reactome Pathways

# Map target to pathways
pathways = tu.tools.Reactome_map_uniprot_to_pathways(id=uniprot_id)

# Get pathway details for top pathways
for pathway in top_pathways[:5]:
    detail = tu.tools.Reactome_get_pathway(id=pathway['stId'])
    reactions = tu.tools.Reactome_get_pathway_reactions(id=pathway['stId'])

6B. Protein-Protein Interactions

# STRING network
string_ppi = tu.tools.STRING_get_protein_interactions(
    protein_ids=[gene_symbol], species=9606, confidence_score=0.7
)
# Higher confidence = more reliable

# IntAct interactions (experimental)
intact_ppi = tu.tools.intact_get_interactions(identifier=uniprot_id)

# OpenTargets interactions
ot_ppi = tu.tools.OpenTargets_get_target_interactions_by_ensemblID(ensemblId=ensembl_id)

6C. Functional Enrichment

# GO annotations
go_terms = tu.tools.OpenTargets_get_target_gene_ontology_by_ensemblID(ensemblId=ensembl_id)

# Direct GO query
go_annotations = tu.tools.GO_get_annotations_for_gene(gene_id=gene_symbol)

# STRING functional enrichment of interaction partners
enrichment = tu.tools.STRING_functional_enrichment(
    protein_ids=[gene_symbol], species=9606
)

Report Format - Pathway Context

### 7. Pathway Context & Network Analysis

#### 7.1 Key Pathways
| Pathway | Reactome ID | Relevance to Disease | Evidence |
|---------|-------------|---------------------|----------|
| EGFR signaling | R-HSA-177929 | Driver pathway in NSCLC | [T1] |
| RAS-RAF-MEK-ERK | R-HSA-5673001 | Downstream effector | [T1] |
| PI3K-AKT signaling | R-HSA-2219528 | Resistance mechanism | [T2] |

#### 7.2 Protein-Protein Interactions
**Total Interactors**: 45 (STRING confidence > 0.7)
**Key Interactors**: GRB2, SHC1, PLCG1, PIK3CA, STAT3

#### 7.3 Pathway Redundancy Assessment
**Compensation Risk**: MODERATE
- Parallel pathways: HER2, HER3 can compensate
- Feedback loops: RAS activation bypasses EGFR
- Downstream convergence: MEK/ERK shared with other RTKs

Phase 7: Validation Evidence (0-10 points)

Objective: Assess existing functional validation data.

7A. DepMap Essentiality (CRISPR/RNAi)

# Gene essentiality in cancer cell lines
deps = tu.tools.DepMap_get_gene_dependencies(gene_symbol=gene_symbol)
# Negative scores = essential (cells die upon KO)
# Score < -0.5: moderately essential
# Score < -1.0: strongly essential

7B. Literature Validation Evidence

# Search for functional studies
validation_papers = tu.tools.PubMed_search_articles(
    query=f'"{gene_symbol}" AND (CRISPR OR siRNA OR knockdown OR knockout OR "loss of function") AND "{disease_name}"',
    limit=30
)

# Search for biomarker studies
biomarker_papers = tu.tools.PubMed_search_articles(
    query=f'"{gene_symbol}" AND (biomarker OR "target engagement" OR "pharmacodynamic")',
    limit=20
)

7C. Animal Model Evidence

# Mouse phenotypes from OpenTargets (already retrieved in Phase 5)
# Reuse mouse_models data

# CTD gene-disease associations (complementary)
ctd_diseases = tu.tools.CTD_get_gene_diseases(input_terms=gene_symbol)

Scoring Logic - Validation Evidence

Functional Studies (0-5):
  - CRISPR KO shows disease-relevant phenotype: 5
  - siRNA knockdown shows phenotype: 4
  - Biochemical assay validates mechanism: 3
  - Overexpression study only: 2
  - No functional data: 0

Disease Models (0-5):
  - Patient-derived xenograft (PDX) response: 5
  - Genetically engineered mouse model: 4
  - Cell line model: 3
  - In silico model only: 1
  - No model data: 0

Phase 8: Structural Insights

Objective: Leverage structural biology for druggability and mechanism understanding.

8A. PDB Structures

# Get PDB entries from UniProt cross-references
uniprot_entry = tu.tools.UniProt_get_entry_by_accession(accession=uniprot_id)
# Parse: uniProtKBCrossReferences where database == "PDB"

# Get details for each PDB
for pdb_id in pdb_ids[:10]:
    metadata = tu.tools.get_protein_metadata_by_pdb_id(pdb_id=pdb_id)
    quality = tu.tools.pdbe_get_entry_quality(pdb_id=pdb_id)
    summary = tu.tools.pdbe_get_entry_summary(pdb_id=pdb_id)
    experiment = tu.tools.pdbe_get_entry_experiment(pdb_id=pdb_id)
    molecules = tu.tools.pdbe_get_entry_molecules(pdb_id=pdb_id)

8B. AlphaFold Prediction

alphafold = tu.tools.alphafold_get_prediction(qualifier=uniprot_id)
alphafold_info = tu.tools.alphafold_get_summary(qualifier=uniprot_id)
# Check pLDDT scores for confidence

8C. Binding Pocket Analysis

# ProteinsPlus DoGSiteScorer for best PDB structure
pockets = tu.tools.ProteinsPlus_predict_binding_sites(pdb_id=best_pdb_id)
# Returns: pocket locations, druggability scores, volume, surface

# Interaction diagram for co-crystal structures
if has_ligand:
    diagram = tu.tools.ProteinsPlus_generate_interaction_diagram(pdb_id=pdb_id)

8D. Domain Architecture

# InterPro domains
domains = tu.tools.InterPro_get_protein_domains(uniprot_accession=uniprot_id)

# Domain details for key domains
for domain in domains[:5]:
    detail = tu.tools.InterPro_get_domain_details(entry_id=domain['accession'])

Extended Reference: For detailed tool tables, examples, and templates, read REFERENCE.md in this skill directory. The agent can access it via: read skills/tooluniverse-drug-target-validation/REFERENCE.md