bio-clinical-databases-hla-typing - SKILL.md Agent Skill

name: bio-clinical-databases-hla-typing description: Call HLA alleles from NGS data using OptiType, HLA-HD, or arcasHLA for immunogenomics applications. Use when determining HLA genotype for transplant matching, neoantigen prediction, or pharmacogenomic screening. tool_type: cli primary_tool: OptiType measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes. allowed-tools: - read_file - run_shell_command

HLA Typing

OptiType (HLA Class I)

From DNA-seq

# Extract HLA reads from BAM
samtools view -h input.bam chr6:28000000-34000000 | \
    samtools fastq -1 hla_R1.fq -2 hla_R2.fq -

# Run OptiType
OptiTypePipeline.py \
    -i hla_R1.fq hla_R2.fq \
    -d \
    -o optitype_output \
    -c config.ini

# Output: optitype_output/sample_result.tsv
# Contains HLA-A, HLA-B, HLA-C alleles (4-field resolution)

From RNA-seq

# RNA mode
OptiTypePipeline.py \
    -i rna_R1.fq rna_R2.fq \
    -r \
    -o optitype_rna_output \
    -c config.ini

OptiType Config

# config.ini
[mapping]
razers3=/path/to/razers3
threads=4

[ilp]
solver=glpk
threads=4

[behavior]
deletebam=true
unpaired_weight=0
use_discordant=false

HLA-HD (Full Resolution)

# HLA-HD for high-resolution typing
# Supports Class I and Class II

# Extract HLA reads
samtools view -b input.bam chr6:28000000-34000000 > hla_region.bam
samtools sort -n hla_region.bam -o hla_sorted.bam
samtools fastq -1 hla_R1.fq -2 hla_R2.fq hla_sorted.bam

# Run HLA-HD
hlahd.sh \
    -t 8 \
    -m 100 \
    -f freq_data \
    hla_R1.fq \
    hla_R2.fq \
    gene_split_filt \
    dictionary \
    sample_name \
    output_dir

# Output includes HLA-A, B, C, DRB1, DQB1, DPB1 at 4-field resolution

arcasHLA (RNA-seq)

# Fast HLA typing from RNA-seq
# Extracts and genotypes in one step

# From STAR-aligned BAM
arcasHLA extract sample.bam -o output_dir
arcasHLA genotype output_dir/sample.extracted.fq.gz -o output_dir

# Output: sample.genotype.json
# {
#   "A": ["A*02:01", "A*24:02"],
#   "B": ["B*35:01", "B*44:03"],
#   "C": ["C*04:01", "C*05:01"]
# }

arcasHLA Merge

# Merge multiple samples
arcasHLA merge output_dir/*.genotype.json -o merged_hla.tsv

HLA Nomenclature

HLA-A*02:01:01:01
     |  |  |  |
     |  |  |  +-- Non-coding variation (optional)
     |  |  +----- Synonymous variation (optional)
     |  +-------- Protein sequence (usually reported)
     +----------- Allele group

Resolution levels:
- 2-field: A*02:01 (protein sequence - clinical standard)
- 4-field: A*02:01:01 (includes synonymous changes)
- Full: A*02:01:01:01 (includes non-coding)

HLA and Pharmacogenomics

# Key HLA-drug associations
HLA_DRUG_ASSOCIATIONS = {
    'B*57:01': {
        'drug': 'Abacavir',
        'reaction': 'Hypersensitivity syndrome',
        'screening': 'Required before prescribing'
    },
    'B*15:02': {
        'drug': 'Carbamazepine',
        'reaction': 'SJS/TEN',
        'populations': 'High risk in Han Chinese, Southeast Asian'
    },
    'B*58:01': {
        'drug': 'Allopurinol',
        'reaction': 'SJS/TEN',
        'populations': 'High risk in Han Chinese, Korean, Thai'
    },
    'A*31:01': {
        'drug': 'Carbamazepine',
        'reaction': 'DRESS',
        'populations': 'European, Japanese'
    }
}

def check_hla_drug_risk(hla_alleles, drug):
    '''Check if patient HLA poses drug reaction risk'''
    risks = []
    for allele in hla_alleles:
        if allele in HLA_DRUG_ASSOCIATIONS:
            assoc = HLA_DRUG_ASSOCIATIONS[allele]
            if assoc['drug'].lower() == drug.lower():
                risks.append({
                    'allele': allele,
                    'drug': drug,
                    'reaction': assoc['reaction']
                })
    return risks

Parse OptiType Results

import pandas as pd

def parse_optitype(result_file):
    '''Parse OptiType TSV output'''
    df = pd.read_csv(result_file, sep='\t')
    # Columns: A1, A2, B1, B2, C1, C2, Reads, Objective
    hla_calls = {
        'HLA-A': [df['A1'].iloc[0], df['A2'].iloc[0]],
        'HLA-B': [df['B1'].iloc[0], df['B2'].iloc[0]],
        'HLA-C': [df['C1'].iloc[0], df['C2'].iloc[0]]
    }
    return hla_calls

def format_hla_report(hla_calls):
    '''Format HLA calls for clinical report'''
    report = []
    for gene, alleles in hla_calls.items():
        allele_str = '/'.join(sorted(set(alleles)))
        report.append(f'{gene}: {allele_str}')
    return '\n'.join(report)

Class I vs Class II

Class	Genes	Function	Typing Priority
Class I	HLA-A, B, C	Present intracellular peptides	Neoantigen, PGx
Class II	HLA-DR, DQ, DP	Present extracellular peptides	Transplant, autoimmune

Tool Comparison

Tool	Input	Classes	Resolution	Speed
OptiType	WGS/WES/RNA	I only	4-field	Fast
HLA-HD	WGS/WES	I and II	4-field	Moderate
arcasHLA	RNA-seq	I and II	4-field	Fast
HLA-LA	WGS	I and II	4-field	Slow

Related Skills

clinical-databases/pharmacogenomics - HLA-drug interactions
variant-calling/clinical-interpretation - Clinical reporting
single-cell/cell-type-annotation - HLA expression