orfverifier


Verify open reading frames (ORFs) in plasmid sequences. Use when the user asks about verifying protein coding sequences, checking if an ORF is present in a plasmid, finding where a protein is encoded, or doing six-frame translation analysis.

By farnunglab. Updated 2/20/2026

name: orfverifier
description: Verify open reading frames (ORFs) in plasmid sequences. Use when the user asks about verifying protein coding sequences, checking if an ORF is present in a plasmid, finding where a protein is encoded, or doing six-frame translation analysis.
allowed-tools: Bash(orf_verifier_cli:*)

ORF Verifier Tool

Verify that expected amino acid sequences are present in plasmid DNA using six-frame translation.
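
For orientation, six-frame translation can be sketched in a few lines of Python. This is a minimal illustration and not the tool's own code: it assumes the standard genetic code (NCBI translation table 1), treats the sequence as linear, and uses hypothetical function names.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), amino acids listed in TCAG codon order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map (strand, frame) to the protein translated from that frame."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            frames[(strand, frame)] = translate(seq[frame:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
print(frames[("+", 0)])  # MA*
```

An expected amino acid sequence is then searched for in each of the six translated strings. A circular plasmid additionally needs the sequence extended past the origin, which this sketch leaves out.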

CLI Usage

Use the CLI at scripts/orf_verifier_cli.py:

Basic Verification

scripts/orf_verifier_cli.py --plasmid /path/to/plasmid.gb --aa-sequence "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYK" --name "GFP"

With NCBI Accession

scripts/orf_verifier_cli.py --accession MN514974 --aa-sequence "MHHHHHH..." --name "His6-protein"

JSON Output

scripts/orf_verifier_cli.py --plasmid plasmid.fasta --aa-sequence "MHHHHHH..." --json
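
When the CLI is driven from another script, the JSON mode can be wrapped roughly as follows. This is a sketch under stated assumptions: only flags listed in this document are used, the helper names `build_command` and `run_verifier` are hypothetical, the rule that exactly one of `--plasmid` or `--accession` is supplied is an assumption, and the structure of the JSON output is not documented here, so the result is returned unparsed beyond `json.loads`.

```python
import json
import subprocess

CLI_PATH = "scripts/orf_verifier_cli.py"  # path as given in this document


def build_command(aa_sequence, plasmid=None, accession=None, name=None):
    """Assemble the argument list for a JSON-mode verification run."""
    # Assumption: the CLI expects exactly one sequence source.
    if (plasmid is None) == (accession is None):
        raise ValueError("provide exactly one of plasmid or accession")
    cmd = [CLI_PATH]
    cmd += ["--plasmid", plasmid] if plasmid else ["--accession", accession]
    cmd += ["--aa-sequence", aa_sequence]
    if name:
        cmd += ["--name", name]
    cmd.append("--json")
    return cmd


def run_verifier(**kwargs):
    """Run the CLI and parse stdout as JSON (output schema not documented here)."""
    # Exit-code semantics are not documented, so the return code is not checked.
    completed = subprocess.run(
        build_command(**kwargs), capture_output=True, text=True
    )
    return json.loads(completed.stdout)


print(build_command("MHHHHHH", plasmid="plasmid.fasta", name="His6-protein"))
```

Passing the arguments as a list avoids shell quoting problems with long amino acid sequences.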

Options

  • --plasmid, -p FILE: Path to plasmid file (FASTA or GenBank format)
  • --accession, -a ID: NCBI accession number to fetch
  • --aa-sequence, -s SEQUENCE: Amino acid sequence to verify (required)
  • --name, -n NAME: Name for the ORF (default: Query)
  • --allow-alt-start: Allow alternative start codons (GTG, TTG)
  • --min-identity FLOAT: Minimum identity threshold (0-1, default: 1.0)
  • --max-mismatches INT: Maximum allowed amino acid mismatches (default: 0)
  • --disallow-internal-met: Disallow internal methionine start codons
  • --json: Output results as JSON
  • --list-tags: List available tag definitions
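
The interplay of `--min-identity` and `--max-mismatches` can be pictured with a small sketch. It assumes an ungapped, equal-length comparison, identity defined as matching residues divided by query length, and that both thresholds must hold. The tool may define these differently, and the function names are hypothetical.

```python
def count_mismatches(query, candidate):
    """Count positions that differ between two equal-length protein strings."""
    if len(query) != len(candidate):
        raise ValueError("sequences must have equal length for ungapped comparison")
    return sum(a != b for a, b in zip(query, candidate))


def passes_thresholds(query, candidate, min_identity=1.0, max_mismatches=0):
    """Return True if the candidate satisfies both thresholds (assumed semantics)."""
    if not query:
        raise ValueError("query must not be empty")
    mismatches = count_mismatches(query, candidate)
    identity = (len(query) - mismatches) / len(query)
    return identity >= min_identity and mismatches <= max_mismatches


# One substitution in ten residues: rejected by the defaults, accepted when relaxed.
print(passes_thresholds("MSKGEELFTG", "MSKGEELFTA"))
print(passes_thresholds("MSKGEELFTG", "MSKGEELFTA", min_identity=0.9, max_mismatches=1))
```

With the documented defaults (identity 1.0, zero mismatches) only an exact match is accepted.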

Verification Results

Status Types

  • verified: ORF found at exactly one location
  • not-found: ORF not found in any reading frame
  • indeterminate: Multiple possible locations found
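
The three statuses follow from how many placements are found. A minimal sketch, assuming exact matching against already translated frames (hypothetical helper names, not the tool's code):

```python
def find_all(haystack, needle):
    """Return every start index of needle in haystack, including overlaps."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def classify(frames, query):
    """frames maps a frame label to its translated protein string."""
    placements = [
        (label, pos)
        for label, protein in frames.items()
        for pos in find_all(protein, query)
    ]
    if not placements:
        status = "not-found"
    elif len(placements) == 1:
        status = "verified"
    else:
        status = "indeterminate"
    return status, placements


frames = {("+", 0): "MKTAYMKT", ("+", 1): "LLLL"}
print(classify(frames, "TAY"))  # ('verified', [(('+', 0), 2)])
print(classify(frames, "MKT")[0])  # indeterminate
```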

Placement Information

  • Strand: Plus (+) or Minus (-) strand
  • Frame: Reading frame (0, 1, or 2)
  • Position: Nucleotide start and end positions (1-indexed in output)
  • Wraps origin: Whether the ORF crosses the plasmid origin
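
How 1-indexed positions and the origin-wrap flag relate on a circular plasmid can be sketched with simple modular arithmetic. The example covers the plus strand only and assumes the frame is the 0-based start modulo 3. The tool's exact frame convention, in particular for the minus strand, is not specified here, and `placement_coordinates` is a hypothetical helper.

```python
def placement_coordinates(start0, orf_nt_length, plasmid_length):
    """Convert a 0-based plus-strand start into 1-indexed circular coordinates."""
    if not 0 < orf_nt_length <= plasmid_length:
        raise ValueError("ORF length must be between 1 and the plasmid length")
    start0 %= plasmid_length
    end0 = (start0 + orf_nt_length - 1) % plasmid_length
    return {
        "strand": "+",
        "frame": start0 % 3,  # assumed convention
        "start": start0 + 1,  # 1-indexed, inclusive
        "end": end0 + 1,      # 1-indexed, inclusive
        "wraps_origin": start0 + orf_nt_length > plasmid_length,
    }


# A 30-nt ORF starting 10 nt before the end of a 5000-bp plasmid wraps the origin.
print(placement_coordinates(4990, 30, 5000))
```

In the wrapped case the end position is smaller than the start position, which is why the separate flag is useful when reading the output.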

Auto-Detected Tags

The tool automatically detects common protein tags:

  • His6, His10 (polyhistidine)
  • N-His6-TEV, N-His6-MBP-N10-TEV
  • StrepII, FLAG, HA, Myc
  • TEV site, HRV 3C site
  • SUMO
  • GGGGS linkers

View all tags:

scripts/orf_verifier_cli.py --list-tags
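
Tag detection of this kind can be approximated by scanning the protein for known motifs. The sketch below uses commonly cited sequences for a few of the tags. The tool's own definitions are authoritative and may differ (use `--list-tags` to see them), and the names `TAG_PATTERNS` and `detect_tags` are hypothetical.

```python
import re

# Commonly cited motifs; assumptions, not the tool's tag definitions.
TAG_PATTERNS = {
    "polyHis": r"H{6,}",
    "StrepII": r"WSHPQFEK",
    "FLAG": r"DYKDDDDK",
    "HA": r"YPYDVPDYA",
    "Myc": r"EQKLISEEDL",
    "TEV site": r"ENLYFQ[GS]",
    "HRV 3C site": r"LEVLFQGP",
    "GGGGS linker": r"(?:GGGGS)+",
}


def detect_tags(protein):
    """Return (tag name, 1-indexed start, 1-indexed end) sorted by position."""
    protein = protein.upper()
    hits = []
    for name, pattern in TAG_PATTERNS.items():
        for match in re.finditer(pattern, protein):
            hits.append((name, match.start() + 1, match.end()))
    return sorted(hits, key=lambda hit: hit[1])


for hit in detect_tags("MHHHHHHENLYFQGSDYKDDDDKGGGGSGGGGSMSKGE"):
    print(hit)
```

Composite tags such as N-His6-TEV would be built from these elements in a fixed order, which this sketch does not attempt.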

Annotation Verification Mode

Batch verify all CDS annotations in pLannotate-annotated GenBank files against UniProt reference sequences.

Usage

# Verify single plasmid
python3 scripts/orf_verifier_cli.py verify-annotations plasmid.gbk --targets-only

# Verify multiple files
python3 scripts/orf_verifier_cli.py verify-annotations *.gbk --targets-only --summary

# Verify directory of sequencing results
python3 scripts/orf_verifier_cli.py verify-annotations sequencing_results/ --targets-only -o report.md

Options

  • --targets-only: Only verify target proteins (HUMAN, MOUSE, BOVIN), skip E. coli vector components
  • --organism, -O CODE: Filter to specific organism (e.g., HUMAN)
  • --summary: Show only summary, not per-protein details
  • --json: Output as JSON
  • --output, -o FILE: Write to file instead of stdout

Result Status

  • PASS: 100% identity to UniProt reference (includes valid N/C-terminal truncations)
  • FAIL: Less than 100% identity (mutations, deletions, insertions)
  • ERROR: Could not verify (UniProt lookup failed, translation error)

Example Output

============================================================
ANNOTATION VERIFICATION SUMMARY
============================================================
Files processed: 10
Total PASS: 42
Total FAIL: 2
Total ERROR: 0

[PASS] plasmid_1.gbk: 7/7 passed
[FAIL] plasmid_2.gbk: 3/4 passed
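
If the text report needs to be post-processed, the per-file lines can be picked out with a regular expression. This sketch assumes the line format shown in the example output above, and the real output may contain variants not covered here. For anything beyond a quick check, the `--json` option is likely the more robust route.

```python
import re

# Matches per-file lines such as "[PASS] plasmid_1.gbk: 7/7 passed".
FILE_LINE = re.compile(r"^\[(PASS|FAIL)\]\s+(\S+):\s+(\d+)/(\d+) passed$")


def parse_file_results(report_text):
    """Extract per-file pass counts from a text summary."""
    results = []
    for line in report_text.splitlines():
        match = FILE_LINE.match(line.strip())
        if match:
            status, filename, passed, total = match.groups()
            results.append(
                {
                    "file": filename,
                    "status": status,
                    "passed": int(passed),
                    "total": int(total),
                }
            )
    return results


report = """Total PASS: 42
[PASS] plasmid_1.gbk: 7/7 passed
[FAIL] plasmid_2.gbk: 3/4 passed
"""
for entry in parse_file_results(report):
    print(entry)
```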

Handling Truncations

N-terminal and C-terminal truncations are reported as PASS when the expressed region is 100% identical to the corresponding region of the UniProt reference. A note indicates where the truncation begins or ends:

[PASS] MED14_HUMAN (O60244)
       Length: 1409 aa (UniProt: 1454 aa)
       Identity: 100.0%
       Note: N-term truncated: starts at UniProt position 46
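
The truncation rule can be sketched as an exact-substring check against the reference sequence. This is an illustration only: it ignores tags and other vector-derived residues that the real tool presumably handles, the C-terminal note wording is an assumption (only the N-terminal note format appears above), and `compare_to_reference` is a hypothetical name.

```python
def compare_to_reference(expressed, reference):
    """Classify an expressed protein against a reference, allowing truncations."""
    if not expressed or not reference:
        return {"status": "ERROR", "notes": ["missing sequence"]}
    offset = reference.find(expressed)
    if offset == -1:
        # Not an exact sub-sequence: mutation, insertion or deletion.
        return {"status": "FAIL", "notes": []}
    notes = []
    if offset > 0:
        notes.append(f"N-term truncated: starts at UniProt position {offset + 1}")
    end = offset + len(expressed)
    if end < len(reference):
        # Assumed wording for the C-terminal case.
        notes.append(f"C-term truncated: ends at UniProt position {end}")
    return {"status": "PASS", "notes": notes}


print(compare_to_reference("APLKQRW", "MSTAPLKQRW"))
```

The MED14_HUMAN example is consistent with this arithmetic: a construct that starts at position 46 and runs to the end of a 1454 aa reference has 1454 - 45 = 1409 residues.
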
Install via CLI
npx skills add https://github.com/farnunglab/benchaid --skill orfverifier
Repository Details
Stars: 26
Forks: 4
Branch: main
Path: SKILL.md