name: gi-promoter description: >- Detect promoter regions in DNA sequences using the Genomic Intelligence G0 transformer (GENA-LM BERT Large), via the hosted /v1/tasks/promoter/predict API. Returns per-window promoter probabilities and called regions. version: 0.1.0 author: ClawBio + Genomic Intelligence domain: genomics license: MIT tags: [genomics, promoter, transcription, regulatory, dna-lm, transformer, gi-api]
inputs:
- name: input_file type: file format: [fa, fasta, fna] description: Single-record FASTA. Any length; the API windows automatically (default model uses 2000 bp context, 1000 bp stride). required: false
outputs:
- name: report type: file format: md description: Markdown report — sequence + model metadata, called promoter regions, headline counts.
- name: result
type: file
format: json
description: Full
{data, meta}response from the GI API plus a flattened summary. - name: reproducibility type: directory description: 'command.sh + environment.json for exact-rerun reproducibility.'
dependencies: python: ">=3.10" packages: - requests>=2.31
demo_data:
- path: example_data/promoter_tp53.fa description: TP53 locus, gene-sense (chr17:7661779-7687546, GRCh38, 25.8 kbp; TP53 is minus-strand, so this is the reverse complement) — bundled real reference sequence.
endpoints: cli: python skills/gi-promoter/gi_promoter.py --input {input_file} --output {output_dir}
metadata: openclaw: requires: bins: [python3] env: [] config: [] always: false emoji: "🧬" homepage: https://docs.genomicintelligence.ai os: [darwin, linux] install: - kind: pip package: requests bins: [] trigger_keywords: - promoter - promoter prediction - predict promoter - find promoter - promoter region - TSS prediction - transcription start site - gi promoter - genomic intelligence promoter - G0 promoter - GENA-LM promoter
🧬 gi-promoter
You are gi-promoter, a ClawBio agent that calls the Genomic Intelligence promoter-prediction model. Given a DNA sequence (any length), it returns per-window promoter probabilities and called regions, all in a few hundred milliseconds via the hosted API.
⚠️ Remote inference — opt-in required. Unlike most ClawBio skills, this skill uploads your FASTA sequence to the hosted Genomic Intelligence API at
https://api.genomicintelligence.ai. Prefer a browser? The same models run interactively at https://genomicintelligence.ai. Do not submit identifiable patient data without an appropriate data-use agreement. Key setup: see Authentication below.
Trigger
Fire this skill when the user says any of:
- "predict promoters in this sequence"
- "find promoters in [gene/region]"
- "is this a promoter?"
- "score this for promoter activity"
- "gi-promoter", "G0 promoter", "GENA-LM promoter"
- "transcription start site prediction", "TSS prediction"
Do NOT fire when:
- The user asks for splice sites →
gi-splice - The user asks for enhancer activity →
gi-enhancer - The user asks for chromatin state →
gi-chromatin - The user asks for gene/transcript structure →
gi-annotation
Why This Exists
- Without it: A user with a multi-kbp sequence has to spin up a GPU, download the GENA-LM weights, tokenize, window, and run inference themselves.
- With it: One CLI call → annotated report in <1 s for typical sequences. The model is hosted; see Authentication for key setup.
- Why ClawBio: Hosted G0 inference plus ClawBio's reproducibility bundle and orchestration chaining (
gi-promoter→gi-expression→variant-annotation).
API Backed
POST https://api.genomicintelligence.ai/v1/tasks/promoter/predict — default model g0-promoter-2000bp (GENA-LM BERT Large, 2000 bp context, 1000 bp prediction window). Override with --model g0-promoter-large-300bp (faster) or --model dnabert-promoter-2000bp (DNABERT, 6-mer tokenization).
Workflow
- Parse: read single-record FASTA via the shared
clawbio.gi.gi_client.read_fastahelper (uppercase, strip non-ACGTN). - POST the full sequence to
/v1/tasks/promoter/predict; the API windows internally. - Render: write
report.md(summary + region table),result.json(full{data, meta}envelope),reproducibility/.
CLI Reference
# Demo — bundled TP53 region
python skills/gi-promoter/gi_promoter.py --demo --output /tmp/gi-promoter-demo
# Your own FASTA
python skills/gi-promoter/gi_promoter.py --input my_region.fa --output report_dir
# Faster 300-bp model
python skills/gi-promoter/gi_promoter.py --demo --model g0-promoter-large-300bp
# Via ClawBio runner
python clawbio.py run gi-promoter --demo
Demo
python clawbio.py run gi-promoter --demo
Bundled fixture is the TP53 locus (19 kbp, GRCh38). Expect ~20 windows, near-zero promoter calls at the default 0.5 threshold (TP53 promoter sits in a small region, not most of the locus) — proves the model is discriminating.
Authentication
The skill requires a Genomic Intelligence partner key in GI_API_KEY. Resolution order:
--api-key <value>CLI flag (explicit override).GI_API_KEYenvironment variable.- Otherwise: the skill raises a
RuntimeErrorpointing here.
Quick start — ClawBio hackathon key
A shared hackathon-tier key ships in .env.example at the repo root (50 concurrent / 120 rpm, opt-in only). From wherever the ClawBio files live on your machine:
# Repo root (git clone) — or ~/.claude/plugins/cache/clawbio/clawbio/<version>/ for plugin installs
cp .env.example .env
set -a && source .env && set +a
Production / heavier use
Request an individual key at contact@genomicintelligence.ai, then:
export GI_API_KEY=gi_yourkeyhere
Gotchas
- Don't pre-window the sequence yourself. Submit the full region — the API stride/window. Pre-windowing inflates rate-limit usage and gives identical results.
- Strand matters — submit gene-sense. The promoter model is strand-sensitive (trained on EPDnew 5'→3' coding-strand sequence). For minus-strand genes, reverse-complement to gene-sense before submission; the plus (genomic) strand returns near-zero (e.g. TP53 on the plus strand finds 0 promoters, on the coding strand finds its real promoters). The bundled TP53 fixture is already gene-sense.
- The hackathon key is shared. If you hit
429, you're sharing 50 concurrent / 120 rpm with everyone else. SetGI_API_KEYto your own key for serious work. - N-content: long stretches of
Nproduce low-confidence calls; pre-trim if the region is mostly gap.
Output Structure
output_dir/
├── report.md # Headline counts, region table, model + timing
├── result.json # Full {data, meta} envelope from the API
└── reproducibility/
├── command.sh # Exact invocation
└── environment.json # API base, model, request_id, timestamp
Integration with Bio Orchestrator
Routes here on: "promoter", "TSS prediction", "find promoter", "score promoter activity".
Chains with: variant-annotation (annotate variants overlapping called promoters), gi-expression (predict expression for sequences scored as promoters), gwas-lookup (look up variants in called promoter regions).
Safety
Research tool. Not a clinical assay. Hosted inference — the sequence you submit traverses the GI API endpoint. Do not submit identifiable patient data without an appropriate agreement.