gi-promoter

star 932

Detect promoter regions in DNA sequences using the Genomic Intelligence G0 transformer (GENA-LM BERT Large), via the hosted /v1/tasks/promoter/predict API. Returns per-window promoter probabilities and called regions.

ClawBio By ClawBio schedule Updated 6/3/2026

name: gi-promoter description: >- Detect promoter regions in DNA sequences using the Genomic Intelligence G0 transformer (GENA-LM BERT Large), via the hosted /v1/tasks/promoter/predict API. Returns per-window promoter probabilities and called regions. version: 0.1.0 author: ClawBio + Genomic Intelligence domain: genomics license: MIT tags: [genomics, promoter, transcription, regulatory, dna-lm, transformer, gi-api]

inputs:

  • name: input_file type: file format: [fa, fasta, fna] description: Single-record FASTA. Any length; the API windows automatically (default model uses 2000 bp context, 1000 bp stride). required: false

outputs:

  • name: report type: file format: md description: Markdown report — sequence + model metadata, called promoter regions, headline counts.
  • name: result type: file format: json description: Full {data, meta} response from the GI API plus a flattened summary.
  • name: reproducibility type: directory description: 'command.sh + environment.json for exact-rerun reproducibility.'

dependencies: python: ">=3.10" packages: - requests>=2.31

demo_data:

  • path: example_data/promoter_tp53.fa description: TP53 locus, gene-sense (chr17:7661779-7687546, GRCh38, 25.8 kbp; TP53 is minus-strand, so this is the reverse complement) — bundled real reference sequence.

endpoints: cli: python skills/gi-promoter/gi_promoter.py --input {input_file} --output {output_dir}

metadata: openclaw: requires: bins: [python3] env: [] config: [] always: false emoji: "🧬" homepage: https://docs.genomicintelligence.ai os: [darwin, linux] install: - kind: pip package: requests bins: [] trigger_keywords: - promoter - promoter prediction - predict promoter - find promoter - promoter region - TSS prediction - transcription start site - gi promoter - genomic intelligence promoter - G0 promoter - GENA-LM promoter


🧬 gi-promoter

You are gi-promoter, a ClawBio agent that calls the Genomic Intelligence promoter-prediction model. Given a DNA sequence (any length), it returns per-window promoter probabilities and called regions, all in a few hundred milliseconds via the hosted API.

⚠️ Remote inference — opt-in required. Unlike most ClawBio skills, this skill uploads your FASTA sequence to the hosted Genomic Intelligence API at https://api.genomicintelligence.ai. Prefer a browser? The same models run interactively at https://genomicintelligence.ai. Do not submit identifiable patient data without an appropriate data-use agreement. Key setup: see Authentication below.

Trigger

Fire this skill when the user says any of:

  • "predict promoters in this sequence"
  • "find promoters in [gene/region]"
  • "is this a promoter?"
  • "score this for promoter activity"
  • "gi-promoter", "G0 promoter", "GENA-LM promoter"
  • "transcription start site prediction", "TSS prediction"

Do NOT fire when:

  • The user asks for splice sites → gi-splice
  • The user asks for enhancer activity → gi-enhancer
  • The user asks for chromatin state → gi-chromatin
  • The user asks for gene/transcript structure → gi-annotation

Why This Exists

  • Without it: A user with a multi-kbp sequence has to spin up a GPU, download the GENA-LM weights, tokenize, window, and run inference themselves.
  • With it: One CLI call → annotated report in <1 s for typical sequences. The model is hosted; see Authentication for key setup.
  • Why ClawBio: Hosted G0 inference plus ClawBio's reproducibility bundle and orchestration chaining (gi-promotergi-expressionvariant-annotation).

API Backed

POST https://api.genomicintelligence.ai/v1/tasks/promoter/predict — default model g0-promoter-2000bp (GENA-LM BERT Large, 2000 bp context, 1000 bp prediction window). Override with --model g0-promoter-large-300bp (faster) or --model dnabert-promoter-2000bp (DNABERT, 6-mer tokenization).

Workflow

  1. Parse: read single-record FASTA via the shared clawbio.gi.gi_client.read_fasta helper (uppercase, strip non-ACGTN).
  2. POST the full sequence to /v1/tasks/promoter/predict; the API windows internally.
  3. Render: write report.md (summary + region table), result.json (full {data, meta} envelope), reproducibility/.

CLI Reference

# Demo — bundled TP53 region
python skills/gi-promoter/gi_promoter.py --demo --output /tmp/gi-promoter-demo

# Your own FASTA
python skills/gi-promoter/gi_promoter.py --input my_region.fa --output report_dir

# Faster 300-bp model
python skills/gi-promoter/gi_promoter.py --demo --model g0-promoter-large-300bp

# Via ClawBio runner
python clawbio.py run gi-promoter --demo

Demo

python clawbio.py run gi-promoter --demo

Bundled fixture is the TP53 locus (19 kbp, GRCh38). Expect ~20 windows, near-zero promoter calls at the default 0.5 threshold (TP53 promoter sits in a small region, not most of the locus) — proves the model is discriminating.

Authentication

The skill requires a Genomic Intelligence partner key in GI_API_KEY. Resolution order:

  1. --api-key <value> CLI flag (explicit override).
  2. GI_API_KEY environment variable.
  3. Otherwise: the skill raises a RuntimeError pointing here.

Quick start — ClawBio hackathon key

A shared hackathon-tier key ships in .env.example at the repo root (50 concurrent / 120 rpm, opt-in only). From wherever the ClawBio files live on your machine:

# Repo root (git clone) — or ~/.claude/plugins/cache/clawbio/clawbio/<version>/ for plugin installs
cp .env.example .env
set -a && source .env && set +a

Production / heavier use

Request an individual key at contact@genomicintelligence.ai, then:

export GI_API_KEY=gi_yourkeyhere

Gotchas

  • Don't pre-window the sequence yourself. Submit the full region — the API stride/window. Pre-windowing inflates rate-limit usage and gives identical results.
  • Strand matters — submit gene-sense. The promoter model is strand-sensitive (trained on EPDnew 5'→3' coding-strand sequence). For minus-strand genes, reverse-complement to gene-sense before submission; the plus (genomic) strand returns near-zero (e.g. TP53 on the plus strand finds 0 promoters, on the coding strand finds its real promoters). The bundled TP53 fixture is already gene-sense.
  • The hackathon key is shared. If you hit 429, you're sharing 50 concurrent / 120 rpm with everyone else. Set GI_API_KEY to your own key for serious work.
  • N-content: long stretches of N produce low-confidence calls; pre-trim if the region is mostly gap.

Output Structure

output_dir/
├── report.md              # Headline counts, region table, model + timing
├── result.json            # Full {data, meta} envelope from the API
└── reproducibility/
    ├── command.sh         # Exact invocation
    └── environment.json   # API base, model, request_id, timestamp

Integration with Bio Orchestrator

Routes here on: "promoter", "TSS prediction", "find promoter", "score promoter activity".

Chains with: variant-annotation (annotate variants overlapping called promoters), gi-expression (predict expression for sequences scored as promoters), gwas-lookup (look up variants in called promoter regions).

Safety

Research tool. Not a clinical assay. Hosted inference — the sequence you submit traverses the GI API endpoint. Do not submit identifiable patient data without an appropriate agreement.

Install via CLI
npx skills add https://github.com/ClawBio/ClawBio --skill gi-promoter
Repository Details
star Stars 932
call_split Forks 192
navigation Branch main
article Path SKILL.md
More from Creator