database-access-skills-index - SKILL.md Agent Skill

id: database_access_index name: Database Access Skills Index description: | Skills for querying and downloading data from genomic, transcriptomic, 3D-genome, and cancer-genomics databases. Covers programmatic access to public repositories, gene annotation, sequence retrieval, processed functional-genomics tracks, Hi-C / Micro-C contact matrices, TCGA-style cohorts, and large-scale single-cell data.

Database Access Skills

Tools and workflows for accessing public biological databases, retrieving sequencing data, querying gene/protein information, pulling processed functional / 3D-genome tracks, fetching cancer-cohort data, and downloading large-scale single-cell datasets.

Quick map — which skill for what

If you need...	Use
Gene / protein / variant metadata (Ensembl, UniProt, NCBI, …)	gget
Raw sequencing reads (FASTQ) from SRA/ENA/GEO/DDBJ/GSA	iSeq
Large-scale single-cell RNA-seq matrices	CELLxGENE Census
Processed functional-genomics tracks (ChIP/ATAC/DNase/RNA-seq bigWig/BAM/peaks)	ENCODE
3D-genome contact matrices (Hi-C / Micro-C / ChIA-PET .mcool / .hic)	4DN
liftOver, UCSC tracks, sequence pulls, large genome catalog	UCSC
Cancer-cohort RNA-seq counts, MAF mutations, CNV, methylation (TCGA / CPTAC)	GDC

The new four (ENCODE / 4DN / UCSC / GDC) are mostly orthogonal to the existing three — they cover processed tracks, 3D genome data, coordinate utilities, and cancer cohorts that gget / iSeq / Census don't reach.

Available Skills

gget — Genomic Database Querying

Python package and CLI tool with 23 interoperable modules for efficiently querying genomic databases including Ensembl, NCBI, UniProt, ARCHS4, Enrichr, COSMIC, OpenTargets, CellxGene, cBioPortal, PDB, and Bgee.

Skill file: gget.md

When to use:

Fetching reference genome/annotation download links (Ensembl)
Searching genes by keyword or retrieving gene metadata
Running BLAST/DIAMOND sequence alignment
Performing enrichment analysis (GO, KEGG, pathway)
Querying cancer mutations (COSMIC) or drug-target associations (OpenTargets)
Retrieving single-cell data from CZ CELLxGENE Discover
Looking up protein structures (PDB, AlphaFold)
Finding tissue expression patterns (ARCHS4, Bgee)
Plotting cancer genomics heatmaps (cBioPortal)

iSeq — Sequencing Data Download

Bash CLI tool for downloading sequencing data and metadata from five public databases (GSA, SRA, ENA, DDBJ, GEO) through a single unified interface. Supports parallel downloads, Aspera transfers, and automatic format conversion.

Skill file: iseq.md

When to use:

Downloading raw sequencing data (FASTQ/SRA) from public repositories
Fetching metadata for projects, experiments, or runs
Downloading from Chinese GSA database (CRA/CRR accessions)
Batch downloading multiple accessions from a file
Converting SRA files to FASTQ format
Merging FASTQ files by experiment, sample, or study

CZ CELLxGENE Census — Single-Cell RNA-seq Data Access

Cloud-based Python API for accessing 217M+ single-cell RNA-seq observations from CZ CELLxGENE Discover via TileDB-SOMA. Supports flexible metadata queries, gene filtering, and pre-computed embeddings (scVI, Geneformer).

Skill file: cellxgene_census.md

When to use:

Querying large-scale single-cell RNA-seq data by tissue, cell type, disease
Downloading count matrices as AnnData objects with metadata filters
Accessing pre-computed embeddings (scVI, Geneformer)
Finding which datasets contain specific genes or cell types
Working with larger-than-memory single-cell datasets via streaming
Exploring CZ CELLxGENE Discover catalog programmatically

ENCODE — Functional Genomics Tracks (ChIP/ATAC/DNase/RNA-seq)

Query and download from the ENCODE Portal — standardised processed files (BAM, bigWig, narrowPeak) for ChIP-seq, ATAC-seq, DNase-seq, RNA-seq, eCLIP, etc. across human / mouse cell types and tissues. Pairs with the igv and gosling LiveViews.

Skill file: encode.md

When to use:

"Find CTCF ChIP-seq peaks in K562 from ENCODE" — TF / histone / chromatin data by target + biosample
ENCODE bigWig signal → IGV track for a paper-quality view
Cross-cell-line / cross-tissue comparison of a regulatory assay
Anywhere you need ENCODE's processed outputs, not raw FASTQ

4DN — 3D Genome Data (Hi-C, Micro-C, ChIA-PET)

Query and download from the 4D Nucleome Data Portal — Hi-C variants, Micro-C, ChIA-PET, SPRITE, GAM, FISH. Returns .mcool / .hic contact matrices and .pairs.gz files. Pairs with the gosling LiveView via the HiGlass back-end, and with cooler / pairix for Python analysis.

Skill file: fourdn.md

When to use:

3D-genome contact map for a cell line / condition
.mcool for use with Cooler / HiGlass / Gosling
ChIA-PET loop calls for a TF
Comparative Hi-C across replicates / conditions

UCSC — Genome Browser API + liftOver

Programmatic access to UCSC's REST API (assemblies, tracks, sequence, gene-symbol search, Track Hubs) and the canonical liftOver coordinate- conversion utility. Use UCSC where Ensembl / NCBI fall short: cross- assembly liftOver, public Track Hubs, the large UCSC assembly catalog (panTro6, calJac4, etc.).

Skill file: ucsc.md

When to use:

liftOver a BED / VCF / single coordinate between assemblies
Find a public Track Hub for a non-model organism
One-off sequence / track pulls without writing pysam boilerplate
Browse UCSC's assembly catalog (assemblies Ensembl doesn't have)

GDC — NCI Genomic Data Commons (TCGA + friends)

Query and download from the NCI GDC — TCGA, CPTAC, TARGET, FM-AD cancer cohorts. Open-tier files (RNA-seq counts, MAF mutations, CNV, methylation, clinical) need no auth; controlled-tier (BAM, gVCF) needs a dbGaP / NCI token. Complements gget cbio (processed cBioPortal view) with the raw GDC files.

Skill file: gdc.md

When to use:

TCGA RNA-seq counts for a project — differential expression input
MAF mutation file for an entire TCGA cohort
CNV / methylation / clinical metadata from TCGA / CPTAC / TARGET
Token-required BAM / gVCF download

Using Skills

Identify your goal: use the quick-map table above to pick the right portal (metadata vs raw reads vs processed tracks vs 3D-genome vs cancer cohort vs single-cell vs coordinate utilities).
Load skill file: Read the full skill document for detailed guidance on its API, filter patterns, and viewer wiring.
Follow examples: each skill's recipes section is copy-paste-able.
Combine tools: they're orthogonal by design — e.g. gget to look up a gene's coordinates → ENCODE for ChIP-seq tracks at that locus → IGV LiveView to display; or GDC for TCGA RNA-seq counts → gget for gene-set enrichment on the result.
Viewer wiring: every new skill includes a "Wire it to a viewer" section showing how to pipe its files into the igv / gosling / molstar LiveViews (download → serve_local_data → viewer track).