jgi-lakehouse

star 33

Skills for querying the JGI Dremio Lakehouse containing GOLD and IMG genomics databases. Use this when users want to explore JGI databases, query GOLD (Genomes OnLine Database), IMG (Integrated Microbial Genomes), or run SQL queries against JGI genomics data. Triggers on mentions of JGI, GOLD database, IMG database, genome metadata, or JGI lakehouse queries.

openscientist-io By openscientist-io schedule Updated 2/15/2026

name: jgi-lakehouse description: Skills for querying the JGI Dremio Lakehouse containing GOLD and IMG genomics databases. Use this when users want to explore JGI databases, query GOLD (Genomes OnLine Database), IMG (Integrated Microbial Genomes), or run SQL queries against JGI genomics data. Triggers on mentions of JGI, GOLD database, IMG database, genome metadata, or JGI lakehouse queries. category: domain

JGI Lakehouse Query

Query the JGI Dremio Lakehouse via CLI.

Setup

pip install linkml-store[dremio]
export DREMIO_USER="your_username"
export DREMIO_PASSWORD="your_password"

Connection

# Base connection (no schema filter)
DB="dremio-rest://lakehouse.jgi.lbl.gov"

# GOLD database
GOLD="dremio-rest://lakehouse.jgi.lbl.gov?schema=gold-db-2 postgresql.gold"

# IMG Core database
IMG="dremio-rest://lakehouse.jgi.lbl.gov?schema=img-db-2 postgresql.img_core_v400"

SQL Queries

Direct SQL via --sql flag:

# Query GOLD studies
linkml-store -d "$DB" query --sql 'SELECT * FROM "gold-db-2 postgresql".gold.study LIMIT 10'

# Count by ecosystem
linkml-store -d "$DB" query --sql 'SELECT ecosystem, COUNT(*) as cnt FROM "gold-db-2 postgresql".gold.study GROUP BY ecosystem ORDER BY cnt DESC'

# Join tables
linkml-store -d "$DB" query --sql 'SELECT s.study_name, p.project_name FROM "gold-db-2 postgresql".gold.study s JOIN "gold-db-2 postgresql".gold.project p ON s.study_id = p.study_id LIMIT 10'

# Output formats
linkml-store -d "$DB" query --sql '...' -O json
linkml-store -d "$DB" query --sql '...' -O csv
linkml-store -d "$DB" query --sql '...' -O yaml

Collection Queries

When schema is set, use collection-based queries:

# List tables
linkml-store -d "$GOLD" list-collections

# Query collection
linkml-store -d "$GOLD" -c study query --limit 10

# With filter
linkml-store -d "$GOLD" -c study query -w "ecosystem=Host-associated" --limit 20

# Describe table
linkml-store -d "$GOLD" -c study describe

Schema Export

# Export LinkML schema with FK relationships
linkml-store -d "$GOLD" schema -O yaml -o gold.linkml.yaml

Key Databases

Schema Path Tables Description
"gold-db-2 postgresql".gold 42 GOLD - genome project metadata
"img-db-2 postgresql".img_core_v400 244 IMG Core - genes, taxa, annotations
"img-db-2 postgresql".img_ext 84 IMG Extended data
"img-db-2 postgresql".img_sat_v450 141 IMG Satellite/experimental

See references/databases.md for complete listing.

Common SQL Patterns

# List schemas
linkml-store -d "$DB" query --sql 'SELECT TABLE_SCHEMA, COUNT(*) as tables FROM INFORMATION_SCHEMA."TABLES" GROUP BY TABLE_SCHEMA ORDER BY TABLE_SCHEMA'

# List tables in schema
linkml-store -d "$DB" query --sql 'SELECT TABLE_NAME FROM INFORMATION_SCHEMA."TABLES" WHERE TABLE_SCHEMA = '\''gold-db-2 postgresql.gold'\'' ORDER BY TABLE_NAME'

# Table columns
linkml-store -d "$DB" query --sql 'SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA."COLUMNS" WHERE TABLE_SCHEMA = '\''gold-db-2 postgresql.gold'\'' AND TABLE_NAME = '\''study'\'' ORDER BY ORDINAL_POSITION'

# Row count
linkml-store -d "$DB" query --sql 'SELECT COUNT(*) FROM "gold-db-2 postgresql".gold.study'

Pre-generated Schemas

LinkML schemas with FK relationships: /Users/cjm/repos/linkml-store/jgi-lakehouse-analysis/

Install via CLI
npx skills add https://github.com/openscientist-io/openscientist --skill jgi-lakehouse
Repository Details
star Stars 33
call_split Forks 6
navigation Branch main
article Path SKILL.md
Occupations
More from Creator
openscientist-io
openscientist-io Explore all skills →