bio-write-sequences - SKILL.md Agent Skill

name: bio-write-sequences
description: Write biological sequences to files (FASTA, FASTQ, GenBank, EMBL) using Biopython Bio.SeqIO. Use when saving sequences, creating new sequence files, or outputting modified records.
tool_type: python
primary_tool: Bio.SeqIO
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
  - read_file
  - run_shell_command

Write Sequences

Write SeqRecord objects to sequence files using Biopython's Bio.SeqIO module.

Required Imports

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

Core Functions

SeqIO.write() - Write Records to File

Write one or more SeqRecord objects to a file.

SeqIO.write(records, 'output.fasta', 'fasta')

Parameters:

sequences - A single SeqRecord, or a list or iterator of SeqRecord objects
handle - Filename (string) or file handle opened for writing
format - Output format string in lower case, e.g. 'fasta'

Returns: Number of records written (integer)
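
As a minimal sketch of the parameters and return value, the snippet below writes two records to an in-memory `StringIO` handle instead of a file on disk. The record IDs and sequences are made up for illustration, and `description=''` is set so that each header contains only the ID.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Illustrative records; description='' keeps the FASTA header to just the ID
records = [
    SeqRecord(Seq('ATGC'), id='seq1', description=''),
    SeqRecord(Seq('GGCC'), id='seq2', description=''),
]

buffer = StringIO()  # a text-mode handle used in place of a filename
written = SeqIO.write(records, buffer, 'fasta')
fasta_text = buffer.getvalue()

print(written)
print(fasta_text, end='')
```

Checking the returned count against the number of records you expected is a cheap way to catch an empty or partially consumed iterator.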

record.format() - Get Formatted String

Get a string representation without writing to a file.

formatted = record.format('fasta')
print(formatted)

Creating SeqRecord Objects

Minimal SeqRecord

record = SeqRecord(Seq('ATGCGATCGATCG'), id='seq1')
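
A minimal record leaves `description` at Biopython's placeholder value, and the FASTA writer includes that placeholder in the header line. The sketch below shows the effect and one way to avoid it, passing `description=''`. The variable names are illustrative.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Without a description, the placeholder text ends up in the header line
minimal = SeqRecord(Seq('ATGCGATCGATCG'), id='seq1')
header_default = minimal.format('fasta').splitlines()[0]

# An empty description yields a header with only the ID
clean = SeqRecord(Seq('ATGCGATCGATCG'), id='seq1', description='')
header_clean = clean.format('fasta').splitlines()[0]

print(header_default)
print(header_clean)
```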

Full SeqRecord

record = SeqRecord(
    Seq('ATGCGATCGATCG'),
    id='seq1',
    name='sequence_one',
    description='Example sequence for demonstration'
)

With Annotations (for GenBank output)

from Bio.SeqFeature import SeqFeature, FeatureLocation

record = SeqRecord(
    Seq('ATGCGATCGATCG'),
    id='seq1',
    annotations={'molecule_type': 'DNA'}
)
record.features.append(
    SeqFeature(FeatureLocation(0, 9), type='gene', qualifiers={'gene': ['exampleGene']})
)
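
To check that a feature survives the trip to GenBank text, one option is to write the record to an in-memory handle and parse it back. This is a sketch under the assumption of a recent Biopython release (one that requires `molecule_type` in annotations); the `name='seq1'` argument is added here only to give the LOCUS line an explicit name.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.SeqRecord import SeqRecord

record = SeqRecord(
    Seq('ATGCGATCGATCG'),
    id='seq1',
    name='seq1',
    annotations={'molecule_type': 'DNA'},
)
record.features.append(
    SeqFeature(FeatureLocation(0, 9), type='gene', qualifiers={'gene': ['exampleGene']})
)

# Write to an in-memory handle, then parse the GenBank text back
buffer = StringIO()
SeqIO.write(record, buffer, 'genbank')
buffer.seek(0)
parsed = SeqIO.read(buffer, 'genbank')

gene = parsed.features[0]
print(gene.type, int(gene.location.start), int(gene.location.end), gene.qualifiers['gene'])
```

Note that `FeatureLocation(0, 9)` uses zero-based, end-exclusive coordinates, so it appears as `1..9` in the GenBank feature table.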

Common Formats

Format	String	Notes
FASTA	`'fasta'`	Most universal, sequence + header only
FASTQ	`'fastq'`	Requires quality scores in letter_annotations
GenBank	`'genbank'`	Requires `molecule_type` in annotations
EMBL	`'embl'`	Same `molecule_type` requirement as GenBank
Tab	`'tab'`	Simple ID + sequence tabular format
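
The format string is the only thing that changes between outputs. As a small illustration, the sketch below renders one made-up record with two of the format strings from the table.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Illustrative record; description='' keeps the FASTA header to just the ID
record = SeqRecord(Seq('ATGC'), id='seq1', description='')

# Render the same record with two of the format strings from the table
outputs = {fmt: record.format(fmt) for fmt in ('fasta', 'tab')}

for fmt, text in outputs.items():
    print(f'--- {fmt} ---')
    print(text, end='')
```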

Code Patterns

Write Single Record

record = SeqRecord(Seq('ATGC'), id='my_seq', description='test sequence')
SeqIO.write(record, 'output.fasta', 'fasta')

Write Multiple Records

records = [
    SeqRecord(Seq('ATGC'), id='seq1'),
    SeqRecord(Seq('GCTA'), id='seq2'),
    SeqRecord(Seq('TTAA'), id='seq3')
]
count = SeqIO.write(records, 'output.fasta', 'fasta')
print(f'Wrote {count} records')

Write to File Handle

with open('output.fasta', 'w') as handle:
    SeqIO.write(records, handle, 'fasta')
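
Passing a handle rather than a filename also makes compressed output possible. The sketch below assumes gzip compression is wanted and uses a hypothetical path in a temporary directory; the key detail is opening the stream in text mode (`'wt'`).

```python
import gzip
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

records = [
    SeqRecord(Seq('ATGC'), id='seq1', description=''),
    SeqRecord(Seq('GCTA'), id='seq2', description=''),
]

# Hypothetical output path inside a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), 'output.fasta.gz')

# 'wt' opens the gzip stream in text mode, which the FASTA writer expects
with gzip.open(out_path, 'wt') as handle:
    SeqIO.write(records, handle, 'fasta')

# Read the compressed file back to confirm its contents
with gzip.open(out_path, 'rt') as handle:
    ids = [rec.id for rec in SeqIO.parse(handle, 'fasta')]

print(ids)
```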

Write Modified Records

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

def uppercase_record(rec):
    return SeqRecord(rec.seq.upper(), id=rec.id, description=rec.description)

records = SeqIO.parse('input.fasta', 'fasta')
modified = (uppercase_record(rec) for rec in records)
SeqIO.write(modified, 'output.fasta', 'fasta')
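
A self-contained variant of this pattern is sketched below. It reads from an in-memory stand-in for `input.fasta` and uses `SeqRecord.upper()`, which returns a copy of the record with an upper-case sequence and keeps the original ID and description. The input text is invented for the example.

```python
from io import StringIO

from Bio import SeqIO

# Stand-in for input.fasta so the example is self-contained
source = StringIO('>seq1 first\natgc\n>seq2 second\nggcc\n')

# Generator expression: each record is transformed only when the writer asks for it
modified = (rec.upper() for rec in SeqIO.parse(source, 'fasta'))

out = StringIO()
count = SeqIO.write(modified, out, 'fasta')
result = out.getvalue()

print(count)
print(result, end='')
```

Because both the parser and the generator are lazy, only one record needs to be held in memory at a time, which matters for large files.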

Append to Existing File

# new_records: any iterable of SeqRecord objects to add to the end of the file
with open('output.fasta', 'a') as handle:
    SeqIO.write(new_records, handle, 'fasta')
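
Appending suits formats in which records are simply concatenated, such as FASTA or FASTQ. The sketch below writes an initial batch, appends a second one, and reads the file back; the temporary path and the record contents are assumptions made for the example.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

path = os.path.join(tempfile.mkdtemp(), 'output.fasta')  # hypothetical path

first_batch = [SeqRecord(Seq('ATGC'), id='seq1', description='')]
new_records = [
    SeqRecord(Seq('GCTA'), id='seq2', description=''),
    SeqRecord(Seq('TTAA'), id='seq3', description=''),
]

# Passing a filename creates the file, or overwrites an existing one
SeqIO.write(first_batch, path, 'fasta')

# Opening in 'a' mode adds to the end instead of overwriting
with open(path, 'a') as handle:
    SeqIO.write(new_records, handle, 'fasta')

ids = [rec.id for rec in SeqIO.parse(path, 'fasta')]
print(ids)
```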

Write FASTQ with Quality Scores

record = SeqRecord(Seq('ATGCGATCG'), id='read1')
record.letter_annotations['phred_quality'] = [30, 30, 28, 25, 30, 30, 28, 25, 30]
SeqIO.write(record, 'output.fastq', 'fastq')
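
For reference, the `'fastq'` format uses the Sanger encoding, in which each score `q` is stored as the character with code `q + 33`. The sketch below renders the same read as a string to show the resulting quality line; `description=''` is an addition that keeps the header to just the ID.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq('ATGCGATCG'), id='read1', description='')
record.letter_annotations['phred_quality'] = [30, 30, 28, 25, 30, 30, 28, 25, 30]

fastq_text = record.format('fastq')
quality_line = fastq_text.splitlines()[3]

# Each score q is stored as the character with code q + 33
expected = ''.join(chr(q + 33) for q in record.letter_annotations['phred_quality'])

print(fastq_text, end='')
```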

Write GenBank Format

record = SeqRecord(Seq('ATGCGATCGATCG'), id='SEQ001', name='example')
record.annotations['molecule_type'] = 'DNA'
record.annotations['topology'] = 'linear'
record.annotations['organism'] = 'Example organism'
SeqIO.write(record, 'output.gb', 'genbank')

Common Errors

Error	Cause	Solution
`TypeError` or `AttributeError` (wording varies by Biopython version)	Passed a raw string or Seq	Wrap in a SeqRecord object
`ValueError: missing molecule_type in annotations`	GenBank/EMBL without molecule_type	Add `record.annotations['molecule_type'] = 'DNA'`
`ValueError: No suitable quality scores found in letter_annotations of SeqRecord`	FASTQ without phred_quality	Add quality scores to letter_annotations
`ValueError: Sequences must all be the same length`	PHYLIP with unequal lengths	Pad or trim sequences first
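
The missing-annotation cases can be reproduced and fixed as sketched below. `try_write` is a hypothetical helper written for this example, not part of Biopython, and the behavior shown assumes a recent Biopython release.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def try_write(record, fmt):
    """Return (True, text) on success, or (False, error message) on ValueError."""
    try:
        buffer = StringIO()
        SeqIO.write(record, buffer, fmt)
        return True, buffer.getvalue()
    except ValueError as err:
        return False, str(err)


bare = SeqRecord(Seq('ATGCGATCG'), id='read1', name='read1', description='')

ok_fastq, msg_fastq = try_write(bare, 'fastq')        # no phred_quality yet
ok_genbank, msg_genbank = try_write(bare, 'genbank')  # no molecule_type yet
print(ok_fastq, msg_fastq)
print(ok_genbank, msg_genbank)

# Supply the missing pieces and retry
bare.letter_annotations['phred_quality'] = [30] * len(bare.seq)
bare.annotations['molecule_type'] = 'DNA'
fixed_fastq, _ = try_write(bare, 'fastq')
fixed_genbank, _ = try_write(bare, 'genbank')
print(fixed_fastq, fixed_genbank)
```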

Format-Specific Requirements

FASTQ

Must have quality scores:

record.letter_annotations['phred_quality'] = [30] * len(record.seq)

GenBank/EMBL

Must have molecule_type:

record.annotations['molecule_type'] = 'DNA'  # or 'RNA', 'protein'

PHYLIP

All sequences must be the same length. IDs are truncated to 10 characters.
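
One way to satisfy the equal-length requirement is to pad shorter sequences with gap characters, as sketched below. Right-padding with `-` is an assumption made for illustration; whether padding or trimming is appropriate depends on the data, and real alignments should come from an alignment tool.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

records = [
    SeqRecord(Seq('ATGCGA'), id='seq1'),
    SeqRecord(Seq('ATGC'), id='seq2'),
]

# Unequal lengths are rejected with a ValueError
try:
    SeqIO.write(records, StringIO(), 'phylip')
    unequal_rejected = False
except ValueError:
    unequal_rejected = True

# Pad shorter sequences with gap characters up to the longest length
width = max(len(rec.seq) for rec in records)
padded = [
    SeqRecord(Seq(str(rec.seq).ljust(width, '-')), id=rec.id)
    for rec in records
]

buffer = StringIO()
count = SeqIO.write(padded, buffer, 'phylip')
print(buffer.getvalue(), end='')
```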

Related Skills

read-sequences - Read sequences before modifying and writing
format-conversion - Direct format conversion without intermediate processing
filter-sequences - Filter sequences before writing subset
sequence-manipulation/seq-objects - Create SeqRecord objects to write
alignment-files - For SAM/BAM output, use samtools/pysam