name: bio-write-sequences
description: Write biological sequences to files (FASTA, FASTQ, GenBank, EMBL) using Biopython Bio.SeqIO. Use when saving sequences, creating new sequence files, or outputting modified records.
tool_type: python
primary_tool: Bio.SeqIO
measurable_outcome: Execute skill workflow successfully with valid output within 15 minutes.
allowed-tools:
  - read_file
  - run_shell_command
Write Sequences
Write SeqRecord objects to sequence files using Biopython's Bio.SeqIO module.
Required Import
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
Core Functions
SeqIO.write() - Write Records to File
Write one or more SeqRecord objects to a file.
SeqIO.write(records, 'output.fasta', 'fasta')
Parameters:
- records - Single SeqRecord, list, or iterator of SeqRecords
- handle - Filename (string) or file handle
- format - Output format string (lowercase, e.g. 'fasta')
Returns: Number of records written (integer)
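A minimal sketch of the return value, assuming an in-memory `io.StringIO` handle in place of a file on disk so the example leaves nothing behind. The record IDs and sequences are made up for illustration, and `description=''` is set so the FASTA header contains only the ID.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Illustrative records; IDs and sequences are invented for this sketch.
records = [
    SeqRecord(Seq('ATGC'), id='seq1', description=''),
    SeqRecord(Seq('GCTA'), id='seq2', description=''),
]

# Any text-mode handle can be passed where a filename would go.
handle = StringIO()
count = SeqIO.write(records, handle, 'fasta')

print(count)              # number of records written
print(handle.getvalue())  # the FASTA text that would have gone to the file
```

Checking the returned count against the number of records you expected is a cheap way to catch an iterator that was already exhausted before the write.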
record.format() - Get Formatted String
Get a string representation without writing to file.
formatted = record.format('fasta')
print(formatted)
Creating SeqRecord Objects
Minimal SeqRecord
record = SeqRecord(Seq('ATGCGATCGATCG'), id='seq1')
Full SeqRecord
record = SeqRecord(
    Seq('ATGCGATCGATCG'),
    id='seq1',
    name='sequence_one',
    description='Example sequence for demonstration'
)
With Annotations (for GenBank output)
from Bio.SeqFeature import SeqFeature, FeatureLocation
record = SeqRecord(
    Seq('ATGCGATCGATCG'),
    id='seq1',
    annotations={'molecule_type': 'DNA'}
)
record.features.append(
    SeqFeature(FeatureLocation(0, 9), type='gene', qualifiers={'gene': ['exampleGene']})
)
Common Formats
| Format | String | Notes |
|---|---|---|
| FASTA | 'fasta' | Most universal, sequence + header only |
| FASTQ | 'fastq' | Requires quality scores in letter_annotations |
| GenBank | 'genbank' | Requires annotations and molecule_type |
| EMBL | 'embl' | Similar requirements to GenBank |
| Tab | 'tab' | Simple ID + sequence tabular format |
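To make the difference between format strings concrete, the sketch below writes one record as both 'fasta' and 'tab' to in-memory handles. The record itself and the `outputs` dictionary are illustrative assumptions, not part of the skill.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Illustrative record; an empty description keeps the FASTA header to the ID.
record = SeqRecord(Seq('ATGCGATCG'), id='seq1', description='')

outputs = {}
for fmt in ('fasta', 'tab'):
    handle = StringIO()
    SeqIO.write(record, handle, fmt)  # same record, different format string
    outputs[fmt] = handle.getvalue()
    print(f'--- {fmt} ---')
    print(outputs[fmt], end='')
```

Only the format string changes between the two writes; the record is untouched.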
Code Patterns
Write Single Record
record = SeqRecord(Seq('ATGC'), id='my_seq', description='test sequence')
SeqIO.write(record, 'output.fasta', 'fasta')
Write Multiple Records
records = [
    SeqRecord(Seq('ATGC'), id='seq1'),
    SeqRecord(Seq('GCTA'), id='seq2'),
    SeqRecord(Seq('TTAA'), id='seq3')
]
count = SeqIO.write(records, 'output.fasta', 'fasta')
print(f'Wrote {count} records')
Write to File Handle
with open('output.fasta', 'w') as handle:
    SeqIO.write(records, handle, 'fasta')
Write Modified Records
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

def uppercase_record(rec):
    # Build a new record so the parsed input record is left unchanged.
    return SeqRecord(rec.seq.upper(), id=rec.id, description=rec.description)
records = SeqIO.parse('input.fasta', 'fasta')
modified = (uppercase_record(rec) for rec in records)
SeqIO.write(modified, 'output.fasta', 'fasta')
Append to Existing File
with open('output.fasta', 'a') as handle:
    SeqIO.write(new_records, handle, 'fasta')
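The append pattern can be checked end to end with a small sketch. It assumes a temporary directory so nothing is left on disk, and `first_batch` and `new_records` are illustrative records invented for the example.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Illustrative records for the initial write and the later append.
first_batch = [SeqRecord(Seq('ATGC'), id='seq1', description='')]
new_records = [SeqRecord(Seq('GCTA'), id='seq2', description='')]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'output.fasta')

    # Passing a filename creates (or overwrites) the file.
    SeqIO.write(first_batch, path, 'fasta')

    # Mode 'a' keeps the existing records and adds the new ones at the end.
    with open(path, 'a') as handle:
        SeqIO.write(new_records, handle, 'fasta')

    # Read the file back to confirm both batches are present, in order.
    ids = [rec.id for rec in SeqIO.parse(path, 'fasta')]

print(ids)
```

Appending this way suits formats where each record is self-contained, such as FASTA or FASTQ. Formats with a file-level header, such as PHYLIP, are better rewritten in full.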
Write FASTQ with Quality Scores
record = SeqRecord(Seq('ATGCGATCG'), id='read1')
record.letter_annotations['phred_quality'] = [30, 30, 28, 25, 30, 30, 28, 25, 30]
SeqIO.write(record, 'output.fastq', 'fastq')
Write GenBank Format
record = SeqRecord(Seq('ATGCGATCGATCG'), id='SEQ001', name='example')
record.annotations['molecule_type'] = 'DNA'
record.annotations['topology'] = 'linear'
record.annotations['organism'] = 'Example organism'
SeqIO.write(record, 'output.gb', 'genbank')
Common Errors
| Error | Cause | Solution |
|---|---|---|
| TypeError: SeqRecord expected | Passed raw string/Seq | Wrap in SeqRecord object |
| ValueError: missing molecule_type | GenBank without annotations | Add record.annotations['molecule_type'] = 'DNA' |
| ValueError: missing quality scores | FASTQ without phred_quality | Add quality scores to letter_annotations |
| ValueError: Sequences must all be the same length | PHYLIP with unequal lengths | Pad or trim sequences first |
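The FASTQ and GenBank rows above can be reproduced with a short sketch. It assumes Biopython 1.78 or later, where the molecule type comes from `record.annotations`. The helper `try_write` is hypothetical and exists only for this example. Exact error messages vary between Biopython versions, so the sketch prints them rather than matching on their text.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq('ATGCGATCG'), id='read1', description='')

def try_write(rec, fmt):
    """Hypothetical helper: return None on success, else the error text."""
    try:
        SeqIO.write(rec, StringIO(), fmt)
    except (TypeError, ValueError) as err:
        return f'{type(err).__name__}: {err}'
    return None

# No phred_quality yet, so the FASTQ write is expected to fail.
fastq_error = try_write(record, 'fastq')
# No molecule_type yet, so the GenBank write is expected to fail.
genbank_error = try_write(record, 'genbank')
print(fastq_error)
print(genbank_error)

# After adding the missing metadata, both writes should succeed.
record.letter_annotations['phred_quality'] = [30] * len(record.seq)
record.annotations['molecule_type'] = 'DNA'
fixed = [try_write(record, fmt) for fmt in ('fastq', 'genbank')]
print(fixed)
```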
Format-Specific Requirements
FASTQ
Must have quality scores:
record.letter_annotations['phred_quality'] = [30] * len(record.seq)
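The quality list needs one score per base. As a rough sketch of how that plays out, the example below assigns a list of the wrong length and then writes a valid record. It catches both TypeError and ValueError because the exception type raised on a length mismatch is an assumption here, and the flag `mismatch_rejected` is illustrative.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq('ATGCGATCG'), id='read1', description='')

# One score per base: accepted.
record.letter_annotations['phred_quality'] = [30] * len(record.seq)

# A list of the wrong length is rejected when it is assigned,
# leaving the previously stored scores in place.
try:
    record.letter_annotations['phred_quality'] = [30, 30]
    mismatch_rejected = False
except (TypeError, ValueError) as err:
    mismatch_rejected = True
    print(f'{type(err).__name__}: {err}')

# 'fastq' uses the Sanger encoding: Phred 30 becomes '?' (ASCII 33 + 30).
handle = StringIO()
SeqIO.write(record, handle, 'fastq')
print(handle.getvalue())
```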
GenBank/EMBL
Must have molecule_type:
record.annotations['molecule_type'] = 'DNA' # or 'RNA', 'protein'
PHYLIP
All sequences must be the same length, and IDs are truncated to 10 characters.
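One way to meet the equal-length requirement is to pad shorter sequences with gap characters before writing. This is a sketch only: whether right-padding with '-' is appropriate depends on your data, and the record IDs and the `padded` list are illustrative assumptions.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Illustrative records of unequal length; IDs are kept under 10 characters.
records = [
    SeqRecord(Seq('ATGCGA'), id='taxonA'),
    SeqRecord(Seq('ATGC'), id='taxonB'),
]

# Pad shorter sequences on the right with '-' up to the longest length.
width = max(len(rec.seq) for rec in records)
padded = [
    SeqRecord(Seq(str(rec.seq).ljust(width, '-')), id=rec.id)
    for rec in records
]

handle = StringIO()
count = SeqIO.write(padded, handle, 'phylip')
print(handle.getvalue())
```

Keeping IDs unique within their first 10 characters avoids clashes after truncation.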
Related Skills
- read-sequences - Read sequences before modifying and writing
- format-conversion - Direct format conversion without intermediate processing
- filter-sequences - Filter sequences before writing subset
- sequence-manipulation/seq-objects - Create SeqRecord objects to write
- alignment-files - For SAM/BAM output, use samtools/pysam