bio-genome-engineering-grna-design


Designs and ranks guide RNAs (sgRNAs) for CRISPR-Cas9/Cas12a gene knockout by scanning a target for PAM sites (NGG SpCas9, NNGRRT SaCas9, TTTV Cas12a, NG SpCas9-NG, near-PAMless SpRY), enumerating candidate spacers, applying hard filters (Pol-III TTTT terminator, 5' G, GC), ranking on-target activity with the context-appropriate model (Rule Set 2/Azimuth for U6/lentiviral, CRISPRscan for T7/embryo, DeepHF for high-fidelity variants, DeepCpf1 for Cas12a), and predicting the indel/frameshift outcome (Bae out-of-frame score, inDelphi, FORECasT, Lindel). Use when selecting sgRNAs to knock out a gene, choosing a nuclease/PAM for a constrained locus, picking which exon to target, or shortlisting guides before an off-target check. Off-target specificity, base/prime editing, and HDR donors are separate skills.

By GPTomics. Updated 6/15/2026

name: bio-genome-engineering-grna-design
tool_type: python
primary_tool: CRISPOR

Version Compatibility

Reference examples tested with: BioPython 1.83+, CRISPOR 5.0+ (web/CLI).

Before using code patterns, verify installed versions match. If versions differ:

  • Python: pip show <package> then help(module.function) to check signatures
  • CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Output depends on inputs more than tool versions: on-target scores are model-specific and not interchangeable (a 0.7 Azimuth score is not a 0.7 CRISPRscan score), and the valid model is set by how the guide is delivered/transcribed, not by preference. Record the nuclease, the delivery context (U6/lentiviral vs in-vitro T7/RNP), and the reference genome build used for any off-target step.

Guide RNA Design

"Design guide RNAs to knock out my gene" -> Establish the delivery context, scan the target for the nuclease's PAM on both strands, drop guides that fail hard filters, rank survivors with the context-valid on-target model, choose the cut site by exon/transcript biology, and prefer guides whose predicted indel spectrum is frameshift-rich.

  • Python: enumerate PAMs and apply hard filters with Bio.Seq + re; compute a Bae-style microhomology out-of-frame score
  • CLI/web: crispor.py <genome> in.fa out.tsv aggregates the context-appropriate on-target score + off-target nomination per genome
  • Web/code: inDelphi / FORECasT / Lindel for the full repair-outcome distribution

The Single Most Important Modern Insight -- a guide produces a reproducible indel distribution, not "a cut", and knockout success is a property of that distribution

Two facts that naive designs ignore, and whose omission routinely passes review:

  1. On-target efficiency scores are weak, context-locked predictors. Rule Set 2, CRISPRscan, and DeepCas9 scores correlate with measured cutting at only Spearman ~0.4 across realistic contexts (~0.7 is the ceiling even within one matched context; the same guides re-tested in another cell line correlate ~0.37-0.48). Each was trained on one assay -- U6-Pol-III lentiviral vs in-vitro T7 vs RNP -- and does not transfer across nuclease, delivery, promoter, cell type, or temperature (Haeussler 2016). Using CRISPRscan (T7/zebrafish-trained) to rank guides for a U6 lentiviral screen is a category error. Rank to shortlist, then design 3-6 guides and validate -- never trust the rank as truth.

  2. Efficient editing is not knockout. A cut yields a characteristic, reproducible set of indels (Shen 2018; Allen 2019; Chen 2019); roughly 1/3 of indels are in-frame, so a 95%-efficient guide can still leave functional protein. Worse, even a confirmed frameshift may not eliminate protein -- translation reinitiation, exon skipping, NMD escape, and transcriptional adaptation rescue ~1/3 of verified knockouts (Smits 2019; Mou 2017; El-Brolosy 2019). So the modern question is "which guide, at which site, produces a high out-of-frame fraction in an NMD-competent, constitutive transcript region?" -- couple an outcome model to exon biology, not just an efficiency score. Verify the knockout at the protein level.
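The arithmetic behind point 2 can be sketched as follows. This is a minimal illustration, assuming the two alleles of a diploid cell are edited independently and that every frameshift is a true null (both are simplifications; the function names are hypothetical).

```python
def frameshift_allele_fraction(editing_efficiency, out_of_frame_fraction):
    """Fraction of alleles that end up carrying a frameshift indel."""
    return editing_efficiency * out_of_frame_fraction


def all_alleles_frameshift_fraction(editing_efficiency, out_of_frame_fraction, ploidy=2):
    """Fraction of cells with every allele frameshifted.

    ASSUMPTION: alleles are edited independently of each other.
    """
    per_allele = frameshift_allele_fraction(editing_efficiency, out_of_frame_fraction)
    return per_allele ** ploidy


# A 95%-efficient guide where roughly 1/3 of indels are in-frame.
per_allele = frameshift_allele_fraction(0.95, 2 / 3)
per_cell = all_alleles_frameshift_fraction(0.95, 2 / 3)
print(f"{per_allele:.2f} of alleles, {per_cell:.2f} of diploid cells")
# prints "0.63 of alleles, 0.40 of diploid cells"
```

Under these assumptions a guide that looks nearly perfect by indel percentage leaves most cells with at least one allele that is not frameshifted, before any of the protein-level rescue mechanisms are counted.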

On-Target Score Taxonomy -- each model is valid for ONE context

| Model | Citation | Trained on (valid for) | Notes |
|---|---|---|---|
| Rule Set 1 | Doench 2014 Nat Biotechnol 32:1262 | U6 mammalian | superseded; origin of GC/position rules |
| Rule Set 2 / Azimuth | Doench/Fusi 2016 Nat Biotechnol 34:184 | U6/lentiviral mammalian KO -- the default for screens & cell lines | gradient-boosted; best U6 predictor (Haeussler 2016) |
| CRISPRscan | Moreno-Mateos 2015 Nat Methods 12:982 | in-vitro T7 / embryo injection -- NOT U6 | wrong tool for lentiviral screens |
| DeepSpCas9 | Kim 2019 Sci Adv 5:eaax9249 | SpCas9 mammalian; strong transfer | CNN |
| DeepHF | Wang 2019 Nat Commun 10:4284 | conditions on the enzyme variant (WT, eSpCas9, HF1) | use when using a high-fidelity Cas9 |
| DeepCpf1 / Seq-deepCpf1 | Kim 2018 Nat Biotechnol 36:239 | AsCas12a (Deep adds chromatin) | use for Cas12a, not Cas9 |

Treat any score as a rank-and-shortlist signal (Spearman ~0.4 across context), never an oracle.
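One way to enforce "pick the model from the context, not from preference" is to resolve the model name from the nuclease and delivery route before any score is read. The sketch below encodes the table above; the function name and the string labels for nucleases and delivery routes are hypothetical, and it returns only a model name, not a score.

```python
def pick_on_target_model(nuclease, delivery):
    """Map (nuclease, delivery context) to the on-target model valid for it.

    ASSUMPTION: the label vocabulary ('U6', 'lentiviral', 'T7', 'embryo', ...)
    is illustrative; adapt it to however the project records delivery.
    """
    if nuclease == "AsCas12a":
        return "DeepCpf1"
    if nuclease in {"eSpCas9", "SpCas9-HF1"}:
        return "DeepHF"
    if nuclease == "SpCas9":
        if delivery in {"U6", "lentiviral"}:
            return "Rule Set 2 / Azimuth"
        if delivery in {"T7", "embryo"}:
            return "CRISPRscan"
    # Refuse to guess: an out-of-context score is worse than no score.
    raise ValueError(f"no context-valid model recorded for {nuclease!r} / {delivery!r}")


print(pick_on_target_model("SpCas9", "lentiviral"))  # prints "Rule Set 2 / Azimuth"
print(pick_on_target_model("SpCas9", "T7"))          # prints "CRISPRscan"
```

Raising on an unknown combination is deliberate: falling back to a default model would reproduce the category error described above.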

Nuclease & PAM Taxonomy -- expanding PAM range trades away activity/specificity

| Nuclease | PAM | Guide | Cut | When |
|---|---|---|---|---|
| SpCas9 (WT) | 5'-NGG-3' | 20 nt | blunt, ~3 bp 5' of PAM | default workhorse; most data, most scores |
| SaCas9 | 5'-NNGRRT-3' | ~21 nt | blunt | ~1 kb smaller -> fits a single AAV (Ran 2015) |
| SpCas9-NG | 5'-NG-3' | 20 nt | blunt | relaxed PAM; lower activity at many sites (Nishimasu 2018) |
| xCas9 | NG, GAA, GAT | 20 nt | blunt | broad PAM, high specificity, site-variable/modest activity (Hu 2018) |
| SpRY | near-PAMless (NRN>NYN) | 20 nt | blunt | "target anywhere"; pays in activity + off-target breadth (Walton 2020) |
| AsCas12a / LbCas12a | 5'-TTTV-3' (5' PAM) | ~20-23 nt | staggered 5' overhang | AT-rich targets; self-processing crRNA array = easy multiplexing |
| enAsCas12a | expanded (TTTV + non-canonical) | ~20-23 nt | staggered | ~2x activity + broadened range (Kleinstiver 2019) |

Default to WT-SpCas9-NGG; escalate to NG/xCas9/SpRY only when no acceptable NGG sits in the required window, and expect to validate harder (the valid on-target score and the off-target burden both change).
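The PAMs in the table are written in IUPAC degenerate notation, so a single scanner can cover all of them by translating the PAM into a regular expression. A minimal sketch, covering only the IUPAC codes used above and scanning one strand (helper names are hypothetical):

```python
import re

# Only the degenerate codes that appear in the PAMs listed above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_regex(pam):
    """Compile an IUPAC PAM into a zero-width lookahead so overlapping sites are kept."""
    body = "".join(IUPAC[base] for base in pam.upper())
    return re.compile(f"(?=({body}))")


def pam_positions(sequence, pam):
    """0-based start positions of every PAM match on the given strand."""
    return [m.start() for m in pam_regex(pam).finditer(sequence.upper())]


print(pam_positions("ACGTGGAAGG", "NGG"))   # prints [3, 7]
print(pam_positions("TTTACCTTTT", "TTTV"))  # prints [0]  (TTTT is not TTTV)
```

The lookahead matters for relaxed PAMs: with NG or NRN nearly every position matches, and a consuming pattern would silently skip overlapping sites.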

Decision Tree by Scenario

| Scenario | Recommended | Why |
|---|---|---|
| Single-gene KO, NGG in an early constitutive exon | SpCas9 + Rule Set 2/Azimuth shortlist -> outcome model -> off-target | frameshift in an NMD-competent exon kills all isoforms |
| In-vitro-transcribed / embryo / RNP injection | score with CRISPRscan, apply T7 (not U6) filters | Rule Set 2 is invalid here; TTTT/5'G Pol-III rules do not apply |
| AT-rich target, no good NGG; or multiplex KO | Cas12a (TTTV) + DeepCpf1 | PAM availability and crRNA-array multiplexing, not on-target score, are limiting |
| AAV in-vivo delivery | SaCas9 (NNGRRT) | packaging limit dictates the compact nuclease, which dictates the PAM set |
| Functional/negative-selection screen | tile sgRNAs across the conserved functional domain (Shi 2015) | domain indels are LoF even in-frame -> more true nulls than 5'-exon targeting |
| Have ranked candidates, need specificity | -> off-target-prediction | on-target score does not predict specificity |
| Scale to many genes | -> crispr-screens/library-design | pooled library construction |
| Single base change / no DSB tolerated | -> base-editing-design or prime-editing-design | scarless, DSB-free; KO-by-stop also avoids indels |
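The nuclease-choosing rows of the table can be read as an ordered set of checks. The sketch below is one possible encoding; the precedence (packaging limit first, then multiplexing or AT-rich targets, then NGG availability) is an assumption, and the function and argument names are hypothetical.

```python
def choose_nuclease(aav_delivery=False, has_good_ngg=True, at_rich=False, multiplex=False):
    """Return (nuclease, PAM) for a knockout scenario.

    ASSUMPTION: the order of the checks is an illustrative reading of the
    decision table, not a rule stated in it.
    """
    if aav_delivery:
        return ("SaCas9", "NNGRRT")      # packaging limit dictates the compact nuclease
    if multiplex or (at_rich and not has_good_ngg):
        return ("Cas12a", "TTTV")        # crRNA arrays / AT-rich PAM availability
    if has_good_ngg:
        return ("SpCas9", "NGG")         # default workhorse
    return ("SpCas9-NG", "NG")           # relaxed PAM only when no acceptable NGG exists


print(choose_nuclease())                                    # prints ('SpCas9', 'NGG')
print(choose_nuclease(has_good_ngg=False, at_rich=True))    # prints ('Cas12a', 'TTTV')
```

Whatever nuclease comes back, the valid on-target model and the off-target burden change with it, so the choice should be recorded alongside the guides.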

Enumerate and Filter Candidate Guides

Goal: Return valid candidate spacers for a target, on both strands, dropping guides that cannot work in the chosen delivery context.

Approach: Scan both strands for the nuclease's PAM, extract the protospacer upstream (Cas9) or downstream (Cas12a) of each PAM, and apply hard filters -- reject TTTT (Pol-III terminator) for U6/H1 expression, flag a missing 5' G for U6 (prepend a G rather than replace the first base), and note GC outside ~40-70% as a soft penalty. Ranking comes from the context-valid model (route to CRISPOR), not from a hand-rolled score.

from Bio.Seq import Seq
import re

GC_MIN, GC_MAX = 0.40, 0.70   # outside this band on-target activity falls off (Doench 2014); soft penalty

def find_guides(sequence, pam='NGG', guide_length=20):
    '''Enumerate SpCas9 (NGG) spacers on both strands; spacer is 5' of the PAM.

    'cut' is a 0-based boundary on the input (+) strand: the break falls
    between sequence[cut - 1] and sequence[cut].
    '''
    if pam != 'NGG':
        # The pattern below is hard-coded; fail loudly instead of silently scanning NGG.
        raise ValueError('find_guides only scans NGG; other PAMs need their own pattern')
    seq = sequence.upper()
    n = len(seq)
    guides = []
    for strand, s in (('+', seq), ('-', str(Seq(seq).reverse_complement()))):
        # Zero-width lookahead keeps overlapping PAMs (e.g. both NGGs in AGGG).
        for m in re.finditer(r'(?=([ACGT]GG))', s):
            pos = m.start()
            if pos < guide_length:
                continue   # not enough sequence 5' of the PAM for a full spacer
            cut = pos - 3  # SpCas9 cuts ~3 bp 5' of the PAM
            guides.append({'spacer': s[pos - guide_length:pos], 'pam': s[pos:pos + 3],
                           'cut': cut if strand == '+' else n - cut,   # map back to + strand
                           'strand': strand})
    return guides

def passes_u6_filters(spacer):
    '''Filters for U6/H1 Pol-III expression (NOT applicable to in-vitro T7/RNP).

    TTTT is a hard reject. The GC band is enforced strictly here for simplicity;
    relax it if GC should act only as a soft penalty.
    '''
    gc = sum(c in 'GC' for c in spacer) / len(spacer)
    return 'TTTT' not in spacer and GC_MIN <= gc <= GC_MAX   # TTTT terminates Pol III
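The approach also calls for Cas12a enumeration, where the geometry is mirrored: the TTTV PAM sits 5' of the protospacer and the spacer is read downstream of it. A minimal sketch under stated assumptions (a 23 nt spacer by default, plain-string reverse complement so it runs without BioPython, hypothetical function names):

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cas12a_guides(sequence, guide_length=23):
    """Enumerate Cas12a (TTTV) spacers on both strands; spacer is 3' of the PAM.

    ASSUMPTION: guide_length=23 is one choice within the ~20-23 nt range.
    """
    seq = sequence.upper()
    guides = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in re.finditer(r"(?=(TTT[ACG]))", s):
            start = m.start() + 4            # first base after the 4 nt PAM
            end = start + guide_length
            if end <= len(s):                # need a full-length spacer downstream
                guides.append({"pam": s[m.start():start],
                               "spacer": s[start:end],
                               "strand": strand})
    return guides


target = "TTTA" + "ACGTACGTACGTACGTACGTACG"
for g in find_cas12a_guides(target):
    print(g["strand"], g["pam"], g["spacer"])
# prints "+ TTTA ACGTACGTACGTACGTACGTACG"
```

Coordinates and the staggered cut position are left out on purpose; they depend on the conventions of the downstream tool.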

Rank On-Target Activity in the Valid Context

Goal: Shortlist guides by predicted cutting using the model that matches the delivery context.

Approach: Do NOT hand-roll a scoring matrix. Route to CRISPOR, which reports the published on-target scores (including Rule Set 2/Azimuth for U6/lentiviral and CRISPRscan for T7/embryo) following the Haeussler 2016 logic, and also nominates off-targets against the chosen genome. Read only the score that matches the delivery context, treat it as a shortlist signal, then carry 3-6 candidates forward.

# CRISPOR: on-target scores + off-target nomination against the chosen genome
# usage: crispor.py <genome> <input FASTA> <output TSV>
crispor.py hg38 target.fa guides.tsv --maxOcc 60000
# output columns include on-target scores and off-target counts/specificity;
# use the on-target column that matches the delivery context
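Once the TSV exists, the shortlist step is a sort on the one score column that is valid for the delivery context. The sketch below uses only the standard library; the column names in the toy table are hypothetical placeholders, not the real CRISPOR header, so check the header of the actual output and pass the matching column name.

```python
import csv
import io


def shortlist_guides(tsv_text, score_column, top_n=6):
    """Return the top_n rows ranked by a caller-chosen numeric score column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if score_column not in (reader.fieldnames or []):
        raise ValueError(f"column {score_column!r} not found in {reader.fieldnames}")
    scored = []
    for row in reader:
        try:
            score = float(row[score_column])
        except (TypeError, ValueError):
            continue                      # non-numeric score: this guide cannot be ranked
        scored.append((score, row))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:top_n]]


# Toy table with HYPOTHETICAL column names and made-up values.
toy = "guideId\ttargetSeq\tscore\ng1\tACGT\t55\ng2\tTTGA\tNA\ng3\tGGCA\t71\n"
top = shortlist_guides(toy, "score", top_n=2)
print([row["guideId"] for row in top])   # prints ['g3', 'g1']
```

Keeping `top_n` in the 3-6 range matches the guidance to carry several candidates forward rather than trusting the first rank.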

Choose the Cut Site by Exon Biology (the under-used lever)

KO success is mostly won here, and pure efficiency ranking fails:

  • Target an early, constitutive coding exon (present in all protein-coding isoforms) -- but not the start-ATG region (downstream reinitiation can rescue an N-terminal truncation).
  • Avoid the last exon and the last ~50 nt of the penultimate exon -- PTCs there escape NMD, leaving a stable, possibly-functional truncated protein.
  • Keep the cut away from splice donor/acceptor sites unless splice disruption is the goal -- indels there cause exon skipping that can restore frame (Mou 2017).
  • Confirm the exon is constitutive in the cell type of interest (an exon spliced out of the dominant isoform is a silent failure), and screen for SNPs under the protospacer/PAM in the actual background (mismatch/PAM loss -> allele dropout).
  • For ruthless KO / screens: tile the conserved functional domain (Shi 2015), not the gene start.
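Two of the exon rules above reduce to coordinate checks once the cut is expressed in spliced coding coordinates. The sketch below is a simplified illustration: it takes a list of per-exon lengths (an assumed input format), treats the cut position as a proxy for where the premature stop will land (the stop actually arises downstream of the indel), and uses the ~50 nt margin from the list above.

```python
from itertools import accumulate


def junctions(exon_lengths):
    """Exon-exon junction positions in spliced coordinates (0-based boundaries)."""
    return list(accumulate(exon_lengths))[:-1]


def nmd_competent(cut, exon_lengths, margin=50):
    """True if the cut lies more than `margin` nt upstream of the last junction.

    ASSUMPTION: the cut position stands in for the premature stop codon.
    """
    js = junctions(exon_lengths)
    if not js:
        return False                      # single-exon transcript: no junction to trigger NMD
    return cut < js[-1] - margin


def distance_to_nearest_junction(cut, exon_lengths):
    """Distance in nt from the cut to the closest exon-exon junction, or None."""
    js = junctions(exon_lengths)
    return min(abs(cut - j) for j in js) if js else None


exons = [120, 200, 150, 300]              # toy transcript; junctions at 120, 320, 470
print(nmd_competent(100, exons))          # prints True
print(nmd_competent(440, exons))          # prints False (last ~50 nt of penultimate exon)
print(distance_to_nearest_junction(100, exons))   # prints 20
```

How close to a junction is too close is left to the caller, since no distance threshold is given here for splice-site avoidance.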

Predict the Editing Outcome (frameshift fraction decides KO)

Goal: Prefer guides whose predicted indel spectrum is frameshift-rich (and, for a single-genotype line, dominated by one outcome).

Approach: Cas9 repair outcomes are predictable from the ~30 bp of local sequence flanking the cut. The cheap, no-ML signal is the Bae 2014 microhomology out-of-frame score: enumerate microhomology pairs flanking the cut, weight each predicted MMEJ deletion, and report the fraction whose length is not a multiple of 3. For a full genotype distribution use inDelphi (Shen 2018), FORECasT (Allen 2019), or Lindel (Chen 2019). Rank by (editing efficiency) x (out-of-frame fraction): a 70%-efficient guide with frameshift fraction 0.9 (0.63 of alleles frameshifted) beats a 90%-efficient guide at 0.5 (0.45). (See examples/grna_design.py for a runnable Bae-style out-of-frame implementation.)
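A compact sketch of a Bae-style out-of-frame score is shown here, independent of the repository example. The weights are assumptions based on one reading of the Bae 2014 scheme (deletion length penalised as exp(-length/20), G/C bases in the microhomology counted double, minimum microhomology of 2 nt) and should be checked against the published implementation before use.

```python
import math


def microhomology_deletions(seq, cut, min_mh=2, length_weight=20.0):
    """List (deletion_length, microhomology, score) for repeats flanking the cut.

    `cut` is a 0-based boundary: the left flank is seq[:cut], the right is seq[cut:].
    Only maximal repeats are kept, so a long repeat is not re-counted as its substrings.
    """
    seq = seq.upper()
    n = len(seq)
    deletions = []
    for i in range(cut):
        for j in range(cut, n):
            if i > 0 and j > cut and seq[i - 1] == seq[j - 1]:
                continue                  # interior of a longer repeat already counted
            k = 0
            while i + k < cut and j + k < n and seq[i + k] == seq[j + k]:
                k += 1
            if k >= min_mh:
                mh = seq[i:i + k]
                gc = sum(base in "GC" for base in mh)
                # ASSUMED weights: A/T = 1, G/C = 2, exponential length penalty.
                score = 100 * math.exp(-(j - i) / length_weight) * ((k - gc) + 2 * gc)
                deletions.append((j - i, mh, score))
    return deletions


def out_of_frame_score(seq, cut, **kwargs):
    """Percent of microhomology-weighted deletions whose length is not a multiple of 3."""
    deletions = microhomology_deletions(seq, cut, **kwargs)
    total = sum(score for _, _, score in deletions)
    if total == 0:
        return None                       # no microhomology found; score undefined
    return 100 * sum(s for length, _, s in deletions if length % 3) / total


print(microhomology_deletions("TTTTTTGCAAAAGCAAAA", 8)[0][:2])   # prints (6, 'GC')
print(out_of_frame_score("TTTTTTGCAAAAGCAAAA", 8))               # prints 0.0
```

In the toy sequence the only repeat is a GC dinucleotide whose MMEJ deletion removes 6 bp, so every predicted deletion is in-frame and the score is 0; a real site yields many competing repeats and an intermediate value.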

Per-Method Failure Modes

"We used the top-ranked guide" with no validation

Trigger: sorting by on-target score and taking #1. Mechanism: scores are Spearman ~0.4 across context. Symptom: confident ranking, poor empirical hit rate. Fix: design 3-6 guides per gene and validate; treat the score as triage.

Score used out of its training context

Trigger: CRISPRscan for a lentiviral screen, or Rule Set 2 for embryo RNP. Mechanism: each model is an assay artifact (Haeussler 2016). Symptom: "principled" but wrong ranking. Fix: pick the score from the delivery context before reading any number.

Efficient cut, no knockout phenotype

Trigger: ranking by editing efficiency. Mechanism: ~1/3 in-frame indels + reinitiation/exon-skipping/NMD-escape/compensation. Symptom: high indel %, residual protein, milder-than-knockdown phenotype. Fix: rank by frameshift fraction (Bae/inDelphi), target early constitutive NMD-competent exons, verify at protein level.

Last-exon / splice-site guide

Trigger: "early exon" applied naively. Mechanism: late PTC escapes NMD; splice-site indel skips the exon. Symptom: stable truncated/reframed protein. Fix: retarget an early constitutive exon away from junctions.

Poly-T or missing 5' G in a U6 construct

Trigger: spacer with TTTT or non-G 5' end expressed from U6/H1. Mechanism: Pol-III termination / poor initiation. Symptom: little or no sgRNA. Fix: reject TTTT; prepend (do not replace) a 5' G. (Irrelevant for in-vitro T7/RNP.)
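The fix for this failure mode is mechanical and can be applied when the construct is built. A minimal sketch for a Cas9 sgRNA expressed from U6 (the function name is hypothetical):

```python
def u6_ready_spacer(spacer):
    """Return the spacer as it should be cloned for U6 expression.

    Rejects TTTT outright and prepends (never replaces) a 5' G when absent,
    so the 20 nt of target complementarity stay intact.
    """
    spacer = spacer.upper()
    if "TTTT" in spacer:
        raise ValueError("spacer contains TTTT, a Pol-III terminator")
    return spacer if spacer.startswith("G") else "G" + spacer


print(u6_ready_spacer("ACCGTAGGCATCAGTCCAGA"))   # prints GACCGTAGGCATCAGTCCAGA
print(u6_ready_spacer("GACCGTAGGCATCAGTCCAG"))   # prints GACCGTAGGCATCAGTCCAG
```

Neither step should be applied to guides destined for in-vitro T7 transcription or RNP delivery, where these Pol-III rules are irrelevant.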

Allele dropout in a non-reference background

Trigger: designing against GRCh38 for a patient/hybrid/cancer line. Mechanism: a SNP in the seed or PAM blocks one allele. Symptom: heterozygous "knockout" with a retained functional allele. Fix: design against the actual genotype.
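Checking a guide against the actual genotype comes down to an interval overlap between the protospacer-plus-PAM footprint and the known variant positions. A minimal sketch, assuming 0-based half-open coordinates on a single contig and a plain list of variant positions already extracted from the sample's calls (names are hypothetical):

```python
def variants_under_guide(footprint_start, footprint_end, variant_positions):
    """Variant positions that fall inside the protospacer + PAM footprint.

    Coordinates are 0-based and half-open: [footprint_start, footprint_end).
    """
    return sorted(p for p in variant_positions if footprint_start <= p < footprint_end)


# Toy example: a 23 bp SpCas9 footprint (20 nt spacer + NGG) starting at position 1000.
hits = variants_under_guide(1000, 1023, [950, 1004, 1022, 1023])
print(hits)   # prints [1004, 1022]
```

Any hit is a reason to redesign against the sample's own sequence or to pick a different guide, since a mismatch or lost PAM on one allele yields the heterozygous result described above.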

Quantitative Thresholds

| Parameter | Value | Source / rationale |
|---|---|---|
| On-target score use | rank/shortlist only; ~0.4 Spearman across context | Haeussler 2016 |
| GC content | ~40-70% (soft penalty) | Doench 2014 |
| Pol-III terminator | reject TTTT (U6/H1 only) | Pol-III termination |
| 5' G (U6) | prepend a G if absent | Pol-III initiation preference |
| SpCas9 cut | ~3 bp 5' of NGG (blunt) | Jinek 2012 |
| Bae out-of-frame score | prefer >66 | Bae 2014 frameshift-reliability recommendation |
| KO ranking | efficiency x out-of-frame fraction | frameshift fraction, not cutting, drives KO |
| Guides per gene | 3-6, validate empirically | scores are weak; redundancy buys back error |
| Exon target | early, constitutive, NMD-competent (not last exon / last ~50 nt of penult.) | PTC must trigger NMD across all isoforms |
| Residual protein after frameshift | expect ~1/3 retain protein | Smits 2019 |
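The KO-ranking row can be applied directly once each candidate has an efficiency estimate and an out-of-frame fraction. A minimal sketch with made-up values and a hypothetical record layout (`efficiency` and `out_of_frame` both expressed as fractions between 0 and 1):

```python
def rank_for_knockout(guides):
    """Sort guides by efficiency x out-of-frame fraction, best first."""
    return sorted(guides, key=lambda g: g["efficiency"] * g["out_of_frame"], reverse=True)


# Made-up candidates for illustration only.
candidates = [
    {"id": "high_cutting", "efficiency": 0.90, "out_of_frame": 0.50},
    {"id": "frameshift_rich", "efficiency": 0.70, "out_of_frame": 0.90},
]
ranked = rank_for_knockout(candidates)
print([g["id"] for g in ranked])   # prints ['frameshift_rich', 'high_cutting']
```

The product is still only a triage signal: the efficiency term inherits the weak, context-locked behaviour of the on-target score, so the 3-6 guides per gene and empirical validation still apply.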

Common Errors

| Error / symptom | Cause | Solution |
|---|---|---|
| No guides found | no PAM in window / wrong PAM for nuclease | try Cas12a (TTTV) for AT-rich; widen window; SpCas9-NG/SpRY as last resort |
| Guide cuts but no KO phenotype | last exon / 3'UTR / in-frame indels / compensation | retarget early constitutive exon; rank by frameshift; verify protein |
| Score looks low for a clearly good guide | score used outside its training context | use the context-valid model |
| Heterozygous result in a non-reference line | SNP under guide/PAM | design against the actual genotype |

References

  • Jinek M, et al. (2012). A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337(6096):816-821.
  • Doench JG, et al. (2014). Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nat Biotechnol 32(12):1262-1267.
  • Doench JG, Fusi N, Sullender M, et al. (2016). Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat Biotechnol 34(2):184-191.
  • Moreno-Mateos MA, et al. (2015). CRISPRscan: designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo. Nat Methods 12(10):982-988.
  • Haeussler M, et al. (2016). Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome Biol 17:148.
  • Kim HK, et al. (2019). SpCas9 activity prediction by DeepSpCas9. Sci Adv 5(11):eaax9249.
  • Wang D, et al. (2019). Optimized CRISPR guide RNA design for two high-fidelity Cas9 variants by deep learning (DeepHF). Nat Commun 10:4284.
  • Kim HK, et al. (2018). Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity (DeepCpf1). Nat Biotechnol 36(3):239-241.
  • Bae S, Kweon J, Kim HS, Kim JS (2014). Microhomology-based choice of Cas9 nuclease target sites. Nat Methods 11(7):705-706.
  • Shen MW, et al. (2018). Predictable and precise template-free CRISPR editing of pathogenic variants (inDelphi). Nature 563(7733):646-651.
  • Allen F, et al. (2019). Predicting the mutations generated by repair of Cas9-induced double-strand breaks (FORECasT). Nat Biotechnol 37(1):64-72.
  • Chen W, et al. (2019). Massively parallel profiling and predictive modeling of the outcomes of CRISPR-Cas9 double-strand break repair (Lindel). Nucleic Acids Res 47(15):7989-8003.
  • Shi J, et al. (2015). Discovery of cancer drug targets by CRISPR-Cas9 screening of protein domains. Nat Biotechnol 33(6):661-667.
  • Smits AH, et al. (2019). Biological plasticity rescues target activity in CRISPR knock outs. Nat Methods 16(11):1087-1093.
  • Mou H, et al. (2017). CRISPR/Cas9-mediated genome editing induces exon skipping by alternative splicing or exon deletion. Genome Biol 18(1):108.
  • El-Brolosy MA, et al. (2019). Genetic compensation triggered by mutant mRNA degradation. Nature 568(7751):193-197.
  • Ran FA, et al. (2015). In vivo genome editing using Staphylococcus aureus Cas9. Nature 520(7546):186-191.
  • Nishimasu H, et al. (2018). Engineered CRISPR-Cas9 nuclease with expanded targeting space (SpCas9-NG). Science 361(6408):1259-1262.
  • Hu JH, et al. (2018). Evolved Cas9 variants with broad PAM compatibility and high DNA specificity (xCas9). Nature 556(7699):57-63.
  • Walton RT, et al. (2020). Unconstrained genome targeting with near-PAMless engineered CRISPR-Cas9 variants (SpRY). Science 368(6488):290-296.
  • Kleinstiver BP, et al. (2019). Engineered CRISPR-Cas12a variants with increased activities and improved targeting ranges (enAsCas12a). Nat Biotechnol 37(3):276-282.
  • Concordet JP, Haeussler M (2018). CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res 46(W1):W242-W245.

Related Skills

  • off-target-prediction - Check genome-wide specificity after on-target design (a separate axis from activity)
  • base-editing-design - DSB-free knockout via premature stop / splice disruption when indels are unwanted
  • prime-editing-design - Scarless small edits without a double-strand break
  • hdr-template-design - Design the donor when the goal is a precise knock-in, not a knockout
  • crispr-screens/library-design - Pool guides into a screening library (domain tiling, Rule Set 2 logic)
  • crispr-screens/crispresso-editing - Quantify indel/editing outcomes from amplicon sequencing
  • primer-design/primer-basics - Design validation/genotyping primers around the cut
  • primer-design/primer-specificity - Confirm genotyping primers are unique near paralogs/off-targets
  • genome-intervals/gtf-gff-handling - Get exon coordinates to restrict guide placement
Install via CLI
npx skills add https://github.com/GPTomics/bioSkills --skill bio-genome-engineering-grna-design