name: qpcr-analysis description: > Analyze QuantStudio qPCR data in a Quarto notebook: load TSV output, expand 96→384-well layout, visualize plate layout and CT heatmap, compute ΔCT/ΔΔCT/RQ, run Z-score significance tests, and produce log₂ RQ bar plots per gene. argument-hint: "[data file path and brief description of the experiment]"
qPCR Analysis Skill
Use this skill to interactively build a qPCR analysis notebook. Follow the interactive-notebook skill for the general kernel-first workflow; this skill covers the qPCR-specific steps and rules.
qPCR notebooks must always have embed-resources: true in the Quarto header
so the HTML is fully self-contained.
⚠️ Critical statistics rule
Never use qPCR technical replicates as independent observations. Always average technical replicates first (per sample × target), then compute statistics on per-sample means. Violating this inflates degrees of freedom and produces false-positive p-values.
Step 0 — Gather information before writing any code
Ask the user for:
- Path to the QuantStudio export file
- The 96-well plate layout: which sample is in each well, and which PCR target (gene of interest + housekeeping gene) is in each row
- Any known-bad wells to exclude
- Which samples to use as the reference group for ΔΔCT, and why
- Experiment metadata: date, cell line, any treatment conditions
The 96→384 expansion pattern is almost always: one 96-well plate stamped into a 384-well plate as 2×2 blocks (each 96-well position → four 384-well positions). Confirm this assumption with the user.
Step 1 — Load data
QuantStudio exports are tab-separated. The header row starts with Well\t; there
is often a UTF-8 BOM on the first line. Empty wells sometimes get junk CT values —
only trust wells present in your layout annotation.
def load_quantstudio_tsv(path):
with open(path) as f:
lines = f.readlines()
header_row = next(
i for i, line in enumerate(lines)
if line.lstrip("").startswith("Well\t")
)
df = pd.read_csv(path, skiprows=header_row, sep="\t")
df = df[df["Well Position"].notna() &
df["Well Position"].str.match(r"^[A-P]\d+$")].copy()
df["CT_numeric"] = pd.to_numeric(df["CT"], errors="coerce")
df["row"] = df["Well Position"].str[0]
df["col"] = df["Well Position"].str[1:].astype(int)
return df.reset_index(drop=True)
Print: total wells loaded, % undetermined, CT range. Show the user before proceeding.
Step 2 — Define the 96→384 layout
Build two 16×24 numpy string matrices (sample_matrix, target_matrix).
Each 96-well row R, col C (0-indexed) → 384-well rows 2R, 2R+1 and cols 2C, 2C+1.
sample_matrix = np.full((16, 24), "", dtype=object)
target_matrix = np.full((16, 24), "", dtype=object)
BAD_WELLS_384 = set() # populate with (row_letter, col_int) tuples
for r96, cols in samples_96.items():
for c96, sname in cols.items():
tgt = target_96[r96]
for dr in range(2):
for dc in range(2):
rr, cc = 2*r96 + dr, 2*c96 + dc
if (chr(ord("A") + rr), cc + 1) in BAD_WELLS_384:
continue
sample_matrix[rr, cc] = sname
target_matrix[rr, cc] = tgt
Also define a sample_meta dict (sample_id → whatever metadata fields the
experiment has) and a group_palette dict (sample group → hex color) here —
they're needed by the download section and plots.
Annotate df with sample and target from the matrices, then build df_ann
(wells with non-empty annotation):
df["sample"] = df.apply(lambda w: sample_matrix[ord(w["row"])-ord("A"), w["col"]-1], axis=1)
df["target"] = df.apply(lambda w: target_matrix[ord(w["row"])-ord("A"), w["col"]-1], axis=1)
df_ann = df[(df["sample"] != "") & (df["target"] != "")].copy()
Step 3 — CT heatmap (diagnose before committing to layout)
Show the CT heatmap first so the user can verify the layout is correct and identify any unexpected empty wells or junk values.
Always use interpolation='nearest' — matplotlib's default bilinear
interpolation blurs discrete well grids.
cmap_obj = plt.colormaps["plasma"].copy()
cmap_obj.set_bad("lightgray")
im = ax.imshow(ct_matrix, cmap=cmap_obj, aspect="auto",
vmin=15, vmax=40, interpolation="nearest")
Overlay 96-well block outlines (white rectangles, 2×2 cells each).
Step 4 — Plate layout figure
After the user confirms the layout is correct, show the annotated plate layout:
color by target gene, label each cell with sample name. Use the same
interpolation='nearest' and block-outline pattern. Add dual axes showing
both 384-well (bottom/left) and 96-well (top/right) coordinates.
Save both figures as PDF to $SCRATCH/$USER/agent_plots/.
Step 5 — Raw CT download link
Build a tidy per-well TSV and embed it as a base64 download link. Always include:
experiment_date, experiment_name, well_384, well_96, sample,
target_gene, CT, plus any experiment-specific metadata columns (cell line,
treatment, sample group, sequence, etc.) drawn from sample_meta.
import base64
from IPython.display import display, HTML
def well384_to_96(row_384, col_384):
r96 = (ord(row_384) - ord("A")) // 2
c96 = (col_384 - 1) // 2
return chr(ord("A") + r96), c96 + 1
# build rows from df_ann + sample_meta ...
download_df = pd.DataFrame(rows)
tsv_bytes = download_df.to_csv(index=False, sep="\t").encode("utf-8")
b64 = base64.b64encode(tsv_bytes).decode("utf-8")
fname = f"{EXPERIMENT_DATE}_{EXPERIMENT_NAME}_raw_CT.tsv"
display(HTML(
f'<a href="data:text/tab-separated-values;base64,{b64}" download="{fname}">'
f'⬇ Download raw CT data ({len(download_df)} wells) — {fname}</a>'
))
Step 6 — ΔCT, ΔΔCT, RQ
Average technical replicates, apply housekeeping gene dropout filter, compute ΔCT. The housekeeping gene and dropout threshold (commonly CT > 28–30) should be confirmed with the user.
ct_avg = (df_ann.groupby(["sample", "target"])["CT_numeric"]
.mean().reset_index().rename(columns={"CT_numeric": "CT_mean"}))
ct_pivot = ct_avg.pivot(index="sample", columns="target", values="CT_mean")
HK_GENE = "GAPDH" # confirm with user
HK_THRESHOLD = 28 # confirm with user
NTC_PREFIXES = ("noTemp",) # confirm naming convention with user
ct_filt = ct_pivot[
(ct_pivot[HK_GENE] <= HK_THRESHOLD) &
(~ct_pivot.index.str.startswith(NTC_PREFIXES))
].copy()
ct_filt["dCT_GeneA"] = ct_filt["GeneA"] - ct_filt[HK_GENE]
Reference group for ΔΔCT: discuss with user — it depends on the experiment
design. Confirm which samples serve as controls and compute ΔΔCT as
dCT_sample − mean(dCT_reference), then RQ = 2^(−ΔΔCT).
Print dropped samples and the resulting ct_filt table.
Step 7 — Z-score significance test
Score each test sample's per-sample mean dCT against the reference distribution.
from scipy.stats import norm
from statsmodels.stats.multitest import multipletests
def run_zscore(test_samples, ref_samples, col_dct, ct_filt):
ref_vals = ct_filt.loc[ref_samples, col_dct].dropna()
ref_mean = ref_vals.mean()
ref_sd = ref_vals.std(ddof=1)
rows = []
for s in test_samples:
if s not in ct_filt.index or pd.isna(ct_filt.loc[s, col_dct]):
continue
z = (ct_filt.loc[s, col_dct] - ref_mean) / ref_sd
rows.append({"sample": s, "z": z, "pval": 2*(1 - norm.cdf(abs(z)))})
res = pd.DataFrame(rows).set_index("sample")
_, qvals, _, _ = multipletests(res["pval"], method="fdr_bh")
res["qval"] = qvals
return res, ref_mean, ref_sd
Run once per gene. BH FDR correction is within-gene.
Step 8 — log₂ RQ bar plots (one per gene)
Separate plot per gene to avoid label crowding. Test samples on top, reference
samples below, separated by a dotted horizontal line. Annotate each test sample
bar with p=X.XXX FDR=X.XXX (plus */**/*** if significant). Twin
y-axis shows raw RQ values.
def pval_label(p, q):
stars = "***" if q < 0.001 else "**" if q < 0.01 else "*" if q < 0.05 else ""
return f"p={p:.3f} FDR={q:.3f}{' ' + stars if stars else ''}"
Save each plot as PDF to $SCRATCH/$USER/agent_plots/.
Render
Render from inside the analysis/ directory (where _quarto.yml lives),
which directs output to ../docs/:
cd analysis && conda run -n py_general render_notebook render YYYYMMDD_name.qmd
(render_notebook is a quarto shim in ~/bin/ that records the render env; see compute-kernel skill.)