name: create-investigation description: Create a new Ethereum data investigation page in the notebooks repo. Use when the user wants to add a new investigation, analysis, or research page that queries Xatu/ClickHouse data and visualizes findings with charts.
Create Investigation
Investigations are one-off analyses of something in Ethereum.
Golden Rules:
- They must use a fixed time range for the data so that the analysis is reproducible.
- Agents must actually check the data - do not make up conclusions. It's better to not write a conclusion than to write a conclusion that is not supported by the data.
- Ask the user for clarifying questions. Be thorough.
Requirements to Gather
- Title: Investigation title (e.g., "Head Vote Accuracy by Entity Type")
- Slug: URL-safe identifier (e.g., "head-accuracy-by-entity")
- Description: Brief description for SEO and sidebar hover
- Author: Username from supported list (samcm, parithosh, pk910, savid, skylenet, mattevans, qu0b, barnabasbusa, ethpandaops)
- Tags: Relevant tags for categorization
- Network: mainnet, sepolia, hoodi, etc.
- Time Range: Fixed start and end dates (investigations MUST use fixed time ranges for reproducibility)
- Research Question: The specific question being investigated
File Structure
Create the investigation at pages/YYYY-MM/{slug}/index.md where YYYY-MM is the current year-month.
If the year-month folder doesn't exist, create pages/YYYY-MM/index.md:
---
title: YYYY Mon
sidebar_position: 1
---
Investigations from Month Year.
Page Template
---
title: {Title}
sidebar_position: {N}
description: {Brief description}
date: {YYYY-MM-DDTHH:MM:SSZ}
author: {username}
tags:
- tag1
- tag2
---
<script>
import PageMeta from '$lib/PageMeta.svelte';
import Section from '$lib/Section.svelte';
import SqlSource from '$lib/SqlSource.svelte';
</script>
<PageMeta
date="{YYYY-MM-DD}"
author="{username}"
tags={["tag1", "tag2"]}
description="{Brief description}"
networks={["Ethereum Mainnet"]}
startTime="{YYYY-MM-DD}T00:00:00Z"
endTime="{YYYY-MM-DD}T23:59:59Z"
/>
```sql query_name
SELECT ... FROM xatu_cbt.table_name
Question
{The specific research question being investigated}
Background
{Context and explanation of concepts. Use bold for key terms being defined.}
Investigation
When {Action}
{Explanation of what this analysis shows}
<LineChart data={query_name} x="x_column" y={["Series 1", "Series 2", "Series 3"]} sort=false title="Chart Title" yFmt="num2" chartAreaHeight=400 yMax=100 colorPalette={['#2563eb', '#ea580c', '#16a34a']} echartsOptions={{ title: {left: 'center'}, grid: {bottom: 50, left: 70, top: 60, right: 120}, xAxis: {type: 'category', name: 'X Axis Label', nameLocation: 'center', nameGap: 35}, yAxis: {min: 0, max: 100}, legend: {show: true, right: 10, orient: 'vertical', top: 'center'}, series: [ {name: 'Series 1', lineStyle: {width: 3}}, {name: 'Series 2', lineStyle: {width: 2}}, {name: 'Series 3', lineStyle: {width: 2}} ], graphic: [{ type: 'text', left: 15, top: 'center', rotation: Math.PI / 2, style: { text: 'Y Axis Label', fontSize: 12, fill: '#666' } }] }} />
Takeaways
- Key finding 1
- Key finding 2
- Key finding 3
Critical Rules
SQL queries MUST be at top level - Not wrapped in HTML elements or Sections. Evidence's preprocessor won't process them otherwise.
Fixed time range required - PageMeta MUST have
startTimeandendTimeprops for reproducibility.Escape
<and>in prose - Use inline code backticks:`z < 0`notz < 0.Charts require three labels:
titleprop (centered viatitle: {left: 'center'})- X-axis label via
xAxis: {name: '...', nameLocation: 'center', nameGap: 35} - Y-axis label via
graphicelement (NOTyAxis.namewhich doesn't center properly)
Action-based section headers - Use "When Attesting" not "Attester Analysis".
No "Analysis" suffix in titles - "Analysis" is implied; use "RPC Snooper Overhead" not "RPC Snooper Overhead Analysis".
Don't repeat header - The title from frontmatter is already rendered by the layout.
Sort time series data ascending - SQL queries for charts MUST include
ORDER BY date_column ASCto ensure data is sorted chronologically. Charts will display incorrectly if data is not sorted.Use high-contrast colors - Multi-line charts MUST use
colorPalettewith high-contrast colors. Recommended palette:- Red:
#dc2626 - Blue:
#2563eb - Purple:
#9333ea - Green:
#16a34a - Orange:
#ea580cExample:colorPalette={['#2563eb', '#ea580c', '#16a34a']}
- Red:
Line styling for emphasis - Make the primary metric line thicker (width: 3), use dashed lines for secondary metrics like averages/means:
series: [ {name: 'Primary', lineStyle: {width: 3}}, {name: 'Secondary', lineStyle: {width: 2, type: 'dashed'}} ]Reference lines with markLine - For horizontal reference lines (e.g., "random chance"), use
markLineNOT a separate series (which corrupts the x-axis). Position label inside the chart:series: [{ markLine: { silent: true, symbol: 'none', label: {show: true, position: 'insideEndTop', formatter: 'Label text'}, lineStyle: {type: 'dashed', color: '#888'}, data: [{yAxis: 0.56}] } }]Human-readable SQL column names - Use column aliases that will appear nicely in chart legends:
SELECT hour, round(avg(our_time)) as "Our Node", round(avg(median_time)) as "Network Median"Don't describe chart features that don't exist - Never claim "tight IQR band", "green for negative values", or specific colors in prose unless the chart actually shows them. Verify visually before writing conclusions.
Per-block comparisons for timing analysis - When comparing timing between nodes, calculate metrics per-block first, then aggregate. Don't compare raw averages across all data which can be misleading.
SQL Source Files
For reusable queries, create sources/xatu_cbt/{query_name}.sql:
SELECT
date_column,
column1,
column2
FROM xatu_cbt.table_name
WHERE slot_start_date_time >= '2025-12-21'
AND slot_start_date_time < '2026-01-21'
ORDER BY date_column ASC
Then reference in the page:
SELECT * FROM xatu_cbt.{query_name}
Available Chart Types
LineChart, BarChart, AreaChart, ScatterPlot, DataTable, BigValue, Value