---
name: datafusion-docs
description: >
  Search Apache DataFusion documentation, user guide, and API reference.
  Returns relevant documentation for a question or keyword. Searches the
  official DataFusion repository and website.
argument-hint:
allowed-tools: Bash
---
You are helping the user find relevant Apache DataFusion documentation.
Query: $@
Follow these steps in order.
Step 1 — Extract search terms
If the input is a natural language question (e.g. "how do I create an external table"), extract the key technical terms: nouns, function names, SQL keywords. Drop stop words.
If the input is already a function name or technical term (e.g. APPROX_PERCENTILE_CONT, CREATE EXTERNAL TABLE), use it as-is.
Use the extracted terms as SEARCH_QUERY in the next steps.
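As a rough illustration of Step 1, the sketch below filters stop words out of a question. It is not part of the original skill: the helper name `extract_terms` and its short stop-word list are assumptions made for this example, and a plain word filter will miss cases that need judgment.

```shell
# Hypothetical helper: reduce a natural-language question to search terms.
# The stop-word list is illustrative and deliberately short.
extract_terms() (
  # The subshell body keeps variables and "set -f" local to the function.
  set -f   # no glob expansion while splitting the question into words
  stop=" a an the how do does i is to in of for with can what and or "
  out=""
  for word in $1; do
    # Strip punctuation such as "?" or quotes; keep letters, digits, and "_".
    word=$(printf '%s' "$word" | tr -cd '[:alnum:]_')
    [ -n "$word" ] || continue
    # Lowercase only for the stop-word comparison; the original casing is kept.
    lower=$(printf '%s' "$word" | tr '[:upper:]' '[:lower:]')
    case "$stop" in
      *" $lower "*) continue ;;
    esac
    out="${out:+$out }$word"
  done
  printf '%s\n' "$out"
)

SEARCH_QUERY=$(extract_terms "how do I create an external table")
echo "$SEARCH_QUERY"   # prints: create external table
```

A single technical term such as APPROX_PERCENTILE_CONT passes through unchanged, which matches the "use it as-is" rule above.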
Step 2 — Search the DataFusion source documentation
The DataFusion user guide is in the GitHub repo under docs/. Search it using gh:
Important: Do NOT quote multi-word search terms as a single string. Pass each word
as a separate token so gh search code matches broadly. For example, use
EXTERNAL TABLE not "EXTERNAL TABLE".
gh search code $SEARCH_QUERY --repo apache/datafusion --language markdown --limit 10
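If the gh search code subcommand is not available (for example, on an older gh release), fall back to the GitHub search API through gh api. Join multi-word terms with + in the URL: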
gh api "search/code?q=$SEARCH_QUERY+repo:apache/datafusion+extension:md&per_page=10" --jq '.items[:10][] | "\(.path)"'
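The search/code endpoint takes the whole query in a URL, so spaces between terms need to become `+`. A minimal sketch of that step follows; `build_search_path` is a hypothetical helper name, and it does not percent-encode other special characters, so it assumes plain alphanumeric terms.

```shell
# Hypothetical helper: build the search/code request path for gh api.
build_search_path() {
  # Join whitespace-separated terms with "+" as the query separator.
  printf 'search/code?q=%s+repo:apache/datafusion+extension:md&per_page=10\n' \
    "$(printf '%s' "$1" | tr -s '[:space:]' '+')"
}

build_search_path "EXTERNAL TABLE"
# prints: search/code?q=EXTERNAL+TABLE+repo:apache/datafusion+extension:md&per_page=10
```

The result can then be passed to the fallback command in place of the hand-written string, for example `gh api "$(build_search_path "$SEARCH_QUERY")" --jq '.items[].path'`.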
Step 3 — Search for SQL function documentation
DataFusion's built-in functions are documented in docs/source/user-guide/sql/. Check specifically:
gh search code $SEARCH_QUERY --repo apache/datafusion --language markdown --limit 5 -- path:docs/source/user-guide/sql/
Also list the available SQL doc files so you can fetch the most relevant one directly:
gh api "repos/apache/datafusion/contents/docs/source/user-guide/sql" --jq '.[].name' 2>/dev/null
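One way to narrow the file listing to likely candidates is to match file names against the search terms. The sketch below assumes the listing arrives one name per line on stdin; `match_doc_files` is a hypothetical helper, and terms containing regex metacharacters would need escaping first.

```shell
# Hypothetical helper: keep only file names that contain any search term.
match_doc_files() {
  # "window functions" becomes the extended regex "window|functions".
  grep -i -E -- "$(printf '%s' "$1" | tr -s '[:space:]' '|')"
}

# Sample names taken from the quick-reference table in this document.
printf '%s\n' aggregate_functions.md ddl.md window_functions.md |
  match_doc_files "window"
# prints: window_functions.md
```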
Step 4 — Search for code examples
If the query is about API usage or implementation patterns, search Rust source code:
gh search code $SEARCH_QUERY --repo apache/datafusion --language rust --limit 5
Step 5 — Fetch and present relevant content
For the most relevant results (top 2-3), fetch the actual content:
gh api "repos/apache/datafusion/contents/<path>" --jq '.content' | base64 -d
If the file is too large, fetch just the relevant section. Look for the search terms in the content and extract the surrounding context (heading + content under that heading).
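Extracting "heading + content under that heading" can be approximated with awk. This is a sketch under stated assumptions, not the skill's prescribed method: `extract_section` is a hypothetical helper, it matches the term as one phrase against ATX-style headings (`#`, `##`, ...), and it will misread `# ` comment lines inside fenced code blocks as headings.

```shell
# Hypothetical helper: print the first Markdown section whose heading contains
# the term (case-insensitive), stopping at the next heading of the same or a
# higher level. Reads the document on stdin.
extract_section() {
  awk -v term="$1" '
    BEGIN { term = tolower(term); depth = 0 }
    /^#+ / {
      # The position of the first non-"#" character gives the heading level.
      level = match($0, /[^#]/) - 1
      if (depth && level <= depth) exit   # a peer or parent heading ends the section
      if (!depth && index(tolower($0), term)) depth = level
    }
    depth { print }
  '
}

# Small made-up document to show the behaviour.
printf '%s\n' '# SQL' '' '## CREATE EXTERNAL TABLE' 'Registers a file.' \
  '### Example' 'see below' '## DROP TABLE' 'Removes a table.' |
  extract_section "external table"
```

On the sample input this prints the CREATE EXTERNAL TABLE heading, its body, and the nested Example subsection, and stops before DROP TABLE. In practice the decoded output of the fetch command above would be piped into the helper instead of the sample text.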
Step 6 — Present findings
Organize the results by relevance:
- Most relevant: Direct documentation for the queried topic
- Examples: Code examples showing usage
- Related: Related documentation that might be helpful
For each result, provide:
- The document title or section heading
- A brief summary of what it covers
- The source URL (GitHub link to the file)
- Key code snippets if applicable
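The source URL for each result can be derived from the path that the search returns. The sketch below assumes the repository's default branch is `main`; `source_url` is a hypothetical helper, and the optional second argument lets you substitute another branch or tag.

```shell
# Hypothetical helper: turn a repo-relative path into a browsable GitHub URL.
# Assumes the "main" branch unless a ref is given as the second argument.
source_url() {
  printf 'https://github.com/apache/datafusion/blob/%s/%s\n' "${2:-main}" "$1"
}

source_url docs/source/user-guide/sql/ddl.md
# prints: https://github.com/apache/datafusion/blob/main/docs/source/user-guide/sql/ddl.md
```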
Step 7 — Suggest follow-ups
If the search didn't find exactly what the user needed:
You can also check the DataFusion user guide at https://datafusion.apache.org/user-guide/ or the API docs at https://docs.rs/datafusion/latest/datafusion/
If the query is about a specific SQL function or setting, list related configuration options with:
datafusion-cli -c "SELECT * FROM information_schema.df_settings WHERE name LIKE '%<keyword>%';"
Quick reference — Common DataFusion topics
For faster lookups, here are paths to key documentation sections:
| Topic | Path in repo |
|---|---|
| SQL Reference | docs/source/user-guide/sql/ |
| Scalar Functions | docs/source/user-guide/sql/scalar_functions.md |
| Aggregate Functions | docs/source/user-guide/sql/aggregate_functions.md |
| Window Functions | docs/source/user-guide/sql/window_functions.md |
| CREATE EXTERNAL TABLE | docs/source/user-guide/sql/ddl.md |
| Data Types | docs/source/user-guide/sql/data_types.md |
| Configuration | docs/source/user-guide/configs.md |
| Python Bindings | docs/source/user-guide/python/ |
| Library Usage | docs/source/library-user-guide/ |