name: financial-analytics description: Analyze tuition payments and revenue, outstanding balances and payment status, scholarship awards and distribution, payment methods and transaction patterns, financial aid and coverage rates, net tuition after scholarships, and semester-wise financial breakdowns. Use when user asks about tuition, payments, scholarships, financial aid, or student finances.
Quick Start Guide
Before writing ANY SQL query:
Read metadata files (MANDATORY):
data/metadata/financial_summary_by_student.yamldata/metadata/financial_summary_by_student_sample_data.csv
Identify query pattern from user's question:
- "How many..." → Pattern 1 (Simple Counting)
- "...for each [dimension]" → Pattern 2 (Aggregation by Dimension)
- "Which majors..." → Pattern 3 (Ranking Dimensions)
- "Top N students..." → Pattern 4 (Top N)
- "Breakdown by..." → Pattern 5 (Distribution)
Apply Default Assumptions - Filter for active students unless explicitly stated otherwise
Check Standard Metric Definitions for ambiguous terms like "financial support"
Apply pattern guidelines (see Query Pattern Guidelines section)
Verify with Result Validation Checklist before executing
Default Assumptions (READ THIS FIRST)
⚠️ CRITICAL: These defaults apply to ALL queries unless user explicitly states otherwise:
Student Population Rule
Default: ALWAYS filter for Active students
Apply this filter to ALL query patterns (1-5) by default:
- Pattern 1 (Counting): "How many students have X?" → Filter Active students
- Pattern 2 (Aggregation): "Average Y for each major" → Filter Active students
- Pattern 3 (Ranking): "Which majors receive most X?" → Filter Active students
- Pattern 4 (Top N): Consider context - usually Active students
- Pattern 5 (Breakdown): "Breakdown by payment method" → Filter Active students
Only skip this filter when user explicitly says:
- "all students including graduated"
- "all students regardless of status"
- "historical data for all students"
Present tense verbs indicate current/active students:
- "How many students have outstanding balances?" → Active students
- "Which majors receive most support?" → Active students
- "Show me students with high GPA" → Active students
General Rules:
- CRITICAL: When generating SQL query, you should focus on answering user's question, rather than over-analyzing things you are not asked for.
- CRITICAL: Use Query Pattern Guidelines (1-5) for consistent approach
- CRITICAL: For ambiguous metrics like "financial support", include multiple related metrics (total, average, coverage)
- DO NOT sort by id columns
- DO NOT select unrequested columns
- Use ROUND(AVG(), 1) for averages operation
- Use ROUND(SUM(), 2) for dollar amounts when appropriate
Financial Analytics
Primary Table
financial_summary_by_student
- Metadata:
data/metadata/financial_summary_by_student.yaml - Sample data:
data/metadata/financial_summary_by_student_sample_data.csv
CRITICAL: You MUST read BOTH files before writing ANY SQL query:
- Read
data/metadata/financial_summary_by_student.yamlfor complete schema - Read
data/metadata/financial_summary_by_student_sample_data.csvfor actual data examples
Table Structure
One Record Per Student:
- Each row represents ONE student's complete financial summary
- All metrics are pre-aggregated at the student level
- No need to use
COUNT(DISTINCT student_id)- simpleCOUNT(*)counts students - Student count = Row count (when properly filtered)
Key Column Prefixes:
student_*- Student information (id, name, major, GPA, status, enrollment date)student_department_*- Department for student's majortotal_*- Pre-aggregated totals (tuition due, paid, scholarships)*_scholarships_total- Breakdown by scholarship typelast_payment_*- Most recent payment details- Boolean flags:
has_outstanding_balance,is_scholarship_recipient
⚠️ IMPORTANT - Columns That DON'T EXIST:
- ❌
credit_card_payments_total,bank_transfer_total,check_payments_total,financial_aid_total - ❌
total_payments_count,completed_payments_count,pending_payments_count,failed_payments_count - ❌
payment_success_rate_pct,avg_payment_amount - ❌
fall_2024_tuition,fall_2024_scholarships,spring_2024_tuition, etc. - ❌
is_payment_plan_active,is_financial_aid_recipient - ✅ ONLY
last_payment_methodexists (categorical: 'Credit Card', 'Bank Transfer', 'Check', 'Financial Aid')
Natural Language to SQL Mappings
⚠️ CRITICAL: "Current Students" vs "All Students"
When user asks about current/active students, ALWAYS filter:
WHERE student_status = 'Active'
Triggers:
- "currently enrolled students"
- "active students"
- "students with outstanding balances" (implies current)
- "how many students have..." (present tense = current)
- "which students..." (without past context = current)
Exception: Only include ALL statuses when explicitly asked:
- "all students including graduated"
- "historical data"
- "students who have ever..."
⚠️ CRITICAL: Standard Metric Definitions
When user asks about "financial support" or "scholarships", use these standard metrics:
| Metric Name | SQL Expression | When to Use |
|---|---|---|
| Total Financial Support | SUM(total_scholarships_received) |
"Which majors receive the most financial support" (total dollars across all students) |
| Average Support Per Student | AVG(total_scholarships_received) |
"What's the average financial support per student" |
| Scholarship Coverage Rate | AVG(scholarship_coverage_rate_pct) |
"What percentage of tuition is covered by scholarships" |
Default Behavior for Ambiguous Queries:
- "financial support" without qualifier → Include ALL THREE metrics in the query
- "most financial support" → Order by total dollars (
SUM) but include average for context - Always include
COUNT(*) as student_countto show population size
⚠️ CRITICAL: Ambiguous Terms
| User Says | Ambiguity | Standard Interpretation | SQL Pattern |
|---|---|---|---|
| "most/highest financial support" | Could be total $ or average $ | Total dollars (SUM), but include average for context | SUM(total_scholarships_received) + AVG(total_scholarships_received) |
| "average scholarship coverage rate" | Already clear | Use existing calculated column | AVG(scholarship_coverage_rate_pct) |
| "payment method distribution" | ONLY last payment available | Count by last_payment_method | GROUP BY last_payment_method, COUNT(*) |
| "students with balances" | Could mean > 0 or > threshold | Use boolean flag unless threshold specified | has_outstanding_balance = true OR outstanding_balance > X |
| "breakdown by [dimension]" | Need counts and percentages | Group and count with percentage | GROUP BY [dimension], COUNT(*), percentage calculation |
Student Status Values
| Status | Meaning | Include in "Active" queries? |
|---|---|---|
'Active' |
Currently enrolled | ✅ Yes |
'Graduated' |
Completed program | ❌ No |
'On Leave' |
Temporarily not enrolled | ❌ No |
Scholarship Coverage Thresholds
| Range | Interpretation | Query Pattern |
|---|---|---|
| >= 75% | Most tuition covered | WHERE scholarship_coverage_rate_pct >= 75 |
| 50-74% | Significant coverage | WHERE scholarship_coverage_rate_pct BETWEEN 50 AND 74.99 |
| 25-49% | Partial coverage | WHERE scholarship_coverage_rate_pct BETWEEN 25 AND 49.99 |
| < 25% | Minimal coverage | WHERE scholarship_coverage_rate_pct < 25 |
| 0% | No scholarships | WHERE is_scholarship_recipient = false |
| > 100% | Scholarships exceed payments | WHERE scholarship_coverage_rate_pct > 100 |
Outstanding Balance Patterns
| User Asks About | SQL Pattern |
|---|---|
| "students with outstanding balances" | WHERE has_outstanding_balance = true |
| "balances over $X" | WHERE outstanding_balance > X |
| "balances greater than $X" | WHERE outstanding_balance > X |
| "students who paid in full" | WHERE has_outstanding_balance = false |
| "highest balances" | ORDER BY outstanding_balance DESC |
Aggregation Patterns
| User Asks About | SQL Pattern |
|---|---|
| "How many students..." | SELECT COUNT(*) ... (one row per student) |
| "Average scholarship by major" | SELECT student_major, ROUND(AVG(total_scholarships_received), 1) ... GROUP BY student_major |
| "Average coverage rate for each major" | SELECT student_major, ROUND(AVG(scholarship_coverage_rate_pct), 1) ... GROUP BY student_major |
| "Total revenue" | SELECT SUM(total_tuition_paid) ... |
| "Top students by..." | ORDER BY ... DESC LIMIT N |
| "Students by payment method" | SELECT last_payment_method, COUNT(*) ... GROUP BY last_payment_method |
| "Which majors receive most..." | See "Ambiguous Terms" section - could be SUM or AVG |
Query Pattern Guidelines
Identify the query pattern based on user's question structure:
Pattern 1: Simple Counting
Trigger: "How many students [condition]?" Approach:
- Use
COUNT(*)to count students - Include
WHERE student_status = 'Active'by default - Apply the condition filter
Pattern 2: Aggregation by Dimension
Trigger: "[metric] for each [dimension]" OR "[metric] by [dimension]" Approach:
- ALWAYS filter for active students first using the default student status rule (unless user explicitly says "all students")
- SELECT the dimension column
- Apply appropriate aggregation (AVG, SUM) with ROUND()
- Include
COUNT(*) as student_countfor context - GROUP BY the dimension
- ORDER BY the metric (DESC for "most/highest", ASC for "least/lowest")
Examples:
- "average scholarship coverage rate for each major" → Filter Active students + GROUP BY student_major
- "total revenue by department" → Filter Active students + GROUP BY department
- "payment distribution for each major" → Filter Active students + GROUP BY student_major
Pattern 3: Ranking Dimensions (Ambiguous Metrics)
Trigger: "Which [dimension] [verb] the most [metric]?" (e.g., "Which majors receive most financial support?") Approach:
- SELECT the dimension (e.g., student_major)
- Include
COUNT(*) as student_count - For ambiguous terms like "financial support", include ALL THREE metrics:
SUM(total_scholarships_received)as totalAVG(total_scholarships_received)as average per studentAVG(scholarship_coverage_rate_pct)as coverage rate
- GROUP BY dimension
- ORDER BY the primary interpretation (total for "most", average for "average")
- Filter active students by default
Key Point: This pattern addresses ambiguity by providing multiple metrics rather than guessing which one.
Pattern 4: Top N Individual Records
Trigger: "Top N students by [metric]" OR "Show me [N] students with [condition]" Approach:
- SELECT student identifiers (student_id, student_name)
- SELECT the ranking metric
- SELECT ONLY explicitly requested additional columns (e.g., "along with GPA" → include student_gpa)
- ORDER BY ranking metric DESC
- LIMIT N
- Consider whether to filter Active:
- Include filter for "top students with highest balance" (current context)
- Omit filter if user wants to see enrollment_status variety
Key Point: Minimize column selection - only include what user explicitly requested.
Pattern 5: Breakdown/Distribution
Trigger: "Breakdown by [dimension]" OR "Distribution of [dimension]" Approach:
- ALWAYS filter for active students first using the default student status rule (unless user explicitly says "all students")
- SELECT the dimension
- Include
COUNT(*) as student_count - Include percentage calculation:
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) as percentage - GROUP BY dimension
- ORDER BY student_count DESC
Examples:
- "breakdown of students by payment method" → Filter Active students + GROUP BY last_payment_method
- "distribution of majors" → Filter Active students + GROUP BY student_major
Column Selection Guidelines
CRITICAL: Only SELECT columns that are:
- Grouping dimensions (e.g., student_major when GROUP BY student_major)
- Explicitly requested by user (e.g., "along with their GPA" → include student_gpa)
- Calculated metrics (aggregations like COUNT, AVG, SUM)
- Context count (
COUNT(*) as student_countin GROUP BY queries)
DO NOT include:
- ❌ Unrequested additional context columns
- ❌ Extra scholarship breakdowns unless asked
- ❌ All possible metrics "just in case"
- ❌ Department info unless specifically asked
Examples of column selection:
- User: "average scholarship by major" → SELECT: student_major, AVG(total_scholarships_received), COUNT(*)
- User: "top 10 students by GPA" → SELECT: student_id, student_name, student_gpa (no other columns)
- User: "top 10 by GPA along with their major" → SELECT: student_id, student_name, student_gpa, student_major
- User: "which majors receive most support" → SELECT: student_major, SUM, AVG, coverage rate, COUNT (multiple metrics for ambiguous "support")
Payment Method Limitations
⚠️ CRITICAL: ONLY last_payment_method exists (categorical field)
| What User Wants | What's Available | SQL Pattern |
|---|---|---|
| "Count students by payment method" | ✅ Available | GROUP BY last_payment_method with COUNT(*) |
| "Average payment by method" | ✅ Available | GROUP BY last_payment_method with AVG(last_payment_amount) |
| "Total amount paid via credit card" | ❌ NOT Available | NO column exists for payment method totals |
| "Percentage paid via bank transfer" | ❌ NOT Available | Cannot calculate (no payment totals by method) |
When user asks about payment method totals:
- Explain limitation: "Only last payment method is available, not totals by method"
- Offer alternative: "I can show you student count and average payment amount by last payment method"
Data Quality Notes
- One row per student: Simple
COUNT(*)counts students (no DISTINCT needed) - All metrics pre-aggregated: Scholarships, payments already summed per student
- Boolean flags: Use
has_outstanding_balance,is_scholarship_recipientfor filtering - Coverage rate can exceed 100%: Some students receive scholarships > tuition paid
- No temporal data: No semester breakdowns, only aggregated totals
Semantic Keywords Reference
Use this table to identify query intent and select the correct pattern:
| User Keywords | Query Intent | SQL Pattern | Template Reference |
|---|---|---|---|
| "how many", "count" | Simple counting | SELECT COUNT(*) ... WHERE |
Pattern 1 |
| "for each", "by [dimension]" | Aggregation by dimension | SELECT [dim], AVG/SUM ... GROUP BY |
Pattern 2 |
| "which", "what [dimension]" | Ranking dimensions | SELECT [dim], metrics ... ORDER BY |
Pattern 3 |
| "most", "highest", "top" | Descending order | ORDER BY [metric] DESC |
Pattern 3 or 4 |
| "least", "lowest", "bottom" | Ascending order | ORDER BY [metric] ASC |
Pattern 3 or 4 |
| "top N", "first N", "show me N" | Limited results | LIMIT N |
Pattern 4 |
| "breakdown", "distribution" | Group with percentages | GROUP BY ... with percentage calc |
Pattern 5 |
| "average", "mean" | Average aggregation | AVG() |
Pattern 2 |
| "total", "sum" | Sum aggregation | SUM() |
Pattern 3 |
Result Validation Checklist
Before executing SQL, verify these items:
Filtering & Data Scope
- ⚠️ CRITICAL: Applied default student status filter - Used Active students filter UNLESS user explicitly requested "all students"
- Did NOT filter Active when user asks about "enrollment status" variety or distribution by status
- Used
is_scholarship_recipient = truewhen analyzing only scholarship recipients - Applied correct threshold for outstanding balances (e.g.,
> 3000not> 1000)
Query Pattern Compliance
- Matched query to correct Standard Query Pattern (1-5)
- Followed template structure exactly for chosen pattern
- Included
COUNT(*) as student_countin GROUP BY queries for context - Used
LIMIT Nfor "top N" queries
Column Selection
- Selected ONLY explicitly requested columns
- Did NOT add unrequested context columns or extra metrics
- Included grouping dimension in SELECT when using GROUP BY
- Used appropriate column for requested attribute (e.g.,
student_gpanotcumulative_gpa)
Ambiguous Terms Handling
- For "financial support" queries: included total, average, AND coverage rate metrics
- Ordered by primary metric (total dollars for "most")
- Used correct metric definition from Standard Metrics table
SQL Syntax
- Used ROUND(AVG(), 1) for averages
- Used ROUND(SUM(), 2) for dollar amounts when needed
- Did NOT sort by ID columns (student_id, department_id, etc.)
- Did NOT use
COUNT(DISTINCT student_id)(unnecessary - one row per student) - Used percentage calculation:
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1)for breakdowns
Column Existence
- Did NOT reference non-existent columns:
- Payment method totals (credit_card_payments_total, etc.)
- Payment counts (total_payments_count, etc.)
- Semester fields (fall_2024_tuition, etc.)
- Used ONLY
last_payment_methodfor payment method analysis (categorical only)
Edge Cases
- Handled NULL values appropriately
- Considered that scholarship_coverage_rate_pct can exceed 100%
- Used window function for percentages in breakdown queries
Common Mistakes to Avoid
Query Pattern Violations
❌ DON'T:
- Write ad-hoc queries without considering Pattern Guidelines (1-5)
- Provide only single metric for ambiguous terms like "financial support"
- Add unrequested columns
✅ DO:
- Identify which Pattern (1-5) best matches the user's question
- Follow the pattern approach guidelines consistently
- For ambiguous metrics, include multiple related metrics (total, average, coverage)
Column Selection Errors
❌ DON'T over-select columns:
- Including all scholarship breakdowns when user asks for "average scholarship"
- Adding context columns not requested (e.g., department when not asked)
- Selecting all possible metrics "just in case"
✅ DO:
- Select ONLY: grouping columns + requested metrics + student_count
- Add columns ONLY when explicitly requested by user
- For ambiguous terms: include multiple metrics of SAME concept (total, average, coverage for "financial support")
Non-Existent Columns
❌ DON'T reference columns that don't exist:
credit_card_payments_total,bank_transfer_total,check_payments_totalpayment_success_rate_pct,total_payments_count,avg_payment_amountfall_2024_tuition,spring_2024_tuition(no semester breakdowns)is_payment_plan_active,is_financial_aid_recipient
✅ DO:
- Use ONLY
last_payment_method(categorical) for payment method analysis - Use pre-aggregated totals:
total_tuition_paid,total_scholarships_received - Check metadata files before using any column
Filtering Mistakes
❌ DON'T:
- Forget
student_status = 'Active'when user asks about "current students" - Filter Active when user wants to see "enrollment status" variety
- Use wrong threshold (e.g., > 1000 when user said > 3000)
✅ DO:
- Always filter Active for present-tense queries ("how many students have...")
- Omit status filter when showing status as a dimension
- Use exact thresholds specified by user
Aggregation Mistakes
❌ DON'T:
- Use
COUNT(DISTINCT student_id)(unnecessary - one row per student) - Provide only SUM OR AVG for ambiguous "financial support" queries
- Sort by ID columns (student_id, department_id)
✅ DO:
- Use simple
COUNT(*)to count students - Include multiple metrics (total, average, coverage) for ambiguous terms
- Order by meaningful metrics, not IDs
Consistency Guidelines
These pattern guidelines were validated through consistency testing. Following them ensures:
- ✅ Consistent SQL approach across similar queries
- ✅ Predictable results for similar questions
- ✅ Reduced ambiguity in metric interpretation
- ✅ Standard handling of ambiguous terms
Reference: See consistency_analysis_Jan_27__10_13/ for detailed analysis that informed these guidelines.
Visualizations
- Pie Chart: Payment method distribution (last payment), student status distribution
- Bar Chart: Revenue by department, scholarship by major, outstanding balances by major
- Stacked Bar Chart: Scholarship types by major
- Histogram: Outstanding balance distribution, scholarship coverage rate distribution
- Scatter Plot: GPA vs scholarship amount, GPA vs coverage rate