name: data-science-for-intelligence description: Statistical analysis, ML, NLP, time series, and network analysis for European Parliament political data license: MIT
Data Science for Parliamentary Intelligence
Context
This skill applies when:
- Performing statistical analysis of MEP voting patterns using EP MCP Server data
- Applying NLP techniques to EU legislative texts, parliamentary questions, and committee reports
- Building network models of political group alliances and MEP collaboration patterns
- Conducting time series analysis of voting cohesion, attendance, and legislative productivity
- Using dimensionality reduction to map MEP ideological positions from roll-call votes
- Clustering MEPs by policy area specialization and voting behavior
- Forecasting legislative outcomes based on committee vote patterns and amendment analysis
- Detecting anomalies in MEP activity patterns (unusual voting, attendance drops)
All data science activities must comply with Hack23 ISMS policies for data integrity, GDPR-compliant processing, and reproducible analysis.
Rules
- Data Quality First: Validate all EP MCP Server data before analysis — check for missing values, encoding issues in multilingual text, and temporal gaps in voting records
- Appropriate Methods: Match analytical technique to data type — nominal (political group), ordinal (agreement scale), interval (vote margins), network (co-sponsorship)
- Reproducibility: Document all data transformations, model parameters, and analytical decisions — analysis must be reproducible from MCP Server queries
- Statistical Rigor: Report effect sizes, confidence intervals, and p-values — avoid p-hacking by pre-registering hypotheses before querying EP data
- GDPR-Aware Processing: When analyzing MEP personal data, apply data minimization — aggregate where possible, pseudonymize when individual tracking is not analytically necessary
- Bias Awareness: Account for selection bias in roll-call votes (only ~30% of EP votes are roll-call), survivorship bias (only re-elected MEPs span terms), and measurement bias (attendance ≠ engagement)
- Domain Validation: Validate statistical findings against political science domain knowledge — a statistically significant pattern must also be institutionally plausible
- Scalable Pipelines: Design analysis pipelines that can process data across legislative terms (EP5–EP10) and scale to 720+ MEPs per term
- Visualization Standards: Use clear, accessible visualizations — label axes, include legends, use colorblind-safe palettes, and annotate key political events on time series
- Ethical AI: If applying ML models to MEP behavior prediction, disclose model limitations, avoid deterministic predictions about individual MEPs, and never use models for discriminatory profiling
Examples
Ideological Scaling with PCA/MDS
# Map MEP positions from roll-call vote data (conceptual pipeline)
# Data source: EP MCP Server → get_voting_records (vote results)
1. Construct vote matrix: MEPs (rows) × roll-call votes (columns)
- Values: +1 (Yes), -1 (No), 0 (Abstain), NaN (Absent)
2. Handle missing data: impute or filter MEPs with <50% participation
3. Apply dimensionality reduction:
- PCA: First component ≈ left-right dimension
- Second component ≈ pro-/anti-EU integration dimension
4. Validate: Check that political groups cluster as expected
- LEFT/G-EFA on left, EPP/ECR/ID on right
- Pro-EU (S&D, RE, EPP) vs. eurosceptic (ECR, ID) on second axis
5. Visualize: 2D scatter plot with political group coloring
- Compare across legislative terms for ideological drift
NLP Analysis of Legislative Texts
Pipeline for analyzing EP legislative amendments:
Data source: EP MCP Server → search_documents, track_legislation
1. Text Extraction: Parse legislative amendment texts (24 languages)
2. Language Detection: Identify primary language, align translations
3. Topic Modeling (LDA/BERTopic):
- Discover policy themes across committee reports
- Track topic prevalence over time
4. Sentiment/Framing Analysis:
- Classify amendment tone (restrictive vs. permissive)
- Detect regulatory framing (precautionary vs. innovation-friendly)
5. Similarity Analysis:
- Identify duplicate/similar amendments across political groups
- Detect coordinated amendment strategies
Network Analysis of Political Collaboration
Build MEP collaboration networks from EP MCP Server data:
Nodes: MEPs (attributes: country, political group, committee)
Edges: Co-sponsorship, co-rapporteurship, shared amendments
Metrics to compute:
- Degree centrality: Most connected MEPs (cross-group bridges)
- Betweenness centrality: MEPs bridging political groups
- Community detection: Identify voting blocs beyond formal groups
- Assortativity: Do MEPs collaborate within or across national lines?
Data sources:
- get_meps: Node attributes (country, group, committee membership)
- get_voting_records: Co-sponsorship and amendment data
- track_legislation: Rapporteur and shadow rapporteur pairs
Time Series Analysis of Legislative Productivity
Track EP legislative output over time:
Metrics:
- Dossiers completed per plenary session
- Average time from Commission proposal to EP first reading
- Amendment volume per committee per session
- Voting cohesion index per political group over time
Techniques:
- Seasonal decomposition (plenary session calendar effects)
- Change point detection (new legislative term, leadership change)
- Granger causality (do committee votes predict plenary outcomes?)
Data: EP MCP Server → track_legislation, get_voting_records
Anti-Patterns
- Black Box Models: Do NOT apply complex ML models without interpretability — political scientists and policymakers need to understand why a model predicts a particular legislative outcome
- Ignoring Data Provenance: Do NOT skip data validation — EP API data may contain missing votes, encoding errors in non-Latin scripts, or temporal discontinuities between API versions
- Overfitting to Single Term: Do NOT train models on a single EP term and generalize — political dynamics shift significantly between terms due to elections, enlargement, and treaty changes
- Ignoring Institutional Rules: Do NOT apply generic clustering without accounting for EP's formal structure — political groups and committees create non-random patterns that are institutional, not behavioral
- False Causation: Do NOT infer causal relationships from correlational voting data — MEPs voting similarly may reflect group discipline, not genuine agreement
- Neglecting Uncertainty: Do NOT present point estimates without uncertainty quantification — always include confidence intervals, especially for small national delegations
- Privacy Violations: Do NOT build individual MEP behavioral profiles that combine parliamentary data with personal data without GDPR legal basis and data protection impact assessment