alterlab-geopandas

Reads, writes, and analyzes geospatial vector data with the GeoPandas Python library (shapefiles, GeoJSON, GeoPackage), with PostGIS support and integration with matplotlib, folium, and cartopy. Use for spatial analysis and geometric operations — buffer analysis, spatial joins and overlays between datasets, dissolving boundaries, clipping, calculating areas and distances, reprojecting coordinate systems, choropleth mapping, or converting between vector file formats. Part of the AlterLab Academic Skills suite.

By AlterLab-IEU. Updated 6/6/2026

name: alterlab-geopandas
description: Reads, writes, and analyzes geospatial vector data with the GeoPandas Python library (shapefiles, GeoJSON, GeoPackage), with PostGIS support and integration with matplotlib, folium, and cartopy. Use for spatial analysis and geometric operations such as buffer analysis, spatial joins and overlays between datasets, dissolving boundaries, clipping, calculating areas and distances, reprojecting coordinate systems, choropleth mapping, or converting between vector file formats. This is for tabular vector data; for raster/satellite/DEM work, spectral indices (NDVI), or spatial ML on earth observation prefer the geomaster skill. Part of the AlterLab Academic Skills suite.
license: MIT
allowed-tools: Read Write Edit Bash(python:*)
compatibility: No API key required. Runs locally via uv run python; requires the geopandas Python package.
metadata:
  skill-author: AlterLab
  version: "1.0.0"


GeoPandas

GeoPandas extends pandas to enable spatial operations on geometric types. It combines the capabilities of pandas and shapely for geospatial data analysis.

Installation

uv pip install geopandas

Optional Dependencies

# For interactive maps
uv pip install folium

# For classification schemes in mapping
uv pip install mapclassify

# For faster I/O operations (2-4x speedup)
uv pip install pyarrow

# For PostGIS database support
uv pip install psycopg2
uv pip install geoalchemy2

# For basemaps
uv pip install contextily

# For cartographic projections
uv pip install cartopy

Quick Start

import geopandas as gpd

# Read spatial data
gdf = gpd.read_file("data.geojson")

# Basic exploration
print(gdf.head())
print(gdf.crs)
print(gdf.geometry.geom_type)

# Simple plot
gdf.plot()

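# Reproject to a metric projected CRS (UTM zone estimated from the data)
gdf_projected = gdf.to_crs(gdf.estimate_utm_crs())

# Calculate area in square meters (Web Mercator, EPSG:3857, distorts areas)
gdf_projected['area'] = gdf_projected.geometry.area

# Save to file (includes the new area column)
gdf_projected.to_file("output.gpkg")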

Core Concepts

Data Structures

  • GeoSeries: Vector of geometries with spatial operations
  • GeoDataFrame: Tabular data structure with geometry column

See data-structures.md for details.
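
As an illustration of how the two structures relate, here is a minimal sketch that builds a GeoDataFrame in memory. The city names, population figures, and coordinates are made up for the example.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample data: three places with made-up population figures.
cities = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"], "population": [100, 250, 75]},
    geometry=[Point(0, 0), Point(1, 1), Point(2, 0)],
    crs="EPSG:4326",
)

# The active geometry column is a GeoSeries; other columns are plain pandas Series.
geoms = cities.geometry
print(type(geoms).__name__)
print(type(cities["name"]).__name__)

# total_bounds returns [minx, miny, maxx, maxy] for the whole frame.
print(cities.total_bounds)
```

Spatial methods such as `.buffer()` or `.area` are available on the GeoSeries and, as shortcuts, on the GeoDataFrame itself.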

Reading and Writing Data

GeoPandas reads/writes multiple formats: Shapefile, GeoJSON, GeoPackage, PostGIS, Parquet.

# Read with filtering
gdf = gpd.read_file("data.gpkg", bbox=(xmin, ymin, xmax, ymax))

# Write with Arrow acceleration
gdf.to_file("output.gpkg", use_arrow=True)

See data-io.md for comprehensive I/O operations.

Coordinate Reference Systems

Always check and manage CRS for accurate spatial operations:

# Check CRS
print(gdf.crs)

# Reproject (transforms coordinates)
gdf_projected = gdf.to_crs("EPSG:3857")

# Set CRS (only when metadata missing)
gdf = gdf.set_crs("EPSG:4326")

See crs-management.md for CRS operations.
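
The difference between set_crs and to_crs is easy to get wrong, so a small sketch may help. It assumes a single made-up point whose coordinates are longitude/latitude in degrees but whose CRS metadata is missing.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical point with no CRS metadata; coordinates are lon/lat degrees.
raw = gpd.GeoSeries([Point(2.35, 48.85)])
print(raw.crs)

# set_crs only attaches metadata: the coordinates are left unchanged.
labeled = raw.set_crs("EPSG:4326")

# to_crs transforms the coordinates into the target CRS (Web Mercator, meters).
projected = labeled.to_crs("EPSG:3857")

print(labeled.iloc[0].x, labeled.iloc[0].y)
print(projected.iloc[0].x, projected.iloc[0].y)
```

Calling set_crs on data that already carries a different CRS raises an error unless `allow_override=True` is passed, which guards against mislabeling data that actually needs reprojection.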

Common Operations

Geometric Operations

Buffer, simplify, centroid, convex hull, affine transformations:

# Buffer by 10 CRS units (meters in a metric projected CRS, degrees in EPSG:4326)
buffered = gdf.geometry.buffer(10)

# Simplify with tolerance
simplified = gdf.geometry.simplify(tolerance=5, preserve_topology=True)

# Get centroids
centroids = gdf.geometry.centroid

See geometric-operations.md for all operations.
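
Buffer distances and simplify tolerances are expressed in CRS units. The following sketch assumes one made-up point in UTM zone 31N (EPSG:32631), so all values are in meters.

```python
import math

import geopandas as gpd
from shapely.geometry import Point

# Hypothetical point in a metric projected CRS (UTM zone 31N).
pts = gpd.GeoSeries([Point(500000, 5400000)], crs="EPSG:32631")

# A 10 m buffer is a polygon approximating a circle of radius 10 m.
circle = pts.buffer(10)
print(circle.area.iloc[0], math.pi * 10**2)

# Simplifying with a 1 m tolerance reduces the number of vertices.
simple = circle.simplify(tolerance=1, preserve_topology=True)
print(len(circle.iloc[0].exterior.coords), len(simple.iloc[0].exterior.coords))

# The centroid of the buffer coincides with the original point.
print(circle.centroid.iloc[0])
```

The buffered area is slightly below the exact circle area because the circle is approximated by straight segments.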

Spatial Analysis

Spatial joins, overlay operations, dissolve:

# Spatial join (intersects)
joined = gpd.sjoin(gdf1, gdf2, predicate='intersects')

# Nearest neighbor join
nearest = gpd.sjoin_nearest(gdf1, gdf2, max_distance=1000)

# Overlay intersection
intersection = gpd.overlay(gdf1, gdf2, how='intersection')

# Dissolve by attribute
dissolved = gdf.dissolve(by='region', aggfunc='sum')

See spatial-analysis.md for analysis operations.
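
To make dissolve and overlay concrete, here is a minimal sketch on made-up unit squares. The region names, population values, and study area are illustrative assumptions, and the CRS is an arbitrary metric one.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical zones: two adjacent unit squares in "north", one in "south".
zones = gpd.GeoDataFrame(
    {"region": ["north", "north", "south"], "population": [10, 20, 5]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, -1, 1, 0)],
    crs="EPSG:32631",
)

# Hypothetical study area overlapping a quarter of each zone.
study = gpd.GeoDataFrame(
    {"study": ["s1"]}, geometry=[box(0.5, -0.5, 1.5, 0.5)], crs="EPSG:32631"
)

# Dissolve merges geometries per region and sums the numeric column.
dissolved = zones.dissolve(by="region", aggfunc="sum")
print(dissolved["population"].to_dict())

# Overlay intersection keeps only the parts of each zone inside the study area.
clipped = gpd.overlay(zones, study, how="intersection")
print(len(clipped), clipped.area.sum())
```

The overlay result carries the attribute columns of both inputs, so each output piece can be traced back to its zone and study area.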

Visualization

Create static and interactive maps:

# Choropleth map
gdf.plot(column='population', cmap='YlOrRd', legend=True)

# Interactive map
gdf.explore(column='population', legend=True).save('map.html')

# Multi-layer map
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
gdf1.plot(ax=ax, color='blue')
gdf2.plot(ax=ax, color='red')

See visualization.md for mapping techniques.

Detailed Documentation

Common Workflows

Load, Transform, Analyze, Export

# 1. Load data
gdf = gpd.read_file("data.shp")

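# 2. Check CRS, then reproject to a metric CRS (UTM zone estimated from the data)
print(gdf.crs)
gdf = gdf.to_crs(gdf.estimate_utm_crs())

# 3. Perform analysis (area in square meters, buffer of 100 m)
gdf['area'] = gdf.geometry.area
buffered = gdf.copy()
buffered['geometry'] = gdf.geometry.buffer(100)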

# 4. Export results
gdf.to_file("results.gpkg", layer='original')
buffered.to_file("results.gpkg", layer='buffered')
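
The choice of projected CRS in step 2 affects the numbers produced in step 3. As a rough sketch, assuming a made-up parcel near 48°N given in lon/lat degrees, the code below compares the area measured in an estimated UTM zone with the area measured in Web Mercator (EPSG:3857).

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical parcel, given in lon/lat degrees.
parcel = gpd.GeoDataFrame(geometry=[box(2.00, 48.00, 2.01, 48.01)], crs="EPSG:4326")

# estimate_utm_crs picks a UTM zone from the data's location.
utm_crs = parcel.estimate_utm_crs()
area_utm = parcel.to_crs(utm_crs).area.iloc[0]

# Web Mercator inflates areas increasingly with distance from the equator.
area_mercator = parcel.to_crs("EPSG:3857").area.iloc[0]

print(utm_crs.to_epsg())
print(area_mercator / area_utm)
```

At this latitude the Web Mercator area comes out at more than twice the UTM area, which is why a local UTM zone or an equal-area projection is the safer choice for measurements.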

Spatial Join and Aggregate

# Join points to polygons
points_in_polygons = gpd.sjoin(points_gdf, polygons_gdf, predicate='within')

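# Aggregate by polygon (named aggregation: output column = (input column, function))
aggregated = points_in_polygons.groupby('index_right').agg(
    value_sum=('value', 'sum'),
    count=('value', 'size'),
)

# Merge back to polygons (how='left' keeps polygons that contain no points)
result = polygons_gdf.merge(aggregated, left_index=True, right_index=True, how='left')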
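
The same workflow can be checked end to end on a tiny dataset. This sketch uses made-up zones and point values; the output column names `value_sum` and `point_count` are illustrative choices, and it assumes the polygon frame has an unnamed default index so that sjoin exposes it as `index_right`.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical polygons: two zones with points and one with none.
polygons_gdf = gpd.GeoDataFrame(
    {"zone": ["west", "east", "empty"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],
    crs="EPSG:32631",
)

# Hypothetical points carrying a numeric value.
points_gdf = gpd.GeoDataFrame(
    {"value": [1.0, 2.0, 4.0]},
    geometry=[Point(0.2, 0.5), Point(0.8, 0.5), Point(1.5, 0.5)],
    crs="EPSG:32631",
)

# Each joined row is a point plus the index of the polygon containing it.
joined = gpd.sjoin(points_gdf, polygons_gdf, predicate="within")

# Named aggregation: sum the values and count the points per polygon.
per_zone = joined.groupby("index_right").agg(
    value_sum=("value", "sum"),
    point_count=("value", "size"),
)

# A left join keeps polygons without points; their count becomes 0.
result = polygons_gdf.join(per_zone, how="left")
result["point_count"] = result["point_count"].fillna(0).astype(int)
print(result[["zone", "value_sum", "point_count"]])
```

Polygons without any points end up with a count of 0 and a missing sum, which keeps them visible in later mapping steps.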

Multi-Source Data Integration

# Read from different sources
# (engine is assumed to be an existing SQLAlchemy engine for the PostGIS database)
roads = gpd.read_file("roads.shp")
buildings = gpd.read_file("buildings.geojson")
parcels = gpd.read_postgis("SELECT * FROM parcels", con=engine, geom_col='geom')

# Ensure matching CRS
buildings = buildings.to_crs(roads.crs)
parcels = parcels.to_crs(roads.crs)

# Buildings within 50 CRS units of any road (meters if roads.crs is a metric projected CRS)
buildings_near_roads = buildings[buildings.geometry.distance(roads.union_all()) < 50]

Performance Tips

  1. Use spatial indexing: GeoPandas builds a spatial index (.sindex) automatically for sjoin, sjoin_nearest, overlay, and clip
  2. Filter during read: Use bbox, mask, or where parameters to load only needed data
  3. Use Arrow for I/O: With pyarrow installed, add use_arrow=True for 2-4x faster reading/writing
  4. Simplify geometries: Use .simplify() to reduce complexity when precision isn't critical
  5. Batch operations: Vectorized operations are much faster than iterating rows
  6. Use appropriate CRS: Projected CRS for area/distance, geographic (lon/lat) for web maps and data exchange
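
Tips 1 and 5 can be illustrated with a small sketch on a made-up 10 x 10 grid of points. It queries the spatial index directly and then runs the equivalent vectorized predicate; the query window is an arbitrary assumption.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical data: a 10 x 10 grid of points in a metric CRS.
pts = gpd.GeoDataFrame(
    geometry=[Point(x, y) for x in range(10) for y in range(10)],
    crs="EPSG:32631",
)
window = box(2.5, 2.5, 5.5, 5.5)

# Spatial index query: returns integer positions of the matching rows.
hits = pts.sindex.query(window, predicate="intersects")
indexed = pts.iloc[sorted(hits.tolist())]

# Equivalent vectorized predicate, with no Python-level loop over rows.
vectorized = pts[pts.intersects(window)]

print(len(indexed), len(vectorized))
```

Both approaches select the same rows. The index query pays off mainly when many lookups run against the same large layer.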

Best Practices

  1. Always check CRS before spatial operations
  2. Use projected CRS for area and distance calculations
  3. Match CRS before spatial joins or overlays
  4. Validate geometries with .is_valid before operations, and repair invalid ones with .make_valid()
  5. Use .copy() when modifying geometry columns to avoid side effects
  6. Preserve topology when simplifying for analysis
  7. Use GeoPackage format for modern workflows (better than Shapefile)
  8. Set max_distance in sjoin_nearest for better performance
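
Practices 4 and 5 can be sketched together as follows. The example assumes two made-up polygons, one of which is a self-intersecting "bowtie", and a reasonably recent GeoPandas with Shapely 2 so that `GeoSeries.make_valid()` is available.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical data: one valid square and one self-intersecting "bowtie".
gdf = gpd.GeoDataFrame(
    {"name": ["square", "bowtie"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(0, 0), (2, 2), (2, 0), (0, 2)]),
    ],
    crs="EPSG:32631",
)
print(gdf.is_valid.tolist())  # [True, False]

# Work on a copy so the original geometry column is left untouched.
fixed = gdf.copy()
fixed["geometry"] = fixed.geometry.make_valid()
print(fixed.is_valid.all())
print(fixed.geom_type.tolist())
```

Repairing a geometry can change its type (for example, a polygon may become a multi-polygon), so it is worth checking `geom_type` afterwards.
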
Install via CLI
npx skills add https://github.com/AlterLab-IEU/AlterLab-Academic-Skills --skill alterlab-geopandas