streamlit

name: streamlit description: "[Applies to: **/*.py] Enforce modern, performant, and maintainable Streamlit development practices, focusing on caching, modularity, and robust dependency management." source: "cursor_mdc"

Streamlit Best Practices

This guide outlines the definitive best practices for building robust, performant, and maintainable Streamlit applications. Adhere to these rules to ensure your apps are efficient, scalable, and easy to collaborate on.

1. Code Organization and Structure

Always separate UI logic from data processing and business logic. Modularize your application into distinct files or directories.

1.1 Project Structure

Organize your project for clarity and maintainability.

❌ BAD: Monolithic app.py

# app.py
import streamlit as st
import pandas as pd

def load_and_process_data(file_path):
    df = pd.read_csv(file_path)
    # ... complex processing ...
    return df

st.title("My Data App")
uploaded_file = st.file_uploader("Upload CSV")
if uploaded_file:
    data = load_and_process_data(uploaded_file)
    st.dataframe(data)
    # ... more UI and logic ...

✅ GOOD: Modularized Structure

my_streamlit_app/
├── app.py                  # Main entry point, orchestrates pages
├── pages/
│   └── dashboard_page.py   # Specific page UI and logic
├── services/
│   └── data_service.py     # Data loading and processing functions
├── components/
│   └── custom_widgets.py   # Reusable UI components
├── requirements.txt
└── .gitignore

app.py:

# app.py
import streamlit as st
from pages import dashboard_page # Assuming dashboard_page.py has a main() function

st.set_page_config(layout="wide")

# Example of multi-page app structure
PAGES = {
    "Dashboard": dashboard_page,
    # "Another Page": another_page_module,
}

with st.sidebar:
    st.title('Navigation')
    selection = st.radio("Go to", list(PAGES.keys()))

page = PAGES[selection]
page.main() # Each page module must have a main() function

pages/dashboard_page.py:

# pages/dashboard_page.py
import streamlit as st
from services.data_service import load_data, process_data

def main():
    st.title("Dashboard Overview")

    uploaded_file = st.file_uploader("Upload CSV", type=["csv"])
    if uploaded_file:
        raw_df = load_data(uploaded_file)
        st.write("Raw Data:")
        st.dataframe(raw_df.head())

        processed_df = process_data(raw_df)
        st.write("Processed Data:")
        st.dataframe(processed_df)

        # Further UI elements specific to the dashboard

services/data_service.py:

# services/data_service.py
import pandas as pd
import streamlit as st
from typing import Union

@st.cache_data
def load_data(file_path: Union[str, bytes]) -> pd.DataFrame:
    """Loads data from a file-like object or path."""
    return pd.read_csv(file_path)

@st.cache_data
def process_data(df: pd.DataFrame) -> pd.DataFrame:
    """Performs complex data processing."""
    # Example: clean, transform, aggregate
    processed_df = df.dropna().copy()
    processed_df['new_col'] = processed_df['value'] * 2 # Example transformation
    return processed_df

2. Performance Considerations

Streamlit reruns the entire script on every interaction. Efficient caching and state management are critical.

2.1 Caching Expensive Operations

Always cache functions that load data, perform heavy computations, or interact with external resources. Use st.cache_data for serializable data and st.cache_resource for unserializable objects like database connections or ML models.

❌ BAD: Uncached data loading

import streamlit as st
import pandas as pd

def get_large_dataset():
    # Simulates a slow data load
    return pd.read_csv("https://example.com/large_data.csv")

df = get_large_dataset() # Runs on every interaction
st.dataframe(df)
st.button("Rerun App")

✅ GOOD: Cached data loading

import streamlit as st
import pandas as pd

@st.cache_data
def get_large_dataset() -> pd.DataFrame:
    """Loads a large dataset from a URL."""
    # Simulates a slow data load
    return pd.read_csv("https://example.com/large_data.csv")

df = get_large_dataset() # Only runs once per unique URL/function code
st.dataframe(df)
st.button("Rerun App")

2.2 Session State for User-Specific Data

Use st.session_state to store and persist user-specific variables across reruns. This is essential for interactive elements where state must be maintained.

❌ BAD: Global variables or re-initializing state

import streamlit as st

if 'counter' not in st.session_state:
    counter = 0 # This will reset on every rerun!

st.session_state.counter = counter # Incorrect assignment

if st.button("Increment"):
    st.session_state.counter += 1 # This will fail if counter is not in session_state initially

st.write(f"Counter: {st.session_state.counter}")

✅ GOOD: Proper st.session_state initialization and usage

import streamlit as st

# Initialize state only if it doesn't exist
if 'counter' not in st.session_state:
    st.session_state.counter = 0

st.write(f"Counter: {st.session_state.counter}")

if st.button("Increment"):
    st.session_state.counter += 1

# Example for input widgets
name = st.text_input("Enter your name", key="user_name_input")
st.write(f"Hello, {st.session_state.user_name_input}!")

3. Common Patterns and Anti-patterns

3.1 Separate UI from Logic

Keep Streamlit UI calls (st.write, st.button, etc.) distinct from core business logic.

❌ BAD: Intermingled UI and logic

import streamlit as st

def calculate_and_display(a, b):
    result = a + b
    st.write(f"The sum is: {result}") # UI call inside logic

num1 = st.number_input("First number", value=1)
num2 = st.number_input("Second number", value=2)
calculate_and_display(num1, num2)

✅ GOOD: Clear separation

import streamlit as st

def calculate_sum(a: int, b: int) -> int:
    """Calculates the sum of two numbers."""
    return a + b

num1 = st.number_input("First number", value=1)
num2 = st.number_input("Second number", value=2)

sum_result = calculate_sum(num1, num2)
st.write(f"The sum is: {sum_result}") # UI call outside logic

3.2 Use Context Managers for Layout

Leverage st.sidebar, st.columns, st.expander, st.container, and st.tabs as context managers for cleaner layout code.

❌ BAD: Manual layout with repeated elements

import streamlit as st

col1, col2 = st.columns(2)
col1.write("Column 1 content")
col2.write("Column 2 content")

st.sidebar.header("Sidebar")
st.sidebar.write("Sidebar content")

✅ GOOD: Context managers for structured layout

import streamlit as st

with st.sidebar:
    st.header("App Controls")
    option = st.selectbox("Choose an option", ["A", "B", "C"])

col1, col2 = st.columns(2)
with col1:
    st.subheader("Data Input")
    st.write(f"Selected option: {option}")
    # ... more content ...

with col2:
    st.subheader("Visualization")
    st.line_chart([1, 2, 3])
    # ... more content ...

4. Common Pitfalls and Gotchas

4.1 Modifying Cached Objects

st.cache_data returns a copy of the cached object, preventing unintended mutations. st.cache_resource returns the original object, so be cautious with mutations.

❌ BAD: Modifying st.cache_resource return value without care

import streamlit as st

@st.cache_resource
def get_shared_list():
    return [1, 2, 3]

my_list = get_shared_list()
st.write(f"Original list: {my_list}")

if st.button("Append to list"):
    my_list.append(4) # This modifies the cached object for ALL users/sessions!
    st.write("Appended 4")

st.write(f"Current list: {my_list}")

✅ GOOD: Treat st.cache_resource objects as immutable or explicitly copy

import streamlit as st

@st.cache_resource
def get_shared_list():
    return [1, 2, 3]

# Option 1: Treat as immutable
st.write(f"Shared list: {get_shared_list()}")

# Option 2: Create a local copy if modification is needed
my_list_copy = list(get_shared_list())
st.write(f"Original copy: {my_list_copy}")

if st.button("Append to list"):
    my_list_copy.append(4) # Only modifies the copy for this session
    st.write("Appended 4")

st.write(f"Current copy: {my_list_copy}")

4.2 Widget Keys

Always provide unique key arguments for widgets that appear multiple times or whose state needs to be explicitly managed. This prevents unexpected behavior when widgets are dynamically rendered or reordered.

❌ BAD: Missing or duplicate keys

import streamlit as st

for i in range(3):
    st.checkbox(f"Option {i}") # Keys will be implicitly generated, might conflict

st.text_input("Enter value")
st.text_input("Enter another value") # These will implicitly get the same key if not specified

✅ GOOD: Unique and descriptive keys

import streamlit as st

for i in range(3):
    st.checkbox(f"Option {i}", key=f"checkbox_{i}")

st.text_input("Enter value", key="first_input")
st.text_input("Enter another value", key="second_input")

5. Type Hints

Always use type hints for function signatures and complex variables. This improves code readability, enables static analysis, and aids collaboration.

❌ BAD: Untyped function

def process_data(data):
    # What is 'data'? A list, DataFrame, dict?
    return len(data)

✅ GOOD: Type-hinted function

import pandas as pd
from typing import List, Dict, Union

def process_data(data: Union[pd.DataFrame, List[Dict]]) -> int:
    """Processes data and returns its length."""
    if isinstance(data, pd.DataFrame):
        return len(data)
    elif isinstance(data, list):
        return len(data)
    else:
        raise TypeError("Unsupported data type")

6. Packaging and Dependencies

Manage your dependencies rigorously using requirements.txt for Python packages and packages.txt for system-level dependencies.

6.1 Python Dependencies (`requirements.txt`)

Pin exact versions or use clear version constraints.

❌ BAD: Vague requirements.txt

pandas
streamlit
matplotlib

✅ GOOD: Specific requirements.txt

streamlit==1.52.1
pandas>=2.1.0,<3.0.0
matplotlib~=3.8.0

6.2 System Dependencies (`packages.txt`)

If your app requires non-Python libraries (e.g., for image processing, database connectors), list them in packages.txt for Streamlit Community Cloud.

packages.txt example:

build-essential
libgl1-mesa-glx

7. Testing Approaches

While Streamlit apps are primarily UI-driven, you must test your underlying logic.

7.1 Unit Testing Core Logic

Isolate and unit test all non-UI functions (e.g., data processing, calculations, API calls). Use standard Python testing frameworks like pytest.

test_data_service.py:

# tests/test_data_service.py
import pandas as pd
from services.data_service import process_data

def test_process_data_dropna():
    df = pd.DataFrame({'value': [1, 2, None, 4]})
    processed_df = process_data(df)
    assert len(processed_df) == 3
    assert 'new_col' in processed_df.columns
    assert processed_df['new_col'].iloc[0] == 2

def test_process_data_empty_df():
    df = pd.DataFrame({'value': []})
    processed_df = process_data(df)
    assert processed_df.empty

7.2 End-to-End Testing (Optional, for complex apps)

For complex applications, consider tools like Playwright or Selenium to simulate user interactions and verify UI behavior. However, prioritize robust unit tests for core logic first, as E2E testing adds significant overhead.