Python Package

Access and analyze Shedding Hub data programmatically

The shedding-hub Python package provides tools for loading datasets, performing statistical analyses, and creating visualizations. It is available on PyPI, so analyses can be scripted from data loading through to figures, which supports reproducible research.

Installation

Install the package using pip:

pip install shedding-hub

To install the latest development version from GitHub:

pip install git+https://github.com/shedding-hub/shedding-hub.git
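
After installing, it can be useful to confirm which version ended up in the active environment. The sketch below uses only the standard library; the helper name installed_version is illustrative and not part of shedding-hub.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "shedding-hub" is the distribution name used in the pip commands above.
print(installed_version("shedding-hub") or "shedding-hub is not installed")
```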

Quick Start

Load a Dataset

import shedding_hub as sh

# Load a specific dataset by identifier
data = sh.load_dataset('woelfel2020virological')

# Access dataset structure
print(data['title'])
print(data['analytes'].keys())
print(f"Number of participants: {len(data['participants'])}")
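
The lines above show that a loaded dataset behaves like a dictionary with 'title', 'analytes', and 'participants' entries. As a rough sketch of how such a structure could be flattened into rows, the example below uses a small hand-made stand-in rather than a real dataset. The per-participant 'measurements' layout (with 'analyte', 'time', and 'value' keys) is an assumption made for illustration, and flatten_measurements is a hypothetical helper, not a package function.

```python
# Hand-made stand-in for a loaded dataset. The top-level keys mirror the ones
# printed above; the "measurements" layout is an ASSUMPTION for illustration.
mock = {
    "title": "Example study",
    "analytes": {"sputum_rna": {"specimen": "sputum", "biomarker": "SARS-CoV-2"}},
    "participants": [
        {"measurements": [
            {"analyte": "sputum_rna", "time": 3, "value": 1.2e6},
            {"analyte": "sputum_rna", "time": 10, "value": "negative"},
        ]},
        {"measurements": [
            {"analyte": "sputum_rna", "time": 5, "value": 4.0e4},
        ]},
    ],
}


def flatten_measurements(dataset):
    """Return one (participant_index, analyte, time, value) tuple per measurement."""
    rows = []
    for idx, participant in enumerate(dataset["participants"]):
        for m in participant.get("measurements", []):
            rows.append((idx, m["analyte"], m["time"], m["value"]))
    return rows


rows = flatten_measurements(mock)
for row in rows:
    print(row)
```

Check the keys of a real dataset before relying on this layout.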

Calculate Summary Statistics

# Per-participant shedding summary
summary = sh.calc_shedding_summary(data, specimen='sputum')
print(summary[['participant_id', 'shedding_duration', 'peak_value', 'clearance_status']])

# Detection rate over time
detection = sh.calc_detection_summary(data, specimen='sputum', time_bin_size=7)
print(detection[['time', 'n_tested', 'proportion', 'ci_lower', 'ci_upper']])

# Basic dataset metadata
info = sh.calc_dataset_summary(data)
print(f"Biomarkers: {info['biomarkers']}")
print(f"Specimens: {info['specimens']}")

Create Visualizations

# Plot individual shedding trajectories
fig = sh.plot_time_course(data, specimen='sputum')

# Heatmap of shedding patterns
fig = sh.plot_shedding_heatmap(data, specimen='sputum', value='concentration')

# Aggregate trajectory with uncertainty bands
fig = sh.plot_mean_trajectory(data, specimen='sputum', value='concentration')

# Kaplan-Meier clearance curve
fig = sh.plot_clearance_curve(data, specimen='sputum')

Function Reference

Data Loading

  • load_dataset(dataset, ref=None, pr=None, local=None) - Load a dataset from GitHub or local directory

Statistics

  • calc_shedding_summary(dataset) - Per-participant summary (duration, peak, clearance)
  • calc_detection_summary(dataset) - Detection rate by time bin with Wilson CI
  • calc_clearance_summary(dataset) - Kaplan-Meier clearance statistics
  • calc_value_summary(dataset) - Value distribution by time bin
  • calc_dataset_summary(dataset) - Basic dataset metadata
  • compare_datasets(datasets) - Cross-study comparison table
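
The Wilson confidence interval named for calc_detection_summary can be worked through by hand. The following is a minimal standalone sketch of the standard Wilson score interval for a proportion; it is not the package's implementation, and the function name and the default z = 1.96 (roughly a 95% interval) are choices made here.

```python
import math


def wilson_interval(n_detected, n_tested, z=1.96):
    """Wilson score interval for the proportion n_detected / n_tested."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    p = n_detected / n_tested
    denom = 1 + z * z / n_tested
    center = (p + z * z / (2 * n_tested)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / n_tested + z * z / (4 * n_tested ** 2)
    ) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


lower, upper = wilson_interval(5, 10)
print(f"5 of 10 detected: [{lower:.3f}, {upper:.3f}]")
```

Unlike the simple normal approximation, this interval stays inside [0, 1] and remains informative when every sample in a bin is negative or every sample is positive, which is common in late or early time bins.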

Shedding Duration

  • calc_shedding_duration(dataset) - Calculate duration statistics
  • calc_shedding_durations(dataset_ids) - Batch calculate across studies
  • plot_shedding_duration(df) - Plot individual durations
  • plot_shedding_durations(df) - Compare durations across studies

Visualization

  • plot_time_course(dataset) - Individual shedding trajectories
  • plot_time_courses(datasets) - Compare trajectories across studies
  • plot_shedding_heatmap(dataset) - Heatmap of shedding patterns
  • plot_mean_trajectory(dataset) - Aggregate trajectory with uncertainty
  • plot_value_distribution_by_time(dataset) - Box/violin plots by time
  • plot_detection_probability(dataset) - Detection rate over time
  • plot_clearance_curve(dataset) - Kaplan-Meier survival curve

Shedding Peak

  • calc_shedding_peak(dataset) - Calculate peak statistics
  • calc_shedding_peaks(dataset_ids) - Batch calculate across studies
  • plot_shedding_peak(df) - Plot peak values
  • plot_shedding_peaks(df) - Compare peaks across studies
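
To make the duration and peak quantities in this reference concrete, here is a small sketch for a single participant. It assumes one common definition (duration as the time between the first and last positive sample, peak as the largest numeric value); the package's own definitions may differ, and duration_and_peak is a hypothetical helper.

```python
def duration_and_peak(measurements):
    """Summarize one participant's series.

    measurements: list of (time, value) pairs, where value is a number for a
    quantified positive sample and the string "negative" otherwise (ASSUMED
    encoding for this sketch).
    """
    positives = [(t, v) for t, v in measurements if isinstance(v, (int, float))]
    if not positives:
        return None  # never detected
    times = [t for t, _ in positives]
    peak_time, peak_value = max(positives, key=lambda tv: tv[1])
    return {
        "first_positive": min(times),
        "last_positive": max(times),
        "duration": max(times) - min(times),
        "peak_value": peak_value,
        "peak_time": peak_time,
    }


series = [(2, 1e4), (5, 1e6), (9, 1e3), (14, "negative")]
print(duration_and_peak(series))
```

For Ct values the direction is reversed, since a lower Ct indicates a higher viral load, so a peak would be the minimum rather than the maximum.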

Common Parameters

Most functions accept these filtering parameters:

Parameter      Description                   Example
biomarker      Filter by specific biomarker  biomarker='SARS-CoV-2'
specimen       Filter by specimen type       specimen='sputum'
value          Filter by value type          value='concentration' or value='ct'
time_bin_size  Size of time bins in days     time_bin_size=7
time_range     Limit time range              time_range=(0, 30)
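
As an illustration of what time_bin_size and time_range do conceptually, the sketch below groups samples into fixed-width bins after dropping those outside a range. The binning convention (bins labelled by their start day, range bounds inclusive) is an assumption of this example, and bin_times is not a package function.

```python
import math
from collections import defaultdict


def bin_times(records, time_bin_size=7, time_range=None):
    """Count detections per time bin.

    records: iterable of (time_in_days, detected) pairs.
    Returns {bin_start: (n_detected, n_tested)} ordered by bin start.
    """
    counts = defaultdict(lambda: [0, 0])
    for time, detected in records:
        # Skip samples outside the requested (inclusive) range.
        if time_range is not None and not (time_range[0] <= time <= time_range[1]):
            continue
        start = math.floor(time / time_bin_size) * time_bin_size
        counts[start][1] += 1
        if detected:
            counts[start][0] += 1
    return {start: tuple(c) for start, c in sorted(counts.items())}


records = [(0, True), (3, True), (6, False), (7, True), (13, False), (35, False)]
print(bin_times(records, time_bin_size=7, time_range=(0, 30)))
```

Wider bins give more samples per bin and narrower confidence intervals, at the cost of time resolution.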

Resources

  • PyPI - Package information, version history, and installation statistics
  • GitHub - Source code, issue tracking, and contribution guidelines

Example Workflows

Compare Shedding Across Multiple Studies

import shedding_hub as sh

# Load multiple datasets
data1 = sh.load_dataset('woelfel2020virological')
data2 = sh.load_dataset('kim2020viral')
data3 = sh.load_dataset('young2020epidemiologic')

# Compare key statistics across studies
comparison = sh.compare_datasets(
    [data1, data2, data3],
    specimen='sputum',
    value='concentration'
)
print(comparison[['dataset_id', 'n_participants', 'median_shedding_duration',
                  'median_peak_value', 'pct_cleared']])
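
For intuition about what a cross-study comparison row contains, the following sketch derives a few of the columns printed above from per-participant values. The input layout (a 'duration' in days and a 'cleared' flag per participant) is an assumption for this example, and summarize_study is a hypothetical helper rather than the logic of compare_datasets.

```python
from statistics import median


def summarize_study(dataset_id, participants):
    """Build one comparison row from per-participant summaries.

    participants: list of dicts with "duration" (days, or None if never
    detected) and "cleared" (bool). ASSUMED layout for this sketch.
    """
    durations = [p["duration"] for p in participants if p["duration"] is not None]
    n = len(participants)
    n_cleared = sum(1 for p in participants if p["cleared"])
    return {
        "dataset_id": dataset_id,
        "n_participants": n,
        "median_shedding_duration": median(durations) if durations else None,
        "pct_cleared": 100.0 * n_cleared / n if n else None,
    }


study = [
    {"duration": 4, "cleared": True},
    {"duration": 10, "cleared": True},
    {"duration": 7, "cleared": False},
]
row = summarize_study("example_study", study)
print(row)
```

Medians are used here because shedding durations tend to be skewed by a few long-shedding participants.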

Analyze Clearance Patterns

import shedding_hub as sh

# Load dataset
data = sh.load_dataset('woelfel2020virological')

# Get Kaplan-Meier clearance statistics
clearance = sh.calc_clearance_summary(data, specimen='sputum')
print(f"Median clearance time: {clearance['median_clearance_time']} days")
print(f"Clearance rate: {clearance['clearance_rate']*100:.1f}%")

# Plot the survival curve
fig = sh.plot_clearance_curve(data, specimen='sputum', show_ci=True)
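
The Kaplan-Meier method behind these clearance statistics can be sketched in a few lines of plain Python. This is a textbook product-limit estimator written for illustration, not the package's code; kaplan_meier and median_time are hypothetical names, and the inputs are made-up values.

```python
def kaplan_meier(durations, cleared):
    """Product-limit estimate of the probability of still shedding.

    durations: time to clearance, or to last follow-up if not cleared.
    cleared:   True if clearance was observed, False if censored.
    Returns a list of (time, survival) steps at each observed clearance time.
    """
    event_times = sorted({t for t, e in zip(durations, cleared) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, cleared) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


def median_time(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within follow-up


durations = [5, 8, 8, 12, 15]
cleared = [True, True, False, True, False]
curve = kaplan_meier(durations, cleared)
print(curve)
print("Median clearance time:", median_time(curve))
```

Censored participants (those lost to follow-up while still positive) count toward the at-risk set until their last sample, which is why the estimate differs from a simple fraction cleared. When the curve never drops to 0.5, the median is undefined, so a reported median may be missing for studies with short follow-up.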

Visualize Population-Level Shedding

import shedding_hub as sh
import matplotlib.pyplot as plt

# Load dataset
data = sh.load_dataset('woelfel2020virological')

# Create a multi-panel figure
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Heatmap of individual trajectories
plt.sca(axes[0, 0])
sh.plot_shedding_heatmap(data, specimen='sputum', value='concentration')

# Mean trajectory with 95% CI
plt.sca(axes[0, 1])
sh.plot_mean_trajectory(data, specimen='sputum', value='concentration')

# Detection probability over time
plt.sca(axes[1, 0])
sh.plot_detection_probability(data, specimen='sputum')

# Clearance curve
plt.sca(axes[1, 1])
sh.plot_clearance_curve(data, specimen='sputum')

plt.tight_layout()
plt.savefig('shedding_analysis.png', dpi=300)

Contributing

We welcome contributions to the shedding-hub package! Whether you're fixing bugs, adding new features, improving documentation, or suggesting enhancements, your help is appreciated.

Please visit our Contributing Guidelines to get started. For major changes, please open an issue first to discuss what you would like to change.