Python Package - Shedding Hub

The shedding-hub Python package provides tools for loading datasets, computing summary statistics, and creating visualizations. It is distributed on PyPI and imported as shedding_hub, so it can be used directly in existing Python analysis scripts to support reproducible research.

Installation

Install the package using pip:

pip install shedding-hub

For the latest development version from GitHub:

pip install git+https://github.com/shedding-hub/shedding-hub.git
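
To confirm which version ended up in the active environment, the standard library can be queried directly. The sketch below is not part of shedding-hub; installed_version is a hypothetical helper name, and it only assumes that the distribution is registered under the name used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("shedding-hub")
    print(found if found is not None else "shedding-hub is not installed")
```

Recording the reported version alongside analysis outputs makes it easier to reproduce results later.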

Quick Start

Load a Dataset

import shedding_hub as sh

# Load a specific dataset by identifier
data = sh.load_dataset('woelfel2020virological')

# Access dataset structure
print(data['title'])
print(data['analytes'].keys())
print(f"Number of participants: {len(data['participants'])}")

Calculate Summary Statistics

# Per-participant shedding summary
summary = sh.calc_shedding_summary(data, specimen='sputum')
print(summary[['participant_id', 'shedding_duration', 'peak_value', 'clearance_status']])

# Detection rate over time
detection = sh.calc_detection_summary(data, specimen='sputum', time_bin_size=7)
print(detection[['time', 'n_tested', 'proportion', 'ci_lower', 'ci_upper']])

# Basic dataset metadata
info = sh.calc_dataset_summary(data)
print(f"Biomarkers: {info['biomarkers']}")
print(f"Specimens: {info['specimens']}")
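
The ci_lower and ci_upper columns of the detection summary are described in the function reference as a Wilson confidence interval. As a rough illustration of what such an interval looks like for a single time bin, here is a standalone sketch of the textbook Wilson score formula. It is not the package's code; wilson_interval is a hypothetical name, and the 95% level (z = 1.96) is an assumption.

```python
import math
from typing import Tuple


def wilson_interval(n_positive: int, n_tested: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    if not 0 <= n_positive <= n_tested:
        raise ValueError("n_positive must lie between 0 and n_tested")
    p_hat = n_positive / n_tested
    denom = 1 + z**2 / n_tested
    centre = (p_hat + z**2 / (2 * n_tested)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n_tested + z**2 / (4 * n_tested**2)) / denom
    )
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    lower, upper = wilson_interval(n_positive=5, n_tested=10)
    print(f"5/10 detected: {lower:.3f} to {upper:.3f}")
```

Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and remains informative when a bin has few samples or when all samples are positive or negative.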

Create Visualizations

# Plot individual shedding trajectories
fig = sh.plot_time_course(data, specimen='sputum')

# Heatmap of shedding patterns
fig = sh.plot_shedding_heatmap(data, specimen='sputum', value='concentration')

# Aggregate trajectory with uncertainty bands
fig = sh.plot_mean_trajectory(data, specimen='sputum', value='concentration')

# Kaplan-Meier clearance curve
fig = sh.plot_clearance_curve(data, specimen='sputum')

Function Reference

Data Loading

load_dataset(dataset, ref=None, pr=None, local=None) - Load a dataset from GitHub or from a local directory

Statistics

calc_shedding_summary(dataset) - Per-participant summary (duration, peak, clearance)
calc_detection_summary(dataset) - Detection rate by time bin with Wilson CI
calc_clearance_summary(dataset) - Kaplan-Meier clearance statistics
calc_value_summary(dataset) - Value distribution by time bin
calc_dataset_summary(dataset) - Basic dataset metadata
compare_datasets(datasets) - Cross-study comparison table
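
To make the per-participant quantities (duration, peak, clearance) concrete, the following standalone sketch computes them from a plain list of measurements. It does not call shedding-hub, and the definitions are assumptions made for illustration: duration is the time from first to last positive sample, peak is the largest positive value, and a participant counts as cleared when the final sample is negative. The name summarize_participant and the (day, value) tuple layout are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# (day, concentration); None stands for a sample in which nothing was detected.
Measurement = Tuple[float, Optional[float]]


def summarize_participant(measurements: Sequence[Measurement]) -> dict:
    """Summarize one participant's shedding under the assumed definitions."""
    ordered = sorted(measurements, key=lambda m: m[0])
    positives = [(t, v) for t, v in ordered if v is not None]
    if not positives:
        # No positive sample: none of the quantities is defined.
        return {"shedding_duration": None, "peak_value": None,
                "peak_time": None, "cleared": None}
    first_positive = positives[0][0]
    last_positive = positives[-1][0]
    peak_time, peak_value = max(positives, key=lambda m: m[1])
    return {
        "shedding_duration": last_positive - first_positive,
        "peak_value": peak_value,
        "peak_time": peak_time,
        # Cleared means the last recorded sample was negative.
        "cleared": ordered[-1][1] is None,
    }


if __name__ == "__main__":
    samples = [(0, 1e5), (3, 1e7), (7, 1e4), (10, None)]
    print(summarize_participant(samples))
```

The package may define these quantities differently, for example in how it treats intermittent negatives, so its output should be checked against the package documentation rather than this sketch.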

Shedding Duration

calc_shedding_duration(dataset) - Calculate duration statistics
calc_shedding_durations(dataset_ids) - Batch calculate across studies
plot_shedding_duration(df) - Plot individual durations
plot_shedding_durations(df) - Compare durations across studies

Visualization

plot_time_course(dataset) - Individual shedding trajectories
plot_time_courses(datasets) - Compare trajectories across studies
plot_shedding_heatmap(dataset) - Heatmap of shedding patterns
plot_mean_trajectory(dataset) - Aggregate trajectory with uncertainty
plot_value_distribution_by_time(dataset) - Box/violin plots by time
plot_detection_probability(dataset) - Detection rate over time
plot_clearance_curve(dataset) - Kaplan-Meier survival curve

Shedding Peak

calc_shedding_peak(dataset) - Calculate peak statistics
calc_shedding_peaks(dataset_ids) - Batch calculate across studies
plot_shedding_peak(df) - Plot peak values
plot_shedding_peaks(df) - Compare peaks across studies

Common Parameters

Most functions accept the following optional keyword arguments for filtering and time binning:

Parameter	Description	Example
`biomarker`	Filter by specific biomarker	`biomarker='SARS-CoV-2'`
`specimen`	Filter by specimen type	`specimen='sputum'`
`value`	Filter by value type	`value='concentration'` or `value='ct'`
`time_bin_size`	Size of time bins in days	`time_bin_size=7`
`time_range`	Limit time range	`time_range=(0, 30)`
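
As an illustration of how time_bin_size and time_range might interact, the sketch below assigns sample times to fixed-width bins after discarding times outside the requested range. It is a standalone example, not the package's implementation; the function name bin_times, the inclusive range bounds, and the choice to label each bin by its left edge are all assumptions.

```python
import math
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def bin_times(
    times: Iterable[float],
    time_bin_size: float = 7,
    time_range: Optional[Tuple[float, float]] = None,
) -> List[float]:
    """Map each time (in days) to the left edge of its bin, dropping out-of-range times."""
    if time_bin_size <= 0:
        raise ValueError("time_bin_size must be positive")
    binned = []
    for t in times:
        if time_range is not None and not (time_range[0] <= t <= time_range[1]):
            continue  # outside the requested window (bounds assumed inclusive)
        binned.append(math.floor(t / time_bin_size) * time_bin_size)
    return binned


if __name__ == "__main__":
    days = [0, 3, 7, 13, 14, 35]
    bins = bin_times(days, time_bin_size=7, time_range=(0, 30))
    # Number of samples that fall into each bin.
    print(Counter(bins))
```

With weekly bins, days 0 to 6 share one bin, days 7 to 13 the next, and so on; day 35 is dropped here because it lies outside the (0, 30) range.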

Resources

PyPI

View package information, version history, and installation statistics.


GitHub

Source code, issue tracking, and contribution guidelines.


Example Workflows

Compare Shedding Across Multiple Studies

import shedding_hub as sh

# Load multiple datasets
data1 = sh.load_dataset('woelfel2020virological')
data2 = sh.load_dataset('kim2020viral')
data3 = sh.load_dataset('young2020epidemiologic')

# Compare key statistics across studies
comparison = sh.compare_datasets(
    [data1, data2, data3],
    specimen='sputum',
    value='concentration'
)
print(comparison[['dataset_id', 'n_participants', 'median_shedding_duration',
                  'median_peak_value', 'pct_cleared']])

Analyze Clearance Patterns

import shedding_hub as sh

# Load dataset
data = sh.load_dataset('woelfel2020virological')

# Get Kaplan-Meier clearance statistics
clearance = sh.calc_clearance_summary(data, specimen='sputum')
print(f"Median clearance time: {clearance['median_clearance_time']} days")
print(f"Clearance rate: {clearance['clearance_rate']*100:.1f}%")

# Plot the survival curve
fig = sh.plot_clearance_curve(data, specimen='sputum', show_ci=True)
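
For readers unfamiliar with the Kaplan-Meier method behind these clearance statistics, here is a minimal standalone sketch of the estimator. It is not the package's code: kaplan_meier and median_time are hypothetical names, and the input layout (one duration per participant plus a flag saying whether clearance was actually observed) is an assumption. Participants who were still shedding at their last sample are treated as censored.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(
    durations: Sequence[float], observed: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (time, fraction still shedding) steps of the Kaplan-Meier estimator.

    observed[i] is True if clearance was seen at durations[i], and False if the
    participant was censored (still shedding at last follow-up).
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")
    event_times = sorted({t for t, seen in zip(durations, observed) if seen})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, seen in zip(durations, observed) if seen and d == t)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


def median_time(curve: Sequence[Tuple[float, float]]) -> Optional[float]:
    """First time at which the estimated fraction still shedding is at most 0.5."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the curve never drops to 0.5, so the median is not reached


if __name__ == "__main__":
    days = [5, 8, 8, 12, 15]
    cleared = [True, True, False, True, False]
    curve = kaplan_meier(days, cleared)
    print(curve)
    print("median clearance time:", median_time(curve))
```

In this toy example the estimated fraction still shedding drops to about 0.8, 0.6, and 0.3 at days 5, 8, and 12, so the median clearance time is day 12. Censored participants stay in the at-risk count until their last follow-up but never trigger a drop in the curve.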

Visualize Population-Level Shedding

import shedding_hub as sh
import matplotlib.pyplot as plt

# Load dataset
data = sh.load_dataset('woelfel2020virological')

# Create a multi-panel figure
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Heatmap of individual trajectories
plt.sca(axes[0, 0])
sh.plot_shedding_heatmap(data, specimen='sputum', value='concentration')

# Mean trajectory with 95% CI
plt.sca(axes[0, 1])
sh.plot_mean_trajectory(data, specimen='sputum', value='concentration')

# Detection probability over time
plt.sca(axes[1, 0])
sh.plot_detection_probability(data, specimen='sputum')

# Clearance curve
plt.sca(axes[1, 1])
sh.plot_clearance_curve(data, specimen='sputum')

plt.tight_layout()
plt.savefig('shedding_analysis.png', dpi=300)

Contributing

We welcome contributions to the shedding-hub package! Whether you're fixing bugs, adding new features, improving documentation, or suggesting enhancements, your help is appreciated.

Please visit our Contributing Guidelines to get started. For major changes, please open an issue first to discuss what you would like to change.
