A Snakemake workflow for processing paired-end sequencing data from FASTQs to high-confidence variant calls. Handles quality control, UMI-based consensus BAM generation, alignment, base recalibration, and calling of tumor and clonal hematopoiesis variants. Configurable via YAML, with support for multiple callers, automated VCF post-processing, and reproducible conda environments. Runs on HPC clusters or locally. Most of my published work uses this framework.
A collection of modular scripts and pipelines for comprehensive genomic data analysis. Supports copy number variation, extrachromosomal DNA (ecDNA) characterization, HLA typing, variant processing, and custom visualizations. Includes batch-ready tools for variant filtering, IGV snapshot generation, and downstream curation. Designed for reproducibility and flexibility across diverse datasets, with Python and R implementations and integration with common genomic tools like CNVkit, GridSS, AmpliconArchitect, FreeBayes, Mutect2, VarDict, OptiType, LOHHLA, and DASH.
A Python simulation study investigating how dominant alleles affect hybrid fitness and speciation under parallel and divergent selection. Simulations explore the role of dominant alleles in adaptation, showing that they can either reduce or increase hybrid fitness depending on selection type. Results are based on Fisher’s geometric model of adaptation and highlight how allele dominance influences progress toward speciation.
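As a rough illustration of the idea behind the simulation study, here is a toy sketch of Fisher's geometric model with phenotypic dominance. It is not the code used in the study. The function names, the Gaussian fitness function, the `strength` parameter, and the single-allele parents are all assumptions chosen for brevity. Each derived allele is heterozygous in an F1 hybrid, so its phenotypic effect is scaled by a dominance coefficient `h`.

```python
import math


def fitness(phenotype, optimum, strength=1.0):
    """Gaussian fitness that declines with squared distance from the optimum.

    The Gaussian form and the `strength` parameter are illustrative assumptions.
    """
    dist_sq = sum((z - o) ** 2 for z, o in zip(phenotype, optimum))
    return math.exp(-strength * dist_sq)


def f1_phenotype(parent1_alleles, parent2_alleles, n_traits):
    """Phenotype of an F1 hybrid, starting from an ancestral phenotype at the origin.

    Each allele is a (effect_vector, h) pair. Alleles fixed in one parent are
    heterozygous in the F1, so each contributes h * effect
    (h = 0.5 is additive, h = 1 is fully dominant).
    """
    z = [0.0] * n_traits
    for effect, h in parent1_alleles + parent2_alleles:
        for k in range(n_traits):
            z[k] += h * effect[k]
    return z


# Toy parallel-selection case: both parental populations adapt to the same
# optimum, each by fixing a different allele with the same phenotypic effect.
optimum = (1.0, 0.0)
effect = (1.0, 0.0)

additive_f1 = f1_phenotype([(effect, 0.5)], [(effect, 0.5)], n_traits=2)
dominant_f1 = f1_phenotype([(effect, 1.0)], [(effect, 1.0)], n_traits=2)

w_additive = fitness(additive_f1, optimum)
w_dominant = fitness(dominant_f1, optimum)

print(f"additive F1 phenotype {additive_f1}, fitness {w_additive:.3f}")
print(f"dominant F1 phenotype {dominant_f1}, fitness {w_dominant:.3f}")
```

In this toy setup the additive hybrid lands exactly on the shared optimum, while the fully dominant alleles overshoot it and hybrid fitness drops. Changing the optima, effect vectors, or per-allele `h` values gives other outcomes, which is the kind of dependence the full simulations explore.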