My long-term research interests lie at the interface between biology and computer/physical science, focusing on high-throughput sequencing as an approach for understanding the biodiversity of (historically neglected) microbial eukaryote taxa. Microbial eukaryotes (organisms 38 μm to 1 mm in size, such as nematodes, fungi, and protists) are abundant and ubiquitous across every ecosystem on Earth, performing key functions such as nutrient cycling and sediment stabilization in marine habitats. High-throughput biodiversity research specifically aims to 1) improve our ability to understand, predict, and mitigate human impacts on natural ecosystems (e.g. anthropogenic impacts stemming from energy production, such as oil spills), and 2) develop new analytical tools that bridge the disciplines of biology and computer/physical science.
High-throughput analysis of microbial eukaryotes is still an emerging field; we currently have a poor understanding of genomic evolution in eukaryotes (e.g. factors influencing intragenomic variation across nuclear rRNA gene arrays, which confounds our ability to link amplicon data to biological species), have very little genomic data available for most taxa, and continue to lack the cyberinfrastructure needed for effective interpretation of large sequence datasets.
I work closely with computer scientists and software engineers to inform the development of cutting-edge tools for the primary analysis of large sequence datasets (millions of reads), including: methods for assessing how the clustering of raw sequence reads into Operational Taxonomic Units (OTUs) can differentially affect the biological interpretation of sequence data; phylogenetic pipelines as a robust tool for identifying divergent environmental lineages (interpreting short reads in an evolutionary context); comprehensive ecological analyses (e.g. linking different types of -omic data, OTU network analysis); and visualization as an innovative tool for enabling novel scientific discovery. Over the last few years, I have directly fostered such collaborative and interdisciplinary efforts to bridge biology with computational and mathematical disciplines. This focus recently culminated in a funding award for a Catalysis Meeting at the National Evolutionary Synthesis Center (NESCent), an intellectually transformative meeting attended by a diverse group of organismal biologists, ecologists, computer scientists, and bioinformaticians.
Biodiversity research must rest on a trifecta of biology, bioinformatics, and database resources; none of these foci can exist in isolation, and each must inform the others. Biological questions drive high-throughput studies, so computational pipelines and cyberinfrastructure need to functionally inform our knowledge of ecosystem processes. Likewise, computational resources must be complementary: bioinformatic outputs should be effectively databased, and evolving database resources should drive continuing refinements in analytical pipelines. By driving efforts to synthesize research projects across these three areas, I aim to ensure steady progress in biodiversity research over the next decade.