Data Visualization for High-Throughput Sequence Data
This project aims to produce a new web-based scientific visualization framework for the analysis of high-throughput biological sequence data (initially focusing on rRNA amplicon data, but eventually extending to general biological sample-by-observation contingency tables). This work is a collaboration between UC Davis and Pitch Interactive (a data visualization studio based in Berkeley, CA). We aim to produce intuitive, interactive visualization tools that can be used to explore and analyze microbial community patterns in large environmental datasets (Illumina/454). The project takes advantage of standard file formats from computational pipelines (e.g. BIOM tables) to bridge the gap between biological software (e.g. QIIME) and existing data visualization capabilities (harnessing the flexibility and scalability of tools such as WebGL and HTML5). As project PI, I am responsible for defining the biological questions that drive visualization needs, working closely with Pitch Interactive on data parsing frameworks and user interface design, and liaising with prospective end users in the biological research community for feedback and feature requests. The funded grant proposal for this project has been posted to Figshare.
Environmental Sequencing of Arctic Meiofauna
This project applies high-throughput DNA sequencing techniques to the study of meiofaunal communities in the Arctic. The work is funded by the North Pacific Research Board, and represents a collaboration between UC Davis and the University of Alaska, Fairbanks (led by PI Sarah Hardy and Co-PI Arny Blanchard). Marker gene (18S rRNA) and metagenomic sequencing will be carried out alongside standard microscopic approaches and morphological taxonomy. Data will be used to assess meiofaunal community structure and diversity in the US Arctic (benthic habitats in the Beaufort and Chukchi Seas), and to identify possible environmental drivers of community structure. As Co-PI, I will be working closely with graduate students at the University of Alaska to generate Illumina datasets and carry out bioinformatic analyses.
NSF Research Coordination Network EukHiTS
RCN EukHiTS (Eukaryotic biodiversity research using High-Throughput Sequencing) is a collaborative project between the University of New Hampshire and the University of California, Davis, funded by a Research Coordination Network award from the National Science Foundation.
Microscopic eukaryote species (organisms <1 mm, such as nematodes, fungi, protists, etc.) are abundant and ubiquitous, yet invisible to the naked eye, in every ecosystem on Earth. The biodiversity and geographic distributions of most of these species are largely unknown, and represent one of the major knowledge gaps in biology. High-throughput DNA sequencing technologies now allow for deep examination of virtually all microscopic organisms present in an environmental sample. For microbial eukaryote taxa, en masse biodiversity assessment using traditional loci (rRNA genes) can be conducted at a fraction of the time and cost required for traditional (morphological) approaches. Despite this promise, current bottlenecks include the lack of useful distributed tools for analysis, the lack of common data standards to allow global comparisons across individual studies, and missing links between molecules and morphology. RCN EukHiTS is developing community capabilities for computational approaches focused on eukaryotic taxa, along with the infrastructure, both cyber and human, needed for effective interpretation of large high-throughput datasets. The steering committee of RCN EukHiTS includes expertise from computational biology, functional genomics, computer science, taxonomy, ecology, and database resource management, as well as representatives of end-user communities, to ensure that all aspects of the community are well represented.
This NSF RCN builds on two previous community meetings organized by Kelly Thomas and me: a 2014 SMBE Satellite Meeting on Eukaryotic -Omics and a 2011 NESCent Catalysis Meeting on “High-Throughput Biodiversity Assessment using Eukaryotic Metagenetics”.
PhyloSift is a software pipeline for phylogenetic analysis of genomes and metagenomes. Taking any biological sequence as input data (nucleotide or amino acid), PhyloSift uses a reference database of profile HMMs to identify candidate sequences matching phylogenetically informative marker genes. Candidate sequences identified from the input data are then subjected to phylogenetic placement, where short reads are inserted into reference phylogenies and given taxonomic assignments based on this tree placement. I have been actively involved with the development and documentation of PhyloSift. Software downloads and extensive documentation are available on the main PhyloSift website, and the code is maintained as an open source repository on GitHub.
microBEnet is the online destination for resources related to the microbiology of the built environment. Funded by a grant from the Alfred P. Sloan Foundation to Jonathan Eisen at the University of California, Davis, this project is a collaboration between the Eisen Lab and Hal Levin at the Building Ecology Research Group. microBEnet focuses on three main categories of tasks: 1) organizing meetings and workshops, 2) leveraging social media to facilitate communication and collaboration, and 3) curating and creating online resources to facilitate work in the Built Environment and to build a culture of openness and sharing. I am involved with curating online resources and social media tools for microBEnet, as well as providing bioinformatics oversight and design advice for microBEnet’s citizen science and undergraduate research projects.