What processes and systems produce truth in computational research? My research seeks to answer this question by integrating an understanding of statistical and computational inference at massive scale, computational infrastructures and platforms for scientific discovery, and effective social research ecosystems. The specific questions that interest me are:

  • When computation is used in research, it becomes part of the methods used to derive a result. What information is needed to verify and replicate data science and computational findings? How should these steps be made available to the community? How can datasets and software be repurposed to catalyze new discoveries?

  • What actions are needed for research communities to leverage data and computational tools while maintaining (or improving) standards of reproducibility and interpretability of results, and upholding other values such as inclusivity and equity?

  • What characteristics of tools and computational environments enable data science? We have an opportunity to think about data science as a life cycle -- from experimental design and databases through algorithms and methodology to the identification and dissemination of scientific findings -- and to design tools and environments that enable reliable scientific investigation and inference. In other words, such tools would enable the science aspect of data science in silico.