Hi, I'm Victoria! I'm a data scientist working on open data. My group focuses on understanding the effect of big data and computation on scientific inference. We are interested in questions like: how effectively does statistical methodology translate to big data settings?
Instead of collecting data to test a particular hypothesis, researchers now often generate hypotheses by direct inspection of the data, then use the same data to test those hypotheses. What counts as a significant finding in this case? Can we estimate how likely that finding is to be replicated in a new sample? What information is needed to verify and replicate data science findings?
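To make the concern concrete, here is a minimal simulation (my own illustration, not from any particular study) of "double dipping": we screen many pure-noise features for the one most correlated with an outcome, then "test" that chosen feature either on the same data or on a fresh sample. The 1.96/sqrt(n) cutoff is a standard large-sample approximation to a 5% two-sided threshold for a single correlation.

```python
import numpy as np

# All features and outcomes are pure noise, so every "finding" is a
# false positive. We compare the apparent significance rate when the
# hypothesis is tested on the data that suggested it vs. on new data.
rng = np.random.default_rng(0)
n, p, trials = 100, 50, 200
cutoff = 1.96 / np.sqrt(n)  # approx. 5% two-sided threshold for one correlation

same_hits = fresh_hits = 0
for _ in range(trials):
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)
    # Sample correlation of each feature with the outcome.
    corrs = (X - X.mean(0)).T @ (y - y.mean()) / (n * X.std(0) * y.std())
    best = np.argmax(np.abs(corrs))          # hypothesis chosen by inspecting the data
    same_hits += abs(corrs[best]) > cutoff   # tested on the SAME data

    X2 = rng.standard_normal((n, p))         # independent replication sample
    y2 = rng.standard_normal(n)
    c2 = (X2[:, best] - X2[:, best].mean()) @ (y2 - y2.mean()) / (
        n * X2[:, best].std() * y2.std())
    fresh_hits += abs(c2) > cutoff           # tested on NEW data

print(f"'significant' on same data: {same_hits / trials:.2f}")
print(f"'significant' on new data:  {fresh_hits / trials:.2f}")
```

Screening 50 noise features and testing the winner on the same data produces a "significant" correlation in the large majority of trials, while the replication sample stays near the nominal 5% rate.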
When computation is used in research, it becomes part of the methods used to derive a result. How should these steps be made openly available to the community for inspection, verification, replication, and re-use? What tools and computational environments are needed for data science?
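One small step in that direction is recording a machine-readable "methods" manifest alongside each result, so others can inspect and re-run the computation. The sketch below is illustrative only: the fields and the idea of hashing the input data are my assumptions, not a standard or an existing tool.

```python
import hashlib
import json
import platform
import sys

# Hypothetical sketch: capture just enough provenance (interpreter,
# platform, and a fingerprint of the input data) to let someone else
# check they are re-running the same computation on the same inputs.
def provenance_manifest(data_bytes: bytes, script_name: str) -> dict:
    return {
        "script": script_name,                                  # assumed name
        "python": sys.version.split()[0],                       # e.g. "3.11.4"
        "platform": platform.platform(),
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),  # input fingerprint
    }

manifest = provenance_manifest(b"toy dataset contents", "analysis.py")
print(json.dumps(manifest, indent=2))
```

A real system would also pin library versions and random seeds, but even this minimal record makes "which data, which code, which environment" an answerable question.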
We have an opportunity to think about data science as a life cycle, from experimental design and databases through to the scientific findings, and design tools and environments that enable reliable scientific inference at scale.