My Google Scholar Page.
Publications
"On Emergent Limits to Knowledge—Or, How to Trust the Robot Researchers: A Pocket Guide,"
V. Stodden, Harvard Data Science Review, 6(1), 2023. https://doi.org/10.1162/99608f92.dcaa63bc
"Research Reproducibility,"
M. Parashar, M. Heroux, V. Stodden, Computer, vol. 55, 2022. https://doi.org/10.1109/MC.2022.3176988
"Learning from Reproducing Computational Results: Introducing Three Principles and the Reproduction Package,"
M. S. Krafczyk, A. Shi, A. Bhaskar, D. Marinov, and V. Stodden, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. https://doi.org/10.1098/rsta.2020.0069 (Also available here.)
"Same data, different conclusions: Radical dispersion in empirical results when independent analysts operationalize and test the same hypothesis,"
Schweinsberg, M., Feldman, M., Staub, N., van den Akker, O. R., van Aert, R. C. M., van Assen, M. A. L. M., Liu, Y., Althoff, T., Heer, J., Kale, A., Mohamed, Z., Amireh, H., Venkatesh Prasad, V., Bernstein, A., Robinson, E., Snellman, K., Amy Sommer, S., Otner, S. M. G., Robinson, D., ... Luis Uhlmann, E., Organizational Behavior and Human Decision Processes, 165, 228-249, 2021. https://doi.org/10.1016/j.obhdp.2021.02.003
"Domain-Specific Fixes for Flaky Tests with Wrong Assumptions on Underdetermined Specifications,"
P. Zhang, Y. Jiang, A. Wei, V. Stodden, D. Marinov, A. Shi, ICSE 2021 Technical Track.
"Theme Editor's Introduction to Reproducibility and Replicability in Science,"
V. Stodden, Harvard Data Science Review, 2(4), 2020. https://doi.org/10.1162/99608f92.c46a02d4
"Highlights of the US National Academies Report on 'Reproducibility and Replicability in Science',"
H. Fineberg, V. Stodden, and X.L. Meng, Harvard Data Science Review, 2(4), 2020. https://doi.org/10.1162/99608f92.cb310198
"Trust but Verify: How to Leverage Policies, Workflows, and Infrastructure to Ensure Computational Reproducibility in Publication,"
C. Willis and V. Stodden, Harvard Data Science Review, 2(4), 2020. https://doi.org/10.1162/99608f92.25982dcf
"Understanding Reproducibility and Characteristics of Flaky Tests Through Test Reruns in Java Projects,"
W. Lam, S. Winter, A. Astorga, V. Stodden, D. Marinov, International Symposium on Software Reliability Engineering (ISSRE 2020).
"The Data Science Life Cycle: A Disciplined Approach to Advancing Data Science as a Science,"
V. Stodden, Communications of the ACM, 63(7), 2020. DOI:10.1145/3360646. Also available here.
"Beyond Open Data: A Model for Linking Digital Artifacts to Enable Reproducibility of Scientific Claims," V. Stodden,
Third International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS'20),
June 2020. Also available here.
"Building a Vision for Reproducibility in the Cyberinfrastructure Ecosystem: Leveraging Community Efforts," D. Chapp, V. Stodden, M. Taufer. Supercomputing Frontiers and Innovations, 7(1), 2020. DOI:10.14529/jsfi200106
"Toward Enabling Reproducibility for Data-Intensive Research Using the Whole Tale Platform,"
K. Chard, N. Gaffney, M. Hategan, K. Kowalik, B. Ludäscher, T. McPhillips, J. Nabrzyski, V. Stodden, I. Taylor, T. Thelen, M. J. Turk, C. Willis.
Advances in Parallel Computing, Volume 36: Parallel Computing: Technology Trends, DOI:10.3233/APC200107.
Also available here.
"Open Access journals need to become first choice, in invasion ecology and beyond," Jeschke, Börner, Stodden, and Tockner. NeoBiota 52: 1-8, DOI:10.3897/neobiota.52.39542
"Open Access to Research Artifacts: Implementing the Next Generation Data Management Plan,"
Stodden, Ferrini, Gabanyi, Lehnert, Morton, and Berman.
Association for Information Science & Technology 2019 Annual Meeting, October 2019, Melbourne, Australia. Also available here.
For implementation see https://ezdmp.org.
"Application of BagIt-Serialized Research Object Bundles for Packaging and Re-execution of Computational Analyses,"
K. Chard, N. Gaffney, M. B. Jones, K. Kowalik, B. Ludäscher, J. Nabrzyski, V. Stodden, I. Taylor, T. Thelen, M. J. Turk, C. Willis. Workshop on Research Objects (RO 2019), 24 Sept 2019, San Diego, CA, USA.
2019 IEEE 15th International Conference on e-Science (e-Science), pp. 514–521.
Preprint: https://doi.org/10.5281/zenodo.3381754
Cite as: https://doi.org/10.1109/eScience.2019.00068 (in print)
"Ambitious Data Science Can Be Painless," Monajemi, Murri, Jonas, Liang, Stodden, and Donoho, Harvard Data Science Review,
June 2019. https://doi.org/10.1162/99608f92.02ffc552. Also available here.
"Implementing Computational Reproducibility in the Whole Tale Environment,"
Chard et al.,
Second International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS'19),
June 2019. https://doi.org/10.1145/3322790.3330594. Also available here.
For code see https://github.com/whole-tale/ and implementation at https://wholetale.org/.
"Scientific Tests and Continuous Integration Strategies to Enhance Reproducibility in the Scientific Software Context,"
Krafczyk, Shi, Bhaskar, Marinov, and Stodden,
Second International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS'19),
June 2019. https://doi.org/10.1145/3322790.3330595. Also available here.
For code see https://github.com/ReproducibilityInPublishing/j.jcp.2016.08.012 (commit ba16911 at time of publication; CI: https://travis-ci.org/ReproducibilityInPublishing/j.jcp.2016.08.012)
and https://github.com/ReproducibilityInPublishing/10.1016_S0377-0427-03-00650-2 (commit 227b842 at time of publication; CI: https://travis-ci.org/ReproducibilityInPublishing/10.1016_S0377-0427-03-00650-2).
"Initial Thoughts on Cybersecurity And Reproducibility," Deelman, Stodden, Taufer, Welch,
Second International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS'19),
June 2019. https://doi.org/10.1145/3322790.3330593. Also available here.
"Computing Environments for Reproducibility: Capturing the 'Whole Tale',"
Brinkman et al., Future Generation Computer Systems, May 2019. https://doi.org/10.1016/j.future.2017.12.029
"Reproducibility and Replicability in Science," Committee Members,
National Academies Report, May 2019.
"Assessing Reproducibility: An Astrophysical Example of Computational Uncertainty in the HPC Context," Stodden and Krafczyk,
The 1st Workshop on Reproducible, Customizable and Portable Workflows for HPC,
November 2018, Dallas, TX, USA, co-located with Supercomputing 2018. Also available here.
For code see https://github.com/victoriastodden/ComputationalUncertaintyHPC.
"Enabling the Verification of Computational Results: An Empirical Evaluation of Computational Reproducibility,"
Stodden, Krafczyk, and Bhaskar, First International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS'18),
June 2018. https://doi.org/10.1145/3214239.3214242. Also available here, and here is our poster.
For code see https://github.com/ReproducibilityInPublishing/P-RECS-2018-Enabling-Verification.
"AIM: An Abstraction For Improving Machine Learning Prediction," Stodden, Wu,
and Sochat, 2018 IEEE Data Science Workshop, June 2018. Also available here, along with our poster.
For code see https://github.com/AIM-Project/AIM-Manuscript.
"Toward a Compatible Reproducibility Taxonomy for Computational and Computing Sciences,"
Heroux, Barba, Parashar, Stodden, and Taufer, Sandia National Lab Report SAND2018-11186, 2018. https://doi.org/10.2172/1481626. Also available here.
"Realizing the Potential of Data Science,"
Berman et al., Communications of the ACM, March 2018. Also available here.
"An empirical analysis of journal policy effectiveness for computational reproducibility," Stodden et al.,
PNAS, March 13, 2018. Also available here.
"Reproducibility of research: Issues and proposed remedies," Allison, Shiffrin, and Stodden, PNAS, March 13, 2018. Also available here.
"Computing Environments for Reproducibility: Capturing the 'Whole Tale'," Brinkman et al., Future Generation Computer Systems, Feb 2018. Also available here.
"Reproducibility and replicability of rodent phenotyping in preclinical studies," Kafkafi et al.,
Neuroscience & Biobehavioral Reviews, January 2018. Also available here.
"Four Simple Recommendations to Encourage Best Practices in Research Software," Jiménez et al.,
F1000Research 2017, 6:876 (doi: 10.12688/f1000research.11407.1), June 13, 2017.
"Fostering Integrity in Research," Committee Members,
National Academies Report, April 11, 2017.
"Structuring Supplemental Materials in Support of Reproducibility," Greenbaum, Rozowsky, Stodden and Gerstein,
Genome Biology, April 7, 2017.
"Enhancing Reproducibility for Computational Methods" with co-authors,
Science, 354(6317), Dec 9, 2016.
"Making Massive Computational Experiments Painless" with co-authors,
IEEE BigData 2016, Open Science in Big Data (OSBD 2016), Dec 5, 2016.
"Capturing the 'Whole Tale' of Computational Research: Reproducibility in Computing Environments," with co-authors,
Gateways 2016, Nov 3, 2016. Also on arXiv.
"Reproducibility and replicability of rodent phenotyping in preclinical studies," Kafkafi et al., bioRxiv, 2016.
"Facilitating Reproducibility in Scientific Computing: Principles and Practice," David H. Bailey, Jonathan M. Borwein and Victoria Stodden,
in Harald Atmanspacher and Sabine Maasen, eds, Reproducibility: Principles, Problems, Practices, John Wiley and Sons, New York, 2016.
"Reproducible Research in the Mathematical Sciences," with Donoho,
The Princeton Companion to Applied Mathematics, Edited by Nicholas J. Higham, 2015. Draft available here.
"Self-correction in Science at Work," with co-authors,
Science, 26 June 2015: Vol. 348 no. 6242 pp. 1420-1422
DOI: 10.1126/science.aab3847. Also available here.
"ResearchCompendia.org: Cyberinfrastructure for Reproducibility and Collaboration in Computational Science,"
with S. Miguez and J. Seiler, Computing in Science and Engineering, Jan/Feb 2015.
Also available here.
"Reproducing Statistical Results,"
Annual Review of Statistics and Its Application, Vol. 2, 2015. Also available here.
"Standing Together for Reproducibility in Large-Scale Computing: Report on reproducibility@XSEDE, An XSEDE14 Workshop," principal editors:
Doug James, Nancy Wilkins-Diehr, Victoria Stodden, Dirk Colbry, and Carlos Rosales. Dec 17, 2014.
"Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research,"
with S. Miguez, Journal of Open Research Software 2(1), 2014.
http://dx.doi.org/10.5334/jors.ay, also available here.
"The Reproducible Research Movement in Statistics,"
Statistical Journal of the IAOS, Volume 30 (2014). DOI: 10.3233/SJI-140818
"Provisioning Reproducible Computational Science Information,"
with S. Miguez, reproducibility@XSEDE: An XSEDE14 Workshop, July 2014.
"Enabling Reproducibility in Big Data Research: Balancing Confidentiality and Scientific Transparency," chapter in Lane, J., Stodden, V., Bender, S., and Nissenbaum, H. (eds). 2014.
Privacy, Big Data, and the Public Good: Frameworks for Engagement. Cambridge University Press. Available here.
Privacy, Big Data, and the Public Good: Frameworks for Engagement, Lane, J., Stodden, V., Bender, S., and Nissenbaum, H. (eds). 2014.
"What Computational Scientists Need to Know About Intellectual Property Law: A Primer," chapter in Stodden, V., Leisch, F., and Peng, R. (eds). 2014.
Implementing Reproducible Computational Research.
Boca Raton: Chapman & Hall/CRC. Also available here.
"RunMyCode.org: A Research-Reproducibility Tool for Computational Sciences," with C. Hurlin and C. Perignon, chapter in Stodden, V., Leisch, F., and Peng, R. (eds). 2014.
Implementing Reproducible Computational Research. Boca Raton: Chapman & Hall/CRC.
Implementing Reproducible Computational Research, Stodden, V., Leisch, F., and Peng, R. (eds). 2014. Boca Raton: Chapman & Hall/CRC.
"What? Me Worry? What to Do About Privacy, Big Data, and Statistical Research," with J. Lane,
Amstat News, Dec 1, 2013.
"Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research,"
with S. Miguez, WSSSPE, Nov 2013. (and on SSRN)
"Resolving Irreproducibility in Empirical and Computational Research," IMS Bulletin, Nov 17, 2013.
"Toward Reproducible Computational Research: An Empirical Analysis of Data and Code Policy Adoption by Journals," with P. Guo and Z. Ma, PLoS ONE, June 21, 2013. *Chosen to be part of the PLOS Open Data Collection.*
"'Setting the Default to Reproducible' in Computational Science Research," with D. Bailey and J. Borwein, SIAM News, June 2013.
"Set the Default to 'Open'," with D. Bailey and J. Borwein, Notices of the AMS, June 2013.
Testified before the House Committee on Science, Space and Technology at the March 5, 2013 hearing on Scientific Integrity & Transparency (video on the hearing website or here (~600MB); summary document here).
"Setting the Default to Reproducible: Reproducibility in Computational and Experimental Mathematics,"
ICERM Workshop report, with D. Bailey, J. Borwein, R. LeVeque, W. Rider, and W. Stein.
"Intellectual Property and Computational Science," chapter in S. Bartling and S. Friesike (eds), Opening Science: The Evolving Guide on How the Web is Changing Research, Collaboration and Scholarly Publishing. Springer, 2014. Also available here.
"Best Practices For Researchers Publishing Computational Results: An Open Source Community Resource," the wiki: http://wiki.stodden.net/
"Software Patents as a Barrier to Scientific Transparency: An Unexpected Consequence of Bayh-Dole," with I. Reich,
The Seventh Annual Conference on Empirical Legal Studies (CELS 2012), Stanford, CA, Nov 2012.
"RunMyCode.org: a novel dissemination and collaboration platform for executing published computational results," with C. Hurlin and C. Perignon.
Analyzing and Improving Collaborative eScience with Social Networks (eSoN 12);
Workshop with IEEE e-Science 2012;
Monday, 8 October 2012, Chicago, IL, USA.
Also available on IEEE Xplore.
"How Journals are Adopting Open Data and Code Policies," with P. Guo and Z. Ma, The First Global Thematic IASC Conference on the Knowledge Commons: Governing Pooled Knowledge Resources, Louvain-la-Neuve, Belgium, Sept 12, 2012. Draft version available here.
Guest editor introduction to a special issue on Reproducible Research:
"Reproducible Research: Tools and Strategies for Scientific Computing," IEEE Computing in Science and Engineering, 14(4), July/Aug 2012, pp. 11-12. Also available for noncommercial purposes here.
"Reproducible Research for Scientific Computing: Tools and Strategies for Changing the Culture," with Ian Mitchell and Randall LeVeque, IEEE Computing in Science and Engineering, 14(4), July/Aug 2012, pp. 13-17. Also available for noncommercial purposes here.
"Scientists, Share Secrets or Lose Funding," with Sam Arbesman, Bloomberg View, Jan 10, 2012.
"Trust your Science? Open Your Data and Code,"
Amstat News, July 1, 2011.
with participants, "Changing the Conduct of Science: Summary Report of Workshop Held on November 12, 2010," at the National Science Foundation Workshop Changing the Conduct of Science in the Information Age, June 2011.
"White Paper for Expert Panel Discussion on Data Policies," for a Workshop of the National Science Board Expert Panel on Data Policies, March 27-29, 2011.
"Innovation and Growth through Open Access to Scientific Research: Three Ideas for High-Impact Rule Changes," in Rules for Growth: Promoting Innovation and Growth Through Legal Reform, edited by The Kauffman Task Force on Law, Innovation, and Growth. February 2011.
"Data Sharing in Social Science Repositories: Facilitating Reproducible Computational Research," NIPS workshop: Computational Science and the Wisdom of Crowds, Dec 2010.
"Cyber Science and Engineering: A Report of the NSF Advisory Committee on Cyberinfrastructure," Task Force on Grand Challenges, Nov 2010.
Remarks presented before The National Academies Committee on
The Impact of Copyright Policy on Innovation in the Digital Era, Washington DC, Oct 15, 2010.
"Reproducible Research: Addressing the Need for Data and Code Sharing in Computational Science," with Yale Roundtable Participants, IEEE Computing in Science and Engineering, 12(5), pp. 8-13, Sep/Oct 2010, doi:10.1109/MCSE.2010.113
"Reproducible Research Concepts and Tools for Cancer Bioinformatics," with Vincent Carey, chapter in Biomedical Informatics for Cancer Research, Ochs, Michael F., Casagrande, John T., Davuluri, Ramana V. (Eds.).
"Open Science: Policy Implications for the Growing Phenomenon of User-Led Scientific Innovation," Journal of Science Communication, 9(1), 2010. Available here.
"The Scientific Method in Practice: Reproducibility in the Computational Sciences", MIT Sloan Research Paper No. 4773-10, submitted.
"Prepublication Data Sharing", with the Toronto International Data Release Workshop Authors, Nature, Vol. 461, September 10, 2009, pp. 168-170.
"A Global Empirical Evaluation of New Communication Technology Use and Democratic Tendency", with Meier, nominated for best paper, 3rd IEEE/ACM International Conference on Information and Communication Technologies and Development, Doha, Qatar, April 2009.
"Enabling Reproducible Research: Open Licensing for Scientific Innovation", winner of the Access to Knowledge Kaltura prize, International Journal of Communications Law and Policy, Issue 13, 2009.
"Reproducible Research in Computational Harmonic Analysis", with Donoho et al., IEEE Computing in Science and Engineering, 11(1), January 2009, pp. 8-18. (Also download here, with an earlier version here.)
"The Legal Framework for Reproducible Research in the Sciences: Licensing and Copyright", IEEE Computing in Science and Engineering, 11(1), January 2009, pp. 35-40. (Also download here, with an earlier version here.)
"Virtual Northern Analysis of the Human Genome", with Hurowitz, Drori, Donoho, and Brown. PLoS ONE, 2(5), May 23, 2007.
"About SparseLab", with Donoho and Tsaig. Documentation for SparseLab, MATLAB scripts associated with papers seeking sparse solutions to linear systems of equations, March 2007.
"SparseLab Architecture", with Donoho and Tsaig. Detailed documentation for SparseLab, MATLAB scripts associated with papers seeking sparse solutions to linear systems of equations, March 2007.
"Model Selection When the Number of Variables Exceeds the Number of Observations",
Doctoral Dissertation, Department of Statistics, Stanford University, 2006.
Release of SparseLab, a collaborative library of MATLAB routines for finding sparse solutions to underdetermined systems. Code and data from 13 papers by 12 authors are included in this platform, making the results in these papers reproducible.
"Fast l1 Minimization for Genome-wide Analysis of mRNA Lengths", with Hurowitz and Drori.
IEEE International Workshop on Genomic Signal Processing and Statistics, 2006.
"Breakdown Point of Model Selection when the Number of Variables Exceeds the Number of Observations", with Donoho.
Proceedings of the International Joint Conference on Neural Networks, 2006. This paper is part of the library of MATLAB scripts SparseLab.
"Multiscale Representations for Manifold-Valued Data", with Donoho, Drori, Schroeder, and Ur Rahman. Multiscale Modeling and Simulation, 4(4), 2005. https://doi.org/10.1137/050622729. This paper is part of the library of MATLAB scripts SymmLab and also available here.
"When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?", with Donoho. Proceedings of Neural Information Processing Systems, 2003. ['Swimmer' dataset]