Introduction

Understanding how individual scientists interact with one another, and how such interaction affects research productivity and knowledge diffusion, is important for understanding the dynamics of scientific research collaboration. At the same time, information about patterns of collaboration and their consequences has implications for science policy. In quantitative research on collaboration networks, publication co-authorships and citation linkages have been the primary sources of data. However, as large data repositories, one of the signposts of cyberinfrastructure-enabled, data-driven science, become increasingly prevalent, they offer an alternative source of information about networks of scientific collaboration. This project investigates the research collaboration networks emerging around one such international data repository, GenBank, and develops data products to support data-driven science policymaking and research. By using this novel data source, the project provides an unprecedented opportunity to validate and expand the theory of complex networks while generating rich data outputs and products to support science policy research and policymaking. This study fills a number of theoretical and methodological gaps identified by the 2008 roadmap for Science of Science Policy (SoSP), with a specific focus on how scientific collaboration networks form and evolve. In particular, its outcomes address the lack of models and tools for network analysis, visual analytics, and science mapping outlined in that roadmap. To accomplish the data collection and processing required for this project, new computational programs will be developed to parse, extract, store, transform, split, merge, and filter the data; these programs will be applicable to the analysis of other similar data sources for science policy and innovation research.