Events, Places, and Dynamics of Research Networks: Mapping Genetics Research with an Interactive Timeline
Authors: Jian Qin, Mark Costa, Sarah Bratt, Qianqian Chen, Yingxue Xiao (School of Information Studies, Syracuse University)
The GenBank Timeline is a showcase of the events, places, and dynamics of research networks derived from the metadata describing the DNA/RNA sequences submitted to “GenBank,” NCBI’s international data repository. The metadata contain information on coauthorship, taxonomic lineages, publications associated with the datasets, and geographic locations of authors. During the period covered by this dataset (1983-2013), 175,889,683 DNA data annotations covering 814,196 organisms were deposited in GenBank. The submissions included 688,737 direct submissions of data and 330,348 unique references to journal articles. After named entity resolution of author names, 545,345 unique authors were identified as having contributed to the community. The GenBank Timeline is an attempt to tie the evolving collaboration networks of the international research community to external epidemiological and policy events as well as to technical advances in genomics research.
The timeline format offers a different way to look at the Big (Meta)Data from data repositories such as GenBank. Coupling events and places with visualizations of the research networks enables exploratory analysis of such data in light of public health, policy, and technology influences, and thus gives context to the results obtained through complex network analytic (CNA) approaches to the study of scientific communities.
The metadata from GenBank were processed and stored in a database for manipulation and analysis, and were analyzed with the statistical software R. Macroscopic and statistical analysis revealed the “small world” reality of genetics research: a highly connected giant component, some solo-publishing scientists, evidence of cyclic patterns of collaboration, and distinct structural roles for scientists in the collaboration network.
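The macroscopic measures mentioned above can be illustrated on a toy example. The sketch below is not the authors' R workflow; it is a minimal Python illustration, using only the standard library, of how a coauthorship network might be built from per-submission author lists and how the giant component, solo authors, and average clustering coefficient could be computed. The `records` data and all function names are hypothetical, and average path length, the other usual ingredient of a small-world diagnosis, is left out for brevity.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_coauthor_graph(records):
    """Return {author: set(coauthors)} from an iterable of author-ID lists."""
    graph = defaultdict(set)
    for authors in records:
        unique = set(authors)
        for a in unique:
            graph[a]  # touch the key so solo authors appear as isolated nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Return connected components as a list of sets, largest first."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search from the unvisited start node
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    component.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist among this node's neighbors.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph) if graph else 0.0


# Hypothetical toy data: each inner list holds the author IDs on one submission.
records = [[1, 2, 3], [3, 4], [5], [6, 7]]
graph = build_coauthor_graph(records)
components = connected_components(graph)
giant_share = len(components[0]) / len(graph)  # fraction of authors in the giant component
solo_authors = [a for a, nbrs in graph.items() if not nbrs]
```

In this toy network the largest component holds four of the seven authors and author 5 is the only solo contributor. On data at GenBank scale, a graph library or database-side processing would likely be a better fit than this in-memory approach.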
The featured data in the GenBank Timeline tool focus on scientific collaboration networks and provide opportunities for inspecting, manipulating, and exploring the research networks associated with outbreaks such as West Nile Virus and Severe Acute Respiratory Syndrome (SARS), as well as with technological advances (e.g., PCR sequencing technologies) and policy changes. Subsets of data were extracted, analyzed, and visualized using R; the resulting visualizations were then processed and showcased using the Timeline JS software. The results were categorized into disease outbreaks, important developments in genetics R&D, technological developments, and policy decisions. The timeline is a work in progress, and further data analyses will be added as they become available.
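As a rough illustration of the last step, the sketch below assembles a few categorized events into the JSON structure that TimelineJS3 accepts (`title`, `events`, `start_date`, `text`, `group`). It assumes the TimelineJS3 JSON input format, which may differ from the Timeline JS version or input method the authors actually used. The event records, their summaries, and the helper names are hypothetical placeholders; only the outbreak years are well-known dates.

```python
import json

# Hypothetical event records. The categories follow the four groups named in
# the text; the summaries are placeholders, not content from the real timeline.
events = [
    {"year": 2003, "headline": "SARS outbreak",
     "category": "Disease outbreaks",
     "summary": "Placeholder: collaboration network around SARS submissions."},
    {"year": 1999, "headline": "West Nile Virus detected in the United States",
     "category": "Disease outbreaks",
     "summary": "Placeholder: collaboration network around West Nile Virus submissions."},
]


def to_timeline_event(event):
    """Map one record to a TimelineJS3-style event object."""
    return {
        "start_date": {"year": str(event["year"])},
        "text": {"headline": event["headline"], "text": event["summary"]},
        "group": event["category"],  # events in the same group share a row
    }


def build_timeline(title, records):
    """Return a TimelineJS3-style dict with events in chronological order."""
    ordered = sorted(records, key=lambda e: e["year"])
    return {
        "title": {"text": {"headline": title}},
        "events": [to_timeline_event(e) for e in ordered],
    }


timeline = build_timeline("GenBank Timeline", events)
timeline_json = json.dumps(timeline, indent=2)
```

The `group` field is used here to carry the category, so that events of the same kind would be laid out together beneath the slides.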
To view the GenBank Timeline
- Metadata description of the GenBank Timeline: http://metadatalab.syr.edu/node/46
- Interactive GenBank Timeline: http://metadatalab.syr.edu/node/50
- Related projects/works: metadatalab.syr.edu
The GenBank Timeline is easily accessed and shared through its web link. Simply point your browser to the link and proceed through the chronologically arranged events and visualizations. Google Fusion Tables allow the learner to zoom in on the scientists (each represented by a scientist ID, a long integer) and on the continents where research is conducted, and to ask questions such as: “Does NIH funding correlate with genetic submissions?”, “How did PCR technology change the structure of research communities?”, “Is there a relationship between authorship of publications and genetic sequence submissions?”, “How has international scientific collaboration changed from 1982-2012?”, and “Are there Nobel Peace Prize winners in the GenBank database?” Interactive network visualizations give the user multiple opportunities for hands-on exploration of scientific collaboration network data, including playing with the data by linking directly to a subset of processed data, such as the West Nile Virus community data. Users may simply click the left and right arrows to navigate through the timeline. To help users understand the type of genetics research event being featured, the tags on each event and the affordances beneath the timeline provide a contextualizing overview of the featured events.
References & Additional Reading
Costa, M. R., & Qin, J. (2012). Analysis of networks in cyberinfrastructure-enabled research communities: A pilot study. Proceedings of the American Society for Information Science and Technology, 49(1), 1-4.
Costa, M. R., Qin, J., & Wang, J. (2014). Research networks in data repositories. In JCDL '14: 14th ACM/IEEE-CS Joint Conference on Digital Libraries, London, England, September 22-26, 2014. Available from: http://jianqin.metadataetc.org/wp-content/uploads/2014/07/JCDL2014_resea...
Qin, J., Costa, M., & Wang, J. (2015). Methodological and technical challenges in big scientometric data analytics. In iConference 2015 Proceedings. Available from:
* All GenBank Timeline photos are in the public domain (Flickr) or original works of the authors.