We are looking for undergraduates with data analysis and coding skills to join our team.
The GenBank metadata mining project (http://metadatalab.syr.edu/) has a large social network data set. We are looking for two undergraduates who are interested in learning about research and data analysis to join our team for a year. This is a great opportunity for anyone who wants to learn more about research in general. You will get involved in a National Science Foundation funded project and have the chance to design and conduct a project of your own related to our research.
If you're interested in social network analysis, big data, NoSQL databases, or scientific communication, this is a great opportunity to learn in a hands-on environment. We have millions of rows of data on submissions to an international data repository. This data contains co-authorship information, some geographical and temporal data, and information on research focuses. We need to identify and extract relationship patterns from this data, and we want you to help!
Who can apply:
- Undergraduate students in their third or fourth year
- Majoring in information management and technology, biotechnology/bioinformatics, statistics, math, or computer science
We expect you (with help) to:
- Identify a problem.
- Conduct a literature review.
- Design the data analysis and extraction procedures.
- Conduct the proposed analysis.
- Prepare a report of the results.
What you will get:
- An opportunity to learn the research process.
- Personal mentorship from faculty and a PhD candidate.
- Access to computational resources and lots of data.
- A stipend ($4000 each for fall 2014 and spring 2015) covering 20 hours of work per week.
Ideas for projects:
- Automated quality assurance - we downloaded the data and parsed it into a structured database. How do we know if this was done accurately?
- NoSQL vs MySQL - what are our options for storing the data, both for analysis and presentation? It turns out that MySQL has performance limitations, so we tried a graph DB. Maybe you can take that a bit further and give us some quantitative metrics to compare performance.
- Visual data products - we have a tremendous amount of social network data. How do we best design and present interactive subsets of that data?
- Collaboration networks by taxonomic class - how are DNA sequence data submissions and collaboration networks distributed across taxonomic classes? Are there any associations with external data such as funding, outbreaks, or technology advances?
These are just ideas; if something else interests you, please feel free to suggest it.
How to apply:
Instead of sending just a resume, what we'd like to see is a very short (under 2 pages) proposal on what you think you'd like to do. This proposal is a demonstration of your self-motivation, and should include a summary of your idea and what you would like to explore, as well as a review of related work. Don't worry about solving the problems listed above in the posting; instead, focus on demonstrating your willingness and ability to be engaged in our work. You get bonus points if you show a willingness to try, and demo your ability to use, a technology to solve the problem.
Please send a resume and the proposal mentioned above to Mark [firstname.lastname@example.org].