Opportunities to Explore Data Science for Summer Interns

In the future, biomedical research, and research in general, will depend increasingly on data science, computation, and related disciplines. The large amounts of data investigators can now generate will require secure storage and mechanisms for sharing datasets; standardization of data elements; creation, support, and sharing of new tools and workflows; involvement of data scientists in most research projects; and the training of a data science workforce. In short, we can predict that ALL scientists will have to understand and use data science in the future.

The Virtual Opportunities to Explore Data Science planned for summer 2021 will provide trainees at various levels with a variety of options to learn and improve their computational skills as applied to biomedical research. The material will range from the basics of learning to code to using supercomputers and cloud-based services to mine, analyze, and visualize data. The offerings listed below were planned in collaboration with the NIH Office of Data Science Strategy. They are open to summer interns at all levels.

Introduction to Supercomputing and NIH Biowulf
AnVIL
Introduction to R and RStudio
Finding Information in NCBI Databases: Tools to Help You Do What You Need to Do
Introduction to R Data Types
NCBI BLAST and Sequence Alignment Analysis Tools
A Research Initiative for All of Us: The All of Us Research Program
Reproducible Data Science: What Are Data Scientists Doing to Ensure Their Findings Are Replicable and How’s It Working?
2021 NIH Summer Intern Cloud Computing Workshop
NCBI Resources for Human Genome and Gene Research
End of Summer Codeathon



NOTE:
One of the Summer Bootcamps focuses on coding. Take a look at Learn to Code: Python for Beginners.


 

DATES AND TIMES

All programs are open to summer interns at all levels.

Introduction to Supercomputing and NIH Biowulf
Tuesday, June 15, 2021; 10:00-11:00am

This workshop will provide a basic overview of what a supercomputer is and how it is used.  Hear more about Biowulf, the NIH supercomputer, and the many things it is used for at the NIH.
Read more and register

AnVIL
Monday, June 21, 2021; 1:00-3:00pm, ET

AnVIL is a tool for deploying your data science more easily.  Topics to be covered include:

  • AnVIL as a platform for scalable & flexible genomics research
  • Introduction to genome assembly, alignment, & variant calling (on AnVIL and beyond)
  • AnVIL as the platform for Telomere-to-Telomere (T2T) variant calling analysis with implications for population and clinical genomics

Read more and register

Introduction to R and RStudio
Tuesday, June 22, 2021; 12:00-1:30 pm, ET (Registration through the NIH Library begins June 15)

Learn the basics of using R and RStudio to analyze large datasets.
Read more

Finding Information in NCBI databases: Tools to Help You Do What You Need to Do
Thursday, June 24, 2021; 10:00am–12:00pm ET

This workshop will discuss how to quickly find and organize literature and biological data at NCBI (the National Center for Biotechnology Information) for use in your next research project.
Read more and register


Introduction to R Data Types
Monday, June 28, 2021; 2:00-3:30 pm, ET (Registration through the NIH Library begins June 21)

A companion to Introduction to R and RStudio, this workshop will introduce the different data types in R to help you analyze your own data.
Read more

NCBI BLAST and Sequence Alignment Analysis Tools
Thursday, July 8, 2021; 10:00 am-12:00 pm, ET

Learn how to use NCBI BLAST and other important sequence analysis services effectively, including Genome BLAST, Primer-BLAST, and COBALT.
Read more and register

A Research Initiative for All of Us: The All of Us Research Program
Monday, July 12, 2021; 1:00-2:00 pm

This webinar event is an opportunity for trainees to learn about the NIH’s All of Us Research Program, an ambitious initiative to advance precision medicine by building one of the world’s largest and most diverse databases for health research. Webinar attendees will hear about the program’s mission and core values, and also learn how to register, access, and analyze data in the All of Us Research Hub. Trainees with R or Python coding experience are particularly encouraged to attend this event.
Read more and register

Reproducible Data Science: What Are Data Scientists Doing to Ensure Their Findings Are Replicable and How’s It Working?
Tuesday, July 20, 2021; 11:00am-12:00 pm, ET

Most people assume that published science is accurate and true. In what has come to be known as "The Reproducibility Crisis", we have learned that many, if not most, reported findings cannot be independently replicated and may simply be wrong. In this webinar we'll discuss what data scientists are doing to ensure their findings are replicable and how well it's working.
Read more and register

NIH Summer Intern Cloud Computing Workshop
Monday, July 26, 2021; 10:00 am-2:00 pm, ET

The NIH Office of Intramural Training and Education (OITE) and the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative have partnered to bring NIH summer interns a half-day event that will introduce attendees to a variety of cloud computing concepts and use cases for biomedical research.
Read more and register.

 

NCBI Resources for Human Genome and Gene Research
Thursday, July 29, 2021; 10:00 am-12:00 pm, ET

Learn about human genome and gene information at NCBI and how to find and use these data for your research project.
Read more and register

Summer Internship Codeathon

The National Institutes of Health Office of Data Science Strategy is pleased to announce a virtual codeathon (August 9th to 12th) to bring the summer opportunities to explore data science to an exciting close.

Participants will work in teams to build tools and pipelines for advanced analysis of biological datasets, including but not limited to text, images, next-generation sequencing data, proteomics, and metadata.

More information still to come! Register