Data Science/NLP/Computational Researcher in the Luna Lab (NLM/NCI) (Postdoctoral Fellow)
National Library of Medicine, Bethesda, MD and surrounding area
About the position
The Luna lab is jointly affiliated with the National Library of Medicine (NLM) and the National Cancer Institute (NCI). The lab's dual ambitions are to make biomedical data and information accessible and to advance cancer research that helps people live longer, healthier lives. We seek outstanding, highly motivated, and skilled candidates to join our team. Our goal is to understand how researchers interpret large data sets and to enable them to explore and gain insight from those data sets through interactive systems that advance healthcare.
This position offers a unique opportunity to push beyond the traditional scope of data extraction and management techniques to address the new challenges of large-scale, heterogeneous data. The group combines expertise in data management, visualization, information retrieval, and network data. We will work in the areas of data integration, data uncertainty, data summarization, and visual analytics (using both visualization and machine learning) to explore novel assistive interfaces and data query methods that leverage large language models (LLMs) and vision (multimodal) models. Through this work, selected applicants will contribute collaboratively to research in cancer therapeutics and precision medicine that spans both basic and translational science objectives. Successful candidates will develop novel bioinformatic machine learning methodologies in the areas of: 1) multimodal data summarization from pharmacogenomics datasets, 2) structuring complex scientific information from unstructured manuscript data (e.g., supplementary data, experimental protocols, images), and 3) retrieval-augmented generation for LLM-backed agents for information retrieval from cancer-related multi-omics datasets.
We are offering full-time postdoctoral fellow positions, available immediately and renewable on a yearly basis. The NIH offers a competitive salary and comprehensive health insurance. Initial appointments will be for 1-2 years, with possible extensions up to 5 years. The NIH is dedicated to building a diverse community in its training and employment programs, as well as to the continued education and career development of all its research staff. These positions are subject to background checks.
Apply for this vacancy
What you'll need to apply
- Cover letter (1 page max) describing your 1) research experiences, 2) training goals, and 3) preferred starting date. The letter should be tailored to our group, mentioning recent articles and explaining your potential role in the lab
- Updated CV including bibliography
- It is strongly suggested that your application include links to one or more code repositories containing code attributable to you
- Contact information (name, institute, email, phone) for 3 references
Send the materials listed above to Augustin Luna, Ph.D., via email only. No calls, please. Write "Postdoctoral Application" in the subject heading. If we are interested, you will be contacted by Dr. Luna.
- PhD in a relevant field, including: Statistics, Mathematics, Data Science, Computer Science/Engineering, Electrical Engineering, Medical Informatics, or a degree related to Biology with substantial experience in computational and statistical work. Individuals in the final stages of PhD submission will be considered, as well as PhD graduates within 5 years of graduation.
- Excellent knowledge of the theory and practice of LLMs and foundation models, as well as deep neural networks
- Excellent coding skills in modeling and conversational interface design for real-time interaction (e.g., PyTorch/TensorFlow and Python proficiency)
- Experience with a rapid prototyping environment such as Python, as well as C++ and parallel programming (e.g., CUDA)
- Experience with multimodal generative language models, personalized LLMs, and/or fine-tuning LLMs with/for reinforcement learning planning
- Technical expertise in machine learning and/or mathematical modeling
- An interest in applying computational methods to biological problems
- A demonstrated ability to generate and pursue independent research ideas
- Excellent communication skills, written and verbal, as evidenced by publications, preprints, and/or conference presentations in conversational artificial intelligence venues (e.g., COLING, EMNLP, ACL, NAACL, IJCAI, ICLR, NeurIPS, AAAI, CVPR, IEEE, JAMIA, etc.)
- Dedication to reproducible research and open science
- Ph.D. thesis in neural conversational systems or closely related area
- Foundational knowledge in Bioinformatics, Systems Biology, and/or similar fields
- Foundational knowledge in Mathematics, Statistics, and/or Data Science
- Familiarity with software development practices and high-performance computing
- Experience with analysis using the R programming language (the lab has a significant, existing codebase in R)
- Experience with network-based analyses (graph theory) and related software/resources (graph and/or pathway databases) is highly desirable
- Experience with biomedical ontologies
- Development and execution of annotation tasks with teams of experts
- Experience working in collaborative interdisciplinary environments
The NIH is dedicated to building a diverse community in its training and employment programs and encourages the application and nomination of qualified women, minorities, and individuals with disabilities.
- Google Scholar: https://scholar.google.com/citations?user=u2dgLp8AAAAJ&sortby=pubdate
- Summary of NIH as a Training Environment (Reputation, Cost of Living, Social Climate): https://github.com/cannin/nih
About the National Institutes of Health (NIH)
The National Institutes of Health is made up of 27 separate institutes and centers that include the National Library of Medicine (NLM) and the National Cancer Institute (NCI).
About the NLM IRP
The National Library of Medicine (NLM, https://www.nlm.nih.gov/research/index.html) pioneers new ways to make biomedical data and information more accessible and builds tools for better data management and personal health. NLM's cutting-edge research and training programs, which focus on artificial intelligence (AI), machine learning, computational biology, biomedical informatics, and health data standards, help catalyze basic biomedical science, data-driven discovery, and health care delivery.
About the NCI/CCR/DTB
The National Cancer Institute Center for Cancer Research (NCI-CCR, https://ccr.cancer.gov/) is the largest division of the NCI; it encompasses various branches such as the NCI Developmental Therapeutics Branch. The NCI CCR has a mandate to confront the special challenges presented by rare cancers as well as cancers that may be predominant in medically underserved populations. One way the NCI CCR addresses this mandate is by conducting clinical trials that recruit patients with rare cancers, thereby generating unique data to advance research in these cancers. While each rare cancer affects a low number of patients, as a group they account for about a quarter of all cancers, as well as a quarter of all cancer deaths each year (https://www.cancer.gov/pediatric-adult-rare-tumor/rare-tumors/about-rare-cancers).