Job

Data Science/NLP/Computational Researcher in the Luna Lab (NLM/NCI) (Postdoctoral Fellow)

About the position

The Luna lab is jointly affiliated with the National Library of Medicine (NLM) and the National Cancer Institute (NCI). The dual ambitions of the lab are to make biomedical data and information accessible and to advance cancer research that helps people live longer, healthier lives. We seek outstanding, highly motivated, and skilled candidates to join our team, with the goal of understanding how researchers interpret large data sets and enabling them to explore and gain insight from those data sets through interactive systems that advance healthcare.

This position offers a unique opportunity to push beyond the traditional scope of data extraction and management techniques to address the new challenges of large-scale, heterogeneous data. The group combines expertise in data management, visualization, information retrieval, and network data. We will work in the areas of data integration, data uncertainty, data summarization, and visual analytics (using both visualization and machine learning) to explore novel assistive interfaces and data query methods that leverage large language models (LLMs) and vision (multimodal) models. Through this work, selected applicants will contribute collaboratively to research in cancer therapeutics and precision medicine that spans both basic and translational science objectives. Successful candidates will develop novel bioinformatic machine learning methodologies in the areas of: 1) multimodal data summarization from pharmacogenomics datasets, 2) structuring complex scientific information from unstructured manuscript data (e.g., supplementary data, experimental protocols, images), and 3) retrieval-augmented generation for LLM-backed agents for information retrieval from cancer-related multi-omics datasets.

We are offering full-time postdoctoral fellow positions, available immediately and renewable on a yearly basis. The NIH offers a competitive salary and comprehensive health insurance. Initial appointments will be for 1-2 year(s), with possible extensions up to 5 years. The NIH is dedicated to building a diverse community in its training and employment programs as well as the continued education and career development of all its research staff. These positions are subject to background checks.

Apply for this vacancy

What you'll need to apply

Please send

  • Cover letter (1 page max) describing your 1) research experiences, 2) training goals, and 3) preferred starting date. The letter should be tailored to our group, mentioning recent articles and explaining your potential role with the lab
  • Updated CV including bibliography
  • It is strongly suggested that your application include links to one or more code repositories containing code attributable to you
  • Contact information (name, institute, email, phone) for 3 references

to Augustin Luna, Ph.D., via email only. No calls, please. Write "Postdoctoral Application" in the subject heading. If we are interested, you will be contacted by Dr. Luna.

Contact name

Augustin Luna

Contact email

augustin@nih.gov

Qualifications

Essential

  • PhD in a relevant field, including: Statistics, Mathematics, Data Science, Computer Science/Engineering, Electrical Engineering, Medical Informatics, or a degree related to Biology with substantial experience in computational and statistical work. Individuals in the final stages of PhD submission will be considered as well as PhD graduates within 5 years of graduation.
  • Excellent knowledge of the theory and practice of LLMs and foundation models, as well as deep learning and neural networks
  • Excellent coding skills in modeling and conversational interface design for real-time interaction (e.g., PyTorch/TensorFlow and Python proficiency)
  • Experience with a rapid prototyping environment such as Python, as well as C++ and parallel programming (e.g., CUDA)
  • Experience with multimodal generative language models, personalized LLMs, and/or fine-tuning LLMs with/for reinforcement learning planning
  • Technical expertise in machine learning and/or mathematical modeling
  • An interest in applying computational methods to biological problems
  • A demonstrated ability to generate and pursue independent research ideas
  • Excellent written and verbal communication skills, as evidenced by publications, preprints, and/or conference presentations in conversational artificial intelligence venues (e.g., COLING, EMNLP, ACL, NAACL, IJCAI, ICLR, NeurIPS, AAAI, CVPR, IEEE, JAMIA)
  • Dedication to reproducible research and open science

Desirable

  • Ph.D. thesis in neural conversational systems or closely related area
  • Foundational knowledge in Bioinformatics, Systems Biology, and/or similar fields
  • Foundational knowledge in Mathematics, Statistics, and/or Data Science
  • Familiarity with software development practices and high-performance computing
  • Experience with analysis using the R programming language (the lab has a significant, existing codebase in R)
  • Experience with using network-based analyses (graph theory) and software/resources (graph and/or pathway databases) is highly desirable
  • Experience with biomedical ontologies
  • Development and execution of annotation tasks with teams of experts
  • Experience working in collaborative interdisciplinary environments

Disclaimer/Fine Print

The NIH is dedicated to building a diverse community in its training and employment programs and encourages the application and nomination of qualified women, minorities, and individuals with disabilities.

Additional Links

About the National Institutes of Health (NIH)

The National Institutes of Health is made up of 27 separate institutes and centers that include the National Library of Medicine (NLM) and the National Cancer Institute (NCI).

About the NLM IRP

The National Library of Medicine (NLM, https://www.nlm.nih.gov/research/index.html) pioneers new ways to make biomedical data and information more accessible and builds tools for better data management and personal health. NLM’s cutting-edge research and training programs, which focus on artificial intelligence (AI), machine learning, computational biology, biomedical informatics, and health data standards, help catalyze basic biomedical science, data-driven discovery, and health care delivery.

About the NCI/CCR/DTB

The National Cancer Institute Center for Cancer Research (NCI-CCR, https://ccr.cancer.gov/) is the largest division of the NCI; it encompasses various branches such as the NCI Developmental Therapeutics Branch. The NCI CCR has a mandate to confront the special challenges presented by rare cancers as well as cancers that may be predominant in medically underserved populations. One way in which the NCI CCR addresses this mandate is by conducting clinical trials that recruit patients with rare cancers, thereby generating unique data to advance research in these cancers. While each rare cancer affects a small number of patients, as a group rare cancers account for about a quarter of all cancers, as well as a quarter of all cancer deaths each year (https://www.cancer.gov/pediatric-adult-rare-tumor/rare-tumors/about-rare-cancers).