

Postdoctoral Fellow - Bioinformatics Analyst

National Library of Medicine, Bethesda, MD and surrounding area

About the position

Come join a dynamic interdisciplinary team advancing the boundaries of single cell genomics and machine learning with Dr. Richard H. Scheuermann, the Scientific Director of the National Library of Medicine (NLM).

NLM is the world’s largest biomedical library and a leader in research, development, and training in biomedical informatics and health information technology. NLM is legislatively mandated to support the essential work of acquiring, organizing, preserving, and disseminating biomedical information, a field that is changing at a more rapid pace than ever before. NLM plays a pivotal role in translating biomedical research into practice. NLM’s research and information services support scientific discovery, health care, and public health, enabling researchers, clinicians, and the public to use the vast wealth of biomedical data to improve health. The NLM Intramural Research Program (IRP) develops and applies computational approaches to a broad range of information problems in biology, biomedicine, and human health.

The Bioinformatics Analyst will be a member of an interdisciplinary project team developing a Cell Phenotype to Disease Knowledge Base (C2DKB), a definitive public reference resource of information about cell phenotypes, including cell types, cell states, and developmental trajectories, built using Linked Open Data (LOD) approaches. C2DKB will be designed to support three core disease-related use cases: diagnostic biomarker discovery, therapeutic target identification, and mechanistic insight detection. The initial focus will be on priority disease processes in two physiological systems: neurodegeneration of the nervous system, and autoimmune and inflammatory diseases of the immune system.

The Bioinformatics Analyst will identify, extract, and analyze high-quality, use case-driven benchmark sc/snRNA-seq and other single cell omics datasets of healthy, perturbed, and diseased anatomical structures from experiment data repositories. Datasets will be selected based on rigorous quality control assessment to ensure transparency and to ensure that the knowledge captured in C2DKB meets the highest standards. The Bioinformatics Analyst will also develop and implement statistically rigorous cell type matching approaches that support incremental knowledge growth by determining whether existing and/or novel cell types have been identified.

The Bioinformatics Analyst will collaborate with other members of the project team to develop two components. The first is a FAIR-compliant cell phenotype representational model (a semantic schema) based on OBO Foundry ontologies and related standards. The second is an extraction, translation, and loading (ETL) protocol that translates processed assay results, including transcriptional biomarkers produced using a standardized machine learning pipeline, and experiment metadata from the selected datasets into standardized semantically-structured assertions (SSS assertions) about cell phenotypes for loading into the C2DKB graph knowledgebase. The final product will be an open access reference knowledgebase about healthy and diseased cell phenotypes designed to meet the needs of the general biomedical research community.

Specific tasks include:

  • Identify, extract, and analyze high-quality, use case-driven benchmark sc/snRNA-seq and other single cell omics datasets of healthy, perturbed, and diseased anatomical structures from experiment data repositories.
  • Process the selected datasets using standard protocols for gene expression determination and quality control filtering.
  • Perform clustering and 2D embedding of the resulting data matrices using approaches including Louvain and Leiden unsupervised clustering and t-SNE and UMAP embedding.
  • Identify cell type-specific marker genes using approaches including differential gene expression analysis and NS-Forest marker gene selection.
  • Establish a reference database to store marker gene expression matrices and metadata information.
  • Establish a computational system to compare new single cell expression datasets with the developed reference database using approaches including FR-Match and Azimuth.
  • Collaborate with a graph database developer to load cell phenotype information into a reference knowledgebase.
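
As a rough illustration of the last step, the sketch below shows one way processed results (a cell type label, its source tissue, and its marker genes) could be expressed as subject-predicate-object assertions before loading into a graph store. It is a minimal sketch under stated assumptions, not the C2DKB schema: the `Assertion` type, the `to_assertions` helper, the predicate names, and the input values are all hypothetical placeholders.

```python
from typing import NamedTuple


class Assertion(NamedTuple):
    """One subject-predicate-object statement about a cell phenotype."""
    subject: str
    predicate: str
    obj: str


def to_assertions(cell_type: str, tissue: str, markers: list[str]) -> list[Assertion]:
    """Turn one processed clustering result into graph-ready triples.

    The predicate names below are hypothetical placeholders, not terms
    from an OBO Foundry ontology or from the C2DKB schema.
    """
    assertions = [Assertion(cell_type, "part_of", tissue)]
    # One assertion per marker gene, with duplicates dropped and order kept.
    for gene in dict.fromkeys(markers):
        assertions.append(Assertion(cell_type, "has_marker_gene", gene))
    return assertions


# Illustrative input only; not taken from any real dataset.
triples = to_assertions(
    "example_cell_type", "example_tissue", ["GENE_A", "GENE_B", "GENE_A"]
)
for t in triples:
    print(f"{t.subject}\t{t.predicate}\t{t.obj}")
```

In a real pipeline the subjects, predicates, and objects would presumably be ontology identifiers rather than free-text strings, so that assertions from different datasets can be linked.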

Apply for this vacancy

What you'll need to apply

Candidates should send the following application materials directly to Dr. Richard H. Scheuermann and copy Dr. Virginia Meyer:

  • Current curriculum vitae
  • Cover letter/statement of research interest
  • Contact information for three references

Contact name

Richard H. Scheuermann


Qualifications

  • PhD and/or MD degree in a biomedical science, computer science, or related field.
  • Experience participating on data service, product, or project teams.
  • Skill in using programming languages and environments used in data science, e.g., Python and/or R, as well as associated programming libraries, e.g., NumPy, SciPy, and Bioconductor.
  • Strong working knowledge of tools for processing, reporting, and visualizing scientific data, including web-based applications, e.g., JavaScript, Java, Python, C/C++, R/Shiny apps.
  • Strong working knowledge of software tools and algorithms used to process and analyze single cell RNA sequencing data, including CellRanger, scanpy, Seurat, Leiden clustering, and UMAP/tSNE embedding.
  • Strong working knowledge of the Linux operating system as well as software development and deployment tools, e.g., Docker, Git.
  • Strong working knowledge of parametric and non-parametric statistical and machine learning approaches commonly used in data analysis and data visualization, including random forests, neural networks, and the scikit-learn (sklearn) library.
  • Familiarity with deploying, constructing, and querying data systems, including relational database management systems (RDBMS), NoSQL systems (e.g., graph databases, RDF subject-predicate-object triple stores), and the SPARQL and Cypher query languages.
  • Familiarity with data and metadata standards and application of metadata in a biomedical repository setting, including experience with biomedical ontologies.
  • Knowledge of scientific, biomedical research, and health-related terminology.
  • Demonstrable skills in interpersonal communication, oral and written communication, and an ability to work collaboratively in cross-functional working groups.
  • Experience in maintaining relationships and/or partnerships with other institutions and vendors.
  • Ability to successfully work on multiple projects simultaneously.
  • Attention to detail and ability to adapt to changes in work requirements.

Additional Information

The NIH is dedicated to building a community in its training and employment programs and encourages the application and nomination of qualified women, minorities, and individuals with disabilities.