Postdoctoral Fellow - Graph Database Developer
National Library of Medicine, Bethesda, MD and surrounding area
About the position
Come join a dynamic interdisciplinary team advancing the boundaries of single-cell genomics and machine learning with Dr. Richard H. Scheuermann, Scientific Director of the National Library of Medicine (NLM).
NLM is the world’s largest biomedical library and a leader in research, development, and training in biomedical informatics and health information technology. NLM is legislatively mandated to support the essential work of acquiring, organizing, preserving, and disseminating biomedical information, a field that is changing at a more rapid pace than ever before. NLM plays a pivotal role in translating biomedical research into practice. NLM’s research and information services support scientific discovery, health care, and public health, enabling researchers, clinicians, and the public to use the vast wealth of biomedical data to improve health. The NLM Intramural Research Program (IRP) develops and applies computational approaches to a broad range of information problems in biology, biomedicine, and human health.
The Graph Database Developer will be a member of an interdisciplinary project team developing a Cell Phenotype to Disease Knowledge Base (C2DKB) as a definitive public reference resource of information about cell phenotypes, including cell types, cell states, and developmental trajectories, using Linked Open Data (LOD) approaches. C2DKB will be designed to support three core disease-related use cases: diagnostic biomarker discovery, therapeutic target identification, and mechanistic insight detection, with an initial focus on priority disease processes in two physiological systems: neurodegeneration of the nervous system and autoimmune and inflammatory diseases of the immune system.
The Graph Database Developer will collaborate with a biomedical ontologist to develop two components: a FAIR-compliant cell phenotype representational model (a semantic schema) based on OBO Foundry ontologies and related standards, and an extraction, transformation, and loading (ETL) protocol. The ETL protocol will translate processed assay results, including transcriptional biomarkers produced using a standardized machine learning pipeline, and experiment metadata from the selected datasets into standardized semantically-structured assertions (SSS assertions) about cell phenotypes for loading into the C2DKB graph knowledgebase.
The Graph Database Developer will integrate these C2DKB cell phenotypes with disease, drug, and other complementary information from external public graph knowledgebases (e.g., SPOKE) to facilitate the core mechanistic, diagnostic, and therapeutic discovery use cases. The Graph Database Developer will also develop and implement an intuitive, user-friendly query, visualization, and analysis interface for semantic network exploration, graphical machine learning pattern discovery, and computational comparison of new datasets for cell type matching, with a focus on maximizing user experience (UX).
The final product will be an open-access reference knowledgebase about healthy and diseased cell phenotypes, designed to meet the needs of the general biomedical research community.
Specific tasks include:
- Provide guidance on use of information retrieval and extraction techniques to increase the utility of unstructured, semi-structured, and structured data, including application of semantic and ontology-based methods for organization and query of scientific knowledge.
- Lead the creation of strategies for the deployment, construction, and querying of data systems, including relational database management systems (RDBMS) and NoSQL systems (e.g., graph databases, RDF subject-predicate-object triple stores), using the SPARQL and Cypher query languages to gather and analyze biomedical research data.
- Collaborate with the development team to design, develop, and maintain knowledge graph databases for intelligence analysis purposes.
- Collaborate with intelligence analysts and subject matter experts to understand requirements and translate them into effective database designs.
- Support implementation of the data integration processes to extract, transform, and load structured and unstructured data from various sources into the knowledge graph database.
- Collaborate to develop data models and ontologies to represent entities, relationships, and attributes within the knowledge graph.
- Develop algorithms and techniques for relationship mapping, clustering, and trend analysis within the knowledge graph.
- Ensure data quality and integrity by implementing data validation and cleansing procedures.
- Optimize query performance and implement indexing strategies to enhance database retrieval and analysis capabilities.
- Collaborate with software developers to integrate the knowledge graph database into analytical tools and platforms.
- Stay updated with the latest advancements in knowledge graph technologies and propose innovative solutions to enhance database capabilities.
- Administer tools and services that increase researcher access to NIH data and knowledge management resources, and ensure that data and metadata adhere to the FAIR (Findable, Accessible, Interoperable, Reusable) principles to enable reproducibility, including use of containerization (Docker) and notebooks (Jupyter).
- Aid in the creation of strategies for the development of web-based interfaces that are user-friendly and will promote usage of organizational analytical tools and databases by experts in other data domains and the lay public.
Apply for this vacancy
What you'll need to apply
- PhD and/or MD degree in a biomedical science, computer science or related field.
- Experience participating on data service, product, or project teams.
- Strong working knowledge of deploying, constructing, and querying data systems, including relational database management systems (RDBMS) and NoSQL systems (e.g., Neo4j graph databases, RDF subject-predicate-object triple stores), as well as the SPARQL and Cypher query languages.
- Strong working knowledge of data and metadata standards and application of metadata in a biomedical repository setting, including experience with biomedical ontologies.
- Experience working with taxonomies, ontologies, and controlled vocabularies, including OBO Foundry ontologies (especially the Cell Ontology and the Ontology for Biomedical Investigations) and UMLS/MeSH.
- Strong working knowledge of Semantic Web technologies (RDF, RDFS, OWL), query languages (SPARQL), and validation/reasoning approaches and standards.
- Familiarity with other scientific, biomedical research, and health-related terminology (e.g., SNOMED CT).
- Familiarity with programming languages and environments used in data science (e.g., Python and/or R) and associated programming libraries (e.g., NumPy, SciPy, and Bioconductor).
- Familiarity with the Linux operating system as well as software development and deployment tools, e.g., Docker, Git.
- Knowledge of scientific, biomedical research, and health-related terminology.
- Demonstrable interpersonal, oral, and written communication skills, and the ability to work collaboratively in cross-functional working groups.
- Experience in maintaining relationships and/or partnerships with other institutions and vendors.
- Ability to successfully work on multiple projects simultaneously.
- Attention to detail and ability to adapt to changes in work requirements.
The NIH is dedicated to building a community in its training and employment programs and encourages the application and nomination of qualified women, minorities, and individuals with disabilities.