Postdoctoral Fellow - Biomedical Ontologist
National Library of Medicine, Bethesda, MD and surrounding area
About the position
Come join a dynamic interdisciplinary team advancing the boundaries of single cell genomics and machine learning with Dr. Richard H. Scheuermann, the Scientific Director of the National Library of Medicine (NLM).
NLM is the world’s largest biomedical library and a leader in research, development, and training in biomedical informatics and health information technology. NLM is legislatively mandated to support the essential work of acquiring, organizing, preserving, and disseminating biomedical information, a field that is changing at a more rapid pace than ever before. NLM plays a pivotal role in translating biomedical research into practice. NLM’s research and information services support scientific discovery, health care, and public health, enabling researchers, clinicians, and the public to use the vast wealth of biomedical data to improve health. The NLM Intramural Research Program (IRP) develops and applies computational approaches to a broad range of information problems in biology, biomedicine, and human health.
The Biomedical Ontologist will be a member of an interdisciplinary project team developing a Cell Phenotype to Disease Knowledge Base (C2DKB), a definitive public reference resource of information about cell phenotypes, including cell types, cell states, and developmental trajectories, built using Linked Open Data (LOD) approaches. C2DKB will be designed to support three core disease-related use cases: diagnostic biomarker discovery, therapeutic target identification, and mechanistic insight detection. The initial focus is on priority disease processes in two physiological systems: neurodegeneration in the nervous system, and autoimmune and inflammatory diseases of the immune system.
The Biomedical Ontologist will develop two components. The first is a FAIR-compliant cell phenotype representational model (a semantic schema) based on OBO Foundry ontologies and related standards. The second is an extraction, transformation, and loading (ETL) protocol that translates processed single cell transcriptomic assay results, including transcriptional biomarkers produced using a standardized machine learning pipeline, and experiment metadata from selected datasets into standardized semantically-structured assertions (SSS assertions) about cell phenotypes for loading into the C2DKB graph knowledgebase.
The Biomedical Ontologist will collaborate with other project team members to integrate these C2DKB cell phenotypes with disease, drug, and other complementary information from external public graph knowledgebases (e.g., SPOKE) to facilitate the core mechanistic, diagnostic, and therapeutic discovery use cases. The final product will be an open access reference knowledgebase about healthy and diseased cell phenotypes designed to meet the needs of the general biomedical research community.
Specific tasks include:
- Provide guidance on use of information retrieval and extraction techniques to increase the utility of unstructured, semi-structured, and structured data, including application of semantic and ontology-based methods for organization and query of scientific knowledge.
- Aid in the creation of strategies for the deployment, construction, and querying of data systems, including relational database management systems (RDBMS) and NoSQL systems (e.g., graph databases, RDF subject-predicate-object triple stores), using the SPARQL and Cypher query languages to gather and analyze biomedical research data.
- Collaborate with the development team to design, develop, and maintain knowledge graph databases for intelligence analysis purposes.
- Support implementation of the data integration processes to extract, transform, and load structured and unstructured data from various sources into the knowledge graph database.
- Develop data models and ontologies to represent entities, relationships, and attributes within the knowledge graph, especially as applied to the development of a Provisional Cell Ontology that captures information about cell phenotypes.
- Semantically integrate these cell phenotypes with disease, drug, and other complementary information from external public graph knowledgebases (e.g., SPOKE) to facilitate the core mechanistic, diagnostic, and therapeutic discovery use cases.
- Administer tools and services that increase researcher access to NIH data and knowledge management systems, and ensure that data and metadata follow the FAIR (Findable, Accessible, Interoperable, Reusable) principles to enable reproducibility, including use of containerization (e.g., Docker) and notebooks (e.g., Jupyter).
- Aid in the creation of strategies for the development of web-based interfaces that are user-friendly and will promote usage of organizational analytical tools and databases by experts in other data domains and the lay public.
What you'll need to apply
Richard H. Scheuermann
- PhD and/or MD degree in a biomedical science, computer science or related field.
- Experience participating on data service, product, or project teams.
- Experience managing taxonomies, ontologies, and controlled vocabularies, including OBO Foundry ontologies (especially the Cell Ontology and the Ontology for Biomedical Investigations) and UMLS/MeSH.
- Strong working knowledge of data and metadata standards and application of metadata in a biomedical repository setting, including experience with biomedical ontologies.
- Strong working knowledge of other scientific, biomedical research, and health-related terminology (e.g., SNOMED CT).
- Strong working knowledge of commonly used ontology and terminology development tools (e.g., Protégé, Ontology Lookup Service, Ontology Development Kit).
- Strong working knowledge of Semantic Web technologies (RDF/RDFS, OWL), query languages (SPARQL), and validation/reasoning approaches and standards.
- Familiarity with programming languages and environments used in data science (e.g., Python and/or R) as well as associated programming libraries (e.g., NumPy, SciPy, and Bioconductor).
- Familiarity with deploying, constructing, and querying data systems, including relational database management systems (RDBMS) as well as NoSQL systems (e.g., graph databases, RDF subject-predicate-object triple stores), and with the SPARQL and Cypher query languages.
- Familiarity with the Linux operating system as well as software development and deployment tools, e.g., Docker, Git.
- Demonstrable skills in interpersonal communication, oral and written communication, and an ability to work collaboratively in cross-functional working groups.
- Experience in maintaining relationships and/or partnerships with other institutions and vendors.
- Ability to successfully work on multiple projects simultaneously.
- Attention to detail and ability to adapt to changes in work requirements.
The NIH is dedicated to building a community in its training and employment programs and encourages the application and nomination of qualified women, minorities, and individuals with disabilities.