Job
Postdoctoral Fellowship - Graph Database Developer
Organization
National Library of Medicine, National Institutes of Health, Bethesda, MD and surrounding area
Scientific focus area
Computational Biology
About the position
Come join a dynamic interdisciplinary team advancing the boundaries of single cell genomics and machine learning with Dr. Richard Scheuermann, the Scientific Director of the National Library of Medicine (NLM) located on the NIH Campus in Bethesda, Maryland.
NLM is the world’s largest biomedical library and a leader in research, development, and training in biomedical informatics and health information technology. NLM is legislatively mandated to support the essential work of acquiring, organizing, preserving, and disseminating biomedical information, a field that is changing at a more rapid pace than ever before. NLM plays a pivotal role in translating biomedical research into practice. NLM’s research and information services support scientific discovery, health care, and public health, enabling researchers, clinicians, and the public to use the vast wealth of biomedical data to improve health. The NLM Intramural Research Program (IRP) develops and applies computational approaches to a broad range of information problems in biology, biomedicine, and human health.
The Graph Database Developer will be a member of an interdisciplinary project team developing a Cell Phenotype Knowledge Base (CTKB) as a definitive public reference resource of information about cell phenotypes, including cell types, cell states and developmental trajectories, using Linked Open Data (LOD) approaches. CTKB will be designed to support three core disease-related use cases: diagnostic biomarker discovery, therapeutic target identification, and mechanistic insight exploration, with an initial focus on priority disease processes in three physiological systems: respiratory diseases of the lung, neurodegeneration of the nervous system, and autoimmune and inflammatory diseases of the immune system.
The Graph Database Developer will collaborate with a biomedical ontologist to develop two components: a FAIR-compliant cell phenotype representational model (a semantic schema) based on OBO Foundry ontologies and related standards, and an extraction, transformation, and loading (ETL) protocol. The ETL protocol will translate processed assay results, including transcriptional biomarkers produced using a standardized machine learning pipeline, and experiment metadata from the selected datasets into standardized semantically structured assertions (SSS assertions) about cell phenotypes for loading into the CTKB graph knowledgebase. The Graph Database Developer will integrate these CTKB cell phenotypes with disease, drug, and other complementary information from public data repositories managed by the NLM National Center for Biotechnology Information (NCBI) to facilitate the core mechanistic, diagnostic, and therapeutic discovery use cases. The Graph Database Developer will also collaborate with a team of software developers to develop and implement an intuitive, user-friendly query, visualization, and analysis interface for semantic network exploration, graphical machine learning pattern discovery, and computational comparison of new datasets for cell type matching, with a focus on maximizing user experience (UX). The final product will be an open access reference knowledgebase about healthy and diseased cell phenotypes designed to meet the needs of the general biomedical research community.
Specific tasks include:
• Provide guidance on use of information retrieval and extraction techniques to increase the utility of unstructured, semi-structured, and structured data, including application of semantic and ontology-based methods for organization and query of scientific knowledge.
• Lead the creation of strategies for the deployment, construction, and querying of data systems, including relational database management systems (RDBMS), NoSQL systems (e.g., graph databases, RDF subject-predicate-object triple stores) and SPARQL, Cypher, and GraphML query languages to gather and analyze biomedical research data.
• Collaborate with the development team to design, develop, and maintain knowledge graph databases for intelligence analysis purposes.
• Collaborate with intelligence analysts and subject matter experts to understand requirements and translate them into effective database designs.
• Support implementation of the data integration processes to extract, transform, and load structured and unstructured data from various sources into the knowledge graph database.
• Collaborate to develop data models and ontologies to represent entities, relationships, and attributes within the knowledge graph.
• Develop algorithms and techniques for relationship mapping, clustering, and trend analysis within the knowledge graph.
• Ensure data quality and integrity by implementing data validation and cleansing procedures.
• Optimize query performance and implement indexing strategies to enhance database retrieval and analysis capabilities.
• Collaborate with software developers to integrate the knowledge graph database into analytical tools and platforms.
• Keep abreast of the latest developments in knowledge graph technologies and suggest creative solutions to enhance database capabilities.
• Administer tools and services that increase researcher access to NIH data and knowledge management resources, and ensure that data and metadata meet principles of FAIR (Findable, Accessible, Interoperable, Reusable) data practices to enable reproducibility, including use of containerization (Docker/Singularity), notebooks (Jupyter), etc.
• Aid in the creation of strategies for the development of web-based interfaces that are user-friendly and will promote usage of organizational analytical tools and databases by experts in other data domains and the lay public.
Apply for this vacancy
What you'll need to apply
Prospective candidates are encouraged to submit the following application materials to Dr. Richard Scheuermann at richard.scheuermann@nih.gov, with a copy to nlmirpjobs@mail.nih.gov. Please ensure you reference the position title “Graph Database Developer Fellowship” in your cover letter and/or e-mail subject line.
• Current curriculum vitae
• Cover letter/statement of research interest
• Contact information for three references
Contact name
Richard Scheuermann, PhD
Contact email
Qualifications
- Doctoral degree in biomedical science, computer science, or a related field.
- Experience collaborating within data service, product, and project teams.
- Proficiency in deploying and querying data systems, including relational database management systems (RDBMS) and NoSQL systems such as Neo4j or ArangoDB graph databases and RDF subject-predicate-object triple stores, as well as in the SPARQL, Cypher, and/or GraphML query languages.
- Knowledge of scientific, biomedical research, and health-related terminologies.
- Strong working knowledge of data and metadata standards and application of metadata in a biomedical repository setting, including experience with biomedical ontologies.
- Experience working with taxonomies, ontologies, and controlled vocabularies, including OBO Foundry ontologies (especially the Cell Ontology and the Ontology for Biomedical Investigations) and UMLS/MeSH.
- Strong working knowledge of Semantic Web technologies (RDF/RDFS, OWL), query languages (SPARQL), and validation/reasoning approaches and standards.
- Familiarity with other scientific, biomedical research, and health-related terminologies (e.g., SNOMED CT).
- Familiarity with programming languages and environments used in data science (e.g., Python and/or R) and their associated libraries (e.g., NumPy, SciPy, and Bioconductor).
- Familiarity with the Linux operating system as well as software development and deployment tools, e.g., Docker, Git.
- Demonstrable skills in interpersonal communication, oral and written communication, and an ability to work collaboratively in cross-functional working groups.
- Experience in maintaining relationships and/or partnerships with other institutions and vendors.
- Capability to handle multiple projects concurrently, with meticulous attention to detail and adaptability to changing work requirements.