Syntax Guide for Turtle, SPARQL, and SQL in FHIR RDF and OMOP CDM

Syntax

Welcome to our syntax page! Here we look at the syntax of Turtle, SPARQL, and SQL, three languages that are fundamental to data representation and manipulation in FHIR RDF and OMOP CDM.

Turtle Syntax

Turtle (Terse RDF Triple Language) is a text syntax for writing data in RDF (Resource Description Framework). Although Turtle shares much of its syntax with SPARQL (SPARQL Protocol and RDF Query Language), the two serve different purposes: Turtle represents and exchanges RDF data, while SPARQL queries it. Turtle is designed to be both compact and easy to read.

Key Elements of Turtle/RDF Syntax:

IRIs: IRIs are a superset of URLs and are used to identify resources, properties, or values. In Turtle, an IRI is typically enclosed in angle brackets, like <http://hl7.org/fhir/Condition/f201>. IRIs commonly take the form of URLs when we deal with web resources.
Prefixes: In Turtle, prefixes abbreviate lengthy IRI namespaces, which improves readability. For example, once the prefix rdf: is declared for <http://www.w3.org/1999/02/22-rdf-syntax-ns#>, rdf:type stands for <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>.
Triples: The core of RDF syntax, a triple consists of a subject (the resource or entity being described), a predicate (defining the connection or association between subject and object), and an object (the resource or value the subject is related to).
Literals: Often used as objects in RDF, literals represent values such as strings, numbers, and dates. They are written in quotation marks, optionally followed by a datatype or language tag, for example "2013-04-02"^^xsd:date.
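
The four elements above can be sketched in a few lines of Python (standard library only). This is a minimal illustration, not a Turtle parser: the ex: namespace, the expand helper, and the patient1 and birthDate names are hypothetical and chosen only for this example, and the triple is a simplified shape rather than actual FHIR RDF.

```python
# Prefix table: maps a short prefix to the IRI namespace it abbreviates.
PREFIXES = {
    "ex": "http://example.org/",  # hypothetical namespace for this sketch
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'xsd:date' into a full IRI in angle brackets."""
    prefix, local = prefixed_name.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


# A triple is subject, predicate, object; here the object is a typed literal.
subject = expand("ex:patient1")
predicate = expand("ex:birthDate")
obj = f'"1960-03-13"^^{expand("xsd:date")}'

# Written out in full, the statement ends with a period.
triple_line = f"{subject} {predicate} {obj} ."
print(triple_line)
```

Printing the fully expanded line shows what the prefixes save: the same statement in Turtle with prefixes declared would read ex:patient1 ex:birthDate "1960-03-13"^^xsd:date .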
Other links and References:

Illustrative screenshot showing Turtle syntax for a FHIR Condition resource and its corresponding graphical representation. The image features PREFIX declarations for namespaces such as FHIR, OWL, and XMLSchema at the top.
Below, the syntax details a Condition resource identified by 'f201'. The attributes include an identifier '12345', a coded condition 'Fever' with SNOMED code '386661006', subject reference to 'Patient/f201', onset date '2013-04-02', abatement around 'April 9, 2013',
and recorded date '2013-04-04'. Each attribute in the Turtle syntax is visually mapped to a graphical diagram on the right, showing the structure of the resource with links and nodes labeled to reflect elements like 'id', 'identifier', 'code', 'subject', and date fields.

https://hl7.org/fhir/R4/condition-example-f201-fever.ttl2.html

Note: the simplified graph does not show the exact IRIs used in FHIR RDF. See the linked Turtle file for fully formed IRIs.

This section describes the triple patterns that http://hl7.org/fhir/Condition/f201 has and the locations where the values of those objects can be found.

Turtle Syntax key points

In Turtle (and SPARQL), a is a keyword used in the predicate position as shorthand for rdf:type, or <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> when written as an absolute IRI.
Many of the lines end with a semicolon ";" rather than a period. The semicolon in Turtle syntax serves as a separator between predicate-object pairs within a subject block.

When a subject has multiple predicate-object pairs associated with it, using semicolons presents the relationships in a more compact form.
If there is only one predicate-object pair for a subject, the semicolon can be omitted, and the triple should be ended with a period "."

Commas "," follow a similar pattern: they separate multiple objects that share the preceding subject and predicate.
The term fhir:value appears frequently above. In FHIR, fhir:value is used to denote the value of a specific property or element within a FHIR resource. It is commonly combined with a series of other property qualifiers to provide additional context or constraints for a value.
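
The semicolon and comma shorthands described above can be illustrated with a small Python sketch that writes the same data twice: once as a compact subject block and once as one full triple per line. The ex: names and the sample values are hypothetical, and the helpers are a teaching aid rather than a real Turtle serializer.

```python
def to_turtle(subject, properties):
    """Serialize one subject block.

    properties maps each predicate to a list of objects. Objects sharing a
    predicate are joined with ',', predicate-object pairs with ';', and the
    whole block ends with '.'.
    """
    pairs = [f"{pred} {', '.join(objs)}" for pred, objs in properties.items()]
    return f"{subject} " + " ;\n    ".join(pairs) + " ."


def to_triples(subject, properties):
    """Expand the same data into one full triple per statement."""
    return [
        f"{subject} {pred} {obj} ."
        for pred, objs in properties.items()
        for obj in objs
    ]


# Hypothetical data: one subject, three predicates, one predicate with two objects.
props = {
    "a": ["ex:Patient"],
    "ex:name": ['"Roel"', '"R."'],
    "ex:gender": ['"male"'],
}

compact = to_turtle("ex:p1", props)
expanded = to_triples("ex:p1", props)

print(compact)
print()
for line in expanded:
    print(line)
```

Both outputs describe the same four triples; the compact form simply avoids repeating the subject (with ";") and the subject-predicate pair (with ",").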

Below are some additional line-by-line examples of Turtle syntax:

Example 1:

Turtle syntax and graphical mapping for a FHIR Patient resource. The image displays PREFIX declarations for FHIR, OWL, XMLSchema, and RDF at the top.
Below, the Turtle syntax details attributes of a Patient resource, including 'gender' set to 'male', 'birthDate' on '1960-03-13', 'deceasedBoolean' marked as 'false', and communication preferences indicating a language of 'Dutch' coded under the system 'urn:ietf:bcp:47' and code 'nl-NL'.
The managing organization is referenced as 'Organization/f201' with the display name 'AUMC'. Each attribute from the Turtle syntax is correspondingly illustrated in a graphical diagram on the right, showing the connections between subjects, predicates, and objects to visually depict the data structure.

https://hl7.org/fhir/R4/patient-example-f201-roel.ttl.html

Note: the simplified graph does not show the exact IRIs used in FHIR RDF. See the linked Turtle file for fully formed IRIs.

Example 2:

Turtle syntax and graphical representation for a FHIR Medication resource. The image displays PREFIX declarations at the top for namespaces such as FHIR, OWL, and XMLSchema.
Below, the Turtle syntax details a Medication resource identified as 'med0309'. This includes attributes like medication code, manufacturer, form, ingredient, and ingredient strength. Specifically, the medication code '50580-506-02' from the NDC system represents 'Tylenol PM'.
The manufacturer is referenced by '#org2', the form is identified as a 'Film-coated tablet' with SNOMED code '385057009', and the active ingredient is 'Acetaminophen 500 MG' with RXNORM code '315266'. The ingredient strength is quantified with a ratio: 500 mg as the numerator and 1 tablet as the denominator.
The diagram on the right visually represents these components, mapping the structure of the resource with connecting lines to demonstrate the relationships between the subject and object nodes along with their respective values.

https://hl7.org/fhir/R4/medicationexample0309.ttl.html

Note: the simplified graph does not show the exact IRIs used in FHIR RDF. See the linked Turtle file for fully formed IRIs.

SPARQL Syntax

Syntax and Structure of SPARQL

Similar to SQL queries, SPARQL queries consist of several clauses, including but not limited to SELECT, WHERE, OPTIONAL, and FILTER. These clauses work together to retrieve and filter data from RDF graphs.
The SELECT clause lists the variables to return, while the WHERE clause specifies the triple patterns and conditions the data must match. OPTIONAL adds patterns whose values are returned when present, without discarding results that lack them, and FILTER restricts the results to those meeting specified criteria. As in Turtle, it's common to begin with a prefix section to abbreviate frequently used IRIs and enhance readability.

TIP: see more of the SPARQL syntax used in the OMOP VKG Demo Playground

SPARQL Example 1

Image illustrating a SPARQL query structure for querying a FHIR RDF graph. The left side displays the query text, starting with PREFIX declarations for namespaces such as rdf, rdfs, fhir, and w5, enhancing the readability and manageability of the query. The core of the query, 'SELECT * WHERE { }', aims to retrieve all variables where specified conditions are met,
including '?p a fhir:Patient', indicating that '?p' is a type of FHIR Patient, and additional conditions to query patients' gender and birthDate properties. On the right, a detailed explanation outlines the purpose and function of each segment: the PREFIX section for namespace shortcuts, the SELECT WHERE statement for data retrieval from the RDF graph, and the triple patterns used to query the patient's gender and birthDate properties.
A 'LIMIT 10' clause at the bottom of the query limits the result set to the first 10 entries.

Some triple patterns include square brackets "[ ]", which denote blank nodes. In a query, a blank node behaves like an unnamed variable: it matches an intermediate node that the pattern must pass through but that does not need to be returned. This is illustrated in the line ?p fhir:gender [ fhir:value ?gender ], where ?p is a variable representing a patient, and the bracketed blank node stands for the gender node whose fhir:value is bound to ?gender.
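
The bracket shorthand can be read as two ordinary triple patterns joined by an unnamed intermediate node. The Python sketch below shows that reading with a toy pattern matcher over a hand-written list of triples. It is not a SPARQL engine, the ex: subjects and the data are hypothetical, and literals are simplified to plain strings; the helper variable ?_node plays the role of the bracketed blank node.

```python
# Toy graph: (subject, predicate, object) tuples. Names like "_:b1" are blank nodes.
GRAPH = [
    ("ex:p1", "rdf:type", "fhir:Patient"),
    ("ex:p1", "fhir:gender", "_:b1"),
    ("_:b1", "fhir:value", "male"),
    ("ex:p2", "rdf:type", "fhir:Patient"),
    ("ex:p2", "fhir:gender", "_:b2"),
    ("_:b2", "fhir:value", "female"),
]


def match(patterns, graph, binding=None):
    """Yield every variable binding that satisfies all triple patterns.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, graph, candidate)


# '?p a fhir:Patient . ?p fhir:gender [ fhir:value ?gender ]' written out in full.
patterns = [
    ("?p", "rdf:type", "fhir:Patient"),
    ("?p", "fhir:gender", "?_node"),
    ("?_node", "fhir:value", "?gender"),
]

# Report only ?p and ?gender; the intermediate node is matched but not returned.
results = sorted((b["?p"], b["?gender"]) for b in match(patterns, GRAPH))
print(results)
```

The intermediate node is needed to reach the value, yet it never appears in the results, which is the effect the square brackets have in the query.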


SPARQL Example 2

In many cases, you may want to explore the characteristics of a resource that is referenced by another resource. For this, the fhir:reference and fhir:link predicates are used to establish the connection.
For example, to examine the characteristics of a patient associated with a particular condition, the query would be constructed as follows:

Image illustrating a complex SPARQL query structured for querying FHIR RDF data, accompanied by a detailed breakdown of its components. The left side of the image shows the query text, starting with PREFIX declarations for rdf, rdfs, fhir, and w5 namespaces, which help enhance the query's readability and specificity.
The main query, 'SELECT DISTINCT ?condition ?code ?system ?display ?patient ?gender ?birthdate WHERE { }', is designed to retrieve distinct records detailing conditions with their code, system, display values, and associated patient details including gender and birthdate. It specifies various conditions such as '?condition a fhir:Condition' to identify resources as FHIR Conditions;
'?condition fhir:Condition.code' with related sub-patterns to extract coding details; '?condition fhir:Condition.subject' linking to the patients; and employs a FILTER to refine results based on the condition type, using regex to identify descriptions that contain 'acute'. The right side of the image provides a breakdown explanation of each part of the query, detailing the roles of variable bindings, triple patterns,
and FILTER usage, clarifying the structure and logic of the query.

In the example above, semicolons ";" are used at the end of lines within the SPARQL query. As in Turtle, a semicolon indicates that the predicate-object pair that follows applies to the same subject as the previous one, which groups related patterns about a single subject. For instance, in the line
?patient a fhir:Patient; fhir:Patient.gender [ fhir:value ?gender ] ; fhir:Patient.birthDate [ fhir:value ?birthdate ] .
the subject ?patient must be a fhir:Patient, and the values of its gender and birthDate properties are bound to ?gender and ?birthdate.

SPARQL's syntax is designed to accommodate complex queries within the semantic web, offering intricate mechanisms for pattern matching, optional values, and condition filtering. Its capabilities are essential for querying and navigating the interconnected data found in RDF triples.

SQL Syntax

SQL, or Structured Query Language, is the most widely adopted language for querying relational databases. It covers a range of operations, from querying data to modifying and managing databases. Although SQL is standardized (ISO/IEC 9075), most implementations do not cover the full standard and add their own dialect-specific idiosyncrasies.

Common SQL clauses for querying data include:

SELECT: Specifies the columns to be displayed in the query results.
FROM: Indicates the tables from which data is to be retrieved.
WHERE: Applies conditions to filter the dataset.
JOIN: Combines rows from multiple tables based on a related column.
GROUP BY: Aggregates rows with identical values in certain columns.
ORDER BY: Sorts the results in either ascending (default) or descending order.
LIMIT: Sets a cap on the number of rows to be returned in the result set.

For example, if we want to use SQL to query OMOP CDM to locate the name and age of female patients, we can use the following query.

Illustrates a basic SQL SELECT WHERE statement. The left side of the image features a SQL query:
'SELECT name, age FROM patient WHERE gender = 'Female' ORDER BY age;'. This query is aimed at selecting the name and age from a 'patient' table where the gender is Female, ordering the results by age.
On the right side, an explanatory guide details each part of the statement: 'SELECT' clause specifies the columns to retrieve; 'FROM' identifies the table to query; 'WHERE' applies a condition filtering for Female gender; 'ORDER BY' sorts the results by age.
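
The query in the figure can be tried out with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the patient table, its columns, and the three sample rows are hypothetical and follow the simplified figure, not the actual OMOP CDM schema (the OMOP CDM person table, for instance, stores a gender concept ID and a year of birth rather than a name or an age).

```python
import sqlite3

# In-memory database with a simplified, hypothetical patient table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (name TEXT, age INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?)",
    [("Ann", 54, "Female"), ("Bob", 61, "Male"), ("Cara", 37, "Female")],
)

# SELECT picks the columns, FROM names the table, WHERE filters the rows,
# and ORDER BY sorts the result (ascending by default).
rows = conn.execute(
    "SELECT name, age FROM patient WHERE gender = 'Female' ORDER BY age"
).fetchall()
print(rows)  # [('Cara', 37), ('Ann', 54)]

conn.close()
```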

Healthcare data tasks are often more complex, such as identifying the most recent observations for a patient. If the table has a date column, the MAX function can select the latest date, and a subquery can use that date to return only the most current observations.

Image illustrating a more complex SQL query structure used to retrieve patient data from a relational database. The left side of the image features an SQL query that selects the patient ID, name, and observation values, using a JOIN operation between the patient and observation tables based on patient ID.
The query filters observations for a given concept ID and fetches the most recent observation date using a subquery. The SQL syntax includes: SELECT, FROM, JOIN, WHERE, and a subquery within WHERE to determine the maximum observation date. The right side of the image provides explanation of each SQL clause: the SELECT clause specifies the columns to retrieve;
FROM indicates the table source; JOIN describes how tables are combined; WHERE sets the conditions for the data retrieval; and the subquery is used to limit results to the most recent observations.
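
The JOIN and MAX subquery described above can be sketched with Python's built-in sqlite3 module. The table layout, column names, the concept ID 1001, and the sample rows are all hypothetical and are not taken from the figure or from OMOP CDM. The sketch also makes one design choice that the figure may not share: the subquery is correlated, so it finds the latest date per patient rather than one latest date overall.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE observation (
    patient_id INTEGER, concept_id INTEGER, value REAL, observation_date TEXT
);
INSERT INTO patient VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO observation VALUES
    (1, 1001, 36.8, '2024-01-05'),
    (1, 1001, 38.2, '2024-03-10'),
    (2, 1001, 37.1, '2024-02-20'),
    (2, 2002, 80.0, '2024-04-01');
""")

# Dates are stored as ISO 8601 text, so MAX() on the strings gives the latest date.
query = """
SELECT p.patient_id, p.name, o.value, o.observation_date
FROM patient AS p
JOIN observation AS o ON o.patient_id = p.patient_id
WHERE o.concept_id = ?
  AND o.observation_date = (
        SELECT MAX(o2.observation_date)
        FROM observation AS o2
        WHERE o2.patient_id = o.patient_id
          AND o2.concept_id = o.concept_id
  )
ORDER BY p.patient_id
"""
latest = conn.execute(query, (1001,)).fetchall()
print(latest)  # [(1, 'Ann', 38.2, '2024-03-10'), (2, 'Bob', 37.1, '2024-02-20')]

conn.close()
```

Passing the concept ID as a parameter (the "?" placeholder) rather than pasting it into the query text keeps the query reusable for other concepts.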

SQL's versatility extends beyond querying. It includes statements such as INSERT INTO, UPDATE (with SET), and DELETE for modifying data, as well as CREATE TABLE, ALTER TABLE, and DROP TABLE for managing database structure. Its powerful and straightforward approach to database manipulation makes SQL indispensable for relational database management.
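
As a rough illustration of these statements, the sketch below runs each of them once against an in-memory SQLite database using Python's built-in sqlite3 module. The patient table and its rows are hypothetical, and other database systems may differ in the details of ALTER TABLE and similar commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Structure management: create a table, then add a column to it.
conn.execute("CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE patient ADD COLUMN gender TEXT")

# Data modification: insert rows, update one, delete another.
conn.executemany(
    "INSERT INTO patient (patient_id, name, gender) VALUES (?, ?, ?)",
    [(1, "Ann", "Female"), (2, "Bob", None)],
)
conn.execute("UPDATE patient SET gender = 'Male' WHERE patient_id = 2")
conn.execute("DELETE FROM patient WHERE patient_id = 1")
conn.commit()

remaining = conn.execute(
    "SELECT patient_id, name, gender FROM patient"
).fetchall()
print(remaining)  # [(2, 'Bob', 'Male')]

# Structure management again: remove the table entirely.
conn.execute("DROP TABLE patient")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables)  # []

conn.close()
```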

Summary

SQL, SPARQL, and Turtle each play a vital role in the healthcare industry, empowering professionals with diverse capabilities for querying and managing data. SQL's robust transactional commands make it ideal for structured, table-oriented data management in relational databases. On the other hand, SPARQL and Turtle shine in the semantic web domain; SPARQL's sophisticated querying enables intricate searches across complex data relationships, while Turtle's syntax provides a human-readable way to represent RDF data, making semantic data more accessible.
These languages represent different facets of data handling: SQL for traditional database management, SPARQL for navigating RDF data within linked data structures, and Turtle for concisely conveying RDF content. Together, they offer a comprehensive set of tools that address the specific needs of the evolving digital landscape in healthcare data management.