Mike's Notes
An excellent discussion titled Design Pattern Ontology recently occurred on the Ontolog Forum. Igor Toujilov started the thread on November 26, 2024.
It covered a lot of ground, and a point made by John Sowa is copied below.
Eventually, it led to a discussion about the relationship between data models and ontologies. I have copied below the text of many of the points raised as a valuable reference for future work on Pipi.
The contributors are:
- John Sowa
- William "Bill" Burkett
- Mike Peters
- Michael DeBellis
- David Eddy
- Paul Tyson
- Igor Toujilov
- Alex Shkotin
- Kingsley Idehen
- Elisa Kendall
- Mike Bennett
I will keep adding to these notes as more useful contributions are made.
Any errors or omissions are mine.
Last Updated
17/05/2025
Data Model and Ontology
By: Mike Peters
On a Sandy Beach: 01/12/2024
Mike is the inventor and architect of Pipi and the founder of Ajabbi.
Design Pattern Ontology
Notes taken from the Ontolog Forum
John Sowa
Since Matthew isn't with us, I'll summarize some of his points which we
had discussed over many years in different ways.
He had been working at Shell Oil for years, and he had developed a
detailed ontology for the oil industry. He later generalized it to
develop a more general top level, which I considered quite good.
We had discussed issues about generalizing it even farther,
but he was reluctant to go farther in levels of abstraction. We
agreed that was a reasonable point of view.
But we also agreed that his details for the oil industry might conflict
with details for other industries, such as banking and farming.
Furthermore, he recognized that different oil companies had
different ways of representing the same terms because they had developed
different policies and procedures.
I'll also mention another widely used ontology, which evolved over a
period of about 70 years, and it is unlikely to change for a long, long
time: the ontology for making reservations for airlines, which was
later extended to cover anything related to airlines, such as hotels, car
rentals, trains, taxis, etc., etc.
And that ontology began in the 1950s with IBM's project SAGE for the
airplanes used in the Strategic Air Command over North
America. In the 1960s, IBM adapted that ontology for American
Airlines. IBM later sold the software to other airlines.
And all the other additions were made to conform to the same basic
ontology from the 1950s. 70 years later, that top level is so
entrenched that it is never going away.
Another world-wide ontology that also developed in the 1960s includes the
global weather patterns that were established by the world-wide weather
simulation programs. They use a different way of representing the
world than the reservation systems.
Fundamental problem: There will never be a single universal
ontology for representing anything and everything in the world (or the
universe).
That is a fact that any system of knowledge representation must deal
with.
Data Model and Ontology
Abbreviated notes taken from the Ontolog Forum
01/01/2025 and ongoing
William Burkett
- Hay, D. C. (1996). Data Model Patterns: Conventions of Thought. New York, Dorset House Pub.
- Hay, D. C. (2006). Data Model Patterns: A Metadata Map. Amsterdam; Boston, Elsevier Morgan Kaufmann.
- Silverston, L. (2009). The Data Model Resource Book, Vols 1-3. New York, John Wiley.
Mike Peters
- Hay, D. C. (2011). UML and Data Modeling: A Reconciliation. Technics Publications.
Mike Peters
Ontologies are represented in graph structures. Non-relational
databases like semantic or graph databases are better suited for this
job, and ontologists (I'm not one) have no problem working with
them.
However, the workforce needs conventional everyday interfaces driven
by relational databases. So, there is an import/export issue that
David Hay could have written a book about, and I wish he had. His
explanations are excellent. His book on UML and data modelling also
bridged two different ways of looking at the world.
Michael DeBellis
RE: The distinction / difference between data models & ontologies
is what...?
I think there is a difference but it isn't what many people in the
ontology community seem to think. Most of the methodologies and
guidelines I've seen for building ontologies present the ontology as a
thing in itself. For example, they often minimize or even ignore
questions such as loading data or integrating with existing systems.
In reality these are some of the most important issues to face for
ontologies used in the real world. Some of the most important
distinctions between an E/R model and an ontology are:
1) E/R models are typically used for Online Transaction Processing
(OLTP); ontology and knowledge graph models are typically used for
Online Analytical Processing (OLAP). The design of an OLTP model needs
to be optimized for response time. Thus, such models tend to be fairly
sparse, leaving most of the domain knowledge in the code of the
systems that use them. The design for an OLAP model can be much richer
and include much more knowledge about the domain in the model itself
rather than in the code, because the users will tend to be working on
client machines that have more processing power and, because they are
going to be executing complex queries, will be a bit more patient than
someone adding a post to Facebook or doing a funds transfer on their
bank account with their phone.
2) E/R models need to worry about various normalization forms for
peak efficiency depending on what needs to be maximized. Ontologies
are implemented as graphs and don't need to consider the same kinds of
issues. Essentially when you build an ontology you are able to work at
the analysis level whereas for an E/R model you tend to work at the
design level.
3) E/R models are relatively difficult to change at run time.
Ontologies and knowledge graphs can easily be changed at run time, not
just the instance values but the schemas themselves can be changed at
run time. In this way they are more similar to NoSQL databases that
have "schema on read" rather than E/R models that have schema on
write.
Actually, I just remembered I'm putting together a table comparing
relational databases, NoSQL (Hadoop) and OWL. Here's what I have so
far. This is a work in progress:
| Feature | Hadoop (HDFS + Pig/Hive) | SQL Databases | OWL Knowledge Graphs (e.g., AllegroGraph) |
|---|---|---|---|
| Schema | Schema on read | Schema on write | Schema on write but schema can be modified at run time |
| Data Model | File-based (raw formats) | Tables (rows/columns) | RDF triples or quads (subject-predicate-object) |
| Storage | Distributed (HDFS) | Centralized or sharded | Triple or quad stores with distributed options and sharding in some triplestores (e.g., AllegroGraph) |
| Processing | Batch (MapReduce) | Transactional (ACID) | Semantic reasoning and SPARQL queries. Some triplestores also support ACID transactions. |
| Querying | Pig Latin, HiveQL | SQL | SPARQL and Description Logic queries |
| Reasoning | None | None | SWRL, SHACL, OWL DL axioms |
| Flexibility | Flexible (unstructured, semi-structured, structured) | Rigid schema | Highly flexible with semantic annotations and Linked Data |
| Scalability | Horizontal (many nodes) | Vertical (more powerful nodes) | Horizontal or vertical, depending on implementation |
| Integration | Tools for ETL and analytics | Tight coupling with apps | Can integrate with ontologies, Linked Data, and other RDF graphs |
| Best For | Batch processing, raw data storage | Transactional workloads, OLTP | Semantic data, reasoning, and complex relationships |
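Editor's sketch: Michael's third point (changing the schema at run time) can be illustrated roughly as follows. This uses only Python's built-in sqlite3 module, and the `triple` table is merely a stand-in for a real triplestore; the table names, column names, and `ex:` identifiers are made up for the illustration and are not from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relational style ("schema on write"): a new attribute needs a schema change.
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Ada')")
conn.execute("ALTER TABLE person ADD COLUMN employer TEXT")  # DDL is required first
conn.execute("UPDATE person SET employer = 'Example Ltd' WHERE id = 1")

# Triple style (stand-in for a triplestore): a new property is just another row.
conn.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("ex:Ada", "ex:name", "Ada"),
        ("ex:Ada", "ex:employer", "Example Ltd"),  # new property, no DDL needed
    ],
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
person_columns = [row[1] for row in conn.execute("PRAGMA table_info(person)")]
triple_predicates = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT predicate FROM triple")
)
print(person_columns)
print(triple_predicates)
```

The trade-off is visible even at this size: the relational table knows its columns up front, while the triple table accepts any predicate and leaves interpretation to whoever queries it.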
David Eddy
You might want to add this site to your reading list.
https://www.db-engines.com/en/ranking
List of DBMSs. The DBMS engine tally was at 417 as of 2024-01-04. Their counting
system gets a little wonky at 390+.
Paul Tyson
2024 - P15Y = 2009 < 2012 (date of
R2RML recommendation)
Igor Toujilov
Mike, I would not say "Ontologies are represented in graph structures"
only. Ontologies can be represented in a wide range of formalisms,
including graphs, which are just one possible representation. For
example, there are tools to store the same ontology in different
representation formats: RDF/XML, Turtle, OWL Functional Syntax,
Manchester OWL Syntax, etc. Yes, RDF and Turtle are graph
representations. But OWL Functional and Manchester syntaxes have nothing
to do with graphs. And yet they represent the same ontology.
I also disagree that "the workforce needs conventional everyday
interfaces driven by relational databases". It depends on your system
architecture. Today many systems use No-SQL or graph databases
successfully without any need for relational databases.
In real systems, the difference between data models and ontologies
can be sharp or subtle. Some systems continue using relational
databases while performing some tasks on ontologies. Other systems
have ontologies that are tightly integrated in the production process,
so sometimes it is hard to separate the ontologies from data. And of
course, there is a wide range of systems in between those extreme
cases.
William Burkett
My take is that the difference is primarily one of intention.
Ontology designers and “conceptual data model” designers are seeking
to create representations of the “real world”. Logical/physical data
designers are seeking to create specifications for actual data
structures for applications to store/access/use data. This
distinction is, of course, very fuzzy and fluid because both sets of
designers are usually pursuing both of these intentions
simultaneously. If an “ontology” is specified in OWL, it is, of
course, a self-defining “data structure” that is processable by
applications designed to use that structure – so, IMHO, there is no
objective, practical difference between them. Unless we’re talking
about box-and-line diagrams, most ontologies that we talk about here
(I think) are just special kinds of data models.
Alex Shkotin
Your table reminds me of "What Goes Around Comes Around… And Around…"
by Michael Stonebraker and Andrew Pavlo.
We discussed it for about two and a half hours at our last
SIGMOD-Moscow meeting. They cover RDF and graph DBs, but not OWL.
Kingsley Idehen
Yes, R2RML is the component of “the SemWeb stack” designed to
describe how relations in a DBMS are represented as relations in
RDF.
Here’s a post I wrote years ago, complete with live examples,
demonstrating how to create an RDF-based Entity Relationship Graph
from CSV files located on a local filesystem accessible to a Virtuoso
instance.
Virtuoso is a multi-model database management system in one of its
guises. Specifically, it can operate on relations represented in the
coarse-grained form typical of SQL-based DBMSs or in the finer-grained
form facilitated by RDF via its Quad Store functionality.
In my view, Ontology and Data Model are different terms
for the same concept—they’re essentially conceptual synonyms. That
said, practitioners often view them differently depending on their
operational lenses. For example, most ontologists express their
worldviews on data modeling using formal graphic or linear notations,
while data modelers (often from the RDBMS domain) favor graphical
notations such as ERD (Entity-Relationship Diagrams) or UML (Unified Modeling Language) diagrams.
Links:
- [2] Example Hyperlink Super-Key: a hyperlink-based super-key resolving to a specific row in a CSV document (which could also represent any ODBC- or JDBC-accessible table in an RDBMS)
Mike Peters
So, what options are there to map RDF to relational data?
Michael DeBellis
RE: So, what options are there to map RDF to relational data?
I was going to list what little I knew but I'm working with ChatGPT
right now, so I asked it. Here's what it said:
Implementing OWL (Web Ontology Language) in a relational database
involves translating OWL's semantic capabilities into a relational
schema and querying framework.
Several tools and approaches allow you to achieve this:
Tools and Frameworks for OWL in Relational Databases
1. RDF and SPARQL Layers on Relational Databases
- Apache Jena with RDBMS Integration:
  - Jena provides a framework to work with RDF and OWL.
  - It supports relational database backends via the TDB or RDB storage layers.
  - You can map RDF triples to relational tables and query them using SPARQL.
- Virtuoso Universal Server:
  - Virtuoso supports RDF storage and SPARQL queries directly in a relational database.
  - It combines a traditional RDBMS with a semantic store.
  - Useful for large-scale ontologies.
- D2RQ:
  - A tool for exposing relational databases as virtual RDF graphs.
  - You can use OWL over the RDF representation while keeping data in the relational schema.
2. Ontology Mapping Tools
- Ontop:
  - A powerful tool to map OWL ontologies to relational databases using OBDA (Ontology-Based Data Access).
  - Allows you to query relational data through SPARQL while maintaining OWL semantics.
  - Supports mappings through R2RML or native Ontop mapping syntax.
- Karma:
  - A tool for integrating data from relational databases with OWL ontologies.
  - Supports mapping relational data to RDF and OWL constructs.
3. Hybrid Database Systems
- GraphDB (by Ontotext):
  - A triplestore that can integrate with relational databases.
  - It allows you to store and query OWL data seamlessly alongside relational data.
- Oracle RDF Graph:
  - Part of Oracle Database, it supports RDF and OWL data alongside relational data.
  - Provides SPARQL querying and reasoning capabilities.
4. Custom Implementation Approaches
- Mapping OWL Axioms to Relational Rules:
  - Use SQL views or stored procedures to encode OWL semantics manually.
  - Define mappings for classes, properties, and relationships in relational terms.
- DL-Lite Ontology Mapping:
  - If your OWL usage aligns with OWL 2 QL (a subset of OWL optimized for relational databases), you can use DL-Lite mappings for efficient querying.
Considerations for Choosing a Tool
1. Scale of the Ontology:
- Large ontologies benefit from systems like Virtuoso or GraphDB, which are optimized for scalability.
2. OWL Subset:
- OWL 2 QL is specifically designed for integration with relational databases. Tools like Ontop are excellent for such use cases.
3. Querying Needs:
- If SPARQL is a requirement, tools like Ontop, Jena, or Virtuoso are natural fits.
- For simple SQL querying, D2RQ or custom mapping may suffice.
4. Reasoning Capabilities:
- If you require reasoning over OWL axioms, focus on tools like GraphDB, Oracle RDF Graph, or Jena.
Suggested Approach
If you're starting out and want robust OWL support with minimal setup,
Ontop is highly recommended. It supports direct integration with
relational databases, enables SPARQL querying, and is
well-documented.
Would you like a walkthrough on setting up any of these tools, or a
comparison of their features?
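Editor's sketch: the "SQL views" idea in the custom-implementation item above could look something like this for one small piece of OWL/RDFS semantics, namely subclass inheritance. It is a minimal illustration using Python's built-in sqlite3 module, not how any of the listed tools work; the table names, view name, and sample classes are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subclass_of (child TEXT, parent TEXT);    -- asserted subclass axioms
CREATE TABLE instance_of (individual TEXT, cls TEXT);  -- asserted class membership

INSERT INTO subclass_of VALUES ('RetiredElf', 'Elf'), ('Elf', 'Worker');
INSERT INTO instance_of VALUES ('mary', 'RetiredElf');

-- Inferred membership: asserted classes plus every ancestor class.
CREATE VIEW inferred_instance_of AS
WITH RECURSIVE ancestor(cls, ancestor_cls) AS (
    SELECT child, parent FROM subclass_of
    UNION
    SELECT a.cls, s.parent
    FROM ancestor AS a
    JOIN subclass_of AS s ON s.child = a.ancestor_cls
)
SELECT individual, cls FROM instance_of
UNION
SELECT i.individual, a.ancestor_cls
FROM instance_of AS i
JOIN ancestor AS a ON a.cls = i.cls;
""")

mary_types = sorted(
    row[0]
    for row in conn.execute(
        "SELECT cls FROM inferred_instance_of WHERE individual = 'mary'"
    )
)
print(mary_types)
```

Only the subclass hierarchy is covered here. Richer OWL axioms (cardinality restrictions, property chains, and so on) do not reduce to a view this easily, which is presumably why the dedicated tools exist.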
Elisa Kendall
Together with other colleagues, we’ve used a number of the tools listed
for mapping FIBO to relational for large banks successfully, including
Ontop and Jena tools, among others. FIBO does include axioms,
particularly some cardinality restrictions, that are outside of OWL RL,
but most of the semantics can be mapped without any issues. RDFox
supports OWL RL, which is more expressive than OWL QL, and ignores
anything outside of the RL profile rather than throwing up on it. Their
team has also provided some rules that extended OWL RL for us to test
with, including approximations of the axioms in FIBO that are outside of
RL, which worked well. Stardog supports all of the semantics encoded in
FIBO, and most knowledge graph tools that support SPARQL 1.1 can import
it, though they may not support all of the reasoning encoded in the
ontologies.
RDFox and some other knowledge graph engines prefer either Turtle or
JSON-LD to RDF/XML, which is the serialization we work in (primarily to
see all of the warts in what we are publishing). But FIBO and the other
ontology efforts I participate in publish in all three serializations –
RDF/XML, Turtle, and JSON-LD, so that we can supply whatever is needed
to a given tool/framework. Same is true of the Commons, MVF, LCC, and
other ontologies we publish at OMG - in RDF/XML and Turtle, at a
minimum. There is also a toolkit available from the EDM Council, published
as open source at https://github.com/edmcouncil/rdf-toolkit, that we use
to support transformations between serializations consistently, both for
GitHub comparisons and for tool support. It’s a fairly complex Swiss army
knife, with various options you can use to manage the transformation as
needed.
Kingsley Idehen
Do you mean the reverse—creating SQL RDBMS relations from RDF-based
relations? If so, note that the SPARQL query language includes a SELECT
option for projecting query solutions as tables from RDF Graphs, which can
then be fed into a SQL RDBMS.
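Editor's sketch: one way to picture Kingsley's suggestion, assuming the SELECT results arrive in the standard SPARQL 1.1 JSON results format. The response text below is hand-written for the illustration (no endpoint is called), and the function name, table name, and `http://example.org/` identifiers are hypothetical. Only the Python standard library is used.

```python
import json
import sqlite3

# Hand-written stand-in for the response to a query such as:
#   SELECT ?employee ?employer WHERE { ?employee ex:worksFor ?employer }
sparql_json = """
{
  "head": {"vars": ["employee", "employer"]},
  "results": {"bindings": [
    {"employee": {"type": "uri", "value": "http://example.org/elf1"},
     "employer": {"type": "uri", "value": "http://example.org/Santa"}},
    {"employee": {"type": "uri", "value": "http://example.org/elf2"}}
  ]}
}
"""


def load_select_results(conn, table, result_text):
    """Create `table` from the result variables and insert one row per solution."""
    result = json.loads(result_text)
    columns = result["head"]["vars"]
    # Identifiers cannot be bound as SQL parameters, so they are quoted instead.
    column_defs = ", ".join(f'"{name}" TEXT' for name in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_defs})')
    placeholders = ", ".join("?" for _ in columns)
    rows = [
        # An unbound variable is absent from a binding; it is stored as NULL.
        tuple(binding.get(name, {}).get("value") for name in columns)
        for binding in result["results"]["bindings"]
    ]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = load_select_results(conn, "works_for", sparql_json)
stored = conn.execute(
    "SELECT employee, employer FROM works_for ORDER BY employee"
).fetchall()
print(inserted, stored)
```

Everything is stored as text here; datatypes and language tags carried in the bindings are ignored, which a real import would need to handle.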
Mike Peters
Yes, I do mean importing ontologies into relational databases. I'm not an
ontologist, but I can see the great value in using ontologies, schemas and
taxonomies as read-only references in a working database.
The question is how to reliably and effectively allow users to point at
any ontology using a form (e.g., something on OBO Foundry or SNOMED CT) and
import it into the relational database they are logged into.
I was thinking OWL and RDF. Are both possible?
Michael DeBellis
As someone rightly pointed out in response to one of my answers, OWL is a
logical, not a graph model, and not necessarily tied to RDF. Since
OWL is a subset of First Order Logic (Description Logic) it can
directly map to a relational database rather than going through RDF first.
According to ChatGPT:
Ontop supports mappings through R2RML or native Ontop mapping
syntax.
Mike Bennett
Well this has been a very interesting sub-thread. I'll fork here from
before the sub-thread on RDF to RDB etc. considerations.
Is "Ontology" really synonymous with, or even necessarily a kind of,
"Data Model"?
I'd say emphatically not. There are kinds of ontology that are a kind of
data model, of course, and much has been said about these in this
sub-thread.
But those are not the only things of which it can be said "This is an
ontology".
Any model has an "aboutness"; that is, "Of what is this a model?"
For some models, what it is about is data: each element of the model
represents some element of data.
For some models, the aboutness is that of things in the world.
The model language or formalism, and the model aboutness, are orthogonal:
it does not necessarily follow that a model in a given language must be
about a given kind of thing. UML Class models are designed to represent
Object Oriented class constructs (with both behavioral and structural
elements), but some people use them to represent all sorts of other things
(including sometimes, things in the world). Similarly, an OWL model may
represent RDF data and usually does, since that is what it is intended
for.
Suppose someone wants to have a model of real things in the world. One
would call this an "Ontology". However, as soon as someone says they
want an ontology, various people pop up and say "I can do you an ontology"
when what they mean, as evidenced in this thread, is "I can do you an
ontology of the sort that is a kind of data model".
Maybe that's what the customer needed, maybe it's not. If the business
needs something that formally defines the meanings of things, for example
for management communication, reporting, common understandings (in place
of word-dependent dictionaries or glossaries) and so on, or if they want
something for AI to process, then the chances are they need an ontology of
the sort that represents real things in the world. All too often they get
given an Ontology-as-data-model because someone thinks that is the only
sort of ontology there is.
There are some questions, the answer to which is not a data model.
Let's consider 2 things:
- Basic engineering best practice
- Practical examples of how these kinds of ontology are different.
Good engineering follows a separation of concerns. Artifacts that
represent the customer or business view, for example defining what the
customer wants or what their world looks like, should always be expressed
independently of any assumptions about the design techniques or
technologies that will be used in crafting a solution.
For example a business process model represents the activities that the
business carries out, independently of any software design to automate
these.
The reason is that (a) things are represented without presuming anything
about the solution and (b) the solution can then be validated against that
design-independent artifact. That's basic QA.
Similarly, a data model is a kind of design (typically done at 2 levels:
Platform Independent and Platform Specific, both of which are still
designs).
The corresponding design-independent artifact is a kind of ontology: one
in which the real-world meanings of the things of interest to the business
are expressed. In other words, what does it mean to be this kind of
thing?
Traditionally that's been done with words. But words are slippery. Better
to use formal logic.
So there are ontologies which are a kind of data model, and there are
ontologies which are a representation of things in the world. Both are
needed, at these different levels in the development method, and with
linkages between them.
A practical example of the difference is the best way to understand this
distinction between these kinds of ontology.
And the difference is best illustrated with an example of where it went
wrong.
The difference is between what we call "Truth Makers" and data. Truth
Makers are what it takes for something to be defined as being a member of
a given class of Thing. These are the necessary and sufficient conditions
for a thing in the world to be a member of that class. Most of these are
either physical matters such as physics and chemistry, or legal and social
constructs such as legal capacities, value and so on (mainly classifiable
under Searle's Ontology of Social Constructs - an ontology which is
definitely not a data model; it's a book). A very few things get their
meaning from data itself, as a kind of thing.
An ontology of things in the world (let's call this a Concept Ontology)
defines things using those truth makers.
An ontology as a kind of data model looks for data surrogates for those
things in the world: what data can you expect to find when this or that
legal capacity, physical quantity value etc. is in play?
Example: suppose we consider what it means to be a bank. Very loosely,
this is something with certain legal capacities, such as the capacity to
take on funds, the capacity to disburse those funds and so on.
In one project I was involved with, the class "Bank" was defined using a
data element for "FDIC Insurance", a kind of insurance that all banks in
the US must carry. Then it was noticed that the DTCC, a clearing house,
also carries FDIC Insurance, and so a different data item was sought
instead.
There were two errors, one inside the other. The first, proximate error
was that they chose the wrong data surrogate. The error inside the error
(the ultimate error) was that they did not realize they were making a
design decision for a data surrogate. Therefore, that design decision was
not peer reviewed and was only discovered later (costing time etc. to fix
which is another reason we have separation of concerns).
The model was updated to use the more correct data surrogate of Banking
License. This reliably exists whenever the legal capacities for something
to be a bank exist, and doesn't when they don't.
Of course that would not work in all use cases: if the requirement was to
detect when some entity was acting as a bank when it should not be, you
would look for data about the entity's behavior instead. Different use
cases may give rise to different data surrogate design decisions.
And that's why, while ontology-as-data-model is an extremely valuable
kind of data model, the same kind of engineering integrity should go into
their design as into the design of anything else, including the provision
of ontologies of the target domain subject matter (subject to scope),
against which these can be designed, against which design decisions can be
reviewed, and against which the end result can be tested.
Those are ontologies too. These ideally use formal logic because words
are too slippery to be relied upon. But just because a concept ontology is
framed using formal logic, does not make it a data model (logic has been
around a lot longer than computational data).
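Editor's sketch: Mike Bennett's bank example can be reduced to a few lines to show why the choice of data surrogate is a design decision worth reviewing. The records and field names below are invented for the illustration; they mirror the story as told above and are not real regulatory data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityRecord:
    name: str
    has_fdic_insurance: bool
    has_banking_license: bool


# Illustrative records only; the flags follow the story told in the post.
records = [
    EntityRecord("Example Bank", has_fdic_insurance=True, has_banking_license=True),
    EntityRecord("Example Clearing House", has_fdic_insurance=True, has_banking_license=False),
]


def is_bank_by_insurance(record: EntityRecord) -> bool:
    """First surrogate: also matches non-banks that carry the insurance."""
    return record.has_fdic_insurance


def is_bank_by_license(record: EntityRecord) -> bool:
    """Revised surrogate: tracks the legal capacities more closely."""
    return record.has_banking_license


banks_v1 = [r.name for r in records if is_bank_by_insurance(r)]
banks_v2 = [r.name for r in records if is_bank_by_license(r)]
print(banks_v1)
print(banks_v2)
```

Neither function says what a bank is. Each is a test on available data, and the concept ontology is what the tests would be reviewed against.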
Kingsley Idehen
You can use SPARQL SELECT, as I described against a collection of relations
(comprising terms from both the RDF and OWL ontologies/vocabularies) to
insert data into relations (colloquially referred to as tables) managed by
an RDBMS.
John Sowa
As Yogi Berra said, these discussions are "Deja vu all over again."
Re "No SQL": The person who coined that term rewrote it as "Not Only
SQL". The original SQL was designed for data that is best
organized in a table. The fact that other data might be better
represented in other formats does not invalidate the use of tables for data
that is naturally tabular.
Re tree structure in ontologies: A tree structure for the
NAMES of an ontology does NOT imply that the named data
happens to be a tree. Some of the data might be organized in a
tree, but other data might be better organized in a table, list, vector,
matrix, tensor, graph, multidimensional shapes, or combinations of all of
them.
The following survey article was written about 40 years of developments
from 1970 to 2010. Some new methods have been invented since then, but
90% of the discussions are about new names for old ideas re-invented by
people who didn't know the history. I wrote the survey, but 95%
of the links are to writings by other people:
https://jfsowa.com/ikl
And by the way, I agree with Bill Burkett (on the list down below).
He is one of the people I collaborated with on various committees in the
past many years. We viewed the Deja Vu over and over and over.
That's one reason why I don't get excited by new names.
Alex Shkotin
What is the idea behind importing an ontology into an RDB? Just to store it? Or to use it
as a schema for the RDB?
And if the former, do you need to keep its structure or just store it as a
blob?
Mike Peters
The idea is to feed Pipi 9 structured data from versioned external
references, such as ontologies, taxonomies, XML Schemas, CSV, etc.
This data then becomes bits of relational database schema or is used to
populate the tables.
This needs to be an automated process that is highly reliable. It's like
using an external API.
So, here is a silly made-up example of what I want to end up with.
Ontology-Imports-Table
----------------------------------
ID | Source | Version | Thing 1 | Relation | Thing 2
1 | obofoundary-example.owl | 5 | elf | worksFor | Santa
2 | obofoundary-example.owl | 5 | rudolf | isA | Reindeer
3 | obofoundary-example.owl | 5 | mary | isA | Elf
4 | obofoundary-example.owl | 6 | mary | isA | RetiredElf
5 | obofoundary-periodicTable.rdf | 1 | Plutonium | isA | Chemical Element
6 | movieLab.rdf | 5 | Camera | hasA | Camera Lens
7 | movieLab.xml | 10 | DSMC2 Gemini 5K S35 | isA | Camera
Depending on user requirements, this could be used to generate:
Camera-Table
Camera-Lens-Table
ChemicalElement-Table
etc
Or
Populate a table with read-only records.
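A rough sketch of the idea above, using only Python's built-in sqlite3 module. It is not Pipi code: the column names, the `table_name` helper, and the rule "one table per class that appears as the object of an isA row" are assumptions made for the illustration. Version handling and hasA relations (which would produce Camera-Lens-Table) are left out.

```python
import re
import sqlite3

# The made-up rows from the example table above.
ROWS = [
    ("obofoundary-example.owl", 5, "elf", "worksFor", "Santa"),
    ("obofoundary-example.owl", 5, "rudolf", "isA", "Reindeer"),
    ("obofoundary-example.owl", 5, "mary", "isA", "Elf"),
    ("obofoundary-example.owl", 6, "mary", "isA", "RetiredElf"),
    ("obofoundary-periodicTable.rdf", 1, "Plutonium", "isA", "Chemical Element"),
    ("movieLab.rdf", 5, "Camera", "hasA", "Camera Lens"),
    ("movieLab.xml", 10, "DSMC2 Gemini 5K S35", "isA", "Camera"),
]


def table_name(class_label):
    """Turn a class label such as 'Chemical Element' into 'ChemicalElement-Table'."""
    return re.sub(r"\W+", "", class_label) + "-Table"


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE ontology_imports (
        id INTEGER PRIMARY KEY, source TEXT, version INTEGER,
        thing_1 TEXT, relation TEXT, thing_2 TEXT)"""
)
conn.executemany(
    "INSERT INTO ontology_imports (source, version, thing_1, relation, thing_2) "
    "VALUES (?, ?, ?, ?, ?)",
    ROWS,
)

# Generate one table per class that appears as the object of an isA row.
classes = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT thing_2 FROM ontology_imports "
        "WHERE relation = 'isA' ORDER BY thing_2"
    )
]
for label in classes:
    name = table_name(label)
    conn.execute(f'CREATE TABLE "{name}" (name TEXT, source TEXT, version INTEGER)')
    # Populate it with reference records (read-only is not enforced in this sketch).
    conn.execute(
        f'INSERT INTO "{name}" SELECT thing_1, source, version FROM ontology_imports '
        "WHERE relation = 'isA' AND thing_2 = ?",
        (label,),
    )

generated = sorted(table_name(label) for label in classes)
cameras = conn.execute('SELECT name FROM "Camera-Table"').fetchall()
print(generated)
print(cameras)
```

Keeping source and version on every generated row means a record can always be traced back to the external reference it came from, which matters if the import is meant to behave like an external API.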
Alex Shkotin
Why not an ontology about ontologies, as discussed here?