Data Model and Ontology

Mike's Notes

An excellent discussion titled Design Pattern Ontology recently occurred on the Ontolog Forum. Igor Toujilov started the thread on November 26, 2024.

It covered a lot of ground, and a point made by John Sowa is copied below.

Eventually, it led to a discussion about the relationship between data models and ontologies. I have copied below the text of many of the points raised as a valuable reference for future work on Pipi.

The contributors are:

  • John Sowa
  • William "Bill" Burkett
  • Mike Peters
  • Michael DeBellis
  • David Eddy
  • Paul Tyson
  • Igor Toujilov
  • Alex Shkotin
  • Kingsley Idehen
  • Elisa Kendall
  • Mike Bennett

I will keep adding to these notes as more useful contributions are made.

Any errors or omissions are mine.

Resources

People mentioned

Design Pattern Ontology

Note taken for the Ontolog Forum

1/12/2024

John Sowa

Since Matthew isn't with us, I'll summarize some of his points which we had discussed over many years in different ways.

He had been working at Shell Oil for years, and he had developed a detailed ontology for the oil industry.  He later generalized it to develop a more general top level, which I considered quite good.   We had discussed issues about generalizing it even farther,  but he was reluctant to go farther in levels of abstraction.  We agreed that was a reasonable point of view.

But we also agreed that his details for the oil industry might conflict with details for other industries, such as banking and farming.   Furthermore, he recognized that different oil companies had different ways of representing the same terms because they had developed different policies and procedures.

I'll also mention another widely used ontology, which evolved over a period of about 70 years, and it is unlikely to change for a long, long time:  the ontology for making reservations for airlines, which was later extended to cover anything related to airlines, such as hotels, car rentals, trains, taxis, etc., etc.

And that ontology began in the 1950s with IBM's project SAGE for the airplanes used in the Strategic Air Command  over North America.  In the 1960s, IBM adapted that ontology for American Airlines.  IBM later sold the software to other airlines.   And all the other additions were made to conform to the same basic ontology from the 1950s.  70 years later, that top level is so entrenched that it is never going away.

Another world-wide ontology that also developed in the 1960s includes the global weather patterns that were established by the world-wide weather simulation programs.  They use a different way of representing the world than the reservation systems.

Fundamental problem:  There will never be a single universal ontology for representing anything and everything in the world (or the universe).

That is a fact that any system of knowledge representation must deal with.

Data Model and Ontology

Abbreviated notes taken from the Ontolog Forum

01/01/2025 and ongoing

William Burkett

  • Hay, D. C. (1996). Data model patterns : conventions of thought. New York, Dorset House Pub.
  • Hay, D. C. (2006). Data model patterns : a metadata map. Amsterdam ; Boston, Elsevier Morgan Kaufmann.
  • Silverston, L. (2009). The data model resource book, Vols 1-3. New York, John Wiley.

Mike Peters

  • Hay, D. C. (2011). UML and Data Modeling: A Reconciliation, Technics Publications.

Mike Peters

Ontologies are represented in graph structures. Non-relational databases like semantic or graph databases are better suited for this job, and ontologists (I'm not one) have no problem working with them.

However, the workforce needs conventional everyday interfaces driven by relational databases. So, there is an import/export issue that David Hay could have written a book about, and I wish he had. His explanations are excellent. His book on UML and data modelling also bridged two different ways of looking at the world.

Michael DeBellis

RE: The distinction / difference between data models & ontologies is what...?

I think there is a difference but it isn't what many people in the ontology community seem to think. Most of the methodologies and guidelines I've seen for building ontologies present the ontology as a thing in itself. For example, they often minimize or even ignore questions such as loading data or integrating with existing systems. In reality these are some of the most important issues to face for ontologies used in the real world. Some of the most important distinctions between an E/R model and an ontology are:

1) E/R models are typically used for Online Transaction Processing (OLTP), whereas ontology and knowledge graph models are typically used for Online Analytic Processing (OLAP). The design of an OLTP model needs to be optimized for response time. Thus, such models tend to be fairly sparse, leaving most of the domain knowledge in the code of the systems that use them. The design for an OLAP model can be much richer and include much more knowledge about the domain in the model itself rather than in the code, because the users will tend to be working on client machines that have more processing power and, because they are executing complex queries, will be a bit more patient than someone adding a post to Facebook or doing a funds transfer on their bank account with their phone.

2) E/R models need to worry about various normalization forms for peak efficiency depending on what needs to be maximized. Ontologies are implemented as graphs and don't need to consider the same kinds of issues. Essentially when you build an ontology you are able to work at the analysis level whereas for an E/R model you tend to work at the design level.

3) E/R models are relatively difficult to change at run time. Ontologies and knowledge graphs can easily be changed at run time, not just the instance values but the schemas themselves can be changed at run time. In this way they are more similar to NoSQL databases that have "schema on read" rather than E/R models that have schema on write.

Actually, I just remembered I'm putting together a table comparing relational databases, NoSQL (Hadoop) and OWL. Here's what I have so far. This is a work in progress:

Feature | Hadoop (HDFS + Pig/Hive) | SQL Databases | OWL Knowledge Graphs (e.g., AllegroGraph)
----------------------------------
Schema | Schema on read | Schema on write | Schema on write but schema can be modified at run time
Data Model | File-based (raw formats) | Tables (rows/columns) | RDF triples or quads (subject-predicate-object)
Storage | Distributed (HDFS) | Centralized or sharded | Triple or quad stores with distributed options and sharding in some triplestores (e.g., AllegroGraph)
Processing | Batch (MapReduce) | Transactional (ACID) | Semantic reasoning and SPARQL queries. Some triplestores also support ACID transactions.
Querying | Pig Latin, HiveQL | SQL | SPARQL and Description Logic queries
Reasoning | None | None | SWRL, SHACL, OWL DL axioms
Flexibility | Flexible (unstructured, semi-structured, structured) | Rigid schema | Highly flexible with semantic annotations and Linked Data
Scalability | Horizontal (many nodes) | Vertical (more powerful nodes) | Horizontal or vertical, depending on implementation
Integration | Tools for ETL and analytics | Tight coupling with apps | Can integrate with ontologies, Linked Data, and other RDF graphs
Best For | Batch processing, raw data storage | Transactional workloads, OLTP | Semantic data, reasoning, and complex relationships
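
Note to self (my sketch, not from the thread): point 3 above and the Schema row of the table can be illustrated with a toy comparison. It assumes an in-memory SQLite database for the relational side and a plain Python set of tuples standing in for a triplestore; the `person` table and the `ex:` names are hypothetical. It only shows where the schema lives in each case and says nothing about performance or about how a real triplestore behaves.

```python
import sqlite3

# Relational side: a new attribute needs a schema change (ALTER TABLE)
# before any row can carry a value for it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Ada')")
conn.execute("ALTER TABLE person ADD COLUMN employer TEXT")
conn.execute("UPDATE person SET employer = 'Acme' WHERE id = 1")
# PRAGMA table_info returns one row per column; index 1 is the column name.
relational_columns = [row[1] for row in conn.execute("PRAGMA table_info(person)")]

# Graph side: a toy set of (subject, predicate, object) tuples,
# NOT a real triplestore.
triples = {
    ("ex:ada", "rdf:type", "ex:Person"),
    ("ex:ada", "ex:name", "Ada"),
}
# Declaring a new property and using it are both just additional triples,
# so the "schema" changes through the same operation as the data.
triples.add(("ex:worksFor", "rdf:type", "owl:ObjectProperty"))
triples.add(("ex:ada", "ex:worksFor", "ex:acme"))
graph_predicates = sorted({p for _, p, _ in triples})

print(relational_columns)
print(graph_predicates)
```

In the relational case the structure is changed by a separate DDL statement; in the graph case the schema-level statement and the instance-level statement are the same kind of thing.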

David Eddy

You might want to add this site to your reading list: https://www.db-engines.com/en/ranking (list of DBMSs). The DBMS engine tally was at 417 as of 2024-01-04. Their counting system gets a little wonky at 390+.

Paul Tyson

2024 - P15Y = 2009 < 2012 (date of R2RML recommendation)

Igor Toujilov

Mike, I would not say "Ontologies are represented in graph structures" only. Ontologies can be represented in a wide range of formalisms, including graphs, which are just one possible representation. For example, there are tools to store the same ontology in different representation formats: RDF/XML, Turtle, OWL Functional Syntax, Manchester OWL Syntax, etc. Yes, RDF and Turtle are graph representations. But OWL Functional and Manchester syntaxes have nothing to do with graphs. And yet they represent the same ontology.

I also disagree that "the workforce needs conventional everyday interfaces driven by relational databases". It depends on your system architecture. Today many systems use No-SQL or graph databases successfully without any need for relational databases.

In real systems, the difference between data models and ontologies can be sharp or subtle. Some systems continue using relational databases while performing some tasks on ontologies. Other systems have ontologies that are tightly integrated in the production process, so sometimes it is hard to separate the ontologies from data. And of course, there is a wide range of systems in between those extreme cases.

William Burkett

My take is that the difference is primarily one of intention. Ontology designers and “conceptual data model” designers are seeking to create representations of the “real world”. Logical/physical data designers are seeking to create specifications for actual data structures for applications to store/access/use data. This distinction is, of course, very fuzzy and fluid because both sets of designers are usually pursuing both of these intentions simultaneously. If an “ontology” is specified in OWL, it is, of course, a self-defining “data structure” that is processable by applications designed to use that structure, so, IMHO, there is no objective, practical difference between them. Unless we’re talking about box-and-line diagrams, most ontologies that we talk about here (I think) are just special kinds of data models.

Alex Shkotin

Your table reminds me of "What Goes Around Comes Around… And Around…" by Michael Stonebraker and Andrew Pavlo.

We discussed it for about two and a half hours at our last SIGMOD-Moscow meeting. They survey RDF and graph DBs, but not OWL.

Kingsley Idehen

Yes, R2RML is the component of “the SemWeb stack” designed to describe how relations in a DBMS are represented as relations in RDF.

Here’s a post I wrote years ago, complete with live examples, demonstrating how to create an RDF-based Entity Relationship Graph from CSV files located on a local filesystem accessible to a Virtuoso instance.

Virtuoso is a multi-model database management system in one of its guises. Specifically, it can operate on relations represented in the coarse-grained form typical of SQL-based DBMSs or in the finer-grained form facilitated by RDF via its Quad Store functionality.

In my view, Ontology and Data Model are different terms for the same concept—they’re essentially conceptual synonyms. That said, practitioners often view them differently depending on their operational lenses. For example, most ontologists express their worldviews on data modeling using formal graphic or linear notations, while data modelers (often from the RDBMS domain) favor graphical notations such as ERD (Entity-Relationship Diagrams) or UML (Unified Modeling Language) diagrams.

Links:

  • [2] Example Hyperlink Super-Key
    A hyperlink-based super-key resolving to a specific row in a CSV document (which could also represent any ODBC- or JDBC-accessible table in an RDBMS)
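
Note to self (my sketch, not from the thread): the idea of representing relations in a DBMS as relations in RDF, with a hyperlink acting as a key for each row, can be sketched in a few lines. This is not R2RML, which is a declarative mapping language, and it is not how Virtuoso does it; it is a hand-rolled, simplified row-to-triple mapping. The base IRI, the `employee` table, and the function name `rows_to_triples` are all hypothetical.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base IRI


def rows_to_triples(conn, table, key_column):
    """Yield (subject, predicate, object) for every non-null cell of a table.

    Each row gets an IRI built from the table name and its key value,
    each column becomes a predicate IRI, and each cell becomes a literal.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key_column]}"
        # "rdf:type" is left as a prefixed name to keep the sketch short.
        yield (subject, "rdf:type", f"{BASE}{table}")
        for column, value in record.items():
            if column != key_column and value is not None:
                yield (subject, f"{BASE}{table}#{column}", value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ada", "R&D"), (2, "Bob", None)],
)

triples = list(rows_to_triples(conn, "employee", "id"))
for t in triples:
    print(t)
```

NULL cells simply produce no triple, which is one of the places where the coarse-grained table and the fine-grained graph differ.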

Mike Peters

So, what options are there to map RDF to relational data?

Michael DeBellis

RE: So, what options are there to map RDF to relational data?

I was going to list what little I knew but I'm working with ChatGPT right now, so I asked it. Here's what it said:

Implementing OWL (Web Ontology Language) in a relational database involves translating OWL's semantic capabilities into a relational schema and querying framework. 

Several tools and approaches allow you to achieve this:

Tools and Frameworks for OWL in Relational Databases

1. RDF and SPARQL Layers on Relational Databases
  • Apache Jena with RDBMS Integration:
    • Jena provides a framework to work with RDF and OWL.
    • It supports relational database backends via the TDB or RDB storage layers.
    • You can map RDF triples to relational tables and query them using SPARQL.
  • Virtuoso Universal Server:
    • Virtuoso supports RDF storage and SPARQL queries directly in a relational database.
    • It combines a traditional RDBMS with a semantic store.
    • Useful for large-scale ontologies.
  • D2RQ:
    • A tool for exposing relational databases as virtual RDF graphs.
    • You can use OWL over the RDF representation while keeping data in the relational schema.
2. Ontology Mapping Tools
  • Ontop:
    • A powerful tool to map OWL ontologies to relational databases using OBDA (Ontology-Based Data Access).
    • Allows you to query relational data through SPARQL while maintaining OWL semantics.
    • Supports mappings through R2RML or native Ontop mapping syntax.
  • Karma:
    • A tool for integrating data from relational databases with OWL ontologies.
    • Supports mapping relational data to RDF and OWL constructs.
3. Hybrid Database Systems
  • GraphDB (by Ontotext):
    • A triplestore that can integrate with relational databases.
    • It allows you to store and query OWL data seamlessly alongside relational data.
  • Oracle RDF Graph:
    • Part of Oracle Database, it supports RDF and OWL data alongside relational data.
    • Provides SPARQL querying and reasoning capabilities.
4. Custom Implementation Approaches
  • Mapping OWL Axioms to Relational Rules:
    • Use SQL views or stored procedures to encode OWL semantics manually.
    • Define mappings for classes, properties, and relationships in relational terms.
  • DL-Lite Ontology Mapping:
    • If your OWL usage aligns with OWL 2 QL (a subset of OWL optimized for relational databases), you can use DL-Lite mappings for efficient querying.

Considerations for Choosing a Tool

1. Scale of the Ontology:
  • Large ontologies benefit from systems like Virtuoso or GraphDB, which are optimized for scalability.
2. OWL Subset:
  • OWL 2 QL is specifically designed for integration with relational databases. Tools like Ontop are excellent for such use cases.
3. Querying Needs:
  • If SPARQL is a requirement, tools like Ontop, Jena, or Virtuoso are natural fits.
  • For simple SQL querying, D2RQ or custom mapping may suffice.
4. Reasoning Capabilities:
  • If you require reasoning over OWL axioms, focus on tools like GraphDB, Oracle RDF Graph, or Jena.

Suggested Approach

If you're starting out and want robust OWL support with minimal setup, Ontop is highly recommended. It supports direct integration with relational databases, enables SPARQL querying, and is well-documented.


    Elisa Kendall

    Together with other colleagues, we’ve used a number of the tools listed for mapping FIBO to relational for large banks successfully, including Ontop and Jena tools, among others. FIBO does include axioms, particularly some cardinality restrictions, that are outside of OWL RL, but most of the semantics can be mapped without any issues. RDFox supports OWL RL, which is more expressive than OWL QL, and ignores anything outside of the RL profile rather than throwing up on it. Their team has also provided some rules that extended OWL RL for us to test with, including approximations of the axioms in FIBO that are outside of RL, which worked well. Stardog supports all of the semantics encoded in FIBO, and most knowledge graph tools that support SPARQL 1.1 can import it, though they may not support all of the reasoning encoded in the ontologies.

    RDFox and some other knowledge graph engines prefer either Turtle or JSON-LD to RDF/XML, which is the serialization we work in (primarily to see all of the warts in what we are publishing). But FIBO and the other ontology efforts I participate in publish in all three serializations (RDF/XML, Turtle, and JSON-LD), so that we can supply whatever is needed to a given tool/framework. The same is true of the Commons, MVF, LCC, and other ontologies we publish at OMG, which are available in RDF/XML and Turtle at a minimum. There is also a toolkit available from the EDM Council, published as open source at https://github.com/edmcouncil/rdf-toolkit, that we use to support consistent transformations between serializations, for GitHub comparisons as well as for tool support. It’s a fairly complex Swiss army knife, with various options you can use to manage the transformation as needed.

    Kingsley Idehen

    Do you mean the reverse—creating SQL RDBMS relations from RDF-based relations? If so, note that the SPARQL query language includes a SELECT option for projecting query solutions as tables from RDF Graphs, which can then be fed into a SQL RDBMS.
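
Note to self (my sketch, not from the thread): the reverse direction described here, projecting query solutions from an RDF graph as a table and feeding that into a SQL RDBMS, can be mimicked without a SPARQL engine. The function below is a hand-written stand-in for one specific SELECT query; it is not SPARQL and does not generalize. The `ex:` triples, the `staff` table, and the function name are hypothetical, and each subject is assumed to have at most one name and one employer.

```python
import sqlite3

# Toy graph as (subject, predicate, object) tuples.
triples = [
    ("ex:mary", "ex:name", "Mary"),
    ("ex:mary", "ex:worksFor", "ex:santa"),
    ("ex:rudolf", "ex:name", "Rudolf"),
    ("ex:rudolf", "ex:worksFor", "ex:santa"),
    ("ex:santa", "ex:name", "Santa"),
]


def select_name_and_employer(graph):
    """Stand-in for a query of the shape:
    SELECT ?s ?name ?employer WHERE { ?s ex:name ?name ; ex:worksFor ?employer }

    Assumes single-valued properties; a real engine would return one
    solution per combination of matching values.
    """
    names = {s: o for s, p, o in graph if p == "ex:name"}
    employers = {s: o for s, p, o in graph if p == "ex:worksFor"}
    # Only subjects matching BOTH patterns appear in the result.
    return sorted((s, names[s], employers[s]) for s in names.keys() & employers.keys())


rows = select_name_and_employer(triples)

# The tabular solutions load straight into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (iri TEXT PRIMARY KEY, name TEXT, employer TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", rows)
loaded = conn.execute("SELECT iri, name, employer FROM staff ORDER BY iri").fetchall()
print(loaded)
```

`ex:santa` has a name but no employer, so it drops out of the projection, just as a subject that fails one triple pattern would drop out of a SELECT result.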

    Mike Peters

    Yes, I do mean importing ontologies into relational databases. I'm not an ontologist, but I can see the great value in using ontologies, schema and taxonomies as read-only references in a working database.

    The question is how to reliably and effectively allow users to point at any ontology using a form (e.g., something on OBO Foundry or SNOMED CT) and import it into the relational database they are logged into.

    I was thinking OWL and RDF. Are both possible?

    Michael DeBellis

    As someone rightly pointed out in response to one of my answers, OWL is a logical, not a graph model, and not necessarily tied to RDF. Since OWL is a subset of First Order Logic (Description Logic), it can directly map to a relational database rather than going through RDF first. According to ChatGPT:

    Ontop supports mappings through R2RML or native Ontop mapping syntax.

    Mike Bennett

    Well this has been a very interesting sub-thread. I'll fork here from before the sub-thread on RDF to RDB etc. considerations.

    Is "Ontology" really synonymous with, or even necessarily a kind of, "Data Model"?

    I'd say emphatically not. There are kinds of ontology that are a kind of data model, of course, and much has been said about these in this sub-thread.

    But those are not the only things of which it can be said "This is an ontology".

    Any model has an "aboutness"; that is, "Of what is this a model?"

    For some models, what it is about is data: each element of the model represents some element of data.

    For some models, the aboutness is that of things in the world.

    The model language or formalism, and the model aboutness, are orthogonal: it does not necessarily follow that a model in a given language must be about a given kind of thing. UML Class models are designed to represent Object Oriented class constructs (with both behavioral and structural elements), but some people use them to represent all sorts of other things (including sometimes, things in the world). Similarly, an OWL model may represent RDF data and usually does, since that is what it is intended for.

    Suppose someone wants to have a model of real things in the world. One would call this an "Ontology". However, as  soon as someone says they want an ontology, various people pop up and say "I can do you an ontology" when what they mean, as evidenced in this thread, is "I can do you an ontology of the sort that is a kind of data model".

    Maybe that's what the customer needed, maybe it's not. If the business needs something that formally defines the meanings of things, for example for management communication, reporting, common understandings (in place of word-dependent dictionaries or glossaries) and so on, or if they want something for AI to process, then the chances are they need an ontology of the sort that represents real things in the world. All too often they get given an Ontology-as-data-model because someone thinks that is the only sort of ontology there is.

    There are some questions, the answer to which is not a data model.

    Let's consider 2 things:

    1. Basic engineering best practice
    2. Practical examples of how these kinds of ontology are different.

    Good engineering follows a separation of concerns. Artifacts that represent the customer or business view, for example defining what the customer wants or what their world looks like, should always be expressed independently of any assumptions about the design techniques or technologies that will be used in crafting a solution. 

    For example a business process model represents the activities that the business carries out, independently of any software design to automate these.

    The reason is that (a) things are represented without presuming anything about the solution and (b) the solution can then be validated against that design-independent artifact. That's basic QA.

    Similarly, a data model is a kind of design (typically done at 2 levels: Platform Independent and Platform Specific, both of which are still designs).

    The corresponding design-independent artifact is a kind of ontology: one in which the real-world meanings of the things of interest to the business are expressed. In other words, what does it mean to be this kind of thing?

    Traditionally that's been done with words. But words are slippery. Better to use formal logic.

    So there are ontologies which are a kind of data model, and there are ontologies which are a representation of things in the world. Both are needed, at these different levels in the development method, and with linkages between them.

    A practical example of the difference is the best way to understand this distinction between these kinds of ontology.

    And the difference is best illustrated with an example of where it went wrong.

    The difference is between what we call "Truth Makers" and data. Truth Makers are what it takes for something to be defined as being a member of a given class of Thing. These are the necessary and sufficient conditions for a thing in the world to be a member of that class. Most of these are either physical matters such as physics and chemistry, or legal and social constructs such as legal capacities, value and so on (mainly classifiable under Searle's Ontology of Social Constructs - an ontology which is definitely not a data model; it's a book). A very few things get their meaning from data itself, as a kind of thing.

    An ontology of things in the world (let's call this a Concept Ontology) defines things using those truth makers.

    An ontology as a kind of data model looks for data surrogates for those things in the world: what data can you expect to find when this or that legal capacity, physical quantity value etc. is in play?

    Example: suppose we consider what it means to be a bank. Very loosely, this is something with certain legal capacities, such as the capacity to take on funds, the capacity to disburse those funds and so on.

    In one project I was involved with, the class "Bank" was defined using a data element for "FDIC Insurance", a kind of insurance that all banks in the US must carry. Then it was noticed that the DTCC, a clearing house, also carries FDIC Insurance, and so a different data item was sought instead.

    There were two errors, one inside the other. The first, proximate error was that they chose the wrong data surrogate. The error inside the error (the ultimate error) was that they did not realize they were making a design decision for a data surrogate. Therefore, that design decision was not peer reviewed and was only discovered later (costing time etc. to fix, which is another reason we have separation of concerns).

    The model was updated to use the more correct data surrogate of Banking License. This reliably exists whenever the legal capacities for something to be a bank exists, and doesn't when it doesn't.

    Of course that would not work in all use cases: if the requirement was to detect when some entity was acting as a bank when it should not be, you would look for data about the entity's behavior instead. Different use cases may give rise to different data surrogate design decisions.
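
Note to self (my sketch, not from the thread): the two-errors story can be made concrete by writing each data surrogate as an explicit, reviewable rule. The records and their flags below are hypothetical and only follow the anecdote as told above; they are not real registry data. The function names are mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    has_fdic_insurance: bool
    has_banking_license: bool


# Hypothetical records mirroring the anecdote: a clearing house that
# carries the insurance but holds no banking license.
entities = [
    Entity("Example Bank", has_fdic_insurance=True, has_banking_license=True),
    Entity("Example Clearing House", has_fdic_insurance=True, has_banking_license=False),
]


def is_bank_by_insurance(entity: Entity) -> bool:
    """First data surrogate: insurance stands in for 'is a bank'."""
    return entity.has_fdic_insurance


def is_bank_by_license(entity: Entity) -> bool:
    """Revised data surrogate: the license tracks the legal capacities."""
    return entity.has_banking_license


banks_v1 = [e.name for e in entities if is_bank_by_insurance(e)]
banks_v2 = [e.name for e in entities if is_bank_by_license(e)]

print(banks_v1)  # the clearing house is wrongly classified as a bank
print(banks_v2)
```

Neither function defines what a bank is; each is a design decision about which data to treat as evidence, which is exactly the kind of decision that should be reviewed against the concept ontology.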

    And that's why, while ontology-as-data-model is an extremely valuable kind of data model, the same kind of engineering integrity should go into their design as into the design of anything else, including the provision of ontologies of the target domain subject matter (subject to scope), against which these can be designed, against which design decisions can be reviewed, and against which the end result can be tested.

    Those are ontologies too. These ideally use formal logic because words are too slippery to be relied upon. But just because a concept ontology is framed using formal logic, does not make it a data model (logic has been around a lot longer than computational data).

    Kingsley Idehen

    You can use SPARQL SELECT, as I described, against a collection of relations (comprising terms from both the RDF and OWL ontologies/vocabularies) to insert data into relations (colloquially referred to as tables) managed by an RDBMS.

    John Sowa

    As Yogi Berra said, these discussions are "Deja vu all over again."

    Re "No SQL":  The person who coined that term rewrote it as Not Only SQL. The original SQL was designed for data that is best organized in a table.  The fact that other data might be better represented in other formats does not invalidate the use of tables for data that is naturally tabular.

    Re tree structure in ontologies:   A  tree structure for the NAMES  of an ontology does NOT imply that the named data happens to be a tree.   Some of the data might be organized in a tree, but other data might be better organized in a table, list, vector, matrix, tensor, graph, multidimensional shapes, or combinations of all of them.

    The following survey article was written about 40 years of developments from 1970 to 2010.  Some new methods have been invented since then, but 90% of the discussions are about new names for old ideas re-invented by people who didn't know the history.   I wrote the survey, but 95% of the links are to writings by other people;  https://jfsowa.com/ikl .

    And by the way, I agree with Bill Burkett (on the list down below).  He is one of the people I collaborated with on various committees in the past many years.  We viewed the Deja Vu over and over and over.  That's one reason why I don't get excited by new names.

    Alex Shkotin

    What is the idea behind importing an ontology into an RDB? Just to store it? Or to use it as a schema for the RDB?

    And if the former, do you need to keep its structure, or just store it as a blob?

    Mike Peters

    The idea is to feed Pipi 9 structured data from versioned external references, such as ontologies, taxonomies, XML Schemas, CSV, etc.

    This data then becomes bits of relational database schema or is used to populate the tables.

    This needs to be an automated process that is highly reliable. It's like using an external API.

    So, here is a silly made-up example of what I want to end up with:


    Ontology-Imports-Table
    ----------------------------------
    ID | Source | Version | Thing 1 | Relation | Thing 2
    1 | obofoundary-example.owl | 5 | elf | worksFor | Santa
    2 | obofoundary-example.owl | 5 | rudolf | isA | Reindeer
    3 | obofoundary-example.owl | 5 | mary | isA | Elf
    4 | obofoundary-example.owl | 6 | mary | isA | RetiredElf
    5 | obofoundary-periodicTable.rdf | 1 | Plutonium | isA | Chemical Element
    6 | movieLab.rdf | 5 | Camera | hasA | Camera Lens
    7 | movieLab.xml | 10 | DSMC2 Gemini 5K S35 | isA | Camera

    Depending on user requirements, this could be used to generate:

    Camera-Table
    Camera-Lens-Table
    ChemicalElement-Table
    etc

    Or

    Populate a table with read-only records.
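
A minimal sketch of how the made-up example above might be processed, using an in-memory SQLite database. Several things here are assumptions rather than decisions: that only the newest version of each source is used, that only `isA` rows generate tables, and that table names are lower-cased labels with non-word characters replaced by underscores. The names `ontology_imports` and `table_name` are hypothetical, and nothing here reads a real OWL or RDF file.

```python
import re
import sqlite3

# The rows of the made-up Ontology-Imports-Table, copied as-is.
ROWS = [
    (1, "obofoundary-example.owl", 5, "elf", "worksFor", "Santa"),
    (2, "obofoundary-example.owl", 5, "rudolf", "isA", "Reindeer"),
    (3, "obofoundary-example.owl", 5, "mary", "isA", "Elf"),
    (4, "obofoundary-example.owl", 6, "mary", "isA", "RetiredElf"),
    (5, "obofoundary-periodicTable.rdf", 1, "Plutonium", "isA", "Chemical Element"),
    (6, "movieLab.rdf", 5, "Camera", "hasA", "Camera Lens"),
    (7, "movieLab.xml", 10, "DSMC2 Gemini 5K S35", "isA", "Camera"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE ontology_imports (
           id INTEGER PRIMARY KEY, source TEXT, version INTEGER,
           thing1 TEXT, relation TEXT, thing2 TEXT)"""
)
conn.executemany("INSERT INTO ontology_imports VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Assumption: keep only rows belonging to the newest version of each source.
latest = conn.execute(
    """SELECT i.thing1, i.relation, i.thing2
       FROM ontology_imports AS i
       JOIN (SELECT source, MAX(version) AS version
             FROM ontology_imports GROUP BY source) AS m
         ON m.source = i.source AND m.version = i.version
       ORDER BY i.id"""
).fetchall()


def table_name(label):
    """Turn a class label into a conservative SQL identifier (assumed convention)."""
    return re.sub(r"\W+", "_", label).strip("_").lower()


# One table per class named by an isA row, populated with its instances.
generated = {}
for thing, relation, cls in latest:
    if relation != "isA":
        continue
    name = table_name(cls)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{name}" (label TEXT PRIMARY KEY)')
    conn.execute(f'INSERT OR IGNORE INTO "{name}" (label) VALUES (?)', (thing,))
    generated.setdefault(name, []).append(thing)

print(generated)
```

Because version 6 of the first source supersedes version 5, only `mary isA RetiredElf` survives from it under the newest-version assumption; whether older versions should remain queryable is an open design question.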

    Alex Shkotin

    Why not an ontology about ontologies, like the one discussed here?
