
Modeling Semantics: How Data Models and Ontologies Connect to Build Your Semantic Foundations

Mike's Notes

I discovered this reference in the Data Engineering Weekly from Ananth Packkildurai to this great article by Juha Korpela.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library > Subscriptions > Data Engineering Weekly
  • Home > Handbook > 

Last Updated

3/02/2026

Modeling Semantics: How Data Models and Ontologies Connect to Build Your Semantic Foundations

By: Juha Korpela
Modern Data 101 (Medium): 22/01/2026

Independent Consultant | Data Modelling Enthusiast | Founder @ Helsinki Data Week.

This piece is a community contribution from Juha Korpela, Independent Consultant and Founder of Helsinki Data Week, a community-first data conference. With deep expertise in information architecture, data products, and modern operating models, Juha has spent his career helping organisations truly understand what their data means and how to use that semantic clarity to build better systems.

Formerly Chief Product Officer at Ellie Technologies and now the voice behind the “Common Sense Data” Substack, Juha is also a speaker, trainer, and advisor shaping the resurgence of conceptual modeling in the industry. We’re thrilled to feature his unique insights on Modern Data 101!

We actively collaborate with data experts to bring the best resources to a 15,000+ strong community of data leaders and practitioners. If you have something to share, reach out!

Share your ideas and work: community@moderndata101.com

  • Note: Opinions expressed in contributions are not our own and are only curated by us for broader access and discussion. All submissions are vetted for quality & relevance. We keep it information-first and do not support any promotions, paid or otherwise!

Let’s Dive In!

Knowledge Management Provides Context for AI

Knowledge Management and Information Architecture have had a rocket ride to the top of the data world’s consciousness due to Generative AI. The ability to organize, store, and serve structured semantics as context to various agents and chatbots is widely recognized as a winning ingredient in the GenAI race, reducing hallucinations and improving accuracy.

Terms like taxonomies, ontologies, and knowledge graphs are being thrown around as if they had just been invented, but veterans of the trade know better: there’s nothing new under the sun.

Knowledge Management and the Library Sciences, from which these subjects were born, are well-known disciplines, and the theory behind concepts like the Semantic Web is solid. It’s merely the utilization of these that has now changed with GenAI.

Data Modeling Foundations Return

But when it comes to organizing, storing, and serving semantics, there have always been two schools of thought, usually with very little cross-pollination between them. Alongside ontologies and knowledge graphs, the other viewpoint has come from the data modeling world.

Traditionally, data modeling has had different levels of abstraction to cover different needs at different levels of detail. Conceptual, Logical, and Physical modeling has been a well-recognized three-level layout for data modeling activities (you can check my views on these three levels on my Substack).

But sadly, at some point in the Big Data craze of yesteryear, many data experts reduced data modeling to the Physical level only, focusing almost exclusively on the technical structures of data storage.

Where Semantics Was Compromised

By forgoing Conceptual modeling to a large extent, data experts let go of a very practical method for doing exactly the same thing that is now required from taxonomies, ontologies, and knowledge graphs: describing structured semantics.

At the core of both ontologies and conceptual data models are things: real-life entities that exist in the real business, irrespective of the systems we have built. You might call these things “entities” or “objects” or “nodes” or whatever you like, but they are what you need to understand in order to describe (to a human or an agent) what goes on in your business.

Think of “Customer”, “Order”, “Product”, “Delivery”, and so on. These are what you have data about, no matter how the data is technically stored in database tables or files.

In addition to the list of things, to fully understand the business context, we need relationships between the things. How do the things in our business interact with each other? Think “Customer makes an Order”, “Product is added to Delivery”, and so on.

Ontology vs. Conceptual Model

An ontology is, in simple terms, a list of things (and their definitions) with a list of the relationships between them. In the Knowledge Management world, this would be formalized according to, say, RDF standards.

A conceptual model is, in simple terms, also a list of things and their relationships. Data modelers traditionally produce an Entity-Relationship Diagram out of it, with a list of entity definitions (a Glossary) attached.

Now here’s the important thing to understand, regardless of which world you are coming from:

the semantic content you capture with both approaches is exactly the same!

Merely the method of capturing, organizing, and storing that information is different.
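
To make the comparison concrete, here is a minimal, illustrative sketch (not from the original article; all table, column, and property names are hypothetical). The same “Customer makes an Order” fact is shown as a relational data-model fragment in SQL, with the equivalent ontology statements noted as comments.

-- Conceptual entities "Customer" and "Order", and the relationship "Customer makes an Order",
-- as a data modeler might eventually implement them.
CREATE TABLE customer (
  customer_id BIGINT PRIMARY KEY,
  name        VARCHAR(200) NOT NULL
);

CREATE TABLE customer_order (
  order_id    BIGINT PRIMARY KEY,
  customer_id BIGINT NOT NULL REFERENCES customer (customer_id)  -- "Customer makes an Order"
);

-- The same semantic content expressed as ontology (RDF-style) statements:
--   :Customer a owl:Class .
--   :Order    a owl:Class .
--   :makes    a owl:ObjectProperty ;
--             rdfs:domain :Customer ;
--             rdfs:range  :Order .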

For me personally, the method of conceptual modeling feels natural, as I’ve done data modeling for around 15 years now. I know what questions I need to ask people (or what documents to read) to capture information about the entities and their relationships, I know how to draw the diagram, I know how to create the glossary, and I know what tools I can use to help.

For someone coming from a semantic web background, building formalized ontologies according to the RDF standard feels natural, with all the methods and tools that come with it.

We’re both still working on semantics: in effect, we’re capturing the exact same ontology, thus storing information about business context to be used later.

Technical Implementation of Semantics

For us data modelers, the utilization of these models has traditionally focused on the technical implementation of data solutions, and we’ve thus followed a path from Conceptual to Logical to Physical. That is, if we have done conceptual modeling at all!

But especially now, in the age of context-hungry AI, we have to realize we’ve been sitting on a semantics gold mine: conceptual data modeling is an excellent method for figuring out what the entities and relationships should be.

The diagram titled “How Data Models & Ontologies Connect to Build Semantic Foundations” shows “Understanding the business” leading into “Conceptual Modeling,” which branches into two paths: “Solution Design” and “Semantic Discovery”. The ontology also integrates inputs from Standard Ontologies and AI Agents.

How Data Models and Ontologies Connect to Build Semantic Foundations | Source: Author

Why is this important? Because the most valuable semantic information is that which is unique to the organization, and those semantics are the hardest to capture.

While AI tools can be used to find semantic concepts from unstructured data and various knowledge bases, a lot of this information is tacit knowledge in the business experts’ heads. Conceptual modeling is a known-good method for getting that tacit knowledge out.

Data Modeling as Semantic Discovery

I envision a world where we build the semantic foundation of an organization with a set of tools at our disposal:

A pyramid diagram titled “Building Organisational Semantics” illustrating the layers of a Semantic Foundation. The foundation acts as a context provider for both agents and humans. It features three tiers: Industry Standards at the base for common structures, Conceptual Modeling in the center to unearth unique organisational knowledge, and AI Agents at the apex to find details that enhance core semantics.

Technical Implementation of the Semantic Foundation with Industry Standards, Conceptual Modeling, and AI Agents | Insights from the Author

  • We use industry standards and existing knowledge bases to cover the basic structures that are common to most organizations within an industry
  • We use conceptual modeling methods as a surgical knife to cut through tacit knowledge and unearth & document the valuable, unique semantics of the organization
  • We use AI agents as “semantic helpers” to trawl through tons of documentation and find details to add around the strong core that has been formed

This semantic foundation will then act as the context provider for all your agents and chatbots, but also for humans! Context is king in today’s world. By looking at data modeling as not only a technical design method, but as a semantic discovery method, we enable a powerful tool for building this context.

Documenting your data: WordPress case study, pt. 1

Mike's Notes

Another very useful article by Alexey Makhotkin.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library > Subscriptions > Minimal Modelling
  • Home > Handbook > 

Last Updated

04/10/2025

Documenting your data: WordPress case study, pt. 1

By: Alexey Makhotkin
Minimal Modelling: 01/10/2025

I started working with databases around 1996. The idea of this Substack has been brewing for a few years already.

Super-short CV: software developer (+database administrator) — team lead — project manager — head of software development (150+ people) — burned out — dinosaur.

240 tables and no documentation: making sense of your database.

A very common question I see on database-related forums goes something like:

“At my new place of work, there is a database with hundreds of tables, barely any documentation, and I need to understand it to do my job: running SQL queries.

[additional complications are usually described]

Any advice on how I should approach this problem?”

You could answer that question on different levels, but I’d like to discuss an approach that is focused on the immediate situation that this person is in. How to organize the company’s data management processes is a bit above our pay grade here.

A problem

Suppose that you’ve recently joined a new company as a data engineer, business analyst, or some such. Basically, your job is to create reports of all sorts, building queries, pipelines, etc. A very common situation is that there are a lot of tables (say, a few hundred), and a very limited amount of documentation. Sometimes you have access to people who’ve worked at the company for quite some time, but they are not readily available for advice. Usually there are also several different databases: say, an OLTP database in Postgres, MySQL or Oracle, and a copy of that in some sort of data warehouse, sometimes in many different versions.

How do you start learning what is what in the database? What sort of data is there, how is it stored, how reliable is the data, how clean, etc., etc?

A knee-jerk approach is to document the tables and their columns. This is what’s often considered the data catalog. Unfortunately, if you try this you’ll find that this approach does not work. In a follow-up article we’ll discuss why, but let’s focus on an approach that may have a better chance of working for you.

Case study: WordPress

Let’s use a real-world database as an example: the WordPress database schema. The official description can be found at https://codex.wordpress.org/Database_Description. This page has everything that is traditionally used to document databases:

  • a physical ERD diagram;
  • an overview of tables;
  • a detailed table structure (in a tabular format);

We could also consult a more compact database schema expressed as a sequence of SQL CREATE TABLE statements: https://gist.github.com/squadette/3bafa201a04f1372d69c182f206f8975.

We’re going to use a different approach based on Minimal Modeling (https://minimalmodeling.com/).

We’ll be documenting the database using a four-part catalog in a tabular format:

  • list of anchors;
  • list of attributes;
  • list of links;
  • list of secondary data.

We’ll work incrementally. In the first part, we’ll show how to document just a few data elements of each kind (anchors, attributes, and links), just enough to illustrate the approach. In the follow-up posts, we’ll build the complete database documentation.

It’s not necessary to build the full design upfront. This helps you deal with large databases: you need to document only the parts that you are directly interested in. The entire data catalog is structured in such a way that you can easily document additional data elements.

Anchors first

We start with anchors (also known as entities). Anchors are nouns, but not every noun is an anchor. To find anchors, we need to look for things that could be added and counted.

Let’s look at the list of tables:

mysql> show tables;
+-----------------------+
| Tables_in_wordpress   |
+-----------------------+
| wp_commentmeta        |
| wp_comments           |
| wp_links              |
| wp_options            |
| wp_postmeta           |
| wp_posts              |
| wp_term_relationships |
| wp_term_taxonomy      |
| wp_termmeta           |
| wp_terms              |
| wp_usermeta           |
| wp_users              |
+-----------------------+
12 rows in set (0.00 sec)

The most common anchor is probably User (we found it in the wp_users table). It’s easy to confirm that users could indeed be added and counted:

  • We have 100 users in our database.
  • One more user has just registered.

Such sentences sound trivial in simple cases, but would become useful in more complicated cases. Hopefully, later we’ll find an example of such in WordPress.

WordPress is a content management system, and the most common types of content are Post and Comment. Both posts and comments could be added and counted. Let’s add those three into the first part of our Minimal Modeling catalog:

The first column is an anchor name; you choose anchor names according to the business vocabulary. They do not necessarily match table names (table names are often unclear or misleading).

The second column documents the ID type; in this case it’s “bigint”, an SQL data type. If you have more interesting IDs, you could also provide examples of IDs so that you can better recognize them in the data. In most cases, of course, the IDs are pretty opaque: just some integers or UUIDs.

The third column contains an SQL query fragment that returns all the IDs of the corresponding anchor, and nothing else. So if we have ten users, the query would return ten different IDs of those users.

Here we begin to see some interesting details, for example the fact that Comment uses a different naming convention for the ID column than the other two.
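
As an illustrative sketch (assuming a default WordPress schema; the article presents this part of the catalog as a table), the three anchor entries and their SQL query fragments might look like this:

-- Anchor: User (ID type: bigint)
SELECT ID FROM wp_users;

-- Anchor: Post (ID type: bigint). Note that wp_posts also stores pages, revisions, and
-- attachments via the post_type column, so a stricter fragment could add WHERE post_type = 'post'.
SELECT ID FROM wp_posts;

-- Anchor: Comment (ID type: bigint; a different naming convention for the ID column)
SELECT comment_ID FROM wp_comments;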

Three anchors are enough for a start; now let’s look at some attributes.

Attributes

Let’s document a couple of attributes for each anchor. Attributes contain the actual data: strings, numbers, yes/no values, and so on. Note that attributes cannot contain anchor IDs (this is handled by links, see below).

Let’s look at the definitions of wp_users, wp_posts and wp_comments, and find some simple attributes. If we look at the real data in a test WordPress installation, it’s easy to see which data goes where.

The first column is the attribute name. It combines the name of the anchor and some short readable name of the attribute. You can use this string to refer to the specific attribute in other documentation, or just during the discussion.

Note that the attribute name is only remotely related to the column name where the attribute is stored.

The second column of our table contains the most important piece of documentation: a question. We use questions for every attribute. In casual speech people would often just say something like “Name of the User” or “Item price”, but we take it one step further and provide a longer, less ambiguous description. Questions help you to document the semantics of less trivial attributes. Additionally, they help LLMs to understand what exactly is stored here.

The third column is an example value. Practice shows that even a single representative sample of data immediately helps with understanding a piece of data. That’s how you can see, for example, that the login name of the User is clearly machine-readable, or that Comment/posted_at has the granularity of one second.

Column #4 is the physical data type. Here we just use normal SQL data types, as defined in your schema.

Finally, the SQL query. It needs to return a dataset with exactly two columns: the anchor ID and the attribute value. The queries presented here are simple, but you can also extend them to show how to clean the data. We’ll discuss data quality later.
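
As an illustrative sketch (again assuming a default WordPress schema), two attribute entries and their queries might look like this; the questions and example values are hypothetical but follow the pattern described above:

-- Attribute: User / login
-- Question: "What is the login name of this User?"  Example: 'admin'  Type: varchar(60)
SELECT ID, user_login FROM wp_users;

-- Attribute: Comment / posted_at
-- Question: "When was this Comment posted?"  Example: '2025-10-01 12:34:56'  Type: datetime
SELECT comment_ID, comment_date FROM wp_comments;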

Links

Links roughly correspond to relationships. Links connect two anchors using a verb. Let’s write down all the links that we have between our three anchors so far:

  • User publishes a Post;
  • User posts a Comment;
  • Post has Comments;

How did we find those links? Because we, as users, understand how WordPress works. To make sense of the database, you should have some understanding of the business. As you explore the database schema and present it as the Minimal Modeling catalog, you’ll get more detailed understanding.

Each link has an associated SQL query. This query must return exactly two columns: an ID of the first anchor and an ID of the second anchor.

Let’s discuss each of the columns in that table. The first column contains names of both anchors and the main verb that connects them. Both anchors must be present in the list of anchors that we have.

In our example the anchors are different, but they could also be the same: for example, “An Employee is a manager of another Employee”.

The second column shows link cardinality. In Minimal Modeling, we use only three options:

  • 1:N;
  • M:N;
  • 1:1.

In column #3 we describe the link in a more verbose, slightly formalized way. We write down TWO sentences, one in each direction. We use the words “only one” and “several”: this helps us to confirm the cardinality of the link (we avoid using the word “many”).

For example, “several Posts” means that the Post anchor is on the N-side of cardinality, “only one User” means that User is on the 1-side.

Here all three links are 1:N, but we hope to try and find some M:N and 1:1 links later in our investigation.

Finally, we have a column that contains an SQL query for each link. Note that this query must return clean data: two valid IDs, no NULLs or anything like that.

Taking the first link, “User / publishes / Post”, as an example, here is how its table is defined:

CREATE TABLE `wp_posts` (
  `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `post_author` bigint(20) unsigned NOT NULL DEFAULT '0',
  . . . .
);

See that the “post_author” column is defined as NOT NULL, but then it immediately allows using a value of 0 to, apparently, mean that the post has no author. We must make sure that only valid IDs are returned, so we filter out NULLs and other sentinel values (e.g., “WHERE post_author <> 0”).
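
As an illustrative sketch (assuming a default WordPress schema), the link queries for the three links listed above might look like this; each returns exactly two columns of valid IDs, with sentinel values filtered out:

-- Link: User / publishes / Post (1:N)
SELECT post_author AS user_id, ID AS post_id
FROM wp_posts
WHERE post_author <> 0;

-- Link: User / posts / Comment (1:N); anonymous comments store user_id = 0, so filter them out
SELECT user_id, comment_ID AS comment_id
FROM wp_comments
WHERE user_id <> 0;

-- Link: Post / has / Comment (1:N)
SELECT comment_post_ID AS post_id, comment_ID AS comment_id
FROM wp_comments
WHERE comment_post_ID <> 0;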

That’s enough data elements for the first part. We’ll continue assembling the data catalog in the following article.

Tooling

To use this approach, you need to maintain four tables. Here we show anchors, attributes, and links, but there is also secondary data; more on that later.

There is no “official” tooling at the moment, but you can use any spreadsheet-like tool that is convenient. Most probably you want to use a collaborative tool, but in some cases even having private notes about the database is what you need.

You can use Google Docs, like I do as I write this document. You can use Notion, Roam, or Obsidian. You can use any Wiki that has good support for tabular data, or even Markdown.

Early adopters of Minimal Modeling use Grist, and I guess that Airtable or something similar would also work great. You can use Google Sheets too, or Excel.

Process

The idea is that you never try to do big modeling upfront. Instead you start just with a handful of data elements in a shared document, and add new entries as needed.

For example, as you work on some query, you learn about some new tables and columns that are not yet documented. So you document them: do they contain a new anchor? A new attribute? A new link?

Note that sometimes a single database table column can store many different attributes. The most common example here is EAV (Entity-Attribute-Value approach).
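
For example, WordPress itself uses the EAV pattern in its meta tables: wp_postmeta stores many logical attributes in a single (meta_key, meta_value) column pair. As a hypothetical illustration, each distinct meta_key would be documented as its own attribute in the catalog:

-- Attribute: Post / thumbnail_id ("Which attachment is the featured image of this Post?")
SELECT post_id, meta_value
FROM wp_postmeta
WHERE meta_key = '_thumbnail_id';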

Note that the catalog tables could be extended with the extra information that you’re interested in. For example, in many companies it makes sense to keep track of which attributes contain personally identifiable information (PII), or other regulated data, such as financial information. You can just add another column to the “attributes” table and enter the required information.

It takes three pieces of data to describe an anchor, five for an attribute, and four for a link, so adding a new entry should take less than five minutes.

What’s next

That was a short introduction to documenting your database using the Minimal Modeling approach.

In the following posts we’ll continue exploring the WordPress database schema. The end goal is to have a complete description, covering every single table and column.

Also, we’ll discuss the Minimal Modeling approach in more detail, as related to understanding your existing database. Particularly, we’ll see how to handle multi-database cases, both for primary data and for secondary (analytical) data, like data warehouses.

On a Sandy Beach, database version 2 is underway

Mike's Notes

In May, after manually reformatting every page and post of "On a Sandy Beach," I wrote:

"A blogging module needs to be built and added to Pipi 9 CMS. This could then be used to create blog posts using an underlying database, which could be modified to be more useful."

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

28/09/2025

On a Sandy Beach, database version 2 is underway

By: Mike Peters
On a Sandy Beach: 28/09/2025

Mike is the inventor and architect of Pipi and the founder of Ajabbi.

Version 2 of the data model to support blogging is now being built. It is designed to support On a Sandy Beach. Yesterday, this Blogger post index was scraped and imported to initially populate the database.

In future, the blogging module will also support other blogs/newsletters, including the Ajabbi Research Monthly Newsletter, which begins next month in October on Substack.

The new database and Blogger will be kept in sync while other jobs are completed, including:

  • The tags need consolidating
  • The same tags will form a topic map and be used across Ajabbi
  • etc

Data Model version 1 (current)

  • Mike's Note
  • Resources
  • References
  • Repository links
  • Date Updated
  • Title
  • Page Url
  • Author
  • Source publication
  • Date Created
  • Author description
  • Body of the article
  • Tags
  • Comments

Data Model version 2 (now being built)

  • Title
  • Page Url
  • Site-wide Navigation
  • Site-wide Breadcrumb
  • Mike's Note
  • Author
  • Source publication
  • Date Created
  • Author description
  • Body of the article
  • References
  • Further Reading (replacing References)
  • Articles
  • See Also (cross-links to Ajabbi.com website pages, replacing Repository URL)
  • External Links (replacing Resources)
  • Keywords (replacing Tags)
  • Sharing
  • Updated
  • Forum (replacing Comments)

Translation website

Mike's Notes

Notes on the new translation website. Feedback is very welcome.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

19/10/2025

Translation website

By: Mike Peters
On a Sandy Beach: 23/09/2025

Mike is the inventor and architect of Pipi and the founder of Ajabbi.

The new translation website went live yesterday. It will become a catalogue of all strings used by Pipi when rendering web pages.

In Pipi 9, the UI strings are not with the code. Strings are added at render time by the Render Engine (rnd) from a database. Each copy of Pipi is configured to work with one i18n language.

Subdomain

Languages

The translation website has a separate section for each language. The section URL can include the following, if required:

<3-letter language code>-<4-letter script code>-<2-letter country code>/

Examples
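
(Hypothetical illustrations of the pattern above, not taken from the site: eng-Latn-US/ for English written in Latin script for the United States, or jpn-Jpan-JP/ for Japanese.)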

Each section will be written in that language and script. The language home page provides information about:
  • About
  • Terms
  • Downloads
  • Locales
  • Sources

Terms

Each language has a list of terms in that language.

Each Term links to a web page for that term.

The term home page will provide information about:

  • Term
  • Meaning
  • Part of speech (Verb/Noun/Pronoun/etc)
  • Grammatical number (Singular/etc)
  • Where it is used by Pipi
  • Source of information
  • The same term in other languages and scripts; these will be linked back to the native-language term page

Downloads

The lists of terms in different languages will be freely downloadable in multiple formats.

Work in progress

The website layout requires significant improvement and is subject to change based on the discovery of what works and is useful.

Translation Workspace

A separate workspace for logged-in users to edit translations will be made available in the future. This will use automated workflows to propose changes to the translation database.

Database Design for Google Calendar: a tutorial

Mike's Notes

The first customer needs a Gregorian calendar module that can import and export with Google Calendar, and the rest (Outlook, Yahoo, etc). Here are some notes and references to help me build this quickly. The data model is mainly complete.

There is also an existing Pipi Engine that deals with space and time.

Below is a table of contents from the Google Calendar online article by Alexey Makhotkin, taken from his book.

Resources

References

  • Database Design Book. By Alexey Makhotkin.

Repository

  • Home > Ajabbi Research > Library > Subscriptions > Minimal Modelling
  • Home > Handbook > 

Last Updated

11/07/2025

Database Design for Google Calendar: a tutorial

By: Alexey Makhotkin
Database Design Book: 20/05/2024

Table of contents

  • Introduction
  • Intended audience
  • Approach of this book
  • Problem description
  • Part 1: Basic all-day events
    • Anchors
    • Attributes of User
    • Attributes of DayEvent
    • Links
    • A peek into the physical model
  • Part 2: Time-based events
    • Time zones
    • Anchors
    • Attributes of Timezone
    • Attributes of TimeEvent
    • Links
    • Similarities between DateEvent and TimeEvent
  • Part 3. Repeated all-day events
    • Attribute #1, cadence
    • Attribute #2, tangled attributes
    • Attribute #3
    • Days of the week: micro-anchors
    • Are we done?
    • Repeat limit: more tangled attributes
  • Part 4. Rendering the calendar page
    • A note on tempo
    • General idea
    • Day slots
    • Exercise: TimeSlots
    • How far ahead do you need to think?
  • Part 5. Rendering the calendar page: time-based events.
  • Part 6. Complete logical model so far
  • Part 7. Creating SQL tables
    • Anchors: choose names for tables
    • Attributes: choose the column name and physical type
    • 1:N Links
    • M:N links
    • Finally: the tables
  • Conclusion
    • What’s next?

Data Modeling: Definition, Types, and Challenges

Mike's Notes

More on data contracts.

Resources

References


Repository

  • Home > 

Last Updated

01/04/2025

Data Modeling: Definition, Types, and Challenges

By: Mark Freeman
Gable.ai Blog: 7/12/2024

Class is in session as we break down the fundamentals of data modeling, its different forms, and why it's often a source of contention in the data space.

Serving as a (very real, fully accredited, we swear) 101-level collegiate course, this blog article aims to lay a solid, real-world-based foundation regarding the concept and practice of data modeling. 

As such, the article will include a summary of data modeling’s historical prevalence in data engineering, its more recent dissolution, a definition of the concept, and different methods of use. 

We’ll conclude by exploring why any attempt to discuss the benefits of one type over another consistently equates to booting a hornet’s nest.

This foundation will serve as a gateway for newer data engineers, function as a juicy target of ridicule for the more seasoned, and will act to foster an appreciation for the role data contracts will play in data modeling’s future.

Course schedule:

  • Data modeling: An overview
  • Data modeling defined
  • Common types of data models
  • Causes of controversy in the data modeling space
  • Restoring the model of balance
  • Suggested next steps

1. Data modeling: An overview

At one point in the not-too-distant past, data modeling was the cornerstone of any data management strategy. Due to the technical and business practices that were predominant at the end of the 20th century, data modeling at its zenith placed a strong emphasis on structured, well-defined models.

However, in the late 2000s, the emergence of major cloud service providers like Google Cloud Platform, Microsoft Azure, and Amazon Web Services (AWS) enabled cloud computing to gain traction within business organizations. 

By the end of that same decade, the benefits of scalable, on-demand computing resources led to a surge in adoption within business organizations, which in turn led to the proliferation of what is now commonly referred to as the modern data stack—a group of cloud-based tools and technologies used for the collection, storage, processing, and analysis of data.

Compared to the at-the-push-of-a-button benefits available on demand, data modeling was then seen by a growing number of practitioners as rigid and inflexible. Data modeling takes time. It can get complicated. The costs and overheads associated with the process reflected this. Perhaps most damaging at the time, it became easy to frame data modeling as a bottleneck—dead weight hampering the speed and flexibility of modern data management. 

However, this overemphasis on speed and flexibility and the underutilization of data modeling wasn’t sustainable. Though there is no specific “breaking point” to point to, by the mid-2010s these issues were increasingly attributed to the diminished role of data modeling.

While far from exhaustive, the following increasingly common factors helped to precipitate this recalibration in the data space:

  • Data governance challenges: The abundance of cloud-based data storage and processing fueled an explosive increase in the data sources and repositories the average organization had access to. This sudden abundance, in turn, made maintaining data quality, security, and compliance far more demanding, irreparably complicating the governance process.
  • Data quality issues: The fevered rate at which cloud-based solutions were adopted led to the neglect of data modeling and proper data architecture, resulting in inconsistencies, data quality issues, and difficulties in data integration.
  • Lack of standardization: While cloud environments freed teams to use various tools and platforms, the consistency of data management practices degraded, making it harder to ensure consistency and interoperability across an organization.
  • Scalability and performance issues: Without proper data modeling, it became difficult to optimize systems for performance and scalability. Bottlenecks and reduced system efficiency resulted as data volumes grew.
  • Security and compliance risks: Rapid cloud adoption without adequate attention to data modeling and architecture can expose organizations to security vulnerabilities and compliance risks, especially when dealing with sensitive or regulated data.
  • Difficulties in extracting value from data: Without a well-thought-out data model, organizations struggle to extract meaningful insights from data. Inevitably, these organizations found that simply having data in the cloud did not guarantee it was inherently usable or valuable for decision-making.

2. Data modeling defined

Data modeling is the practice or process of creating a conceptual representation of data objects and the relation between them. Data modeling is comparable to architecture, in that the process blueprints how data is stored, managed, and used within and between systems and databases.

In essence, there are three key components of data modeling:

  1. Entities: Entities represent the real-world objects or concepts an organization wants to understand better. Examples of data modeling entities include products, employees, and customers.
  2. Attributes: These are the characteristics or properties of the entities being modeled. Attributes provide details that are used to describe and distinguish instances of an entity—product names, prices, customer names, phone numbers, etc.
  3. Relationships: The connections between entities in a data model are called relationships. They can be one-to-one, many-to-many, or one-to-many. Each entity is represented in a relational database in the typical data modeling process. While each entity has a unique identity, it can have multiple instances. 

Traditionally, the role of data modeling primarily focused on designing databases for transactional systems and normalizing data to reduce redundancy, improving database performance. The process itself mainly involved working with structured data in relational databases.

Modern data modeling is highly varied by comparison. And while its practice and process have evolved beyond some of its inherent qualities viewed negatively in the past, others are now increasingly accepted as trade-offs to be balanced against.

Data modeling today caters to a wide range of data storage and processing systems, ranging from traditional relational database management systems (RDBMS) to data lakes and NoSQL databases. Data models now facilitate data integration. They can support advanced analytics, data science initiatives, and predictive modeling. Modern models emphasize agility and scalability to quickly adapt to shifting business requirements.

As such, data modeling now also supports efforts in the data space to democratize data, helping to make data more understandable and accessible to a wide range of users.

3. Common types of data models

There are four main types of data models: conceptual, logical, physical, and dimensional. This is true when the goal is to simplify the categorization of data models.

Depending on the business needs of an organization, however, more than these initial four may be considered and utilized. We note the former simply because of the confusion this can sometimes cause within the data space.

Conceptual data models

The purpose of conceptual data models is to establish a macro, business-focused view of an organization’s data structure. Conceptual models are often leveraged in the planning phase of database design or a database management system.

In these cases, a data architect or modeler may work with business stakeholders and analysts to identify relevant entities, attributes, and relationships using unified modeling language (UML) and entity-relationship diagrams (ERDs).

Logical data modeling

Logical data models work to provide a detailed view of organizational data that is independent of specific technologies and physical considerations. By doing so, logical models are free to focus on capturing business requirements and rules without being biased by technical constraints. As a result, they can provide a clearer understanding of data from a business perspective.

Because less technical stakeholders can more easily understand logical data models, the models are also a particularly useful tool for communication between business and technical teams.

Physical data modeling

Alternately, physical data modeling aims to capture and represent the detailed structure and design of a database, taking into account the specific features and constraints of a chosen database management system (DBMS), as well as business requirements for performance, access methods, and storage.

For this reason, the entities database administrators and developers will focus on include physical aspects of a database—indexes, keys, partitioning, stored procedures and triggers, etc.

Dimensional data modeling

For business intelligence and data warehousing applications, dimensional data modeling is often used. This is because a dimensional model employs an efficient, user-friendly, flexible structure that organizes data into fact tables and dimensions to support fast querying and reporting.

Due to this, dimensional data models are well suited to the complex query, analysis, and reporting needs of such applications.
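
As a minimal, illustrative sketch (not from the article; table and column names are hypothetical), a dimensional model separates measurable facts from descriptive dimensions:

-- Star-schema sketch: one fact table keyed to two dimension tables.
CREATE TABLE dim_date (
  date_key   INT PRIMARY KEY,   -- e.g. 20250115
  full_date  DATE NOT NULL,
  month_name VARCHAR(12) NOT NULL
);

CREATE TABLE dim_product (
  product_key INT PRIMARY KEY,
  name        VARCHAR(200) NOT NULL,
  category    VARCHAR(100) NOT NULL
);

CREATE TABLE fact_sales (
  date_key    INT NOT NULL REFERENCES dim_date (date_key),
  product_key INT NOT NULL REFERENCES dim_product (product_key),
  quantity    INT NOT NULL,
  revenue     DECIMAL(12,2) NOT NULL
);

-- Typical reporting query: revenue by month and product category.
SELECT d.month_name, p.category, SUM(f.revenue) AS total_revenue
FROM fact_sales f
JOIN dim_date d    ON d.date_key = f.date_key
JOIN dim_product p ON p.product_key = f.product_key
GROUP BY d.month_name, p.category;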

Object-oriented data modeling

Based on the principles of object-oriented programming, object-oriented data modeling represents data as objects instead of entities. The objects in this type of data modeling encapsulate both data and behavior. This object-oriented approach is key, making object-oriented models highly useful in scenarios where data structures must reflect real-world objects and their relationships.

Common examples of these scenarios include ecommerce and inventory management systems, banking and financial systems, customer relationship management (CRM) systems, and educational software.

Data vault modeling

As the word “vault” implies, data vault modeling is used in data warehousing, but also in business intelligence. Both data warehousing and BI projects benefit from the historical data preservation, scalability, flexibility, and integration capabilities that data vault models provide.

In theory, this makes data vault modeling a potential tool for any organization that needs to integrate data from multiple sources while maintaining data history and lineage (e.g., healthcare organizations, government agencies, and manufacturing companies).
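
As a minimal, illustrative sketch (not from the article; names are hypothetical), a data vault model splits "customer places order" information into hubs (business keys), links (relationships), and satellites (descriptive history):

-- Hubs hold business keys only.
CREATE TABLE hub_customer (
  customer_hk   CHAR(32) PRIMARY KEY,   -- hash of the business key
  customer_bk   VARCHAR(50) NOT NULL,   -- business key
  load_date     TIMESTAMP NOT NULL,
  record_source VARCHAR(50) NOT NULL
);

CREATE TABLE hub_order (
  order_hk      CHAR(32) PRIMARY KEY,
  order_bk      VARCHAR(50) NOT NULL,
  load_date     TIMESTAMP NOT NULL,
  record_source VARCHAR(50) NOT NULL
);

-- Links record relationships between hubs.
CREATE TABLE link_customer_order (
  customer_order_hk CHAR(32) PRIMARY KEY,
  customer_hk       CHAR(32) NOT NULL REFERENCES hub_customer (customer_hk),
  order_hk          CHAR(32) NOT NULL REFERENCES hub_order (order_hk),
  load_date         TIMESTAMP NOT NULL,
  record_source     VARCHAR(50) NOT NULL
);

-- Satellites hold descriptive attributes, with history preserved by load date.
CREATE TABLE sat_customer_details (
  customer_hk   CHAR(32) NOT NULL REFERENCES hub_customer (customer_hk),
  load_date     TIMESTAMP NOT NULL,
  name          VARCHAR(200),
  email         VARCHAR(200),
  record_source VARCHAR(50) NOT NULL,
  PRIMARY KEY (customer_hk, load_date)
);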

Normalized data modeling

This type of data modeling focuses on two things—reducing data redundancy and improving data integrity. This can be crucial for transactional systems where data integrity and consistency are of prime importance. Normalized models are easier to maintain and update, while they also prevent data anomalies like inconsistencies and duplication.

De-normalized data modeling

Alternately, de-normalized data models involve the intentional introduction of redundancies into a dataset in order to improve performance. Through de-normalized modeling, related data can be stored in the same table or document. This reduces the need for computationally expensive join operations, which can slow down query performance.

Because of how they function, de-normalized data models also harmonize with the principles of NoSQL databases, which prioritize flexibility, scalability, and performance.
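
As a minimal, illustrative sketch (not from the article; names are hypothetical), here is the same "order with customer" data in normalized and de-normalized form:

-- Normalized: customer details live in one place; orders reference them by key.
CREATE TABLE customers (
  customer_id BIGINT PRIMARY KEY,
  name        VARCHAR(200) NOT NULL,
  email       VARCHAR(200) NOT NULL
);

CREATE TABLE orders (
  order_id    BIGINT PRIMARY KEY,
  customer_id BIGINT NOT NULL REFERENCES customers (customer_id),
  order_date  DATE NOT NULL
);

-- De-normalized: customer details are copied onto each order row, trading redundancy
-- (and the risk of inconsistency) for join-free reads.
CREATE TABLE orders_denormalized (
  order_id       BIGINT PRIMARY KEY,
  order_date     DATE NOT NULL,
  customer_id    BIGINT NOT NULL,
  customer_name  VARCHAR(200) NOT NULL,
  customer_email VARCHAR(200) NOT NULL
);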

4. Causes of controversy in the data modeling space

Data scholars agree that discussions around data modeling function similarly to a hornet’s nest in nature—both tend to cause massive amounts of pain when stumbled into. While unfortunate for the stumbler, it helps to understand that, in both cases, the damage results from the attempt to defend what one holds dear.

For hornets, driven to protect the nest’s existing and developing queens, the aggression results from a combination of their innate programming, alarm pheromones, and the instinct to attack in numbers in order to intimidate and dissuade larger foes.

For data practitioners, however, aggressively defending one’s beliefs about the process and practice of data modeling is usually motivated by one or more of the following factors:

  • Diverse perspectives: Data modeling is a field that intersects with numerous disciplines, including data science, software engineering, database design, data analytics, and business intelligence. While sharing varying degrees of overlap, these disparate professional backgrounds act as frames through which the views of “effective data modeling” become wildly divergent in the data space.
  • Complexity and trade-offs: Additionally, data modeling tends to involve near-endless tradeoffs between competing priorities. These tradeoffs include speed vs. governance, normalization vs. performance, and structure vs. flexibility—each with passionate advocates on both sides of the aisle.
  • Organizational context: The “right” data model in one organization may not be the same in another, even when operating within the same industry. Differing business rules and goals, data requirements, schema, information systems, and data maturity all but guarantee that there will never be one true data modeling technique or process.
  • Subjectivity in design: Data modeling itself can be quite subjective. Like many design disciplines, there are often multiple ways to model a given dataset. And data modelers themselves often have legitimate reasons for championing one approach over another. This subjectivity is part of why many find the challenges of data modeling so fulfilling.
  • Evolving technologies: Despite the order and logic practitioners attempt to bring to the table, the exponentially rapid evolution of data technologies—from traditional relational databases to NoSQL, low and no-code platforms, and big data—necessitates approaches to data modeling to continuously diversify.
  • Fluctuating best practices: Due to the ever-evolving modeling landscape, its related best practices invariably need to change. Techniques once considered sacrosanct can find themselves outdated, furthering debates about what the current best approach may be at any given time.
  • Emotional Investment: Data practitioners tend to be curious, persistent, analytical thinkers who benefit from a high attention to detail. As such, those who practice data modeling (or cross paths with it) tend to invest a great deal of intellectual and emotional capital in their work. Occasionally, this can create an environment where critiques or suggestions for alternate approaches can either be delivered as a personal attack, or taken as such. 

5. Restoring the model of balance

The good news is that the tension between the impact of data modeling and the convenience of the modern data stack can be navigated. Organizations hoping to strike the balance should consider employing the following:

  1. Adopt a hybrid approach: Consider using structured data modeling for core business entities that require consistency and stability above all. In areas that call for more agility and flexibility, employ modern data technologies that enable rapid iteration.
  2. Harmonize flexibility with standardization: Building on a hybrid approach, look to standardize core data elements and processes. At the same time, allow for flexibility in areas where rapid change can be expected. Embrace constant balancing and rebalancing of the strengths of structured data modeling and the modern data stack. 
  3. Use iterative data modeling: Instead of insisting on extensive upfront data modeling, try an iterative approach. Start with a basic model, then evolve it as needed. Iteration can produce the best of both worlds, maintaining a structured approach while responding to requirements as they change over time.
  4. Leverage data virtualization: Data virtualization provides a helpful layer of abstraction that allows for integrating diverse data sources without extensive modeling. In some organizations, this approach maintains agility while ensuring data is effectively understood and used.
  5. Focus on metadata management: Bridging the gap between structured modeling and agility usually involves a (sometimes renewed) focus on effective metadata management. Robust metadata curation further enables organizational flexibility while maintaining clarity regarding data structures and relationships.
  6. Emphasize data governance: When individuals are empowered to enact consistent data governance, clear policies and standards guiding data quality, usage, and security help ensure a data environment remains as agile as possible.
  7. Enable self-service data access: When implemented with appropriate controls, self-service data access supports agility by allowing users to access data as needed while still operating within the framework of the established data model.
  8. Continuous collaboration: Make sure to foster continuous collaboration between your data architects, engineers, and business users. While the passionate data modeling discussions will still take place from time to time, making cross-disciplinary collaboration an important part of the culture helps keep modeling efforts and business needs aligned. 
  9. Implement data contracts: Finally, employ data contracts to provide structured agreements on data formats and interfaces. Their ability to foster communication between data producers and consumers promotes balance just as the other tactics here do—but also allows that balance to scale.

6. Suggested next steps

As is now abundantly clear, treating data as a product is paramount for any organization looking to succeed in an overwhelmingly data-dependent world. Data contracts are the best way to guarantee the quality of data before it even enters an organization. 

For this reason, we’re offering a transformative approach to retaining, developing, and operationalizing data contracts. Make sure to join our product waitlist to be among the first to experience the benefits of Gable.ai.

Data Model and Ontology

Mike's Notes

An excellent discussion titled Design Pattern Ontology recently occurred on the Ontolog Forum. Igor Toujilov started the thread on November 26, 2024.

It covered a lot of ground, and a point made by John Sowa is copied below.

Eventually, it led to a discussion about the relationship between data models and ontologies. I have copied below the text of many of the points raised as a valuable reference for future work on Pipi.

The contributors are;

  • John Sowa
  • William "Bill" Burkett
  • Mike Peters
  • Michael DeBellis
  • David Eddy
  • Paul Tyson
  • Igor Toujilov
  • Alex Shkotin
  • Kingsley Idehen
  • Elisa Kendell
  • Mike Bennett

I will keep adding to these notes as more useful contributions are made.

Any errors or omissions are mine.

Resources

People mentioned

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

17/05/2025

Data Model and Ontology

By: Mike Peters
On a Sandy Beach: 01/12/2024

Mike is the inventor and architect of Pipi and the founder of Ajabbi.

Design Pattern Ontology

Note taken for the Ontolog Forum

John Sowa

Since Matthew isn't with us, I'll summarize some of his points which we had discussed over many years in different ways.

He had been working at Shell Oil for years, and he had developed a detailed ontology for the oil industry.  He later generalized it to develop a more general top level, which I considered quite good.   We had discussed issues about generalizing it even farther,  but he was reluctant to go farther in levels of abstraction.  We agreed that was a reasonable point of view.

But we also agreed that his details for the oil industry might conflict with details for other industries, such as banking and farming.   Furthermore, he recognized that different oil companies had different ways of representing the same terms because they had developed different policies and procedures.

I'll also mention another widely used ontology, which evolved over a period of about 70 years, and it is unlikely to change for a long, long time:  the ontology for making reservations for airlines, which was later extended to cover anything related to airlines, such as hotels, car rentals, trains, taxis, etc., etc.

And that ontology began in the 1950s with IBM's project SAGE for the airplanes used in the Strategic Air Command  over North America.  In the 1960s, IBM adapted that ontology for American Airlines.  IBM later sold the software to other airlines.   And all the other additions were made to conform to the same basic ontology from the 1950s.  70 years later, that top level is so entrenched that it is never going away.

Another world-wide ontology that also developed in the 1960s includes the global weather patterns that were established by the world-wide weather simulation programs.  They use a different way of representing the world than the reservation systems.

Fundamental problem:  There will never be a single universal ontology for representing anything and everything in the world (or the universe).

That is a fact that any system of knowledge representation must deal with.

Data Model and Ontology

Abbreviated notes taken from the Ontolog Forum

01/01/2025 and ongoing

William Burkett

  • Hay, D. C. (1996). Data model patterns : conventions of thought. New York, Dorset House Pub.
  • Hay, D. C. (2006). Data model patterns : a metadata map. Amsterdam ; Boston, Elsevier Morgan Kaufmann.
  • Silverston, L. (2009). The data model resource book, Vols 1-3. New York, John Wiley.

Mike Peters

  • Hay, D. C. (2011). UML and Data Modeling: A Reconciliation, Technics Publications.

Mike Peters

Ontologies are represented in graph structures. Non-relational databases like semantic or graph databases are better suited for this job, and ontologists (I'm not one) have no problem working with them.

However, the workforce needs conventional everyday interfaces driven by relational databases. So, there is an import/export issue that David Hay could have written a book about, and I wish he had. His explanations are excellent. His book on UML and data modelling also bridged two different ways of looking at the world.

Michael DeBellis

RE: The distinction / difference between data models & ontologies is what...?

I think there is a difference but it isn't what many people in the ontology community seem to think. Most of the methodologies and guidelines I've seen for building ontologies present the ontology as a thing in itself. For example, they often minimize or even ignore questions such as loading data or integrating with existing systems. In reality these are some of the most important issues to face for ontologies used in the real world. Some of the most important distinctions between an E/R model and an ontology are:

1) E/R models are typically used for Online Transaction Processing (OLTP); ontology and knowledge graph models are typically used for Online Analytical Processing (OLAP). The design of an OLTP model needs to be optimized for response time. Thus, such models tend to be fairly sparse, leaving most of the domain knowledge in the code of the systems that use them. The design of an OLAP model can be much richer and include much more knowledge about the domain in the model itself rather than in the code, because the users will tend to be working on client machines that have more processing power, and because users executing complex queries will be a bit more patient than someone adding a post to Facebook or doing a funds transfer on their bank account with their phone.

2) E/R models need to worry about various normalization forms for peak efficiency depending on what needs to be maximized. Ontologies are implemented as graphs and don't need to consider the same kinds of issues. Essentially when you build an ontology you are able to work at the analysis level whereas for an E/R model you tend to work at the design level.

3) E/R models are relatively difficult to change at run time. Ontologies and knowledge graphs can easily be changed at run time, not just the instance values but the schemas themselves can be changed at run time. In this way they are more similar to NoSQL databases that have "schema on read" rather than E/R models that have schema on write.
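
As an illustrative contrast (not from the discussion; table and predicate names are hypothetical), changing a relational schema at run time requires DDL and usually a migration, whereas an RDF store simply accepts triples that use a new predicate:

-- Relational side: recording a new fact about customers needs a schema change first.
ALTER TABLE customer ADD COLUMN preferred_language VARCHAR(35);

-- RDF/OWL side: no equivalent step; a new predicate appears when the first triple
-- using it is inserted, e.g.
--   :customer42  :preferredLanguage  "mi" .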

Actually, I just remembered I'm putting together a table comparing relational databases, NoSQL (Hadoop) and OWL. Here's what I have so far. This is a work in progress:

Feature | Hadoop (HDFS + Pig/Hive) | SQL Databases | OWL Knowledge Graphs (e.g., AllegroGraph)
Schema | Schema on read | Schema on write | Schema on write, but the schema can be modified at run time
Data Model | File-based (raw formats) | Tables (rows/columns) | RDF triples or quads (subject-predicate-object)
Storage | Distributed (HDFS) | Centralized or sharded | Triple or quad stores, with distributed options and sharding in some triplestores (e.g., AllegroGraph)
Processing | Batch (MapReduce) | Transactional (ACID) | Semantic reasoning and SPARQL queries; some triplestores also support ACID transactions
Querying | Pig Latin, HiveQL | SQL | SPARQL and Description Logic queries
Reasoning | None | None | SWRL, SHACL, OWL DL axioms
Flexibility | Flexible (unstructured, semi-structured, structured) | Rigid schema | Highly flexible with semantic annotations and Linked Data
Scalability | Horizontal (many nodes) | Vertical (more powerful nodes) | Horizontal or vertical, depending on implementation
Integration | Tools for ETL and analytics | Tight coupling with apps | Can integrate with ontologies, Linked Data, and other RDF graphs
Best For | Batch processing, raw data storage | Transactional workloads, OLTP | Semantic data, reasoning, and complex relationships
David Eddy

You might want to add this site to your reading list: https://www.db-engines.com/en/ranking (a list of DBMSs). The DBMS engine tally stands at 417 as of 2024-01-04. Their counting system gets a little wonky at 390+.

Paul Tyson

2024 - P15Y = 2009 < 2012 (date of R2RML recommendation)

Igor Toujilov

Mike, I would not say "Ontologies are represented in graph structures" only. Ontologies can be represented in a wide range of formalisms, including graphs, which are just one possible representation. For example, there are tools to store the same ontology in different representation formats: RDF/XML, Turtle, OWL Functional Syntax, Manchester OWL Syntax, etc. Yes, RDF and Turtle are graph representations. But OWL Functional and Manchester syntaxes have nothing to do with graphs. And yet they represent the same ontology.

I also disagree that "the workforce needs conventional everyday interfaces driven by relational databases". It depends on your system architecture. Today many systems use No-SQL or graph databases successfully without any need for relational databases.

In real systems, the difference between data models and ontologies can be sharp or subtle. Some systems continue using relational databases while performing some tasks on ontologies. Other systems have ontologies that are tightly integrated into the production process, so sometimes it is hard to separate the ontologies from the data. And of course, there is a wide range of systems in between those extreme cases.

William Burkett

My take is that the difference is primarily one of intention. Ontology designers and “conceptual data model” designers are seeking to create representations of the “real world”. Logical/physical data designers are seeking to create specifications for actual data structures for applications to store/access/use data. This distinction is, of course, very fuzzy and fluid, because both sets of designers are usually pursuing both of these intentions simultaneously. If an “ontology” is specified in OWL, it is, of course, a self-defining “data structure” that is processable by applications designed to use that structure – so, IMHO, there is no objective, practical difference between them. Unless we’re talking about box-and-line diagrams, most ontologies that we talk about here (I think) are just special kinds of data models.

Alex Shkotin

Your table reminds me of “What Goes Around Comes Around… And Around…” by Michael Stonebraker and Andrew Pavlo.

We discussed it for about two and a half hours at our last SIGMOD-Moscow meeting. They cover RDF and graph databases, but not OWL.

Kingsley Idehen

Yes, R2RML is the component of “the SemWeb stack” designed to describe how relations in a DBMS are represented as relations in RDF.

Here’s a post I wrote years ago, complete with live examples, demonstrating how to create an RDF-based Entity Relationship Graph from CSV files located on a local filesystem accessible to a Virtuoso instance:

Virtuoso is a multi-model database management system in one of its guises. Specifically, it can operate on relations represented in the coarse-grained form typical of SQL-based DBMSs or in the finer-grained form facilitated by RDF via its Quad Store functionality.

In my view, Ontology and Data Model are different terms for the same concept—they’re essentially conceptual synonyms. That said, practitioners often view them differently depending on their operational lenses. For example, most ontologists express their worldviews on data modeling using formal graphic or linear notations, while data modelers (often from the RDBMS domain) favor graphical notations such as ERD (Entity-Relationship Diagrams) or UML (Unified Modeling Language) diagrams.

Links:

  • [2] Example Hyperlink Super-Key
    A hyperlink-based super-key resolving to a specific row in a CSV document (which could also represent any ODBC- or JDBC-accessible table in an RDBMS)
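
For illustration, a rough sketch of the CSV-to-graph idea, assuming Python's rdflib (this is not Virtuoso's mechanism; the namespace and data are invented): each row becomes a subject with a per-row key, and each column becomes a predicate.

# Sketch: turn CSV rows into an RDF graph (assumes the rdflib package).
import csv, io
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/people#")    # hypothetical namespace
data = "id,name,worksFor\n1,Mary,Santa\n2,Rudolf,Santa\n"

g = Graph()
for row in csv.DictReader(io.StringIO(data)):
    subject = URIRef(EX + "row" + row["id"])    # per-row super-key IRI
    for column, value in row.items():
        g.add((subject, URIRef(EX + column), Literal(value)))

print(g.serialize(format="turtle"))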

Mike Peters

So, what options are there to map RDF to relational data?

Michael DeBellis

RE: So, what options are there to map RDF to relational data?

I was going to list what little I knew, but I'm working with ChatGPT right now, so I asked it. Here's what it said:

Implementing OWL (Web Ontology Language) in a relational database involves translating OWL's semantic capabilities into a relational schema and querying framework. 

Several tools and approaches allow you to achieve this:

Tools and Frameworks for OWL in Relational Databases

1. RDF and SPARQL Layers on Relational Databases
  • Apache Jena with RDBMS Integration:
    • Jena provides a framework to work with RDF and OWL.
    • It supports relational database backends via the TDB or RDB storage layers.
    • You can map RDF triples to relational tables and query them using SPARQL.
  • Virtuoso Universal Server:
    • Virtuoso supports RDF storage and SPARQL queries directly in a relational database.
    • It combines a traditional RDBMS with a semantic store.
    • Useful for large-scale ontologies.
  • D2RQ:
    • A tool for exposing relational databases as virtual RDF graphs.
    • You can use OWL over the RDF representation while keeping data in the relational schema.

2. Ontology Mapping Tools
  • Ontop:
    • A powerful tool to map OWL ontologies to relational databases using OBDA (Ontology-Based Data Access).
    • Allows you to query relational data through SPARQL while maintaining OWL semantics.
    • Supports mappings through R2RML or native Ontop mapping syntax.
  • Karma:
    • A tool for integrating data from relational databases with OWL ontologies.
    • Supports mapping relational data to RDF and OWL constructs.

3. Hybrid Database Systems
  • GraphDB (by Ontotext):
    • A triplestore that can integrate with relational databases.
    • It allows you to store and query OWL data seamlessly alongside relational data.
  • Oracle RDF Graph:
    • Part of Oracle Database, it supports RDF and OWL data alongside relational data.
    • Provides SPARQL querying and reasoning capabilities.

4. Custom Implementation Approaches
  • Mapping OWL Axioms to Relational Rules (see the sketch after this answer):
    • Use SQL views or stored procedures to encode OWL semantics manually.
    • Define mappings for classes, properties, and relationships in relational terms.
  • DL-Lite Ontology Mapping:
    • If your OWL usage aligns with OWL 2 QL (a subset of OWL optimized for relational databases), you can use DL-Lite mappings for efficient querying.

Considerations for Choosing a Tool

1. Scale of the Ontology:
  • Large ontologies benefit from systems like Virtuoso or GraphDB, which are optimized for scalability.
2. OWL Subset:
  • OWL 2 QL is specifically designed for integration with relational databases. Tools like Ontop are excellent for such use cases.
3. Querying Needs:
  • If SPARQL is a requirement, tools like Ontop, Jena, or Virtuoso are natural fits.
  • For simple SQL querying, D2RQ or custom mapping may suffice.
4. Reasoning Capabilities:
  • If you require reasoning over OWL axioms, focus on tools like GraphDB, Oracle RDF Graph, or Jena.

Suggested Approach

If you're starting out and want robust OWL support with minimal setup, Ontop is highly recommended. It supports direct integration with relational databases, enables SPARQL querying, and is well-documented.

Would you like a walkthrough on setting up any of these tools, or a comparison of their features?
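
For illustration, a toy sketch of the "Mapping OWL Axioms to Relational Rules" option, using Python's built-in sqlite3 (all table and column names are invented): a single subclass axiom, "every Bank is an Organization", encoded as a SQL view so that queries against Organization also see every Bank.

# Toy sketch: encode "Bank subClassOf Organization" as a SQL view.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bank (id INTEGER, name TEXT);
CREATE TABLE other_org (id INTEGER, name TEXT);
INSERT INTO bank VALUES (1, 'First National');
INSERT INTO other_org VALUES (2, 'DTCC');

-- the subclass axiom, expressed relationally:
CREATE VIEW organization AS
    SELECT id, name FROM bank
    UNION ALL
    SELECT id, name FROM other_org;
""")
print(con.execute("SELECT * FROM organization").fetchall())
# -> [(1, 'First National'), (2, 'DTCC')]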

Elisa Kendall

Together with other colleagues, I've used a number of the tools listed for mapping FIBO to relational form for large banks successfully, including Ontop and the Jena tools, among others. FIBO does include axioms, particularly some cardinality restrictions, that are outside OWL RL, but most of the semantics can be mapped without any issues. RDFox supports OWL RL, which is more expressive than OWL QL, and ignores anything outside the RL profile rather than throwing up on it. Their team has also provided some rules that extend OWL RL for us to test with, including approximations of the FIBO axioms outside RL, which worked well. Stardog supports all of the semantics encoded in FIBO, and most knowledge graph tools that support SPARQL 1.1 can import it, though they may not support all of the reasoning encoded in the ontologies.

RDFox and some other knowledge graph engines prefer either Turtle or JSON-LD to RDF/XML, which is the serialization we work in (primarily so we can see all of the warts in what we are publishing). But FIBO and the other ontology efforts I participate in publish in all three serializations – RDF/XML, Turtle, and JSON-LD – so we can supply whatever a given tool or framework needs. The same is true of the Commons, MVF, LCC, and other ontologies we publish at OMG – in RDF/XML and Turtle, at a minimum. There is also a toolkit from the EDM Council that we use to transform between serializations consistently, both for GitHub comparisons and for tool support; we publish it as open source at https://github.com/edmcouncil/rdf-toolkit. It's a fairly complex Swiss army knife, with various options you can use to manage the transformation as needed.

Kingsley Idehen

Do you mean the reverse—creating SQL RDBMS relations from RDF-based relations? If so, note that the SPARQL query language includes a SELECT option for projecting query solutions as tables from RDF graphs, which can then be fed into a SQL RDBMS.

Mike Peters

Yes, I do mean importing ontologies into relational databases. I'm not an ontologist, but I can see the great value in using ontologies, schemas, and taxonomies as read-only references in a working database.

The question is how to reliably and effectively allow users to point at any ontology using a form (e.g., something from the OBO Foundry or SNOMED) and import it into the relational database they are logged into.

I was thinking OWL and RDF. Are both possible?

Michael DeBellis

As someone rightly pointed out in response to one of my answers, OWL is a logical model, not a graph model, and is not necessarily tied to RDF. Since OWL is based on a subset of First-Order Logic (Description Logic), it can map directly to a relational database rather than going through RDF first. According to ChatGPT:

Ontop supports mappings through R2RML or native Ontop mapping syntax.

Mike Bennett

Well, this has been a very interesting sub-thread. I'll fork here from before the sub-thread on RDF-to-RDB considerations and the like.

Is "Ontology" really synonymous with, or even necessarily a kind of, "Data Model"?

I'd say emphatically not. There are kinds of ontology that are a kind of data model, of course, and much has been said about these in this sub-thread.

But those are not the only things of which it can be said, "This is an ontology".

Any model has an "aboutness"; that is, of what is this a model?

For some models, what it is about is data: each element of the model represents some element of data.

For some models, the aboutness is that of things in the world.

The model language or formalism and the model aboutness are orthogonal: it does not necessarily follow that a model in a given language must be about a given kind of thing. UML class models are designed to represent object-oriented class constructs (with both behavioral and structural elements), but some people use them to represent all sorts of other things (including, sometimes, things in the world). Similarly, an OWL model may represent RDF data, and usually does, since that is what it is intended for.

Suppose someone wants a model of real things in the world. One would call this an "Ontology". However, as soon as someone says they want an ontology, various people pop up and say "I can do you an ontology" when what they mean, as evidenced in this thread, is "I can do you an ontology of the sort that is a kind of data model".

Maybe that's what the customer needed; maybe it's not. If the business needs something that formally defines the meanings of things, for example for management communication, reporting, or common understandings (in place of word-dependent dictionaries or glossaries), or if they want something for AI to process, then the chances are they need an ontology of the sort that represents real things in the world. All too often they are given an ontology-as-data-model because someone thinks that is the only sort of ontology there is.

There are some questions the answer to which is not a data model.

Let's consider two things:

1. Basic engineering best practice
2. Practical examples of how these kinds of ontology are different.

Good engineering follows a separation of concerns. Artifacts that represent the customer or business view, for example defining what the customer wants or what their world looks like, should always be expressed independently of any assumptions about the design techniques or technologies that will be used in crafting a solution.

For example, a business process model represents the activities that the business carries out, independently of any software design to automate them.

The reason is that (a) things are represented without presuming anything about the solution, and (b) the solution can then be validated against that design-independent artifact. That's basic QA.

Similarly, a data model is a kind of design (typically done at two levels, Platform Independent and Platform Specific, both of which are still designs).

The corresponding design-independent artifact is a kind of ontology: one in which the real-world meanings of the things of interest to the business are expressed. In other words, what does it mean to be this kind of thing?

Traditionally that's been done with words. But words are slippery. Better to use formal logic.

So there are ontologies which are a kind of data model, and there are ontologies which are a representation of things in the world. Both are needed, at these different levels in the development method, and with linkages between them.

A practical example is the best way to understand the distinction between these kinds of ontology.

And the difference is best illustrated with an example of where it went wrong.

The difference is between what we call "Truth Makers" and data. Truth Makers are what it takes for something to be defined as a member of a given class of Thing: the necessary and sufficient conditions for a thing in the world to be a member of that class. Most of these are either physical matters such as physics and chemistry, or legal and social constructs such as legal capacities, value, and so on (mainly classifiable under Searle's ontology of social constructs – an ontology which is definitely not a data model; it's a book). A very few things get their meaning from data itself, as a kind of thing.

An ontology of things in the world (let's call this a Concept Ontology) defines things using those truth makers.

An ontology as a kind of data model looks for data surrogates for those things in the world: what data can you expect to find when this or that legal capacity, physical quantity value, etc. is in play?

Example: suppose we consider what it means to be a bank. Very loosely, this is something with certain legal capacities, such as the capacity to take on funds, the capacity to disburse those funds, and so on.

In one project I was involved with, the class "Bank" was defined using a data element for "FDIC Insurance", a kind of insurance that all banks in the US must carry. Then it was noticed that the DTCC, a clearing house, also carries FDIC insurance, and so a different data item was sought instead.

There were two errors, one inside the other. The first, proximate error was that they chose the wrong data surrogate. The error inside the error (the ultimate error) was that they did not realize they were making a design decision about a data surrogate. Therefore, that design decision was not peer reviewed, and the mistake was only discovered later (costing time etc. to fix, which is another reason we have separation of concerns).

The model was updated to use the more correct data surrogate of Banking License. This reliably exists whenever the legal capacities for something to be a bank exist, and doesn't when they don't.

Of course, that would not work in all use cases: if the requirement was to detect when some entity was acting as a bank when it should not be, you would look for data about the entity's behavior instead. Different use cases may give rise to different data surrogate design decisions.

And that's why, while ontology-as-data-model is an extremely valuable kind of data model, the same kind of engineering integrity should go into its design as into the design of anything else, including the provision of ontologies of the target domain subject matter (subject to scope), against which these can be designed, against which design decisions can be reviewed, and against which the end result can be tested.

Those are ontologies too. They ideally use formal logic, because words are too slippery to be relied upon. But just because a concept ontology is framed using formal logic does not make it a data model (logic has been around a lot longer than computational data).

Kingsley Idehen

You can use SPARQL SELECT, as I described, against a collection of relations (comprising terms from both the RDF and OWL ontologies/vocabularies) to insert data into relations (colloquially referred to as tables) managed by an RDBMS.
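
For illustration, a minimal sketch of that flow, assuming Python's rdflib and the built-in sqlite3 module (the graph, query, and table here are invented):

# Sketch: project a SPARQL SELECT result set into an RDBMS table.
import sqlite3
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:mary ex:worksFor ex:Santa .
ex:rudolf ex:worksFor ex:Santa .
""", format="turtle")

# SELECT projects query solutions as rows...
rows = g.query("SELECT ?s ?o WHERE { ?s <http://example.org/worksFor> ?o }")

# ...which can then be inserted into a SQL table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE works_for (employee TEXT, employer TEXT)")
con.executemany("INSERT INTO works_for VALUES (?, ?)",
                [(str(s), str(o)) for s, o in rows])
print(con.execute("SELECT * FROM works_for").fetchall())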

John Sowa

As Yogi Berra said, these discussions are "déjà vu all over again."

Re "NoSQL": the person who coined that term later rewrote it as "Not Only SQL". The original SQL was designed for data that is best organized in a table. The fact that other data might be better represented in other formats does not invalidate the use of tables for data that is naturally tabular.

Re tree structures in ontologies: a tree structure for the NAMES in an ontology does NOT imply that the named data happens to be a tree. Some of the data might be organized in a tree, but other data might be better organized in a table, list, vector, matrix, tensor, graph, multidimensional shape, or a combination of all of them.

The following survey article covers about 40 years of developments, from 1970 to 2010. Some new methods have been invented since then, but 90% of the discussions are about new names for old ideas re-invented by people who didn't know the history. I wrote the survey, but 95% of the links are to writings by other people: https://jfsowa.com/ikl.

And by the way, I agree with Bill Burkett (on the list down below). He is one of the people I collaborated with on various committees over many years. We viewed the déjà vu over and over and over. That's one reason why I don't get excited by new names.

Alex Shkotin

What is the idea behind importing an ontology into an RDB? Just to store it, or to use it as a schema for the RDB?

And if the first, do you need to keep its structure, or just store it as a blob?

Mike Peters

The idea is to feed Pipi 9 structured data from versioned external references, such as ontologies, taxonomies, XML Schemas, CSV files, etc.

This data then becomes bits of relational database schema, or is used to populate the tables.

This needs to be an automated process that is highly reliable. It's like using an external API.

So, here is a silly made-up example of what I want to end up with.


Ontology-Imports-Table
----------------------------------
ID | Source                        | Version | Thing 1             | Relation | Thing 2
1  | obofoundary-example.owl       | 5       | elf                 | worksFor | Santa
2  | obofoundary-example.owl       | 5       | rudolf              | isA      | Reindeer
3  | obofoundary-example.owl       | 5       | mary                | isA      | Elf
4  | obofoundary-example.owl       | 6       | mary                | isA      | RetiredElf
5  | obofoundary-periodicTable.rdf | 1       | Plutonium           | isA      | Chemical Element
6  | movieLab.rdf                  | 5       | Camera              | hasA     | Camera Lens
7  | movieLab.xml                  | 10      | DSMC2 Gemini 5K S35 | isA      | Camera

Depending on user requirements, this could be used to generate:

Camera-Table
Camera-Lens-Table
ChemicalElement-Table
etc.

Or it could be used to populate a table with read-only records.
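
For illustration, a hedged sketch of such an import pipeline, assuming Python's rdflib and sqlite3 (the function, table, and file names are invented; rdflib parses both RDF/XML and Turtle, which also bears on the OWL-and-RDF question above):

# Sketch: import an external ontology's triples into a relational table.
import sqlite3
from rdflib import Graph

def import_ontology(con, source, version):
    g = Graph()
    g.parse(source)  # rdflib guesses the format (RDF/XML, Turtle, ...)
    con.executemany(
        "INSERT INTO ontology_imports (source, version, thing1, relation, thing2) "
        "VALUES (?, ?, ?, ?, ?)",
        [(source, version, str(s), str(p), str(o)) for s, p, o in g])

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE ontology_imports (
    id INTEGER PRIMARY KEY, source TEXT, version INTEGER,
    thing1 TEXT, relation TEXT, thing2 TEXT)""")
# import_ontology(con, "obofoundary-example.owl", 5)  # hypothetical file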

Alex Shkotin

Why not an ontology about ontologies, like the one discussed here?