Hierarchies in 3NF model

Publish date: 2019-08-12
Authors: Grzegorz Swierniak
Tags: dwh, design, 3nf, hierarchy, integrated layer, access layer, surrogate key

In today’s article, I want to show you how to store n- number of hierarchies with n- number of levels using just 4 tables. The 4 tables will represent 3rd normal form (3NF) of our new integrated model. The scripts below you can apply in your own model and see how powerful this technique is, I’ve been using it for some time and the actual implementation does not vary that much whatever hierarchy you have, trust me!

After reading this article you will learn:

After reading this article you will not learn:

hierarchies

Source file

Let’s start with the source file - the hierarchy that we are going to normalize and store in our data warehouse. The hierarchy represents the group of products in a sports store with up to 9 levels, up to 9 levels means that some group of products can have fewer levels, like ‘Balls’ - only 4 levels.

In a source system (ERP) the table is named as ‘product_tree_unq’ and for education purposes, we will keep the same in our stage layer (demo_sl). Additionally, every node in the hierarchy has its own unique identifier called ‘node_id’, which is used to specify what products fall under a given node (we will come back to that field in the next article).

Create your model

It is the time to design the model in 3rd normal form - the model you see below can fit any hierarchy with any number of levels, so next time you will be moving a hierarchy to your data warehouse, take the model as is, don’t waste your time trying to reinvent the wheel.

model

Feel free to copy the model, just clone my project: Demo_il model in genmymodel.com

Let me quickly give you an overview of the tables that we have:

Important note In order to keep the article clear enough, some scripts were shortened, normally the pattern is presented in the first 2 and the last UNION and the rest of UNIONs were cut off. But all functional scripts can be found in the GitHub repository, so feel free to download them and test if on your own. Link to the the dwhacademy/demoproject repository can be found at the end of the article. repo

Design the transformation logic

Load reference data

Before we actually get to designing the transformation logic for our target tables, we need to load our reference data - in our case, we have only one hierarchy, so we just need to insert one record with hierarchy name into m002_hier table. I named our hierarchy as ‘Product_Group’, in fact, it can by any other name you wish but please make sure it has a proper business meaning.

The transformation logic

The transformation logic is wrapped in a view that will be used to update key table (k001_hier_item_key) and the main target table (m001_hier_item), therefore in the view, we should combine all columns from both tables. Here are a few key facts about the logic:

Load your model

Generate surrogate keys

It is time to start loading our tables in the integrated layer. At first, we have to make sure, all our hierarchy items (nodes) have their own surrogate keys, in order to do that - we update the key table at the beginning. In the script below you can find the common formula for surrogate key generation:

I encourage you to use this formula because it gives you full control over the surrogate key generation and can be built in most RDMS I am aware of (Teradata, PostgreSQL, Oracle, etc.).

Load target table

Having all surrogate keys in place, we can easily insert data to our main target table - m001_hier_item. As I pointed out, this article is not meant to teach you how to properly historize tables in the integrated model, therefore to keep the simplicity of this lesson, I use SCD type 1 in my procedures, meaning I delete the target table completely before inserting new records.

Results:

Load relationship table

Another crucial table to load is m003_hier_item_rltd which defines parent node and child node relationship - as a general rule, each parent can have multiple children and each child can have only one parent. Simple joins to the key table should easily specify the relations between nodes.

Views on the top of the Integrated Layer tables

Recursive view as a universal object to store all hierarchies

Once the tables in our integrated model had been loaded, we have fully normalized hierarchy. Of course, the data can’t be easily accessed yet, therefore we should aim to denormalize the data and put it in the nice hierarchical form. The first step to achieve that is to create a recursive view that will associate parent-child relationships with level of hierarchies they belong to.

Hierarchy - final obejct for Access Layer

The second and the last step in denormalization exercise is to pivot the data from the recursive view. I named the new view as dim_product_tree_hier and placed it in the Access layer (demo_al) because it is the proper layer to create such denormalize tables(views). The view is designed in such a way that surrogate keys are placed next to the original node names, so every node (hier_item) can be joined a specific product.

Summary

I hope that you found this article interesting, in the next article I will try to show you how to associate individual products with the hierarchy that we have built.

All the scripts presented in this article can be found also on GitHub, feel free to use them and try to improve!

comments powered by Disqus