Naming Standards

Publish date: 2019-07-21
Authors: Grzegorz Swierniak
Tags: dwh, design, standards, names, integrated layer, access layer, staging layer

The main goal of adopting a naming convention for database objects is so that you and others can easily identify the type and purpose of all objects contained in the database. In this article, I want to describe one of the possible ways of adopting naming standards, undoubtedly there are other standards that also work but my goal is to show you the ones I applied and they proved to be working very well. I do not believe the same naming standards can be applied to all layers of the DWH (read more about Proven DWH Design here). Each layer serves a different purpose, so different users will be accessing them - let’s use the standards that fit their needs. Despite that, there are certain aspects of naming conventions that are common for all objects, no matter where in the architecture of the DHW they reside.

names

Common standards

Staging layer naming standards

The names here are straightforward since this layer consists of only tables (views) that represent raw data from source systems. The general pattern is to use a prefix with the source system name (can be abbreviated) plus the business meaning of data in that table, as follow:

Very often you can hear about the restriction where data from the staging layer cannot be accessed directly (from tables), but only through views that are built on top of that tables (restricted access might be applied on views). In such a case, we just need to add another prefix ‘V_’ in front of the source table:

Integrated layer naming standards

Some of the rules you will find below might not adhere to general naming conventions you know from books. But I want to highlight the fact that the primary goal of the Integrated layer is to be easily extensible, scalable and allowing to track the history of changes in data. Therefore it requires lots of objects that should support those objectives that need to be very well organized.

Although I am not a huge fan of abbreviations, I need to admit that they are really necessary. Of course, you can start building your integrated layer using full names, but as the complexity of your DWH grows, sooner or later you will realize that you need to start using them. Thus I suggest starting to abbreviate the name of your objects from day 1.

Another unpopular rule I would like to share is the table numbers. Each table that is part of your model in the Integrated layer should contain an internal number, that should be used as a prefix in the object names. The numbers can also act as some kind of name aliases to simplify the reference to particular tables, as the full name may be long and complex. Just to remind you, here we are discussing the Integrated layer, the core of the data warehousing, the model that very often has hundreds of tables, so not to organize them through numbers seems to be very reckless.

Now let’s go through different prefixes for tables and views you might use in your Integrated layer. The general pattern is: Prefix + Table number + Business name Prefixes for tables:

Prefixes for views:

Access and Datamart layer naming standards

Unlike the Integrated layer, in Access layer we should avoid abbreviations because it is the layer where business users may access directly. The only acceptable abbreviation is DIM, stands for dimension. This abbreviation can be used together with FACT as the only common prefixes used here:

All access layers have to be built as star schema, so there is no need for any other prefixes. Sometimes, when you have multiple star schema models in your access layer, I suggest adding given subject area name after the DIM/FACT prefix to logically organize objects from the same area, for example, DIM_HR_REGION

comments powered by Disqus