Dimensional Modeling for a Data Warehouse

Sagadevan K
3 min readJun 6, 2022

--

Image from Flaticon

What is dimensional modeling?

Dimensional modeling is a type of data modeling which is designed to optimize data retrieval speeds for analytical purposes in a data warehouse. Relational data models focus on efficiently storing the data whereas in dimensional modeling, redundancy is increased so that data can be located easily and hence retrieval is faster.

Components of a Dimensional Model

Fact Table: Facts are the metrics of a business process. Fact table consists of measures and foreign keys to dimension tables. Fact table is generally found at the center of a Star or Snowflake schema.

Dimension Table: Dimension tables consist of primary key and attributes which describe the fact table. Dimensions are loaded before facts. An analogy for the same would be: a product cannot be sold if it does not exist. i.e Product information should be added before sales information.

Good to know:

Star Schema: An architecture in which the dimensions are denormalized. The fact table is located at the center and dimension tables are connected to the fact table(like points of the star).

Snowflake Schema: An architecture in which the dimensions are normalized. It is an extension of the Star schema where dimensions may be connected to dimensions.

Steps to design a dimensional model

  1. Identify the business process:

Isolate processes around which questions are generally asked. For e.g. What was my profit margin last month? What was my sales revenue for the last 6 months?. Try to identify processes from these questions. Processes can span across multiple departments. For e.g. Sales as a process depends upon Marketing, Publicity, etc.

While starting, it is good to choose processes which are high impact and low risk. Impact depends upon the needs of the organization (which process needs more visibility), while risk depends upon data availability, data quality, amount of data cleaning required, …

2. Identify the grain:

Grain tells us about the lowest level of information stored in the fact table. What does one row of the fact table represent? e.g. Data at a month level or day level, all items of an order in one row or one row for each item of an order, one row per product or one row per store or one row per product per store per day.

The grain must be decided before choosing dimensions or facts — Data Warehouse Toolkit.

Choose the most atomic level, so that aggregations, roll up, slicing, dicing can become easier.

3. Identify the dimensions:

Dimensions provide descriptive information about the business process. Answers to Who, What, When, Where can be considered dimensions. For e.g. Who was the sale made by: Customer, What does the sale comprise of: Product, When did the sale take place: Time, From Where did the sale take place: Location.

The attributes of these dimensions should be descriptive. It is a good practice to use small data types for these attributes.

4. Identify the measures:

In this step, we select the metrics which enable us to make better decisions. Metrics like quantity, sales amount, profit. Rollup, slicing, dicing work well on fully additive measures.

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

--

--

Sagadevan K
Sagadevan K

Written by Sagadevan K

Data Engineer, Writing to learn

No responses yet

Write a response