We have already discussed multidimensional schemas in another article in this series. These schemas are designed to model data warehouse systems effectively, and they can also meet the needs of large databases built for analytical purposes (OLAP, or online analytical processing, databases). Choosing a data warehouse schema is an important decision when working with multidimensional schemas, so here we discuss the main types of warehouse schemas to consider.
Different types of data warehouse schemas
Here are the four major types of multidimensional data warehouse schemas, each of which comes with some unique benefits.
- Star Schema
- Snowflake Schema
- Galaxy Schema
- Star Cluster Schema
In a Star Schema, the fact table sits at the center of the star and is surrounded by dimension tables, each describing one dimension. Because the structure resembles a star, it is called a star schema. It is the simplest of the data warehouse schemas. It is also called a Star Join Schema and is well suited to querying large data sets.
Star schema characteristics
- Each dimension in a Star schema is represented by a single dimension table.
- Each dimension table contains the set of attributes that describe that dimension.
- The fact table references each dimension table through a foreign key.
- Dimension tables are not joined to one another.
- The fact table contains the measures and the keys.
- Star schema is very easy to understand.
- Star schema trades some disk space for simpler queries, because dimension data is stored redundantly.
- Dimension tables are not normalized.
- Star schema is also supported widely by business intelligence tools.
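The characteristics above can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_date`, `dim_product`, `dim_store`) and the sample rows are hypothetical, chosen only for illustration: the fact table holds the measures plus one foreign key per dimension, and each dimension reaches the fact table through a single join.

```python
import sqlite3

# Hypothetical retail star schema: one fact table, three dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- The fact table holds the measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    quantity    INTEGER,
    amount      REAL
);

INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO dim_product VALUES (10, 'Laptop', 'Electronics'), (11, 'Desk', 'Furniture');
INSERT INTO dim_store VALUES (100, 'Berlin', 'Germany'), (101, 'Paris', 'France');
INSERT INTO fact_sales VALUES
    (1, 10, 100, 2, 2000.0),
    (1, 11, 101, 1,  300.0),
    (2, 10, 101, 1, 1000.0);
""")

# Each dimension needs exactly one join to the fact table;
# dimension tables are never joined to each other.
query = """
SELECT p.category, s.country, SUM(f.amount) AS total
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_store   s ON f.store_key   = s.store_key
GROUP BY p.category, s.country
ORDER BY p.category, s.country
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Filtering by year would add just one more join, to `dim_date`, following the same fact-to-dimension pattern.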
Advantages of Star Schema
- Queries use simple joins when retrieving data, so query performance improves.
- It is easy to retrieve data for reporting, at any point in time and for any period.
Disadvantages of Star Schema
- If requirements change frequently, modifying and reusing an existing star schema over the long term is not recommended.
- Data redundancy is higher because tables are not hierarchically divided.
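To make the redundancy point concrete, here is a small Python sketch with a hypothetical product dimension. The helper `redundant_copies` is an illustrative measure of our own (total stored values minus distinct values), not a standard metric.

```python
# Hypothetical denormalized product dimension (star style): the category
# hierarchy is repeated on every product row.
star_dim_product = [
    {"product_key": 1, "name": "Laptop", "category": "Electronics", "department": "Tech"},
    {"product_key": 2, "name": "Phone",  "category": "Electronics", "department": "Tech"},
    {"product_key": 3, "name": "Tablet", "category": "Electronics", "department": "Tech"},
    {"product_key": 4, "name": "Desk",   "category": "Furniture",   "department": "Home"},
]

# Hypothetical snowflaked equivalent: each category is stored once in its
# own table, and product rows would reference it by key.
snowflake_dim_category = [
    {"category_key": 1, "category": "Electronics", "department": "Tech"},
    {"category_key": 2, "category": "Furniture",   "department": "Home"},
]

def redundant_copies(rows, column):
    """Count extra copies of a value: total cells minus distinct values."""
    values = [row[column] for row in rows]
    return len(values) - len(set(values))

print(redundant_copies(star_dim_product, "category"))        # prints 2
print(redundant_copies(snowflake_dim_category, "category"))  # prints 0
```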
Snowflake Schema is another data warehouse schema: a logical arrangement of tables in a multidimensional database, similar to an ER diagram, whose shape resembles a snowflake. It extends the Star Schema by adding further levels of dimension tables. Dimension tables in a Snowflake Schema are normalized, so their data is split into additional tables.
Major characteristics of Snowflake Schema
- It uses less disk space.
- It is easy to add a new dimension to the schema.
- Because there are many tables, query performance is reduced.
One major challenge with a Snowflake Schema is that it needs more maintenance effort as the number of lookup tables grows.
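The extra join that a snowflaked dimension introduces can also be sketched with `sqlite3`. This assumes a hypothetical schema in which the product dimension has been normalized so that the category lives in its own `dim_category` lookup table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (snowflaked) product dimension: category split into its own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);

INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
INSERT INTO dim_product VALUES (10, 'Laptop', 1), (11, 'Phone', 1), (12, 'Desk', 2);
INSERT INTO fact_sales VALUES (10, 2000.0), (11, 800.0), (12, 300.0);
""")

# Reaching the category takes two joins (fact -> product -> category)
# instead of the single join a star schema would need.
query = """
SELECT c.category, SUM(f.amount) AS total
FROM fact_sales f
JOIN dim_product  p ON f.product_key  = p.product_key
JOIN dim_category c ON p.category_key = c.category_key
GROUP BY c.category
ORDER BY c.category
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Electronics', 2800.0), ('Furniture', 300.0)]
```

Every additional level of normalization in a hierarchy adds one more join of this kind to queries that need that level.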
Advantages of Snowflake Schema
- Data redundancy is removed by creating new dimension tables.
- Compared with the star schema, the snowflaked dimension tables use less storage space.
- Snowflaked tables are easy to update and maintain.
Disadvantages of Snowflake Schema
- Because dimension tables are normalized, the ETL system has to load a larger number of tables.
- Queries may need complex joins because of the added tables, so query performance degrades.
The differences between Star and Snowflake Schemas
Here is a side-by-side comparison of the key differences between Star vs. Snowflake schema.
| Star Schema | Snowflake Schema |
|---|---|
| The hierarchies of each dimension are stored in its dimension table. | Hierarchies are divided into separate tables. |
| A fact table surrounded by dimension tables. | A fact table surrounded by dimension tables, which are in turn surrounded by further dimension tables. |
| A single join relates the fact table to each dimension table. | Many joins are needed to fetch the required data. |
| Very simple and flexible database design. | Complex database design. |
| Each dimension is held in a single table. | Dimension data is split across several tables. |
| Denormalized data structure. | Normalized data structure. |
| Higher data redundancy. | Lower data redundancy. |
| Cube processing is much faster. | Cube processing may be slower because of complex joins. |
| Higher query performance, helped by query optimization. | Lower query performance, because queries need more joins. |
A Galaxy Schema contains two or more fact tables that share dimension tables. It is also known as a Fact Constellation Schema. It can be viewed as a collection of star schemas, hence the name Galaxy Schema. The shared dimensions in a Galaxy Schema are known as Conformed Dimensions.
Galaxy Schema Characteristics:
- Dimensions in a Galaxy Schema are separated into distinct dimensions based on hierarchy levels. For example, if a geography dimension has four hierarchy levels (region, country, state, and city), the Galaxy Schema would have four dimensions.
- A Galaxy Schema can be built by splitting one star schema into several star schemas.
- Dimensions in a Galaxy Schema are large, since they are built around the various hierarchy levels.
- A Galaxy Schema is helpful for aggregating fact tables, which makes the data easier to understand.
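A Galaxy Schema can be sketched as two hypothetical fact tables, `fact_sales` and `fact_shipping`, that share the conformed dimension `dim_product`. Comparing measures across the two facts ("drilling across") is shown here by aggregating each fact table separately and then joining the results through the shared dimension; this is one common approach, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conformed dimension shared by both fact tables.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);

-- Two fact tables (two stars) that reference the same dimension.
CREATE TABLE fact_sales    (product_key INTEGER REFERENCES dim_product(product_key), units_sold INTEGER);
CREATE TABLE fact_shipping (product_key INTEGER REFERENCES dim_product(product_key), units_shipped INTEGER);

INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Desk');
INSERT INTO fact_sales    VALUES (1, 5), (1, 3), (2, 2);
INSERT INTO fact_shipping VALUES (1, 6), (2, 2), (2, 1);
""")

# Aggregate each fact table on its own first, then combine the results on the
# shared dimension; joining raw fact rows directly would multiply the sums.
query = """
WITH s AS (SELECT product_key, SUM(units_sold) AS sold
           FROM fact_sales GROUP BY product_key),
     h AS (SELECT product_key, SUM(units_shipped) AS shipped
           FROM fact_shipping GROUP BY product_key)
SELECT p.name, s.sold, h.shipped
FROM dim_product p
JOIN s ON s.product_key = p.product_key
JOIN h ON h.product_key = p.product_key
ORDER BY p.name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Desk', 2, 3), ('Laptop', 8, 6)]
```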
A Snowflake Schema expands hierarchies into many tables, but complex hierarchies require more joins and make the schema more complex. A Star Schema collapses hierarchies into single tables, which causes redundancy. A balance between the Star and Snowflake schemas is often preferable, and that is where the Star Cluster Schema comes in.
In a Star Cluster Schema, overlapping dimensions appear as forks in the hierarchies. A fork occurs when an entity acts as a parent in two or more dimensional hierarchies. Fork entities are then identified as classifications with one-to-many relationships.
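The idea of a fork can be sketched in Python. The hierarchies below are hypothetical, each listed from the lowest level to the highest, and the helper `find_forks` (also hypothetical) simply reports every entity that acts as a parent in two or more hierarchies.

```python
from collections import defaultdict

# Hypothetical dimensional hierarchies, each listed from child to parent.
hierarchies = {
    "customer_geography": ["customer", "city", "region", "country"],
    "store_geography":    ["store", "city", "region", "country"],
    "product":            ["product", "category", "department"],
}

def find_forks(hierarchies):
    """Return entities that act as a parent in two or more hierarchies."""
    parent_in = defaultdict(set)
    for name, levels in hierarchies.items():
        # Every level above the lowest one is a parent of the level below it.
        for parent in levels[1:]:
            parent_in[parent].add(name)
    return {entity: sorted(names)
            for entity, names in parent_in.items() if len(names) >= 2}

forks = find_forks(hierarchies)
print(forks)
```

Here `city` is where the customer and store hierarchies fork, and `region` and `country` above it are the overlapping levels both hierarchies share; the product hierarchy has no fork.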
To conclude, here is an overview of the schemas we discussed. Multidimensional schemas are designed specifically to model data warehouse systems. Star Schema is the simplest type, with a structure resembling a star: a fact table surrounded by dimension tables, each related to the fact table by a single join. Snowflake Schema extends the Star Schema by surrounding the dimension tables with further dimension tables, so it needs more joins to fetch data. Galaxy Schema, also called Fact Constellation Schema, contains two or more fact tables that share dimension tables. Finally, the Star Cluster Schema combines attributes of the Star and Snowflake Schemas.
When planning a schema for your data warehousing project, first analyze the nature and complexity of your data to see which schema best matches your needs.
We hope this tutorial gave you a good understanding of the different types of data warehouse schemas, along with their advantages and drawbacks.
We also looked at how Star and Snowflake schemas are queried, how they differ, and how to choose between them.