Usually, when building a data model, the process of structuring a star schema needs to combine different data sources (say Google Analytics, Excel or csv files, AdWords, Web data, etc.) to build fact tables, and dimension tables. Remember fact tables add data normally, such as sales operations; and dimension tables contain descriptive information, such as customer name, customer id, customer age.
When building a model, it is useful to place fact tables in the middle, and all other dimensions around with one relationship from fact table to other dimensions. A normalized way of working is needed, as one needs to reduce redundancy and achieve consistency of the data. But what happens once we have extracted fact tables, and we need to build the appropriate dimensions to generate relationships between them?
Well, we might have two options:
Either we have the “master data” which we could use as dimensions. In some cases, this is a periodically updated table with specific changes due to the regular business operations (such as maintenance of conversion factors, equipment capacities, product attributes, etc.)
Or we need to build them instead, so that any new category of data will automatically feed the dimension table.
Let’s see what happens in the second option, where we need to build de dimensions. Suppose we have a sales table named “Traditional Sales Data”, which holds a great number of daily transactions for, say, a year.
And we also have another sales table named “Online Sales Data” which aggregates a great number of online sales.
We are asked to focus on the dimension “ship mode”, because the company is interested in analyzing whether the existing ship modes are actually profitable, or not.
To build a dimension table called “Ship Mode” we need to have all the unique values of the field ship mode, present in both “traditional sales data” and “online sales data” so that any new value in any of both fact tables will be updated in the dimension table. To accomplish this, we need to combine both fact tables, “Online Sales Data” and “Traditional Sales Data” and follow the next steps:
Create a Referenced table, of both fact tables
Combine them, considering “ship mode” as the common
First, we’ll create a “referenced” table. This implies that whatever change happens in the original table will be reflected on the referenced table. This step is key to make the dimension dynamic. So, we right click on the table we wish to create a reference, and we select the option “Duplicate”, indicated with a clip icon.
Duplicate option, copies the
Standing on one of those tables, we are going to the Merge Queries option in the Home ribbon. Maybe it’s better to merge queries “as new”, so we don’t affect the existing tables.
And we will be asked which tables we want to merge. There we select both: “Traditional Sales Data” and “Online Sales Data” tables.
As a result, both tables will be combined, and given birth to a third one named “Merged1”, which we may want to rename as desired, say, “Dim_Shipmode”. Now, we have a big table containing all the existing ship modes. We need to build a table with single values. Just select only the column “ship mode” of the merged table, and afterwards remove duplicates.
As a result, the ship mode table is built, from a combined table of both traditional sales table and online sales table.
After building the ship mode dimension, we need to generate de surrogate key, because each dimension table should contain a key column, unique per each row in the dimension table. It is better not to use the Id of the source system. It is proper that this column is numeric, and auto-increment. We can use the “index column” option, in the “add column” ribbon.
Next step is to build a relationship between both fact tables, and the new dimension table, in the relationship view. So, we leave Query Editor, and just generate the relationships.
Building dynamic dimensions is
a useful way to create data models from multiple sources, when one does not
have the master data already built and complete to create the needed dimensions.
Using duplicated tables makes the refreshing process more efficient, and avoids
time consuming maintenance of master data.