6 More on Dimension Tables 119
[Figure: ORDER_FACTS (day_key, salesperson_key, customer_key, product_key, brand_key, quantity_ordered, order_dollars, cost_dollars, margin_dollars) references PRODUCT (product_key, product, product_description, sku) and BRAND (brand_key, brand, brand_code, brand_manager, category, category_code). Callout: "Product and Brand are not browsable. This design only allows them to be studied in the context of an order."]
Separation of product and brand destroys browsability
Breaking Up Large Dimensions
A large set of dimension attributes enables the rich analytic capability that makes the data warehouse valuable. Chapter 3 provided advice on how to fill out dimension tables with useful attributes, which contribute to the formulation of powerful queries and the development of useful reports. It is not uncommon for dimension tables to contain well over 100 attributes. Not every dimension is this wide, but every business tends to have two or three major dimensions for which a great deal of information is collected. Wide dimensions usually center on some variation of products and customers. Examples include companies, people, documents, accounts, contracts, students, laws, regulations, locations, and so forth.

Sometimes a dimension table becomes so wide that database administrators become concerned about its effect on the database. Such a concern may be purely technical, but it is completely valid. Very wide rows, for example, may affect the way that the database administrator allocates space or designates block size.

Large dimensions can also become a concern for ETL (extract, transform, load) developers. When a table has scores of type 2 attributes, incremental updates to the dimension can become a tremendous processing bottleneck. On top of this, large dimension tables may undergo so many slow changes that developers begin to question the meaning of the word slow.

The first instinct of many designers is to divide a large dimension in half, with the two resulting tables sharing the same surrogate key. This limits row size, but it has drawbacks. While it deals directly with width, it does not necessarily address processing bottlenecks or uncontrolled growth, and it may require some workarounds. Several other options avoid splitting a dimension arbitrarily. One technique, the mini-dimension, is particularly effective in reducing processing bottlenecks and limiting growth.
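The following Python sketch is a rough illustration of why type 2 attributes strain incremental loads. It handles one incoming source record at a time:

- It compares every type 2 attribute against the current dimension row.
- Any difference expires that row and inserts a new version under a new surrogate key.

The in-memory `CustomerDimension` class, the `apply_type2_change` helper, and the attribute list are hypothetical stand-ins for what an ETL tool would do against real tables. The point is only that both the comparison work and the rate of row growth rise with the number of type 2 attributes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical type 2 attributes; a wide dimension may carry scores of them.
TYPE2_ATTRIBUTES = ["address_line1", "hq_location", "annual_revenue"]


@dataclass
class CustomerDimension:
    """In-memory stand-in for a customer dimension that keeps type 2 history."""
    rows: List[Dict[str, Any]] = field(default_factory=list)
    next_key: int = 1

    def current_row(self, customer_id: str) -> Optional[Dict[str, Any]]:
        # Only one version per natural key is flagged as current.
        for row in self.rows:
            if row["customer_id"] == customer_id and row["current"]:
                return row
        return None


def apply_type2_change(dim: CustomerDimension, source: Dict[str, Any]) -> int:
    """Apply one source record and return the surrogate key that is now current."""
    existing = dim.current_row(source["customer_id"])
    if existing is not None:
        # Every incoming record is compared on every type 2 attribute.
        if all(existing[a] == source[a] for a in TYPE2_ATTRIBUTES):
            return existing["customer_key"]
        existing["current"] = False  # expire the old version; never overwrite it
    new_row = dict(source, customer_key=dim.next_key, current=True)
    dim.next_key += 1
    dim.rows.append(new_row)
    return new_row["customer_key"]


# Usage: an unchanged record reuses the current key; a change adds a new row.
dim = CustomerDimension()
record = {"customer_id": "A472", "address_line1": "4710 Maple Ave.",
          "hq_location": "Springfield, NH", "annual_revenue": "Greater than 1,000,000"}
k1 = apply_type2_change(dim, record)
k2 = apply_type2_change(dim, dict(record))
k3 = apply_type2_change(dim, dict(record, hq_location="Lawson, NH"))
```

With dozens of type 2 attributes, nearly any change in the source produces a new row. That is how a wide dimension grows quickly and makes incremental processing expensive.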
Splitting Dimension Tables Arbitrarily
When the length of a dimension row pushes the database administrator over the edge, it is time to rethink dimension design. One common solution to the overly long dimension row is a simple separation of attributes into two tables. These two tables use the same surrogate key values and share a one-to-one relationship with one another. The excessive row length is split across the two tables, bringing row size back into the comfort zone of the database administrators.

An example is shown in Figure 6-3. The customer table in Figure 6-3 is divided into two parts: customer_part1 and customer_part2. For any given surrogate key, some of the dimension attributes are stored in customer_part1 and the rest are in customer_part2. Rows in the tables have a one-to-one correspondence. Customer_key 102, for example, appears in both tables exactly once. Together, these rows describe customer A501: Halfway Inc.
[Figure: a star in which ORDER_FACTS (day_key, product_key, customer_key, ...) joins to DAY, PRODUCT, CUSTOMER_PART1, and CUSTOMER_PART2; the customer_key of each customer part joins to customer_key in the fact table.]

CUSTOMER_PART1
customer_key  customer_id  customer_name    address_line1
102           A501         Halfway Inc.     192 Elm St.
281           A472         Wooly Links LTD  4710 Maple Ave.
966           A472         Wooly Links LTD  4710 Maple Ave.
1407          A593         ABC Paper        4022 Davis Highway

CUSTOMER_PART2
customer_key  customer_id  hq_location      annual_revenue
102           A501         Grayville, MT    500,000-1,000,000
281           A472         Springfield, NH  Greater than 1,000,000
966           A472         Lawson, NH       Greater than 1,000,000
1407          A593         North Palte, IA  Less than 500,000

Figure 6-3 Arbitrary separation of customer attributes
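The Python sketch below uses the standard library's sqlite3 module, with SQLite standing in for the warehouse DBMS. It rebuilds the two customer tables of Figure 6-3 and queries them three ways:

- It joins the halves to each other to browse one logical customer row.
- It joins each half to the fact table on customer_key, as the figure is drawn.
- It runs a chained variant (fact table to customer_part1 to customer_part2).

The trimmed-down order_facts table and its order amounts are invented for illustration. The chained variant returns the same rows here. Any performance difference between the two shapes depends on the DBMS's optimizer and is not visible in a toy example like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Both halves use the same surrogate key values (one-to-one).
CREATE TABLE customer_part1 (
    customer_key  INTEGER PRIMARY KEY,
    customer_id   TEXT,
    customer_name TEXT,
    address_line1 TEXT
);
CREATE TABLE customer_part2 (
    customer_key   INTEGER PRIMARY KEY,
    customer_id    TEXT,
    hq_location    TEXT,
    annual_revenue TEXT
);
-- Hypothetical, trimmed-down fact table for illustration only.
CREATE TABLE order_facts (day_key INTEGER, customer_key INTEGER, order_dollars REAL);
""")

conn.executemany("INSERT INTO customer_part1 VALUES (?, ?, ?, ?)", [
    (102, "A501", "Halfway Inc.", "192 Elm St."),
    (281, "A472", "Wooly Links LTD", "4710 Maple Ave."),
    (966, "A472", "Wooly Links LTD", "4710 Maple Ave."),
    (1407, "A593", "ABC Paper", "4022 Davis Highway"),
])
conn.executemany("INSERT INTO customer_part2 VALUES (?, ?, ?, ?)", [
    (102, "A501", "Grayville, MT", "500,000-1,000,000"),
    (281, "A472", "Springfield, NH", "Greater than 1,000,000"),
    (966, "A472", "Lawson, NH", "Greater than 1,000,000"),
    (1407, "A593", "North Palte, IA", "Less than 500,000"),
])
# Made-up order amounts; these are not part of the figure.
conn.executemany("INSERT INTO order_facts VALUES (?, ?, ?)",
                 [(1, 102, 100.0), (1, 966, 250.0), (2, 102, 50.0)])

# Keys present in customer_part1 but missing from customer_part2 (should be none).
orphans = conn.execute("""
    SELECT customer_key FROM customer_part1
    EXCEPT SELECT customer_key FROM customer_part2
""").fetchall()

# Browsing the dimension: join the halves to each other to rebuild one row.
browse_102 = conn.execute("""
    SELECT p1.customer_id, p1.customer_name, p2.hq_location
    FROM customer_part1 AS p1
    JOIN customer_part2 AS p2 ON p2.customer_key = p1.customer_key
    WHERE p1.customer_key = 102
""").fetchone()

# Querying facts as drawn in the figure: each half joins to the fact table.
STAR = """
    SELECT p1.customer_key, p2.hq_location, SUM(f.order_dollars)
    FROM order_facts AS f
    JOIN customer_part1 AS p1 ON p1.customer_key = f.customer_key
    JOIN customer_part2 AS p2 ON p2.customer_key = f.customer_key
    GROUP BY p1.customer_key, p2.hq_location ORDER BY p1.customer_key
"""

# Chained alternative: fact -> part1 -> part2. Same rows here; whether the
# physical plan differs depends on the DBMS.
CHAINED = """
    SELECT p1.customer_key, p2.hq_location, SUM(f.order_dollars)
    FROM order_facts AS f
    JOIN customer_part1 AS p1 ON p1.customer_key = f.customer_key
    JOIN customer_part2 AS p2 ON p2.customer_key = p1.customer_key
    GROUP BY p1.customer_key, p2.hq_location ORDER BY p1.customer_key
"""

star_rows = conn.execute(STAR).fetchall()
chained_rows = conn.execute(CHAINED).fetchall()
```

Because both shapes return identical results, nothing in the answer set warns a developer that the chained form was used. This is why inconsistent join usage can go unnoticed.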
Drawbacks to Arbitrary Separation
While this approach addresses issues raised by database administrators, it replaces them with a series of new challenges. More importantly, it may not address any issues raised by the ETL developers.
Join Options
Splitting the customer table into two halves that share the same surrogate key creates multiple ways to join the tables in the star. In and of itself, this is not a problem; however, it may lead to confusion, and it may pose problems for business intelligence tools that automatically generate queries.

Figure 6-3 depicts each of the customer_keys joining back to customer_key in the fact table. This join configuration is the logical way to combine customer attributes when querying the fact table. It fits the basic query pattern introduced in Chapter 1 and allows the DBMS to perform a star join. When someone wants to browse all attributes of customer as a single logical dimension, however, the appropriate configuration may be to join customer_part1 and customer_part2 using the customer_key.

While this seems a simple distinction, inconsistent or improper usage may result when there are large teams or turnover among developers. For example, someone might include all three joins in a query, linking each part of customer to the fact table as well as to one another. Another possibility is that a developer will join the fact table to customer_part1, and then join customer_part1 to customer_part2. This configuration sounds reasonable, but the extra join may lead to sub-optimal performance. This potential issue is discussed in Chapter 7.

Business intelligence tools that automatically generate SQL queries can be thrown off when multiple ways exist to join tables. This situation may force a choice between browsability and star join optimization, so that the tables are joined in a single determinate way. (This and similar issues are further explained in Chapter 16, "Design and Business Intelligence.")

Fact Table Foreign Key Declarations
The preceding issues notwithstanding, representing each dimension row in two parts may present a purely technical issue. The two dimension tables share the same surrogate key, and only together do they provide a complete representation of the dimension.
Although we understand that the foreign key in the fact table references each of these tables, a relational database management system (RDBMS) cannot be configured for this double reference. Each foreign key can refer to only one table. If primary key/foreign key relationships are enabled in the database, the DBA must specify which table is referred to by customer_key in the fact table.

It is possible to work around this limitation by storing two copies of customer_key in the fact table. In the example, the customer_key in order_facts might be replaced by customer_key_part1 and customer_key_part2. This is unappealing because both columns will contain the same value, but it allows the database administrator to define foreign key relationships to customer_part1 and customer_part2.

ETL Processing
For ETL developers, splitting a table into two parts poses a unique challenge. While there are two physical tables, the developers must treat them as one logical table. This complicates ETL processing, and it means that splitting the dimension table does not mitigate any of the processing issues surrounding the large dimension.
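The Python sketch below shows the two-copy foreign key workaround and the "one logical table" burden on ETL, again using the standard library's sqlite3 module as a stand-in DBMS. The schema is a trimmed-down version of the example. Three elements are assumptions added for illustration and are not part of the design described above:

- the `CHECK` constraint tying the two key copies together,
- the `NOT NULL` on hq_location,
- the `load_customer` helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customer_part1 (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE customer_part2 (
    customer_key INTEGER PRIMARY KEY,
    hq_location  TEXT NOT NULL
);
-- Workaround: a single customer_key column could reference only one half,
-- so the fact table carries the same value twice, one column per table.
CREATE TABLE order_facts (
    day_key            INTEGER,
    customer_key_part1 INTEGER REFERENCES customer_part1 (customer_key),
    customer_key_part2 INTEGER REFERENCES customer_part2 (customer_key),
    order_dollars      REAL,
    -- Added assumption: keep the two copies from drifting apart.
    CHECK (customer_key_part1 = customer_key_part2)
);
""")


def load_customer(conn, customer_key, customer_name, hq_location):
    """Hypothetical loader: writes both halves of one logical row atomically."""
    with conn:  # commits if both inserts succeed, rolls back if either fails
        conn.execute("INSERT INTO customer_part1 VALUES (?, ?)",
                     (customer_key, customer_name))
        conn.execute("INSERT INTO customer_part2 VALUES (?, ?)",
                     (customer_key, hq_location))


load_customer(conn, 102, "Halfway Inc.", "Grayville, MT")

# A missing hq_location makes the second insert fail; the rollback also
# discards the first half, so the two tables stay one-to-one.
try:
    load_customer(conn, 281, "Wooly Links LTD", None)
except sqlite3.IntegrityError:
    pass

with conn:
    conn.execute("INSERT INTO order_facts VALUES (1, 102, 102, 100.0)")

# A fact row whose keys match no dimension row violates the foreign keys.
try:
    with conn:
        conn.execute("INSERT INTO order_facts VALUES (1, 999, 999, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Every load of the customer dimension must touch both tables inside one transaction. The split therefore adds coordination work for ETL without reducing the volume of type 2 processing.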