Wednesday, April 27, 2011

Physical Layer Tips

After you import, set up a PK and some joins, you usually are done with the physical layer – aside from the occasional alias. But you should pay more attention to it, as it can have some unforeseen problems if you overlook the details.

Circular Joins
The first physical layer problem is Circular Joins. A circular join is where there are 2 routes to get from one table to another, going from a dimension to a fact. Aggregate tables don’t count, as they are alternate tables used and won’t cause circular joins in the SQL. A real circular join is where usually 2 dimension tables join to each other, and each dimension tables also joins to a fact. This can happen frequently with small look-up tables in a non-denormalized schema, such as a snowflake or Transactional model. 

There are three ways to fight this, and if you follow them, you will never ever encounter them. In fact, I haven’t seen a circular join in a few years. 

Import physical tables without FK joins turned on. Make the joins yourself, so there is nothing hidden that could cause problem. You should “own” your joins, not the other way around. 

Aliases are the technique one uses to eliminate circular joins. An easy example of how to use aliases is when you have a lookup table, like W_LOV_D – each version of it requires an alias, which will be joined to differently. 

With all tables & aliases, using a naming standard will guarantee that only one physical layer object will exist in one Business Model object. If you prefix every object in the physical layer with its Logical usage first (essentially making everything an alias), you will not be able to use the Customer (W_COUNTRY_D) table in the Employee logical table, as the standard would require you to use a table alias called Employee (W_COUNTRY_D). The 7.9 OOB repository has finally adopted a similar approach.

Nullable Flag
The most devious Gotcha however is the nullable flag on each column. Since many DBAs get lazy and do not set NOT NULL constraints in the database; this mis-information is consumed by the BI Server and gives it a tummy ache. Incorrect not-null settings are important to the BI server as they guide how it not only generates SQL but does internal joins as well. 

By leaving the default of Nullable set to true, just think what that is saying: it is saying that one can expect null values to appear in the column. Whenever a join on this column is needed, such as for a subtotal or a logical query that derives into multiple physical SQLs, the join syntax can be affected. When subtotalling on such a field, certain kinds of metrics that are calculated in the SQL (such as a count distinct) can cause problems. 

When subtotalling on a field in a report, this will translate into a multi-part query of the basic format: 

Select fields 

from Select (Detail Rows) DR, 

Select (SubTotal Rows) SR 

Where DR.SubTotal Field = SR.SubTotal Field 

Note that for straight additive measures, the server will do the calculation and not the SQL. 

If the field you are SubTotaling on is set to nullable, then you have the following join condition: 

nvl(DR.c1 , 88.0) = nvl(SR.c1 , 88.0) 

and nvl(DR.c1 , 99.0) = nvl(SR.c1 , 99.0) 

No that is not a typo. Although the join condition it generates is a little bizarre by being redundant, it is clear that the server is taking into consideration that the field may be null. For varchar fileds, it uses ‘q’ and ‘z’ instead of 88 and 99. (This is dangerous if you actually have those values in the field!) 

When set to Not Null, your join condition will be simply: 

D1.C1 = SR.C1 

There are two errors you can make, each the opposite of the other: 

Setting the column to null when it is really not null. This will give you the correct answer, however it will be slower than the straight join due to the non-use of indexes. 

Setting the column to not null when it is really nullable. This will result in the wrong answer, as the equi-join will remove null rows from the recordset completely. 

Thus, be very accurate about your nullable settings in the physical layer. Ideally this should be handled in the database data model before you import a table to ensure that your physical layer does not become out of synch with the actual database.

Primary Keys
When beginning with Analytics / OBI, a lot of people make the mistake of creating too many primary keys when their PK simply has multiple columns.
Normally in Data Warehousing, a dimension table has a single field to be its primary key.  However sometimes we do not have that luxury, and must use multiple fields to represent our PK.  The best example of this is a Fact table: its PK is a combination of all the FKS to the dimension.  For example, a small fact table with FKs to Product, Month and Region has a composite Primary Key of PRODUCT_KEY, MONTH_KEY, REGION_KEY.  It is one key and not three: in the Analytics Physical layer, configure one Primary key but select multiple physical columns to define the key.
As a basic rule, think of the Highlander: “There can be only one!”  There are advanced scenarios where their may be more than one, but 99% of the time, make sure every physical layer table has a single PK.

Complex Joins
Occasionally I will see complex joins where a complex join is not needed.  The vast majority of the time your joins should be straight FK joins, even if the PK of the parent table is a composite PK (see above).
Complex joins are used in the following scenarios:
Range Joins. Useful for YTD metrics or rolling metrics.  In 10.1.3.2, the need to do this has been eliminated, but your legacy config may still need them for this purpose.  An example might be: W_DAY_D.DAY_KEY <= FACTS.DAY_KEY AND W_DAY_D.DAY_KEY < FACTS.DAY_KEY + 7.  This query will bring back the last seven days of facts for the current day selected.  (Yes this looks strange, but try it out on a piece of paper to see how it works)
Data Type Conversion Joins.  This is common in non-star schema models, or poorly designed ones.  The most common example is the conversion of a date field to a YYYYMMDD format to join to a date table that has a field of the same YYYYMMDD format.  Incidentally, this is one of the reasons why you would have more than one PK of the date table – DAY_KEY and DAY_NUMBER would both be PKs if you had to do this.
Cartesian Joins.  In strange circumstances, a 1=1 join can be used.  Be extremely careful, as this is a Cartesian product, which means the same thing as no join, which in turn means all records from table 1 will be pared with all records from table 2.  I have had to use this in my current project extensively, where we have to force a dimension to work with a fact table for the sake of the UI.  When doing so, you must guarantee that only one of the dimensional values is selected in the report or prompt; otherwise your result will be substantially incorrect.
Scenarios when you should not use a complex join
1:1 Joins.  This is common when using _DX and _FX tables in the Oracle BI applications (the RMW).  A complex join frequently is over included in SQL, meaning that sometimes it is added and slows down performance even when it is not needed.  Model this with the _DX or _FX as the parent of the base _D or _F table using a regular FK join

Filtering
This is where there is a filter in the join to exclude some rows from one of the tables.  Think about a LOV table.  One could make a join to a LOV and filter out the TYPE field to just ‘Case Status Types’.  Although this will work, it is poor design and can be missed in other joins.  It is best to put the filters in the LTS Content tab for the Dimension table it is mapped to.
Best of luck with your Physical Layer.  

2 comments: