I’m not so sure now that the Junk Dimension will work. For a start, I’m not much of a maths expert so I can’t work out the formula to populate the table with all the required combinations…any experts out there, then let me know!
Apart from that, I’m taking a break, so there will be no more updates for a short while, for a start I don’t have internet access and my mind is elsewhere.
I’ve been struggling to…
…work out how to cope with a very specific problem related to my database and had it in the back of my mind that a Junk Dimension might do the trick. I have been lucky enough to spend a good part of my career testing Kimball style warehouses, so most of his concepts are familiar to me, however, I’ve never had occasion to put them into practice.
The problem that I face…
…is that I want to hold a set of indicators on each customer to reflect whether they are a member of any of a set of 500+ groups, but more importantly I want to be able to quickly find other customers who do not hold the same set of group memberships. When I first modeled this I thought that I would simply hold the customers in a dimension, the groups in a dimension and create a fact table containing a foreign key to the customer dimension and a foreign key to the group dimension for those groups that the customer was a member of. Therefore having multiple records for each customer, so for example, if a customer was a member of 130 groups, he would have 130 records in the fact table (one for each group that he was a member of). This seemed ok (and a better idea that creating 500 columns on the fact table [one for each group]) until I started to write a query to return customers who were not members of the same set of groups as a specific customer. Yes, I could write the query and get the data, but the amount of processing that would be going on to get this information seemed excessive
….so I brought in a Junk dimension.
At first I was a bit worried that I might have too many combinations of values to make any performance gain, but with just over 500 groups and only two possible flags (0 or 1) the table will contain ~250,000 rows which doesn’t seem excessive. So I created my junk dimension with a groupMemberID as its primary key (an integer) and each group as a column. Each column contains a memberFlag which is a bit (0,1). My fact table now contains the customerID plus groupMemberId (the foreign key to the groupMemberJunkDimension) (plus some other stuff).
Using this junk table I can write a very simple query in my stored procedure which says
SELECT customerID from customerGroupFact
WHERE groupMemberID <> @groupMemberId
A simple solution to an awkward problem.