Junk Dimensions – a solution to a tricky problem

I’ve been struggling to…

…work out how to cope with a very specific problem related to my database and had it in the back of my mind that a Junk Dimension might do the trick. I have been lucky enough to spend a good part of my career testing Kimball style warehouses, so most of his concepts are familiar to me, however, I’ve never had occasion to put them into practice.

The problem that I face…

…is that I want to hold a set of indicators on each customer to reflect whether they are a member of any of a set of 500+ groups, but more importantly I want to be able to quickly find other customers who do not hold the same set of group memberships. When I first modeled this I thought that I would simply hold the customers in a dimension, the groups in a dimension and create a fact table containing a foreign key to the customer dimension and a foreign key to the group dimension for those groups that the customer was a member of. Therefore having multiple records for each customer, so for example, if a customer was a member of 130 groups, he would have 130 records in the fact table  (one for each group that he was a member of). This seemed ok (and a better idea that creating 500 columns on the fact table [one for each group]) until I started to write a query to return customers who were not members of  the same set of groups as a specific customer. Yes, I could write the query and get the data, but the amount of processing that would be going on to get this information seemed excessive

….so I brought in a Junk dimension.

At first I was a bit worried that I might have too many combinations of values to make any performance gain, but with just over 500 groups and only two possible flags (0 or 1) the table will contain ~250,000 rows which doesn’t seem excessive.  So I created my junk dimension with a groupMemberID as its primary key (an integer) and each group as a column. Each column contains a memberFlag which is a bit (0,1). My fact table now contains the customerID plus groupMemberId (the foreign key to the groupMemberJunkDimension) (plus some other stuff).

Using this junk table I can write a very simple query in my stored procedure which says

SELECT customerID from customerGroupFact
WHERE groupMemberID <> @groupMemberId

A simple solution to an awkward problem.

Got a Resource id #5 appearing, don’t worry!

So I was echoing out my $query and my $result variables and I coded my site in order to check my SQL statements and I came across a value of ‘Resource id #5’ in my echo values where I instead expected to see a customerID.

The answer? Well,  you can’t echo the result of a sql query,   i.e you can’t do this:

$query = "SELECT CustomerID  FROM CUSTOMER WHERE emailAddress = '".$emailAddress."'";
echo $result;

instead you need to identify which item in $result you want to echo (and for which row). In my instance the item was CustomerID and  I thought it would be good to iterate through all the rows in the resultset, so I echoed it like this:

$query = "SELECT CustomerID  FROM CUSTOMER WHERE emailAddress = '".$emailAddress."'";

//Get the number of rows in the recordset

//Set the initial value of the counter to zero (the first position in the resultset)

//iterate through the resulset and echo the value of the CustomerID for each record
while ($i < $num) {
echo mysql_result($result,$i,"CustomerID");

It’s an easy mistake to make, but a bit thoughtless when you think about it..after all, how can we expect to echo a recordset? I don’t think I’ll be making that mistake again!!

How to post the values of set checkboxes back to the database – Stage 5 complete!

Much play has stopped work for a number of days, but tonight I picked up my laptop and went back to the problem that I had in passing the values of the checkboxes that the user had checked back to the database.

Why on earths name I thought I might need to use javascript I have no idea…too much internet reading of the wrong thing I think.

Anyway, here’s how I managed to solve the problem of presenting the list of items from the database with checkboxes and then updating the database with values of the checked checkboxes. The first part gets the items from the database and displays them on the page:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Diplay data as checkboxes</title>


// Include connection to my local server
//Connect to the database - 
mysql_connect ($server, $username, $password, $dbName) or die ('Error: ' .mysql_error());
//Create the query
$query="SELECT * FROM TableName WHERE columnName = 1";
//Define the variable that will hold the result set
//Find out how many records there are in the resultset

<!-- Create the form - this will tell the page which php script to get when the form is submitted --->
<form action="process_changes.php" method="post">

<!-- go into php scripting to obtain each value individually from the resultset --->
//Set up the loop to loop thru the result set
while ($i < $num) {
//set the value from the current row of the resultset to the $field1 variable

<!-- Display the checkbox on the page containing the value from the current row of the resultset --->
  <input type="checkbox" name="items[]" value="<?php echo $field1;?>" /> <?php echo $field1;?><br />

<!-- break back into php to increment the variable - and therefore go to the next row of the resultset -->
	//increment the counter


<!-- Place a submit button on the webpage, so that the checkbox values selected are passed when the button is clicked -->
  <input type="submit" value="Submit" />


The second part is the script process_changes.php (which is called from the previous script). This script gets the items array (which holds the list of checkbox values) from previous page and builds an SQL INSERT statement for each item. This may not be the way that you want to send your inserts to the database, but it gives you an idea anyway.


$checkBox = $_POST['items'];
for($i=0; $i<sizeof($checkBox); $i++){
    $query = "INSERT INTO table(TableToInsertInto) values('".$checkBox[$i]."')";
//	echo $query;
   mysql_query($query) or die(mysql_error());


So Stage 5 is now complete – I now just need to collect some more data from my user and write some more update and insert queries and then I get to work on Stage 7, which is finding out how to send a verification e-mail to a user and then process it’s return. I expect that this will be pretty easy since it’s the corner stone of a lot of web actions. I’m more than 25% of the way through now.  It’s great to be doing something and not just thinking about it.

We have reference data and a page with dynamic content

My reference data table is now loaded and I have now managed to get the data from my database and out onto the lovely php page, I have dymamically generated content!

I think I can say that stage four is now complete.

Oh I found out today that sometimes, if you don’t restart your webserver after making code changes, the code hangs onto old values in some it’s variables, so now I’m doing a webserver restart after every major update to the code…I’m nots sure yet if this is normal, or if it’s just my code that’s bad…I need to do more reading a learn a bit more before I figure this out.

The easy part – mySQL

If there is one part of this that I’ve found easy it’s the creation of my database. I can’t believe that mySQL & the tool mySQL Workbench is free. For anyone who is used to creating databases with something like MS SQL Server it is cinch to move across too. Although I personally find it easier to hand code my DDL for a database, the design tool does look useful for anyone new to SQL. I found that it was useful to be able to reverse engineer my database using the design tool so that I could just do a quick visual check to make sure that I had implemented the correct foreign keys.

There are certainly syntax differences between MS SQL Server and mySQL, but nothing that a quick lookup of the mySQL manual doesn’t fix in a few seconds. I’ve yet to write a stored procedure in mySQL and it might get more difficult when I’m trying to use T-SQL everywhere, but I’m sure there’s a T-SQL to mySQL crib sheet out there on the internet somewhere.

By the way, it took my a while to work out that command + return executes only the highlighted statement….I was cursing the inability to execute selected statements for a few hours then I realised that there were two execute buttons. Just thought I’d mention it case anyone else who had moved from SQL Server Management Studio was as dumb as me.

Which Development Environment?

I’ve been trying to find out which IDE I should use to start on my journey.

(Why does this wordpress do double line spacing?)

In my google search for IDE’s on a Mac I came across the blog of a guy called Sean Barton (http://www.sean-barton.co.uk/2009/02/setting-up-a-phpmysql-local-development-environment-on-a-mac-doing-it-properly/) and thought, that looks good, I’ll learn how to set up a php and mysql development environment on my mac. So, I followed his instructions and downloaded and installed mysql then configured apache to load the php module (I think this is the right terminology!).

The big question is, was it simple? The answer is, not as simple as I thought it would be. Below are the things that I learned along the way…they might be useful for another complete newbie.

Tip1: It’s the MySQL Community Server edition that you want to download ( I hope)

Tip 2: Make sure you use a mirror from the mysql site to get a .dmg file and not a .tar file.

Tip 3: To a copy a previous command in a Terminal window use crtl+P. For a full list of commands see this helpful bash cheat sheet

Tip 4:  How do you edit a file using vi? Answer: I”ve no idea – I used nano [to do this I substitued the word pico for the word vi when I executed this command ]

sudo vi /private/etc/apache2/httpd.conf

Tip 5: How to edit a file using nano? Answer : Wherever you see the ^ sign means hold the crtl key! – This site contains a neat crib sheet.

Tip 6: apachectl is not the same as apachrct1 – be careful of 1s and ls when looking at text which uses a terminal style font

Tip 7:  The command sudo /usr/sbin/apachectl restart doesn’t work. It errors with:

line 82: ulimit: open files: cannot modify limit: Invalid argument

The guy at this deversus page explain a way around it, but to be honest I just ignored it and continued.

Anyway, after removing the index.html page from the /Library/WebServer/Documents folder and navigating to in my browser and pressing refresh a few times,  I finally saw a page come up containing a question mark on it…so I guess my environment is now working!!

So, lots of new things learned today. Roll on tomorrow when I’ll be working out what I should do next!