At a recent running of our Introduction to SQL course, a delegate asked me a great question, when we were on the subject of joining multiple tables. He had a requirement to link two tables, based on values that were not held in different fields of the same record, but in the same field in different records. The platform was SQL Server 2005.

Problem

Specifically, he had a table which held details of stocks traded. For simplicity, let’s just image it has two columns – amount_sold and date_sold. Such a table might look like this:

SELECT amount_sold, date_sold FROM stock_sales 

amount_sold  date_sold
-----------  -----------------------
1700         2010-02-15 00:59:49.190
1900         2010-02-16 05:47:49.203
1400         2010-02-24 20:11:45.873

The table he needed to join to held exchange rate values at different points in time. Again stripped down to the columns relevant to our problem, we could imagine it would look something like this:

SELECT rate, effective_date FROM exchange_rate 

rate      effective_date
--------- -----------------------
1.09      2010-01-01 00:00:00.000
1.12      2010-02-17 11:14:00.000
1.2       2010-02-20 13:24:00.000

What he wanted to do – as you can probably already anticipate – is to find the appropriate exchange rate for each sale, in other words to join the stock_sales table to exchange_rate, where date_sold is greater than or equal to effective_date, but less than effective_date in the next exchange_rate record.

Great problem! In 12 years I find it difficult to believe I haven’t come across this type of problem before, but if I have I can’t remember it! So it gave me some thinking to do.

Here is my answer to the problem. Please let me know if you can find a better one!

My solution

My immediate thought was that if the exchange_rate table was in a format that held “effective from” and “effective to” columns, then the solution would be easy to achieve with a standard non-equijoin. So if it looked like this:

rate    effective_date_from     effective_date_to
------- ----------------------- -----------------------
1.09    2010-01-01 00:00:00.000 2010-02-17 11:14:00.000
1.12    2010-02-17 11:14:00.000 2010-02-20 13:24:00.000
1.2     2010-02-20 13:24:00.000 2010-02-27 14:44:49.830

Then this query would give him the data he needed.

SELECT amount_sold, date_sold, rate
FROM stock_sales
INNER JOIN exchange_rate
ON (stock_sales.date_sold >= exchange_rate.effective_date_from AND stock_sales.date_sold < exchange_rate.effective_date_to)

amount_sold date_sold               rate
----------- ----------------------- ------
1700        2010-02-15 00:59:49.190 1.09
1900        2010-02-16 05:47:49.203 1.09
1400        2010-02-24 20:11:45.873 1.2

So, if I could get my exchange_rate data into that format, then I could use an inline view to get my desired result. (I could also use a common table expression, and this would be slightly easier, but as these weren’t covered on the course I was delivering I preferred to stick to inline views). So how to get my data in the format I wanted?

I thought a good starting point would be to generate two copies of the exchange_rate table data. If I could then give each one an ID column, then I could join them together, matching each row in the first table to the following row in the second table. Then the effective_date column from the first table would be my “effective_date_to” column, and the same column from the second table would be “effective_date_from”. To generate the “ID” column for each, I used the ROW_NUMBER() function.

The query

SELECT ROW_NUMBER() OVER (ORDER BY effective_date) rownum, rate, effective_date FROM exchange_rate

gives me my first copy of the data, with a numeric “key” against each record, like so:

rownum   rate    effective_date
-------- ------- -----------------------
1        1.09    2010-01-01 00:00:00.000
2        1.12    2010-02-17 11:14:00.000
3        1.2     2010-02-20 13:24:00.000

I can therefore slightly adapt that, subtracting 1 from the generated number, to provide a second copy of the data:

SELECT (ROW_NUMBER() OVER (ORDER BY effective_date))-1 rownum, rate, effective_date FROM exchange_rate 

rownum   rate    effective_date
-------- ------- -----------------------
0        1.09    2010-01-01 00:00:00.000
1        1.12    2010-02-17 11:14:00.000
2        1.2     2010-02-20 13:24:00.000

Each of these queries can therefore be made into inline views, and these can then be joined on the rownum column. This will join together records in the first result set with the next record on in the second. (If I was happy to use a common table expression I wouldn’t need to take this second step, I could have declared a CTE from the first query and used it twice.)

SELECT exchange_rate1.rate
,      exchange_rate1.effective_date AS effective_date_from
,      exchange_rate2.effective_date AS effective_date_to
FROM
  (  SELECT ROW_NUMBER() OVER (ORDER BY effective_date) rownum, rate, effective_date
     FROM exchange_rate) exchange_rate1
LEFT OUTER JOIN (SELECT (ROW_NUMBER() OVER (ORDER BY effective_date))-1 rownum, rate, effective_date
                 FROM   exchange_rate ) exchange_rate2
ON (exchange_rate1.rownum = exchange_rate2.rownum)

rate   effective_date_from     effective_date_to
------ ----------------------- -----------------------
1.09   2010-01-01 00:00:00.000 2010-02-17 11:14:00.000
1.12   2010-02-17 11:14:00.000 2010-02-20 13:24:00.000
1.2    2010-02-20 13:24:00.000 NULL

This is now pretty close to what we want. I just need to handle trades made since the last exchange_rate became effective – i.e the NULL that appears at the end of my result set. There are a couple of ways to do it, but for simplicity I will just use ISNULL to replace the null with any future date. Any future date is fine, as no future trades are likely to be recorded in my stock_sales table.

SELECT exchange_rate1.rate
,      exchange_rate1.effective_date AS effective_date_from
,      ISNULL(exchange_rate2.effective_date, DATEADD(day, 1, getDate())) effective_date_to
FROM
  (  SELECT ROW_NUMBER() OVER (ORDER BY effective_date) rownum, rate, effective_date
     FROM exchange_rate) exchange_rate1
LEFT OUTER JOIN (SELECT (ROW_NUMBER() OVER (ORDER BY effective_date))-1 rownum, rate, DATEADD(millisecond, -1, effective_date) effective_date
                 FROM   exchange_rate ) exchange_rate2
ON (exchange_rate1.rownum = exchange_rate2.rownum)

rate   effective_date_from     effective_date_to
------ ----------------------- -----------------------
1.09   2010-01-01 00:00:00.000 2010-02-17 11:14:00.000
1.12   2010-02-17 11:14:00.000 2010-02-20 13:24:00.000
1.2    2010-02-20 13:24:00.000 2010-02-26 15:38:29.493

Having got my data in to the format I was looking for, I can now join in my stock_sales table using a standard non-equijoin:

SELECT amount_sold, date_sold, rate FROM stock_sales
INNER JOIN
	(SELECT exchange_rate1.rate
	 ,      exchange_rate1.effective_date AS effective_date_from
	 ,      ISNULL(exchange_rate2.effective_date, DATEADD(minute, 1, getDate())) effective_date_to
	 FROM
	   (  SELECT ROW_NUMBER() OVER (ORDER BY effective_date) rownum, rate, effective_date
 		 FROM exchange_rate) exchange_rate1
 	 LEFT OUTER JOIN (SELECT (ROW_NUMBER() OVER (ORDER BY effective_date))-1 rownum, rate, DATEADD(millisecond, -1, effective_date) effective_date
					 FROM   exchange_rate ) exchange_rate2
	 ON (exchange_rate1.rownum = exchange_rate2.rownum)
	 ) exchange_rates_range
ON (stock_sales.date_sold >= exchange_rates_range.effective_date_from AND stock_sales.date_sold < exchange_rates_range.effective_date_to)

amount_sold date_sold               rate
----------- ----------------------- ----------------------
1700        2010-02-15 00:59:49.190 1.09
1900        2010-02-16 05:47:49.203 1.09
1400        2010-02-24 20:11:45.873 1.2

Conclusion

This was a nice little problem, and one which I’m sure will have many practical uses. I would be interested in hearing anyone else’s alternative ways to solve it.

 

About the poster

Dean Bullen

Dean Bullen is Impartica’s SQL course leader and has been working with SQL since 1998. He previously worked for Oracle and has been working with SQL Server since 2006.