At a recent running of our Introduction to SQL course, a delegate asked me a great question, when we were on the subject of joining multiple tables. He had a requirement to link two tables, based on values that were not held in different fields of the same record, but in the same field in different records. The platform was SQL Server 2005.
Problem
Specifically, he had a table which held details of stocks traded. For simplicity, let’s just image it has two columns – amount_sold and date_sold. Such a table might look like this:
SELECT amount_sold, date_sold FROM stock_sales amount_sold date_sold ----------- ----------------------- 1700 2010-02-15 00:59:49.190 1900 2010-02-16 05:47:49.203 1400 2010-02-24 20:11:45.873
The table he needed to join to held exchange rate values at different points in time. Again stripped down to the columns relevant to our problem, we could imagine it would look something like this:
SELECT rate, effective_date FROM exchange_rate rate effective_date --------- ----------------------- 1.09 2010-01-01 00:00:00.000 1.12 2010-02-17 11:14:00.000 1.2 2010-02-20 13:24:00.000
What he wanted to do – as you can probably already anticipate – is to find the appropriate exchange rate for each sale, in other words to join the stock_sales table to exchange_rate, where date_sold is greater than or equal to effective_date, but less than effective_date in the next exchange_rate record.
Great problem! In 12 years I find it difficult to believe I haven’t come across this type of problem before, but if I have I can’t remember it! So it gave me some thinking to do.
Here is my answer to the problem. Please let me know if you can find a better one!
My solution
My immediate thought was that if the exchange_rate table was in a format that held “effective from” and “effective to” columns, then the solution would be easy to achieve with a standard non-equijoin. So if it looked like this:
rate effective_date_from effective_date_to ------- ----------------------- ----------------------- 1.09 2010-01-01 00:00:00.000 2010-02-17 11:14:00.000 1.12 2010-02-17 11:14:00.000 2010-02-20 13:24:00.000 1.2 2010-02-20 13:24:00.000 2010-02-27 14:44:49.830
Then this query would give him the data he needed.
SELECT amount_sold, date_sold, rate FROM stock_sales INNER JOIN exchange_rate ON (stock_sales.date_sold >= exchange_rate.effective_date_from AND stock_sales.date_sold < exchange_rate.effective_date_to) amount_sold date_sold rate ----------- ----------------------- ------ 1700 2010-02-15 00:59:49.190 1.09 1900 2010-02-16 05:47:49.203 1.09 1400 2010-02-24 20:11:45.873 1.2
So, if I could get my exchange_rate data into that format, then I could use an inline view to get my desired result. (I could also use a common table expression, and this would be slightly easier, but as these weren’t covered on the course I was delivering I preferred to stick to inline views). So how to get my data in the format I wanted?
I thought a good starting point would be to generate two copies of the exchange_rate table data. If I could then give each one an ID column, then I could join them together, matching each row in the first table to the following row in the second table. Then the effective_date column from the first table would be my “effective_date_to” column, and the same column from the second table would be “effective_date_from”. To generate the “ID” column for each, I used the ROW_NUMBER() function.
The query
SELECT ROW_NUMBER() OVER (ORDER BY effective_date) rownum, rate, effective_date FROM exchange_rate
gives me my first copy of the data, with a numeric “key” against each record, like so:
rownum rate effective_date -------- ------- ----------------------- 1 1.09 2010-01-01 00:00:00.000 2 1.12 2010-02-17 11:14:00.000 3 1.2 2010-02-20 13:24:00.000
I can therefore slightly adapt that, subtracting 1 from the generated number, to provide a second copy of the data:
SELECT (ROW_NUMBER() OVER (ORDER BY effective_date))-1 rownum, rate, effective_date FROM exchange_rate rownum rate effective_date -------- ------- ----------------------- 0 1.09 2010-01-01 00:00:00.000 1 1.12 2010-02-17 11:14:00.000 2 1.2 2010-02-20 13:24:00.000
Each of these queries can therefore be made into inline views, and these can then be joined on the rownum column. This will join together records in the first result set with the next record on in the second. (If I was happy to use a common table expression I wouldn’t need to take this second step, I could have declared a CTE from the first query and used it twice.)
SELECT exchange_rate1.rate
, exchange_rate1.effective_date AS effective_date_from
, exchange_rate2.effective_date AS effective_date_to
FROM
( SELECT ROW_NUMBER() OVER (ORDER BY effective_date) rownum, rate, effective_date
FROM exchange_rate) exchange_rate1
LEFT OUTER JOIN (SELECT (ROW_NUMBER() OVER (ORDER BY effective_date))-1 rownum, rate, effective_date
FROM exchange_rate ) exchange_rate2
ON (exchange_rate1.rownum = exchange_rate2.rownum)
rate effective_date_from effective_date_to
------ ----------------------- -----------------------
1.09 2010-01-01 00:00:00.000 2010-02-17 11:14:00.000
1.12 2010-02-17 11:14:00.000 2010-02-20 13:24:00.000
1.2 2010-02-20 13:24:00.000 NULL
This is now pretty close to what we want. I just need to handle trades made since the last exchange_rate became effective – i.e the NULL that appears at the end of my result set. There are a couple of ways to do it, but for simplicity I will just use ISNULL to replace the null with any future date. Any future date is fine, as no future trades are likely to be recorded in my stock_sales table.
SELECT exchange_rate1.rate
, exchange_rate1.effective_date AS effective_date_from
, ISNULL(exchange_rate2.effective_date, DATEADD(day, 1, getDate())) effective_date_to
FROM
( SELECT ROW_NUMBER() OVER (ORDER BY effective_date) rownum, rate, effective_date
FROM exchange_rate) exchange_rate1
LEFT OUTER JOIN (SELECT (ROW_NUMBER() OVER (ORDER BY effective_date))-1 rownum, rate, DATEADD(millisecond, -1, effective_date) effective_date
FROM exchange_rate ) exchange_rate2
ON (exchange_rate1.rownum = exchange_rate2.rownum)
rate effective_date_from effective_date_to
------ ----------------------- -----------------------
1.09 2010-01-01 00:00:00.000 2010-02-17 11:14:00.000
1.12 2010-02-17 11:14:00.000 2010-02-20 13:24:00.000
1.2 2010-02-20 13:24:00.000 2010-02-26 15:38:29.493
Having got my data in to the format I was looking for, I can now join in my stock_sales table using a standard non-equijoin:
SELECT amount_sold, date_sold, rate FROM stock_sales INNER JOIN (SELECT exchange_rate1.rate , exchange_rate1.effective_date AS effective_date_from , ISNULL(exchange_rate2.effective_date, DATEADD(minute, 1, getDate())) effective_date_to FROM ( SELECT ROW_NUMBER() OVER (ORDER BY effective_date) rownum, rate, effective_date FROM exchange_rate) exchange_rate1 LEFT OUTER JOIN (SELECT (ROW_NUMBER() OVER (ORDER BY effective_date))-1 rownum, rate, DATEADD(millisecond, -1, effective_date) effective_date FROM exchange_rate ) exchange_rate2 ON (exchange_rate1.rownum = exchange_rate2.rownum) ) exchange_rates_range ON (stock_sales.date_sold >= exchange_rates_range.effective_date_from AND stock_sales.date_sold < exchange_rates_range.effective_date_to) amount_sold date_sold rate ----------- ----------------------- ---------------------- 1700 2010-02-15 00:59:49.190 1.09 1900 2010-02-16 05:47:49.203 1.09 1400 2010-02-24 20:11:45.873 1.2
Conclusion
This was a nice little problem, and one which I’m sure will have many practical uses. I would be interested in hearing anyone else’s alternative ways to solve it.
About the poster

Dean Bullen is Impartica’s SQL course leader and has been working with SQL since 1998. He previously worked for Oracle and has been working with SQL Server since 2006.