In our ETL process, we had a data flow in our ETL tool which took about an hour to pump data from one database table into another. I tried to speed it up by converting that data flow to an SQL stored procedure, but found that some of the results were different.
After quite a bit of searching, I found a test case where the correct result and the calculated result differed in small numbers (12.33 and 14.37). I modified the select statement from the stored procedure to give me only that test case and found that it gave me the correct answer! Strange.
The statement in question was a UNION of two SELECT statements; since I knew that the test case in question could only have come from the first SELECT, I had removed the second part in my test. When I restored the second, I got the wrong answer again. Bizarre. Should the second statement be introducing the mistake? But when I executed only the second statement, I got no rows at all (as I expected).
I mentioned it to a colleague and he told me about having to be careful with UNION, and considering whether UNION or UNION ALL is what is wanted. Turned out that this was the problem: there were two rows with 12.33 and two with 14.37 and the UNION was squashing them into one since I was selecting only a subset of the columns so they seemed “the same” to it. So apparently all that was needed to fix the problem was adding “
(I like to think I'm fairly decent at SQL, but I haven't used UNION before, so I wasn't aware of this subtlety.)