SQL Performance in Subselects | KEVIN RESCHENBERG 04-11-2004 |
One of the easiest ways to improve performance of SQL is to ensure that we are using as many of the keys of a table as possible in subselects. This is the first thing
I check when asked to review a process that is showing poor performance.
For example, many installations don't use SetID functionality for particular tables (that is, they use only one SetID for all rows of a table). Programmers might then ignore
the SetID when writing SQL:
SELECT ...
FROM PS_JOB JOB
, PS_DEPT_TBL DEP
WHERE ...
AND DEP.DEPTID = JOB.DEPTID
AND DEP.EFFDT =
(SELECT MAX(EFFDT)
FROM PS_DEPT_TBL
WHERE DEPTID = DEP.DEPTID
AND EFFDT <= ...
)
In this example, the SETID column was not used at all. If you use only one SetID in the department table (and you are certain that this will never change!), the SQL will work.
But it is inefficient because the database cannot use the main index for the table. The code should be rewritten:
SELECT ...
FROM PS_JOB JOB
, PS_DEPT_TBL DEP
WHERE ...
AND DEP.SETID = JOB.SETID_DEPT
AND DEP.DEPTID = JOB.DEPTID
AND DEP.EFFDT =
(SELECT MAX(EFFDT)
FROM PS_DEPT_TBL
WHERE SETID = DEP.SETID
AND DEPTID = DEP.DEPTID
AND EFFDT <= ...
)
Notice that SETID is being used both in the outer part (DEP.SETID = JOB.SETID_DEPT ) and in the subselect (SETID = DEP.SETID ). Each of these
lines will improve performance. The performance gains can be dramatic. In one recent case we saw the run time of a heavily used view drop by 87% after a missing key
was added to the subselect.
There are cases in which you do not want to join on a particular key. It depends on your requirements. But I'd guess that over 95% of the time you would want to
use all keys as in the example above. If you have a poorly performing report or process, look over your SQL to see if this might be the cause.
|