Overview
In this post, we give an example to show how a function-based index in Oracle can increase the speed of a query where null and not null values are being compared.
Our customer asked us to modify their application to enable an ETL tool to collect invoice records from an Oracle database. The invoice table contained upwards of 8 million records, but we were really only interested in looking at records which had been flagged for collection but not yet flagged as collected. However, the ETL tool’s method of flagging these records did not allow us to use a simple bitmap index to quickly retrieve the rows we needed.
What is ETL?
Extract, Transform, Load refers to a process that extracts data from outside sources (e.g. a file produced by a customer), transforms it to fit operational needs, and then loads it into your target database.
You can read about an example ETL tool, Pentaho Data Integration on our Technologies page.
Invoices example
Consider the following example data set for our Invoices table
In this example, the two columns with names starting ETL_ behave like flags. The possible values are null or a varchar2 value which happens to be based on a timestamp. For the purposes of this exercise, we are only interested in whether the values are null or not null.
SELECT * FROM INVOICES
WHERE ETL_collect IS NOT NULL
AND ETL_done IS NULL;
which would retrieve the following rows
Optimising the query’s performance using an index
In a table containing several million rows, this query would take too long to run. Adding a B-Tree index on ETL_collect and ETL_done wouldn’t help because the null values would not be included in the index. Bitmap indexes do include nulls but we have such a range of different not null values in our table, that a bitmap index is not the best choice.
We need to transform our ETL columns from null / not null values to 0 / 1 and then use an index to optimise the query. First we create a function that tests if a value is null, and returns a 1 or 0 accordingly –
CREATE OR REPLACE FUNCTION ETL_MATCH (
field_in IN INTEGER
) RETURN INTEGER
DETERMINISTIC
IS
BEGIN
if field_in is not null then
RETURN(1);
ELSE
RETURN(0);
END IF;
END;
By declaring this function to be DETERMINISTIC, we indicate that the function returns the same result value whenever it is called with the same argument.
Now we create an index on our ETL columns which makes use of this function on both of the columns we are testing –
CREATE INDEX ETL_Match_idx
ON Invoices (ETL_Match(ETL_collect), ETL_Match(ETL_done));
And now to revisit our original query. To force it to use this index, we need to reference the indexed columns in the where clause, and we can do that as follows:
SELECT * FROM INVOICES
WHERE ETL_Match(ETL_collect) = 1
AND ETL_Match(ETL_done) = 0;
Results
With an underlying invoices table containing more than 8 million records, the original query took 17 seconds to retrieve 70 rows. The new query using a function-based index returns 70 rows in less than 3 seconds.
Can we help you?
If you have a database or application which is running slowly, we can help you make it perform better. Please give us a call or use the contact us form on the right, and we will get in touch.