I recently have been receiving emails for consulting work.  This is a good thing because I’m back on the market.  However; some of the emails I have been receiving are asking for PII.  This information can be used for identity theft.

Here’s an example: Asking for my date of birth and the last four digits of my Social Security Number.  There is no good reason for any head hunter to be asking for that information.  If you receive a query for a job asking for this information, hit the delete button.


Insert Performance analysis part 2

I promised I would have more today.  Here is the raw data from my analysis.  As you can see as rejection (dup_val_on_index) increased, the execution time for select + insert decreased and execution time increased for insert with exception.

This specific use case is a transactional system requiring  a unique index on ins_demo.column1 and we expected 20% of the time the unique index would be violated.

Here the two insert functions we are testing for performance.  They are really quite simple.  The first one is doing an insert into ins_demo and if there is a dup_val_on_index then return false. 

The second function does a select against ins_demo on the unique key.  If the key is not found then insert the value and return true, if the key is found then return false.

1 Begin 2 insert into ins_demo values (p_value); 3 return true; 4 exception when dup_val_on_index then 5 return false; 6 end;

1 begin 2 select count(*) 3 into dummy 4 from ins_demo 5 where column1 = p_value; 6 7 if dummy = 0 then 8 insert into ins_demo values (p_value); 9 return true; 10 else 11 return false; 12 end if; 13 end;

Here are the performance number for the two functions. I wrote a driver function that called the two functions 100,000 times and each once for a specific reject rate.


This bubble chart shows execution time on the y axis, rejection rate on the x axis and bubble size is the insert cpu time. 


We should all strive to analyze changes prior to making recommendations.  If I had gone with my knee jerk reaction to the select+insert and insisted on insert with an exception, system performance would have suffered.

This weekend I will run the same experiment using the merge command.  I expect the merge command to be the fastest option, but as always, we require hard data before we make any recommendations.

Insert performance on a unique constraint.

In the past, I have read about the performance of exception handlers and all that is said is, There is performance overhead. Okay, the generic answer of “performance overhead” tells us nothing.

One of the great pleasures of what I do is code reviews.  I get to learn techniques from other developers and I get to pass on some of my knowledge.  Code reviews are a win-win all around.

Recently, I reviewed some code that had the following code block to enforce a unique constraint. (tables and variables have been changed to protect intellectual property)  There is a unique constraint on ins_demo.column1, so the select will do a index unique scan for every select.  The use case for this example is a transactional system where different users will be imputing data.

select count(*)
into dummy
from ins_demo
where column1 = p_value;

if dummy = 0 then
insert into ins_demo values (p_value);
return true;
return false;
end if;

My knee jerk reaction is to change this from a select+insert to insert with an exception handler to improve performance.

insert into ins_demo values (p_value);
return true;
exception when dup_val_on_index then
return false;

Before making recommendations, make sure you first do no harm. So, in this case I decided to run some performance test against the select+insert and the insert with an exception.  The test will attempt to do 100,000 inserts starting will 100% success to 100% failures.

The results of the test have a few surprises.

Select+Insert 100% success:

execution time = 21.93 seconds

Insert CPU= 9.6

Select CPU = 2.55

Insert with exception 100% success:

execution time = 14.29 seconds

Insert CPU = 9.95

Recursive CPU = 8.96

 Select+Insert 100% reject due to duplicate rows:

execution time = 5.86 seconds

Insert CPU = 0

Select CPU = 1.73

Insert with exception 100% rejected due to duplicate rows:

execution time = 135.03 seconds

Insert CPU = 16.85

Exception CPU = 20.7

We can learn a few things from this.

1) In the select + insert method, as the number of rejections increased execution time decreased, insert CPU decreased and select CPU decreased.  This is due to a couple of things.  A unique index scan is faster when the data is found and as the number of rejected rows increase due to unique index violation, the fewer times an insert is executed.

2) In the insert with exception handler, execution time increases as the number of rows are rejected due to unique index violations increases.  This is because Oracle attempts to do the insert and then must roll the transaction back.

I will post a more detailed analysis with the raw performance data tomorrow.

displaying histograms of date and number data

It started with a bit of pl/sql in an ole’ library of mine. Knowing the old code was not the best way to display histogram data and If you can do something in one sql statement then that’s how you should do it.

Frequently when walking into a new environment we need to learn about the data.  The cardinality along with the clustering of the data has a big impact on the CBO, so understanding your data is critical to optimizing the performance of your database.

First I need to load up a table with some test data.  updt_stat simply takes the order date and returns a status.  This is to simulate and order entry system.

1 create or replace function updt_stat (pdate in date) return varchar2 as 2 value varchar2(10); 3 tmp number; 4 begin 5 tmp := dbms_random.value(0,10); 6 case 7 when pdate > sysdate - 50 and tmp > 9 8 then value := 'Canceled'; 9 when pdate <= sysdate - 50 10 then value := 'Shipped'; 11 when pdate >= sysdate - 50 and pdate < sysdate - 1 12 then value := 'Picked'; 13 when pdate >= sysdate 14 then value := 'Ordered'; 15 else 16 value := 'Back Order'; 17 end case; 18 return value; 19 end; 20 / 21

Then lets create a test table.

1 create table t1 ( 2 id number primary key, 3 order_date date not null, 4 col1 number not null, 5 status varchar2(25) not null); 6 7 alter table t1 modify id default t1_seq.nextval;

Now populate the table with some test data.

1 declare 2 --create or replace procedure simulate_order_entry as 3 4 i number; -- an index variable 5 sdate date; -- the date we are going to insert. 6 7 cursor c1 is 8 select dbms_random.value(0,10000) col1 9 from dual 10 connect by level <=36500; 11 begin 12 13 -- one year of data 14 select sysdate-365 into sdate from dual; 15 i := 1; 16 17 -- start the loop, 36,500 times 18 -- one year of data 19 20 for rec in c1 21 loop 22 if ( i / 100 = 1 ) then 23 i := 1; 24 sdate := sdate + 1; 25 end if; 26 27 insert into t1 (order_date, col1, status) 28 values 29 (sdate, rec.col1, updt_stat(sdate)); 30 31 i := i+1; -- i++ 32 end loop; 33 commit; 34 end; 35 / 36

It’s always been easy to display a histogram of text data. We’ve been doing this for decades.

1 RLOCKARD@pdev > select status, count(*) 2 from t1 3 group by status; 4 2 3 5 STATUS COUNT(*) 6 ------------------------- ---------- 7 Shipped 31284 8 Ordered 240 9 Picked 4382 10 Back Order 86 11 Canceled 508 12

But what about getting a histogram of the order date.  Oracle provides the function width_bucket that solves this problem.

Description of width_bucket.gif follows

1 RLOCKARD@pdev > select min(order_date), 2 max(order_date), 3 width_bucket(order_date, sysdate-365, sysdate+20, 12) bucket, 4 count(*) 5 from t1 6 group by width_bucket(order_date, sysdate-365, sysdate+20, 12) 7 order by 1; 8 2 3 4 5 6 7 9 MIN(ORDER MAX(ORDER BUCKET COUNT(*) 10 --------- --------- ---------- ---------- 11 03-SEP-13 03-SEP-13 0 99 12 04-SEP-13 05-OCT-13 1 3168 13 06-OCT-13 06-NOV-13 2 3168 14 07-NOV-13 08-DEC-13 3 3168 15 09-DEC-13 09-JAN-14 4 3168 16 10-JAN-14 10-FEB-14 5 3168 17 11-FEB-14 14-MAR-14 6 3168 18 15-MAR-14 15-APR-14 7 3168 19 16-APR-14 17-MAY-14 8 3168 20 18-MAY-14 18-JUN-14 9 3168 21 19-JUN-14 20-JUL-14 10 3168 22 21-JUL-14 21-AUG-14 11 3168 23 22-AUG-14 06-SEP-14 12 1553 24 25

What about displaying numeric data.  We change the parameters of width_bucket to numeric data.

1 RLOCKARD@pdev > select min(col1), 2 max(col1), 3 width_bucket(col1, 0, 10000, 10) bucket, 4 count(*) 5 from t1 6 group by width_bucket(col1, 0, 10000, 10) 7 order by 1; 8 2 3 4 5 6 7 9 MIN(COL1) MAX(COL1) BUCKET COUNT(*) 10 ---------- ---------- ---------- ---------- 11 .013540186 999.847819 1 3593 12 1000.32556 1999.71215 2 3619 13 2000.26786 2999.44229 3 3593 14 3000.33048 3999.69273 4 3746 15 4000.37659 4999.86446 5 3721 16 5000.02576 5999.80972 6 3565 17 6000.0222 6999.70225 7 3677 18 7000.05808 7999.67615 8 3644 19 8000.11224 8999.93555 9 3644 20 9000.31992 9999.63753 10 3698 21

Why does the Oracle cbo interest me so much?

There is a short answer and a very long answer to this.  I will try and keep the answer somewhere in the middle of short and long.

The first assumption we must accept and from my observations over the past several years, most Oracle developers don’t understand the Cost Based Optimizer, yet they use it everyday.

Way back in the dark ages, full table scans were evil and we used hints to get rid of this dreaded access path.  As you can see from the bellow query, we got a full table scan of t1

So we solve this by adding a hint.

That’s better, we think but lets look at a few things.  Did the hint really help? The first query had 70,308 consistent gets, 3,884 physical reads and cost was 1,065.  The second query where we thought we were being cleaver had 1,057,564 consistent gets, 5868 physical reads and the cost was 990K.

Well, maybe I was not so cleaver by forcing Oracle to use an index, maybe the CBO is smart enough to put together the correct plan and the full table scan is not evil after all.

A few years ago I was at ODTUG and heard Maria Colgan speaking about the CBO and I was hooked.  Maria made me realize I need to understand what the CBO was doing.  Yes sometimes the CBO gets it wrong, but there are ways to correct that.

I’m now putting together a series of presentations on understanding the CBO using the scientific approach.  Why not just publish how the CBO works? Why not publish ROT (Rule Of Thumb) for the CBO.  Well the answer to that is quite simple.  Every environment is different and it’s much better to understand your data and how the CBO works with your data and queries.  There will be a presentations that will go through how to read an explain plan, however the other presentations will teach you how to establish what the CBO is doing in your environment and you will be able to explain it.  That makes you smarter and more valuable.

Stay tuned, I will be posting blurbs and hints on the CBO.

A must read for anyone who #manages people

I have had the pleasure to work with some great managers who know about big boy/girl rules and sadly some micromanagers who well micromanage. Those who know me well can imagine just how well I’ve gotten along with micromanagers.


Oracle 10053 trace viewer

There are two trace files that I use when tuning.  Cary Millsap @CaryMillsap and Method-R @MethodR have a great tool for analyzing the 10046 trace file. There is the profiler and my personal favorite mrskew http://method-r.com/store .

But the 10053 trace file is cumbersome to read and interpret.  Jonathan Lewis @JLOracle posted this 10053 trace file viewer on oaktable.net. http://www.oaktable.net/contribute/10053-viewer .

If you are an Oracle DBA, Developer or Performance Engineer, these should be in your tool box.