Displaying histograms of date and number data

It started with a bit of PL/SQL in an old library of mine. I knew the old code was not the best way to display histogram data, and if you can do something in one SQL statement, then that's how you should do it.

Frequently, when we walk into a new environment, we need to learn about the data. The cardinality and the clustering of the data have a big impact on the CBO, so understanding your data is critical to optimizing the performance of your database.
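One quick way to look at cardinality and clustering is to query the data dictionary. This is not part of the original examples, just a convenient check you can run once the test table below is built and analyzed:

-- column-level cardinality information the CBO will use
select column_name, num_distinct, num_nulls, density, histogram
from   user_tab_col_statistics
where  table_name = 'T1';

-- clustering information for any indexes on the table
select index_name, distinct_keys, clustering_factor
from   user_indexes
where  table_name = 'T1';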

First I need to load up a table with some test data. updt_stat simply takes the order date and returns a status; this is to simulate an order entry system.

create or replace function updt_stat (pdate in date) return varchar2 as
  value varchar2(10);
  tmp   number;
begin
  tmp := dbms_random.value(0,10);
  case
    when pdate > sysdate - 50 and tmp > 9
      then value := 'Canceled';
    when pdate <= sysdate - 50
      then value := 'Shipped';
    when pdate >= sysdate - 50 and pdate < sysdate - 1
      then value := 'Picked';
    when pdate >= sysdate
      then value := 'Ordered';
    else
      value := 'Back Order';
  end case;
  return value;
end;
/

Then let's create a test table.

-- the default on id below needs a sequence; create it first
create sequence t1_seq;

create table t1 (
  id         number primary key,
  order_date date not null,
  col1       number not null,
  status     varchar2(25) not null);

alter table t1 modify id default t1_seq.nextval;

Now populate the table with some test data.

declare
  -- create or replace procedure simulate_order_entry as
  i     number;  -- an index variable
  sdate date;    -- the date we are going to insert

  cursor c1 is
    select dbms_random.value(0,10000) col1
    from   dual
    connect by level <= 36500;
begin
  -- one year of data
  select sysdate-365 into sdate from dual;
  i := 1;

  -- start the loop, 36,500 times:
  -- 100 rows per day for one year of data
  for rec in c1
  loop
    if ( i / 100 = 1 ) then
      i := 1;
      sdate := sdate + 1;
    end if;

    insert into t1 (order_date, col1, status)
    values
      (sdate, rec.col1, updt_stat(sdate));

    i := i + 1; -- i++
  end loop;
  commit;
end;
/

It’s always been easy to display a histogram of text data. We’ve been doing this for decades.

RLOCKARD@pdev > select status, count(*)
  2  from t1
  3  group by status;

STATUS                     COUNT(*)
------------------------- ----------
Shipped                        31284
Ordered                          240
Picked                          4382
Back Order                        86
Canceled                         508

But what about getting a histogram of the order date? Oracle provides the width_bucket function, which solves this problem.


RLOCKARD@pdev > select min(order_date),
  2    max(order_date),
  3    width_bucket(order_date, sysdate-365, sysdate+20, 12) bucket,
  4    count(*)
  5  from t1
  6  group by width_bucket(order_date, sysdate-365, sysdate+20, 12)
  7  order by 1;

MIN(ORDER MAX(ORDER     BUCKET   COUNT(*)
--------- --------- ---------- ----------
03-SEP-13 03-SEP-13          0         99
04-SEP-13 05-OCT-13          1       3168
06-OCT-13 06-NOV-13          2       3168
07-NOV-13 08-DEC-13          3       3168
09-DEC-13 09-JAN-14          4       3168
10-JAN-14 10-FEB-14          5       3168
11-FEB-14 14-MAR-14          6       3168
15-MAR-14 15-APR-14          7       3168
16-APR-14 17-MAY-14          8       3168
18-MAY-14 18-JUN-14          9       3168
19-JUN-14 20-JUL-14         10       3168
21-JUL-14 21-AUG-14         11       3168
22-AUG-14 06-SEP-14         12       1553

What about displaying numeric data? We simply change the parameters of width_bucket to numeric values.

RLOCKARD@pdev > select min(col1),
  2    max(col1),
  3    width_bucket(col1, 0, 10000, 10) bucket,
  4    count(*)
  5  from t1
  6  group by width_bucket(col1, 0, 10000, 10)
  7  order by 1;

 MIN(COL1)  MAX(COL1)     BUCKET   COUNT(*)
---------- ---------- ---------- ----------
.013540186 999.847819          1       3593
1000.32556 1999.71215          2       3619
2000.26786 2999.44229          3       3593
3000.33048 3999.69273          4       3746
4000.37659 4999.86446          5       3721
5000.02576 5999.80972          6       3565
 6000.0222 6999.70225          7       3677
7000.05808 7999.67615          8       3644
8000.11224 8999.93555          9       3644
9000.31992 9999.63753         10       3698
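If you want a rough picture right in SQL*Plus, the bucket counts can also be rendered as a crude bar chart. This is an optional sketch using the same table and bucket definition as above; the scale factor of 500 is arbitrary:

select width_bucket(col1, 0, 10000, 10) bucket,
       count(*) cnt,
       rpad('*', round(count(*)/500), '*') bar   -- one * per ~500 rows
from   t1
group  by width_bucket(col1, 0, 10000, 10)
order  by 1;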

Why does the Oracle CBO interest me so much?

There is a short answer and a very long answer to this. I will try to keep my answer somewhere between the two.

The first assumption we must accept, based on my observations over the past several years, is that most Oracle developers don't understand the Cost Based Optimizer, yet they use it every day.

Way back in the dark ages, full table scans were evil and we used hints to get rid of this dreaded access path. As you can see from the query below, we got a full table scan of t1.

So we solve this by adding a hint.
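The queries in this section were originally shown as screenshots. As a rough sketch of the kind of hint being discussed (the index name t1_idx1 and the predicate are assumptions, not from the original post), forcing an index access path looks like this:

-- force the optimizer to use an index instead of the full table scan
select /*+ index(t1 t1_idx1) */ *
from   t1
where  status = 'Shipped';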

That's better, we think, but let's look at a few things. Did the hint really help? The first query had 70,308 consistent gets, 3,884 physical reads, and a cost of 1,065. The second query, where we thought we were being clever, had 1,057,564 consistent gets, 5,868 physical reads, and a cost of 990K.

Well, maybe I was not so clever by forcing Oracle to use an index; maybe the CBO is smart enough to put together the correct plan, and the full table scan is not evil after all.

A few years ago I was at ODTUG and heard Maria Colgan speaking about the CBO, and I was hooked. Maria made me realize I needed to understand what the CBO was doing. Yes, sometimes the CBO gets it wrong, but there are ways to correct that.

I'm now putting together a series of presentations on understanding the CBO using the scientific approach. Why not just publish how the CBO works? Why not publish rules of thumb (ROT) for the CBO? The answer is quite simple: every environment is different, and it's much better to understand your data and how the CBO works with your data and queries. One presentation will go through how to read an explain plan; the others will teach you how to establish what the CBO is doing in your environment so that you can explain it. That makes you smarter and more valuable.

Stay tuned, I will be posting blurbs and hints on the CBO.

Oracle 10053 trace viewer

There are two trace files that I use when tuning. Cary Millsap (@CaryMillsap) and Method R (@MethodR) have great tools for analyzing the 10046 trace file: the Method R Profiler and my personal favorite, mrskew (http://method-r.com/store).

But the 10053 trace file is cumbersome to read and interpret. Jonathan Lewis (@JLOracle) posted this 10053 trace file viewer on oaktable.net: http://www.oaktable.net/contribute/10053-viewer.

If you are an Oracle DBA, developer, or performance engineer, these should be in your toolbox.

What about NULL

What is NULL?  Is NULL a space? Is NULL a NULL string?

NULL is unknown. If NULL is unknown, then you cannot do anything with it; you cannot perform any operation on NULL:

Y := 5 + UNKNOWN then Y = UNKNOWN

You cannot compare NULL to anything:

IF UNKNOWN < 5 THEN … END IF;

Well, that will never evaluate to true, because we just don't know.
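A quick demonstration in SQL (any table would do; dual keeps it simple):

-- NULL never compares equal (or unequal) to anything, not even itself
select 'found' from dual where null = null;    -- no rows selected
select 'found' from dual where null is null;   -- one row: IS NULL is the only test that works

-- and any arithmetic on NULL yields NULL
select 5 + null from dual;                     -- returns NULL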

This is part of what makes NULL so difficult to work with.  NULL complicates logic. So just say no to null.

Beyond the basic select statement: Using the WITH clause inline SUBQUERY

We looked at the WITH clause with embedded PL/SQL; now let's look at another use of the WITH clause. This time we are going to use it with embedded subqueries.

Query A. This query searches a stocks table and returns the median closing price for a quarter, the standard deviation, and plus and minus one standard deviation.

The top two arrows point to the subqueries and the bottom arrow points to where they are referenced. The subqueries are named SDEV and MED, and those names can then be referenced in the FROM clause.

[screenshot: Query A, using WITH subquery factoring]
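Since the query itself only survives as a screenshot, here is a sketch of the shape being described. The stocks table and its column names (symbol, qtr, close_price) are assumptions, not the original code:

with med as (
       select symbol, qtr, median(close_price) med_close
       from   stocks
       group  by symbol, qtr),
     sdev as (
       select symbol, qtr, stddev(close_price) sd_close
       from   stocks
       group  by symbol, qtr)
select m.symbol, m.qtr,
       m.med_close,
       s.sd_close,
       m.med_close + s.sd_close plus_one_sdev,
       m.med_close - s.sd_close minus_one_sdev
from   med m, sdev s
where  m.symbol = s.symbol
and    m.qtr    = s.qtr;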

Query B shows the normal subquery we are accustomed to seeing and returns the same values as Query A.

[screenshot: Query B, using inline views]
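Again as a sketch, under the same assumed table and columns, the traditional version simply moves the subqueries inline into the FROM clause:

select m.symbol, m.qtr,
       m.med_close,
       s.sd_close,
       m.med_close + s.sd_close plus_one_sdev,
       m.med_close - s.sd_close minus_one_sdev
from   (select symbol, qtr, median(close_price) med_close
        from   stocks
        group  by symbol, qtr) m,
       (select symbol, qtr, stddev(close_price) sd_close
        from   stocks
        group  by symbol, qtr) s
where  m.symbol = s.symbol
and    m.qtr    = s.qtr;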

Query A and Query B are equivalent.

So, what execution plan did the CBO come up with for Query A?

[screenshot: execution plan for Query A]

And what plan did the CBO come up with for Query B?

[screenshot: execution plan for Query B]

I’ll dig into how the CBO came up with these execution plans later.

Beyond the basic select statement: Using the WITH clause inline PL/SQL

For years I used the basic select statement:

SELECT A.C1, A.C2, B.C1, B.C2, <…>
FROM TAB1 A, TAB2 B
WHERE A.C1 = B.C1;

Sometimes this basic select statement would not answer my question. You can make a select statement quite complicated, but then it gets unreadable and difficult to maintain. The WITH clause helps out quite a bit.

Query A (inline function in the WITH clause):

WITH
 FUNCTION monthly_amt(pamt number)
 RETURN number IS
 x number;
   BEGIN
     x := pamt*2;
     RETURN x;
   END;
SELECT pay22, monthly_amt(pay22)
  FROM policys;
/

Query B (stored function):

create or replace function
  monthly_amt (pamt number) return number is
  x number;
BEGIN
  x := pamt*2;
  return x;
END;
/

select pay22, monthly_amt(pay22)
from policys;

We are going to start this discussion with the assumption that Query A and Query B are equivalent, with caveats. The rule for an inline function is that it will always take precedence over a stored function of the same name.

Query A returns pay22 and the result of the inline function monthly_amt.


Query B returns pay22 and the result of the stored function monthly_amt.


But what if the stored function returns a different value than argument * 2?

First we will redefine the stored function monthly_amt to return argument * 10. When using WITH to include an inline function, the inline function will always take precedence over a stored function with the same name.
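The screenshots that followed showed these two steps. Roughly, the redefinition and the repeated WITH query look like this (a sketch, using the same policys table and pay22 column as above):

create or replace function monthly_amt (pamt number) return number is
begin
  return pamt*10;   -- the stored function now returns argument * 10
end;
/

-- the inline function still wins, so this continues to return pamt*2
WITH
 FUNCTION monthly_amt(pamt number) RETURN number IS
 BEGIN
   RETURN pamt*2;
 END;
SELECT pay22, monthly_amt(pay22)
  FROM policys;
/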


Now we will call the stored function monthly_amt without making any reference to an inline function. As expected, the stored function returns argument * 10.


How would we use this? PL/SQL supports overloading functions; here, however, the stored function and the inline function have the same name and the same arguments. I see using an inline function when: A) creating a stored function is not an option (I worked in one shop a while back where it was against policy to store PL/SQL in the database); B) you need to override a stored function; or C) you need or want to read the PL/SQL being executed alongside the query.

Exploring the CBO

Let's start with a simple experiment on what the CBO can do with query optimization. We are going to build on the structure as we go to see if we can help out the CBO with more information.

Test 1) we will start with two tables bigtable and owners and see what plan the CBO comes up with.

Test 2) we will analyze the tables and see what plan the CBO comes up with.

Test 3) we will add primary key and foreign key constraints and see what plan the CBO comes up with.

Test 4) we will add an index on the foreign key constraint and see what plan the CBO comes up with.

 

We know that these two queries are equivalent. So, can the CBO get from query A to query B?

Query A:

SELECT COUNT(*)
FROM OWNERS O,
     BIGTABLE BT
WHERE O.USERNAME = BT.OWNER
and o.username like 'RL%';

Query B:

SELECT COUNT(*)
FROM BIGTABLE BT
WHERE BT.OWNER LIKE 'RL%';

 

 

The first pass we are going to use bigtable and owners with no statistics, no indexes, and no primary key or foreign key constraints.

drop table bigtable;
drop table owners;

create table bigtable as select * from dba_objects;
create table owners as select * from dba_users;

alter session set tracefile_identifier = owi_trace;

-- This line will generate a trace file of what the CBO did.
alter session set events '10053 trace name context forever, level 1';

 

 

SELECT COUNT(*)
FROM OWNERS O,
     BIGTABLE BT
WHERE O.USERNAME = BT.OWNER
and o.username like 'RL%';

 

What did the CBO come up with?

 

============
Plan Table
============
------------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
------------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT | | | | 437 | |
| 1 | SORT AGGREGATE | | 1 | 15 | | |
| 2 | HASH JOIN | | 62 | 930 | 437 | 00:00:06 |
| 3 | TABLE ACCESS FULL | OWNERS | 1 | 9 | 3 | 00:00:01 |
| 4 | TABLE ACCESS FULL | BIGTABLE| 2398 | 14K | 434 | 00:00:06 |
------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - access("O"."USERNAME"="BT"."OWNER")
3 - filter("O"."USERNAME" LIKE 'RL%')
4 - filter("BT"."OWNER" LIKE 'RL%')

This is interesting: the CBO decided to apply two filters.

 

Well, this is not the best execution plan, so let's see what happens when we analyze the tables.

 

analyze table bigtable compute statistics;
analyze table owners compute statistics;

alter session set tracefile_identifier = owi_trace;
alter session set events '10053 trace name context forever, level 1';

SELECT COUNT(*)
FROM OWNERS O,
     BIGTABLE BT
WHERE O.USERNAME = BT.OWNER
and o.username like 'RL%';

============
Plan Table
============
------------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
------------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT | | | | 437 | |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | HASH JOIN | | 62 | 806 | 437 | 00:00:06 |
| 3 | TABLE ACCESS FULL | OWNERS | 1 | 8 | 3 | 00:00:01 |
| 4 | TABLE ACCESS FULL | BIGTABLE| 2398 | 12K | 434 | 00:00:06 |
------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - access("O"."USERNAME"="BT"."OWNER")
3 - filter("O"."USERNAME" LIKE 'RL%')
4 - filter("BT"."OWNER" LIKE 'RL%')

 

That was not much help. Let's see what happens when we add the primary key and foreign key constraints.

 

alter table bigtable add constraint bigtable_pk primary key (object_id);
alter table owners add constraint owners_pk primary key (username);
alter table bigtable add constraint bigtable_fk foreign key (owner) references owners(username);

alter session set tracefile_identifier = owi_trace;
alter session set events '10053 trace name context forever, level 1';

SELECT COUNT(*)
FROM OWNERS O,
     BIGTABLE BT
WHERE O.USERNAME = BT.OWNER
and o.username like 'RL%';

============
Plan Table
============
------------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
------------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT | | | | 434 | |
| 1 | SORT AGGREGATE | | 1 | 5 | | |
| 2 | TABLE ACCESS FULL | BIGTABLE| 2398 | 12K | 434 | 00:00:06 |
------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - filter(("BT"."OWNER" LIKE 'RL%' AND "BT"."OWNER" IS NOT NULL))

Well, this is better. By adding the foreign key constraint, the CBO eliminated the owners table. But we can do better. Let's add an index on bigtable.owner and see what happens.

 

create index bigtable_idx1 on bigtable(owner);
analyze index bigtable_idx1 compute statistics;

alter session set tracefile_identifier = owi_trace;
alter session set events '10053 trace name context forever, level 1';

SELECT COUNT(*)
FROM OWNERS O,
     BIGTABLE BT
WHERE O.USERNAME = BT.OWNER
and o.username like 'RL%';

============
Plan Table
============
------------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
------------------------------------------+-----------------------------------+
| 0 | SELECT STATEMENT | | | | 7 | |
| 1 | SORT AGGREGATE | | 1 | 5 | | |
| 2 | INDEX RANGE SCAN | BIGTABLE_IDX1| 2398 | 12K | 7 | 00:00:01 |
------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - access("BT"."OWNER" LIKE 'RL%')
2 - filter(("BT"."OWNER" LIKE 'RL%' AND "BT"."OWNER" IS NOT NULL))

 

Now let's see what the CBO did to get here.

SQL:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(*)" FROM "OPS$ORACLE"."OWNERS" "O","OPS$ORACLE"."BIGTABLE" "BT" WHERE "BT"."OWNER"="O"."USERNAME" AND "O"."USERNAME" LIKE 'RL%'
JE: eliminate table: OWNERS (O)
JE: Replaced column: O.USERNAME with column: BT.OWNER
Registered qb: SEL$84E079A4 0xadec2578 (JOIN REMOVED FROM QUERY BLOCK SEL$1; SEL$1; "O"@"SEL$1")
**************************
Predicate Move-Around (PM)
**************************
PM: PM bypassed: Outer query contains no views.
PM: PM bypassed: Outer query contains no views.
query block SEL$84E079A4 (#0) unchanged
FPD: Considering simple filter push in query block SEL$84E079A4 (#0)
"BT"."OWNER" IS NOT NULL AND "BT"."OWNER" LIKE 'RL%'
try to generate transitive predicate from check constraints for query block SEL$84E079A4 (#0)
finally: "BT"."OWNER" IS NOT NULL AND "BT"."OWNER" LIKE 'RL%'

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(*)" FROM "OPS$ORACLE"."BIGTABLE" "BT" WHERE "BT"."OWNER" IS NOT NULL AND "BT"."OWNER" LIKE 'RL%'

Well that’s special. The CBO came up with a query that is close to query B.

 

Query B:

SELECT COUNT(*)
FROM BIGTABLE BT
WHERE BT.OWNER LIKE 'RL%';

Query C (the CBO's transformed query):

SELECT COUNT(*) "COUNT(*)" FROM "OPS$ORACLE"."BIGTABLE" "BT" WHERE "BT"."OWNER" IS NOT NULL AND "BT"."OWNER" LIKE 'RL%'

The only real difference between query B and query C is that the CBO added bt.owner is not null. This seems curious, because if we are looking for rows where owner is like 'RL%' then by definition owner cannot be null; the extra predicate most likely comes from the eliminated inner join, since only rows with a non-null owner could have matched owners.username.
What did we learn?

1) When we only had the tables, with no supporting structures, the CBO selected full table scans on the two tables and then did a hash join.

2) When we analyzed the tables to get good statistics, the CBO still did full table scans and a hash join.

3) When we added the primary key and foreign key constraints, the CBO knew that if bigtable.owner existed then there was a matching row in owners.username. Because of this, the CBO was able to eliminate the join condition bt.owner = o.username and eliminate the owners table from the query. This got us down to a single full table scan on bigtable.

4) When we added the index on bigtable.owner, the CBO decided an index range scan on bigtable_idx1 would return the correct results, because all the information needed to satisfy the query is in the index.

 
