displaying histograms of date and number data

It started with a bit of pl/sql in an ole’ library of mine. Knowing the old code was not the best way to display histogram data and If you can do something in one sql statement then that’s how you should do it.

Frequently when walking into a new environment we need to learn about the data.  The cardinality along with the clustering of the data has a big impact on the CBO, so understanding your data is critical to optimizing the performance of your database.

First I need to load up a table with some test data.  updt_stat simply takes the order date and returns a status.  This is to simulate and order entry system.

1 create or replace function updt_stat (pdate in date) return varchar2 as 2 value varchar2(10); 3 tmp number; 4 begin 5 tmp := dbms_random.value(0,10); 6 case 7 when pdate > sysdate - 50 and tmp > 9 8 then value := 'Canceled'; 9 when pdate <= sysdate - 50 10 then value := 'Shipped'; 11 when pdate >= sysdate - 50 and pdate < sysdate - 1 12 then value := 'Picked'; 13 when pdate >= sysdate 14 then value := 'Ordered'; 15 else 16 value := 'Back Order'; 17 end case; 18 return value; 19 end; 20 / 21

Then lets create a test table.

1 create table t1 ( 2 id number primary key, 3 order_date date not null, 4 col1 number not null, 5 status varchar2(25) not null); 6 7 alter table t1 modify id default t1_seq.nextval;

Now populate the table with some test data.

1 declare 2 --create or replace procedure simulate_order_entry as 3 4 i number; -- an index variable 5 sdate date; -- the date we are going to insert. 6 7 cursor c1 is 8 select dbms_random.value(0,10000) col1 9 from dual 10 connect by level <=36500; 11 begin 12 13 -- one year of data 14 select sysdate-365 into sdate from dual; 15 i := 1; 16 17 -- start the loop, 36,500 times 18 -- one year of data 19 20 for rec in c1 21 loop 22 if ( i / 100 = 1 ) then 23 i := 1; 24 sdate := sdate + 1; 25 end if; 26 27 insert into t1 (order_date, col1, status) 28 values 29 (sdate, rec.col1, updt_stat(sdate)); 30 31 i := i+1; -- i++ 32 end loop; 33 commit; 34 end; 35 / 36

It’s always been easy to display a histogram of text data. We’ve been doing this for decades.

1 RLOCKARD@pdev > select status, count(*) 2 from t1 3 group by status; 4 2 3 5 STATUS COUNT(*) 6 ------------------------- ---------- 7 Shipped 31284 8 Ordered 240 9 Picked 4382 10 Back Order 86 11 Canceled 508 12

But what about getting a histogram of the order date.  Oracle provides the function width_bucket that solves this problem.

Description of width_bucket.gif follows

1 RLOCKARD@pdev > select min(order_date), 2 max(order_date), 3 width_bucket(order_date, sysdate-365, sysdate+20, 12) bucket, 4 count(*) 5 from t1 6 group by width_bucket(order_date, sysdate-365, sysdate+20, 12) 7 order by 1; 8 2 3 4 5 6 7 9 MIN(ORDER MAX(ORDER BUCKET COUNT(*) 10 --------- --------- ---------- ---------- 11 03-SEP-13 03-SEP-13 0 99 12 04-SEP-13 05-OCT-13 1 3168 13 06-OCT-13 06-NOV-13 2 3168 14 07-NOV-13 08-DEC-13 3 3168 15 09-DEC-13 09-JAN-14 4 3168 16 10-JAN-14 10-FEB-14 5 3168 17 11-FEB-14 14-MAR-14 6 3168 18 15-MAR-14 15-APR-14 7 3168 19 16-APR-14 17-MAY-14 8 3168 20 18-MAY-14 18-JUN-14 9 3168 21 19-JUN-14 20-JUL-14 10 3168 22 21-JUL-14 21-AUG-14 11 3168 23 22-AUG-14 06-SEP-14 12 1553 24 25

What about displaying numeric data.  We change the parameters of width_bucket to numeric data.

1 RLOCKARD@pdev > select min(col1), 2 max(col1), 3 width_bucket(col1, 0, 10000, 10) bucket, 4 count(*) 5 from t1 6 group by width_bucket(col1, 0, 10000, 10) 7 order by 1; 8 2 3 4 5 6 7 9 MIN(COL1) MAX(COL1) BUCKET COUNT(*) 10 ---------- ---------- ---------- ---------- 11 .013540186 999.847819 1 3593 12 1000.32556 1999.71215 2 3619 13 2000.26786 2999.44229 3 3593 14 3000.33048 3999.69273 4 3746 15 4000.37659 4999.86446 5 3721 16 5000.02576 5999.80972 6 3565 17 6000.0222 6999.70225 7 3677 18 7000.05808 7999.67615 8 3644 19 8000.11224 8999.93555 9 3644 20 9000.31992 9999.63753 10 3698 21