#Oracle #In-Memory

I started playing with In-Memory to see what performance improvements I could get.  I set up four tables, zip, customers, products and sales, and executed a simple query against them.

Sales has 3M rows, customers has 110K rows, zip has 4,271 rows and products has 6 rows.

SELECT p.name, sum(s.price),count(*)
FROM customers c,
       products p,
       sales s,
       zip_codes z
WHERE c.ID = s.cust_id
    AND p.ID = s.product_id
    AND c.zip = z.zip
GROUP BY p.NAME;

Run 1) using a default install of Oracle 12.1.0.2.  Execution time 17.21 seconds

SQL> sho parameter inmemory

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
inmemory_clause_default              string
inmemory_force                       string      DEFAULT
inmemory_max_populate_servers        integer     0
inmemory_query                       string      ENABLE
inmemory_size                        big integer 0
inmemory_trickle_repopulate_servers_ integer     1
percent
optimizer_inmemory_aware             boolean     TRUE


SQL> l
  1  SELECT p.name, sum(s.price),count(*)
  2  FROM customers c,
  3         products p,
  4         sales s,
  5         zip_codes z
  6  WHERE c.ID = s.cust_id
  7      AND p.ID = s.product_id
  8      AND c.zip = z.zip
  9* GROUP BY p.NAME
SQL> /

Elapsed: 00:00:17.21

Execution Plan
----------------------------------------------------------
Plan hash value: 1738379704

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |     1 |    47 |  3118   (1)| 00:00:01 |
|   1 |  HASH GROUP BY                |              |     1 |    47 |  3118   (1)| 00:00:01 |
|   2 |   NESTED LOOPS                |              |     1 |    47 |  3117   (1)| 00:00:01 |
|   3 |    NESTED LOOPS               |              |     1 |    47 |  3117   (1)| 00:00:01 |
|*  4 |     HASH JOIN                 |              |     1 |    29 |  3116   (1)| 00:00:01 |
|*  5 |      HASH JOIN                |              |     1 |    16 |   353   (1)| 00:00:01 |
|   6 |       TABLE ACCESS FULL       | ZIP          |  4271 | 21355 |    11   (0)| 00:00:01 |
|   7 |       TABLE ACCESS FULL       | CUSTOMERS    |   110K|  1181K|   342   (1)| 00:00:01 |
|   8 |      TABLE ACCESS FULL        | SALES        |  3142K|    38M|  2755   (1)| 00:00:01 |
|*  9 |     INDEX UNIQUE SCAN         | SYS_C0011426 |     1 |       |     0   (0)| 00:00:01 |
|  10 |    TABLE ACCESS BY INDEX ROWID| PRODUCTS     |     1 |    18 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("C"."ID"="S"."CUST_ID")
   5 - access("C"."ZIP"="Z"."ZIP")
   9 - access("P"."ID"="S"."PRODUCT_ID")


Statistics
----------------------------------------------------------
       2313  recursive calls
          0  db block gets
    3155556  consistent gets
      16351  physical reads
      34844  redo size
        882  bytes sent via SQL*Net to client
        551  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
        179  sorts (memory)
          0  sorts (disk)
          5  rows processed

So we turn on In-Memory by setting inmemory_size to a value of at least 100M (100M is the smallest value you can use).  We then add the sales and customers tables to the In-Memory column store using the alter table <tablename> inmemory statement. Once I made the change I did a stop/start of the instance and executed the query again. Oh, I did one more thing to load the column store: select * from sales; and select * from customers;.  If you want the column store loaded when the database instance starts up, use alter table <tablename> inmemory priority critical; (there are five levels of priority: critical, high, medium, low and none; none is the default).
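If you want to double check that the segments actually made it into the column store, the dictionary has a view for that. This is just a sketch from memory; the priority clause and the v$im_segments columns below are what I recall from the 12.1.0.2 documentation, so verify them in your own environment.

SQL> alter table sales inmemory priority critical;

SQL> select segment_name, populate_status, inmemory_size, bytes_not_populated
  2  from v$im_segments;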

Run 2) Execution time 10.73 seconds.  Yea, that’s all I did and it’s that simple.  Now imagine the performance improvement if I actually tried!

SQL> alter system set INMEMORY_SIZE=500M scope=spfile;

System altered.

SQL> alter table sales inmemory;

Table altered.

SQL> alter table customers inmemory;

Table altered.

SQL>

SQL> sho parameter inmemory

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
inmemory_clause_default              string
inmemory_force                       string      DEFAULT
inmemory_max_populate_servers        integer     1
inmemory_query                       string      ENABLE
inmemory_size                        big integer 500M
inmemory_trickle_repopulate_servers_ integer     1
percent
optimizer_inmemory_aware             boolean     TRUE
SQL>


SQL> set timing on
SQL> SELECT p.name, sum(s.price),count(*)
FROM customers c,
       products p,
       sales s,
       zip z
WHERE c.ID = s.cust_id
    AND p.ID = s.product_id
    AND c.zip = z.zip
GROUP BY p.NAME
/
  2    3    4    5    6    7    8    9   10

Elapsed: 00:00:10.73

Execution Plan
----------------------------------------------------------
Plan hash value: 1408061610

-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |     1 |    47 |   215  (17)| 00:00:01 |
|   1 |  HASH GROUP BY               |           |     1 |    47 |   215  (17)| 00:00:01 |
|*  2 |   HASH JOIN                  |           |     1 |    47 |   214  (17)| 00:00:01 |
|*  3 |    HASH JOIN                 |           |     1 |    34 |    76   (2)| 00:00:01 |
|   4 |     MERGE JOIN CARTESIAN     |           | 25626 |   575K|    62   (0)| 00:00:01 |
|   5 |      TABLE ACCESS FULL       | PRODUCTS  |     6 |   108 |     3   (0)| 00:00:01 |
|   6 |      BUFFER SORT             |           |  4271 | 21355 |    59   (0)| 00:00:01 |
|   7 |       TABLE ACCESS FULL      | ZIP       |  4271 | 21355 |    10   (0)| 00:00:01 |
|   8 |     TABLE ACCESS INMEMORY FULL| CUSTOMERS |   110K|  1181K|    14   (8)| 00:00:01 |
|   9 |    TABLE ACCESS INMEMORY FULL | SALES     |  3142K|    38M|   130  (20)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("C"."ID"="S"."CUST_ID" AND "P"."ID"="S"."PRODUCT_ID")
   3 - access("C"."ZIP"="Z"."ZIP")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement


Statistics
----------------------------------------------------------
       3894  recursive calls
          0  db block gets
       9231  consistent gets
      14236  physical reads
          0  redo size
        882  bytes sent via SQL*Net to client
        551  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
        526  sorts (memory)
          0  sorts (disk)
          5  rows processed

SQL>

There will be more to come, but if you want to read more about Oracle in-memory there is an excellent white paper at http://www.oracle.com/technetwork/database/in-memory/overview/twp-oracle-database-in-memory-2245633.html

Okay, I looked at the memory usage and the size of the tables I loaded into the column store and made a change.  The customers table is over 1G, so I changed inmemory_size to 2G.  Let’s see what that does for my performance.

SQL> alter system set inmemory_size=2G scope=spfile;

System altered.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 2449473536 bytes
Fixed Size                  2948320 bytes
Variable Size             243270432 bytes
Database Buffers           50331648 bytes
Redo Buffers                5439488 bytes
In-Memory Area           2147483648 bytes
Database mounted.
Database opened.
SQL>

Okay, 6.02 seconds.  Nice try, but I think I can do better.  Physical reads went from 16,351 without In-Memory, to 14,236 with 500M of In-Memory, and down to 1,084 when I sized In-Memory appropriately.

SQL> @qry1

NAME                                SUM(S.PRICE)   COUNT(*)
----------------------------------- ------------ ----------
Posidon Odin                           527301055     627150
Scuba Pro G500 Regulator               453549388     628716
Stop Stink                            1874278.96     627888
Scuba Fins                            75210608.1     629888
Scuba Pro Mask                        46894525.5     628395

Elapsed: 00:00:06.02

Execution Plan
----------------------------------------------------------
Plan hash value: 1408061610

-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |     1 |    47 |   215  (17)| 00:00:01 |
|   1 |  HASH GROUP BY               |           |     1 |    47 |   215  (17)| 00:00:01 |
|*  2 |   HASH JOIN                  |           |     1 |    47 |   214  (17)| 00:00:01 |
|*  3 |    HASH JOIN                 |           |     1 |    34 |    76   (2)| 00:00:01 |
|   4 |     MERGE JOIN CARTESIAN     |           | 25626 |   575K|    62   (0)| 00:00:01 |
|   5 |      TABLE ACCESS FULL       | PRODUCTS  |     6 |   108 |     3   (0)| 00:00:01 |
|   6 |      BUFFER SORT             |           |  4271 | 21355 |    59   (0)| 00:00:01 |
|   7 |       TABLE ACCESS FULL      | ZIP       |  4271 | 21355 |    10   (0)| 00:00:01 |
|   8 |     TABLE ACCESS INMEMORY FULL| CUSTOMERS |   110K|  1181K|    14   (8)| 00:00:01 |
|   9 |    TABLE ACCESS INMEMORY FULL | SALES     |  3142K|    38M|   130  (20)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("C"."ID"="S"."CUST_ID" AND "P"."ID"="S"."PRODUCT_ID")
   3 - access("C"."ZIP"="Z"."ZIP")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement


Statistics
----------------------------------------------------------
       3793  recursive calls
          0  db block gets
       9239  consistent gets
       1084  physical reads
          0  redo size
        882  bytes sent via SQL*Net to client
        551  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
        491  sorts (memory)
          0  sorts (disk)
          5  rows processed

SQL>

Okay, I decided to do a little better.  I added three indexes to support the foreign key relationships to see what that does.  3.79 seconds, 44 physical reads.  My work here is done.

SQL> create index sales_cust_id on sales(cust_id);

Index created.

Elapsed: 00:00:17.77
SQL> create index product_id on sales(product_id);

Index created.

  1* create index customers_zip on customers(zip)
SQL> /

Index created.

Elapsed: 00:00:00.58

Elapsed: 00:00:24.23
SQL> exec dbms_stats.gather_schema_stats(user);

PL/SQL procedure successfully completed.

Elapsed: 00:00:22.57

SQL> @qry1

NAME                                SUM(S.PRICE)   COUNT(*)
----------------------------------- ------------ ----------
Posidon Odin                           527301055     627150
Scuba Pro G500 Regulator               453549388     628716
Scuba Fins                            75210608.1     629888
Scuba Pro Mask                        46894525.5     628395
Stop Stink                            1874278.96     627888

Elapsed: 00:00:03.79

Execution Plan
----------------------------------------------------------
Plan hash value: 463445694

----------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |              |     5 |   235 |       |  4081   (4)| 00:00:01 |
|   1 |  HASH GROUP BY                    |              |     5 |   235 |       |  4081   (4)| 00:00:01 |
|   2 |   MERGE JOIN                      |              |     5 |   235 |       |  4080   (4)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID    | PRODUCTS     |     6 |   108 |       |     2   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN               | SYS_C0011426 |     6 |       |       |     1   (0)| 00:00:01 |
|*  5 |    SORT JOIN                      |              |     5 |   145 |       |  4078   (4)| 00:00:01 |
|   6 |     VIEW                          | VW_GBC_13    |     5 |   145 |       |  4077   (4)| 00:00:01 |
|   7 |      HASH GROUP BY                |              |     5 |   140 |       |  4077   (4)| 00:00:01 |
|*  8 |       HASH JOIN                   |              |  3102K|    82M|       |  4000   (2)| 00:00:01 |
|   9 |        INDEX FAST FULL SCAN       | ZIP_ZIP      |  4271 | 21355 |       |     4   (0)| 00:00:01 |
|* 10 |        HASH JOIN                  |              |  3102K|    68M|  2368K|  3988   (2)| 00:00:01 |
|  11 |         TABLE ACCESS INMEMORY FULL| CUSTOMERS    |   110K|  1074K|       |    14   (8)| 00:00:01 |
|  12 |         TABLE ACCESS INMEMORY FULL| SALES        |  3142K|    38M|       |   130  (20)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("P"."ID"="ITEM_1")
       filter("P"."ID"="ITEM_1")
   8 - access("C"."ZIP"="Z"."ZIP")
  10 - access("C"."ID"="S"."CUST_ID")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement


Statistics
----------------------------------------------------------
        139  recursive calls
          0  db block gets
        189  consistent gets
         44  physical reads
          0  redo size
        882  bytes sent via SQL*Net to client
        551  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         13  sorts (memory)
          0  sorts (disk)
          5  rows processed

SQL>

#infosec #Oracle #data-redaction

I had a rather disturbing conversation with my wife a couple of nights ago.  She had called a service provider to discuss a bill, and the customer service rep asked her for her Social Security number to verify her identity.  When my wife told the customer service rep she was not comfortable giving out that information, he told her he had her full Social Security number on the screen in front of him. WTF. This is a large corporation, and I’m wondering if they are ignoring the problem until there is a leak and they get embarrassed, because, after all, it costs money to protect data.

Do you store sensitive information? Does your user base have access to that information?  Does your user base need to see that sensitive information? I have always operated on the principle of “need to know.”  It simply asks: does a person need to know something in order to do their job?  If the answer is no, then don’t give them the information.

In the customers table we are storing two very sensitive pieces of information: SSN and credit card number.  Now sit back and think: what would happen to your company / job / reputation if this information were to leak?

SQL> desc customers
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                        NOT NULL NUMBER
 SSN                                                VARCHAR2(25)
 FNAME                                              VARCHAR2(55)
 LNAME                                              VARCHAR2(55)
 UPDATE_DATE                                        DATE
 CREDIT_CARD_NBR                                    VARCHAR2(19)
 CREDIT_LIMIT                                       NUMBER

Before we start, let’s populate customers.ssn and customers.credit_card_nbr with random data, just to make sure no live data is involved in this demonstration.

SQL> @redaction_demo
SQL> conn rlockard@pdev
Enter password:
Connected.
SQL>
SQL> update customers set ssn =
  2  ceil(dbms_random.value(0,9)) ||
  3  ceil(dbms_random.value(0,9)) ||
  4  ceil(dbms_random.value(0,9)) || '-' ||
  5  ceil(dbms_random.value(0,9)) ||
  6  ceil(dbms_random.value(0,9)) || '-' ||
  7  ceil(dbms_random.value(0,9)) ||
  8  ceil(dbms_random.value(0,9)) ||
  9  ceil(dbms_random.value(0,9)) ||
 10  ceil(dbms_random.value(0,9)),
 11  credit_card_nbr =
 12  ceil(dbms_random.value(0,9)) ||
 13  ceil(dbms_random.value(0,9)) ||
 14  ceil(dbms_random.value(0,9)) ||
 15  ceil(dbms_random.value(0,9)) || ' ' ||
 16  ceil(dbms_random.value(0,9)) ||
 17  ceil(dbms_random.value(0,9)) ||
 18  ceil(dbms_random.value(0,9)) ||
 19  ceil(dbms_random.value(0,9)) || ' ' ||
 20  ceil(dbms_random.value(0,9)) ||
 21  ceil(dbms_random.value(0,9)) ||
 22  ceil(dbms_random.value(0,9)) ||
 23  ceil(dbms_random.value(0,9)) || ' ' ||
 24  ceil(dbms_random.value(0,9)) ||
 25  ceil(dbms_random.value(0,9)) ||
 26  ceil(dbms_random.value(0,9)) ||
 27  ceil(dbms_random.value(0,9));

1666 rows updated.

SQL>
SQL> commit;

So let’s connect as app_user and see what we can see.  Umm, I don’t like that.  Imagine the damage that could be caused if a less than honest employee went on a shopping spree.  I know of one such case where a customer service rep had access to information like this; when the theft was discovered, we estimated over $250,000 had been stolen over a six month period.

SQL> conn app_user/app_user@pdev
Connected.
SQL> select ssn, credit_card_nbr from rlockard.customers;
/* snip a lot */
428-72-3919               2671 6148 1798 1588
251-71-5615               7925 4145 5815 3778
345-69-2348               9457 1637 2244 2527
967-55-1843               7474 9595 9119 7212
969-21-1324               4642 3219 4458 6715

What if we could create a policy that only lets the user see the last four digits of the SSN and the last four digits of the credit card number?  We use dbms_redact.add_policy to add a redaction policy to a table column.  In this case we are redacting rlockard.customers.ssn and giving the policy the name cust_sensitive.  There is also the expression parameter; when the expression evaluates to true, the policy is applied. We also pass function_type, which is dbms_redact.partial, and function_parameters, which uses the predefined constant dbms_redact.redact_us_ssn_f5.  This redacts the first five characters of the Social Security number.

SQL> conn rlockard@pdev
Enter password:
Connected.
SQL>
SQL> declare
  2  begin
  3  dbms_redact.add_policy (
  4     object_schema => 'RLOCKARD',
  5     object_name => 'CUSTOMERS',
  6     policy_name => 'CUST_SENSITIVE',
  7     expression => '1=1',
  8     column_name => 'SSN',
  9     function_type => dbms_redact.partial,
 10     function_parameters => DBMS_REDACT.REDACT_US_SSN_F5
 11  );
 12  end;
 13  /

PL/SQL procedure successfully completed.

Okay, that takes care of SSN.  Oh, and by the way, expression => '1=1' is not required on Oracle 12c.  The function_parameters constant DBMS_REDACT.REDACT_US_SSN_F5 redacts the first five characters of the SSN.

Now let’s take care of credit_card_nbr.  In this case, because we have already added the cust_sensitive policy, we alter the policy to add a column (this is the action parameter).  We then add credit_card_nbr and use the predefined constant dbms_redact.redact_ccn16_f12.

SQL>
SQL> BEGIN
  2  DBMS_REDACT.ALTER_POLICY (
  3     object_schema => 'RLOCKARD',
  4     object_name => 'CUSTOMERS',
  5     policy_name => 'CUST_SENSITIVE',
  6     column_name => 'CREDIT_CARD_NBR',
  7     action => DBMS_REDACT.ADD_COLUMN,
  8     function_type => DBMS_REDACT.PARTIAL,
  9     function_parameters => DBMS_REDACT.REDACT_CCN16_F12,
 10     expression => '1=1'
 11  );
 12  END;
 13  /

PL/SQL procedure successfully completed.

Oracle provides the constant DBMS_REDACT.REDACT_CCN16_F12, which redacts the first twelve digits of a sixteen digit credit card number.  So what does this look like?

SQL>
SQL> conn app_user/app_user@pdev
Connected.
select ssn, credit_card_nbr from rlockard.customers;
/* snip a lot */
XXX-XX-1979               ****-****-****-8336
XXX-XX-4224               ****-****-****-2794
XXX-XX-3475               ****-****-****-2386
XXX-XX-8333               ****-****-****-5569

SSN                       CREDIT_CARD_NBR
------------------------- -------------------
XXX-XX-6396               ****-****-****-1936
XXX-XX-8118               ****-****-****-7557
XXX-XX-8122               ****-****-****-9645
XXX-XX-3169               ****-****-****-2112
XXX-XX-2722               ****-****-****-1597

Well, we have connected as app_user, and the data is now redacted to a point where it is semi-secure. I would not be comfortable with this being public, but it’s a big improvement over displaying the entire Social Security number and credit card number.
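And if you ever need to back the policy out (say, in a test environment), dbms_redact.drop_policy removes it.  A minimal sketch; the parameter names are what I recall from the documentation, so check them against your release.

SQL> BEGIN
  2    DBMS_REDACT.DROP_POLICY (
  3      object_schema => 'RLOCKARD',
  4      object_name   => 'CUSTOMERS',
  5      policy_name   => 'CUST_SENSITIVE');
  6  END;
  7  /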

#Infosec

I have recently been receiving emails about consulting work.  This is a good thing because I’m back on the market.  However, some of the emails I have been receiving ask for PII.  This information can be used for identity theft.

Here’s an example: asking for my date of birth and the last four digits of my Social Security number.  There is no good reason for any head hunter to be asking for that information.  If you receive a query for a job asking for this information, hit the delete button.


Insert Performance analysis part 2

I promised I would have more today.  Here is the raw data from my analysis.  As you can see, as the rejection rate (dup_val_on_index) increased, the execution time for select + insert decreased while the execution time for insert with exception increased.

This specific use case is a transactional system requiring a unique index on ins_demo.column1, and we expected the unique index to be violated 20% of the time.

Here are the two insert functions we are testing for performance.  They are really quite simple.  The first one does an insert into ins_demo and, if there is a dup_val_on_index, returns false.

The second function does a select against ins_demo on the unique key.  If the key is not found, it inserts the value and returns true; if the key is found, it returns false.

Begin
  insert into ins_demo values (p_value);
  return true;
exception when dup_val_on_index then
  return false;
end;

begin
  select count(*)
    into dummy
    from ins_demo
   where column1 = p_value;

  if dummy = 0 then
    insert into ins_demo values (p_value);
    return true;
  else
    return false;
  end if;
end;
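For anyone who wants to reproduce the test, here are the two approaches as complete, self-contained functions.  This is my own sketch; the function names, the dummy variable, and the single-column ins_demo layout (column1 number with a unique index) are assumptions, not the original test harness.

-- assumed table: create table ins_demo (column1 number);
--                create unique index ins_demo_uk on ins_demo (column1);

create or replace function ins_with_exception (p_value in number) return boolean as
begin
  -- let the unique index reject duplicates and trap the error
  insert into ins_demo values (p_value);
  return true;
exception
  when dup_val_on_index then
    return false;
end;
/

create or replace function ins_with_select (p_value in number) return boolean as
  dummy number;
begin
  -- probe the unique key first, only insert when it is not already there
  select count(*)
    into dummy
    from ins_demo
   where column1 = p_value;

  if dummy = 0 then
    insert into ins_demo values (p_value);
    return true;
  else
    return false;
  end if;
end;
/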

Here are the performance numbers for the two functions.  I wrote a driver that called each of the two functions 100,000 times, once for each target rejection rate.

[Image: raw performance data for the two functions]

This bubble chart shows execution time on the y axis and rejection rate on the x axis; bubble size is the insert CPU time.

[Image: bubble chart of execution time against rejection rate]

We should all strive to analyze changes prior to making recommendations.  If I had gone with my knee-jerk reaction to the select+insert and insisted on an insert with an exception handler, system performance would have suffered.

This weekend I will run the same experiment using the merge command.  I expect the merge command to be the fastest option, but as always, we require hard data before we make any recommendations.
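For the curious, the statement I plan to test is the single-row merge equivalent of the two functions above.  This is only a sketch under the same assumed ins_demo layout; the function name is mine.

create or replace function ins_with_merge (p_value in number) return boolean as
begin
  merge into ins_demo d
  using (select p_value as column1 from dual) s
  on (d.column1 = s.column1)
  when not matched then
    insert (column1) values (s.column1);

  -- sql%rowcount is 1 when the row was inserted, 0 when it already existed
  return sql%rowcount = 1;
end;
/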

Insert performance on a unique constraint.

In the past, I have read about the performance of exception handlers, and all that is ever said is “there is performance overhead.” Okay, the generic answer of “performance overhead” tells us nothing.

One of the great pleasures of what I do is code reviews.  I get to learn techniques from other developers and I get to pass on some of my knowledge.  Code reviews are a win-win all around.

Recently, I reviewed some code that had the following code block to enforce a unique constraint (tables and variables have been changed to protect intellectual property).  There is a unique constraint on ins_demo.column1, so the select will do an index unique scan for every execution.  The use case for this example is a transactional system where different users will be inputting data.

begin
  select count(*)
    into dummy
    from ins_demo
   where column1 = p_value;

  if dummy = 0 then
    insert into ins_demo values (p_value);
    return true;
  else
    return false;
  end if;
end;

My knee-jerk reaction was to change this from a select+insert to an insert with an exception handler to improve performance.

begin
  insert into ins_demo values (p_value);
  return true;
exception
  when dup_val_on_index then
    return false;
end;

Before making recommendations, make sure you first do no harm. So, in this case, I decided to run some performance tests against the select+insert and the insert with an exception handler.  The test attempts 100,000 inserts, starting at 100% success and working up to 100% failures.
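The driver itself is not shown here; a minimal sketch of the harness looks something like this.  The function being timed, the loop count, and dbms_utility.get_time for timing are my own choices, and the rejection rate is controlled by pre-seeding the table with the appropriate percentage of duplicates before each run.

declare
  l_start pls_integer;
  l_ok    boolean;
begin
  l_start := dbms_utility.get_time;              -- hundredths of a second

  for i in 1 .. 100000 loop
    l_ok := ins_with_exception(i);               -- or ins_with_select(i)
  end loop;

  dbms_output.put_line('elapsed: ' ||
      (dbms_utility.get_time - l_start) / 100 || ' seconds');
  rollback;                                      -- leave the table as we found it
end;
/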

The results of the test have a few surprises.

Select+Insert 100% success:

execution time = 21.93 seconds

Insert CPU= 9.6

Select CPU = 2.55

Insert with exception 100% success:

execution time = 14.29 seconds

Insert CPU = 9.95

Recursive CPU = 8.96

Select+Insert 100% rejected due to duplicate rows:

execution time = 5.86 seconds

Insert CPU = 0

Select CPU = 1.73

Insert with exception 100% rejected due to duplicate rows:

execution time = 135.03 seconds

Insert CPU = 16.85

Exception CPU = 20.7

We can learn a few things from this.

1) In the select + insert method, as the number of rejections increased, execution time decreased, insert CPU decreased and select CPU decreased.  This is due to a couple of things: a unique index scan is faster when the data is found, and as the number of rows rejected due to unique index violations increases, the insert is executed fewer times.

2) In the insert with exception handler, execution time increases as the number of rows rejected due to unique index violations increases.  This is because Oracle attempts the insert and then must roll that insert back.

I will post a more detailed analysis with the raw performance data tomorrow.

Displaying histograms of date and number data

It started with a bit of PL/SQL in an ol’ library of mine. The old code was not the best way to display histogram data, and if you can do something in one SQL statement, that’s how you should do it.

Frequently when walking into a new environment we need to learn about the data.  The cardinality along with the clustering of the data has a big impact on the CBO, so understanding your data is critical to optimizing the performance of your database.

First I need to load up a table with some test data.  updt_stat simply takes the order date and returns a status; this is to simulate an order entry system.

create or replace function updt_stat (pdate in date) return varchar2 as
  value varchar2(10);
  tmp   number;
begin
  tmp := dbms_random.value(0,10);
  case
    when pdate > sysdate - 50 and tmp > 9
      then value := 'Canceled';
    when pdate <= sysdate - 50
      then value := 'Shipped';
    when pdate >= sysdate - 50 and pdate < sysdate - 1
      then value := 'Picked';
    when pdate >= sysdate
      then value := 'Ordered';
    else
      value := 'Back Order';
  end case;
  return value;
end;
/

Then let’s create a test table.

-- the id column defaults to t1_seq.nextval, so create the sequence first
create sequence t1_seq;

create table t1 (
  id number primary key,
  order_date date not null,
  col1 number not null,
  status varchar2(25) not null);

alter table t1 modify id default t1_seq.nextval;

Now populate the table with some test data.

declare
  --create or replace procedure simulate_order_entry as

  i     number;  -- an index variable
  sdate date;    -- the date we are going to insert.

  cursor c1 is
    select dbms_random.value(0,10000) col1
    from dual
    connect by level <= 36500;
begin

  -- one year of data
  select sysdate-365 into sdate from dual;
  i := 1;

  -- start the loop, 36,500 times
  -- one year of data

  for rec in c1
  loop
    if ( i / 100 = 1 ) then
      i := 1;
      sdate := sdate + 1;
    end if;

    insert into t1 (order_date, col1, status)
    values
      (sdate, rec.col1, updt_stat(sdate));

    i := i+1; -- i++
  end loop;
  commit;
end;
/

It’s always been easy to display a histogram of text data. We’ve been doing this for decades.

RLOCKARD@pdev > select status, count(*)
from t1
group by status;
  2    3

STATUS                      COUNT(*)
------------------------- ----------
Shipped                        31284
Ordered                          240
Picked                          4382
Back Order                        86
Canceled                         508

But what about getting a histogram of the order date?  Oracle provides the function width_bucket, which solves this problem.
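Before looking at the order_date histogram, a quick sanity check of how width_bucket numbers its buckets (this example is mine, not part of the test data set): width_bucket(expr, low, high, n) returns 1 through n for values in [low, high), 0 for anything below low, and n+1 for anything at or above high.

select width_bucket( 7, 0, 10, 5) in_range,     -- buckets [0,2) [2,4) [4,6) [6,8) [8,10); 7 lands in bucket 4
       width_bucket(-1, 0, 10, 5) below_range,  -- 0
       width_bucket(11, 0, 10, 5) above_range   -- n+1 = 6
from dual;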


RLOCKARD@pdev > select min(order_date),
       max(order_date),
       width_bucket(order_date, sysdate-365, sysdate+20, 12) bucket,
       count(*)
from t1
group by width_bucket(order_date, sysdate-365, sysdate+20, 12)
order by 1;
  2    3    4    5    6    7

MIN(ORDER MAX(ORDER     BUCKET   COUNT(*)
--------- --------- ---------- ----------
03-SEP-13 03-SEP-13          0         99
04-SEP-13 05-OCT-13          1       3168
06-OCT-13 06-NOV-13          2       3168
07-NOV-13 08-DEC-13          3       3168
09-DEC-13 09-JAN-14          4       3168
10-JAN-14 10-FEB-14          5       3168
11-FEB-14 14-MAR-14          6       3168
15-MAR-14 15-APR-14          7       3168
16-APR-14 17-MAY-14          8       3168
18-MAY-14 18-JUN-14          9       3168
19-JUN-14 20-JUL-14         10       3168
21-JUL-14 21-AUG-14         11       3168
22-AUG-14 06-SEP-14         12       1553

What about displaying numeric data?  We simply change the parameters of width_bucket to reference the numeric column.

RLOCKARD@pdev > select min(col1),
       max(col1),
       width_bucket(col1, 0, 10000, 10) bucket,
       count(*)
from t1
group by width_bucket(col1, 0, 10000, 10)
order by 1;
  2    3    4    5    6    7

 MIN(COL1)  MAX(COL1)     BUCKET   COUNT(*)
---------- ---------- ---------- ----------
.013540186 999.847819          1       3593
1000.32556 1999.71215          2       3619
2000.26786 2999.44229          3       3593
3000.33048 3999.69273          4       3746
4000.37659 4999.86446          5       3721
5000.02576 5999.80972          6       3565
 6000.0222 6999.70225          7       3677
7000.05808 7999.67615          8       3644
8000.11224 8999.93555          9       3644
9000.31992 9999.63753         10       3698

Why does the Oracle CBO interest me so much?

There is a short answer and a very long answer to this.  I will try and keep the answer somewhere in the middle of short and long.

The first assumption we must accept, based on my observations over the past several years, is that most Oracle developers don’t understand the Cost Based Optimizer, yet they use it every day.

Way back in the dark ages, full table scans were evil and we used hints to get rid of this dreaded access path.  Running my test query against t1, the optimizer chose a full table scan.

So we solve this by adding a hint.
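A hinted query of that sort looks something like the sketch below; the index name t1_idx and the predicate are purely illustrative, the point is only the index hint forcing the access path.

select /*+ index(t1 t1_idx) */ *
from   t1
where  col1 between 1000 and 2000;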

That’s better, we think, but let’s look at a few things.  Did the hint really help? The first query had 70,308 consistent gets, 3,884 physical reads and a cost of 1,065.  The second query, where we thought we were being clever, had 1,057,564 consistent gets, 5,868 physical reads and a cost of 990K.

Well, maybe I was not so clever by forcing Oracle to use an index; maybe the CBO is smart enough to put together the correct plan, and the full table scan is not evil after all.

A few years ago I was at ODTUG and heard Maria Colgan speaking about the CBO, and I was hooked.  Maria made me realize I needed to understand what the CBO was doing.  Yes, sometimes the CBO gets it wrong, but there are ways to correct that.

I’m now putting together a series of presentations on understanding the CBO using the scientific approach.  Why not just publish how the CBO works? Why not publish ROT (Rule Of Thumb) for the CBO?  Well, the answer to that is quite simple: every environment is different, and it’s much better to understand your data and how the CBO works with your data and queries.  One presentation will go through how to read an explain plan; the others will teach you how to establish what the CBO is doing in your environment so you can explain it.  That makes you smarter and more valuable.

Stay tuned, I will be posting blurbs and hints on the CBO.