#infosec #Oracle data #redaction

I had a rather disturbing conversation with my wife a couple of nights ago; she had called a service provider to discuss a bill. The customer service rep asked her for her social security number to verify her identity. When my wife told him she was not comfortable giving out that information, he told her that he had her full social security number on the screen in front of him. WTF. This is a large corporation, and I wonder if they are ignoring the problem until there is a leak and they get embarrassed, because, after all, it costs money to protect data.

Do you store sensitive information? Does your user base have access to that information? Does your user base need to see that sensitive information? I have always operated on the principle of “need to know”: does a person need to know something in order to do their job? If the answer is no, then don’t give them the information.

In the customers table we are storing two very sensitive pieces of information: SSN and credit card number. Now sit back and think: what would happen to your company / job / reputation if this information were to leak?

SQL> desc customers
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                        NOT NULL NUMBER
 SSN                                                VARCHAR2(25)
 FNAME                                              VARCHAR2(55)
 LNAME                                              VARCHAR2(55)
 UPDATE_DATE                                        DATE
 CREDIT_CARD_NBR                                    VARCHAR2(19)
 CREDIT_LIMIT                                       NUMBER

Before we start, let’s populate customers.ssn and customers.credit_card_nbr with random data, just to make sure no live data is involved in this demonstration.

SQL> @redaction_demo
SQL> conn rlockard@pdev
Enter password:
Connected.
SQL> update customers set ssn =
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)) || '-' ||
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)) || '-' ||
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)),
     credit_card_nbr =
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)) || ' ' ||
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)) || ' ' ||
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)) || ' ' ||
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9)) ||
       ceil(dbms_random.value(0,9));

1666 rows updated.

SQL> commit;

So let’s connect as app_user and see what we can see. Umm, I don’t like that. Imagine the damage that could be caused if a less than honest employee went on a shopping spree. I know of one such case where a customer service rep had access to information like this. When the theft was discovered, we estimated over $250,000 had been stolen over a six-month period.

SQL> conn app_user/app_user@pdev
Connected.
SQL> select ssn, credit_card_nbr from rlockard.customers;

/* snip a lot */
428-72-3919 2671 6148 1798 1588
251-71-5615 7925 4145 5815 3778
345-69-2348 9457 1637 2244 2527
967-55-1843 7474 9595 9119 7212
969-21-1324 4642 3219 4458 6715

What if we could create a policy that only lets the user see the last four digits of the SSN and the last four digits of the credit card number? We use dbms_redact.add_policy to add a policy to a table column. In this case we are redacting rlockard.customers.ssn and giving the policy the name cust_sensitive. There is also the expression parameter; when the expression evaluates to true, the policy is applied. We also pass function_type, which is dbms_redact.partial, and function_parameters, which uses the predefined constant dbms_redact.redact_us_ssn_f5. This redacts the first five characters of the social security number.

SQL> conn rlockard@pdev
Enter password:
Connected.
SQL> declare
begin
  dbms_redact.add_policy (
    object_schema       => 'RLOCKARD',
    object_name         => 'CUSTOMERS',
    policy_name         => 'CUST_SENSITIVE',
    expression          => '1=1',
    column_name         => 'SSN',
    function_type       => dbms_redact.partial,
    function_parameters => DBMS_REDACT.REDACT_US_SSN_F5
  );
end;
/

PL/SQL procedure successfully completed.

Okay, that takes care of SSN. Oh, and by the way, expression => '1=1' is not required on Oracle 12c. function_parameters uses the constant DBMS_REDACT.REDACT_US_SSN_F5, which redacts the first five characters of the SSN.
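If you want to double-check what is in place, Oracle 12c exposes redaction metadata through data dictionary views. Here is a quick sanity check, a sketch assuming your account can see the REDACTION_POLICIES and REDACTION_COLUMNS views:

-- list the redaction policies and the columns they cover
select object_owner, object_name, policy_name, expression
from redaction_policies;

select object_name, column_name, function_type
from redaction_columns
where object_owner = 'RLOCKARD';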

Now let’s take care of credit_card_nbr. In this case, because we have already added the cust_sensitive policy, we are going to alter the policy to add a column (this is the action parameter). We then add credit_card_nbr and use the predefined constant dbms_redact.redact_ccn16_f12.

SQL> BEGIN
  DBMS_REDACT.ALTER_POLICY (
    object_schema       => 'RLOCKARD',
    object_name         => 'CUSTOMERS',
    policy_name         => 'CUST_SENSITIVE',
    column_name         => 'CREDIT_CARD_NBR',
    action              => DBMS_REDACT.ADD_COLUMN,
    function_type       => DBMS_REDACT.PARTIAL,
    function_parameters => DBMS_REDACT.REDACT_CCN16_F12,
    expression          => '1=1'
  );
END;
/

PL/SQL procedure successfully completed.

Oracle provides us with the constant DBMS_REDACT.REDACT_CCN16_F12, which redacts the first twelve digits of a sixteen-digit credit card number. So what does this look like?

SQL> conn app_user/app_user@pdev
Connected.
SQL> select ssn, credit_card_nbr from rlockard.customers;

/* snip a lot */
XXX-XX-1979 ****-****-****-8336
XXX-XX-4224 ****-****-****-2794
XXX-XX-3475 ****-****-****-2386
XXX-XX-8333 ****-****-****-5569

SSN                       CREDIT_CARD_NBR
------------------------- -------------------
XXX-XX-6396 ****-****-****-1936
XXX-XX-8118 ****-****-****-7557
XXX-XX-8122 ****-****-****-9645
XXX-XX-3169 ****-****-****-2112
XXX-XX-2722 ****-****-****-1597

We have connected as app_user, and the data is now redacted to the point where it is semi-secure. I would not be comfortable with this being public, but it’s a big improvement over displaying the entire social security number and credit card number.
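A housekeeping note: while experimenting you may want to tear the policy down and start over. A minimal sketch using dbms_redact.drop_policy, with the same names as above:

begin
  dbms_redact.drop_policy (
    object_schema => 'RLOCKARD',
    object_name   => 'CUSTOMERS',
    policy_name   => 'CUST_SENSITIVE');
end;
/

Also keep in mind that SYS and any user granted the EXEMPT REDACTION POLICY privilege see the data unredacted, so redaction complements access control rather than replacing it.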

#Infosec

I have recently been receiving emails about consulting work. This is a good thing, because I’m back on the market. However, some of the emails I have been receiving ask for PII. This information can be used for identity theft.

Here’s an example: asking for my date of birth and the last four digits of my Social Security Number. There is no good reason for any headhunter to be asking for that information. If you receive a query for a job asking for this information, hit the delete button.


Insert performance analysis, part 2

I promised I would have more today. Here is the raw data from my analysis. As you can see, as the rejection rate (dup_val_on_index) increased, the execution time for select+insert decreased, and the execution time for insert with an exception handler increased.

This specific use case is a transactional system requiring a unique index on ins_demo.column1, where we expect the unique index to be violated about 20% of the time.

Here are the two insert functions we are testing for performance. They are really quite simple. The first one does an insert into ins_demo and, if a dup_val_on_index exception is raised, returns false.

The second function does a select against ins_demo on the unique key. If the key is not found, it inserts the value and returns true; if the key is found, it returns false.

begin
  insert into ins_demo values (p_value);
  return true;
exception when dup_val_on_index then
  return false;
end;

begin
  select count(*)
  into dummy
  from ins_demo
  where column1 = p_value;

  if dummy = 0 then
    insert into ins_demo values (p_value);
    return true;
  else
    return false;
  end if;
end;
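The driver itself wasn’t posted, so here is a minimal sketch of its general shape; try_insert_exception is a hypothetical wrapper around the first fragment, and shrinking the random value range is one way to push the duplicate (reject) rate up:

declare
  t0   number;
  hits pls_integer := 0;
begin
  t0 := dbms_utility.get_time;   -- elapsed time in hundredths of a second
  for i in 1 .. 100000 loop
    -- a narrower value range produces more duplicates, i.e. a higher reject rate
    if try_insert_exception(ceil(dbms_random.value(0, 80000))) then
      hits := hits + 1;
    end if;
  end loop;
  dbms_output.put_line('successful inserts: ' || hits || ', elapsed: '
    || (dbms_utility.get_time - t0) / 100 || ' seconds');
  rollback;   -- reset the table between runs
end;
/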

Here are the performance numbers for the two functions, with each function driven 100,000 times at each specific reject rate.

[image: raw performance data by rejection rate]

This bubble chart shows execution time on the y-axis and rejection rate on the x-axis; bubble size is the insert CPU time.

[image: bubble chart of execution time vs. rejection rate]

We should all strive to analyze changes prior to making recommendations. If I had gone with my knee-jerk reaction to the select+insert and insisted on insert with an exception handler, system performance would have suffered.

This weekend I will run the same experiment using the merge command.  I expect the merge command to be the fastest option, but as always, we require hard data before we make any recommendations.
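For the curious, here is the shape a merge-based version might take. This is only a sketch (try_insert_merge is a made-up name, and the timings are still to come):

create or replace function try_insert_merge (p_value in number)
  return boolean as
begin
  merge into ins_demo d
  using (select p_value as column1 from dual) s
  on (d.column1 = s.column1)
  when not matched then
    insert (column1) values (s.column1);
  return sql%rowcount = 1;   -- 1 = row inserted, 0 = duplicate already present
end;
/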

Insert performance on a unique constraint.

In the past, I have read about the performance of exception handlers, and all that is said is, “there is performance overhead.” Okay, the generic answer of “performance overhead” tells us nothing.

One of the great pleasures of what I do is code reviews.  I get to learn techniques from other developers and I get to pass on some of my knowledge.  Code reviews are a win-win all around.

Recently, I reviewed some code that had the following code block to enforce a unique constraint (tables and variables have been changed to protect intellectual property). There is a unique constraint on ins_demo.column1, so the select will do an index unique scan for every execution. The use case for this example is a transactional system where different users will be inputting data.

begin
  select count(*)
  into dummy
  from ins_demo
  where column1 = p_value;

  if dummy = 0 then
    insert into ins_demo values (p_value);
    return true;
  else
    return false;
  end if;
end;

My knee-jerk reaction was to change this from a select+insert to an insert with an exception handler to improve performance.

begin
  insert into ins_demo values (p_value);
  return true;
exception when dup_val_on_index then
  return false;
end;

Before making recommendations, make sure you first do no harm. So, in this case I decided to run some performance tests against the select+insert and the insert with an exception handler. The test attempts 100,000 inserts, starting with 100% success and going to 100% failures.
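The original DDL wasn’t shared; a minimal stand-in for ins_demo only needs the unique constraint that raises dup_val_on_index:

create table ins_demo (
  column1 number not null,
  constraint ins_demo_uk unique (column1)
);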

The results of the test have a few surprises.

Select+Insert, 100% success:

execution time = 21.93 seconds

Insert CPU = 9.6

Select CPU = 2.55

Insert with exception, 100% success:

execution time = 14.29 seconds

Insert CPU = 9.95

Recursive CPU = 8.96

Select+Insert, 100% rejected due to duplicate rows:

execution time = 5.86 seconds

Insert CPU = 0

Select CPU = 1.73

Insert with exception, 100% rejected due to duplicate rows:

execution time = 135.03 seconds

Insert CPU = 16.85

Exception CPU = 20.7

We can learn a few things from this.

1) In the select+insert method, as the number of rejections increased, execution time decreased, insert CPU decreased, and select CPU decreased. This is due to a couple of things: a unique index scan is faster when the data is found, and as more rows are rejected due to unique index violations, fewer inserts are executed.

2) In the insert with exception handler method, execution time increases as the number of rows rejected due to unique index violations increases. This is because Oracle attempts the insert and then must roll the failed insert back.

I will post a more detailed analysis with the raw performance data tomorrow.

Displaying histograms of date and number data

It started with a bit of PL/SQL in an ol’ library of mine. I knew the old code was not the best way to display histogram data, and if you can do something in one SQL statement, then that’s how you should do it.

Frequently when walking into a new environment we need to learn about the data.  The cardinality along with the clustering of the data has a big impact on the CBO, so understanding your data is critical to optimizing the performance of your database.

First I need to load up a table with some test data. updt_stat simply takes the order date and returns a status; this is to simulate an order entry system.

create or replace function updt_stat (pdate in date) return varchar2 as
  value varchar2(10);
  tmp number;
begin
  tmp := dbms_random.value(0,10);
  case
    when pdate > sysdate - 50 and tmp > 9
      then value := 'Canceled';
    when pdate <= sysdate - 50
      then value := 'Shipped';
    when pdate >= sysdate - 50 and pdate < sysdate - 1
      then value := 'Picked';
    when pdate >= sysdate
      then value := 'Ordered';
    else
      value := 'Back Order';
  end case;
  return value;
end;
/

Then let’s create a test table.
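(One note first: the id default in the DDL below references a sequence the post doesn’t create. Assuming it doesn’t already exist, create it beforehand.)

create sequence t1_seq;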

create table t1 (
  id number primary key,
  order_date date not null,
  col1 number not null,
  status varchar2(25) not null);

alter table t1 modify id default t1_seq.nextval;

Now populate the table with some test data.

declare
--create or replace procedure simulate_order_entry as

  i number;     -- an index variable
  sdate date;   -- the date we are going to insert

  cursor c1 is
    select dbms_random.value(0,10000) col1
    from dual
    connect by level <= 36500;
begin

  -- one year of data
  select sysdate-365 into sdate from dual;
  i := 1;

  -- start the loop, 36,500 times
  -- one year of data
  for rec in c1
  loop
    if ( i / 100 = 1 ) then
      i := 1;
      sdate := sdate + 1;
    end if;

    insert into t1 (order_date, col1, status)
    values
      (sdate, rec.col1, updt_stat(sdate));

    i := i + 1;  -- i++
  end loop;
  commit;
end;
/

It’s always been easy to display a histogram of text data. We’ve been doing this for decades.

RLOCKARD@pdev > select status, count(*)
                from t1
                group by status;

STATUS                      COUNT(*)
------------------------- ----------
Shipped                        31284
Ordered                          240
Picked                          4382
Back Order                        86
Canceled                         508

But what about getting a histogram of the order date? Oracle provides the width_bucket function to solve this problem: width_bucket(expr, min_value, max_value, num_buckets) assigns each value to one of num_buckets equal-width buckets between min_value and max_value, with bucket 0 and bucket num_buckets + 1 catching out-of-range values.


RLOCKARD@pdev > select min(order_date),
                       max(order_date),
                       width_bucket(order_date, sysdate-365, sysdate+20, 12) bucket,
                       count(*)
                from t1
                group by width_bucket(order_date, sysdate-365, sysdate+20, 12)
                order by 1;

MIN(ORDER MAX(ORDER     BUCKET   COUNT(*)
--------- --------- ---------- ----------
03-SEP-13 03-SEP-13          0         99
04-SEP-13 05-OCT-13          1       3168
06-OCT-13 06-NOV-13          2       3168
07-NOV-13 08-DEC-13          3       3168
09-DEC-13 09-JAN-14          4       3168
10-JAN-14 10-FEB-14          5       3168
11-FEB-14 14-MAR-14          6       3168
15-MAR-14 15-APR-14          7       3168
16-APR-14 17-MAY-14          8       3168
18-MAY-14 18-JUN-14          9       3168
19-JUN-14 20-JUL-14         10       3168
21-JUL-14 21-AUG-14         11       3168
22-AUG-14 06-SEP-14         12       1553

What about displaying numeric data? We simply change the parameters of width_bucket to fit the numeric data.

RLOCKARD@pdev > select min(col1),
                       max(col1),
                       width_bucket(col1, 0, 10000, 10) bucket,
                       count(*)
                from t1
                group by width_bucket(col1, 0, 10000, 10)
                order by 1;

 MIN(COL1)  MAX(COL1)     BUCKET   COUNT(*)
---------- ---------- ---------- ----------
.013540186 999.847819          1       3593
1000.32556 1999.71215          2       3619
2000.26786 2999.44229          3       3593
3000.33048 3999.69273          4       3746
4000.37659 4999.86446          5       3721
5000.02576 5999.80972          6       3565
6000.0222  6999.70225          7       3677
7000.05808 7999.67615          8       3644
8000.11224 8999.93555          9       3644
9000.31992 9999.63753         10       3698

Why does the Oracle CBO interest me so much?

There is a short answer and a very long answer to this. I will try to keep the answer somewhere between the two.

The first assumption we must accept, from my observations over the past several years, is that most Oracle developers don’t understand the Cost Based Optimizer, yet they use it every day.

Way back in the dark ages, full table scans were considered evil, and we used hints to get rid of this dreaded access path. As you can see from the query below, we got a full table scan of t1.
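The original screenshot isn’t reproduced here, so the query below is a hypothetical stand-in of the same shape, reusing the t1 table from the earlier histogram post:

-- unhinted: the CBO chooses a full table scan of t1
select sum(col1)
from t1
where order_date > sysdate - 365;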

So we solve this by adding a hint.
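Again as a sketch only, with t1_order_date_idx standing in for whatever index we forced:

-- the "clever" version: force the index with a hint
select /*+ index(t1 t1_order_date_idx) */ sum(col1)
from t1
where order_date > sysdate - 365;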

That’s better, we think, but let’s look at a few things. Did the hint really help? The first query had 70,308 consistent gets, 3,884 physical reads, and a cost of 1,065. The second query, where we thought we were being clever, had 1,057,564 consistent gets, 5,868 physical reads, and a cost of 990K.

Well, maybe I was not so clever by forcing Oracle to use an index; maybe the CBO is smart enough to put together the correct plan, and the full table scan is not evil after all.

A few years ago I was at ODTUG and heard Maria Colgan speaking about the CBO, and I was hooked. Maria made me realize I needed to understand what the CBO was doing. Yes, sometimes the CBO gets it wrong, but there are ways to correct that.

I’m now putting together a series of presentations on understanding the CBO using the scientific method. Why not just publish how the CBO works? Why not publish rules of thumb (ROT) for the CBO? The answer is quite simple: every environment is different, and it’s much better to understand your data and how the CBO works with your data and queries. One presentation will go through how to read an explain plan; the others will teach you how to establish what the CBO is doing in your environment, so that you can explain it. That makes you smarter and more valuable.

Stay tuned, I will be posting blurbs and hints on the CBO.

A must read for anyone who #manages people

I have had the pleasure of working with some great managers who know about big boy/girl rules, and sadly some micromanagers who, well, micromanage. Those who know me well can imagine just how well I’ve gotten along with micromanagers.

http://dbakevlar.com/2011/07/the-care-and-feeding-of-good-skilled-employees/