A cool thing with EXCHANGE PARTITION (part 2)

In the previous post, I showed that even though a partition was “removed” (ie, exchanged out) from a table, a query already running against the table could still successfully run to completion.

However, of course, once that partition is exchanged out, it is now a table in its own right…and is subject to the whims of whatever a DBA may wish to do with it. If that table is dropped, or truncated, then as you might expect, our query is going to struggle to find that data!

Here is the sort of thing that can happen when the query cannot run to completion.
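
A minimal sketch of the scenario (the object names below are hypothetical, and the exact error can vary, but the flow would be along these lines):

-- Session 1: a long-running query against the partitioned table SALES
-- is part-way through its scan.
select /*+ full(s) */ count(*) from SALES s;

-- Session 2: while that query is running, a partition is exchanged out
-- into the standalone table SALES_STANDALONE...
alter table SALES exchange partition P2020 with table SALES_STANDALONE;

-- ...and then the standalone table is dropped (or truncated).
drop table SALES_STANDALONE purge;

-- Back in Session 1, when the query reaches the segment that no longer
-- exists, it would typically fail with something like:
--   ORA-08103: object no longer exists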

Those pesky dates as strings

You might be thinking “Nothing is more frustrating than encountering a string column that is full of dates”.

But there is something worse than that…and that is when that string column is full of potential dates, and your job is to sift the good data from the bad. For example, if your table looks like this:


SQL> select * from T;

DATE_STR
--------------------
qwe
01/01/2000
31/02/2000
12-jan-14
20001212
Jan 14, 2016

6 rows selected.

Some of those values are definitely not dates, some of them definitely are dates, and some of them might be dates. The only real way of knowing is to try to convert those strings to dates and see what happens. But we cannot just throw a TO_DATE around the column, because the moment we encounter a bad value, our query will crash. And even if all of the data were valid, because the formats of the strings vary, we’d still get issues with that approach.
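
For example, a naive single-format attempt (a sketch; the exact error depends on which bad value the query hits first) falls over straight away:

-- Naive approach: wrap a single-format TO_DATE around the column.
select to_date(date_str,'dd/mm/yyyy') str_as_date
from   t;

-- With the sample data above this fails as soon as a bad value is hit,
-- with something like:
--   ORA-01858: a non-numeric character was found where a numeric was expected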

And then comes the kicker (because we get this on AskTom all the time)…

“Can we do it without creating a PLSQL function?”

This always bamboozles me…it is like saying “I need to write a book, but I’m only allowed to use a DVORAK keyboard and my tongue, with one eye closed”.

Why restrict yourself when it comes to the facilities available?

Anyway, here is my workaround, and not a stored function in sight.


SQL> with
  2    function date_checker(p_str varchar2) return date is
  3      l_format sys.odcivarchar2list :=
  4         sys.odcivarchar2list('dd/mm/yyyy','dd-mon-yyyy','yyyymmdd','Mon DD, YYYY');
  5      l_dte date;
  6    begin
  7      for i in 1 .. l_format.count loop
  8        begin
  9          l_dte := to_date(p_str,l_format(i));
 10          return l_dte;
 11        exception
 12          when others then
 13            if i = l_format.count then return null; end if;
 14        end;
 15      end loop;
 16    end;
 17  select date_str, date_checker(date_str) str_as_date
 18  from t
 19  /

DATE_STR             STR_AS_DA
-------------------- ---------
qwe
01/01/2000           01-JAN-00
31/02/2000
12-jan-14            12-JAN-14
20001212             12-DEC-00
Jan 14, 2016         14-JAN-16

6 rows selected.

Gotta love 12c.
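
As an aside, if you happen to be on 12.2 or later, the ON CONVERSION ERROR clause of TO_DATE offers another stored-function-free option. A minimal sketch using the same candidate formats as above:

-- A failed conversion returns NULL instead of raising an error, and
-- COALESCE keeps the first format that succeeds.
select date_str,
       coalesce(
         to_date(date_str default null on conversion error, 'dd/mm/yyyy'),
         to_date(date_str default null on conversion error, 'dd-mon-yyyy'),
         to_date(date_str default null on conversion error, 'yyyymmdd'),
         to_date(date_str default null on conversion error, 'Mon DD, YYYY')
       ) str_as_date
from   t;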

Subtle variations in optimizer stats

Subtle variations in the way you gather statistics can have a significant impact on the plans the optimizer produces…so you need to take care.

Let’s look at the following example:


SQL> create table T (
  2    x varchar2(20) , y varchar2(100));

Table created.

SQL> insert into T
  2  select 'x' , rpad('z',100) from all_objects;

94117 rows created.

SQL> insert into T
  2  select 'X' , rpad('z',100) from all_objects;

94117 rows created.

SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true)

PL/SQL procedure successfully completed.

SQL> create index TX on T ( upper(x) ) ;

Index created.

So you can see that in reality, ALL of the rows have a single value for UPPER(X), namely “X”. Let’s now look at an execution plan.


SQL> exec dbms_stats.gather_index_stats(user,'TX')

PL/SQL procedure successfully completed.

SQL> explain plan for select * from T where upper(x) = :b1
  2  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 1501193905

--------------------------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |  1882 |   189K|   354   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T    |  1882 |   189K|   354   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | TX   |   753 |       |   342   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(UPPER("X")=:B1)

14 rows selected.

That seems incorrect. Bind peeking etc aside (because we are just using a standard EXPLAIN), we would expect a full scan rather than an index range scan when there is only one distinct value of UPPER(X) across the whole table. (The 1882-row estimate is roughly 1% of the 188,234 rows, which looks a lot like the optimizer falling back to its default guesses in the absence of usable statistics for the expression.)

Let’s try that experiment again.


SQL> drop table T;

Table dropped.

SQL>
SQL> create table T (
  2    x varchar2(20) , y varchar2(100));

Table created.

SQL>
SQL> insert into T
  2  select 'x' , rpad('z',100) from all_objects;

94117 rows created.

SQL>
SQL> insert into T
  2  select 'X' , rpad('z',100) from all_objects;

94117 rows created.

SQL>
SQL> create index TX on T ( upper(x) ) ;

Index created.

SQL>
SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true)

PL/SQL procedure successfully completed.

SQL>
SQL> exec dbms_stats.gather_index_stats(user,'TX')

PL/SQL procedure successfully completed.

SQL>
SQL> explain plan for select * from T where upper(x) = :b1
  2  /

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   188K|    18M|   786   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |   188K|    18M|   786   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(UPPER("X")=:B1)

13 rows selected.

And this time it has got the estimate just right. So what did we do differently? If you look at the two scripts, in the first one we did

“gather table stats, create index, gather index stats”

and in the second one, we did

“create index, gather table stats, gather index stats”

The subtle difference is in what statistics were calculated at column level for the hidden column (which was created as part of our function-based index).

In the second (“correct”) example, statistics were gathered on ALL of the columns, including the hidden one:


SQL> select COLUMN_NAME,NUM_DISTINCT,AVG_COL_LEN, num_nulls, density
  2  from user_tab_cols
  3  where  table_name = 'T'
  4  order by COLUMN_ID;

COLUMN_NAME                    NUM_DISTINCT AVG_COL_LEN  NUM_NULLS    DENSITY
------------------------------ ------------ ----------- ---------- ----------
X                                         2           2          0         .5
Y                                         1         101          0          1
SYS_NC00003$                              1           2          0          1

whereas in the first example, because the hidden column did not yet exist when we gathered the table stats, it got missed:


SQL> select COLUMN_NAME,NUM_DISTINCT,AVG_COL_LEN, num_nulls, density
  2  from user_tab_cols
  3  where  table_name = 'T'
  4  order by COLUMN_ID;

COLUMN_NAME                    NUM_DISTINCT AVG_COL_LEN  NUM_NULLS    DENSITY
------------------------------ ------------ ----------- ---------- ----------
X                                         2           2          0         .5
Y                                         1         101          0          1
SYS_NC00003$

So if you see execution plans that don’t seem right…double check your statistics – you might be missing some.
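
If you do find yourself in the first scenario, the fix is simple; a minimal sketch, assuming the function-based index is already in place (the method_opt variant targets just the hidden columns):

-- Re-gathering the table statistics once the index (and hence the hidden
-- column) exists will populate the missing column-level statistics.
exec dbms_stats.gather_table_stats(user,'T')

-- Or, to target just the hidden columns:
exec dbms_stats.gather_table_stats(user,'T', method_opt=>'for all hidden columns size auto')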

Quick tip on Function Based Indexes

For “normal” indexes, USER_IND_COLUMNS will contain the column name(s) for an index, but things (appear to) go astray when looking at function-based indexes.


SQL> create table blah ( x varchar2(30));

Table created.

SQL> create index blah_ix on blah ( upper(x));

Index created.

SQL> select column_name from user_ind_columns
  2  where index_name = 'BLAH_IX'
  3  /

COLUMN_NAME
------------------------------
SYS_NC00002$

Don’t panic. Simply take a look at USER_IND_EXPRESSIONS to find the function that you used.


SQL> select column_expression from user_ind_expressions
  2  where index_name = 'BLAH_IX'
  3  /

COLUMN_EXPRESSION
--------------------------------------------------------------------------------
UPPER("X")

Easy peasy
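
And if you want both pieces of information in one hit, a sketch along these lines (joining the two views on index name and column position) works nicely:

-- For each column of the index, show the (possibly system-generated)
-- column name plus the underlying expression, if there is one.
select c.column_position,
       c.column_name,
       e.column_expression
from   user_ind_columns c
       left join user_ind_expressions e
       on    e.index_name      = c.index_name
         and e.column_position = c.column_position
where  c.index_name = 'BLAH_IX'
order  by c.column_position;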

A little known RI clause

Most people are aware of this standard referential integrity control, namely that you can’t delete a parent row if child rows exist:


SQL> drop table PAR cascade constraints purge;

Table dropped.

SQL> create table PAR ( p int primary key, data int);

Table created.

SQL> insert into PAR
  2  select rownum, rownum
  3  from dual connect by level <= 7;

7 rows created.

SQL>
SQL> drop table CHD purge;

Table dropped.

SQL> create table CHD ( c int primary key, p int references PAR ( p ) );

Table created.

SQL>
SQL> insert into CHD
  2  select rownum, mod(rownum,4)+1
  3  from dual connect by level <= 12;

12 rows created.

SQL>
SQL> delete from PAR
  2  where p = 2;
delete from PAR
*
ERROR at line 1:
ORA-02292: integrity constraint (MCDONAC.SYS_C0018225) violated - child record found

And most people are also aware that you can head to the other extreme with ON DELETE CASCADE, and wipe out the children when you wipe out the parent:


SQL> drop table PAR cascade constraints purge;

Table dropped.

SQL> create table PAR ( p int primary key, data int);

Table created.

SQL> insert into PAR
  2  select rownum, rownum
  3  from dual connect by level <= 7;

7 rows created.

SQL>
SQL> drop table CHD purge;

Table dropped.

SQL> create table CHD ( c int primary key, p int references PAR ( p ) ON DELETE CASCADE);

Table created.

SQL>
SQL> insert into CHD
  2  select rownum, mod(rownum,4)+1
  3  from dual connect by level <= 12;

12 rows created.

SQL>
SQL> delete from PAR
  2  where p = 2;

1 row deleted.

SQL> select * from CHD;

         C          P
---------- ----------
         2          3
         3          4
         4          1
         6          3
         7          4
         8          1
        10          3
        11          4
        12          1

9 rows selected.

But don’t forget, there is also a third option, ON DELETE SET NULL, which you can implement declaratively:


SQL> drop table PAR cascade constraints purge;

Table dropped.

SQL> create table PAR ( p int primary key, data int);

Table created.

SQL> insert into PAR
  2  select rownum, rownum
  3  from dual connect by level <= 7;

7 rows created.

SQL>
SQL> drop table CHD purge;

Table dropped.

SQL> create table CHD ( c int primary key, p int references PAR ( p ) ON DELETE SET NULL);

Table created.

SQL>
SQL> insert into CHD
  2  select rownum, mod(rownum,4)+1
  3  from dual connect by level <= 12;

12 rows created.

SQL>
SQL> delete from PAR
  2  where p = 2;

1 row deleted.

SQL> select * from CHD;

         C          P
---------- ----------
         1
         2          3
         3          4
         4          1
         5
         6          3
         7          4
         8          1
         9
        10          3
        11          4
        12          1

12 rows selected.

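One caveat with ON DELETE SET NULL: it needs the foreign key column to be nullable. A sketch of what to expect if it is not (the exact message will vary with your schema):

-- If the foreign key column had been declared NOT NULL...
create table CHD ( c int primary key,
                   p int not null references PAR ( p ) on delete set null );

-- ...then deleting a parent row cannot set the child rows to NULL, and
-- the delete would fail with something like:
--   ORA-01407: cannot update ("MCDONAC"."CHD"."P") to NULL
delete from PAR where p = 2;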

I don’t think I’ve seen this used in a Production instance. If you’ve seen it, please add a comment – I’d be curious to see a use case.