Subtle variations in optimizer stats

Subtle variations in the way you work with statistics can have a significant impact on the execution plans the optimizer produces, so you need to take care.

Let’s look at the following example:


SQL> create table T (
  2    x varchar2(20) , y varchar2(100));

Table created.

SQL> insert into T
  2  select 'x' , rpad('z',100) from all_objects;

94117 rows created.

SQL> insert into T
  2  select 'X' , rpad('z',100) from all_objects;

94117 rows created.

SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true)

PL/SQL procedure successfully completed.

SQL> create index TX on T ( upper(x) ) ;

Index created.

So you can see that although X holds two distinct values ('x' and 'X'), in reality ALL of the rows have a single value for UPPER(X), namely “X”. So let’s look at an execution plan.


SQL> exec dbms_stats.gather_index_stats(user,'TX')

PL/SQL procedure successfully completed.

SQL> explain plan for select * from T where upper(x) = :b1
  2  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 1501193905

--------------------------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |  1882 |   189K|   354   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T    |  1882 |   189K|   354   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | TX   |   753 |       |   342   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(UPPER("X")=:B1)

14 rows selected.

That seems incorrect. Bind peeking etc. aside (we are just using a standard EXPLAIN PLAN), we would expect the optimizer to avoid the index when there is only one value across the whole table.

Let’s try that experiment again.


SQL> drop table T;

Table dropped.

SQL>
SQL> create table T (
  2    x varchar2(20) , y varchar2(100));

Table created.

SQL>
SQL> insert into T
  2  select 'x' , rpad('z',100) from all_objects;

94117 rows created.

SQL>
SQL> insert into T
  2  select 'X' , rpad('z',100) from all_objects;

94117 rows created.

SQL>
SQL> create index TX on T ( upper(x) ) ;

Index created.

SQL>
SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true)

PL/SQL procedure successfully completed.

SQL>
SQL> exec dbms_stats.gather_index_stats(user,'TX')

PL/SQL procedure successfully completed.

SQL>
SQL> explain plan for select * from T where upper(x) = :b1
  2  /

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   188K|    18M|   786   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |   188K|    18M|   786   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(UPPER("X")=:B1)

13 rows selected.

And this time it has got the estimate just right. So what did we do differently? If you look at the two scripts, in the first one we did

“gather table stats, create index, gather index stats”

and in the second one, we did

“create index, gather table stats, gather index stats”

The subtle difference lies in which column-level statistics were calculated, specifically on the hidden column that was created as part of our function-based index.

In the second (“correct”) example, statistics were gathered on ALL of the columns, including the hidden one:


SQL> select COLUMN_NAME,NUM_DISTINCT,AVG_COL_LEN, num_nulls, density
  2  from user_tab_cols
  3  where  table_name = 'T'
  4  order by COLUMN_ID;

COLUMN_NAME                    NUM_DISTINCT AVG_COL_LEN  NUM_NULLS    DENSITY
------------------------------ ------------ ----------- ---------- ----------
X                                         2           2          0         .5
Y                                         1         101          0          1
SYS_NC00003$                              1           2          0          1

whereas in the first example, because the hidden column did not yet exist when we gathered the table stats, it got missed.


SQL> select COLUMN_NAME,NUM_DISTINCT,AVG_COL_LEN, num_nulls, density
  2  from user_tab_cols
  3  where  table_name = 'T'
  4  order by COLUMN_ID;

COLUMN_NAME                    NUM_DISTINCT AVG_COL_LEN  NUM_NULLS    DENSITY
------------------------------ ------------ ----------- ---------- ----------
X                                         2           2          0         .5
Y                                         1         101          0          1
SYS_NC00003$
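To see the hidden column for yourself, along with whether it has any statistics, a query along these lines against USER_TAB_COLS may help. HIDDEN_COLUMN, VIRTUAL_COLUMN, INTERNAL_COLUMN_ID and LAST_ANALYZED are standard columns of that view. The system-generated name (SYS_NC00003$ here) will differ from table to table.

```sql
-- List every column of T, including the hidden virtual column
-- created for the function-based index TX, and show whether it has stats.
-- Hidden columns have no COLUMN_ID, so order by INTERNAL_COLUMN_ID instead.
select column_name,
       hidden_column,
       virtual_column,
       num_distinct,
       last_analyzed
from   user_tab_cols
where  table_name = 'T'
order  by internal_column_id;
```

In the first scenario you would expect SYS_NC00003$ to show HIDDEN_COLUMN = 'YES' and VIRTUAL_COLUMN = 'YES', with NUM_DISTINCT and LAST_ANALYZED both null.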

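With no statistics on SYS_NC00003$, the optimizer falls back on a default selectivity, which for an equality predicate on an expression is 1%. 1% of the 188,234 rows in T gives the estimate of 1882 rows we saw in the first plan, far short of the real figure. So if you see execution plans that don’t seem right, double check your statistics: you might be missing some.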
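If the index was created after the table stats were gathered, one way to fill the gap is sketched below. It uses the METHOD_OPT parameter of DBMS_STATS.GATHER_TABLE_STATS, which accepts a FOR ALL HIDDEN COLUMNS clause. Simply re-gathering the table stats after creating the index, as in the second script, works too.

The second query is a rough check for hidden virtual columns in your schema that have never been analyzed. It can also list things other than function-based index columns, such as extended statistics column groups, so treat its output as a starting point rather than a definitive list.

```sql
-- Gather column statistics only for the hidden columns of T
-- (such as SYS_NC00003$, created by the function-based index TX).
exec dbms_stats.gather_table_stats(user, 'T', method_opt => 'for all hidden columns size auto')

-- Hidden virtual columns in the current schema with no statistics yet.
select table_name, column_name
from   user_tab_cols
where  hidden_column  = 'YES'
and    virtual_column = 'YES'
and    last_analyzed is null
order  by table_name, column_name;
```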

Quick tip on Function Based Indexes

For “normal” indexes, USER_IND_COLUMNS will contain the column name(s) for an index, but things (appear to) go astray when looking at function-based indexes.


SQL> create table blah ( x varchar2(30));

Table created.

SQL> create index blah_ix on blah ( upper(x));

Index created.

SQL> select column_name from user_ind_columns
  2  where index_name = 'BLAH_IX'
  3  /

COLUMN_NAME
------------------------------
SYS_NC00002$

Don’t panic. Simply take a look at USER_IND_EXPRESSIONS to find the function that you used.


SQL> select column_expression from user_ind_expressions
  2  where index_name = 'BLAH_IX'
  3  /

COLUMN_EXPRESSION
--------------------------------------------------------------------------------
UPPER("X")

Easy peasy
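If an index mixes plain columns and expressions, the sketch below lines the two views up on INDEX_NAME and COLUMN_POSITION to give a single listing. For plain columns, COLUMN_EXPRESSION is simply null. COLUMN_EXPRESSION is a LONG column, so SQL*Plus may truncate longer expressions unless you raise SET LONG (the default is 80).

```sql
-- One row per index column; for function-based columns the
-- expression text is shown alongside the SYS_NC...$ name.
select c.index_name,
       c.column_position,
       c.column_name,
       e.column_expression
from   user_ind_columns c
       left join user_ind_expressions e
         on  e.index_name      = c.index_name
         and e.column_position = c.column_position
where  c.index_name = 'BLAH_IX'
order  by c.column_position;
```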

A little-known RI clause

Most people are aware of the standard referential integrity control, namely that you can’t delete a parent row while child rows still reference it:


SQL> drop table PAR cascade constraints purge;

Table dropped.

SQL> create table PAR ( p int primary key, data int);

Table created.

SQL> insert into PAR
  2  select rownum, rownum
  3  from dual connect by level <= 7;

7 rows created.

SQL>
SQL> drop table CHD purge;

Table dropped.

SQL> create table CHD ( c int primary key, p int references PAR ( p ) );

Table created.

SQL>
SQL> insert into CHD
  2  select rownum, mod(rownum,4)+1
  3  from dual connect by level <= 12;

12 rows created.

SQL>
SQL> delete from PAR
  2  where p = 2;
delete from PAR
*
ERROR at line 1:
ORA-02292: integrity constraint (MCDONAC.SYS_C0018225) violated - child record found

And most people are also aware that you can head to the other extreme with ON DELETE CASCADE, and wipe out the children when you wipe out the parent:


SQL> drop table PAR cascade constraints purge;

Table dropped.

SQL> create table PAR ( p int primary key, data int);

Table created.

SQL> insert into PAR
  2  select rownum, rownum
  3  from dual connect by level <= 7;

7 rows created.

SQL>
SQL> drop table CHD purge;

Table dropped.

SQL> create table CHD ( c int primary key, p int references PAR ( p ) ON DELETE CASCADE);

Table created.

SQL>
SQL> insert into CHD
  2  select rownum, mod(rownum,4)+1
  3  from dual connect by level <= 12;

12 rows created.

SQL>
SQL> delete from PAR
  2  where p = 2;

1 row deleted.

SQL> select * from CHD;

         C          P
---------- ----------
         2          3
         3          4
         4          1
         6          3
         7          4
         8          1
        10          3
        11          4
        12          1

9 rows selected.

But don’t forget, there is also a third option that you can implement declaratively: ON DELETE SET NULL, which keeps the child rows but sets their foreign key column to null:


SQL> drop table PAR cascade constraints purge;

Table dropped.

SQL> create table PAR ( p int primary key, data int);

Table created.

SQL> insert into PAR
  2  select rownum, rownum
  3  from dual connect by level <= 7;

7 rows created.

SQL>
SQL> drop table CHD purge;

Table dropped.

SQL> create table CHD ( c int primary key, p int references PAR ( p ) ON DELETE SET NULL);

Table created.

SQL>
SQL> insert into CHD
  2  select rownum, mod(rownum,4)+1
  3  from dual connect by level <= 12;

12 rows created.

SQL>
SQL> delete from PAR
  2  where p = 2;

1 row deleted.

SQL> select * from CHD;

         C          P
---------- ----------
         1
         2          3
         3          4
         4          1
         5
         6          3
         7          4
         8          1
         9
        10          3
        11          4
        12          1

12 rows selected.


I don’t think I’ve seen this used in a Production instance. If you’ve seen it, please add a comment – I’d be curious to see a use case.
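To check which of the three behaviours an existing foreign key uses, the DELETE_RULE column of USER_CONSTRAINTS reports CASCADE, SET NULL or NO ACTION (the default). As far as I know, there is no ALTER that changes the delete rule in place. You drop the foreign key and recreate it with the clause you want. In the sketch below, CHD_PAR_FK is a hypothetical constraint name. The demo tables above use a system-generated name, so substitute whatever the first query reports.

```sql
-- Which delete rule does each foreign key on CHD use?
select constraint_name, delete_rule
from   user_constraints
where  table_name = 'CHD'
and    constraint_type = 'R';

-- Switch an existing foreign key to ON DELETE SET NULL.
-- CHD_PAR_FK is a hypothetical name; the child column P must allow nulls.
alter table CHD drop constraint CHD_PAR_FK;
alter table CHD add constraint CHD_PAR_FK
  foreign key ( p ) references PAR ( p ) on delete set null;
```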

Active and Inactive Sessions

Most people are aware of the STATUS column in V$SESSION. If it’s ‘ACTIVE’, then that session is in the middle of a database call and consuming database resources (running a SQL statement, etc.).

However, a lesser-known and arguably even more useful column is LAST_CALL_ET. The name suggests the time since the last call, but the documentation provides a more useful description:

If the session STATUS is currently ACTIVE, then the value represents the elapsed time (in seconds) since the session has become active.

If the session STATUS is currently INACTIVE, then the value represents the elapsed time (in seconds) since the session has become inactive.

This makes it a great column for getting a view of what sessions are doing on your system, for example:


select s.sid||','||s.serial# sess,
       s.username,
       s.last_call_et,
       s.status,
       s.sql_address,
       s.program
from v$session s
where ( s.status = 'ACTIVE' and s.last_call_et > 10 ) or      -- active for more than 10 seconds
      ( s.status != 'ACTIVE' and s.last_call_et > 1200 );     -- inactive for more than 20 minutes
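A possible variation, sketched under a few assumptions, is shown below. It restricts the query to user sessions via V$SESSION.TYPE and shows LAST_CALL_ET as an interval using NUMTODSINTERVAL. It also pulls the current statement text from V$SQL by joining on SQL_ID and SQL_CHILD_NUMBER. The join is an outer join because inactive sessions usually have no current SQL_ID. V$SQL.SQL_TEXT holds only the first part of long statements.

```sql
-- User sessions that have been active for more than 10 seconds or
-- inactive for more than 20 minutes, longest in their current state first.
select s.sid || ',' || s.serial#                 sess,
       s.username,
       s.status,
       numtodsinterval(s.last_call_et, 'SECOND')  in_state_for,
       s.program,
       q.sql_text
from   v$session s
       left join v$sql q
         on  q.sql_id       = s.sql_id
         and q.child_number = s.sql_child_number
where  s.type = 'USER'
and    (   ( s.status = 'ACTIVE'  and s.last_call_et > 10 )
        or ( s.status != 'ACTIVE' and s.last_call_et > 1200 ) )
order  by s.last_call_et desc;
```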