SAMPLE costing

People often think when using the SAMPLE clause, that because they are only dealing with a subset of the data, that immediately it should be a much faster operation. And whilst sometimes this is indeed the case, it is not a guarantee. The optimizer probably gives us a hint ( no pun intended) as to how we should interpret the SAMPLE clause.

Lets compare a standard scan of a table, with a sampled scan.



SQL> create table T as select * from dba_Objects;

Table created.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from T;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 96399 |    10M|   451   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    | 96399 |    10M|   451   (1)| 00:00:01 |
--------------------------------------------------------------------------

SQL>
SQL> select * from T sample (1);

Execution Plan
----------------------------------------------------------
Plan hash value: 2767392432

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   964 |   108K|   450   (1)| 00:00:01 |
|   1 |  TABLE ACCESS SAMPLE| T    |   964 |   108K|   450   (1)| 00:00:01 |
----------------------------------------------------------------------------

The “Rows” seems to make sense, ie, a one percent sample will return about one percent of the rows, but notice that the Cost is the same in both cases. Now at first thought, you might be thinking that this is wrong – after all, if I only have to produce 1/100th of the data, then why would it cost the same ? Well, its because we are sampling rows. To get 1/100th of the rows, it is still quite possible that we’d have to scan every single block in the table – that’s what sampling is, getting a pseudo-randomly distributed set of rows.

What most people think sampling is, is to pick a random subset of blocks, and then grab the rows from that. You can indeed so that, and the costs are reflected accordingly.




SQL> select * from T sample block (1);

Execution Plan
----------------------------------------------------------
Plan hash value: 2767392432

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   964 |   108K|     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS SAMPLE| T    |   964 |   108K|     6   (0)| 00:00:01 |
----------------------------------------------------------------------------


Dont forget though, that as an empty table grows, then its quite possible that rows clumped together in blocks are not as random a sample of the data you might think Smile

One thought on “SAMPLE costing

  1. rownum (stopkey) with joins also disappoints me

    SQL> select * from a,b where a=b;
    9970 rows selected.

    Execution Plan
    ———————————————————-
    Plan hash value: 3198038340

    ——————————————————————————-
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ——————————————————————————-
    | 0 | SELECT STATEMENT | | 9970 | 79760 | 15 (7)| 00:00:01 |
    |* 1 | HASH JOIN | | 9970 | 79760 | 15 (7)| 00:00:01 |
    | 2 | INDEX FAST FULL SCAN| IND_A | 9970 | 39880 | 7 (0)| 00:00:01 |
    | 3 | INDEX FAST FULL SCAN| IND_B | 10000 | 40000 | 7 (0)| 00:00:01 |
    ——————————————————————————-

    SQL> select * from a,b where a=b and rownum <= 100;
    100 rows selected.

    Execution Plan
    ———————————————————-
    Plan hash value: 82431281
    ——————————————————————————–
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ——————————————————————————–
    | 0 | SELECT STATEMENT | | 100 | 800 | 15 (7)| 00:00:01 |
    |* 1 | COUNT STOPKEY | | | | | |
    |* 2 | HASH JOIN | | 100 | 800 | 15 (7)| 00:00:01 |
    | 3 | INDEX FAST FULL SCAN| IND_A | 9970 | 39880 | 7 (0)| 00:00:01 |
    | 4 | INDEX FAST FULL SCAN| IND_B | 10000 | 40000 | 7 (0)| 00:00:01 |
    ——————————————————————————–

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s