Attribute clustering (part 1)

One of the nice facilities in 12c is the ability to instruct the database to co-locate data during loading in order to facilitate reduced resource consumption for subsequent queries.

Here’s an example of it in use. First I’ll create table SOURCE_DATA to mimic some randomised incoming data feed



SQL> exec dbms_random.seed(0)

PL/SQL procedure successfully completed.

SQL> drop table source_data purge;
drop table source_data purge
           *
ERROR at line 1:
ORA-00942: table or view does not exist


SQL>
SQL> create table source_data  as
  2      select d.*
  3      from dba_objects d
  4      where object_id is not null
  5      order by dbms_random.random;

Table created.

So let’s now populate (create) our table T with a straight copy of the data from SOURCE_DATA and index a column of interest


SQL> create table t as select * from source_data;

Table created.

SQL>
SQL> select object_id, owner from t where rownum < 50;

 OBJECT_ID OWNER
---------- ------------------------------
     37627 PUBLIC
     79081 ORDSYS
     26953 SYS
     74744 PUBLIC
     47975 PUBLIC
     94003 APEX_050000
     59014 SYS
     66409 PUBLIC
     36977 PUBLIC
     74886 PUBLIC
       565 SYS
     35456 SYS
     74656 SYS
     61451 PUBLIC
     11486 SYS
     61444 SYS
     44048 SYS
     24810 PUBLIC
     27578 SYS
     68871 PUBLIC
     22010 SYS
     31915 PUBLIC
     60762 SYS
     45858 SYS
     44413 PUBLIC
     61323 PUBLIC
     81339 ORDSYS
     31560 SYS
      8621 SYS
     42483 PUBLIC
     35172 SYS
     88978 APEX_040200
     67118 PUBLIC
     27550 SYS
      7244 SYS
     96985 APEX_050000
     80677 PUBLIC
     65032 SYS
     42454 SYS
     69067 PUBLIC
     15227 SYS
     29374 SYS
     19280 XDB
     45063 PUBLIC
     61815 PUBLIC
     94727 APEX_050000
     31122 SYS
     51869 PUBLIC
     39565 PUBLIC

49 rows selected.

SQL>
SQL> create index t_idx on t(object_id);

Index created.

You can see the data is in pseudo-random order on both OBJECT_ID and OWNER. (We only care about OBJECT_ID for the sake of this demo). Let us now see what the I/O of an index range scan query on OBJECT_ID costs us.


SQL> set autotrace traceonly stat
SQL> select /*+ index(t) */ * from t where object_id between 10000 and 15000;

4997 rows selected.


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       5339  consistent gets
         12  physical reads
          0  redo size
     629403  bytes sent via SQL*Net to client
       4214  bytes received via SQL*Net from client
        335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       4997  rows processed

SQL> set autotrace off
SQL>
SQL>

Just under 5400 consistent gets, which is not dissimilar from the rows we fetched, because those rows from OBJECT_ID 10,000 to 15,000 they are scattered throughout the table. Let us now repeat the exercise, this time using the clustering facilities available under 12c.


SQL>
SQL> drop table t purge;

Table dropped.

SQL>
SQL> create table t
  2  clustering by linear order(object_id)
  3  as select * from source_data;

Table created.

SQL>
SQL> select object_id, owner from t where rownum < 50;

 OBJECT_ID OWNER
---------- ------------------------------
         2 SYS
         3 SYS
         4 SYS
         5 SYS
         6 SYS
         7 SYS
         8 SYS
         9 SYS
        10 SYS
        11 SYS
        12 SYS
        13 SYS
        14 SYS
        15 SYS
        16 SYS
        17 SYS
        18 SYS
        19 SYS
        20 SYS
        21 SYS
        22 SYS
        23 SYS
        24 SYS
        25 SYS
        26 SYS
        27 SYS
        28 SYS
        29 SYS
        30 SYS
        31 SYS
        32 SYS
        33 SYS
        34 SYS
        35 SYS
        36 SYS
        37 SYS
        38 SYS
        39 SYS
        40 SYS
        41 SYS
        42 SYS
        43 SYS
        44 SYS
        45 SYS
        46 SYS
        47 SYS
        48 SYS
        49 SYS
        50 SYS

49 rows selected.

SQL>
SQL> create index t_idx on t(object_id);

Index created.

SQL>
SQL> set autotrace traceonly stat
SQL> select /*+ index(t) */ * from t where object_id between 10000 and 15000;

4997 rows selected.


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        756  consistent gets
         12  physical reads
          0  redo size
     629403  bytes sent via SQL*Net to client
       4214  bytes received via SQL*Net from client
        335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       4997  rows processed

SQL> set autotrace off
SQL>
SQL>

Before we even used the index, you could see from the sample of rows we queried from the table, the data has been stored in an ordered sequence. As a result, our index lookup is dramatically improved, dropping down to just ~750 consistent gets.  Clustering is covered in great detail in the Data Warehousing Guide, but I presented this example to help with my next blog post on the same topic (coming soon).

Some people have mentioned that Attribute Clustering is an engineered system feature only.  This is not correct – you’re free to Attribute Clustering in Enterprise Edition. It is only when you want to incorporate Zone Maps that licensing things get more complicated Smile

4 thoughts on “Attribute clustering (part 1)

  1. Looks nice, but is this just a new implementation of clustering by cluster key, available since about (?) Version 6?

    Single table hash clusters are I suppose a bit more complex to define and old style clustering was not used much in my experience.

  2. There is a lot more to it that the conventional clustering by key – check the docs. There linear clustering, grouped clustering, data movement handling, and the combination of zone maps can make order of magnitude differences. Plus no overhead of clustered indexes etc, or needing to know the final size of the object in advance.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s