How important is "up to date" to you ?

I wrote a post a while back showing how one of the options with an Oracle Text index is to keep that index perfectly updated with its underlying table.  (Long termers with Oracle may know that this is a relatively recent addition to Text indexes, which historically were updated with calls to CTXSYS.CTX_DDL.SYNC_INDEX, either on demand or via the scheduler).

I also said that there is “no such thing as a free lunch”, as refreshing that index for every single commit introduces overheads.  So let’s explore that a little more thoroughly

First I sourced a file of hundreds of thousands of random english sentences, and exposed them to Oracle via an external table


SQL> DROP TABLE sentences;

Table dropped.

SQL>
SQL> CREATE TABLE sentences (
  2    line  VARCHAR2(4000)
  3  )
  4  ORGANIZATION EXTERNAL
  5  (
  6    TYPE ORACLE_LOADER
  7    DEFAULT DIRECTORY temp
  8    ACCESS PARAMETERS
  9    (
 10      RECORDS DELIMITED BY NEWLINE
 11      BADFILE 'sentences.bad'
 12      LOGFILE 'sentences.log'
 13      FIELDS TERMINATED BY '@'
 14      MISSING FIELD VALUES ARE NULL
 15      (
 16        line  CHAR(4000)
 17      )
 18    )
 19    LOCATION ('english.dat')
 20  )
 21  REJECT LIMIT UNLIMITED
 22  /

Table created.

SQL> SELECT * FROM sentences where rownum <= 10;

LINE
-------------------------------------------------------------
Let's try something.
I have to go to sleep.
Today is June 18th and it is Muiriel's birthday!
Muiriel is 20 now.
The password is "Muiriel".
I will be back soon.
I'm at a loss for words.
This is never going to end.
I just don't know what to say.
That was an evil bunny.

10 rows selected.

And now we will load 50,000 of those sentences into a Text-indexed table, one row at a time, committing (and syncing) for every row inserted.


SQL> create table t1 (
  2  x int,
  3  y varchar2(3000) ) tablespace demo
  4  ;

Table created.

SQL>
SQL>
SQL> begin
  2  ctx_ddl.create_preference('demo_text_idx', 'BASIC_STORAGE');
  3  ctx_ddl.set_attribute('demo_text_idx', 'I_TABLE_CLAUSE', 'tablespace demo');
  4  ctx_ddl.set_attribute('demo_text_idx', 'K_TABLE_CLAUSE', 'tablespace demo');
  5  ctx_ddl.set_attribute('demo_text_idx', 'R_TABLE_CLAUSE', 'tablespace demo');
  6  ctx_ddl.set_attribute('demo_text_idx', 'N_TABLE_CLAUSE', 'tablespace demo');
  7  ctx_ddl.set_attribute('demo_text_idx', 'I_INDEX_CLAUSE', 'tablespace demo');
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> create index t1_iz on T1  ( y )
  2  indextype is ctxsys.context
  3  parameters ('sync (on commit) storage demo_text_idx');

Index created.

SQL> conn mcdonac/******
Connected.

SQL> set timing on
SQL>
SQL> begin
  2  for i in ( select rownum r, line from sentences where rownum <= 50000 )   3  loop   4    insert into t1 values (i.r, i.line );   5    commit;   6  end loop;   7  end;   8  / PL/SQL procedure successfully completed. Elapsed: 00:14:34.98 SQL> @stat

[some stats omitted]

 SID_STAT# NAME                                                    VALUE
---------- -------------------------------------------------- ----------
         6 user commits                                           250000
         9 recursive calls                                       4351635
        14 session logical reads                                 7248007
        19 CPU used by this session                                35831
        55 physical read total bytes                           410017792
       128 db block gets                                         2387570
       132 consistent gets                                       4860437
       146 physical read bytes                                 407281664
       147 db block changes                                      3112305
       148 consistent changes                                      50041
       159 physical write bytes                                   712704
       258 redo size                                           666954796
      1089 execute count                                         2000136

1162 rows selected.

SQL> select segment_name, bytes from dba_segments
  2  where tablespace_name = 'DEMO'
  3  order by 1;

SEGMENT_NAME                                  BYTES
---------------------------------------- ----------
DR$T1_IZ$I                                 14680064
DR$T1_IZ$R                                    65536
DR$T1_IZ$X                                 10485760
DRC$T1_IZ$R                                   65536
SYS_IL0000162067C00006$$                      65536
SYS_IL0000162072C00002$$                      65536
SYS_IOT_TOP_162070                          2097152
SYS_IOT_TOP_162076                            65536
SYS_LOB0000162067C00006$$                    131072
SYS_LOB0000162072C00002$$                   2293760
T1                                          3145728

11 rows selected.

So we can see that

  • the process took a little over 14 minutes
  • consumed around 660M of redo
  • the segment size totalled around 31M

Let’s now repeat the exercise with a more relaxed strategy on index freshness.  We will still commit every row inserted, but only sync the index every 100 rows.



SQL> drop table t1 purge;

Table dropped.

SQL>
SQL> create table t1 (
  2  x int,
  3  y varchar2(3000) ) tablespace demo
  4  ;

Table created.

SQL>
SQL>
SQL> create index t1_iz on T1  ( y )
  2  indextype is ctxsys.context
  3  parameters ('storage demo_text_idx');

Index created.

SQL>
SQL> conn mcdonac/******
Connected.

SQL> begin
  2  for i in ( select rownum r, line from sentences where rownum <= 50000 )   3  loop   4    insert into t1 values (i.r, i.line );   5    commit;   6   7    if mod(i.r,100) = 0 then   8      ctxsys.ctx_ddl.sync_index('T1_IZ');   9    end if;  10  end loop;  11  end;  12  / PL/SQL procedure successfully completed. Elapsed: 00:00:38.45 SQL> select segment_name, bytes from dba_segments
  2  where tablespace_name = 'DEMO'
  3  order by 1;

SEGMENT_NAME                        BYTES
------------------------------ ----------
DR$T1_IZ$I                       10485760
DR$T1_IZ$R                          65536
DR$T1_IZ$X                        7340032
DRC$T1_IZ$R                         65536
SYS_IL0000162255C00006$$            65536
SYS_IL0000162260C00002$$            65536
SYS_IOT_TOP_162258                2097152
SYS_IOT_TOP_162264                  65536
SYS_LOB0000162255C00006$$          131072
SYS_LOB0000162260C00002$$         2293760
T1                                3145728

11 rows selected.

Elapsed: 00:00:00.02
SQL>
SQL> @stat

 SID_STAT# NAME                                                    VALUE
---------- -------------------------------------------------- ----------
         6 user commits                                            52000
         9 recursive calls                                        349084
        14 session logical reads                                 1352820
        19 CPU used by this session                                 2719
        55 physical read total bytes                             6709248
       128 db block gets                                          940374
       132 consistent gets                                        412446
       146 physical read bytes                                   4087808
       147 db block changes                                      1184146
       148 consistent changes                                        539
       159 physical write bytes                                   712704
       258 redo size                                           270443980
      1089 execute count                                          176163

1162 rows selected.


By relaxing our sync strategy, now we have

  • the process took less than 1 minute
  • consumed around 270M of redo
  • the segment size totalled around 24M

Hence if your document loading apparatus is high volume OLTP style, you might want to carefully consider the impact of saying to your customers “Hey, we’ll keep that index updated in real time”.  If it truly is high volume OLTP style…it is unlikely they would even notice a more streamlined approach with regular sync’s, where “regular” is defined either on a time interval or pending-document count threshold.  Conversely, if document loading is rare, then an on-commit strategy combined with an occasional optimize and/or rebuild might be perfectly adequate.

Advertisements

One thought on “How important is "up to date" to you ?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s