How important is "up to date" to you?

I wrote a post a while back showing how one of the options with an Oracle Text index is to keep that index perfectly updated with its underlying table.  (Long-termers with Oracle may know that this is a relatively recent addition to Text indexes, which historically were updated with calls to CTXSYS.CTX_DDL.SYNC_INDEX, either on demand or via the scheduler.)
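For those who haven't played with these options, they look roughly like this. A sketch only; the "docs" table, the "txt" column and the index name are made up for illustration.


-- refresh the index at every commit
create index doc_idx on docs ( txt )
indextype is ctxsys.context
parameters ('sync (on commit)');

-- alternatives for the sync clause:
--   'sync (every "freq=minutely; interval=5")'   -- driven by the scheduler
--   'sync (manual)'                              -- the default: on demand only

-- and an on-demand sync is simply
exec ctx_ddl.sync_index('DOC_IDX')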

I also said that there is “no such thing as a free lunch”, as refreshing that index for every single commit introduces overheads.  So let’s explore that a little more thoroughly.

First, I sourced a file of hundreds of thousands of random English sentences, and exposed them to Oracle via an external table:


SQL> DROP TABLE sentences;

Table dropped.

SQL>
SQL> CREATE TABLE sentences (
  2    line  VARCHAR2(4000)
  3  )
  4  ORGANIZATION EXTERNAL
  5  (
  6    TYPE ORACLE_LOADER
  7    DEFAULT DIRECTORY temp
  8    ACCESS PARAMETERS
  9    (
 10      RECORDS DELIMITED BY NEWLINE
 11      BADFILE 'sentences.bad'
 12      LOGFILE 'sentences.log'
 13      FIELDS TERMINATED BY '@'
 14      MISSING FIELD VALUES ARE NULL
 15      (
 16        line  CHAR(4000)
 17      )
 18    )
 19    LOCATION ('english.dat')
 20  )
 21  REJECT LIMIT UNLIMITED
 22  /

Table created.

SQL> SELECT * FROM sentences where rownum <= 10;

LINE
-------------------------------------------------------------
Let's try something.
I have to go to sleep.
Today is June 18th and it is Muiriel's birthday!
Muiriel is 20 now.
The password is "Muiriel".
I will be back soon.
I'm at a loss for words.
This is never going to end.
I just don't know what to say.
That was an evil bunny.

10 rows selected.

And now we will load 50,000 of those sentences into a Text-indexed table, one row at a time, committing (and syncing) for every row inserted.


SQL> create table t1 (
  2  x int,
  3  y varchar2(3000) ) tablespace demo
  4  ;

Table created.

SQL>
SQL>
SQL> begin
  2  ctx_ddl.create_preference('demo_text_idx', 'BASIC_STORAGE');
  3  ctx_ddl.set_attribute('demo_text_idx', 'I_TABLE_CLAUSE', 'tablespace demo');
  4  ctx_ddl.set_attribute('demo_text_idx', 'K_TABLE_CLAUSE', 'tablespace demo');
  5  ctx_ddl.set_attribute('demo_text_idx', 'R_TABLE_CLAUSE', 'tablespace demo');
  6  ctx_ddl.set_attribute('demo_text_idx', 'N_TABLE_CLAUSE', 'tablespace demo');
  7  ctx_ddl.set_attribute('demo_text_idx', 'I_INDEX_CLAUSE', 'tablespace demo');
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> create index t1_iz on T1  ( y )
  2  indextype is ctxsys.context
  3  parameters ('sync (on commit) storage demo_text_idx');

Index created.

SQL> conn mcdonac/******
Connected.

SQL> set timing on
SQL>
SQL> begin
  2  for i in ( select rownum r, line from sentences where rownum <= 50000 )
  3  loop
  4    insert into t1 values (i.r, i.line );
  5    commit;
  6  end loop;
  7  end;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:14:34.98
SQL> @stat

[some stats omitted]

 SID_STAT# NAME                                                    VALUE
---------- -------------------------------------------------- ----------
         6 user commits                                           250000
         9 recursive calls                                       4351635
        14 session logical reads                                 7248007
        19 CPU used by this session                                35831
        55 physical read total bytes                           410017792
       128 db block gets                                         2387570
       132 consistent gets                                       4860437
       146 physical read bytes                                 407281664
       147 db block changes                                      3112305
       148 consistent changes                                      50041
       159 physical write bytes                                   712704
       258 redo size                                           666954796
      1089 execute count                                         2000136

1162 rows selected.

SQL> select segment_name, bytes from dba_segments
  2  where tablespace_name = 'DEMO'
  3  order by 1;

SEGMENT_NAME                                  BYTES
---------------------------------------- ----------
DR$T1_IZ$I                                 14680064
DR$T1_IZ$R                                    65536
DR$T1_IZ$X                                 10485760
DRC$T1_IZ$R                                   65536
SYS_IL0000162067C00006$$                      65536
SYS_IL0000162072C00002$$                      65536
SYS_IOT_TOP_162070                          2097152
SYS_IOT_TOP_162076                            65536
SYS_LOB0000162067C00006$$                    131072
SYS_LOB0000162072C00002$$                   2293760
T1                                          3145728

11 rows selected.

So we can see that:

  • the process took a little over 14 minutes
  • consumed around 660M of redo
  • the segment size totalled around 31M
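
In case you’re wondering, the @stat script above simply dumps my session statistics. The exact script isn’t important, but a minimal equivalent would be something like this:

-- a stand-in for the @stat script used above (the real one may differ)
select sn.statistic# sid_stat#, sn.name, ms.value
from   v$mystat ms,
       v$statname sn
where  sn.statistic# = ms.statistic#
order  by 1;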

Let’s now repeat the exercise with a more relaxed strategy on index freshness.  We will still commit every row inserted, but only sync the index every 100 rows.



SQL> drop table t1 purge;

Table dropped.

SQL>
SQL> create table t1 (
  2  x int,
  3  y varchar2(3000) ) tablespace demo
  4  ;

Table created.

SQL>
SQL>
SQL> create index t1_iz on T1  ( y )
  2  indextype is ctxsys.context
  3  parameters ('storage demo_text_idx');

Index created.

SQL>
SQL> conn mcdonac/******
Connected.

SQL> begin
  2  for i in ( select rownum r, line from sentences where rownum <= 50000 )
  3  loop
  4    insert into t1 values (i.r, i.line );
  5    commit;
  6
  7    if mod(i.r,100) = 0 then
  8      ctxsys.ctx_ddl.sync_index('T1_IZ');
  9    end if;
 10  end loop;
 11  end;
 12  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:38.45
SQL> select segment_name, bytes from dba_segments
  2  where tablespace_name = 'DEMO'
  3  order by 1;

SEGMENT_NAME                        BYTES
------------------------------ ----------
DR$T1_IZ$I                       10485760
DR$T1_IZ$R                          65536
DR$T1_IZ$X                        7340032
DRC$T1_IZ$R                         65536
SYS_IL0000162255C00006$$            65536
SYS_IL0000162260C00002$$            65536
SYS_IOT_TOP_162258                2097152
SYS_IOT_TOP_162264                  65536
SYS_LOB0000162255C00006$$          131072
SYS_LOB0000162260C00002$$         2293760
T1                                3145728

11 rows selected.

Elapsed: 00:00:00.02
SQL>
SQL> @stat

 SID_STAT# NAME                                                    VALUE
---------- -------------------------------------------------- ----------
         6 user commits                                            52000
         9 recursive calls                                        349084
        14 session logical reads                                 1352820
        19 CPU used by this session                                 2719
        55 physical read total bytes                             6709248
       128 db block gets                                          940374
       132 consistent gets                                        412446
       146 physical read bytes                                   4087808
       147 db block changes                                      1184146
       148 consistent changes                                        539
       159 physical write bytes                                   712704
       258 redo size                                           270443980
      1089 execute count                                          176163

1162 rows selected.


By relaxing our sync strategy, we now have:

  • the process took less than 1 minute
  • consumed around 270M of redo
  • the segment size totalled around 24M

Hence if your document loading apparatus is high-volume OLTP style, you might want to carefully consider the impact of saying to your customers “Hey, we’ll keep that index updated in real time”.  If it truly is high-volume OLTP style…it is unlikely they would even notice a more streamlined approach with regular syncs, where “regular” is defined either as a time interval or a pending-document count threshold.  Conversely, if document loading is rare, then an on-commit strategy combined with an occasional optimize and/or rebuild might be perfectly adequate.
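
If you go down the “regular syncs” route and want to manage it yourself, a scheduler job is one easy way to do it. A sketch only; the job name and the five-minute interval are arbitrary choices of mine:

begin
  dbms_scheduler.create_job(
    job_name        => 'SYNC_T1_IZ',   -- hypothetical job name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin ctx_ddl.sync_index(''T1_IZ''); end;',
    repeat_interval => 'freq=minutely; interval=5',
    enabled         => true);
end;
/

-- and every now and then, defragment the index
exec ctx_ddl.optimize_index('T1_IZ', 'FULL')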

OOW16 – The Park Central

Some of my sessions this year at OpenWorld are at the Park Central.

So…you’ve arrived at the Moscone Center, and you’re wondering “How do I get to the Park Central?  After all…that’s where all of the hot talks will be” 🙂

Here’s my simple guide:

1) From Howard St, head up the stairs next to the Moscone Center entrance

[photo: the stairs next to the Moscone Center entrance]

2) Walk down through Yerba Buena Gardens… You can’t miss them; they’re a great grassy area to grab a rest during the conference

[photo: Yerba Buena Gardens]

3) Turn right at St Patrick’s church, and walk down to 3rd Street. Turn left at 3rd, and just a couple of minutes away is the Park Central

[photo: St Patrick’s church at 3rd Street]

Or if you prefer an overhead view, here’s the route:

[overhead map of the route]

 

See you there!!!

A bumper OpenWorld 2016 for me…

My first OpenWorld was in Australia in…well…I’m not sure when 🙂 but it might have even been the late 1990s.  Time flies.

But 2016 will be my first OpenWorld as an Oracle employee…and hence, I’ll be busy 🙂

Hopefully you can come along to some or all of the sessions I’m involved in…or you can probably catch me during the week at the OTN lounge.

I’ll blog more shortly on how I think you can have an awesome OpenWorld…because I know I will!

 


Syntax formatter might change your data

I saw this on an AskTom question today, answered by my colleague Chris.  Check out this simple example:


SQL> create table T (
  2    x int default 1,
  3    y int default 1
  4   ,z int);

Table created.

It looks like I’ve assigned a default of “1” to both X and Y. But let’s now dump out the default definition from the dictionary.


SQL>
SQL> set serverout on
SQL> declare
  2    longcol varchar2(200);
  3  begin
  4    for i in (
  5      select column_name, data_default
  6      from   user_tab_cols
  7      where  table_name = 'T'
  8      )
  9    loop
 10      longcol := i.data_Default;
 11      dbms_output.put_line(i.column_name||'-'||length(longcol));
 12    end loop;
 13  end;
 14  /
X-1
Y-3
Z-

Notice the length of the default for Y. If we output those 3 characters (not shown here) we get “1” (the default), then a line feed, then a space.

So where you put the commas makes a difference!  Of course, ultimately (because it’s a numeric) the assigned value will not change.
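
If you want to see those three characters for yourself, a loop like this will expose them. Just a sketch, reusing the same dictionary query as above:

declare
  longcol varchar2(200);
begin
  for i in ( select column_name, data_default
             from   user_tab_cols
             where  table_name = 'T'
             and    column_name = 'Y' )
  loop
    longcol := i.data_default;
    -- walk the stored default character by character and print each code
    for j in 1 .. length(longcol) loop
      dbms_output.put_line(j||': chr('||ascii(substr(longcol,j,1))||')');
    end loop;
  end loop;
end;
/
-- expected: 49 (the digit "1"), then 10 (linefeed), then 32 (space)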

Uber’s move to MySQL

I was reading a very interesting article on Uber’s move from Postgres to MySQL. I really like it when IT professionals and/or companies take the time to explain their technology decisions.  It’s a brave thing to do, because it’s easy for people to jump on the bashing bandwagon (“Ha ha … Company X chose Y and now they’re bust” etc etc).  It’s the same reason you rarely see detailed customer reference information about the technology they are using, or how they are succeeding or failing.  It’s generally kept pretty quiet.  So for Uber engineering to be open about it is impressive, and a lesson for us all.

Not being that familiar with either Postgres or MySQL, I found one statement really caught my attention (emphasis mine):

“if we have a table with a dozen indexes defined on it, an update to a field that is only covered by a single index must be propagated into all 12 indexes”

That blew me away.  Funnily enough, as an Oracle DBA, I’ve always had that reluctance to create indexes on tables.  I find it’s always a good thing to ask for justification – the aim always being to have “just the right amount” of indexes.  What is that right amount?  Enough to meet the needs of the application and the customers, and not a single one more 🙂

It strikes me that this philosophy is really, really important in a Postgres environment, because if the Uber statement is true, then indexes have the potential to impose an enormous overhead on any transactional system.

That got me thinking – how does Oracle handle index updates?  I was pretty sure we only update indexes when the relevant columns are modified, but how could we construct a demo to show that?  Here’s one such way.  We’ll create a table with some indexes, and put one of those indexes into a read-only tablespace.  Updates that touch only the indexes in normal tablespaces should go ahead without error, whilst any change that needs to update the index in the read-only tablespace would be expected to return an error.


SQL> create table t ( pkcol int default seq.nextval, name varchar2(30), birthdate date ) tablespace USERS;

Table created.

SQL> insert into t (name,birthdate)
  2  select object_name, created
  3  from dba_objects;

96744 rows created.

SQL> create index ix1 on t ( pkcol ) tablespace USERS;

Index created.

SQL> create index ix2 on t ( name ) tablespace DEMO;

Index created.

SQL> create index ix3 on t ( birthdate ) tablespace USERS;

Index created.

SQL>
SQL> alter tablespace DEMO read only;

Tablespace altered.

SQL>
SQL> update t
  2  set birthdate = birthdate + 10
  3  where pkcol between 100 and 200;

101 rows updated.

SQL>
SQL> insert into t (birthdate)
  2  values (sysdate);

1 row created.

SQL>
SQL>
SQL> insert into t (name,birthdate)
  2  values ('Connor',sysdate);
insert into t (name,birthdate)
          *
ERROR at line 1:
ORA-00372: file 9 cannot be modified at this time
ORA-01110: data file 9: 'C:\ORACLE\ORADATA\NP12\DEMO.DBF'


SQL>
SQL>
SQL>

As we can see, the Oracle database does not act in the same way as Uber has reported that Postgres does: only the indexes whose columns were actually affected needed to be touched.  (Note that the insert supplying just a BIRTHDATE also succeeded, because NAME was null, and entirely-null keys are not stored in a B-tree index, so no entry in IX2 was required.)  We handle index updates as efficiently as we can.
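
And once you’re done experimenting, remember to put the tablespace back into normal service:

alter tablespace DEMO read write;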

Little enhancements are always nice

Before LATERAL and CROSS APPLY were added (exposed to us) in 12c, a common way to do correlated joins was the TABLE/MULTISET technique.

For example, we might have had a couple of tables:


SQL> create table t as
  2  select object_type, min(created) min_dte, max(created) max_dte
  3  from dba_objects
  4  where owner = 'SCOTT'
  5  group by object_type;

Table created.

SQL> select * from t;

OBJECT_TYPE         MIN_DTE   MAX_DTE
------------------- --------- ---------
INDEX               09-OCT-13 09-OCT-13
TABLE               09-OCT-13 09-OCT-13

SQL>
SQL> create table t1 as
  2  select object_type, object_name, object_id
  3  from dba_objects
  4  where owner = 'SCOTT';

Table created.

SQL> select * from t1;

OBJECT_TYPE         OBJECT_NAME                               OBJECT_ID
------------------- ---------------------------------------- ----------
INDEX               PK_DEPT                                       86892
TABLE               DEPT                                          86891
TABLE               EMP                                           86893
INDEX               PK_EMP                                        86894
TABLE               BONUS                                         86895
TABLE               SALGRADE                                      86896

6 rows selected.

and then tried to join them in what seems to be a logical fashion, but got the following error:


SQL> select *
  2  from   t,
  3         ( select object_id
  4           from   t1
  5           where  object_type = t.object_type );
         where  object_type = t.object_type )
                              *
ERROR at line 5:
ORA-00904: "T"."OBJECT_TYPE": invalid identifier

Now this query could be trivially recast as a normal join. There are examples where that is not the case, but we’re keeping this one as simple as possible.
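For the record, the trivial recast for this particular example would just be:

select t.object_type, t.min_dte, t.max_dte, t1.object_id
from   t, t1
where  t1.object_type = t.object_type;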

We could work around this by using TABLE and MULTISET:


SQL> select *
  2  from t,
  3       table(cast(multiset(
  4                    select object_id
  5                    from   t1
  6                    where  object_type = t.object_type
  7                   ) as sys.odcinumberlist));

OBJECT_TYPE         MIN_DTE   MAX_DTE   COLUMN_VALUE
------------------- --------- --------- ------------
INDEX               09-OCT-13 09-OCT-13        86892
TABLE               09-OCT-13 09-OCT-13        86891
TABLE               09-OCT-13 09-OCT-13        86893
INDEX               09-OCT-13 09-OCT-13        86894
TABLE               09-OCT-13 09-OCT-13        86895
TABLE               09-OCT-13 09-OCT-13        86896

6 rows selected.

which is a little “trick”, in effect being:

  • Take our child rows
  • Collapse them into a single complex type (cast-multiset) to “avoid” this being seen as a correlated join
  • Use the TABLE operator to expand that type back out into rows.

But what if our query within the MULTISET is several “layers” deep?  Then we might run into problems – I get this on 11g:


SQL> select *
  2  from t,
  3       table(cast(multiset(
  4                    select object_id
  5                        from (
  6                           select distinct object_id
  7                           from   t1
  8                           where  object_type = t.object_type
  9                         )
 10                   ) as sys.odcinumberlist))
 11  order by 1,4;
                         where  object_type = t.object_type
                                              *
ERROR at line 8:
ORA-00904: "T"."OBJECT_TYPE": invalid identifier

Because the reference on line 8 to “T.OBJECT_TYPE” is not a single layer from the originating table T, this query will not run in 11g.

But take a look at what happens in 12c:


SQL> select *
  2  from t,
  3       table(cast(multiset(
  4                    select object_id
  5                        from (
  6                           select distinct object_id
  7                           from   t1
  8                           where  object_type = t.object_type
  9                         )
 10                   ) as sys.odcinumberlist))
 11  order by 1,4;

OBJECT_TYPE             MIN_DTE   MAX_DTE   COLUMN_VALUE
----------------------- --------- --------- ------------
INDEX                   20-NOV-15 16-DEC-15        92614
INDEX                   20-NOV-15 16-DEC-15        92615
INDEX                   20-NOV-15 16-DEC-15        99130
PROCEDURE               31-MAY-16 31-MAY-16       148999
TABLE                   20-NOV-15 31-MAY-16        92607
TABLE                   20-NOV-15 31-MAY-16        92608
TABLE                   20-NOV-15 31-MAY-16        92609
TABLE                   20-NOV-15 31-MAY-16        92610
TABLE                   20-NOV-15 31-MAY-16        92616
TABLE                   20-NOV-15 31-MAY-16        99128
TABLE                   20-NOV-15 31-MAY-16       149002
TRIGGER                 27-NOV-15 27-NOV-15        98228
TRIGGER                 27-NOV-15 27-NOV-15        98229
TRIGGER                 27-NOV-15 27-NOV-15        98230

14 rows selected.

No dramas at all. I always like these little touches that come along without too much fanfare, but make a developer’s life easier.
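
For completeness: on 12c you can also skip the trick entirely, because LATERAL handles the correlation natively. A quick sketch:

select *
from   t,
       lateral ( select object_id
                 from   t1
                 where  t1.object_type = t.object_type );

-- or with the SQL Server-style syntax, also exposed in 12c
select *
from   t
cross apply ( select object_id
              from   t1
              where  t1.object_type = t.object_type );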

OpenWorld 2016 is not far away

OpenWorld is just around the corner, and the Ask Tom team will be involved in a number of panels where you can chat to us, ask questions, debate topics, and basically have a relaxed 45 minutes during all the frenzied activity that is OpenWorld.  So if you’ve got any questions you would like answered “face to face”, rather than via Ask Tom, either drop them as a comment here, or feel free to post them on the AskTom site and just annotate them in some way that lets us know you’d like to talk about them on the panels during OpenWorld.

See you in San Francisco!
