Parsing … no big deal eh ?

Most of us have probably seen the standard demo when it comes to emphasizing the need for sharable SQL, aka, using bind variables where appropriate.  The demo traditionally compares two similar scripts, where one of them generates a lot of SQL statements with literals, and the other recasts the same script with bind variables for dramatic improvement.

Here’s a simple version I’ve whipped up:


SQL> create table t ( x int primary key) organization index;

Table created.

SQL> insert into t
  2  select rownum from dual
  3  connect by level <= 100000;

100000 rows created.

SQL> exec dbms_stats.gather_table_stats('','T');

PL/SQL procedure successfully completed.

SQL> set timing on
SQL> declare
  2    v int;
  3  begin
  4    for i in 1 .. 100000 loop
  5      select x into v from t where x = i;
  6    end loop;
  7  end;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.56

SQL> set timing on
SQL> declare
  2    v int;
  3  begin
  4    for i in 1 .. 100000 loop
  5      execute immediate 'select x from t where x = '||i into v;
  6    end loop;
  7  end;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:01:04.92

So typically what follows is a “Woo Hoo! Look how awesome bind variables are, and look how crappy the literal SQL is” style of statement, with much hand waving and pointing in the direction of the elapsed time.

But lately, there has been people happy to provide a rebuttal, namely, “So what?”.  The argument runs like this:

In the “terrible” example, we are performing approximately 1600 statements per second.  Is that anywhere near as good as the binding case ?  No…but who cares.  There’s plenty of systems out there for which 16 statements per second would be fine, let alone 1600.  So is all the fuss about binding really justified?

Whilst such a rebuttal falls to pieces on other grounds – such as latch contention once we introduce multi-user scenarios, and probably more importantly in today’s security world – the risk of SQL injection, I’m not going to re-hash those.  I want to look at the claim of “Oh… the performance is good enough anyway”.

We seem to have forgotten, then when the first such “bind versus literals” scripts were being produced and published, it was a different era.  Personally, in the early 90’s, my Oracle database (version 7) was running on a Sparcstation 10, with it’s tremendously powerful and incredibly expensive, 50 Mhz processor !!!! Smile (For a trip down memory lane, see here https://en.wikipedia.org/wiki/SPARCstation_10 )

 

So in those days, the differential between binding and literals was massive, because processor cycles were incredibly valuable.  But we tend to forget, that just as processors have evolved, so too have our requirements to access data.  Yes, it’s true that a modern processor can probably easily handle those literal-based simple primary key lookup queries at a rate that meets our needs.  But that’s not always what a “modern” query looks like.  Nowadays, queries have evolved just like the processors – they can be much more complicated.  What happens to the parsing costs then ? So let’s bring some complexity into the fold, and re-run our test with some more complicated queries.

Obviously a complicated query might have a long execution time, so we’ll take that out of the equation by using DBMS_SQL to only parse (and not execute) the query.  We’ll use a couple of dictionary views for our query, which themselves comprise several dictionary views, so there’s plenty of complexity even though the query is just a simple join.


SQL> set timing on
SQL> declare
  2    c int;
  3  begin
  4   c := dbms_sql.open_cursor;
  5   for i in 1 .. 100000 loop
  6      dbms_sql.parse(c,
  7        q'{
  8        select o.object_name, o.last_ddl_time, sum(s.bytes)
  9        from   all_objects o,
 10               dba_segments s
 11        where  o.owner = s.owner
 12        and    o.object_name = s.segment_name
 13        and    o.object_type = 'TABLE'
 14        and    o.object_id = :1
 15        group by o.object_name, o.last_ddl_time
 16        }',
 17        dbms_sql.native );
 18    end loop;
 19    dbms_sql.close_cursor(c);
 20  end;
 21  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.14

SQL> declare
  2    c int;
  3  begin
  4   c := dbms_sql.open_cursor;
  5   for i in 1 .. 100000 loop
  6      dbms_sql.parse(c,
  7        q'{
  8        select o.object_name, o.last_ddl_time, sum(s.bytes)
  9        from   all_objects o,
 10               dba_segments s
 11        where  o.owner = s.owner
 12        and    o.object_name = s.segment_name
 13        and    o.object_type = 'TABLE'
 14        and    o.object_id = }'||i||
 15        q'{
 16        group by o.object_name, o.last_ddl_time
 17        }',
 18        dbms_sql.native );
 19    end loop;
 20    dbms_sql.close_cursor(c);
 21  end;
 22  /

PL/SQL procedure successfully completed.

Elapsed: 10:07:12.71


Yes…you are reading that right. Over 10 hours to get through all of that parsing load.

So just because the processors are bigger and smarter….don’t forget, for high volume calls, those literals are literally going to hurt your systems Smile

Advertisements

3 thoughts on “Parsing … no big deal eh ?

  1. There was a brief exchange on twitter about this. I referred to the Thick Database paradigm. I explain this notion in my “Why Use PL/SQL?” paper on the Oracle Database PL/SQL and EBR blog here:

    https://blogs.oracle.com/plsql-and-ebr/entry/why_use_pl_sql

    I note in the paper that the overwhelmingly common case is that an application’s requirement for issuing insert, update, delete, and select statements can be met by using PL/SQL’s static SQL. I also explain that when static SQL is used, all discussions of binding simply vanish from the PL/SQL programmer’s mental model. You use PL/SQL identifiers (denoting variables or subprogram formal parameters) at the spots where in dynamic SQL you ought to use a place holder — but where naive programmers are tempted (as Connor shows) to concatenate literal values.

    In other words, I claimed, that using the Thick Database paradigm neatly avoids all the evils that Connor describes in this blog.

    At least, I did my best to convey all this in just 140 characters!

    Then Stew Ashton responded to my tweet. He noted that, in the Thick Database paradigm, the only allowed SQL statement from client code to the database is the “call” statement to invoke a PL/SQL subprogram — and that, of course, actual values must be used to invoke these subprograms. In other words, one has the same possibility for naive error here too: the proper approach is to use placeholders in the “call” statement to invoke a subprogram and to bind actual values to these; and the naive programmer would concatenate literal values into the “call” statement’s SQL text. This is fair comment.

    I left unsaid the fact that choosing to use the Thick Database paradigm implies a good understanding of Oracle Database, and clear thinking about the design of the database’s PL/SQL API. Such a thinker, I claim, would never fall into the trap that snares naive programmers.

    Then Jason Bucata commented that the RPC mechanism that is used by an Oracle Forms client would be a nice way to avoid that trap. (This is the same scheme that you use to invoke a PL/SQL subprogram in a remote database, over a database link, from a PL/SQL subprogram in the local database.) Jason’s scheme implies, therefore, using client-side PL/SQL. Sadly, this is available only in Oracle Forms and its close cousins — and it isn’t, therefore, a generally viable idea.

    Bryn Llewellyn
    Product Manager for PL/SQL
    Oracle HQ

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s