Performing a large correlated update

We had a question on AskTom recently asking for the best way to update a new column on a very large table.  Four in-place update options are at our disposal:

  • Serial update SQL
  • Parallel DML update SQL
  • DBMS_PARALLEL_EXECUTE
  • PL/SQL batching (as long as sensible restart-after-error logic is readily available)

all of which will achieve the required outcome but they have a significant redo and undo cost associated with them.  Also, depending on the size of the new column, such an update might create chaos with row migration, because whilst unlikely, there is definitely the potential for every row to grow beyond the available block space required to hold it in situ.

So a common approach to tackling a large “update” is to recast the problem to creating a new version (copy) of the table.  This can be done with a CREATE TABLE AS SELECT (CTAS).  Using this method we get the benefits of less resource consumption, but two issues can get in the way of this approach:

  1. It requires significant downtime whilst the new table is populated
  2. It requires careful planning to ensure all of the dependent objects (indexes, constraints, etc etc) are correctly recreated on the new table

But perhaps there is an alternative – it might not be 100% as fast as a pure CTAS, but what if it solves both the issues mentioned above ?  That alternative is DBMS_REDEFINITION

People often dismiss DBMS_REDEFINITION as purely for changing the structure of a table without changing the underlying data (for example, partitioning a table).  But they fail to take notice of the column mapping functionality that is available – and as long as our column mapping expressions are deterministic, then we can get more value out of the package, including a correlated update.

So here’s our example:

We will have a table called T1, which is a copy of DBA_OBJECTS and will be the table to which we will want a add a new column




SQL>
SQL> create table t1  as
  2  select owner, object_name, subobject_name,
  3       object_id, data_object_id, object_type,
  4       created, last_ddl_time, timestamp, status,
  5       temporary, generated, secondary
  6    from all_objects
  7    where object_id is not null;

Table created.

SQL>
SQL> alter table t1
  2    add constraint t1_pk
  3    primary key(object_id);

Table altered.

T2 is the table from which we will want to source the new data to apply to T1. In this case, T2 has a primary key of OBJECT_ID which can be used to source a new data element called FNAME. In our example, this is just owner and object name concatenated.


SQL>
SQL> create table t2 as select object_id, owner||'.'||object_name fname
  2  from all_objects
  3    where object_id is not null;

Table created.

SQL>
SQL> alter table t2
  2    add constraint t2_pk
  3    primary key(object_id);

Table altered.

So here is the requirement. T1 gets a new column called FULL_NAME. It must be populated via the FNAME column on T2 via the OBJECT_ID lookup. We can see that T1 starts with FULL_NAME being null, and we can see a sample of the values that should come from T2


SQL> alter table t1 add full_name varchar2(200);

Table altered.

SQL>
SQL> select object_id, owner, full_name from t1
  2  where rownum < 10;

 OBJECT_ID OWNER                          FULL_NAME
---------- ------------------------------ --------------------------------------------------
        30 SYS
        47 SYS
        32 SYS
        49 SYS
        17 SYS
         2 SYS
        29 SYS
        45 SYS
        10 SYS

9 rows selected.

SQL>
SQL> select object_id, fname from t2
  2  where rownum < 10;

 OBJECT_ID FNAME
---------- ---------------------------
        30 SYS.I_COBJ#
        47 SYS.I_USER2
        32 SYS.CCOL$
        49 SYS.I_COL2
        17 SYS.FILE$
         2 SYS.C_OBJ#
        29 SYS.C_COBJ#
        45 SYS.I_TS1
        10 SYS.C_USER#

We can’t do joins or subqueries in DBMS_REDEFINITION but we can do deterministic expressions. In our case, we’ll use a deterministic PL/SQL function to mimic the join


SQL>
SQL> create or replace
  2  function get_full(p_id int )  return varchar2 deterministic is
  3    f varchar2(100);
  4  begin
  5    select  fname
  6    into f
  7    from t2
  8    where object_id = p_id;
  9
 10    return f;
 11  end;
 12  /

Function created.

As per normal, we create an interim table to hold the converted data as we redefine the table.


SQL>
SQL> create table t_interim (
  2   owner                         varchar2(128),
  3   object_name                   varchar2(128),
  4   subobject_name                varchar2(128),
  5   object_id                     number,
  6   data_object_id                number,
  7   object_type                   varchar2(23),
  8   created                       date,
  9   last_ddl_time                 date,
 10   timestamp                     varchar2(19),
 11   status                        varchar2(7),
 12   temporary                     varchar2(1),
 13   generated                     varchar2(1),
 14   secondary                     varchar2(1),
 15   full_name                     varchar2(200)
 16  );

Table created.

And now we are ready to use DBMS_REDEFINITION. The critical part here is the COL_MAPPING parameter. We are using the call to GET_FULL to populate the new column that will ultimately end up on T1.


SQL>
SQL>
SQL> declare
  2      l_colmap varchar(512);
  3    begin
  4      l_colmap :=
  5            'OWNER
  6            ,OBJECT_NAME
  7            ,SUBOBJECT_NAME
  8            ,OBJECT_ID
  9            ,DATA_OBJECT_ID
 10            ,OBJECT_TYPE
 11            ,CREATED
 12            ,LAST_DDL_TIME
 13            ,TIMESTAMP
 14            ,STATUS
 15            ,TEMPORARY
 16            ,GENERATED
 17            ,SECONDARY
 18            ,get_full(OBJECT_ID) FULL_NAME';
 19
 20      dbms_redefinition.start_redef_table
 21      (  uname           => user,
 22         orig_table      => 'T1',
 23         int_table       => 'T_INTERIM',
 24         col_mapping   => l_colmap );
 25   end;
 26   /

PL/SQL procedure successfully completed.

And here is the nice part. By using DBMS_REDEFINITION, we dont need to worry about the dependencies – we can let COPY_TABLE_DEPENDENTS do the heavy lifting.


SQL>
SQL> variable nerrors number
SQL> begin
  2    dbms_redefinition.copy_table_dependents
  3      ( user, 'T1', 'T_INTERIM',
  4        copy_indexes => dbms_redefinition.cons_orig_params,
  5        num_errors => :nerrors );
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL>
SQL> print nerrors

   NERRORS
----------
         0

SQL> begin
  2    dbms_redefinition.finish_redef_table
  3      ( user, 'T1', 'T_INTERIM' );
  4  end;
  5  /

PL/SQL procedure successfully completed.

And we’re done ! A correlated update using DBMS_REDEFINITION.


SQL>
SQL> select object_id, owner, full_name from t1
  2  where rownum < 10;

 OBJECT_ID OWNER                          FULL_NAME
---------- ------------------------------ --------------------------------------------------
        30 SYS                            SYS.I_COBJ#
        47 SYS                            SYS.I_USER2
        32 SYS                            SYS.CCOL$
        49 SYS                            SYS.I_COL2
        17 SYS                            SYS.FILE$
         2 SYS                            SYS.C_OBJ#
        29 SYS                            SYS.C_COBJ#
        45 SYS                            SYS.I_TS1
        10 SYS                            SYS.C_USER#

9 rows selected.

SQL>
SQL>

You might be thinking “But we’ll be slow because we’re calling that PL/SQL millions of times”.  Yes, this will be slower than a pure CTAS operation, but by using DBMS_REDEFINITION the process is online with nearly zero service disruption.  Hence the overall execution time is not nearly so critical anymore as a CTAS where applications are unavailable during the activity.

3 thoughts on “Performing a large correlated update

  1. Connor,

    I can’t help thinking that you ought to have made something of the fact that a function that queries the database shouldn’t be declared deterministic since someone can change the result of the function call by changing the content of the table.

    Somewhere you need a session to lock the table exclusively for the duration, or make it readonly.

    (P.S. I’m not saying I wouldn’t do this type of thing myself, but the principle needs a mention.)

  2. Indeed, and remiss of me not to mention it. I’ve “generic-ised” the post from the AskTom question – where the lookup table was static, so yes, definitely worth a mention.

    You could conceivable work around this with something like:

    create or replace
    function get_full(p_id int, p_scn int ) return varchar2 deterministic is
    f varchar2(100);
    begin
    select fname
    into f
    from t2 as of scn p_scn
    where object_id = p_id;

    return f;
    end;
    /

    and then the process could be:

    lock table t2 in exclusive mode;
    col x new_value y
    select dbms_flashback.get_system_change_number x from dual;
    commit;

    declare
    l_colmap varchar(512);
    begin
    l_colmap :=
    ‘OWNER
    ,OBJECT_NAME
    ,SUBOBJECT_NAME
    ,OBJECT_ID
    ,DATA_OBJECT_ID
    ,OBJECT_TYPE
    ,CREATED
    ,LAST_DDL_TIME
    ,TIMESTAMP
    ,STATUS
    ,TEMPORARY
    ,GENERATED
    ,SECONDARY
    ,get_full(OBJECT_ID,&&y) FULL_NAME’;

    dbms_redefinition.start_redef_table
    ( uname => user,
    orig_table => ‘T1’,
    int_table => ‘T_INTERIM’,
    col_mapping => l_colmap );
    end;
    /

    and whilst this appeared to work, I see this interesting artifact:

    SQL> select version_count
    2 from v$sqlarea
    3 where sql_text like ‘SELECT FNAME FROM T2 AS OF SCN :B2 WHERE OBJECT_ID = :B1%’;

    VERSION_COUNT
    ————-
    486

    so perhaps a more appropriate solution would be a private copy of the lookup table.

    Cheers,
    Connor

  3. Interestingly, in 12.2 they’ve added a procedure called execute_update in dbms_redefinition. You might think this would achieve what you’ve written here without the additional plsql call and deterministic function lie and it kind of does BUT it is really as offline as the CTAS method – it will just handle your constraints, indexes etc. All DML done to the the table during the execute_update call just get ignored, I can see this causing trouble.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s