[ Team LiB ] |
3.9 Data LoadsDatabases frequently have multiple information providers as shown in Figure 3-9. Figure 3-9. Database Information Providers.OLTP providers usually perform single-row inserts or updates�these transactions are usually just a few rows. Data can also be loaded in from other systems. These are batch data loads and occur when there is less OLTP activity. Data loads typically fall into two categories. One type is a schema initialization load. This process brings data into the database for the first time and coincides with the application development. The load may be from a legacy system that is being converted to Oracle. These loads require data "scrubbing" (e.g., the legacy may require string conversions to load time/day fields into an Oracle DATE type). Constraints and indexes can be built after the data is verified and loaded. Other batch loads occur on a periodic base. A load can initiate from a user, an operating system-scheduled job, or possibly a PL/SQL procedure scheduled through the Oracle DBMS_JOB queue. SQL*Loader is an Oracle utility for loading fixed-format or delimited fields from an ASCII file into a database. It has a conventional and direct load option. The default option is conventional. The direct load method disables constraints before the data load and enables them afterward. This incurs some overhead. For large amounts of data, the direct method is much faster than the conventional method. Postload processing includes not just the enabling of constraints, but the rebuilding of indexes for primary key and unique constraints. If a direct load contains duplicates, the post process of enabling constraints and rebuilding of indexes fails. For a duplicate primary key or unique constraint, the failed state leaves the index in a "direct load" state. Log messages in the SQL*Loader file will highlight this type of failure with the following:
Following a direct load, you should check the SQL*Loader log file but also check the status of your indexes. A simple query for troubleshooting is the following:
If you have bogus data following a direct load, you need to remove all duplicates before you can enable constraints and rebuild the indexes. For a conventional SQL*Loader path, duplicate records are written to the SQL*Loader "bad" file with corresponding messages in the SQL*Loader "log" file. If no errors occur, then there is no "bad" file. SQL*Loader is a callable program and can be invoked in a client/server environment where the end user takes an action to load a file that is stored on the server. The mere existence of a bad file, following the load, will indicate errors during the load. You can use SQL*Loader as a callable program to implement daily loads using a conventional path. You can use this utility to load large files with millions of rows into a database with excellent results. Each SQL*Loader option (conventional and direct load) provides a mechanism to trap and resolve records that conflict with your primary key or any other constraint; however, direct load scenarios can be more time consuming. Alternatives to SQL*Loader are SQL*Plus scripts and PL/SQL. You can load the data with constraints on and capture failed records through exception handling. Bad records can be written to a file using the UTL_FILE package. Bad records can also be written to a temporary table that has no constraints. You also have the option to disable constraints, load data into a table, and then enable the constraint. If the data is bad you cannot enable the constraint. To resolve bad records, start with an EXCEPTIONS table. The exceptions table can have any name, but must have the following columns.
The SQL for this exceptions table is found in the ORACLE_HOME/RDBMS/ADMIN directory in the file utlecpt.sql. The RDBMS/ADMIN directory, under ORACLE_HOME, is the standard repository for many scripts including the SQL scripts to build the data dictionary catalog, scripts to compile the SYS packages, and scripts like the exceptions table. We use the exceptions table to capture rows that violate a constraint. This capturing is done as we attempt to enable our constraint. The following TEMP table is created with a primary key.
Insert some good data:
The following disables the constraint. This is done here prior to inserting new data.
Now we insert some data; in this example, this is one row that we know to be duplicate row.
The following shows the error when we enable the constraint with SQL*Plus.
What if we had started with a million rows in TEMP and loaded another million. The task of identifying the offending rows can be tedious. Use an exceptions table when enabling constraints. The exceptions table captures the ROW ID of all offending rows.
The constraints are still off. All records are in TEMP, but you can identify the bad records.
This works for all types of constraint violations. You may not be able to enable constraints after a load, but you can capture the ROWID and constraint type through an exceptions table. |
[ Team LiB ] |
No comments:
Post a Comment