Tuesday, August 14, 2012

Extract Abends With Bounded Recovery Errors in "ggserr.log"


Affected Versions: Oracle GoldenGate - Version 11.1.1.0.6 and later


Issues:



When running Oracle GoldenGate 11.1.1.0.6 or higher, Extract abends every 4 hours, on the hour. This matches the default Bounded Recovery interval. Extract can be restarted and continues to work, but it fails again after 4 hours with the same errors shown below.

ERROR

2012-08-14 15:15:36  WARNING OGG-01573  Oracle GoldenGate Capture for Oracle, eaatt.prm:  br_validate_bcp: failed in call to: ggcrc64valid.
2012-08-14 15:15:36  WARNING OGG-01573  Oracle GoldenGate Capture for Oracle, eaatt.prm:  br_validate_bcp: failed in call to: br_validate_bcp.
2012-08-14 15:15:36  INFO    OGG-01639  Oracle GoldenGate Capture for Oracle, eaatt.prm:  BOUNDED RECOVERY: ACTIVE: for object pool 1: p21900_Redo Thread 1.


Cause
There may be a problem with the Bounded Recovery checkpoint file, or a network timeout when accessing the file.


Solution:

If the problem is with the Bounded Recovery checkpoint file, restarting the Extract with a Bounded Recovery reset will fix the issue:

GGSCI> start extract <extract> BRRESET
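To inspect the Bounded Recovery state before or after the reset, GGSCI's SEND command offers a BR status option; a minimal example, assuming the same placeholder extract name:

GGSCI> send extract <extract>, br brstatus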

Tuesday, May 1, 2012

ALL_CONSTRAINTS on TOP by GOLDENGATE user


POSSIBLE SLOW QUERY ON ALL_CONSTRAINTS VIEW RUN BY REPLICAT


GoldenGate software version:   
Version 11.1.1.0.0 Build 078
Linux, x64, 64bit (optimized), Oracle 11 on Sept 13 2010 20:12:09
  
Operating System Version:
RedHat 
Version #1 SMP Tue Mar 28 11:22:12 PST 2010, Release 2.6.18-194.el5
Machine: x86_64
  
Database Version:
Oracle Database 11g Release 11.2.0.2.0 - 64bit Production
  
Replicat issues the following incorrect query, which can cause problems, although at present no real problem may be visible:
 
select B.owner || '.' || B.table_name,        
       A.owner || '.' || A.table_name,        
       A.owner || '.' || A.constraint_name, 
       B.owner || '.' || B.constraint_name, 
       B.deferred  
from   all_constraints A, 
       all_constraints B 
where  B.owner = '<OWNER>' 
and    B.table_name = '<TABLE_NAME>' 
and    A.owner = B.owner 
and    A.constraint_name = B.r_constraint_name 
and    B.constraint_type = 'R'

The join condition A.owner = B.owner is wrong.

The condition should be A.owner = B.r_owner instead, analogous to the join that is already there (A.constraint_name = B.r_constraint_name).

This is because the owner of the referenced primary-key constraint can differ from the owner of the foreign-key constraint.
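With that correction applied, the query would read as follows (only the join condition changes):

select B.owner || '.' || B.table_name,
       A.owner || '.' || A.table_name,
       A.owner || '.' || A.constraint_name,
       B.owner || '.' || B.constraint_name,
       B.deferred
from   all_constraints A,
       all_constraints B
where  B.owner = '<OWNER>'
and    B.table_name = '<TABLE_NAME>'
and    A.owner = B.r_owner
and    A.constraint_name = B.r_constraint_name
and    B.constraint_type = 'R'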
  
When the owner of the primary-key constraint differs from the owner of the foreign-key constraint and Replicat runs in BATCHSQL mode, Replicat fails to order the batches based on the foreign-key dependency. The batch with the foreign keys may then be processed before the batch with the primary keys, resulting in a foreign-key constraint violation error.

FIX/WORKAROUND:
Run Replicat in non-BATCHSQL mode by commenting out the BATCHSQL parameter in the Replicat parameter file.
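For illustration, a minimal Replicat parameter file with BATCHSQL commented out; the group name, login, and MAP entries are hypothetical:

REPLICAT rep1
USERID ogg, PASSWORD XXXX
-- BATCHSQL
MAP <OWNER>.*, TARGET <OWNER>.*;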

Tuesday, April 3, 2012

NFS and Oracle GoldenGate


Oracle GoldenGate Best Practice: NFS Mount for use with Oracle GoldenGate


GoldenGate - Version: 10 or later


The purpose of this document is to describe the file system mount options to use when running Oracle GoldenGate on an NFS-mounted file system.

NFS mounts should not be used for any Oracle GoldenGate process unless IO buffering is OFF. The danger is that one process registers the end of a trail file or transaction log and moves on to the next in sequence, yet data still in the NFS IO buffer is only flushed to disk after that event. The net result is skipped data, and this cannot be compensated for with the Oracle GoldenGate parameter EOFDELAY.

When placing Oracle GoldenGate files on an NFS-mounted file system, the key requirement is that file system caching, or buffered IO (data and attribute caching), is always disabled on both the NFS client and the NFS server.

Oracle GoldenGate Best Practice: NFS Mount options for use with GoldenGate



Oracle GoldenGate files include binaries, checkpoint files, parameter files, report files, and trail data files.



NFS Client Operating System        NFS Client Mount Options for Oracle GoldenGate Files
SUN Solaris *                      rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,noac,forcedirectio,vers=3,suid
AIX (5L) *                         cio,rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,noac,vers=3,timeo=600
HPUX 11.23 *                       rw,bg,vers=3,proto=tcp,noac,forcedirectio,hard,nointr,timeo=600,rsize=32768,wsize=32768,suid
Linux (x86-32/x86-64/Itanium) *    rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,vers=3,timeo=600


* Although data caching or buffered IO is set to OFF on the NFS client system, for some specialized file systems such as Veritas File System (VxFS), or for NAS devices/servers that support additional caching features such as NetApp's FlexCache, this will not take effect unless you also explicitly disable caching on the server side. For VxFS, this is done by setting MINCACHE to DIRECT or UNBUFFERED; for NetApp, the FlexCache system must not be used at all with Oracle GoldenGate processes.
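As a hedged illustration of the VxFS setting, a server-side mount might look like this (the device path and mount point are hypothetical):

mount -F vxfs -o mincache=direct /dev/vx/dsk/oggdg/oggvol /u01/ogg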





NFS Server Operating System        Additional Mount Option on NFS Server Local Disk **
SUN Solaris                        forcedirectio
AIX 5L                             cio
HPUX 11.23                         no_fs_async
Linux (x86-32/x86-64/Itanium)      sync
NetApp (Data OnTap)                Optional FlexCache system must be disabled



** This option is in addition to the regular local file system mount options used to mount the local disk on the NFS server that is exported to the NFS client where the Oracle GoldenGate files reside. It forces synchronous ("sync") IO behavior on the file system. Asynchronous IO is not recommended for Oracle GoldenGate files and must be turned off at all times.
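As an illustration, a Linux client entry in /etc/fstab using the options from the client table above might look like this (the server name "nfssrv", export "/ogg", and mount point "/u01/ogg" are hypothetical):

nfssrv:/ogg  /u01/ogg  nfs  rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,vers=3,timeo=600  0 0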

(Referenced from metalink)


Saturday, March 24, 2012

Oracle GoldenGate : Numcnv_getnumeric - Buffer Overflow When Using Filter "Where" in a Table Statement.

Oracle GoldenGate - Version: 11.1.1 or later


Problem Statement:

Oracle GoldenGate (OGG) Extract abends with the following error when using a FILTER clause on a numeric column:


ERROR OGG-01072 NUMCNV_getNumeric(const char *, size_t, int64_t *, short *, BOOL): Buffer overflow, needed: 21, allocated: 19

The issue happens when a FILTER clause is used against a numeric column whose value contains more than 19 digits. For example:


Extract param file:


EXTRACT ea021
EXTTRAIL ./dirdat/mm
userid ogg, password XXXX
TABLE OGGUSER.LONGTEST, FILTER (ROLLCALL > 0) ;



CREATE TABLE STATEMENT:

CREATE TABLE "LONGTEST"
( "ROLLCALL" NUMBER(31,0),
"NAME" VARCHAR2(43),
PRIMARY KEY ("ROLLCALL")
);



SAMPLE TABLE STATEMENT:

insert into OGGUSER.LONGTEST values (12345678909999999990,'Fredie');
commit;



The Extract abends with the error:

2012-03-24 18:06:53 ERROR OGG-01072 Oracle GoldenGate Capture for Oracle, ea021.prm: NUMCNV_getNumeric: Buffer overflow, needed: 112, allocated: 19.



Avoiding the error:
If the filter is removed, the Extract captures the record without any problems.
If the inserted value has 18 or fewer digits, the filter works fine.


REASON:

The numeric conversion code throws the above error when the input string is longer than 19 digits. Here the scale is 0 and the value passed in has a length of 20, so the digit count is 20. An int64_t cannot hold such large values (its maximum, 9223372036854775807, has only 19 digits). The error occurs while evaluating the filter; without the filter, the conversion function is never called.



SOLUTION:

The fix will be implemented in OGG v12.1.

The workaround is to avoid using a filter on a column whose values might contain more than 19 digits.

BUG REFERENCE (From metalink):

BUG:13087646 - EXTRACT ABEND WITH NUMCNV_GETNUMERIC WHEN USING FILTER

Wednesday, March 21, 2012

Conflict detection in an Active-Active environment


Because GoldenGate operates independently of other processes, conflicts can occur in an active-active environment when changes are made to identical data on different databases at the same time. They arise when the timing of simultaneous changes results in one of these out-of-sync conditions:

# A replicated insert attempts to add a row that already exists in the target.
# The before image of a replicated update does not match the current row in the target.
# A replicated delete attempts to remove a row that does not exist in the target.

For instance, suppose UserA on database A updates a row, and UserB on database B updates the same row. If UserB's transaction occurs before UserA's transaction is synchronized to database B, there will be a conflict on the replicated transaction.

In these scenarios, GoldenGate does not provide default procedures for handling the conflicts; they must be handled with custom conflict resolution logic included in the parameter files.

Here is a small demonstration in an online retail environment:

* An online shopping store is stocked with a limited quantity of 20 iPods. If the quantity goes to 0, the item is tagged as out of stock on the online system.

* Customer A places an order for 15 iPods on the XYZ database. The SQL would be:
"update products set quantity_in_stock = quantity_in_stock - 15 where product_name = 'ipod' and quantity_in_stock = 20;"

* Customer B simultaneously places an order for 4 iPods on the ABC database. The SQL would be:
"update products set quantity_in_stock = quantity_in_stock - 4 where product_name = 'ipod' and quantity_in_stock = 20;"


In the above example, the number of iPods in stock according to the XYZ database is 5, and the number according to the ABC database is 16. When GoldenGate attempts to apply customer A's transaction to the ABC database, the transaction fails because the before image of quantity_in_stock (20) does not match the current value there (16). Conversely, customer B's transaction replicated to the XYZ database also fails because the before image of quantity_in_stock (20) does not match the current value of 5.

So, unless a conflict resolution routine is written, GoldenGate does not verify target row data before applying changes. Conflict resolution can be handled by a SQL stored procedure or a query invoked via the SQLEXEC parameter in the Replicat parameter file on both databases.
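A minimal sketch of how such a routine could be wired in, assuming a hypothetical user-written PL/SQL procedure RESOLVE_QTY_CONFLICT; the schema, ID, and parameter names are illustrative only:

MAP store.products, TARGET store.products,
    SQLEXEC (SPNAME resolve_qty_conflict, ID rq,
             PARAMS (p_product = product_name));

The procedure receives the product name and can, for example, re-read the current target row and apply the quantity delta rather than the absolute value.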

Wednesday, February 8, 2012

GGS ERROR 190 Unexpected error fetching from the table - 1

Oracle GoldenGate - Version: 10.4.x and later


Error:
Oracle GoldenGate (OGG) Extract abends with the following error.


GGS ERROR 190 Unexpected error fetching from table - X




This fetching issue is related to specifying FETCHCOLS for one or more tables when dynamic table resolution is used. The fetch syntax is built for the first table and then not rebuilt, so the same syntax is used for all tables that have a FETCHCOLS option.

Fix:
Add the WILDCARDRESOLVE IMMEDIATE parameter to the Extract.

This parameter causes table resolution during start-up processing, so the fetch syntax is created for each table rather than reused. This is the same behavior as NODYNAMICRESOLUTION.
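For illustration, an Extract parameter file with the fix in place; the group name, login, trail, and table names are hypothetical:

EXTRACT efix
USERID ogg, PASSWORD XXXX
WILDCARDRESOLVE IMMEDIATE
EXTTRAIL ./dirdat/aa
TABLE SCOTT.*, FETCHCOLS (COMMENTS);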



Related Bugs (Ref from Metalink):

BUG:12432903  BUG:12582889 

Tuesday, January 17, 2012

Extract Slow until it is restarted


Oracle GoldenGate - Version: 11.1.1.1.0 and/or later releases. 




Problem:
An Oracle GoldenGate (OGG) Extract on RAC is slow from time to time, and a restart resolves the performance problem until the next occurrence.
The OGG version is 11.1.1.1.0, and the extract is not registered for logretention.

From the OEM history, the following PL/SQL procedure was running for over 10 hours and the lag was increasing. Traces of parsing (with no execute) were also seen:

declare
   extract_name varchar2(100) := :1;
   min_scn number := :2;
   capture_found number := 0;
   alternate_capture_found number := 0;
   ogg_capture_name varchar2(30);
   alternate_ogg_capture_name varchar2(30);
   queue_table_name varchar2(30);
   queue_name varchar2(30);
   chosen_capture_name varchar2(30);
   logminer_session_number number := 0;
BEGIN
   /* Construct the capture name and alternate capture name */
   ogg_capture_name := SUBSTR('OGG$_' || extract_name, 1, 30);
   alternate_ogg_capture_name := SUBSTR('OGG2$_' || extract_name, 1, 30);
   /* Construct the queue table and queue name */
   queue_table_name := SUBSTR('OGGQT$_' || extract_name, 1, 23);
   queue_name := SUBSTR('OGGQ$_' || extract_name, 1, 23);
   /* If the queue doesn't exist, create it. We can use the same queue for
    * both the capture and the alternate capture. Note that this function
    * is idempotent, so it can be called even if the queue or queue table exists */
   dbms_streams_adm.set_up_queue(queue_table=>queue_table_name, queue_name=>queue_name);
   /* See if a dummy capture already exists for the given extract name.
    * We assume that OGG extract names do not have the same canonicalization
    * principles as the RDBMS, so using UPPER is sufficient. */
   SELECT COUNT(*) INTO capture_found FROM DBA_CAPTURE
    WHERE capture_name = UPPER(ogg_capture_name);
   SELECT COUNT(*) INTO alternate_capture_found FROM DBA_CAPTURE
    WHERE capture_name = UPPER(alternate_ogg_capture_name);
   /* Assert that two captures don't exist at the same time */
   IF capture_found > 0 AND alternate_capture_found > 0 THEN
      RAISE_APPLICATION_ERROR(-20001, 'Multiple captures ' || ogg_capture_name ||
         ' and ' || alternate_ogg_capture_name || ' exist');
   END IF;
   /* Create the new capture process */
   IF capture_found > 0 THEN
      chosen_capture_name := alternate_ogg_capture_name;
   ELSE
      chosen_capture_name := ogg_capture_name;
   END IF;
   dbms_capture_adm.create_capture(queue_name => queue_name,
      capture_name => chosen_capture_name, first_scn => min_scn);
   /* If insert_checkpoints is specified, need to alter capture and insert checkpoints. */
   /* Get the logminer session# from dba_logmnr_session */
   SELECT ID INTO logminer_session_number FROM DBA_LOGMNR_SESSION
    WHERE name = UPPER(chosen_capture_name);
   /* insert two checkpoints */
   INSERT INTO system.logmnr_restart_ckpt$
      (session#, valid, ckpt_scn, xidusn, xidslt, xidsqn, session_num, serial_num, spare1)
   VALUES (logminer_session_number, 1, min_scn + 1, 1, 1, 1, 1, 1, min_scn);
   INSERT INTO system.logmnr_restart_ckpt$
      (session#, valid, ckpt_scn, xidusn, xidslt, xidsqn, session_num, serial_num, spare1)
   VALUES (logminer_session_number, 1, min_scn, 1, 1, 1, 1, 1, 1);
   /* The applied scn of capture must be > min_scn, so alter capture */
   UPDATE sys.streams$_capture_process SET spare2 = min_scn + 100
    WHERE capture_name = UPPER(chosen_capture_name);
   /* Remove existing capture */
   IF capture_found > 0 THEN
      dbms_capture_adm.drop_capture(ogg_capture_name);
   ELSIF alternate_capture_found > 0 THEN
      dbms_capture_adm.drop_capture(alternate_ogg_capture_name);
   END IF;
END;





Check the DBA_CAPTURE view to see whether logretention is registered for the extract.


Cause
This is due to the logretention feature, new in version 11.1.1.1.
For Oracle RAC, database patch 11879974 is needed if logretention is used.
Because logretention is the default setting, the query is still parsed even if the extract is not registered for logretention. Without the database patch, performance will suffer.

To check if an extract is registered:
  select * from dba_capture;
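Since the capture created for an extract is named OGG$_<extract> (see the PL/SQL block above), a more targeted check, assuming that naming convention, is:

  select capture_name, status from dba_capture where capture_name like 'OGG$\_%' escape '\';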

Fix:
  Logretention can be disabled at the extract level. 

The parameter is:

TRANLOGOPTIONS LOGRETENTION DISABLED



1) Check DBA_CAPTURE to confirm that logretention is registered.
2) Unregister at the Extract level:

$ ./ggsci
ggsci> stop <extract name>
ggsci> dblogin USERID <username>, PASSWORD <password>
ggsci> unregister extract <extract name> LOGRETENTION

3) Add the parameter TRANLOGOPTIONS LOGRETENTION DISABLED to the Extract parameter file.
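For illustration, the relevant portion of the Extract parameter file after step 3; the group name, login, trail, and table names are hypothetical:

EXTRACT eslow
USERID ogg, PASSWORD XXXX
TRANLOGOPTIONS LOGRETENTION DISABLED
EXTTRAIL ./dirdat/bb
TABLE SCOTT.*;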