Wednesday, December 7, 2011

Incompatible record in the trail file

Extract or Replicat abends with error message "Incompatible Record" or ERROR 514 or ERROR 509.


An "Incompatible record" error occurs when an incomplete record is encountered in a trail of regular records. An incomplete record is missing its header (where it is actually supposed to start) and instead starts in the middle of the record. The record is considered corrupted.

Incompatible records can result from bad transmission, bad data recording, TCP/IP errors or other hardware-related errors.

A few cases when it can happen:

1) Extract is writing to the trail, and a portion around the beginning or end of a record becomes corrupt. In this case, GoldenGate is missing a portion of the data.

2) Two Extracts overwriting the same set of trail files by mistake.

3) An Extract pump is writing to a trail file and crashes after the data has been flushed to the trail but before it can update its checkpoint file. When it is restarted

Some ways to avoid it:
Oracle recommends disabling cache if AsyncIO is enabled.
Test your hardware and keep a constant eye on any disk- or fibre-related errors.


Fixing and getting past this error:

There is no easy, correct way to recover from such errors. In the end, there is a chance you will lose a transaction, which you then have to re-apply from the source to make your data consistent.

If the trail is written by the pump, you can stop the Replicat on the target and the pump on the source, then alter the pump to resend all files from the source, starting at the point where it failed. It is not easy to compare and map the transactions between source and target, because the file sizes rarely match, so the seqno and rba differ between source and target.
Here are the steps:

- Stop the Extract pump
- Create a new trail sequence (to avoid confusion and keep the old one intact)
- Change the Extract checkpoint after you locate the correct rba (make sure you back up the checkpoint file first)
- Alter the Replicat to this new trail sequence
- Start the Extract data pump and the Replicat
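
As a rough sketch, the steps above might look like the following GGSCI session. The group names (pmp1, rep1), the sequence numbers and the RBA are hypothetical placeholders, not values from a real system, and the lines starting with -- are annotations for the reader. ETROLLOVER is used here as one way to start a new trail sequence; that choice is an assumption, since the steps do not name a specific command.

```
-- Hypothetical names and positions throughout; substitute your own.
STOP REPLICAT rep1
STOP EXTRACT pmp1

-- Record the pump's current checkpoint before changing anything
-- (and back up its checkpoint file as well).
INFO EXTRACT pmp1, SHOWCH

-- Have the pump start a new output trail file when it restarts.
ALTER EXTRACT pmp1, ETROLLOVER

-- Reposition the pump's read checkpoint in the local trail
-- (seqno 12, rba 345678 stand in for the position you located).
ALTER EXTRACT pmp1, EXTSEQNO 12, EXTRBA 345678

-- Point Replicat at the start of the new remote trail file
-- (21 assumes the last remote file before the rollover was 20).
ALTER REPLICAT rep1, EXTSEQNO 21, EXTRBA 0

START EXTRACT pmp1
START REPLICAT rep1
```

It may be safer to start the pump first and confirm which remote trail file it actually opened before starting Replicat.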

The procedure above, if done correctly, can avoid having to skip ANY transactions or records, keeping the data consistent.


If for some reason the record is bad on the source as well, you can alter the Replicat to skip past this error and move to the next record. (This might be a legitimate record, and you will end up missing it.)

On an abended Replicat, change the following in its parameter file and restart:
MAXTRANSOPS 1
GROUPTRANSOPS 1
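
For illustration, a Replicat parameter file with these two settings might look like the sketch below. The group name rep1, the schema names and the elided login line are assumptions; only MAXTRANSOPS and GROUPTRANSOPS come from the step above.

```
-- dirprm/rep1.prm (hypothetical group name and mappings)
REPLICAT rep1
-- ... existing database login and other parameters stay as they are ...
-- Apply one operation per target transaction, with the intent that
-- Replicat stops as close as possible to the bad record.
MAXTRANSOPS 1
GROUPTRANSOPS 1
MAP src.*, TARGET tgt.*;
```

These settings are usually reverted once Replicat is past the bad record, since applying one operation per transaction is slow.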

After Replicat abends again, use Logdump to look at the last seqno and rba the Replicat is at:
./logdump
open <trail file>
ghdr on
detail on
pos <rba>
scanforheader



At this point, note down the seqno and rba, and alter the Replicat to this new position:

alter replicat <group>, extseqno <seqno>, extrba <rba>

start replicat <group>
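
A concrete version of the two commands above, as a hedged sketch: rep1, sequence number 12 and RBA 345678 are made-up values standing in for your group name and the position Logdump reported after scanforheader.

```
-- Hypothetical group name and position; use the values you noted in Logdump.
INFO REPLICAT rep1, DETAIL
ALTER REPLICAT rep1, EXTSEQNO 12, EXTRBA 345678
START REPLICAT rep1
INFO REPLICAT rep1
```

Running INFO REPLICAT before and after makes it easy to confirm that the read position actually moved.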

You will likely have lost data, though probably just one record.

In most cases it is best to reconfigure the pump to send data through a different trail and re-apply the data (with HANDLECOLLISIONS if you couldn't find the exact rba).
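
A minimal sketch of turning on HANDLECOLLISIONS while the data is re-applied, assuming a Replicat group called rep1 and placeholder schema names (both hypothetical):

```
-- dirprm/rep1.prm (hypothetical group name and mappings)
REPLICAT rep1
-- ... existing database login and other parameters ...
-- Resolve duplicate-row and missing-row errors while the
-- resent data overlaps what was already applied.
HANDLECOLLISIONS
MAP src.*, TARGET tgt.*;
```

Once Replicat has caught up past the overlap, remove HANDLECOLLISIONS (or switch to NOHANDLECOLLISIONS) and restart, because leaving it on can mask real data problems.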

Data integrity issues may exist when using any of these recovery techniques above.
