Tuesday, July 24, 2012

How to deal with HeuristicMixedException in WebSphere?


During an incident such as a database issue, applications deployed on WebSphere may sometimes get exceptions of type HeuristicMixedException. The meaning of this exception is defined by the JTA specification:

Thrown to indicate that a heuristic decision was made and that some relevant updates have been committed while others have been rolled back.

The XA Specification describes the concept of a "heuristic decision" as follows:

Some RMs [Resource Managers] may employ heuristic decision-making: an RM that has prepared to commit a transaction branch may decide to commit or roll back its work independently of the TM [Transaction Manager]. It could then unlock shared resources. This may leave them in an inconsistent state. When the TM ultimately directs an RM to complete the branch, the RM may respond that it has already done so. The RM reports whether it committed the branch, rolled it back, or completed it with mixed results (committed some work and rolled back other work).

This means that a transaction with a heuristic outcome may lead to data integrity problems because some resources have been rolled back while others have been committed, i.e. the transaction is no longer atomic. However, a HeuristicMixedException doesn't necessarily mean that this actually occurred; in many cases, the transaction is in fact rolled back successfully.
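For reference, the heuristic outcomes mentioned in the XA quotation correspond to error codes that a resource manager reports by throwing a javax.transaction.xa.XAException. The following minimal Java sketch maps those codes to their meaning. The class name HeuristicCodes and the description strings are illustrative only and are not part of any WebSphere API.

```java
import javax.transaction.xa.XAException;

// Illustrative mapping (hypothetical class, not a WebSphere API) from the
// heuristic error codes defined by the XA specification, as exposed by
// javax.transaction.xa.XAException, to the outcomes they describe.
class HeuristicCodes {

    static String describe(int errorCode) {
        switch (errorCode) {
            case XAException.XA_HEURCOM:
                return "heuristically committed";
            case XAException.XA_HEURRB:
                return "heuristically rolled back";
            case XAException.XA_HEURMIX:
                return "heuristically completed with mixed results";
            case XAException.XA_HEURHAZ:
                return "may have been heuristically completed";
            default:
                return "not a heuristic outcome";
        }
    }

    public static void main(String[] args) {
        // A resource manager reports a heuristic decision by throwing an
        // XAException whose errorCode field carries one of the codes above.
        XAException e = new XAException(XAException.XA_HEURMIX);
        System.out.println(describe(e.errorCode));
    }
}
```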

One interesting case in which HeuristicMixedException is often seen in WebSphere is a distributed transaction in which one of the participating resources is a SIBus messaging engine and which cannot be completed because of an issue affecting the message store.

It is important to know that a messaging engine typically doesn't persist messages immediately, but only when the transaction is committed. If there is a problem with the message store, then the transaction manager will get an exception from the SIBus resource adapter during the prepare phase. This will generate log messages of type J2CA0027E (An exception occurred while invoking prepare on an XA Resource Adapter) and WTRN0046E (An attempt by the transaction manager to call prepare on a transactional resource has resulted in an error. The error code was XAER_RMFAIL).

When the transaction manager gets the exception from the resource adapter, it will decide to roll back the transaction. However, it doesn't know whether the resource that produced the exception has actually completed the prepare phase or not. From the point of view of the transaction manager, it could be that the prepare phase completed successfully and that the exception was caused by a communication failure just afterwards. Therefore the transaction manager needs to query the resource manager to check the status of the transaction branch and to instruct it to roll back the prepared transaction if necessary. WebSphere will retry this periodically until the resource manager is available again. Each unsuccessful attempt will result in a WTRN0049W (An attempt by the transaction manager to call rollback on a transactional resource has resulted in an XAER_RMFAIL error) message being logged. While WebSphere is attempting to complete the rollback, the transaction will also appear in the list of retry transactions in the admin console.
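The sequence above can be modeled in a simplified Java sketch. It is not WebSphere's implementation: FlakyResource is a hypothetical stand-in for a resource manager whose prepare and rollback calls fail with XAER_RMFAIL while its message store is unavailable, and the maxAttempts parameter only exists to keep the example finite, whereas WebSphere keeps retrying until the resource manager is available again.

```java
import javax.transaction.xa.XAException;

// Hypothetical stand-in for a resource manager (e.g. a messaging engine whose
// message store has a problem). Not a real XAResource: prepare() always fails
// with XAER_RMFAIL, and rollback() fails the same way for the first
// `unavailableFor` attempts, then succeeds.
class FlakyResource {
    private int unavailableFor;

    FlakyResource(int unavailableFor) {
        this.unavailableFor = unavailableFor;
    }

    void prepare() throws XAException {
        throw new XAException(XAException.XAER_RMFAIL);
    }

    void rollback() throws XAException {
        if (unavailableFor > 0) {
            unavailableFor--;
            throw new XAException(XAException.XAER_RMFAIL);
        }
    }
}

class RollbackRetrySketch {

    // Returns the number of rollback attempts needed to resolve the branch,
    // or -1 if it is still unresolved after maxAttempts.
    static int completeAfterFailedPrepare(FlakyResource rm, int maxAttempts) {
        try {
            rm.prepare();
            return 0; // prepare succeeded; a real TM would go on to commit (not simulated)
        } catch (XAException e) {
            // Prepare failed (in WebSphere: J2CA0027E / WTRN0046E). The decision is
            // now to roll back, but the branch may have been prepared before the
            // failure, so the rollback must eventually reach the resource manager.
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                rm.rollback();
                return attempt; // branch resolved
            } catch (XAException e) {
                if (e.errorCode != XAException.XAER_RMFAIL) {
                    throw new IllegalStateException("unexpected XA error " + e.errorCode, e);
                }
                // Resource manager still unavailable (in WebSphere: WTRN0049W);
                // a real transaction manager would wait before the next attempt.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(completeAfterFailedPrepare(new FlakyResource(2), 5)); // prints 3
    }
}
```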



If the error is not transient, then completing the transaction may take a significant amount of time. For obvious reasons, WebSphere cannot simply block the application until the status of the transaction is resolved; at some point it has to return control to the application. The problem is that it cannot report the transaction as rolled back (by throwing a HeuristicRollbackException or a RollbackException) because, from the point of view of the transaction manager, part of the transaction may have been prepared. Reporting the transaction as rolled back would be misleading because it may cause the application to re-execute the transaction, and re-executing a transaction that has been partially prepared is likely to fail.

WebSphere internally puts this kind of transaction into status 11, which is the numeric value for HEURISTIC_HAZARD (see this documentation):



The HEURISTIC_HAZARD status means that "The transaction branch may have been heuristically completed". Unfortunately, JTA defines no exception corresponding to HEURISTIC_HAZARD that could be thrown by the commit method in UserTransaction. Therefore WebSphere uses the closest match, which in this case is HeuristicMixedException.
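This reasoning can be summarized in a small, hypothetical Java model. The Outcome enum and both methods are illustrative only; the JTA exception classes are represented by their names because they ship with Java EE / Jakarta EE rather than with the JDK. The point is that a HEURISTIC_HAZARD outcome surfaces as HeuristicMixedException and is not reported as a rollback that the application could safely re-execute.

```java
// Hypothetical model, not a WebSphere or JTA API. The JTA exception classes
// are represented by their simple names because they are not part of the JDK.
class CommitOutcomeSketch {

    enum Outcome {
        COMMITTED,
        ROLLED_BACK,
        HEURISTIC_ROLLBACK,
        HEURISTIC_MIXED,
        HEURISTIC_HAZARD // status 11 in WebSphere
    }

    // Name of the exception UserTransaction.commit() would throw for the
    // outcome, or null if commit() returns normally.
    static String exceptionFor(Outcome outcome) {
        switch (outcome) {
            case COMMITTED:
                return null;
            case ROLLED_BACK:
                return "RollbackException";
            case HEURISTIC_ROLLBACK:
                return "HeuristicRollbackException";
            case HEURISTIC_MIXED:
            case HEURISTIC_HAZARD:
                // JTA has no exception for HEURISTIC_HAZARD; the closest match is used.
                return "HeuristicMixedException";
            default:
                throw new IllegalArgumentException("unknown outcome: " + outcome);
        }
    }

    // Only an outcome reported as a rollback tells the application that none of
    // its work was applied, which is what would make re-executing it reasonable.
    static boolean safeToReexecute(Outcome outcome) {
        return outcome == Outcome.ROLLED_BACK || outcome == Outcome.HEURISTIC_ROLLBACK;
    }

    public static void main(String[] args) {
        Outcome o = Outcome.HEURISTIC_HAZARD;
        System.out.println(exceptionFor(o) + ", safe to re-execute: " + safeToReexecute(o));
    }
}
```

Under this model, an application that receives HeuristicMixedException should not simply re-execute the transaction.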

7 comments:

  1. Hi Andreas,

    I am getting a similar exception in my application. Could you please suggest how to resolve it?

    Thanks,
    Rumit

    1. Rumit,

      You are saying that you are getting a "similar kind of exception" without any additional information. How do you expect anybody to help you if you provide basically zero information? Probably you got here by googling for "HeuristicMixedException", but did you actually read the article?

      Andreas

    2. Sorry Andreas,
      It was by reading your article that I realized I am getting a similar issue in my application.
      My application runs on WAS 7.0 with a DB2 database, and I am trying to load data into it via an upload utility tool that takes an Excel file and loads the data in batches into the database.
      When I try to load a large amount of data, for example 10K records in a single batch, the batch process hangs at around 7K. At this point I get the following exception:
      "The error code was XAER_RMFAIL. The exception stack trace follows: javax.transaction.xa.XAException: CWSIC8007E"

      When I checked the Transaction Service in the admin console, I found one entry each under Heuristic Transactions and Retry Transactions. Under Heuristic Transactions there is one ID with the same value 11 as Heuristic Outcome, and under Retry Transactions there is one ID with outcome 9.

      I can send you the full stack trace also if you want to see the exception log.
      Please help me with this, Andreas, as I have been stuck on it for the last 10 days and have tried every possible option to resolve it, without success.

    3. You need to determine the reason for the XAER_RMFAIL. If the standard logs don't give you enough information, then enable the traces as described in the following document:

      http://www-01.ibm.com/support/docview.wss?uid=swg21153216

      Thanks for your reply, Andreas.

      The issue is the size of the PermanentStore file, which is created along with two other files, the Log file and the TemporaryStore, when the messaging engine is created. The PermanentStore was reaching its maximum size. When I changed the maximum size from 500MB (the default value) to unlimited, the issue was resolved. But setting the size to unlimited is not a feasible option because the file keeps growing and has even reached 2-3GB. So I did some performance tuning in the WAS admin console as described in the following link:
      http://publib.boulder.ibm.com/infocenter/wsdoc400/v6r0/index.jsp?topic=/com.ibm.websphere.iseries.doc/info/ae/si_tasks/tjn0026_.html

      maxConcurrency = 40
      maxDefaultThreadPool = 41
      JDBC-StatementCacheSize = 40

      But now my transactions are taking more time. Can you suggest how to reduce the transaction time, or is there anything else I can do to limit the PermanentStore file size?

    5. These are questions that are out of the scope of this blog post. You should work on them with IBM support or take them to developerWorks or Stackoverflow.
