Sunday, December 13, 2009

Quote of the day

I wish we had [...] fewer financial engineers and more real engineers instead.
Paul Volcker, Chairman of the Economic Recovery Advisory Board

Sunday, November 15, 2009

Exciting new feature in WebSphere Process Server 7

A couple of years ago, I worked on a project where we used MQ Workflow, one of the ancestors of WebSphere Process Server. Some of the business processes we implemented were long running, and by long I mean really long, that is up to several weeks. One of the processes we deployed in production had a minor bug that had not been discovered during testing. If I remember well, it was just a missing data connector between two activities. Minor as it was, it completely corrupted the state of the process whenever that particular transition was triggered.
Based on the number of running process instances created from the incorrect template, and the probability that the faulty transition would be triggered, we were able to estimate the number of instances that would terminate abnormally. Fortunately the number (both estimated and actual) turned out to be small (around 10), so the business impact was quite low. Nevertheless this was a very frustrating experience, because all you can do is sit and wait for the next process instance to become corrupted. It was not possible to proactively fix the running instances: MQ Workflow didn't allow you to migrate running process instances to a new version of the process template, and recreating these instances using some custom-built ad hoc tool was far too risky.
Ever since that incident, whenever I meet somebody who happens to be (or pretends to be) a specialist in BPM, I always ask how to address this type of issue. I even asked during an IBM training session on WebSphere Process Server. I never got a satisfactory answer. It was only when I worked for Accenture that I discovered that some smart guys in their labs had studied the issue and come up with a pattern to solve it. If I remember well, the pattern somehow suggested implementing a single business process using three different BPEL processes that would then interact together. Even if the overall process is long running, one of these BPELs would only be short running, so that it could be replaced by a new version at any time. Obviously this type of pattern is far from optimal, since it is expensive to implement and tends to further increase the gap between the process designed by the business analyst and the BPEL executing this process.
Recently IBM announced the release of WPS 7 and the announcement mentions the following new feature: "Deliver migration of running processes to new process model versions". If this is really what I think it is (and if the IBM people are able to deliver what they promise, which of course they have always been ;-), then this will be a major step forward.

EJB and Web Services: getting the best of both worlds

If you have ever worked in a project where both EJBs and Web Services are used, it is very likely that you have gotten into discussions about whether a given component should be implemented as an EJB or a Web Service. You might also wonder what is the best way to bridge between these two technologies. For example, you might have been in a situation where you wanted to reuse an existing EJB in a BPEL process. In this post I will demonstrate how you can avoid these questions by making your service implementations independent of the protocol used to invoke them. While this type of protocol independence can also be achieved using SCA, in this post I will focus on EJB 3.0 and JAX-WS 2.1 because these standards are part of JEE 5 and are in wider use than SCA.
Before describing the pattern, it might be useful to explain why EJB is still relevant as an integration and remote invocation protocol:
  • EJB relies on Java serialization and binary protocols such as IIOP which are more efficient than SOAP/HTTP.
  • Propagation of the transaction and security context is built into EJB from the ground up.
  • Most EJB containers have support for load balancing and failover with a degree of reliability that is not as easy to achieve with Web Services.
The major drawbacks of using EJBs in a Service Oriented Architecture are equally clear:
  • Since EJBs are not (necessarily) described by a WSDL interface, it is not easy to reuse them, e.g. in a BPEL process.
  • EJB is specific to the Java platform and interoperability with other platforms (or e.g. XML appliances) is limited.
Obviously it is possible to expose a component both as a traditional EJB and a Web service, e.g. by wrapping the EJB in a Web service interface. This however doesn't achieve true protocol independence because switching between EJB remote invocation and SOAP is not transparent to the consumer of the service. It also requires additional effort when reusing an EJB that has not yet been wrapped as a Web service. What I will show in this post is that by leveraging the new features introduced in EJB 3.0 and by carefully designing the EJBs it is possible to provide a protocol independent client view with minimal effort and without making any concessions in terms of best practices in Web service design.

Starting with version 3.0, the EJB specification allows stateless session beans to be exposed as JAX-WS style Web services. More precisely, a stateless session bean may now have up to three different types of client views: remote, local and Web service. The basic idea of the pattern proposed here is to make the choice between EJB remote invocation and SOAP transparent to the client by using the same Java interface for the remote and Web service views. Taking into account best practices in Web service design, the procedure can be summarized as follows:
  1. Design the service contract using WSDL and XML schema.
  2. Use JAX-WS (wsimport) to generate a corresponding Java interface. Since this artifact will be used by the bean implementation and the (Java) clients, it is strongly recommended to make extensive use of JAX-WS and JAXB bindings to customize the code generation so that the end result is a convenient and easy to use API. It is also recommended to package the generated artifacts in a separate JAR that can be referenced by the bean implementation as well as the client.
  3. Create a stateless session bean implementing the interface and declare this interface as both the remote and Web service view of the bean. Note that the interface that the bean needs to implement is the one generated from the portType in the WSDL, i.e. the one annotated with @WebService.
While this looks simple and straightforward, there are some additional points that need to be taken into account. The first is that in order to be a valid remote view, the interface must conform to RMI rules. The good news is that starting with EJB 3.0, the methods of a remote interface are no longer required to declare RemoteException. However, the restriction that all method arguments must be serializable of course still applies. This is not a fundamental issue, since the classes generated by JAXB are simple POJOs that only refer to primitive types, serializable types such as String, Date, etc., collections and other generated POJOs. Therefore they can be made serializable simply by letting them implement the Serializable interface, and a simple JAXB customization (xjc:serializable) is sufficient to achieve this. It should also be noted that while it is allowed to use a single interface for the remote and Web service views, it is not possible to use the same interface as both local and remote view. This problem is easy to overcome with the usual pattern of creating two interfaces that extend the interface generated by JAX-WS and declaring them as local and remote respectively.
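As an illustration, such a customization could be declared inline at the top of the schema roughly as follows (a sketch; the target namespace and the uid value are arbitrary):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:jaxb="http://java.sun.com/xml/ns/jaxb"
           xmlns:xjc="http://java.sun.com/xml/ns/jaxb/xjc"
           jaxb:extensionBindingPrefixes="xjc"
           jaxb:version="2.0"
           targetNamespace="urn:example">
  <xs:annotation>
    <xs:appinfo>
      <jaxb:globalBindings>
        <!-- Let xjc add "implements Serializable" to all generated classes -->
        <xjc:serializable uid="1"/>
      </jaxb:globalBindings>
    </xs:appinfo>
  </xs:annotation>
  <!-- type definitions ... -->
</xs:schema>
```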
One should be aware that customizing the JAX-WS code generation requires an effort that should not be underestimated, at least if one wants a clean Java interface. A complete discussion of the best practices in this area is out of the scope of this article, but I would like to draw attention to a feature of JAX-WS that is not very well known but is important to make JAX-WS generate a Java interface with convenient method signatures for operations using the document/literal style. This feature is called "Wrapper Style" and allows JAX-WS to unwrap the request object. A more detailed description of this feature, as well as the criteria that the WSDL must meet, can be found in the JAX-WS 2.1 specification.
A second point that I would like to mention is that when customizing the code generation, one has to choose between declaring the JAX-WS and JAXB bindings inline in the WSDL and the schemas or using a separate binding file. Very often it is argued that these bindings are specific to the implementation of the service provider and/or consumer and should therefore be separated from the WSDL. This is certainly true in cases where the service is always invoked using SOAP. On the other hand, if the service can also be invoked as an EJB, one can argue that since the JAX-WS and JAXB bindings together with the WSDL completely describe the EJB interface, they are an integral part of the service contract and should be added to the WSDL. From this point of view, the JAX-WS/JAXB bindings are similar to the wsdl:binding elements mapping the abstract interface to the SOAP/HTTP protocol. Note however that while the abstract interface (i.e. the portType) and the SOAP/HTTP bindings can be separated into two WSDL files, this is not possible for the JAX-WS/JAXB bindings.

Assuming that all necessary customizations have been done and that a local interface is not required, the declaration of the stateless session bean would look as follows:
import employee.EmployeeService;

@Stateless
@Remote(EmployeeService.class)
@WebService(endpointInterface = "employee.EmployeeService")
public class EmployeeServiceBean implements EmployeeService {
Note that the methods of the bean don't need any special annotations since they are all present on the interface. Except for container specific procedures (e.g. running endptEnabler on WebSphere), no further action is required to expose the bean as a Web service. It can now be invoked as an EJB or a Web service, and if the client is implemented in Java, the same Java interface can be used for both invocation styles.
Let's look more closely at the latter aspect, i.e. the invocation from a Java client. If the client should invoke the service as an EJB, we can use the @EJB annotation to let the container inject a reference:
@EJB
private EmployeeService employeeService;
On the other hand, if the client should use SOAP, then we can use the @WebServiceRef annotation. Note that this annotation can be used to inject either the interface annotated with @WebService (corresponding to the portType in the WSDL) or the generated service class annotated with @WebServiceClient (corresponding to the service element in the WSDL). Since the latter is not meaningful when invoking the service as an EJB, we use the first approach in order to achieve protocol independence:
@WebServiceRef(value = EmployeeServiceClient.class)
private EmployeeService employeeService;
In this sample, EmployeeServiceClient is the @WebServiceClient annotated class (a JAX-WS binding has been used to assign this class name). As you can see, switching between EJB and SOAP is just a matter of changing the annotation. Note that in a JEE 5 compliant container, @WebServiceRef can be used wherever @EJB is recognized, in particular in session beans and servlets. Of course these references can alternatively be declared in the deployment descriptor and looked up using JNDI.
Making a @WebServiceRef work properly is actually a bit trickier than @EJB. Two conditions must be met:
  • The WSDL file must be available. Ideally it should be included (together with all dependent artifacts such as imported WSDLs and schemas) as a resource in the JAR that contains the JAX-WS generated artifacts.
  • The wsdlLocation attribute of the @WebServiceClient annotation must be set correctly. This must either be an absolute URL (if the WSDL is not included in the JAR) or specify the location relative to the root of the module (see section 4.2.2 of JSR109 v1.2). Since the @WebServiceClient annotated interface is part of the code generated by JAX-WS, this can only be done by correctly configuring wsimport, namely using the -wsdlLocation option (or the wsdlLocation configuration element when using jaxws-maven-plugin).
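As an illustration, the corresponding jaxws-maven-plugin configuration could look roughly like this (a sketch; the WSDL file name and its location under META-INF/wsdl are assumptions):

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>jaxws-maven-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>wsimport</goal>
      </goals>
      <configuration>
        <!-- Value copied into the wsdlLocation attribute of @WebServiceClient;
             must resolve relative to the root of the module at runtime -->
        <wsdlLocation>META-INF/wsdl/EmployeeService.wsdl</wsdlLocation>
      </configuration>
    </execution>
  </executions>
</plugin>
```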
Another important point is that the endpoint URI used by the container to invoke the service defaults to the one specified in the soap:address element in the WSDL. Since the endpoint URI is environment specific, it needs to be overridden at deployment time. This step is container specific. E.g. in WebSphere 7, endpoint URIs can be changed in the settings of the EJB or Web module in the admin console.

To conclude the discussion, we should also address the question of where this pattern should be used. It would certainly be wrong to use it for all services, and it would also be wrong to use it for all EJBs. It really depends on the granularity of the service. At one end of the spectrum, we have coarse-grained composite services that may potentially be implemented using BPEL or as a mediation flow. Here the pattern is not meaningful and the services should be implemented as Web services. At the other end of the spectrum, we have EJBs representing very fine-grained services or components that should not be invoked directly by consumers in a different business domain. These services are best implemented as pure EJBs. The pattern is most useful in the middle of the spectrum, where we find services that in a traditional J2EE architecture would be designed following the session facade pattern. Implementing session facades using the pattern described here makes them highly reusable and allows you to get the best of both worlds: e.g. a Web application co-located with the EJBs may use the local interfaces for maximum efficiency, while the same service is reused in a business process by invoking it as a Web service through a partner link in BPEL.
Interestingly, applying the pattern systematically has the additional benefit of enforcing proper usage of the good old Session Facade pattern. To see this, let's recall the goals of using session facades:
  • Provide a simpler interface to the clients by hiding all the complex interactions between business components.
  • Reduce the number of business objects that are exposed to the client across the service layer over the network.
  • Hide from the client the underlying interactions and interdependencies between business components. This provides better manageability, centralization of interactions (responsibility), greater flexibility, and greater ability to cope with changes.
  • Provide a uniform coarse-grained service layer to separate business object implementation from business service abstraction.
  • Avoid exposing the underlying business objects directly to the client to keep tight coupling between the two tiers to a minimum.
Usually the Session Facade pattern is combined with the Transfer Object pattern, i.e. the facade doesn't use entity objects (entity beans in EJB 1.x and 2.x; JPA entity classes in EJB 3.0) directly, but value objects specifically designed for that facade. It is easy to see how these patterns are enforced by the pattern described in this article:
  • The WSDL-first approach forces the designer of the service to make the contract independent of the underlying business components and to use the right level of abstraction.
  • The POJOs generated by JAXB are in general not suitable for use as entity classes in JPA. While at first glance this may seem to be a drawback of the pattern, it actually enforces the Transfer Object pattern and avoids exposing the underlying business objects directly to the client. At the same time, the code generation step relieves the developer from the task of manually creating the classes used in the Transfer Object pattern.
  • Keeping the volume of interactions over the network at the right level is a common goal for EJBs and Web services. This concern is probably easier to address in a WSDL-first approach.

Saturday, November 14, 2009

Creating a test data source in WebSphere 7

If you need to quickly set up a test database and a corresponding JDBC data source in WebSphere 7, you can use the preconfigured Derby JDBC provider for that purpose. Here is the procedure:
  • Choose a directory to store the database files. Make sure that the user ID running the server process has write access to the parent directory. Don't create the directory yet. It will be created automatically by Derby.
  • In the admin console, create a new data source with the following properties:
    • JDBC Provider: Derby JDBC Provider (existing)
    • Database name: the path of the directory chosen above
    • Authentication aliases: none
  • Go to the "Custom properties" page for the data source and change the value of the "createDatabase" property to "create".
  • Save the changes to the master configuration.
  • In the "Data sources" overview page, select the newly created data source and click "Test connection". This should create and start the database (you can verify this by looking at the configured file system directory).
When you no longer need the database, just remove the data source and delete the database directory.

Note that the following restrictions apply to data sources created using this procedure:
  • The database will be empty. Thus, the approach works best for applications able to create the database schema themselves (with OpenJPA, use the "openjpa.jdbc.SynchronizeMappings" property in persistence.xml).
  • The database can't be used in a cluster.
  • The data source doesn't support XA.
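For reference, the OpenJPA property mentioned above would typically be set in persistence.xml roughly as follows (a sketch; the persistence unit name is arbitrary, and the buildSchema value tells OpenJPA to create missing tables at startup):

```xml
<persistence-unit name="test">
  <properties>
    <!-- Let OpenJPA synchronize the database schema with the mappings -->
    <property name="openjpa.jdbc.SynchronizeMappings"
              value="buildSchema(ForeignKeys=true)"/>
  </properties>
</persistence-unit>
```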

Thursday, November 12, 2009

Euphemism of the day: restoring backward compatibility

Today somebody from IBM did the following commit on the Axiom project:
-        } else if ("".equals(symbolicName)) {
+        } else if ("IBM".equals(symbolicName)) {
I gently pointed out that the change looks strange and is probably a mistake (symbolicName and vendor are attributes extracted from an OSGi bundle manifest):
Shouldn't this be "IBM".equals(vendor) instead of "IBM".equals(symbolicName)???
Shortly afterwards, a new commit:
-        } else if ("IBM".equals(symbolicName)) {
+        } else if ("IBM".equals(vendor) ||
+                   "".equals(symbolicName)) {
Guess what was the commit comment?
Need to insure that the dialect detector remains backwards compatible
So, if you don't want to say "I fixed a bug that I introduced", just say "I restored backward compatibility"...

PS: That reminds me of the story where IBM tried to hide the fact that the first version of their StAX parser didn't conform to the StAX specifications. Maybe I will blog about this story some day.

Wednesday, November 4, 2009

Quote of the day

By means of ever more effective methods of mind-manipulation, the democracies will change their nature; the quaint old forms -- elections, parliaments, Supreme Courts and all the rest -- will remain. The underlying substance will be a new kind of non-violent totalitarianism. All the traditional names, all the hallowed slogans will remain exactly what they were in the good old days. Democracy and freedom will be the theme of every broadcast and editorial [...]. Meanwhile the ruling oligarchy and its highly trained elite of soldiers, policemen, thought-manufacturers and mind-manipulators will quietly run the show as they see fit.
Aldous Huxley, Brave New World Revisited, 1958

Sunday, November 1, 2009

Understanding StAX: how to correctly use XMLStreamWriter

Note: This is a slightly edited version of a text that I wrote for the Axiom documentation. Some of the content is based on a reply posted by Tatu Saloranta on the Axiom mailing list. Tatu is the main developer of the Woodstox project.

Semantics of the setPrefix and setDefaultNamespace methods

The meaning and precise semantics of the setPrefix and setDefaultNamespace methods defined by XMLStreamWriter are probably among the most obscure aspects of the StAX specifications. As we will see later, even the people who wrote the first version of IBM's StAX parser (called XLXP-J) failed to implement these two methods correctly. In order to understand how these methods are supposed to work, it is necessary to look at different parts of the specification (for simplicity we will concentrate on setPrefix):

  • The Javadoc of the setPrefix method.
  • The table shown in the Javadoc of the XMLStreamWriter class in Java 6.
  • Section 5.2.2, “Binding Prefixes” of the StAX specification.
  • The example shown in section 5.3.2, “XMLStreamWriter” of the StAX specification.

In addition, it is important to note the following facts:

  • The terms defaulting prefixes used in section 5.2.2 of the specification and namespace repairing used in the Javadocs of XMLStreamWriter are synonyms.
  • The methods writing namespace qualified information items, i.e. writeStartElement, writeEmptyElement and writeAttribute all come in two variants: one that takes a namespace URI and a prefix as arguments and one that only takes a namespace URI, but no prefix.

The purpose of the setPrefix method is simply to define the prefixes that will be used by the variants of the writeStartElement, writeEmptyElement and writeAttribute methods that only take a namespace URI (and the local name). This becomes clear by looking at the table in the XMLStreamWriter Javadoc. Note that a call to setPrefix doesn't cause any output and it is still necessary to use writeNamespace to actually write the namespace declarations. Otherwise the produced document will not be well formed with respect to namespaces.

The Javadoc of the setPrefix method also clearly defines the scope of the prefix bindings defined using that method: a prefix bound using setPrefix remains valid till the invocation of writeEndElement corresponding to the last invocation of writeStartElement. While not explicitly mentioned in the specifications, it is clear that a prefix binding may be masked by another binding for the same prefix defined in a nested element. (Interestingly enough, BEA's reference implementation didn't get this aspect entirely right.)

An aspect that may cause confusion is the fact that in the example shown in section 5.3.2 of the specifications, the calls to setPrefix (and setDefaultNamespace) all appear immediately before a call to writeStartElement or writeEmptyElement. This may lead people to incorrectly believe that a prefix binding defined using setPrefix applies to the next element written. This interpretation however is clearly in contradiction with the setPrefix Javadoc.

Note that early versions of IBM's XLXP-J were based on this incorrect interpretation of the specifications, but this has been corrected. Versions conforming to the specifications support a special property which always returns Boolean.FALSE, making it easy to distinguish the non-conforming versions from the newer ones. Note that in contrast to what the usage of the reserved prefix suggests, this is a vendor specific property that is not supported by other implementations.

To avoid unexpected results and keep the code maintainable, it is in general advisable to keep the calls to setPrefix and writeNamespace aligned, i.e. to make sure that the scope (in XMLStreamWriter) of the prefix binding defined by setPrefix is compatible with the scope (in the produced document) of the namespace declaration written by the corresponding call to writeNamespace. This makes it necessary to write code like this:

writer.writeStartElement("p", "element1", "urn:ns1");
writer.setPrefix("p", "urn:ns1");
writer.writeNamespace("p", "urn:ns1");

As can be seen from this code snippet, keeping the two scopes in sync makes it necessary to use the writeStartElement variant which takes an explicit prefix. Note that this somewhat conflicts with the purpose of the setPrefix method; one may consider this as a flaw in the design of the StAX API.
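A minimal self-contained version of this pattern, using the StAX implementation shipped with the JDK, could look as follows (the class name and namespace URIs are arbitrary):

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class SetPrefixDemo {
    public static String writeDocument() throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        // Use the explicit-prefix variant, so that the namespace declaration
        // is written on the same element that defines the prefix binding:
        writer.writeStartElement("p", "element1", "urn:ns1");
        writer.setPrefix("p", "urn:ns1");
        writer.writeNamespace("p", "urn:ns1");
        // Within the scope of the binding, the variant without prefix can be
        // used; the writer looks up the prefix registered via setPrefix:
        writer.writeStartElement("urn:ns1", "element2");
        writer.writeEndElement();
        writer.writeEndElement();
        writer.flush();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeDocument());
    }
}
```

Running this produces a document in which both elements use the p prefix and a single xmlns:p declaration appears on element1.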

The three XMLStreamWriter usage patterns

Drawing the conclusions from the previous section and taking into account that XMLStreamWriter also has a “namespace repairing” mode, one can see that there are in fact three different ways to use XMLStreamWriter. These usage patterns correspond to the three bullets in section 5.2.2 of the StAX specification:

  1. In the “namespace repairing” mode (enabled by the javax.xml.stream.isRepairingNamespaces property), the writer takes care of all namespace bindings and declarations, with minimal help from the calling code. This will always produce output that is well-formed with respect to namespaces. On the other hand, it adds some overhead and the result may depend on the particular StAX implementation (though the output produced by different implementations will be equivalent).

    In repairing mode the calling code should avoid writing namespaces explicitly and leave that job to the writer. There is also no need to call setPrefix, except to suggest a preferred prefix for a namespace URI. All variants of writeStartElement, writeEmptyElement and writeAttribute may be used in this mode, but the implementation can choose whatever prefix mapping it wants, as long as the output results in proper URI mapping for elements and attributes.

  2. Only use the variants of the writer methods that take an explicit prefix together with the namespace URI. In this usage pattern, setPrefix is not used at all and it is the responsibility of the calling code to keep track of prefix bindings.

    Note that this approach is difficult to apply when different parts of the output document are produced by different components (or even different libraries). Indeed, when passing the XMLStreamWriter from one method or component to another, it will also be necessary to pass additional information about the prefix mappings in scope at that moment, unless it is acceptable to let the called method write (potentially redundant) namespace declarations for all namespaces it uses.

  3. Use setPrefix to keep track of prefix bindings and make sure that the bindings are in sync with the namespace declarations that have been written, i.e. always use setPrefix immediately before or immediately after each call to writeNamespace. Note that the code is still free to use all variants of writeStartElement, writeEmptyElement and writeAttribute; it only needs to make sure that the usage it makes of these methods is consistent with the prefix bindings in scope.

    The advantage of this approach is that it makes it possible to write modular code: when a method receives an XMLStreamWriter object (to write part of the document), it can use the namespace context of that writer (i.e. getPrefix and getNamespaceContext) to determine which namespace declarations are currently in scope in the output document, and thus avoid redundant or conflicting namespace declarations. Note that in order to do so, such code has to check for an existing prefix binding before starting to use a namespace.
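To contrast this with the first usage pattern, here is a minimal sketch of the “namespace repairing” mode, in which the writer emits all namespace declarations itself (the element, attribute and namespace names are arbitrary):

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class RepairingDemo {
    public static String writeDocument() throws Exception {
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        // Enable the "namespace repairing" mode (usage pattern 1)
        factory.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES,
                Boolean.TRUE);
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = factory.createXMLStreamWriter(out);
        // No setPrefix or writeNamespace calls: the writer generates all
        // required namespace declarations (and, if necessary, prefixes)
        writer.writeStartElement("p", "root", "urn:ns1");
        writer.writeAttribute("urn:ns2", "attr", "value");
        writer.writeEndElement();
        writer.flush();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeDocument());
    }
}
```

The output is always namespace well-formed, but the exact prefix chosen for the attribute namespace is implementation dependent.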

The Nazis and oil

Anyone who has ever wondered how Nazi Germany managed to fight a world war without any significant oil reserves of its own should read this article at "einestages" (Spiegel). The following passage is particularly interesting:
As early as 1940 the British had an "oil plan" that called for attacking 17 German hydrogenation plants and shutting down production with a "fatal blow". But Churchill set other strategic priorities and pushed for area bombing. Up to May 1944, barely one percent of the total Allied bomb tonnage had been dropped on oil targets.
So the British knew, even before Hitler's invasion of the Soviet Union, how the German war machine could have been severely weakened, yet they waited until shortly before the Normandy landings before acting on it...

Friday, October 30, 2009

Quote of the day

"It is well enough that people of the nation do not understand our banking and monetary system, for if they did, I believe there would be a revolution before tomorrow morning."
Henry Ford

Friday, October 16, 2009

Taming the beast: managing SLF4J dependencies in complex Maven builds

More and more projects now use SLF4J instead of the good old Commons Logging as a logging facade. This introduces new challenges for complex Maven builds, because SLF4J will only work as expected if the dependencies are managed correctly. To understand why this is so, let's first review the different components that are part of SLF4J:
  • slf4j-api contains the SLF4J API, i.e. all the classes that an application or library using SLF4J directly depends on.
  • A number of bindings that implement the SLF4J API either based on an existing logging framework (slf4j-log4j12, slf4j-jdk14 and slf4j-jcl) or using a native implementation developed specifically for SLF4J (slf4j-nop and slf4j-simple).
  • A number of bridges that adapt SLF4J to existing logging facades (jul-to-slf4j) or emulate existing logging facades or implementations (jcl-over-slf4j and log4j-over-slf4j).
For SLF4J to work correctly in a project built with Maven, the following conditions must be met:
  1. The project must have a dependency on slf4j-api. If the project itself uses SLF4J and doesn't depend on any binding or bridge, then this should be a direct dependency. If SLF4J is used by one or more dependencies of the project, but not the project itself, then one may prefer to let Maven's dependency management system include it as a transitive dependency.
  2. If the project produces an executable artifact (JAR with Main-Class, WAR, EAR or binary distribution), then it must have a dependency on one and only one of the bindings. Indeed, a binding is always required at runtime, but the presence of multiple bindings would result in unpredictable behavior.
  3. The project may have any number of dependencies on SLF4J bridges, excluding the bridge for the API used by the binding. E.g. if slf4j-log4j12 is used as a binding, then the project must not depend on log4j-over-slf4j. Otherwise the application may crash because of infinite recursions.
  4. If the project has a dependency on a bridge that emulates an existing logging API, then it must not have at the same time a dependency on this API. E.g. if jcl-over-slf4j is used, then the project must not have a dependency on commons-logging. Otherwise the behavior will be unpredictable.
  5. The dependencies must not mix artifacts from SLF4J 1.4.x with artifacts from 1.5.x, since they are incompatible with each other.
Note that rule number 2 really only applies to executable artifacts. A project that produces a library artifact should never depend on any SLF4J binding, except in test scope. The reason is that depending on a given SLF4J binding in scope compile or runtime would impose a particular logging implementation on downstream projects. In a perfect world where every library (in particular third-party libraries) follows that practice, it would be very easy to satisfy the five conditions enumerated above: it would simply be sufficient to add a dependency on the desired binding (as well as any necessary bridges) from SLF4J 1.5.x to every Maven project producing an executable artifact.
Alas, the world is not perfect, and there are many third-party libraries that do have dependencies on particular SLF4J bindings and logging implementations. If a project starts depending on such libraries, things easily get out of control unless countermeasures are taken. This is especially true for complex projects with lots of dependencies, which will almost certainly run into a situation where one of the five conditions above is no longer satisfied.
Unfortunately, Maven doesn't have the necessary features that would allow these conditions to be enforced a priori, and enforcing them requires some discipline and manual intervention. On the other hand, there is a strategy that is quite simple and effective when applied systematically:
  • Make sure that in projects under your control, the policies described above are always followed.
  • For third-party libraries that don't follow best practices, use exclusions on the corresponding dependency declarations to remove transitive dependencies on SLF4J bindings. Note that if the library is used in several modules of a multi-module Maven project, then it is handy to declare these exclusions in the dependency management section in the parent POM, so that they don't have to be repeated every time the dependency is used.
An example of this would look as follows:
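A sketch of such an exclusion; the third-party coordinates (com.example:some-library) are made up, and which artifacts to exclude depends on what the library actually pulls in:

```xml
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>some-library</artifactId>
      <version>1.0</version>
      <exclusions>
        <!-- Remove the binding imposed by the library (rule 2) -->
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <!-- Remove the logging implementation it drags in (rule 4) -->
        <exclusion>
          <groupId>log4j</groupId>
          <artifactId>log4j</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Running mvn dependency:tree on the project is a convenient way to check that no unwanted binding or logging implementation remains among the transitive dependencies.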

Thursday, October 15, 2009

XML schema oddity: covariant literals (part 1)

If you look at the 19 primitive types defined by the second part of the XML Schema specification, you may notice that one of them, namely QName, has a very particular feature that distinguishes it from the 18 other types: there is no functional mapping between its lexical space and its value space.
The value space of a type describes the set of possible values for that type and is a semantic concept. For example, the value space of the boolean type has two elements: true and false. The lexical space on the other hand is the set of possible literals for that type. It is a syntactic concept and describes the possible ways in which the values of the type may appear in an XML document. E.g. the lexical space of the boolean type has four elements: 0, 1, true and false. For a given type, the existence of a functional mapping between the lexical space and the value space means that for every literal, there is one and only one value that corresponds to that literal. This implies that if the type is used, for example, to describe an XML element, it is sufficient to know the text inside that element to unambiguously determine the value.
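To make the notion of a functional mapping concrete, here is a minimal sketch of the literal-to-value mapping for the boolean type (plain Java, not tied to any schema processor):

```java
import java.util.Map;

public class XsdBoolean {
    // Lexical space: four literals. Value space: two values.
    // The mapping is functional: each literal has exactly one value,
    // and no context is needed to determine it.
    private static final Map<String, Boolean> LEXICAL_TO_VALUE = Map.of(
            "true", Boolean.TRUE,
            "1", Boolean.TRUE,
            "false", Boolean.FALSE,
            "0", Boolean.FALSE);

    public static boolean parse(String literal) {
        Boolean value = LEXICAL_TO_VALUE.get(literal);
        if (value == null) {
            throw new IllegalArgumentException(
                    "Not in the lexical space of boolean: " + literal);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parse("1"));     // true
        System.out.println(parse("false")); // false
    }
}
```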
The QName type doesn't have this property because its value space is the set of tuples {namespace name, local part}, while its lexical space is defined by the production (Prefix ':')? LocalPart. Therefore, a QName literal can only be translated into a QName value if the context in which the literal appears is known. More precisely, it is necessary to know the namespace context, i.e. the set of namespace declarations in scope for the context where the QName is used.
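A sketch of such a translation using only JDK classes; the prefix-to-URI map stands in for the in-scope namespace declarations:

```java
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.QName;

public class QNameResolver {
    /**
     * Translates a QName literal into a QName value. This is only possible
     * given the namespace context, modeled here as a prefix-to-URI map.
     */
    public static QName resolve(String literal, Map<String, String> namespaceContext) {
        int colon = literal.indexOf(':');
        if (colon == -1) {
            // No prefix: the default namespace applies (for element names).
            String uri = namespaceContext.getOrDefault(
                    XMLConstants.DEFAULT_NS_PREFIX, XMLConstants.NULL_NS_URI);
            return new QName(uri, literal);
        }
        String prefix = literal.substring(0, colon);
        String uri = namespaceContext.get(prefix);
        if (uri == null) {
            throw new IllegalArgumentException("Undeclared prefix: " + prefix);
        }
        return new QName(uri, literal.substring(colon + 1), prefix);
    }

    public static void main(String[] args) {
        // The same literal maps to different values in different contexts:
        QName q1 = resolve("p:foo", Map.of("p", "urn:ns1"));
        QName q2 = resolve("p:foo", Map.of("p", "urn:ns2"));
        System.out.println(q1.getNamespaceURI()); // urn:ns1
        System.out.println(q2.getNamespaceURI()); // urn:ns2
    }
}
```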
Another interesting property of the schema type system is that none of the primitive types has a lexical space that is disjoint from the lexical spaces of all other primitive types. The proof is trivial: the lexical space of any simple type is in fact a subset of the lexical space of the string type. This implies that without knowledge of the schema, it is not possible to detect usages of QName in an instance document.
Why is all this important? Well, the consequence is that a transformation of an XML document can only leave QName values invariant if one of the following provisions is made:
  • The transformation leaves invariant the namespace context of every element. In that case it is sufficient to leave all literals invariant in order to leave all values invariant.
  • Before applying the transformation, all QName literals are translated into QName values. When serializing the document after the transformation, QName values are translated back into QName literals. In that case, QName literals are no longer invariant under the transformation. As noted above, this approach requires knowledge of the schema describing the document instance being transformed.
The situation is further complicated by the fact that there are custom types with properties similar to QName, except that the semantics of these types are not defined at schema level, but by the application that eventually consumes the document. A typical example is XPath expressions: they also use namespace prefixes, and their interpretation therefore also depends on the context in which they appear in the document.
Taking this into account, it is clear that the first approach described above is both simpler and more robust. The drawback is that it will in many cases cause a proliferation of namespace declarations in the transformation result, most of which are actually unnecessary. This can be seen for example on a transformation that simply extracts a single element from a document: to preserve the namespace context, it would be necessary to copy the namespace declarations of all ancestors of the element in the source document and add them to the element in the output document (except of course for those namespace declarations that are hidden).
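As a sketch of that first approach, the following code (plain DOM, not production-ready) copies the namespace declarations of all ancestors onto an element before it is extracted, skipping declarations that are hidden by a nearer one:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ExtractWithNamespaces {
    private static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /**
     * Copies the namespace declarations of all ancestors onto the element
     * itself. Iterating from the nearest ancestor outwards ensures that
     * hidden (redeclared) prefixes are not overwritten.
     */
    public static void pullDownNamespaceDeclarations(Element element) {
        for (Node a = element.getParentNode(); a instanceof Element; a = a.getParentNode()) {
            NamedNodeMap attributes = a.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++) {
                Attr attr = (Attr) attributes.item(i);
                if (XMLNS_NS.equals(attr.getNamespaceURI())
                        && !element.hasAttributeNS(XMLNS_NS, attr.getLocalName())) {
                    element.setAttributeNS(XMLNS_NS, attr.getName(), attr.getValue());
                }
            }
        }
    }

    public static String demo() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(
                    new StringReader("<root xmlns:p=\"urn:ns\"><child><p:leaf/></child></root>")));
            Element leaf = (Element) doc.getElementsByTagNameNS("urn:ns", "leaf").item(0);
            pullDownNamespaceDeclarations(leaf);
            // The declaration from <root> is now on the leaf element itself,
            // so the element can be extracted without losing its context.
            return leaf.getAttributeNS(XMLNS_NS, "p");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // urn:ns
    }
}
```

This also makes the drawback visible: every declaration in scope is copied, whether the extracted fragment actually uses it or not.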

In a second post I will examine how the issue described here is handled by various XML specifications and Java frameworks that describe or implement transformations on XML documents.

Electrabel thanks you

The government has negotiated a "structural contribution" to the federal budget with the energy sector. According to Le Soir, the amount will fall in a range between 215 and 245 million euros per year. What you should know, however, is that the windfall profits this generates for Electrabel are estimated at 1.8 billion euros per year! You might object that Electrabel pays taxes and that part of this sum comes back into the state's coffers anyway. Not at all! Thanks to the notional interest deduction invented by our friend Didier and a clever financial arrangement set up by Suez to transfer part of its debt to its Belgian subsidiary, Electrabel currently pays no corporate income tax at all!
Electrabel's shareholders cordially thank you for your structural contribution to their dividends...

Wednesday, October 14, 2009

Euphemism of the day: "tactical solution"

Recently I got into a discussion with somebody from IBM about an issue in Axiom. I argued that the proposed solution was only a workaround and that I would prefer a clean solution that addresses the root cause and fixes the issue once and for all. In his reply, the guy from IBM carefully avoided the word "workaround" and instead used "tactical solution". I really like that euphemism.
So, next time you report to your boss, don't speak about workarounds and proper fixes, but about tactical and strategic solutions. He will be impressed...

Tuesday, October 13, 2009

Inside Axis2 and Axiom: can somebody please clean up?

Recently my attention was caught by a set of issues in Axis2 and Axiom that at first glance may seem unrelated, but when considered together point towards an important design flaw:
  • When MTOM or SwA is used, Axiom has the ability to offload the content of the attachments to temporary files. Axiom does this based on a threshold algorithm: it first attempts to read the data into memory, and if the attachment turns out to be larger than a configurable threshold, it moves the data read so far to a temporary file and writes the rest of the attachment to that file. In addition, Axiom also implements deferred loading of attachments: the data is only read from the message when the code consuming the request tries to access the attachments. Of course this only works within the limits imposed by the fact that these attachments must be read sequentially from the underlying stream.
    Recently a user reported an issue related to MTOM when used in an asynchronous Web Service, i.e. a service that returns an acknowledgement (HTTP 202) and then processes the request asynchronously on a separate thread, sending back the response using a different channel. This is a feature that is fully supported by Axis2. However it turns out that when used with MTOM, the attachments get lost. The reason is that sending back the HTTP 202 response will discard the part of the request that has not yet been read. More precisely, AbstractMessageReceiver, the class implementing the asynchronous feature, calls SOAPEnvelope#build(), which makes sure that the SOAP part is fully read into memory, but fails to tell Axiom to read the attachments before control is handed back to the servlet container.
    I advised the user to fix this by replacing build by buildWithAttachments, which forces Axiom to fetch all attachments, or at least those that are referenced by xop:Include elements. However, this only led to the next problem, which is that AxisServlet calls TransportUtils#deleteAttachments(MessageContext) before the thread processing the request gets a chance to read the attachments. If the attachments have been loaded into memory, this is not an issue, but if they have been offloaded to temporary files, these files will be deleted at that moment.
  • An interesting aspect about the issue described above is that AxisServlet seems to be the only transport that uses deleteAttachments. This means that the other transports would be affected by the opposite problem, i.e. instead of deleting temporary files too early (in the asynchronous case), they would not delete the temporary files at all. There is indeed an open issue in Axiom that describes this type of problem, but it is not clear if this occurs on the server side or client side (i.e. this bug report may actually refer to the last bullet below).
    It should be noted that since JMS is message based and doesn't use streams, the only other (commonly used) transport that would be impacted is the standalone HTTP transport, which is also used by Axis2's JAX-WS implementation when creating HTTP endpoints outside of a servlet container.
  • Axiom has another highly interesting feature called OMSourcedElement. Basically, this makes it possible to create an XML fragment that is not backed by an in-memory representation of the XML, but by some other data source. To make this work, every OMSourcedElement is linked to an OMDataSource instance that knows how to produce XML from the backing data. Many of the databindings provided by Axis2 rely on this feature. We also use it in Synapse for XSLT results if the stylesheet produces text instead of XML. Here again, if the result of the XSLT is too large, we offload it to a temporary file. In that case, we end up with an OMSourcedElement/OMDataSource that is backed by a temporary file. A known issue with this is that Synapse doesn't properly manage the lifecycle of these files, i.e. it is unable to delete them at the right moment. It actually relies on File#deleteOnExit() and on garbage collection, so that these temporary files will in general be kept longer than necessary.
  • Over the last year(s), there have been many reports about Axis2 leaking file descriptors or not closing HTTP connections. The issue came up again during the release process of Axis2 1.5.1, but it is still not entirely clear if the issue is now solved completely. What we know though is that at least part of the reports are in principle non-issues that are due to the fact that the users didn't call ServiceClient#cleanupTransport() to properly release connections. However, as Glen pointed out, the Axis2 Javadoc didn't mention that it is mandatory to call that method (well, until I changed that). Also, I haven't yet checked what happens inside cleanupTransport if the service response is MTOM. It might be that here again, Axis2 fails to clean up temporary files (see second bullet).
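The threshold-based offloading described in the first bullet, together with the explicit cleanup step that the issues above show is so easy to get wrong, can be sketched as follows (an illustration of the pattern, not Axiom's actual implementation):

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OverflowBuffer implements Closeable {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private Path tempFile;
    private OutputStream fileOut;

    public OverflowBuffer(int threshold) {
        this.threshold = threshold;
    }

    public void write(byte[] data) throws IOException {
        if (fileOut == null && memory.size() + data.length > threshold) {
            // Threshold exceeded: offload the buffered data to a temporary file.
            tempFile = Files.createTempFile("attachment", ".tmp");
            fileOut = Files.newOutputStream(tempFile);
            memory.writeTo(fileOut);
            memory = null;
        }
        if (fileOut != null) {
            fileOut.write(data);
        } else {
            memory.write(data);
        }
    }

    public boolean isOffloaded() {
        return tempFile != null;
    }

    // The step that must not be forgotten: without an explicit close at the
    // right point in the message lifecycle, the temporary file leaks
    // (or, as in the AxisServlet case, is deleted too early).
    @Override
    public void close() throws IOException {
        if (fileOut != null) {
            fileOut.close();
            Files.deleteIfExists(tempFile);
        }
    }

    public static boolean demo() {
        try (OverflowBuffer buffer = new OverflowBuffer(4)) {
            buffer.write(new byte[3]); // still in memory
            buffer.write(new byte[3]); // exceeds the threshold: offloaded
            return buffer.isOffloaded();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```

The hard part in a SOAP stack is not the buffering itself, but deciding who is responsible for calling close() and when, which is exactly where the issues above diverge.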
What is interesting to notice is that when processing a message, any SOAP stack may in general be expected to only acquire two kinds of resources that need explicit cleanup, namely network connections (or any other transport related resources) and temporary files. Indeed, assuming that the SOAP stack itself will not interact with any other external system, all other resources it acquires will be memory based and taken care of by the garbage collector (which of course doesn't exclude memory leaks). Only the clients and service implementations (and maybe some particular modules/handlers) will interact with external systems and acquire other resources requiring explicit cleanup (such as database connections).
The fact that Axis2 (and/or Axiom) does a poor job when it comes to managing both types of resources is a strong indication that there is an important design flaw that has yet to be addressed, or that the appropriate infrastructure and APIs required to guarantee correct lifecycle management of these resources are still lacking.