
Open source XML optimization: EXIficient


The article “The optimization of XML documents in Java” describes the needs that led to optimizing XML (eXtensible Markup Language) documents and introduces the main approaches to the problem. The W3C (World Wide Web Consortium) has published the specification of the binary interchange format EXI (Efficient XML Interchange), a very compact representation of XML information. The format reached version 1.0 and became a W3C Recommendation in March 2011.

The specification has given rise to several implementations. In this article we examine EXIficient, an open source project launched by Siemens AG and written in Java. We will take the first steps with the framework: we will see what the product distribution contains and how to create a Java class that uses the EXIficient library.

We will work through some sample code. To try it, a JDK version 1.5 or later (the minimum requirement for EXIficient) must be correctly installed. The archive attached to the article contains the classes described below and an example XML document with its XSD.

EXIficient: first steps
EXIficient is an implementation of the EXI specification. Its final objective is, on the one hand, to improve the performance of tools that already use XML to exchange data and, on the other, to increase the number of domains and applications for which XML is a valid choice. The developers of EXIficient also intend to make the technology available to the web community.

The project has reached version 0.9.2. The reference site provides the essential information. The download section links to the project on SourceForge and explains how to include the project as a dependency in a Maven POM or check it out from an SVN repository. You can keep reading even if you are not familiar with these terms: they are tools that support the software life cycle and are not essential for using EXIficient. The “External Tools” section of the official website lists tools based on EXIficient or using the library. Among them is ExiProcessor, a command-line program (under development) that encodes XML files to EXI binary files and decodes them back; in practice, a textual interface to the framework.

The bundle archive

The archive “exificient-[vers]-bundle” can also be downloaded from SourceForge. Inside you will find folders with sources and binaries, a “lib” folder containing the exificient.jar library and the libraries it depends on (xercesImpl.jar and xml-apis.jar), a “doc” folder with documentation, and a “sample-data” folder with an XML document and the related XSD schema, useful for the first tests.

There are also two scripts, one for Windows (run-sample.bat) and one for Linux (run-sample.sh), that let you use the extracted archive directly as an optimization tool.

The “bundle” folder can be imported into the Eclipse IDE, or into other IDEs such as NetBeans, as an existing project to work on it directly. In this section we will limit ourselves to using the scripts.

EXIficient: use by script

We have seen that the archive already contains ready-to-run scripts, configured to optimize the file notebook.xml in the “sample-data” folder. Looking inside one of the scripts, we see that the EXIficientDemo class is launched with three parameters: the file to be optimized, the associated XML schema and the number of runs of the optimization (one by default).

By launching the script, we will get the following output:

A new “sample-out” folder is then created, holding the optimized files and the files decoded from them. Note that an optimization is first performed without schema information (SchemaLess), producing a file (notebook.xml_exi) just over a third the size of the original; this file is then decoded (notebook.xml_exi.xml) to recover the starting XML.

The operation is then repeated with the contribution of the XSD schema (SchemaInformed), producing a file (notebook.xml_exi_schema) about five times smaller than the original. As before, the resulting file is decoded in turn to recover the starting XML (notebook.xml_exi_schema.xml). Since varying the third parameter produces no observable change, it can be left at its default value.
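
The size comparison can be checked with a few lines of Java. This is a minimal sketch, not part of the bundle: the class name SizeRatio is hypothetical, and the paths assume the demo has been run with its default sample so that the files named above exist.

```java
import java.io.File;

final class SizeRatio {

    /** Encoded size divided by original size; 0.2 means one fifth of the original. */
    static double ratio(long originalBytes, long encodedBytes) {
        if (originalBytes <= 0) {
            throw new IllegalArgumentException("original size must be positive");
        }
        return (double) encodedBytes / originalBytes;
    }

    public static void main(String[] args) {
        // Paths assumed from the default demo run described in the text.
        File original = new File("sample-data/notebook.xml");
        String[] encoded = {
            "sample-out/notebook.xml_exi",
            "sample-out/notebook.xml_exi_schema"
        };
        if (!original.exists()) {
            System.out.println("Not found: " + original + " (run the demo script first)");
            return;
        }
        for (String path : encoded) {
            File f = new File(path);
            if (f.exists()) {
                double percent = 100 * ratio(original.length(), f.length());
                System.out.printf("%s: %.1f%% of the original%n", path, percent);
            }
        }
    }
}
```

Run from the bundle folder after the script has finished, it prints one line per encoded file found.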

Let’s try to compress another XML document. The XML document and related schema included in the archive attached to the article (SampleGeospatialReference.xml and SampleGeospatialReference.xsd) can be added to the “sample-in” folder, and the script modified to optimize this document instead.

Examining the XML in question, we see that it is an example based on classic geolocation data, in which long sequences of elements of the same type are repeated; in this case, points identified by latitude and longitude. We modify and run the script to load the new documents (below we modify the “.bat” script; the “.sh” script for Linux can be modified in the same way):

<pre class=”brush: php; html-script: true”>
java -cp .;bin;lib/exificient.jar;lib/xercesImpl.jar;lib/xml-apis.jar EXIficientDemo sample-data-ext/SampleGeospatialReference.xml sample-data-ext/SampleGeospatialReference.xsd

In this case, as expected for geolocation data, the results are even better: the file is less than one-sixth the size of the original in SchemaLess mode and about a tenth in SchemaInformed mode. The results offered by the framework are therefore in line with what is reported in the EXI format specification.

So far we have used the script and the sample demo class, but what happens when we want to optimize a document without the corresponding schema? If we pass only the first argument (name and location of the XML), we are informed that the schema has not been provided, and the program stops by raising an exception (Input files not valid!). This happens because the class invoked by the script requires both the XML and its schema as input.

In the next section we will see how to use EXIficient programmatically, working even without a schema, and we will create a class that can support (or replace) the example class provided by the framework developers.
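
Why a document made of repeated elements shrinks so much can be illustrated with a toy model. The sketch below is not EXI’s actual algorithm: it only mimics the idea of a string table, where a name costs its full length the first time it appears and a small fixed-size reference afterwards. The class name, the one-byte reference cost and the one-byte-per-character count are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class StringTableToy {

    /** Assumed cost, in bytes, of referring to a name that was already seen. */
    static final int REFERENCE_COST = 1;

    /** Every name is written in full each time (one byte per character assumed). */
    static int plainCost(String[] names) {
        int total = 0;
        for (String name : names) {
            total += name.length();
        }
        return total;
    }

    /** Full length on first occurrence, REFERENCE_COST on every later occurrence. */
    static int tableCost(String[] names) {
        Set<String> seen = new HashSet<String>();
        int total = 0;
        for (String name : names) {
            // Set.add returns true only when the name was not present yet
            total += seen.add(name) ? name.length() : REFERENCE_COST;
        }
        return total;
    }

    public static void main(String[] args) {
        String[] names = { "latitude", "longitude", "latitude", "longitude" };
        System.out.println("plain: " + plainCost(names) + ", with table: " + tableCost(names));
    }
}
```

The longer the run of repeated names, the larger the gap between the two costs, which matches the behavior observed on the geolocation sample.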

Using EXIficient in Java programs

You can import the bundle folder as an Eclipse project. This lets us work on the example class included in the project, gain some experience with the tools provided and extend the behavior of the class, for example to optimize documents without a schema or to decide whether to adopt options such as EXI compression.

Import the project, the EXIficientDemo class

We import the bundle project into the IDE in use (in Eclipse: File -> Import… -> Existing Projects into Workspace). The result will be similar to that shown in the following figure:

EXIficient - the example project

The EXIficient library (and the external libraries it depends on), the data source and destination folders, the startup scripts and, above all, the example class EXIficientDemo, which shows an example of using the framework, are all easy to recognize.

Analyzing the class confirms that, to optimize an XML document, the corresponding XSD must be provided (see the conditions in the parseAndProofFileLocations method), and that the operations performed do not depend on the value of the runs parameter.

The core of the class consists of the methods codeSchemaLess and codeSchemaInformed, which contain the calls that define how the optimization and the subsequent return to the original XML take place, and the methods encode and decode, where the actual operations happen.

Running this class with the input parameters set (in Eclipse: Run As -> Run Configurations… -> Arguments) has the same effect as running the “run-sample” script, whose purpose is precisely to call this class. Note that encoding and decoding take place through the SAX API.

In a project freshly imported into the IDE, the “sample-out” folder is not present. It is created automatically by running the script or by running the main of EXIficientDemo with the same input parameters the script uses (sample-data/notebook.xml sample-data/notebook.xsd).

If not already present, the XML included in the archive attached to the article and its schema (SampleGeospatialReference) can be added to the “sample-data” folder.
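
For readers who have not used SAX, the following sketch shows the event model that the demo’s encode and decode methods rely on: a parser reads the document and reports each piece to a handler as an event. It uses only the SAX classes shipped with the JDK; the class name SaxEventCounter and the tiny XML string are hypothetical and are not taken from notebook.xml.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEventCounter extends DefaultHandler {

    int elements;    // number of startElement events received
    int characters;  // total number of characters reported

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // may be called several times for one text node, so lengths are summed
        characters += length;
    }

    /** Parses the XML string and returns the handler holding the counts. */
    static SaxEventCounter count(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        SaxEventCounter handler = new SaxEventCounter();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        SaxEventCounter c = count("<notebook><note>hi</note><note>yo</note></notebook>");
        System.out.println(c.elements + " elements, " + c.characters + " characters");
    }
}
```

In the demo the roles are the same, except that when encoding the handler is supplied by the EXIficient library, and when decoding the XMLReader is.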

As we have already seen above, the example class has several limitations, so we will proceed to the creation of a new class with which to practice and obtain a library that can provide us with more support.

The EXIficientUtilities class

We create a new class, EXIficientUtilities, to use as an access class to the EXIficient library, so that files can be optimized without the restrictions we found in the Demo class. We will work in two steps. In the first we only add the possibility of optimizing documents without providing a schema (class EXIficientUtilitiesStep1). In the second we add the possibility of setting some other parameters (class EXIficientUtilities). Both classes can be found in the archive associated with the article. The classes are assumed to be in the default package, mirroring the organization of the existing Demo class and minimizing the changes to be made to the scripts.

In the first step the core of the class, the methods for encoding and decoding (codeSchemaLess, codeSchemaInformed, encode and decode), remains unchanged compared with the original Demo class. What changes is mainly the method parseAndProofFileLocations:

<pre class=”brush: php; html-script: true”>
private boolean parseAndProofFileLocations(String[] args) throws Exception {
    if (args.length != 0) {
        if (args.length == 1) {
            // only the XML was supplied: schema-less mode
            schemainformed = false;
            xmlLocation = args[0];
        } else if (args.length == 2) {
            // XML and XSD supplied: schema-informed mode
            schemainformed = true;
            xmlLocation = args[0];
            xsdLocation = args[1];
        }
    } else {
        // no arguments: fall back to the sample files
        xmlLocation = "sample-data/notebook.xml";
        xsdLocation = "sample-data/notebook.xsd";
    }
    // xml
    File xmlFile = new File(xmlLocation);
    xmlName = xmlFile.getName();
    if (schemainformed) {
        // xsd
        File xsdFile = new File(xsdLocation);
        if (xmlFile.exists() && xsdFile.exists()) {
            File outputDir = new File(OUTPUT_FOLDER);
            // assumed from the text: the output folder is created if missing
            outputDir.mkdirs();
        }
    } else {
        if (xmlFile.exists()) {
            File outputDir = new File(OUTPUT_FOLDER);
            // assumed from the text: the output folder is created if missing
            outputDir.mkdirs();
        }
    }
    return schemainformed;
}

Unlike before, the method that checks the input arguments allows the class to be used even if no schema is passed. Furthermore, if no input arguments are provided, the default XML is optimized (or another XML specified in the code, a feature useful mainly during development). The method returns a Boolean value (schemainformed) that lets main decide whether to enter the schema-informed section (codeSchemaInformed). You can now run the class supplying only the argument sample-data/notebook.xml.
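
The same decision can also be written as a small standalone class, which makes it easy to try out without the rest of the project. This is an alternative sketch, not the class from the attached archive: the name InputLocations is hypothetical, a null XSD stands for schema-less mode, and treating the no-argument case as schema-informed is an assumption based on the default XSD location being set.

```java
final class InputLocations {

    final String xml;
    final String xsd; // null means no schema was supplied (schema-less mode)

    InputLocations(String xml, String xsd) {
        this.xml = xml;
        this.xsd = xsd;
    }

    boolean isSchemaInformed() {
        return xsd != null;
    }

    /** Maps the command-line arguments to the files to process. */
    static InputLocations fromArgs(String[] args) {
        if (args.length == 0) {
            // assumption: with no arguments the bundled sample and its schema are used
            return new InputLocations("sample-data/notebook.xml", "sample-data/notebook.xsd");
        }
        if (args.length == 1) {
            return new InputLocations(args[0], null);
        }
        if (args.length == 2) {
            return new InputLocations(args[0], args[1]);
        }
        throw new IllegalArgumentException("Expected at most two arguments: <xml> [<xsd>]");
    }

    public static void main(String[] args) {
        InputLocations in = fromArgs(args);
        System.out.println(in.xml + " schema-informed=" + in.isSchemaInformed());
    }
}
```

Keeping the result in an immutable object rather than in fields of the calling class is a design choice of this sketch; the article’s class stores the values in its own fields.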

In the next (and last) part of this discussion, we will analyze the EXI options that can directly influence the coding and decoding phases.

The EXI options

The EXI specification defines a series of options that can be specified in the header and that change the behavior during encoding and decoding. For example, you can enable EXI compression or change how events such as comments and namespace prefixes are handled.

EXIficient implements the options provided by the specification. As of version 0.9.2 it offers the following options:

Option | Description
Fidelity Options | Used to enable or disable the preservation of certain types of information (e.g. namespace prefixes or comments).
Alignment | Options used to control the alignment of event codes and content. For example, with bit-packed alignment, event codes and the associated content are packed together without any intermediate padding. This is the default and works well for small messages. The alternatives are byte alignment instead of bit alignment (less compact because of the padding introduced) and EXI compression, which for larger documents allows more information to be compressed and saves more space.
Strict | Mode usable when a schema is present; it increases compactness by interpreting the schema strictly and omitting certain items such as namespace prefixes.
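
The difference between bit-packed and byte-aligned layouts comes down to padding. The following simplified sketch is not EXIficient code: given the bit widths of a sequence of values, it computes how many bytes are needed when the values are written back to back and when each value starts on a byte boundary. The class name and the sample widths are hypothetical.

```java
final class AlignmentCost {

    /** Bytes needed when values are written back to back, padding only at the very end. */
    static int bitPackedBytes(int[] widthsInBits) {
        int bits = 0;
        for (int width : widthsInBits) {
            bits += width;
        }
        return (bits + 7) / 8; // round up to a whole byte
    }

    /** Bytes needed when every value is padded up to a whole number of bytes. */
    static int byteAlignedBytes(int[] widthsInBits) {
        int bytes = 0;
        for (int width : widthsInBits) {
            bytes += (width + 7) / 8;
        }
        return bytes;
    }

    public static void main(String[] args) {
        int[] widths = { 2, 3, 1, 2, 9 }; // hypothetical widths of five values, in bits
        System.out.println("bit-packed: " + bitPackedBytes(widths) + " bytes");
        System.out.println("byte-aligned: " + byteAlignedBytes(widths) + " bytes");
    }
}
```

With many short values the byte-aligned total grows much faster, which is why bit-packed coding suits small messages.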


Enable EXI compression

Below we see how to enable EXI compression. For this purpose, main accepts a new argument that switches compression on or off.

<pre class=”brush: php; html-script: true”>
else if (args.length == 3) {
    // third argument switches EXI compression on or off
    schemainformed = true;
    xmlLocation = args[0];
    xsdLocation = args[1];
    exiCompression = Boolean.parseBoolean(args[2]);
}

The other changes concern the method codeSchemaLess:

<pre class=”brush: php; html-script: true”>
protected void codeSchemaLess() throws Exception {
    String exiLocation = getEXILocation(true);
    EXIFactory exiFactory = DefaultEXIFactory.newInstance();
    GrammarFactory grammarFactory = GrammarFactory.newInstance();
    Grammars g = grammarFactory.createSchemaLessGrammars(); // default
    exiFactory.setGrammars(g);
    if (exiCompression) {
        exiFactory.setCodingMode(CodingMode.COMPRESSION);
    }

    // encode
    OutputStream exiOS = new FileOutputStream(exiLocation);
    EXIResult exiResult = new EXIResult(exiFactory);
    exiResult.setOutputStream(exiOS);
    encode(exiResult.getHandler());
    exiOS.close();

    // decode
    SAXSource exiSource = new EXISource(exiFactory);
    XMLReader exiReader = exiSource.getXMLReader();
    decode(exiReader, exiLocation);
}

and codeSchemaInformed:

<pre class=”brush: php; html-script: true”>
protected void codeSchemaInformed() throws Exception {
    String exiLocation = getEXILocation(false);
    EXIFactory exiFactory = DefaultEXIFactory.newInstance();
    GrammarFactory grammarFactory = GrammarFactory.newInstance();
    Grammars g = grammarFactory.createGrammars(xsdLocation);
    // same steps as in codeSchemaLess: register the grammar, then the coding mode
    exiFactory.setGrammars(g);
    if (exiCompression) {
        exiFactory.setCodingMode(CodingMode.COMPRESSION);
    }

    // encode
    OutputStream exiOS = new FileOutputStream(exiLocation);
    EXIResult exiResult = new EXIResult(exiFactory);
    exiResult.setOutputStream(exiOS);
    encode(exiResult.getHandler());
    exiOS.close();

    // decode
    EXISource saxSource = new EXISource(exiFactory);
    XMLReader xmlReader = saxSource.getXMLReader();
    decode(xmlReader, exiLocation);
}

If the Boolean variable exiCompression is true, the exiFactory object sets the coding mode to COMPRESSION, enabling EXI compression. By default the coding mode is BIT_PACKED.

Some aspects should be noted. In the codeSchemaLess method we defined a grammar without a schema (createSchemaLessGrammars). This could be omitted, as it is the default. The purpose of the grammar is to define the rules to which the XML document is subject, rules defined for example by an XSD.

The core around which everything revolves is the EXIFactory, a class that holds the configuration used for encoding and for the corresponding decoding. This information must be shared between encoder and decoder when the encoding/decoding options are not the default ones.

Running the test on the default XML and schema with EXI compression requested (third argument true) produces files (notebook.xml_exi and notebook.xml_exi_schema) that take up slightly more space than those obtained with bit-packed coding (CodingMode.BIT_PACKED). The original document was already small, and in such cases the default options work better.

If instead the test is performed on the SampleGeospatialReference document, the resulting files occupy about half the space of those obtained in default mode: more than 10 times smaller than the original in the schema-less case and almost 20 times smaller in the schema-informed case. An appreciable result.

As mentioned, it is also possible to set the FidelityOption, for example to enable Strict mode:
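
The contrast between a small document and a large repetitive one can be reproduced with the JDK alone, since EXI compression is built on DEFLATE. The sketch below does not use EXIficient and does not produce EXI: it only measures how raw DEFLATE behaves on a tiny input and on a long run of a repeated fragment. The class name and the sample strings are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class DeflateDemo {

    /** Returns the number of bytes raw DEFLATE produces for the given input. */
    static int deflatedSize(byte[] input) {
        // second argument true: raw DEFLATE stream, without the zlib header and checksum
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer); // only the count matters here
        }
        deflater.end();
        return total;
    }

    /** Builds a byte array containing the given text repeated the given number of times. */
    static byte[] repeat(String text, int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append(text);
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] small = repeat("<a>hi</a>", 1);
        byte[] large = repeat("<p lat=\"45.07\" lon=\"7.68\"/>", 1000);
        System.out.println("small: " + small.length + " -> " + deflatedSize(small));
        System.out.println("large: " + large.length + " -> " + deflatedSize(large));
    }
}
```

On very short inputs the fixed overhead of the compressed format can outweigh any saving, while long repetitive inputs shrink dramatically; this is the same pattern seen with the two sample documents.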

<pre class=”brush: php; html-script: true”>

EXIficientUtilities: use by script

We saw earlier that the bundle can be used as an executable archive by means of the attached scripts. We can use the same archive to host the new main we have written, simply by copying the compiled class into the corresponding “bin” folder of the unpacked archive (if, as assumed, the class was created in the default package and we are working with Eclipse without build tools such as Maven, the compiled class will be at “bundle/bin/EXIficientUtilities.class”). This applies if you made a copy of the bundle in the Eclipse workspace (or the equivalent in other IDEs). If you worked directly on the unzipped archive, the source and compiled versions of the EXIficientUtilities class will already be in place.

At this point we can create the new launch script, “run-ExiUtilities.bat”, which calls the main we have written, by modifying the original one:

<pre class=”brush: php; html-script: true”>
:: EXI Coding
java -cp .;bin;lib/exificient.jar;lib/xercesImpl.jar;lib/xml-apis.jar EXIficientUtilities sample-data/SampleGeospatialReference.xml sample-data/SampleGeospatialReference.xsd true

On Linux, the same command in the “.sh” script uses “:” as the classpath separator:


<pre class=”brush: php; html-script: true”>
java -cp .:bin:lib/exificient.jar:lib/xercesImpl.jar:lib/xml-apis.jar EXIficientUtilities sample-data/SampleGeospatialReference.xml sample-data/SampleGeospatialReference.xsd true

We can try launching the script with only the XML as an argument; this time we get no errors.

Execution of the run-ExiUtilities script

Similarly, the script can be tested by also supplying the schema, or by enabling EXI compression with true as the third argument.


EXIficient implements the EXI specification, the binary interchange format for XML. We tried out the options of the specification and saw that EXIficient covers the standard well, even though it has not yet reached release 1.0. The results are in line with the assessments made by the W3C for the specification.


