Wednesday, February 13, 2013

oXygen, Scala and plugin repository

Introduction

I am looking at Scala for future development, and wondered a bit about its integration abilities for edge cases. Writing a plugin for oXygen looked like a good candidate to investigate this: oXygen has no specific support for Scala and is written in Java. It provides a JAR library with the classes needed to develop a plugin; the plugin has to implement some of those classes and use others, then be bundled as a JAR file that oXygen loads dynamically and calls as described in the plugin.xml descriptor. If Scala did not support that scenario, it would probably not be worth a closer look for me.

I didn't know anything about Scala. I downloaded it, tried to sketch a Hello World kind of plugin for oXygen, and the result was impressively simple! So here are the few steps I followed, if you want to make your own at home.

Setup

First, download and install Scala. I went to the Scala download area, downloaded the latest version for Mac OS X, unzipped it in /usr/local/scala-2.10.0 and added its bin/ sub-directory to my $PATH. I assume you also have the oXygen SDK, with the file oxygen.jar used for plugin development.

The Scala sources

The following two classes implement a very simple plugin for oXygen that displays a message when oXygen starts up, and another one when it is about to shut down. Both must be in a directory named org/fgeorges/test, to match the package name (exactly like in Java). First the plugin class itself, in org/fgeorges/test/MyPlugin.scala:
package org.fgeorges.test

import ro.sync.exml.plugin.{Plugin, PluginDescriptor}

class MyPlugin(desc: PluginDescriptor) extends Plugin(desc)
{
    // empty plugin
}
Here is the extension class, in org/fgeorges/test/MyExtension.scala:
package org.fgeorges.test

import ro.sync.exml.plugin.workspace.WorkspaceAccessPluginExtension
import ro.sync.exml.workspace.api.standalone.StandalonePluginWorkspace

class MyExtension() extends WorkspaceAccessPluginExtension
{
    var myws: StandalonePluginWorkspace = null
    override def applicationStarted(ws: StandalonePluginWorkspace): Unit = {
        myws = ws
        ws.showInformationMessage("F., yeah!")
    }
    override def applicationClosing(): Boolean = {
        myws.showInformationMessage("We're closing guys...")
        true
    }
}

Compiling

In order to compile those and build the JAR file, go to the top directory (the one containing the sub-directory org/) and use the following commands. Note that the -Yresolve-term-conflict option is necessary because of some name conflicts in the oXygen JAR file when it is used from Scala. You might also have to adapt the path to the oXygen JAR, of course.
> scalac \
    -Yresolve-term-conflict:error \
    -cp .:oxygen.jar \
    org/fgeorges/test/*.scala
> jar cf my-plugin.jar org/fgeorges/test/*.class

Deploying

You now have your plugin as a JAR file, like any other plugin for oXygen, so the next step is to deploy it the same way. Create a new directory in the oXygen plugins/ sub-directory (right under the install directory), copy my-plugin.jar into it, and create a plugin descriptor at, say, plugins/my-plugin/plugin.xml:
<!DOCTYPE plugin SYSTEM "../plugin.dtd">

<plugin name="MyPlugin"
        description="Test plugin with Scala..."
        version="0.0.1"
        vendor="fgeorges.org"
        class="org.fgeorges.test.MyPlugin">

   <runtime>
      <library name="my-plugin.jar"/>
   </runtime>

   <extension type="WorkspaceAccess" class="org.fgeorges.test.MyExtension"/>

</plugin>
All you have to do now to test the plugin is to (re)start oXygen!
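Before restarting oXygen, it can be worth checking that every class named in plugin.xml actually made it into the JAR, since a typo there only shows up at startup. Here is a minimal sketch of such a check using only the JDK; it is not part of the oXygen SDK, the class name PluginCheck is hypothetical, and it assumes the libraries listed under runtime are paths relative to the directory containing plugin.xml, as in the descriptor above.

```java
import java.io.File;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: reports classes named in plugin.xml that no library JAR contains. */
class PluginCheck {
    static List<String> missingClasses(File pluginXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Do not try to load ../plugin.dtd: answer every external entity with an empty one.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(pluginXml);

        // Class names: the plugin class itself, then one per extension element.
        List<String> classes = new ArrayList<>();
        classes.add(doc.getDocumentElement().getAttribute("class"));
        NodeList extensions = doc.getElementsByTagName("extension");
        for (int i = 0; i < extensions.getLength(); i++) {
            classes.add(((Element) extensions.item(i)).getAttribute("class"));
        }

        // Assumption: library names are relative to the plugin directory.
        File pluginDir = pluginXml.getAbsoluteFile().getParentFile();
        List<JarFile> jars = new ArrayList<>();
        try {
            NodeList libraries = doc.getElementsByTagName("library");
            for (int i = 0; i < libraries.getLength(); i++) {
                String name = ((Element) libraries.item(i)).getAttribute("name");
                jars.add(new JarFile(new File(pluginDir, name)));
            }
            List<String> missing = new ArrayList<>();
            for (String cls : classes) {
                String entry = cls.replace('.', '/') + ".class";
                if (jars.stream().noneMatch(jar -> jar.getEntry(entry) != null)) {
                    missing.add(cls);
                }
            }
            return missing;
        } finally {
            for (JarFile jar : jars) {
                jar.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: PluginCheck path/to/plugin.xml");
            return;
        }
        List<String> missing = missingClasses(new File(args[0]));
        System.out.println(missing.isEmpty() ? "All classes found" : "Missing: " + missing);
    }
}
```

Run it with plugins/my-plugin/plugin.xml as its only argument; an empty list only means the classes are present, not that oXygen will load them successfully.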

Add-ons repository

A new feature in oXygen 14 (not related at all to Scala) is the ability to create an online repository for add-ons (including plugins and frameworks), so users can point oXygen to it and install add-ons through a graphical interface.

All you have to do in order to create such a repository is to publish a repository descriptor, which links to the packaged files of your add-ons (a ZIP file in the example below), and give its URL to your users, as you can see on the EXPath oXygen area. The descriptor looks like the following:
<xt:extensions xmlns:xt="http://www.oxygenxml.com/ns/extension"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="
                  http://www.oxygenxml.com/ns/extension
                    http://www.oxygenxml.com/ns/extension/extensions.xsd">
   <xt:extension id="xproject">
      <xt:location href="http://expath-pkg.googlecode.com/files/xproject-oxygen-plugin-0.5.1.zip"/>
      <xt:version>0.5.1</xt:version>
      <xt:oxy_version>14.0+</xt:oxy_version>
      <xt:type>plugin</xt:type>
      <xt:author>Florent Georges</xt:author>
      <xt:name>XProject</xt:name>
      <xt:description>XProject, the XML project manager.</xt:description>
      <xt:license>Bla bla...</xt:license>
   </xt:extension>
</xt:extensions>
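Since the descriptor is plain XML in the http://www.oxygenxml.com/ns/extension namespace, it is easy to check what it advertises before handing its URL to users. A minimal sketch with the JDK's DOM API lists each add-on's id, version, type and download location; the class AddonList is hypothetical, and this is not how oXygen itself reads the file.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: summarizes the add-ons advertised by a repository descriptor. */
class AddonList {
    static final String XT = "http://www.oxygenxml.com/ns/extension";

    /** One "id version type href" line per xt:extension element. */
    static List<String> describe(String descriptor) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required by getElementsByTagNameNS
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(descriptor)));
        List<String> lines = new ArrayList<>();
        NodeList extensions = doc.getElementsByTagNameNS(XT, "extension");
        for (int i = 0; i < extensions.getLength(); i++) {
            Element ext = (Element) extensions.item(i);
            lines.add(ext.getAttribute("id") + " "
                    + child(ext, "version").getTextContent().trim() + " "
                    + child(ext, "type").getTextContent().trim() + " "
                    + child(ext, "location").getAttribute("href"));
        }
        return lines;
    }

    private static Element child(Element parent, String localName) {
        NodeList found = parent.getElementsByTagNameNS(XT, localName);
        if (found.getLength() == 0) {
            throw new IllegalArgumentException("missing xt:" + localName);
        }
        return (Element) found.item(0);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: AddonList path/to/descriptor.xml");
            return;
        }
        describe(Files.readString(Path.of(args[0]))).forEach(System.out::println);
    }
}
```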
Conclusion: Scala is sooo easy to set up for writing a plugin for oXygen, and the new oXygen add-on repository is sooo easy to create and to install plugins from!


Wednesday, September 07, 2011

Packaging extension steps for Calabash

In my previous blog post, I introduced how to develop an extension XProc step in Java, for the Calabash processor. Even though writing such an extension is quite easy when you know what to do, the configuration part for the final user is quite tricky. That complexity could be a serious argument for a potential user to give up even before he/she is able to run an example using your extension step. See the previous blog entry for details, but basically the user has to configure the classpath for Calabash with your JAR and all its dependencies, point to your config file when launching Calabash, and import your library into the main pipeline (after having decided where to install your extension step).

At the end of the previous post, I introduced the idea of having such extension steps, written in Java for Calabash, supported out of the box by the repository implementing the Packaging System. I played a little bit with the idea and came up with the following design (and implementation). Of course you still have to provide the same information (the step interface, its implementation, and the link between its type and the class implementing it), but the goal is to enable the author to do it once and for all, so the user can simply use the following commands to install the package and run a pipeline using it:

> xrepo install http://example.org/path/to/your-package.xar
> calabash pipeline.xproc
...
The only constraint on the user is to use the absolute URI you defined to import the XProc library containing the step interface declarations. This absolute URI will be resolved automatically against the user's local repository, and the repository system will configure Calabash with the Java code automatically. In order to achieve that goal, you, as an extension step author, have to provide a package with the following structure:

expath-pkg.xml
calabash.xml
your-steps/
   your-steps-lib.xpl
   your-steps.jar
   dependency.jar

This structure will look familiar to anyone who knows the structure of a standard package: you have the package descriptor, namely expath-pkg.xml, containing meta-information about the package and its content, then, within the package directory, the components, that is the content of the package itself. In addition, there is a second descriptor, specific to Calabash: calabash.xml. In this case, the content of the package is an XProc library containing the step declarations, the JAR file with the compiled Java implementation of your extension steps, and all its dependencies (the other Java libraries it uses). Let's see how the two descriptors carry all the information needed in order to use the extension steps. First the standard package descriptor, expath-pkg.xml:

<package xmlns="http://expath.org/ns/pkg"
         name="http://example.org/lib/your-steps"
         abbrev="your-steps"
         version="0.1.0"
         spec="1.0">

   <title>Your XProc steps for Calabash</title>

   <dependency processor="http://xmlcalabash.com/"/>

   <xproc>
      <import-uri>http://example.org/your-steps/lib.xpl</import-uri>
      <file>your-steps-lib.xpl</file>
   </xproc>

</package>

Besides the usual information about the package (its name, textual description, version number, etc.), we state that this package is specific to Calabash (by depending on that processor). We also declare a public component, a standard XProc library, by assigning a public, absolute URI to it, and by linking to its file by name, within the package content. Indeed, keep in mind that this library declares the step interfaces and is standard XProc; it remains the same even if there are several implementations. The library itself is:

<p:library xmlns:p="http://www.w3.org/ns/xproc"
           xmlns:y="http://example.org/ns/your-steps"
           version="1.0">

   <p:declare-step type="y:some-of-your-steps">
      <p:input  port="source" primary="true"/>
      <p:output port="result" primary="true"/>
      <p:option name="username"/>
   </p:declare-step>

   <p:declare-step type="y:another-one">
      <p:output port="result" primary="true"/>
   </p:declare-step>

</p:library>
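To make the role of expath-pkg.xml concrete, here is a minimal JDK-only sketch of the lookup a repository can derive from it: each public xproc component's import URI mapped to a file inside the package. It is not the EXPath implementation, the name PkgDescriptor is hypothetical, it only looks at xproc components, and it assumes (as in the layout above) that the content directory is named after the abbrev attribute.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: public XProc components declared in expath-pkg.xml. */
class PkgDescriptor {
    static final String PKG = "http://expath.org/ns/pkg";

    /** Maps each xproc import-uri to its file, relative to the package root. */
    static Map<String, String> xprocComponents(String descriptor) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Element pkg = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(descriptor)))
                .getDocumentElement();
        // Assumption: the content directory is named after @abbrev (your-steps/ above).
        String contentDir = pkg.getAttribute("abbrev");
        Map<String, String> components = new LinkedHashMap<>();
        NodeList xprocs = pkg.getElementsByTagNameNS(PKG, "xproc");
        for (int i = 0; i < xprocs.getLength(); i++) {
            Element xproc = (Element) xprocs.item(i);
            components.put(text(xproc, "import-uri"), contentDir + "/" + text(xproc, "file"));
        }
        return components;
    }

    private static String text(Element parent, String localName) {
        NodeList found = parent.getElementsByTagNameNS(PKG, localName);
        if (found.getLength() == 0) {
            throw new IllegalArgumentException("missing pkg:" + localName);
        }
        return found.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String sample = "<package xmlns='" + PKG + "' abbrev='your-steps'>"
                + "<xproc><import-uri>http://example.org/your-steps/lib.xpl</import-uri>"
                + "<file>your-steps-lib.xpl</file></xproc></package>";
        // prints {http://example.org/your-steps/lib.xpl=your-steps/your-steps-lib.xpl}
        System.out.println(xprocComponents(sample));
    }
}
```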

Finally, the second descriptor, specific to Calabash and named calabash.xml, describes the Java implementation: the JAR files to add to the classpath, and the Java class implementing each of the extension step types:

<package xmlns="http://xmlcalabash.com/ns/expath-pkg">

   <jar>your-steps.jar</jar>
   <jar>dependency.jar</jar>

   <step>
      <type>{http://example.org/ns/your-steps}some-of-your-steps</type>
      <class>org.example.yours.SomeStep</class>
   </step>

   <step>
      <type>{http://example.org/ns/your-steps}another-one</type>
      <class>org.example.yours.AnotherStep</class>
   </step>

</package>

The JAR files are referenced by filename (relative to the package content directory), the step types are identified by their QName (using Clark notation, {namespace-uri}local-name, to represent both the namespace URI and the local name as one single string), and the implementation class is referenced by its fully qualified name.
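Clark notation also has the practical advantage that the JDK understands it: javax.xml.namespace.QName.valueOf parses the {namespace-uri}local-name form, and QName.toString produces it. As a minimal sketch of what a packaging implementation has to extract from calabash.xml, the following reads the JAR list and the step type to class map, and builds a class loader over those JARs. It is only an illustration using the JDK, not the code of the EXPath packaging support for Calabash; CalabashDescriptor is a hypothetical name, and the JAR paths are resolved against a content directory passed in by the caller.

```java
import java.io.File;
import java.io.StringReader;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical model of calabash.xml: JARs to load and step types to classes. */
class CalabashDescriptor {
    static final String NS = "http://xmlcalabash.com/ns/expath-pkg";

    final List<String> jars = new ArrayList<>();
    final Map<QName, String> steps = new LinkedHashMap<>();

    static CalabashDescriptor parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        CalabashDescriptor desc = new CalabashDescriptor();
        NodeList jarNodes = doc.getElementsByTagNameNS(NS, "jar");
        for (int i = 0; i < jarNodes.getLength(); i++) {
            desc.jars.add(jarNodes.item(i).getTextContent().trim());
        }
        NodeList stepNodes = doc.getElementsByTagNameNS(NS, "step");
        for (int i = 0; i < stepNodes.getLength(); i++) {
            Element step = (Element) stepNodes.item(i);
            // QName.valueOf understands Clark notation: {namespace-uri}local-name
            QName type = QName.valueOf(text(step, "type"));
            desc.steps.put(type, text(step, "class"));
        }
        return desc;
    }

    /** A class loader over the declared JARs, resolved against the package content dir. */
    URLClassLoader classLoader(File contentDir, ClassLoader parent) throws Exception {
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < urls.length; i++) {
            urls[i] = new File(contentDir, jars.get(i)).toURI().toURL();
        }
        return new URLClassLoader(urls, parent);
    }

    private static String text(Element parent, String localName) {
        NodeList found = parent.getElementsByTagNameNS(NS, localName);
        if (found.getLength() == 0) {
            throw new IllegalArgumentException("missing " + localName);
        }
        return found.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        CalabashDescriptor desc = parse("<package xmlns='" + NS + "'><jar>your-steps.jar</jar>"
                + "<step><type>{http://example.org/ns/your-steps}another-one</type>"
                + "<class>org.example.yours.AnotherStep</class></step></package>");
        System.out.println(desc.jars + " " + desc.steps);
    }
}
```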

The package author only has to respect those conventions and to provide those two descriptors. He/she can then package everything up into one single ZIP file (usually with the extension *.xar, for XML ARchive), and publish and distribute the package to users. If the users have support for packages, the only piece of documentation to provide is the public URI of the XProc library, to import it into their own pipelines.
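A XAR is just a ZIP archive of the package root, with the two descriptors at the top and the content directory beside them (the structure shown earlier). Any ZIP tool does the job; as a minimal sketch, here is the same thing with java.util.zip, assuming the package root sits in its own directory on disk and the .xar is written outside of it. XarBuilder is a hypothetical name.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: zips a package root directory into a .xar file. */
class XarBuilder {
    static void build(Path root, Path xar) throws IOException {
        // Collect the regular files, sorted for a reproducible entry order.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        String separator = root.getFileSystem().getSeparator();
        try (OutputStream out = Files.newOutputStream(xar);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Entry names are relative to the root and always use '/'.
                String name = root.relativize(file).toString().replace(separator, "/");
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: XarBuilder package-root-dir output.xar");
            return;
        }
        build(Path.of(args[0]), Path.of(args[1]));
    }
}
```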

An interesting point is that this strategy is usable for private extensions as well. Take the set of XSLT 2.0 stylesheets for DocBook, for instance. A pipeline, or even a set of pipelines, might make perfect sense to drive some processing with this large application. If that processing needs some extensions to the standard languages, then it is possible to write extension steps for Calabash, integrate them within the package with the standard XSLT stylesheets and XProc pipelines, and use them internally. If the XProc library declaring the steps is not publicly exposed in the package descriptor, then only the other components in the package itself can use it.

In that case, a user using Calabash just installs the package like any other package, and does not have to, you know, configure the extensions...


Sunday, September 04, 2011

Writing an extension step for Calabash, to use BaseX

Introduction

Writing an extension for Calabash in Java involves three different things: 1/ the Java class itself, which has to implement the interface XProcStep, 2/ binding a step name to the implementation class, and 3/ declaring the step in XProc.

Java

Let's take, as an example, a step evaluating a query using the standalone BaseX processor. The goal is not to have a fully functional step, nor to have a best-quality-ever step with error reporting and such, but rather to emphasize how to glue all the things together. The step has one input port, named source, and one output port, named result. The step gets the string value of the input port (typically a c:query element) and evaluates it as an XQuery, using BaseX. The result is parsed as an XML document and sent to the output port (it is a parse error if the result of the query is not an XML document or element). Let's start with the Java class implementing the extension step:

/****************************************************************************/
/*  File:       BasexStandaloneQuery.java                                   */
/*  Author:     F. Georges - H2O Consulting                                 */
/*  Date:       2011-08-31                                                  */
/*  Tags:                                                                   */
/*      Copyright (c) 2011 Florent Georges.                                 */
/* ------------------------------------------------------------------------ */


package org.fgeorges.test;

import com.xmlcalabash.core.XProcException;
import com.xmlcalabash.core.XProcRuntime;
import com.xmlcalabash.io.ReadablePipe;
import com.xmlcalabash.io.WritablePipe;
import com.xmlcalabash.library.DefaultStep;
import com.xmlcalabash.runtime.XAtomicStep;
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;
import org.basex.core.BaseXException;
import org.basex.core.Context;
import org.basex.core.cmd.XQuery;


/**
 * Sample extension step to evaluate a query using BaseX.
 *
 * @author Florent Georges
 * @date   2011-08-31
 */
public class BasexStandaloneQuery
        extends DefaultStep
{
    public BasexStandaloneQuery(XProcRuntime runtime, XAtomicStep step)
    {
        super(runtime,step);
    }

    @Override
    public void setInput(String port, ReadablePipe pipe)
    {
        mySource = pipe;
    }

    @Override
    public void setOutput(String port, WritablePipe pipe)
    {
        myResult = pipe;
    }

    @Override
    public void reset()
    {
        mySource.resetReader();
        myResult.resetWriter();
    }

    @Override
    public void run()
            throws SaxonApiException
    {
        super.run();

        XdmNode query_doc = mySource.read();
        String query_txt = query_doc.getStringValue();
        XQuery query = new XQuery(query_txt);
        Context ctxt = new Context();
        // TODO: There should be something more efficient than serializing
        // everything and parsing it again...  Plus, if the result is not an XML
        // document, wrap it into a c:data element.  But that's beyond the point.
        String result;
        try {
            result = query.execute(ctxt);
        }
        catch ( BaseXException ex ) {
            throw new XProcException("Error executing a query with BaseX", ex);
        }
        DocumentBuilder builder = runtime.getProcessor().newDocumentBuilder();
        Source src = new StreamSource(new StringReader(result));
        XdmNode doc = builder.build(src);

        myResult.write(doc);
    }

    private ReadablePipe mySource = null;
    private WritablePipe myResult = null;
}

An extension step has to implement the Calabash interface XProcStep. Calabash provides a convenient class, DefaultStep, that implements all the methods with a default behaviour suitable for most usages. The only thing we have to do is to save the input and output pipes for later use, to reset them in case the step object is reused, and of course to provide the main processing in run(). In run(), we read the document from the source port, take its string value, evaluate it as a query using the BaseX API, and parse the result as XML to write it to the result port.

As you can see, there is nothing in the class itself about the interface of the step: its type name, its inputs and outputs, its options, etc. This is done in two different places. First you link the step type to the implementation class, then you declare the step with XProc.
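Back to the TODO comment in run(): when the query result is not an XML document, the idea is to wrap it in a c:data element instead of failing. Here is a minimal sketch of that fallback, written against the plain JDK DOM API rather than Saxon's s9api used in the step, so the result would still need converting to an XdmNode before being written to the result port. ResultWrapper is a hypothetical name, and attributes such as a content type on c:data are left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: parse a query result as XML, or wrap it in c:data. */
class ResultWrapper {
    static final String C_NS = "http://www.w3.org/ns/xproc-step";

    static Document toDocument(String result) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Replace the default handler, which prints "[Fatal Error]" to stderr.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                throw e;
            }
        });
        try {
            return builder.parse(new InputSource(new StringReader(result)));
        } catch (SAXException notWellFormed) {
            // Not a well-formed document (plain text, several top-level elements...).
            Document doc = builder.newDocument();
            Element data = doc.createElementNS(C_NS, "c:data");
            data.setTextContent(result);
            doc.appendChild(data);
            return doc;
        }
    }

    public static void main(String[] args) throws Exception {
        for (String result : new String[] {"<res>2</res>", "2"}) {
            Element root = toDocument(result).getDocumentElement();
            System.out.println(root.getNodeName() + ": " + root.getTextContent());
        }
    }
}
```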

Tell Calabash about the class

Linking the step type to the implementation class is done in a Calabash config file. So you have to create a new config file, and pass it to Calabash on the command line with the option --config (-c for short). The file itself is very simple, and links the step type (a QName) to the class (a fully qualified Java class name):

<xproc-config xmlns="http://xmlcalabash.com/ns/configuration"
              xmlns:fg="http://fgeorges.org/ns/tmp/basex">

   <implementation type="fg:ad-hoc-query"
                   class-name="org.fgeorges.test.BasexStandaloneQuery"/>

</xproc-config>
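Note that the type attribute here is a prefixed QName, so its prefix has to be resolved against the namespace declarations in scope (xmlns:fg on the root element). A minimal JDK-only sketch of reading such a config file into a map from step type to class name; ConfigReader is a hypothetical name, this is not how Calabash itself parses its configuration, and treating an unprefixed type as being in no namespace is an assumption.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: step type to class name, as listed in a Calabash config file. */
class ConfigReader {
    static final String CFG = "http://xmlcalabash.com/ns/configuration";

    static Map<QName, String> implementations(String config) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(config)));
        Map<QName, String> result = new LinkedHashMap<>();
        NodeList impls = doc.getElementsByTagNameNS(CFG, "implementation");
        for (int i = 0; i < impls.getLength(); i++) {
            Element impl = (Element) impls.item(i);
            String type = impl.getAttribute("type"); // e.g. "fg:ad-hoc-query"
            int colon = type.indexOf(':');
            String ns = ""; // assumption: an unprefixed type is in no namespace
            if (colon >= 0) {
                // Resolve the prefix against the xmlns:* declarations in scope.
                ns = impl.lookupNamespaceURI(type.substring(0, colon));
                if (ns == null) {
                    throw new IllegalArgumentException("undeclared prefix in " + type);
                }
            }
            result.put(new QName(ns, type.substring(colon + 1)), impl.getAttribute("class-name"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<xproc-config xmlns='" + CFG + "'"
                + " xmlns:fg='http://fgeorges.org/ns/tmp/basex'>"
                + "<implementation type='fg:ad-hoc-query'"
                + " class-name='org.fgeorges.test.BasexStandaloneQuery'/></xproc-config>";
        System.out.println(implementations(sample));
    }
}
```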

Declare the step

Finally, declaring the step in XProc is done using the standard p:declare-step. If it contains no subpipeline (that is, if it contains only p:input, p:output and p:option children), then it is considered a declaration of a step implemented elsewhere; if it contains a subpipeline, then it is a step type definition, with the implementation written in XProc itself. The declaration can be copied and pasted into the main pipeline itself, but as with any other language, the best practice is rather to declare it in an XProc library (composed only of step declarations) and to import this library within the main pipeline using p:import. In our case, we define the step type to have an input port source and an output port result (both primary), and no option:

<p:library xmlns:p="http://www.w3.org/ns/xproc"
           xmlns:fg="http://fgeorges.org/ns/tmp/basex"
           xmlns:pkg="http://expath.org/ns/pkg"
           pkg:import-uri="http://fgeorges.org/tmp/basex.xpl"
           version="1.0">

   <p:declare-step type="fg:ad-hoc-query">
      <p:input  port="source" primary="true"/>
      <p:output port="result" primary="true"/>
   </p:declare-step>

</p:library>
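The rule about subpipelines is easy to check mechanically. Here is a minimal JDK-only sketch that reports, for each p:declare-step in a file, whether it is a mere declaration (only p:input, p:output and p:option children) or a definition with a subpipeline. StepKinds is a hypothetical name; as a simplification it also treats p:documentation and p:pipeinfo as neutral, and it does not try to cover the full XProc 1.0 grammar.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: tells step declarations from step definitions. */
class StepKinds {
    static final String P = "http://www.w3.org/ns/xproc";
    // Children allowed in a pure declaration, per the rule described above.
    static final Set<String> SIGNATURE = Set.of("input", "output", "option");
    // Simplification: documentation elements do not turn a declaration into a definition.
    static final Set<String> NEUTRAL = Set.of("documentation", "pipeinfo");

    /** Maps each p:declare-step/@type to "declaration" or "definition". */
    static Map<String, String> classify(String xproc) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xproc)));
        Map<String, String> kinds = new LinkedHashMap<>();
        NodeList decls = doc.getElementsByTagNameNS(P, "declare-step");
        for (int i = 0; i < decls.getLength(); i++) {
            Element decl = (Element) decls.item(i);
            kinds.put(decl.getAttribute("type"),
                    hasSubpipeline(decl) ? "definition" : "declaration");
        }
        return kinds;
    }

    private static boolean hasSubpipeline(Element decl) {
        for (Node n = decl.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // whitespace, comments
            }
            boolean inP = P.equals(n.getNamespaceURI());
            String name = n.getLocalName();
            if (!(inP && (SIGNATURE.contains(name) || NEUTRAL.contains(name)))) {
                return true; // any other element counts as part of a subpipeline
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        String library = "<p:library xmlns:p='" + P + "' xmlns:fg='http://fgeorges.org/ns/tmp/basex'>"
                + "<p:declare-step type='fg:ad-hoc-query'>"
                + "<p:input port='source' primary='true'/><p:output port='result' primary='true'/>"
                + "</p:declare-step></p:library>";
        System.out.println(classify(library)); // {fg:ad-hoc-query=declaration}
    }
}
```

Applied to the library above, fg:ad-hoc-query comes out as a declaration, which is exactly why Calabash needs the config file to find its implementation.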

Using it

Now that we have all the pieces, we can write an example main pipeline using this new extension step:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:fg="http://fgeorges.org/ns/tmp/basex"
                name="pipeline"
                version="1.0">

   <p:import href="basex-lib.xpl"/>

   <p:output port="result" primary="true"/>

   <fg:ad-hoc-query>
      <p:input port="source">
         <p:inline>
            <c:query>
               &lt;res> { 1 + 1 } &lt;/res>
            </c:query>
         </p:inline>
      </p:input>
   </fg:ad-hoc-query>

</p:declare-step>

To run it, just issue the following command on the command line (where basex-steps.jar is the JAR file you compiled the extension step class into):

> java -cp ".../calabash.jar:.../basex-6.7.1.jar:.../basex-steps.jar" \
       com.xmlcalabash.drivers.Main \
       -c basex-config.xml \
       example.xproc

If you use this script, you can then use the following command:

> calabash ++add-cp .../basex-6.7.1.jar \
           ++add-cp .../basex-steps.jar \
           -c basex-config.xml \
           example.xproc

Packaging

Update: The mechanism described in this section has been implemented, see this blog entry.

If you want to publicly distribute your extension, you have to provide your users with 1/ the JAR file, 2/ the config file and 3/ the library file. The user then needs to correctly configure Java with the JAR file, to correctly configure Calabash with the config file, and to use a suitable URI in the p:import/@href in his/her pipeline. That is a lot of different places where the user can make a mistake.

The EXPath Packaging open-source implementation for Calabash does not support Java extension steps yet, but it is planned to support them, in order to handle that configuration part automatically. The goal is for the library author to define an absolute URI for the XProc library (declaring the steps), which the user uses in p:import, regardless of where it is actually installed (it will be resolved automatically). The details (classpath setting, XProc library resolving, and Calabash config) should then be handled by the packaging support. Once the package of the extension step has been installed in the repository, one can then execute the following pipeline (note that the import URI has changed):

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:fg="http://fgeorges.org/ns/tmp/basex"
                name="pipeline"
                version="1.0">

   <p:import href="http://fgeorges.org/tmp/basex.xpl"/>

   <p:output port="result" primary="true"/>

   <fg:ad-hoc-query>
      <p:input port="source">
         <p:inline>
            <c:query>
               &lt;res> { 1 + 1 } &lt;/res>
            </c:query>
         </p:inline>
      </p:input>
   </fg:ad-hoc-query>

</p:declare-step>
by simply invoking the following command:
> calabash example.xproc


Sunday, November 15, 2009

EXPath Packaging System: the on-disk repository layout

While working on the implementation of the EXPath Packaging System for Calabash, I found myself rewriting, again, a repository manager dedicated to Calabash, exactly as I did for Saxon one month earlier. Why? Both repositories provide the same features. It should then be possible to make Calabash and Saxon share the same repository, if Saxon just ignores components other than XSLT and XQuery (for instance XProc pipelines) in that repository. So one just has to maintain one single repository for his/her whole computer (or one repository dedicated to a single project, like a Java EE application).

Going further, I think the layout of such an on-disk repository should be part of the packaging specification itself. An implementation does not have to use such a standard repository, but if it does, it doesn't have to worry about package installation, repository management software, or even about the resolving mechanism between a component URI and the actual file containing that component. One repository layout, one set of tools for all those tasks.

This introduces a new concept: each kind of component (XSLT, XQuery, XML Schema, etc.) has its own URI space. For instance, when running a transform, Saxon resolves xsl:import URIs only in the XSLT space, while Calabash uses the right space for each step. The resolving machinery is based on OASIS XML Catalogs. The repository has a top-level catalog for each URI space.

The global view of the repository is a set of subdirectories, one per installed package. Each package is unzipped exactly as it was created (with the exact same files and the exact same structure). One of those direct subdirectories is special: its name is .expath-pkg/ and it contains the catalogs and other administrative files. It can also contain config files dedicated to a specific processor; for instance, the extensions written in Java for Saxon need some config files to be stored there. There is one top-level catalog for each URI space in the repository, and each package has one catalog for each URI space it contains. The top-level catalogs just point to all existing catalogs at the package level.

repo/
   .expath-pkg/
      xquery-catalog.xml
      xslt-catalog.xml
      .saxon/
         ...        [Saxon-specific stuff at the repository level]
      lib1/
         xquery-catalog.xml
         xslt-catalog.xml
         saxon/
            ...     [Saxon-specific stuff in lib1]
      lib2/
         ...
   lib1/
      query.xq
      style.xsl
   lib2/
      ...
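To illustrate how resolving could work against that layout, here is a minimal JDK-only sketch that looks a component URI up in one URI space: it starts from the top-level catalog for that space (for instance .expath-pkg/xslt-catalog.xml), checks its uri entries, then follows nextCatalog entries to the per-package catalogs. That the top-level catalogs point to the package catalogs with nextCatalog, and that uri targets are relative file paths, are assumptions; the real implementations rely on Norman Walsh's resolver, which supports much more of the OASIS XML Catalogs specification. RepoResolver is a hypothetical name.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical resolver over the repository layout above (no cycle detection). */
class RepoResolver {
    static final String CAT = "urn:oasis:names:tc:entity:xmlns:xml:catalog";

    /** Resolves uri in one URI space ("xslt", "xquery"...), or returns null. */
    static Path resolve(Path repo, String space, String uri) throws Exception {
        return lookup(repo.resolve(".expath-pkg").resolve(space + "-catalog.xml"), uri);
    }

    private static Path lookup(Path catalog, String uri) throws Exception {
        if (!Files.isRegularFile(catalog)) {
            return null;
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(catalog.toFile());
        Path dir = catalog.getParent();
        // 1. A matching uri entry in this catalog wins over the next catalogs.
        NodeList entries = doc.getElementsByTagNameNS(CAT, "uri");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            if (uri.equals(entry.getAttribute("name"))) {
                // Relative targets are resolved against the catalog's own directory.
                return dir.resolve(entry.getAttribute("uri")).normalize();
            }
        }
        // 2. Otherwise, fall back to each nextCatalog in document order.
        NodeList nexts = doc.getElementsByTagNameNS(CAT, "nextCatalog");
        for (int i = 0; i < nexts.getLength(); i++) {
            String next = ((Element) nexts.item(i)).getAttribute("catalog");
            Path found = lookup(dir.resolve(next), uri);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.out.println("usage: RepoResolver repo-dir space uri   (space: xslt, xquery...)");
            return;
        }
        System.out.println(resolve(Path.of(args[0]), args[1], args[2]));
    }
}
```

With the layout above, resolving an XSLT URI declared by lib1 walks from .expath-pkg/xslt-catalog.xml to .expath-pkg/lib1/xslt-catalog.xml, and a relative target such as ../../lib1/style.xsl in that catalog lands in repo/lib1/.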

There is a specific project aimed only at managing such a repository. For now it has only a command line interface, but there should be a graphical interface in the near future. The same project provides helpers for other Java-based applications to use repositories. For instance, the implementations for Saxon and Calabash use this JAR file to get resolving support for some URI spaces, based on Norman Walsh's resolver for XML Catalogs. It could then be used in applications like Kernow and oXygen, or even in eXist. The following are the steps needed to set up the repository management application, Saxon and Calabash to get a usable packaging system.

  • 1/ download expath-pkg-repo-0.1.jar. I created a shell script on my system to use it easily by typing just xrepo, but this is a simple JAR file you can execute with java -jar expath-pkg-repo-0.1.jar. Hereafter I simply use xrepo to refer to this application.
  • 2/ set $EXPATH_REPO, for instance to ~/share/expath/repo or to /usr/local/share/expath/repo or to c:/expath/repo
  • 3/ initialize the repository with xrepo create $EXPATH_REPO
  • 4/ put saxon and calabash scripts into your $PATH, with the following environment variables to be able to use them
  • 5/ set SAXON_CP to the classpath required to execute Saxon; it must contain the following JARs: saxon9he.jar (or any other version), resolver.jar, expath-pkg-repo-0.1.jar and expath-pkg-saxon-0.2.jar
  • 6/ set CALABASH_CP to the classpath required to execute Calabash; it must contain the following JARs: my modified version of Calabash, saxon9he.jar (or any other 9.2 version), resolver.jar, expath-pkg-repo-0.1.jar, expath-pkg-saxon-0.2.jar and expath-pkg-calabash-0.1.jar
  • 4b/ instead of steps 4, 5 and 6 (for example if you do not have a Unix shell), you can just create a simple script with the appropriate classpath and Java command to launch Saxon, and another one for Calabash. The only drawback is that the JAR files for extensions written in Java for Saxon won't be picked up automatically from the repository.

We are now going to test the EXPath HTTP Client, delivered as a XAR file. First, we create three test files: an XSLT stylesheet, an XQuery main module and an XProc pipeline. All those files are simple and use the extension function http:send-request() to send an HTTP request to a website, get the result, and extract the HTML title. Save them somewhere as, say, http-client-test.xsl, http-client-test.xq and http-client-test.xproc:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                xmlns:http="http://www.expath.org/mod/http-client"
                xmlns:h="http://www.w3.org/1999/xhtml"
                exclude-result-prefixes="http h"
                version="2.0">

   <xsl:import href="http://www.expath.org/mod/http-client.xsl"/>

   <xsl:template name="main">
      <xsl:variable name="request" as="element()">
         <http:request href="http://www.fgeorges.org/" method="get"/>
      </xsl:variable>
      <title>
         <xsl:value-of select="http:send-request($request)
                                 / h:html/h:head/h:title"/>
      </title>
   </xsl:template>

</xsl:stylesheet>

import module namespace http = "http://www.expath.org/mod/http-client";
declare namespace h = "http://www.w3.org/1999/xhtml";

http:send-request(
   <http:request href="http://www.fgeorges.org/" method="get"/>
)
  / h:html/h:head/h:title

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step">

   <p:input  port="source"/>
   <p:output port="result"/>

   <p:xslt template-name="main">
      <p:input port="stylesheet">
         <p:document href="http-client-test.xsl"/>
      </p:input>
      <p:input port="parameters">
         <p:empty/>
      </p:input>
   </p:xslt>

</p:declare-step>

If you try to evaluate those test files before installing the package, you will get errors from Saxon and Calabash (disclaimer: I rewrote the outputs of both processors to make them more readable, but the meaning stays intact):

$ saxon -xsl:http-client-test.xsl -it:main
File not found: http://www.expath.org/mod/http-client.xsl

$ saxon --xq http-client-test.xq
Cannot locate module for namespace http://www.expath.org/mod/http-client

$ calabash http-client-test.xproc
File not found: http://www.expath.org/mod/http-client.xsl

Now, install the package directly from the Internet (just press ENTER at both of the installer's questions to keep the default values), then try the test files again:

$ xrepo install http://www.cxan.org/tmp/expath-http-client-0.1.xar
Install module EXPath HTTP Client? [true]: 
Install it to dir [expath-http-client-0.1]: 

$ saxon -xsl:http-client-test.xsl -it:main
<title>Florent Georges</title>

$ saxon --xq http-client-test.xq
<title xmlns="http://www.w3.org/1999/xhtml">Florent Georges</title>

$ calabash http-client-test.xproc
<title>Florent Georges</title>

While I think the runtime support for the packaging is best handled in each processor's internals, having a common repository layout (and actually shared repositories) could help processors to implement it and especially to have a set of independent applications to manage repositories and packages.

The next step is, finally, to release a new version of the specification, including this repository layout. See the EXPath Packaging page for more information, and subscribe to the EXPath mailing list to stay tuned.


EXPath Packaging System prototype implementation for Calabash

An interesting piece of code I worked on during the past few weeks is the implementation of the EXPath Packaging System for Norman Walsh's XProc processor: Calabash. It was interesting in itself, as a coding experience, but also for the still-in-development packaging system, as XProc brings all the core XML technologies together within a single language. Thus implementing the packaging system for Calabash meant implementing it for RNC, RNG, Schematron, XProc (for XProc pipelines themselves), XQuery, XSD and XSLT. This was enlightening about the relationships between those technologies, and a proof of concept of the applicability of the packaging concept to all of them.

Unfortunately, Calabash does not provide any way for the user to finely configure the underlying processors (for instance Saxon for XSLT, Jing for RNG, etc.), so I first needed to add this feature to Calabash itself. Instead of plugging the EXPath stuff directly into the Calabash code base, I decided to add only a simple API for an external user to plug configuration code into Calabash. I hope Norm will agree to integrate such changes into Calabash, so the packaging support could be written entirely outside of the Calabash code base at first (and maybe included in Calabash later). In the meantime, you can just use an alternative JAR file for Calabash, including my changes (based on the latest Subversion revision, so this is really beta stuff; besides, some classes have been disabled due to dependency issues). You can also have a look at the following email on XProc Dev with explanations on how to patch the Calabash code base.

To install the packaging support for Calabash, you need to put the following JAR files into your classpath: my modified Calabash JAR file, the EXPath repository management, the EXPath packaging support for Calabash and the EXPath packaging support for Saxon. Then run Calabash the usual way, except that you also set the Java property org.expath.pkg.calabash.repo to the location of the repository you want to use. For repository management, please see the next blog entry I will post here...

If you are under Unix (incl. Linux, Mac OS X or Cygwin under Windows) you can use this shell script to launch Calabash from the command line. Just define the environment variable CALABASH_CP with the above classpath, and EXPATH_REPO to the repository directory. In addition to setting Calabash up, it will also add JAR files with extensions for Saxon into the classpath.

To test if the installation is ok, install this sample package (wait for the next blog entry for details about installing a package with the repository) and save the following pipeline in a file, say invoice-test.xproc:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:i="http://www.fgeorges.org/test/invoice-steps">

   <p:import href="http://www.fgeorges.org/test/invoice.xpl"/>

   <p:output port="result"/>

   <p:input port="source">
      <p:inline>
         <invoice xmlns="http://www.fgeorges.org/test/invoice"
                  date="2009-10-12">
            <line price="15" quantity="10" unitary="1.5">
               <desc>Some stuff.</desc>
            </line>
            <line price="100">
               <desc>Bigger stuff.</desc>
            </line>
            <total tax-excl="115" tax-incl="139.15"/>
         </invoice>
      </p:inline>
   </p:input>

   <i:validate/>

   <i:transform/>

</p:declare-step>

Then run it as described above. If you saved the shell script under the name "calabash" in your $PATH, just type:

calabash invoice-test.xproc

And that's all! See the EXPath Packaging page for more information, and subscribe to the EXPath mailing list to stay tuned.


Friday, October 02, 2009

EXPath Packaging System prototype implementation for Saxon

Introduction

After having released a first implementation of the EXPath Packaging System for eXist, here is a version for Saxon. You can read this previous blog entry for more information on the packaging system; in particular, it says: "The concept is quite simple: defining a package format to enable users to install libraries in their processor with just a few clicks, and to enable library authors to provide a single package to be installed on every processor, without the need to document (and maintain) the installation process for each of them."

The package manager for Saxon is a graphical application (a textual front-end will be provided soon) and is distributed as a single JAR file. Go to the implementations page, or use the following direct link to get the JAR. Run it as usual, for instance by double-clicking on it or by executing the command java -jar expath-pkg-saxon-0.1.jar. That will launch the package manager window.

Repositories

The implementation for Saxon differs from the one for eXist in a fundamental way: Saxon does not have a home directory where installed packages can be put, and there are many different ways to invoke Saxon (while the eXist core is always started the same way). This involves two different aspects of package management with Saxon: the package manager itself, which installs and removes packages, and a way to configure Saxon itself, regardless of the way you invoke it. In addition, because Saxon has no home directory, we need to introduce the concept of a package repository.

A repository is a directory dedicated to installing packages, and should only be modified through the package manager. It contains the packages themselves (in a form usable by Saxon) as well as the administrative information needed to use them (like catalogs, etc.). The graphical package manager lets you create a new repository directly from the graphical interface, as well as switch between repositories (if you need to maintain several repositories for different purposes).
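The post does not describe the exact files inside a repository. Assuming its catalogs are standard OASIS XML Catalogs (the Apache XML Resolver dependency mentioned further down hints at this, but it remains an assumption), an entry mapping a public import URI to its local copy could be generated as in this sketch. The class name and the relative local path used in the example are hypothetical.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class CatalogWriter {

    static final String CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog";

    // Writes an OASIS catalog with one <uri name="public URI" uri="local path"/>
    // entry per mapping; the stream writer takes care of escaping attribute values.
    static String write(Map<String, String> publicToLocal) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("", "catalog", CATALOG_NS);
            w.writeDefaultNamespace(CATALOG_NS);
            for (Map.Entry<String, String> e : publicToLocal.entrySet()) {
                w.writeEmptyElement("", "uri", CATALOG_NS);
                w.writeAttribute("name", e.getKey());
                w.writeAttribute("uri", e.getValue());
            }
            w.writeEndElement();
            w.flush();
            w.close();
        } catch (XMLStreamException ex) {
            throw new IllegalStateException(ex);
        }
        return out.toString();
    }
}
```

For instance, CatalogWriter.write(Map.of("http://www.functx.com/functx-1.0.xsl", "functx/functx-1.0-doc-2007-01.xsl")) yields a one-entry catalog that a catalog-aware resolver could use to find the local copy.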

Importing stylesheet

But as I said above, having a repository full of packages is not enough: you have to configure Saxon to use this repository. Because Saxon can be invoked in plenty of ways, the configuration itself is implemented as a Java helper class that you can use in your own code if you invoke Saxon from Java (for instance in a Java EE web application). If you use Saxon from the command line, there is a script that takes care of configuring everything for you.

But before looking in detail at how to configure Saxon to use a repository, let's have a look at how a stylesheet can use an installed package. This is, after all, the whole point of the packaging system. The goal is simply to be able to use a public import URI in an import statement, this URI being automatically resolved to its local copy in the repository. Just as a namespace URI is only an identifier (it is used as a string; your processor does not try to actually access anything at that address), the public import URI is an identifier for a specific stylesheet. This mechanism also supports functions implemented in Java. So all you need to do is use this public URI, like the following:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:h="http://www.example.org/hello"
                version="2.0">

   <xsl:import href="http://www.example.org/hello.xsl"/>

   <xsl:template ...>
      ...
      <xsl:value-of select="h:hello('world')"/>

For XQuery, this is a bit different, as XQuery has its own module system, but the idea is actually very similar. XQuery library modules are identified by their namespace URI, which once again can be seen as a public identifier for the module. So if we have an XQuery library module for the namespace URI http://www.example.org/hello, you can simply write a module that imports it as follows:

import module namespace h = "http://www.example.org/hello";
h:hello('world')

And that's it! In the package samples section below, you can see complete examples of such importing stylesheets and queries, as well as the packages they use.
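To make the idea of resolving a public import URI to a local copy concrete, here is a minimal sketch using only the JDK's javax.xml.transform API: a URIResolver that maps the public URI to local content. It is not how EXPath Pkg is implemented (that relies on the repository and Saxon); the in-memory map stands in for the repository, and because the JDK's built-in processor only supports XSLT 1.0, the hypothetical hello library exposes a named template instead of the h:hello() function.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Resolves public import URIs to local copies; the map stands in for a repository.
final class RepoResolver implements URIResolver {
    private final Map<String, String> localCopies;

    RepoResolver(Map<String, String> localCopies) {
        this.localCopies = localCopies;
    }

    @Override
    public Source resolve(String href, String base) {
        String content = localCopies.get(href);
        if (content == null) {
            return null; // unknown URI: fall back to the processor's default resolution
        }
        StreamSource local = new StreamSource(new StringReader(content));
        local.setSystemId(href); // keep the public URI as the base URI of the module
        return local;
    }
}

final class PublicImportDemo {
    // Hypothetical library published under the public URI below.
    static final String HELLO_XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:template name='hello'><xsl:param name='who'/>"
      + "<xsl:value-of select=\"concat('Hello, ', $who, '!')\"/></xsl:template>"
      + "</xsl:stylesheet>";

    // Importing stylesheet: it only knows the public import URI.
    static final String MAIN_XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:import href='http://www.example.org/hello.xsl'/>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:call-template name='hello'>"
      + "<xsl:with-param name='who' select=\"'world'\"/></xsl:call-template></xsl:template>"
      + "</xsl:stylesheet>";

    static String run() {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            factory.setURIResolver(new RepoResolver(
                Map.of("http://www.example.org/hello.xsl", HELLO_XSL)));
            Transformer t = factory.newTransformer(new StreamSource(new StringReader(MAIN_XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<dummy/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The importing stylesheet never mentions a local path; only the resolver knows where the library lives, which is the property the packaging system relies on.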

Java configuration

To configure Saxon to use a repository from Java, you need a Configuration object. This is a central class in Saxon, used almost everywhere in the Saxon code base. You can get it from a Saxon TransformerFactory or from an s9api Processor. With that object on the one hand, and a File object pointing to the repository directory on the other, you can just call:

import java.io.File;
import net.sf.saxon.Configuration;
// ConfigHelper comes from the EXPath Pkg JAR file

// the repo directory
File          repo   = ...;
// the Saxon config object
Configuration config = ...;
// the EXPath Pkg configurer
ConfigHelper  helper = new ConfigHelper(repo);
// actually configure Saxon
helper.config(config);

Besides the Java code itself, you have to make sure 1/ that there is an actual repository at the location you pass to the ConfigHelper constructor, and 2/ that the JAR files containing the extension functions written in Java, and the JAR files they use, are on your classpath. The only exception to this rule is when you register such an extension function (written in Java) with Saxon 9.2; in this case EXPath Pkg will try to dynamically add the JAR files from the repository to the classpath. But manipulating the classpath at runtime is not something I would recommend in Java.
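Since the JAR files have to be on the classpath before Saxon starts, one option is to compute the classpath from the repository when launching the JVM, instead of changing it at runtime. The sketch below simply collects every .jar file found under the repository directory; that layout assumption (JAR files somewhere below the repository root) is mine, not a documented property of EXPath Pkg repositories.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class RepoClasspath {

    // Returns all .jar files below the repository directory, sorted for a
    // stable order and joined with the platform path separator.
    static String jarsUnder(Path repo) {
        try (Stream<Path> files = Files.walk(repo)) {
            List<String> jars = files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                .map(Path::toString)
                .sorted()
                .collect(Collectors.toList());
            return String.join(File.pathSeparator, jars);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned string can be appended to the -cp value when starting the JVM that runs Saxon.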

Shell script

When using Saxon from the command line, EXPath Pkg comes with an alternate class to launch Saxon (this class automatically uses ConfigHelper to configure Saxon) as well as with a shell script to launch Saxon with the correct classpath.

To use this shell script (only available on Unix-like systems for now, including Cygwin on Windows), set the environment variable SAXON_HOME to the directory containing the Saxon JAR files, EXPATH_PKG_JAR to the EXPath Pkg JAR file, and APACHE_XML_RESOLVER_JAR to the Apache XML Resolver JAR file. Additionally, you can set EXPATH_REPO to the repository directory, so you do not have to give it explicitly as an option each time you invoke Saxon. If all the above environment variables are correctly set, and the script is in your PATH, you can invoke Saxon as usual: saxon -s:source.xml -xsl:stylesheet.xsl.

Use saxon --help to display the usage help of this script. You can set the EXPath repository (thus overriding EXPATH_REPO if it is set) with the option --repo=. You can add items to the classpath with the option --add-cp=. You can set the whole classpath (thus overriding SAXON_HOME and the other environment variables) with the option --cp=. The script detects whether Saxon SA is present, and if so uses the SA version; you can force either the B or the SA version with --b or --sa. You can also pass any option to the Java Virtual Machine with --java=, for instance to set a system property, and set the maximum memory of the virtual machine with --mem= (a shortcut for the Java option -Xmx). Finally, you can set the HTTP and HTTPS proxy information with --proxy=host:port (for instance --proxy=proxyhost:8080).
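For illustration, here is how a wrapper could translate a few of these options into JVM arguments, written as a small Java sketch rather than shell. The mapping of --proxy= onto the http.proxyHost/http.proxyPort and https.proxyHost/https.proxyPort system properties is an assumption about what the script does; those property names are the standard JDK networking ones.

```java
import java.util.ArrayList;
import java.util.List;

final class SaxonScriptOptions {

    // Hypothetical translation of some wrapper options into JVM arguments.
    // --repo=, --cp=, --add-cp=, --b and --sa are not handled in this sketch.
    static List<String> toJvmArgs(List<String> options) {
        List<String> jvm = new ArrayList<>();
        for (String opt : options) {
            if (opt.startsWith("--mem=")) {
                // --mem= is a shortcut for the Java option -Xmx
                jvm.add("-Xmx" + opt.substring("--mem=".length()));
            } else if (opt.startsWith("--java=")) {
                // passed through to the JVM as-is
                jvm.add(opt.substring("--java=".length()));
            } else if (opt.startsWith("--proxy=")) {
                String hostPort = opt.substring("--proxy=".length());
                int colon = hostPort.lastIndexOf(':');
                if (colon <= 0 || colon == hostPort.length() - 1) {
                    throw new IllegalArgumentException("expected --proxy=host:port, got " + opt);
                }
                String host = hostPort.substring(0, colon);
                String port = hostPort.substring(colon + 1);
                // standard JDK networking properties, for both HTTP and HTTPS
                jvm.add("-Dhttp.proxyHost=" + host);
                jvm.add("-Dhttp.proxyPort=" + port);
                jvm.add("-Dhttps.proxyHost=" + host);
                jvm.add("-Dhttps.proxyPort=" + port);
            }
        }
        return jvm;
    }
}
```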

Package samples

The first example is a packaged version of Priscilla Walmsley's FunctX. This package contains both the XSLT and the XQuery versions of the library. Of course, the XQuery module defines a module namespace, but the XSLT stylesheet does not have any public import URI (this is outside the scope of the standard). I chose the URI http://www.functx.com/functx-1.0.xsl, but keep in mind this is not official by any means; it is just the URI I chose. The intent is that library authors package their own libraries and choose the public URIs themselves.

The package itself is a plain ZIP file. If you open or unzip it with your preferred tool, you can see that at the top level there is a file named expath-pkg.xml. This is the package descriptor, which defines what the package contains (at least what is publicly exported from the package, that is, what can be used from within a stylesheet or a query). In the case of this FunctX package, the descriptor looks like this:

<package xmlns="http://expath.org/mod/expath-pkg">
   <module version="1.0" name="functx">
      <title>FunctX library for XQuery 1.0 and XSLT 2.0</title>
      <xsl>
         <import-uri>http://www.functx.com/functx-1.0.xsl</import-uri>
         <file>functx-1.0-doc-2007-01.xsl</file>
      </xsl>
      <xquery>
         <namespace>http://www.functx.com</namespace>
         <file>functx-1.0-doc-2007-01.xq</file>
      </xquery>
   </module>
</package>
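Because the descriptor is plain XML, a tool can read it with any namespace-aware parser. Here is a minimal JDK-only sketch that extracts the public URI to file mappings from a descriptor like the one above; it only looks at the xsl/import-uri, xquery/namespace and file elements shown here, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class PkgDescriptor {

    static final String NS = "http://expath.org/mod/expath-pkg";

    // Maps each public XSLT import URI to the file implementing it.
    static Map<String, String> xslImports(String descriptorXml) {
        return collect(descriptorXml, "xsl", "import-uri");
    }

    // Maps each public XQuery module namespace to the file implementing it.
    static Map<String, String> xqueryModules(String descriptorXml) {
        return collect(descriptorXml, "xquery", "namespace");
    }

    private static Map<String, String> collect(String xml, String component, String keyElem) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> result = new LinkedHashMap<>();
            NodeList comps = doc.getElementsByTagNameNS(NS, component);
            for (int i = 0; i < comps.getLength(); i++) {
                Element c = (Element) comps.item(i);
                result.put(childText(c, keyElem), childText(c, "file"));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse package descriptor", e);
        }
    }

    // Text of the first descendant element with the given local name in the pkg namespace.
    private static String childText(Element parent, String localName) {
        NodeList nl = parent.getElementsByTagNameNS(NS, localName);
        if (nl.getLength() == 0) {
            throw new IllegalArgumentException("missing <" + localName + ">");
        }
        return nl.item(0).getTextContent().trim();
    }
}
```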

To install the package, just download it to a temporary location, launch the package manager as explained at the beginning of this blog post, choose "install" in the file menu, and choose the package on your filesystem. To test if it is correctly installed, write the following stylesheet:
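The package manager does the installation for you. Purely to show what installing involves at the file level, here is a minimal sketch that unpacks a package ZIP into a subdirectory of a repository. The directory name and layout are hypothetical, and a real package manager also updates the repository's administrative files (catalogs, etc.), which this sketch does not do.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class PackageInstaller {

    // Unpacks a package ZIP into repo/<dirName> and returns that directory.
    static Path install(InputStream packageZip, Path repo, String dirName) {
        Path root = repo.toAbsolutePath().normalize();
        Path target = root.resolve(dirName).normalize();
        if (!target.startsWith(root) || target.equals(root)) {
            throw new IllegalArgumentException("bad directory name: " + dirName);
        }
        try (ZipInputStream zip = new ZipInputStream(packageZip)) {
            Files.createDirectories(target);
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = target.resolve(entry.getName()).normalize();
                // refuse entries such as "../x" that would escape the package directory
                if (!out.startsWith(target)) {
                    throw new IOException("entry outside package directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // copies only the current entry: the stream reports EOF at its end
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }
}
```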

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:f="http://www.functx.com"
                version="2.0">

   <xsl:import href="http://www.functx.com/functx-1.0.xsl"/>

   <xsl:template match="/" name="main">
      <result>
         <xsl:sequence select="f:date(1979, 9, 1)"/>
      </result>
   </xsl:template>

</xsl:stylesheet>

and/or the following XQuery main module (depending on what you want to test):

import module namespace f = "http://www.functx.com";

<result> {
   f:date(1979, 9, 1)
}
</result>

To evaluate them, make sure you configured the shell script correctly, as explained above, then open a shell and type one (or both) of the following commands, where style.xsl is the file where you saved the above stylesheet and query.xq is the file where you saved the above query:

$ saxon -xsl:style.xsl -it:main
<result>1979-09-01</result>
$ saxon --xq query.xq
<result>1979-09-01</result>
$ 

If you prefer to test from Java, just write a simple main class that evaluates the above stylesheet and/or query, taking care to use ConfigHelper to set up the Saxon Configuration object. For instance, if you use s9api, you can configure the Processor object like this (don't forget to add the EXPath Pkg and the Apache XML Resolver JAR files to your classpath):

import java.io.File;
import net.sf.saxon.s9api.Processor;
// ConfigHelper comes from the EXPath Pkg JAR file

// the repo directory
File         repo   = new File("...");
// the EXPath Pkg configurer
ConfigHelper helper = new ConfigHelper(repo);
// the Saxon processor
Processor    proc   = new Processor(false);
// actually configure Saxon
helper.config(proc.getUnderlyingConfiguration());
// then use 'proc' as usual...

The second sample package provides a single function, ext:hello($who), written in Java. Besides the files related to the packaging itself, it contains a JAR file with the implementation of that extension function. To test it, just follow the same steps as for the FunctX package, except that you have to add the installed JAR file (from within the repository) to your classpath (the shell script does this automatically for you, but you have to do it yourself if you test from a Java program).

Conclusion

This is just a prototype implementation of a package manager for Saxon, consistent with the one for eXist. The main issue is the configuration of the classpath, but I think this is better left to the user than handled automatically, in particular in the context of a Java EE application. This issue also shows up in your IDE configuration. For now, I configure oXygen by adding the catalogs from the repository to oXygen's main catalog list, and the extension JAR files to the oXygen classpath, so the built-in Saxon processors can be used exactly as usual. But such issues could be resolved by native support directly in the processors and IDEs.

Besides this classpath issue, I am convinced that package management will really improve the current situation, and could be the missing piece for distributing real general-purpose libraries for XQuery and XSLT, as well as one of the foundations for other systems, like an implementation-independent XRX system.
