Publishing XML files with XSL

Introduction

Ever since they launched electronic editions, all major newspapers and magazines have faced the same problem: publishing articles both on paper and on the web. The text is the same, but the entire page layout has to be redone. In these pages we will use a key characteristic of XML - the separation of content from presentation - to speed up publishing on both media.

A simple example of XML file

First contact with an XML file

If you already know the basics of XML, you can skip directly to the next chapters.

For those who are not familiar with XML: a Document Type Definition (DTD) describes the structure and the grammar of XML files, whereas the XML files themselves store data according to the structure defined in the DTD.

Here are a simple DTD and an XML file that uses it.

simple.dtd
<?xml version="1.0" encoding="iso-8859-1"?>
<!ELEMENT list_of_people (person)*>
<!ELEMENT person (firstname, date_of_birth?)>
<!ATTLIST person gender (male|female) #IMPLIED>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT date_of_birth (#PCDATA)>

The first line of simple.dtd specifies which version of the XML standard is used and which character set is used (here, the ISO Latin 1 character set).

The other lines describe the structure of XML files using this DTD.

Line 2 of simple.dtd says that a list_of_people element - the root element of our files - can have zero or more children of type person.

Line 3 says that a person has a firstname child, optionally followed by a date_of_birth child.

Line 4 specifies that a person can have an attribute - named gender - that can have either the value "male" or the value "female".

The last two lines define firstname and date_of_birth as parsed character data. This means that only a character string can be written between the opening tag and the closing tag. Moreover, this string is analysed by the XML parser, so some characters (<, & and the sequence ]]>) are forbidden and must be replaced by &lt;, &amp; and ]]&gt; respectively.
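As a small illustration of these replacements, here is a hypothetical entry (the name is invented for this example and is not part of example.xml) whose firstname contains the forbidden characters, written with the corresponding entities:

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- Hypothetical entry: the name below is invented for this illustration -->
<list_of_people>
  <person gender="male">
    <!-- After parsing, the string is: Tom & <Jerry> ]]> -->
    <firstname>Tom &amp; &lt;Jerry> ]]&gt;</firstname>
  </person>
</list_of_people>
```

Note that a greater-than sign on its own, as in Jerry>, does not need to be replaced; only the sequence ]]> does.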

example.xml
<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>
<!DOCTYPE list_of_people SYSTEM "simple.dtd">
<list_of_people>
  <person gender="male">
    <firstname>Peter</firstname>
  </person>
  <person gender="female">
    <firstname>Alice</firstname>
    <date_of_birth>06/03/1977</date_of_birth>
  </person>
</list_of_people>

The file example.xml simply follows the rules defined in the DTD and the general rules of XML: an element must have an opening tag and a closing tag; it can have attributes whose values are written between quotes; a child element is placed between the opening and closing tags of its parent.

The only two particularities of this file are the second line, which names the root element and the DTD used, and the standalone attribute: the value "no" means that the XML file requires another file, namely the DTD.
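For comparison, here is a minimal sketch of the opposite case. If the declarations of the DTD are copied into the document itself (an internal DTD subset), no other file is needed and standalone can be set to "yes". This variant is only an illustration and is not used in the rest of the article.

```xml
<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<!-- The declarations sit between [ and ] instead of in simple.dtd -->
<!DOCTYPE list_of_people [
  <!ELEMENT list_of_people (person)*>
  <!ELEMENT person (firstname, date_of_birth?)>
  <!-- #IMPLIED marks the gender attribute as optional -->
  <!ATTLIST person gender (male|female) #IMPLIED>
  <!ELEMENT firstname (#PCDATA)>
  <!ELEMENT date_of_birth (#PCDATA)>
]>
<list_of_people>
  <person gender="male">
    <firstname>Peter</firstname>
  </person>
</list_of_people>
```

Keeping the DTD in a separate file, as example.xml does, remains the better choice when many files share the same structure.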

To be continued

Now that you are able to read simple Document Type Definitions and XML files, you can read on to learn more about the eXtensible Stylesheet Language (XSL).

XSL Transformations (XSLT)

Presenting data

So DTDs deal with structure and XML files with pure data. What about the presentation? The eXtensible Stylesheet Language was created by the W3C to express presentation elements in accordance with the XML standard. Although XSL as a whole is still at the draft stage, one part of it - the XSL Transformations - is frozen and can be used to translate XML data into other file formats, such as HTML or other XML-based languages. Thus XML is a truly universal language: it enables full compatibility between XML-based languages and can even be made compatible with non-XML languages (with some restrictions).

Understanding XSLT

XSLT might be difficult to understand at first because it does not read and transform the file line by line. Instead, XSLT selects a node - or a set of nodes - and then applies a transformation to it. The selection is done with XSL instructions and XPath expressions, a kind of path inside the document. Let us see an example.

example1.xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <HTML>
    <HEAD><TITLE>Example of XSLT</TITLE>
    </HEAD>
    <BODY>
    <xsl:apply-templates />
    </BODY>
    </HTML>
  </xsl:template>
  <xsl:template match="person">
    <xsl:choose>
      <xsl:when test="attribute::gender='male'">
        <xsl:value-of select="firstname" /> is a boy.
        <xsl:if test="date_of_birth">
          He was born on <xsl:value-of select="date_of_birth" />.
        </xsl:if>
        <xsl:text disable-output-escaping="yes">&lt;BR></xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="firstname" /> is a girl.
        <xsl:if test="date_of_birth">
          She was born on <xsl:value-of select="date_of_birth" />.
        </xsl:if>
        <xsl:text disable-output-escaping="yes">&lt;BR></xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

The first line is typical: it states that our style-sheet is itself an XML document.

The second line declares xsl:stylesheet as the root element. You have probably noticed that the name of every XSL instruction starts with the string xsl followed by a colon: it means that all these elements belong to the same namespace, bound to the prefix xsl. The xmlns:xsl attribute of the xsl:stylesheet element gives the URI that identifies this namespace. This URI is only an identifier chosen by the W3C for XSL Transformations; the processor does not download anything from it. The version attribute tells the processor which version of XSLT the style-sheet is written for.

The third line defines a template (the xsl:template element) that applies to a part of the XML file. This part is selected with the match attribute, which takes an XPath pattern as its value and selects the corresponding nodes. Here / stands for the root of the document, as in Unix path names; the root element list_of_people is its only child element.

Then we write some basic HTML tags: in XSL, everything that is not an XSL instruction is simply copied to the resulting file.

The xsl:apply-templates element on line 8 makes the processor apply the other templates of the style-sheet to the children of the current node.

Line 12 starts the definition of a new template. The value "person" of its match attribute means "every person element", wherever it appears in the document. When xsl:apply-templates is called from the root, the processor walks down through list_of_people and reaches each person element, so the content of this template is performed once for each person element.

On line 13 comes the xsl:choose instruction. It is much the same thing as the switch instruction of most programming languages: it performs different actions depending on the result of several tests. The opening and closing tags of the xsl:when elements delimit instruction blocks and the test attribute holds the condition. Here attribute::gender='male' means "does the current person element have a gender attribute whose value is male?". The test does not change the current node: inside the xsl:when block, and after leaving it on line 20, the current node is still the person element being processed.

In fact you can use any number of xsl:when blocks inside an xsl:choose instruction. You can also add an xsl:otherwise block to handle the nodes that failed all the other tests.
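As a sketch of a choose instruction with more than one xsl:when block, the template below distinguishes boys, girls and persons without a gender attribute. It assumes a processor conforming to the XSLT 1.0 Recommendation, where the condition of xsl:when is written in a test attribute; the wording of the output sentences is made up for the illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />
  <xsl:template match="person">
    <xsl:value-of select="firstname" />
    <xsl:choose>
      <!-- The first xsl:when whose test is true is used; the others are skipped -->
      <!-- @gender is the abbreviated form of attribute::gender -->
      <xsl:when test="@gender='male'"> is a boy.</xsl:when>
      <xsl:when test="@gender='female'"> is a girl.</xsl:when>
      <!-- Reached only when no test succeeded, e.g. no gender attribute -->
      <xsl:otherwise> has no declared gender.</xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Compared with a single xsl:when followed by xsl:otherwise, this version no longer treats a person without a gender attribute as a girl.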

The xsl:value-of element is used here to copy the content of the firstname child of the current person element. More generally, it inserts the value of the XPath expression given in its select attribute. It can thus copy the content of an element, the value of an attribute or the result of a mathematical expression.
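The three uses of xsl:value-of can be sketched in one small style-sheet. It is written against the XSLT 1.0 Recommendation and applies to example.xml; the labels and the choice of expressions are only illustrations.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" />
  <xsl:template match="list_of_people">
    <!-- Content of an element: the firstname of the first person -->
    <P>First person: <xsl:value-of select="person[1]/firstname" /></P>
    <!-- Value of an attribute, selected with the attribute:: axis -->
    <P>Gender: <xsl:value-of select="person[1]/attribute::gender" /></P>
    <!-- Result of a computation: count() returns the number of selected nodes -->
    <P>Number of people: <xsl:value-of select="count(person)" /></P>
    <P>Without a date of birth: <xsl:value-of select="count(person) - count(person/date_of_birth)" /></P>
  </xsl:template>
</xsl:stylesheet>
```

With example.xml, the two computed values are 2 (two person elements) and 1 (only Alice has a date_of_birth).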

The xsl:if instruction tests an XPath expression - here "does the current person element have a child named date_of_birth?" - and performs the content of the block if the result of the test is true.

Line 19 may seem strange. We wrote &lt;BR> in place of <BR> because in XML every opening tag must have a closing tag. So to be XML compliant we would have had to write <BR></BR> or <BR />. The problem is that this is not yet understood by all web browsers, even though this notation is part of the XHTML standard. One way around the problem is to replace the less-than sign with the equivalent string &lt;, as in HTML, so that the parser does not interpret it as the start of a tag. The greater-than sign should not be a problem, but you can replace it with &gt; as well if you want. Be aware that a processor conforming to the final XSLT 1.0 Recommendation escapes the less-than sign again when it writes the result, unless the text is placed in an xsl:text element with the attribute disable-output-escaping="yes".
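With a processor that conforms to the XSLT 1.0 Recommendation, another way to handle this tag is to declare the html output method: the style-sheet then contains the well-formed <BR /> and the processor writes it to the result as <BR>. A minimal sketch, assuming such a processor:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Ask for HTML serialization instead of XML serialization -->
  <xsl:output method="html" />
  <xsl:template match="person">
    <xsl:value-of select="firstname" />
    <!-- Well-formed in the style-sheet; the html output method writes it as <BR> -->
    <BR />
  </xsl:template>
</xsl:stylesheet>
```

This keeps the style-sheet free of escaped tags and leaves browser compatibility to the serializer.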

To be continued

As you have seen, XSL Transformations take some effort to understand. Once you have made that effort, you will probably be fully aware of the power of XML style-sheets. You may even want to use them in your favourite applications. This is not yet possible, so for the moment you have to code your style-sheets by hand.

The next part of this article presents a small project designed to publish the file example.xml both on the web and on paper.

Multi-media publishing

Goal

The goal of this project is to publish the content of example.xml. To keep things simple, we present the data in a table (one person per row). The web version is in pure HTML so as to be compatible with any web browser, and the paper version is in PostScript - a page description language understood by most professional printers and readable by humans.

Web-publishing

To start with, we need to create the stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

The next step is to create the HTML file and the table. As this must be done before the content of example.xml is processed, we do it when the root is matched:

 <xsl:template match="/">
   <HTML>
   <HEAD><TITLE>Content of example.xml</TITLE>
   </HEAD>
   <BODY>
   <P><CENTER><H1>My Friends</H1></CENTER></P>
   <P><TABLE border="1" width="100%">
     <TR>
       <TD width="85%" align="left"><B>Name</B></TD>
       <TD width="5%" align="center"><B>Gender</B></TD>
       <TD width="10%" align="left"><B>Date of birth</B></TD>
     </TR>
   <xsl:apply-templates />
   </TABLE>
   </P>
   </BODY>
   </HTML>
 </xsl:template>

Now we need to write the data for all the person elements.

 <xsl:template match="person">
   <TR>
     <TD width="85%" align="left"><xsl:value-of select="firstname" /></TD>
     <TD width="5%" align="center"><xsl:value-of select="attribute::gender" /></TD>
     <TD width="10%" align="left"><xsl:value-of select="date_of_birth" /></TD>
   </TR>
 </xsl:template>

Now we can close the stylesheet:

</xsl:stylesheet>

Applying this stylesheet to example.xml gives the result seen in figure1.

figure1

Publishing on paper

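First we use the root element to create a new PostScript document. The xsl:output instruction asks for plain text, so that the processor does not add an XML declaration in front of the PostScript code:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />
  <xsl:template match="/">
    <xsl:text>%!PS</xsl:text>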

Then we define a function called tablerow that will be used to draw the shape of each row:

     /tablerow
       {500 0 rlineto
       0 -45 rlineto
       -500 0 rlineto
       closepath} def

Now we write the title of our page:

     50 700 moveto
     gsave
     /Times-Bold findfont 25 scalefont setfont
     (My Friends) stringwidth pop
     2 div
     250 exch sub
     0 rmoveto
     (My Friends) show
     grestore

Now we can write the first row of the table:

     0 -50 rmoveto
     0 setgray
     gsave
     tablerow
     stroke
     grestore gsave
     /Times-Bold findfont 15 scalefont setfont
     gsave
     15 -30 rmoveto
     (Name) show
     grestore
     350 0 rmoveto
     gsave
     0 -45 rlineto
     stroke
     grestore gsave
     (Gender) stringwidth pop
     2 div 25 exch sub
     -30 rmoveto
     (Gender) show
     grestore
     50 0 rmoveto
     gsave
     0 -45 rlineto
     stroke
     grestore gsave
     5 -30 rmoveto
     (Date of birth) show
     grestore
     -400 -45 rmoveto

We apply the other templates and then close the PostScript document.

     <xsl:apply-templates />
     showpage
   </xsl:template>

As with the HTML page, we use a template matching the person elements to write the content of the table row by row:

   <xsl:template match="person">
     gsave
     tablerow
     stroke
     grestore
     /Times-Roman findfont 15 scalefont setfont
     gsave
     15 -30 rmoveto
     (<xsl:value-of select="firstname" />) show
     grestore
     350 0 rmoveto
     gsave
     0 -45 rlineto
     stroke
     grestore gsave
     /gender (<xsl:value-of select="attribute::gender" />) def
     gender stringwidth pop
     2 div 25 exch sub
     -30 rmoveto
     gender show
     grestore
     50 0 rmoveto
     gsave
     0 -45 rlineto
     stroke
     grestore gsave
     5 -30 rmoveto
     (<xsl:value-of select="date_of_birth" />) show
     grestore
     -400 -45 rmoveto
   </xsl:template>

We close the stylesheet:

 
</xsl:stylesheet>

figure2

Results

This project may seem a lot of work for a small result. That is true, but keep in mind that it was designed for teaching purposes only. In a large-scale application we would have based our style-sheets on the DTD, not on the XML file. The style-sheets would then have been usable for any XML file based on the DTD, which would have saved a lot of time if we had numerous files (for example articles from a newspaper).

Conclusion

Present state

This article has demonstrated how useful XML style-sheets can be when the same file has to be published on several media. The problem is that XSL will not be fully usable until the applications we use every day understand both XML and XSL (FrameMaker can already use SGML - the ancestor of XML - and the new version of XPress should be compatible with XML).

Perspectives

Another thing that XML style-sheets can do is change the presentation of numerous files at once. This could be used, for example, to change the look of a website more easily. We can even imagine on-the-fly changes depending on the visitor. Such personalization is already possible on some websites using the PHP language, but Sablotron - an XSLT processor module for the Apache webserver - may open the way to a direct use of XML style-sheets.

Universality

The last thing about XML and XSL is the universality of XML. Thanks to XSL, the same XML file can be made compatible with many different applications even if they do not use the same file format. Recently, many Linux applications have moved to XML configuration files, which makes it possible to switch between applications of the same type (mail user agents, for example) and nevertheless keep the same configuration file.

In fact, the XML/XSL pair can be used for so many tasks that it would take hundreds of pages to describe them all. To conclude, XML has so many advantages that it may become the universal file format of the future.

Published on 20-02-2006 at 22:38 by Christophe Garrigue.
In the Informatique section.

Legal notice

This personal site mainly serves to gather in one place everything I publish on the Internet, along with my drawings and the few reports I had the opportunity to write during my studies.
All the content of this site is my exclusive property and may not be reused without asking permission, unless stated otherwise (for example an article under one of the Creative Commons licenses). It is nevertheless possible to quote parts of the articles published here, as provided for by the French penal code.
Banner font by Caffeen.