by Manish Dixit
(June, 2003)
The following Technical Article may contain actual software programs in source code form. This source code is made available for developers to use as needed, pursuant to the terms and conditions of this license.
XML is a way to represent application data.
XML is defined as eXtensible Markup Language. It is a meta-markup language and more.
Historically, web services faced continuous problems with representing application data. There were issues with performance, persistence, mutability, composition and security. There were no standards defining the representation of data. Defining interpretation, presentation, interoperability and the portability capabilities of data required care.
Until now HTML has served as the most common form of data markup. It has been tailored primarily towards the presentation of data in browsers. However, it was commonly felt that HTML was trading its power for ease of use. Using a fixed set of tags, along with predefined semantics and data for these tags, HTML was restrictive when it came to expanding its functionality to encompass application data.
XML was born to handle the shortcomings of HTML. The idea was to have a common way to represent application data to be shared over the Internet, such that a variety of applications could work with this data, and so that it would be easy to write a set of programs which could process this XML data.
With XML, data can be "marked-up" with an infinite number of customizable tags. Tags can be anything from semantic data representation, to business rules (ebXML), to data relationships (Enterprise JavaBean container-managed persistence), to Formatting (XSL), etc. The feature that allows XML to have any number of tags makes it eXtensible.
The underlying ideas for the Java programming language and XML are similar. The Java programming language enables portable code, whereas XML enables portable data. Most XML tools and code are written in the Java programming language. The Java programming language arguably has the best API support for XML technology through the Java Community Process (JCP). Some of these APIs include:
Developers are mostly interested in creating, sending and receiving, and parsing and manipulating XML documents.
Creation of XML documents is mostly done programmatically, but since XML is plain text, it can also be done using any text editor or one of the available XML editors.
Sending an XML document can be accomplished over any common protocol such as HTTP, FTP, TCP, SMTP, etc. This can also be accomplished through programming APIs such as Java Message Service (JMS), JAXM or sockets. The source of an XML document can be a URI (a file or URL), an input stream, Simple API for XML (SAX) input, or a DOM tree.
Parsing XML documents involves converting them into programmable objects that represent data. The final step involves manipulating these objects in an application-specific manner to display, save in a database, or create further XML objects.
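As a rough illustration of the parsing step, the following minimal sketch uses the standard JAXP DOM API to turn XML text into a tree of objects and then reads values out of it. The class name ParseSketch, the method name headlines, and the sample XML are assumptions made for this illustration; they are not part of the article's downloadable code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
public class ParseSketch {

    // Parses XML text into a DOM tree and returns the trimmed text
    // of every <headline> element, in document order.
    public static List<String> headlines(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList nodes = doc.getElementsByTagName("headline");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data in the shape used later in this article.
        String xml =
            "<WebApi>"
          + "<Story><headline>First story</headline></Story>"
          + "<Story><headline>Second story</headline></Story>"
          + "</WebApi>";
        System.out.println(headlines(xml)); // prints [First story, Second story]
    }
}
```

Once the data is held in objects like this, the application is free to display it, store it, or build further XML from it.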
Frequently there is a need for the application data to be transformed into a different format: for example, converting to HTML for presentation purposes, or converting to another XML format to send to another application. Stylesheets make it easier for this conversion to take place.
A stylesheet specifies the presentation of XML information using two basic techniques:
A. Components of the XSL Language
To understand XSL technology it is essential to become familiar with its three components:
- XPath: XML Path Language - a language for referencing specific parts of an XML document.
- XSLT: XSL Transformations - a language for describing how to transform one XML document (represented as a tree) into another.
- XSL: Extensible Stylesheet Language - XSLT plus a description of a set of formatting objects and formatting properties
B. An XSL Stylesheet
An XSL stylesheet basically consists of a set of templates. Each template compares (or "matches" in XML lingo) some set of elements in the source XML, then describes the contribution that the matched element makes to the resulting document.
Generally, elements in a stylesheet in the XSL namespace are part of the XSLT language, and non-XSL elements within a template are what get copied to the resulting document. The primary goal of the namespace specification is to let the document author tell the parser which Document Type Definition (DTD) or schema to use when parsing a given element. The parser can then consult the appropriate DTD or schema for an element definition. Of course, it is also important to keep the parser from aborting when a "duplicate" definition is found, and yet still generate an error if the document references an element like title without qualifying it (identifying the DTD or schema to use for the definition).
C. The Structure of a Stylesheet
- XSLT Stylesheets are XML documents; namespace is used to identify semantically significant elements.
Most stylesheets are standalone documents rooted at <xsl:stylesheet> or <xsl:transform>. It is possible to have "single template" stylesheets/documents.
D. Understanding a Template
Each template has the following form:
<xsl:template match="/">
{ACTION}
</xsl:template>
Anything beginning with xsl:... defines a stylesheet element.
The term match="/" in the preceding template defines what needs to be found in the given XML. In this case it looks for the root of the XML document.
This pattern-matching capability of the stylesheet language is essential to isolate pieces of relevant presentation information and transform them into the resulting set. XSLT does this with "match patterns" defined by the XML Path Language (XPath).
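To make the template form concrete, here is a minimal sketch that applies a one-template stylesheet from Java through the standard JAXP transformation API. The class name TemplateSketch, the text output method, and the sample XML are assumptions chosen to keep the example short; the {ACTION} part of the template is filled in with a single xsl:value-of.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class, for illustration only.
public class TemplateSketch {

    // A stylesheet with one template that matches the document root ("/").
    // Its action emits literal text followed by the value of the first headline.
    public static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "Headline: <xsl:value-of select=\"/WebApi/Story/headline\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet above to the given XML text.
    public static String apply(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<WebApi><Story><headline>First story</headline></Story></WebApi>";
        System.out.println(apply(xml)); // prints Headline: First story
    }
}
```

The literal text inside the template is copied to the result, while the xsl:value-of element is evaluated against the source document.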
E. XPath
Identifying and parsing XML documents requires a language that can address the different parts of an XML document. XPath was designed to do just that. In support of its primary job of identifying parts of XML, the language has the added functionality of manipulating a few data types (strings, numbers, and booleans) and indexing. XPath models XML as a tree of nodes.
In general, the XML nodes can be classified as:
- root nodes - If XML is a tree of information, then the root node depicts the information starting point. The root node also has processing instruction and comment nodes as children, for any processing instructions and comments that occur outside the document element.
- element nodes - Every element of the tree can be described by an element node. The fully-qualified name of the node is its path depiction from the root. For example, a fully-qualified name of /WebApi/Story/headline would depict an element such as
<WebApi>
<Story>
<headline>
US invades Iraq
</headline>
</Story>
<Story>
</Story>
</WebApi>
- text nodes - Any character text in the XML document would be described by a text node. In the preceding example, the text "US invades Iraq" forms a text node.
- attribute nodes - An element node can have a set of attribute nodes associated with it, one for each attribute on the element. For example, in the following excerpt from an XML file, the Period element under /WebApi/Company/Statement/Financials/Sales has the attribute nodes period_ending, type, and id.
To access an attribute of a node we use the "@" sign. For example,
<xsl:value-of select="/WebApi/Company/Statement/Financials/Sales/Period[1]/@period_ending"/>
<WebApi>
<Company>
<company_name>Sun Microsystems, Inc.</company_name>
<ticker>SUNW</ticker>
<fiscal_year_end>June 30, 2002</fiscal_year_end>
<Statement>
<Financials>
<Sales>
<Period period_ending="Dec. 02" type="Quarterly" id="2Q">2,915.0</Period>
<Period period_ending="Sep. 02" type="Quarterly" id="1Q">2,747.0</Period>
</Sales>
</Financials>
</Statement>
</Company>
<Company> .... </Company>
</WebApi>
- namespace nodes - Every element in an XML document has a set of associated namespace nodes - one for each distinct namespace prefix that is in the scope for the element. The element is the "parent" of each of these namespace nodes.
Elements never share namespace nodes: if one element node is not the same node as another element node, then none of the namespace nodes of the first element node will be the same node as any namespace node of the other element node.
- processing instruction nodes - Sometimes application data may require processing certain instructions on a set of data before it becomes usable. To help facilitate this, in the XML, the data provider can have a special node depicted by a processing instruction node, with the instructions that are needed to process the data.
- comment nodes - These are used to provide comments on the data objects in the XML. They do not form the data itself.
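The node types above can be explored directly from Java. The following minimal sketch evaluates XPath expressions against a cut-down copy of the financial excerpt shown earlier, selecting an attribute node and a text node. It assumes a JDK that includes the javax.xml.xpath package; the class name XPathSketch and its helper methods are hypothetical names introduced for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class, for illustration only.
public class XPathSketch {

    // Cut-down copy of the financial excerpt shown above.
    public static final String XML =
        "<WebApi><Company><Statement><Financials><Sales>"
      + "<Period period_ending=\"Dec. 02\" type=\"Quarterly\" id=\"2Q\">2,915.0</Period>"
      + "<Period period_ending=\"Sep. 02\" type=\"Quarterly\" id=\"1Q\">2,747.0</Period>"
      + "</Sales></Financials></Statement></Company></WebApi>";

    // Parses XML text into a DOM document.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates an XPath expression and returns its string value.
    public static String select(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        String sales = "/WebApi/Company/Statement/Financials/Sales";

        // Attribute node of the first Period element: prints Dec. 02
        System.out.println(select(doc, sales + "/Period[1]/@period_ending"));

        // Text node of the Period whose id attribute is 1Q: prints 2,747.0
        System.out.println(select(doc, sales + "/Period[@id='1Q']/text()"));
    }
}
```

Note that the path must spell the element names exactly as they appear in the document; a path through Financial rather than Financials would select nothing and yield an empty string.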
A. News Web Service
In our example we assume that a web service call has been made to the provider to get the XML document. Here we will demonstrate how we can write an XSLT to convert this XML document.
1. XML
We'll use headlines.xml as our source of information. The XML contains news stories. Each story is associated with a headline, a URL for the detailed story related to the headline, a publication source, a publication date, the arrival time of the news, and the category to which the story belongs.
2. XSLT
The complete XSLT can be downloaded from headlines.xsl. It's also included below, with a comment following each fragment.
<?xml version = "1.0"?>
The preceding line is the XML declaration; it indicates that the stylesheet itself is a document conforming to XML standards. Strictly speaking, XML 1.0 makes the declaration optional, but including it is recommended.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
xsl:stylesheet indicates the start of a stylesheet page. It has a fixed format as indicated above.
<xsl:template match="/">
xsl:template is the start of the template definition. In this case the match attribute has the value "/", which causes the processor to match the root of the XML document.
<HTML> <HEAD> <TITLE>News Headlines</TITLE> </HEAD> <BODY>
Most of the text in the stylesheet that does not start with xsl:... would be directly copied by the parser.
<xsl:for-each select="/WebApi/Story">
This is to apply stylesheet actions procedurally. Using xsl:for-each, the enclosed template body is applied once for each selected node, and it explicitly selects and processes the necessary elements.
The selected nodes are all the Story elements matched by /WebApi/Story.
<A HREF="/YB/jsps/news_display_page.jsp?URL={url}"> <I><B><xsl:value-of select="headline"/></B></I> </A>
For the most part, the above text is copied as it is, with one important exception. The {url} is a way to get the value of the element url under the current node (/WebApi/Story). This notation, known as an attribute value template, works only inside attribute values.
Another way to get the value of an XML element is by using xsl:value-of. The preceding example shows how to get the value of the headline element from the XML.
<FONT SIZE="-1"> <xsl:value-of select="pubsource"/></FONT> <BR/> <FONT SIZE="-1"> <xsl:value-of select="Summary/text"/></FONT> <BR/><BR/> </xsl:for-each>
The XSL stylesheet is a fully-conforming XML document; each beginning tag needs to have a corresponding end tag.
</BODY> </HTML> </xsl:template> </xsl:stylesheet>
This marks the end of stylesheet processing.
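The two ways of reading a value described above, {url} inside an attribute and xsl:value-of inside element content, can be tried out with a reduced stylesheet. The following is a minimal sketch, not the article's headlines.xsl: the class name ForEachSketch, the UL/LI markup, the /display link prefix, and the example.com URLs are all assumptions made for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class, for illustration only.
public class ForEachSketch {

    // Reduced stylesheet: loops over every Story, reads url through an
    // attribute value template and headline through xsl:value-of.
    public static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/\">"
      + "<UL><xsl:for-each select=\"/WebApi/Story\">"
      + "<LI><A HREF=\"/display?URL={url}\">"
      + "<xsl:value-of select=\"headline\"/></A></LI>"
      + "</xsl:for-each></UL>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet above to the given XML text.
    public static String apply(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up stories; each one should yield one LI element whose link
        // carries the story's url and whose text is the story's headline.
        String xml =
            "<WebApi>"
          + "<Story><headline>First story</headline>"
          + "<url>http://example.com/1</url></Story>"
          + "<Story><headline>Second story</headline>"
          + "<url>http://example.com/2</url></Story>"
          + "</WebApi>";
        System.out.println(apply(xml));
    }
}
```

Inside xsl:for-each the relative paths url and headline are resolved against the Story element currently being processed, which is why the same template body produces a different link for each story.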
3. Java Programming Language Code
The Java programming language code makes use of the Java Web Services Developer Pack (JWSDP) APIs to carry out the transformation. The JWSDP is available here. The complete code for our transformation example is shown in ParseXml.java. To build and run the code, we needed the transformation engine jar files from the Web Services Developer Pack in our classpath. We used a script that runs on UNIX® csh, shown in build. One should be able to easily adapt it to any other shell or operating system. Similarly, for running the generated class file, a script is shown in run.
The Java programming language code here is generic in that it can take in any XML file and use any stylesheet. The code only provides the transformation engine that is necessary to carry out the parsing. Walking through the code we can see that it first sets up the transformation engine to use by defining it in the system properties. It goes on to create the actual transformer. At this point the transformer is ready to take in any set of XML files and XSLT files and perform the transformation. In our case the XSLT has been written to transform the XML into an HTML document.
The code begins by adding the following text from the stylesheet:
<HTML>
<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<TITLE>YellowBrix News Headlines</TITLE>
</HEAD>
<BODY>
The code then looks for the node /WebApi/Story, and performs the transformation on it: it searches for the node headline first in this tree, and displays the following:
<A HREF="/YB/jsps/news_display_page.jsp?URL=http://xyz.yellowbrix.com/pages/iplanet/Story.nsp?story_id=38821585&ID=xyz"><I><B>San Mateo, Calif., Software Maker Relents on Shareholder Meeting</B></I></A>
It then goes on to search for the node pubsource within this tree, gets the value of this node, and displays it as:
<FONT SIZE="-1">San Jose Mercury News</FONT>
<BR>
Lastly, it searches for Summary/text, if there is any, and displays it as:
<FONT SIZE="-1"></FONT>
<BR>
It repeats the above steps for each /WebApi/Story, because the stylesheet uses the xsl:for-each instruction. At the end of the loop, the following is added:
</BODY>
</HTML>
4. Output
The complete output from the program is shown in headlines.html.
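The transformation step described in section 3 can be sketched with the standard JAXP API as follows. This is not the article's ParseXml.java, which is only available as a download; the class name TransformSketch and its command-line arguments are assumptions made for this illustration. The sketch leaves the javax.xml.transform.TransformerFactory system property untouched, so the platform's default transformation engine is used.

```java
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class, for illustration only; not the article's ParseXml.java.
public class TransformSketch {

    // Transforms the XML file with the given stylesheet and writes the
    // result to the output file.
    public static void transform(File xml, File xsl, File out) throws Exception {
        // newInstance() honors the javax.xml.transform.TransformerFactory
        // system property if it is set; otherwise it falls back to the
        // platform default implementation.
        TransformerFactory factory = TransformerFactory.newInstance();

        // The transformer is built from the stylesheet, so the same code
        // works with any XML file and any stylesheet.
        Transformer transformer = factory.newTransformer(new StreamSource(xsl));
        transformer.transform(new StreamSource(xml), new StreamResult(out));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: java TransformSketch <xml> <xsl> <output>");
            return;
        }
        transform(new File(args[0]), new File(args[1]), new File(args[2]));
    }
}
```

With the files from this example, it could be run as: java TransformSketch headlines.xml headlines.xsl headlines.html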
B. Financials
The second example, which shows that the code we have written is reusable, makes use of another set of information gathered about the financial records for selected companies. The code uses the transformation definition provided in the stylesheet to carry out the translation. The principles for conversion remain the same, although the data and set of transformation rules are very different.
The XML data is in financials.xml, the stylesheet is in financials.xsl, and the output is shown in financials.html.
XML and parsing are not described in this document.
http://nwalsh.com/docs/tutorials/xsl/xsl/slides.html
http://java.sun.com/webservices/
http://www.w3.org/TR/REC-xml-names
DOC ID# 1854