4Suite Core: Open-source Library for XML Processing
Users' Manual
1 Introduction
4Suite allows users to take advantage of standard XML technologies
rapidly and to develop and integrate Web-based applications. It also puts
practical technologies for knowledge management projects in the hands of
developers. It is implemented in Python with C extensions.
At the core of 4Suite is a library of integrated tools (including
convenient command-line tools) for XML processing, implementing open
technologies such as DOM, SAX, XSLT, XInclude, XPointer, XLink, XPath,
XUpdate, RELAX NG, and XML/SGML Catalogs.
With 4Suite, you can:
And much more. These tasks are covered in this manual.
2 Installation
Please see the UNIX or Windows install
documents. Remember that if you are using Cygwin on Windows, you should follow the UNIX instructions.
3 DOM-like XML processing
Domlette is 4Suite's lightweight DOM implementation. It is optimized
for XPath operations, speed, and relatively low memory overhead. The
Domlette API is accessible through Ft.Xml.Domlette. This section describes how to
parse, manipulate, and then serialize XML documents using this API.
Below, we briefly summarize the various elements of the API that form
the basic life span of Domlette objects.
- Parsing XML documents
-
The Ft.Xml module
contains the function Parse that gets the
job done quickly. See “Quick access to the Domlette reader API” for
details. For somewhat more advanced parsing, you will need a
combination of the reader instances in the
Ft.Xml.Domlette module and
Ft.Xml.CreateInputSource for constructing
InputSource instances. In rare cases you
might need the lower-level APIs in the
Ft.Xml.InputSource module.
Read “The full Domlette reader API” if
Ft.Xml.Parse isn't enough.
- Modifying and interacting with XML documents
-
The Domlette API for interacting with XML documents—accessible
as methods of the various Domlette objects—is similar to the DOM Level 2
specification. See “Domlette API summary” for more
information.
- Serializing XML documents
-
The Ft.Xml.Domlette
module provides two functions, Print and
PrettyPrint, for writing your XML documents.
The Print function writes the XML document
precisely as given in the model. On the other hand, the
PrettyPrint function adds whitespace nodes to
your document to try to indent the resulting output nicely. See “Serializing Domlette nodes” for details.
3.1 Parsing XML documents
We begin our discussion of the Domlette API by describing how to
obtain a model of your XML documents to manipulate further. Because XML
documents offer such rich functionality and exist in such varied
environments, there can be a surprising amount of work that you must do to
simply load your XML documents. We begin by providing a short-cut for easy
access. We will then dive into the full suite of document loading
utilities.
3.1.1 Quick access to the Domlette reader API
For basic document manipulations or to get started quickly, the
Ft.Xml module offers a quick
way to parse XML documents and directly obtain access to the Domlette
interface to those documents. Within this module the function of
interest is Parse.
Warning
This function will get you started quickly because it
specifically chooses some default values for some of the more advanced
parsing features. If you are passing in a string or stream, and the
material in “The importance of base URIs”
applies to your parsing situation, then you will want to use the
full-featured API. In brief, if your XML document references external
resources, you should not use this convenience function. See “The full Domlette reader API” instead.
This function returns a Domlette
Document representing the root of the document
from the argument.
Parse(source)
-
The Parse function takes a single
argument, which is a byte string (not unicode object), file-like
object (stream), file path or URI.
XML = """
<ham>
<eggs n='1'/>
This is the string content with <em>emphasized text</em> text
</ham>"""
from Ft.Xml import Parse
doc = Parse(XML)
# If the above XML document were located in the file
# "target.xml", we could have used `Parse("target.xml")`.
print doc.xpath('string(ham//em[1])')
3.1.2 The full Domlette reader API
You create Domlette instances by parsing XML documents with the
reader system. For general use, the Ft.Xml.Domlette package contains instances
of the different reader classes that can be used directly after you
import them. These instances include
NonvalidatingReader and
ValidatingReader, which provide non-validating
parsing and validating parsing services, respectively. The validation in
this case refers to DTD validation. For RELAX NG validation, see “Validation using RELAX NG”. All the reader classes (and, hence, their bundled
instances) are described in later sections. After you have obtained one
of these reader instances, you feed your XML document entity's byte
stream to the reader. We summarize the available reader methods
below.
parseUri(uri)
-
The parseUri method takes a single
argument; this uri argument is the absolute
URI of the document entity to parse. The URI will be dereferenced
by the default resolver.
parseString(st, uri)
-
The parseString method takes two
arguments; st is the XML document entity in
the form of an encoded Python string (not a
Unicode string). See the next section for details on
the uri argument.
parseStream(stream, uri)
-
The parseStream method takes two
arguments; stream is a Python file-like
object that can supply the document entity's bytes via
read() calls. See the next section for
details on the uri argument.
parse(inputSource)
-
The parse method takes a single
argument; inputSource is an
Ft.Xml.InputSource.InputSource object,
described in “InputSource objects”.
The next two sections cover some of the issues that you should
understand before using these functions. Then we start seeing some
examples in “NonvalidatingReader”.
3.1.3 The importance of base URIs
In the first three methods listed in the previous section, the
uri argument is the URI of the document entity
that you are feeding to the parser. It is a very important—but often
overlooked—concept in document processing.
The URI gives the document entity a unique identifier that can be
used to refer to the document as a whole. Also, each Domlette node
derived from a particular entity inherits that entity's URI as the
node's baseURI property, unless an alternative base
URI was indicated, such as with xml:base, or if part of the document was
loaded as an external entity or XInclude.
The document's URI is also used as the "base URI" for resolving
any relative URI references that may appear within the document itself.
Relative URI references may occur in a document in places like:
-
<!DOCTYPE> or
<!ENTITY>, immediately following the keyword
SYSTEM
-
<xsl:import> and
<xsl:include>, in the value
of the href attribute
-
<xi:include>, in the
value of the href
attribute
-
<exsl:document>, in
the value of the href
attribute
-
the arguments to XSLT's document()
function
It is a common misconception that relative URI references in a
document's content are considered to be relative to the processor's
current working directory. They are actually resolved relative to the
URI of the document that contains the relative URI reference (more
specifically, relative to the URI of the entity in which the reference occurs, keeping in
mind that a document may be comprised of multiple entities, i.e.,
separate files).
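As a rough illustration of this resolution rule, the sketch below uses urljoin from the Python standard library rather than 4Suite's own resolver. That substitution is an assumption made so that the example runs without 4Suite installed (it targets Python 3). The URIs are the ones used in the examples of this manual.

```python
from urllib.parse import urljoin

# URI of the entity that contains the relative reference.
base_uri = "http://foo/test/spam.xml"

# A relative reference, such as a SYSTEM identifier of "eggs.xml", is
# resolved against the containing entity's URI, not against the
# processor's current working directory.
resolved = urljoin(base_uri, "eggs.xml")

# Dot segments are resolved against the same base.
parent = urljoin(base_uri, "../shared/ham.xml")

print(resolved)
print(parent)
```

The first result names a sibling of spam.xml on the same server, which is where a parser given this base URI would look for the entity.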
In all cases, the document URI that you supply in the reader API
must be "absolute", which means that it has a scheme, e.g.
"http://spam/eggs.xml", not just
"/spam/eggs.xml" or
"eggs.xml".
If you know there are not going to be any relative URI references
to resolve during initial parsing or during processing of the Domlette
by other tools, then you can safely omit the argument, or, preferably,
supply a dummy URI like "urn:dummy" or
"http://spam/eggs.xml". If you choose to omit URI arguments
from APIs that need them, you may get a Python warning, and a random
URI—which is probably not what you want—will be assigned.
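Whether a candidate URI is absolute in the sense described above can be checked by looking for a scheme. The helper name is_absolute below is hypothetical, and the sketch relies on urlsplit from the Python 3 standard library rather than on any 4Suite function.

```python
from urllib.parse import urlsplit


def is_absolute(uri):
    """Hypothetical helper: True if the URI reference has a scheme."""
    return bool(urlsplit(uri).scheme)


candidates = ["http://spam/eggs.xml", "urn:dummy", "/spam/eggs.xml", "eggs.xml"]
results = [is_absolute(uri) for uri in candidates]
print(results)
```

Only the first two candidates carry a scheme, so only they would be acceptable as document URIs.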
If you've understood all this and yet you want to just go ahead
and not specify a base URI, you may have to turn off the likely
warnings. You can do so with code such as in the following example.
import Ft.Xml.Domlette
import warnings
def disable_warnings(*args): pass
warnings.filterwarnings("ignore", category=Warning)
warnings.showwarning = disable_warnings
XML = "<spam/>"
doc = Ft.Xml.Domlette.NonvalidatingReader.parseString(XML)
Ft.Xml.Domlette.Print(doc)
In such a case, you can also use the convenience function
Ft.Xml.Parse (see above).
3.1.4 Parsing XML that's already a Unicode string
Because 4Suite is trying to provide as thin a wrapper as possible
to the underlying parser, and due to complexities in the APIs of these
parsers, there is no API in 4Suite for parsing Python's Unicode
strings.
If your XML is in the form of a Unicode string, you must encode
the string as bytes so that the underlying parser can read it. Once you
have an encoded string, you can pass it to the reader's
parseString(), or wrap it in an
InputSource using
Ft.Xml.CreateInputSource, or the
fromString() method of an
InputSourceFactory. If the string is not UTF-16 or
UTF-8 encoded, then you must tell the reader what encoding it actually
uses. You can do this either by writing or replacing the XML declaration
in the string itself, or (much easier) setting the optional encoding
keyword argument in the reader's parseString()
method or the InputSourceFactory's
fromString() method. For an example, see the
Akara article on external encoding declarations.
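The encoding step itself can be sketched without 4Suite. The example below assumes Python 3 and uses xml.dom.minidom from the standard library as a stand-in parser. It shows the first option described above, stating the encoding in the XML declaration, and does not demonstrate 4Suite's encoding keyword argument.

```python
from xml.dom import minidom

text = "<spam>\u00e9ggs</spam>"  # XML held in a Unicode string

# UTF-8 needs no declaration: encode, then parse the resulting bytes.
utf8_doc = minidom.parseString(text.encode("utf-8"))

# For another encoding, name it in the XML declaration before encoding.
declared = '<?xml version="1.0" encoding="iso-8859-1"?>' + text
latin1_doc = minidom.parseString(declared.encode("iso-8859-1"))

utf8_value = utf8_doc.documentElement.firstChild.data
latin1_value = latin1_doc.documentElement.firstChild.data
print(utf8_value == latin1_value)
```

Both parses recover the same character data, because in each case the parser was told, implicitly or explicitly, how the bytes were encoded.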
3.1.5 NonvalidatingReader
Use NonvalidatingReader for basic parsing.
NonvalidatingReader performs its parsing without
validating against a DTD.
The following example will parse an XML source taken from the
supplied URI, which is treated as a URL by the default resolver.
from Ft.Xml.Domlette import NonvalidatingReader
doc = NonvalidatingReader.parseUri(
"http://www.w3.org/2000/08/w3c-synd/home.rss")
The following example also parses an XML source taken from the
supplied URI, which is treated as a URL. In this case, the default
resolver tries to read the XML source from the filesystem.
from Ft.Xml.Domlette import NonvalidatingReader
doc = NonvalidatingReader.parseUri("file:///tmp/spam.xml")
The following example parses XML from the filesystem. When given a
relative file path in the local OS's format, we must first convert that
path to a URI that our reader objects can use.
from Ft.Xml.Domlette import NonvalidatingReader
from Ft.Lib import Uri
file_uri = Uri.OsPathToUri('spam.xml')
doc = NonvalidatingReader.parseUri(file_uri)
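For comparison, a similar path-to-URI conversion can be sketched with pathlib from the Python 3 standard library. This is only an analogue of Uri.OsPathToUri, not its implementation, and the two may differ in details such as character escaping.

```python
from pathlib import Path

# Make the relative OS path absolute, then express it as a file: URI.
file_uri = Path("spam.xml").absolute().as_uri()
print(file_uri)
```

The result depends on the current working directory, which is exactly why the conversion has to happen before the URI is handed to a reader.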
The following example parses XML from a string. Note that it does
not provide a document/base URI.
from Ft.Xml.Domlette import NonvalidatingReader
doc = NonvalidatingReader.parseString("<spam>eggs</spam>")
In the following example, we are parsing XML from a string in a
case where the document does need a base URI to be specified.
from Ft.Xml.Domlette import NonvalidatingReader
s = """<!DOCTYPE spam [ <!ENTITY eggs "eggs.xml"> ]>
<spam>&eggs;</spam>"""
doc = NonvalidatingReader.parseString(s, 'http://foo/test/spam.xml')
# during parsing, the replacement text for &eggs;
# will be obtained from http://foo/test/eggs.xml
In all of the above examples, doc is now a Domlette node object.
4Suite currently offers one Domlette implementation, written in C,
called cDomlette.
3.1.6 EntityReader Examples
Sometimes you need to parse a fragment of XML rather than the full
document. If operating in non-validating mode is sufficient, Domlette
has a reader that can handle this case. When parsing such a fragment,
EntityReader returns a Domlette document fragment
rather than a document object.
from Ft.Xml.Domlette import EntityReader
s = """
<spam1>eggs</spam1>
<spam2>more eggs</spam2>
"""
docfrag = EntityReader.parseString(s, 'http://foo/test/spam.xml')
Note
The content parsed by EntityReader must
be an XML External Parsed Entity. This means that it can't be just any
XML document. The main limitation is that it must not have a
document type declaration.
3.1.7 ValidatingReader
If you want to validate a document with a DTD as you parse it, use
the ValidatingReader object instead. If
ValidatingReader discovers that the document that
it is currently parsing is invalid, then it raises an
Ft.Xml.ReaderException and does not finish
parsing the document. The following example illustrates these
concepts.
# ValidatingReader is a global instance
from Ft.Xml.Domlette import ValidatingReader
XML = """<!DOCTYPE a [
<!ELEMENT a (b, b)>
<!ELEMENT b EMPTY>
]>
<a><b/><b/></a>"""
doc = ValidatingReader.parseString(XML, "urn:x-example:valid-a")
# And of course, as with other readers, you can use `parse`, `parseUri`, and
# `parseStream` as well.
# The following document, however, is invalid because an `a` element can only
# have two `b` children according to its DTD.
XML = """<!DOCTYPE a [
<!ELEMENT a (b, b)>
<!ELEMENT b EMPTY>
]>
<a><b/><b/><b/></a>"""
# This raises an `Ft.Xml.ReaderException` when it encounters invalid structure,
# and does not finish parsing the document into `doc`.
doc = ValidatingReader.parseString(XML, "urn:x-example:invalid-a")
3.1.8 NoExtDtdReader
When using NonvalidatingReader to parse a
document, that document's DTD is still opened and read to obtain
information such as entity declarations and default attribute values.
You cannot suppress reading of the internal DTD subset, but you can
prevent the external subset from being accessed by using
NoExtDtdReader. This won't affect the processing
of external parameter entities defined in the internal DTD subset. Use
this object as you would use
NonvalidatingReader.
3.1.9 Creating your own reader instance
In some cases you might not want to use the global reader
instances. For instance in multithreaded use, you might want a reader
per thread. Or you might want to change some of the parameters on the
readers. If so, you can create your own reader instance:
from Ft.Xml.Domlette import NonvalidatingReaderBase
reader = NonvalidatingReaderBase()
doc = reader.parseUri("http://xmlhack.com/read.php?item=1560")
Instead of NonvalidatingReaderBase, you
could instead use NoExtDtdReaderBase or
ValidatingReaderBase, depending on your needs.
Each of these three readers takes an optional
inputSourceFactory constructor argument, which
you can use to supply a custom URI resolver.
3.1.11 Converting from other DOM libraries
You can convert another Python DOM object (e.g. 4DOM or minidom)
to a Domlette object using the function
ConvertDocument:
from Ft.Xml.Domlette import ConvertDocument
converted_document = ConvertDocument(oldDocument, documentURI=u'http://www.example.org/')
The documentURI parameter provides a base
URI for the converted nodes. If it is not specified, the documentURI
and then baseURI attributes of the source DOM are checked, as defined in DOM Level 3. If no
URI is found in this way, a warning is issued and a UUID URI is
generated for the new Domlette object.
3.2 Domlette API summary
Interacting with Domlette documents
You will use a large part of the Domlette API to interact with the
model of your XML documents. The implementation of this part of the API is
found in the Ft.Xml.cDomlette
module. This part of the API allows you to navigate around a document and
modify the content of that document. It is very similar to the DOM Level 2
specification and follows some of the DOM Level 3
specification; feel free to refer to those specifications and the
4Suite API documentation for details about the intended behavior of this
API. You can find brief descriptions of the methods and attributes
provided by this API listed below. This API is also nearly the same as the
API for xml.dom, which is bundled
with Python. The node type constants are inherited directly from
xml.dom.Node.
Many objects that you will work with in the Domlette API are
descendants of the Domlette Node class.
Documents, document fragments (of class
DocumentFragment), Elements,
attributes (class Attr), text (class
Text), processing instructions (class
ProcessingInstruction), and comments (class
Comment) are all nodes; any node operations are
defined on objects of these types, as well. Some operations do not make
sense on some objects, however. For example, it does not make sense to add
children to an attribute node.
In the DOM model of XML documents, there is a
Document node which represents the starting point
for the other pieces of the document. This node is not the root element of the document; rather, the
Document node contains the root element as its only element
child. The Document node may have other children,
though, such as processing instructions and comments.
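Because the Domlette API is close to xml.dom, the distinction between the Document node and the root element can be tried out with xml.dom.minidom from the standard library. The sketch below is a minidom stand-in under Python 3, not Domlette itself.

```python
from xml.dom import minidom
from xml.dom import Node

doc = minidom.parseString("<!--before--><foo a='1'/><!--after-->")

# The Document node has three children; only one of them is an element.
child_types = [child.nodeType for child in doc.childNodes]

# documentElement picks out that single element child.
root = doc.documentElement
print(root.tagName)
print(len(doc.childNodes))
```

The root element's parent is the Document node, while the Document node itself has no parent.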
You can easily access properties of a node directly. The following
properties are available on any node. These properties generally store
information about the structure of the document in the near "vicinity" of
the target node.
Properties available on every Node
object
- attributes
-
This is a Python dictionary containing the attributes defined
on the target node. The key for the dictionary is a tuple containing
the namespace and local name of the attribute. The value associated
with this attribute name tuple is the attribute (of class
Attr) itself.
node = Parse("<foo a='1'/>")
print node.childNodes[0].attributes
{(None, u'a'): <Attr at 0x40870ecc: name u'a', value u'1'>}
- baseURI
-
This is the base URI in scope for the target node as a Python
unicode string.
- childNodes
-
This is the Python list of all the node children of the target
node. Note that in DOM terminology, the attributes of a node are
not children of that node.
node = Parse("<foo a='1'/>")
print node.childNodes
[<Element at 0x4086052c: name u'foo', 1 attributes, 0 children>]
- firstChild
-
This is the first child node of the target node. This is
equivalent to childNodes[0], and is a useful property
for quickly walking the document tree.
node = Parse("<foo a='1'/>")
print node.firstChild
<Element at 0x40860a6c: name u'foo', 1 attributes, 0 children>
- lastChild
-
This is the last child node of the target node. This is
equivalent to childNodes[-1].
node = Parse("<foo a='1'/><!--Hi!-->")
print node.lastChild
<Comment at 0x4087caf4: u'Hi!'>
- localName
-
This is the local name of the target node as a Python unicode
string.
- namespaceURI
-
This is the namespace URI of the target node as a Python
unicode string.
- nextSibling
-
This is the node immediately following the target node, or
None if the target node is the last child of its parent
(or if the target node is an attribute, as attributes are
unordered).
- nodeValue
-
This is the value of the target node as a Python unicode
string, if the target node has a string value. If not, this is
None. To illustrate some of the possibilities,
attributes and text nodes have values, while elements and documents
do not.
- ownerDocument
-
This is the Document node in which the
target node is contained.
- parentNode
-
This is the parent of the target node. If the target node is a
Document node, then this will be
None; Document nodes do not have
parents.
- prefix
-
This is the namespace prefix of the current node, or
None if the current node does not (or cannot) have a
namespace prefix.
- previousSibling
-
This is the node immediately preceding the target node, or
None if the target node is the first child of its
parent (or if the target node is an attribute, as attributes are
unordered).
- rootNode
-
This is a synonym for
ownerDocument.
- xmlBase
-
This is a synonym for baseURI.
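The navigation properties shared with xml.dom can be exercised with a short tree walk. The sketch below uses xml.dom.minidom under Python 3 as a stand-in, so the Domlette-specific properties (baseURI, xmlBase and rootNode) are not shown, and the attributes property is left out because minidom represents it differently.

```python
from xml.dom import minidom

doc = minidom.parseString("<ham><eggs n='1'/>some text<em>emphasized</em></ham>")
ham = doc.documentElement

eggs = ham.firstChild      # the <eggs n='1'/> element
text = eggs.nextSibling    # the text node "some text"
em = ham.lastChild         # the <em> element

# Walk the children of <ham> by following nextSibling links.
names = []
node = ham.firstChild
while node is not None:
    names.append(node.nodeName)
    node = node.nextSibling

print(names)
print(text.nodeValue)
```

Text nodes carry a nodeValue, whereas the element nodes on either side of it do not.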
In addition to accessing the structure relative to a node, there are
also a set of operations that we can perform on these structures,
including a variety of operations for modifying the document. Some of
these methods allow you to add new nodes in various places; note that in
the DOM, only Document nodes can create new nodes. See “Methods available to Document
objects” for details. The following methods are
available on any node.
Methods available to every Node
object
appendChild(node)
-
This method adds node as the last child
of the current instance. This is useful for manually building a
document in breadth-first document order.
insertBefore(newChild, refChild)
-
This method adds the node newChild to
the current instance immediately before child node
refChild.
replaceChild(newChild, oldChild)
-
This method replaces the child node
oldChild with the
newChild node.
removeChild(oldChild)
-
This method removes the oldChild node
as a child of the instance node.
cloneNode(deep)
-
This method returns a new copy of the current instance. If
(and only if) deep is true, then we copy
deeply: the node's attributes and children are also copied
deeply.
isSameNode(otherNode)
-
This method determines whether the instance node and
otherNode are the same node based upon object
identity.
normalize()
-
This method merges any adjacent text nodes in the attributes
or descendants of the current instance.
hasChildNodes()
-
This method returns true if and only if the instance node has
any child nodes.
xpath(expr, explicitNss)
-
This method evaluates the XPath expression
expr with the current instance as the
expression context and returns an appropriately-valued result. The
explicitNss parameter is optional; it is a
Python dictionary mapping namespace prefixes to namespaces for use
in the expression. See “XPath queries” for
details.
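The tree-editing methods listed above also exist in xml.dom, so their behavior can be sketched with xml.dom.minidom under Python 3. This is a stand-in for Domlette; the xpath method is specific to Domlette and is not shown here.

```python
from xml.dom import minidom

doc = minidom.parseString("<list><item>b</item></list>")
root = doc.documentElement
b = root.firstChild

# New nodes are created by the Document node, then attached.
a = doc.createElementNS(None, "item")
a.appendChild(doc.createTextNode("a"))
c = doc.createElementNS(None, "item")
c.appendChild(doc.createTextNode("c"))

root.insertBefore(a, b)  # a now precedes b
root.appendChild(c)      # c becomes the last child
after_insert = root.toxml()

# A deep clone copies the children too, but is a distinct object.
b_copy = b.cloneNode(True)
same = b_copy.isSameNode(b)
root.replaceChild(b_copy, b)
root.removeChild(c)
after_remove = root.toxml()

# normalize() merges adjacent text nodes.
a.appendChild(doc.createTextNode("!"))
count_before = len(a.childNodes)
a.normalize()
count_after = len(a.childNodes)
```

After the final step the first item holds a single text node again, which is the state most XPath-based processing expects.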
In addition to their behavior as nodes,
Document nodes are uniquely responsible for a
number of tasks. For example, only Document nodes
can create other nodes. The following methods are available only to
Document nodes.
Methods available to Document
objects
createElementNS(namespaceURI, qualifiedName)
-
This method creates and returns a new
Element with the given namespace URI and
qualified name.
createAttributeNS(namespaceURI, qualifiedName)
-
This method creates and returns a new attribute
(Attr object) with the given namespace URI
and qualified name.
createTextNode(data)
-
This method creates and returns a new
Text node with the string value of
data.
createProcessingInstruction(target, data)
-
This method creates and returns a new processing instruction
(ProcessingInstruction object) with the given
target name and contents taken from
data.
createComment(data)
-
This method creates and returns a new
Comment with the string value of
data.
createDocumentFragment()
-
This method creates and returns a new, empty document fragment
(DocumentFragment object).
importNode(importedNode, deep)
-
Nodes can only belong to one document at a time. This method
creates a copy of the node importedNode that
belongs to the instance (but which does not yet have a parent). If
(and only if) deep is true, then we copy
deeply: the node's attributes and children are also copied deeply
and imported.
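The rule that a node belongs to one document at a time is easiest to see when moving content between two documents. The sketch below uses xml.dom.minidom under Python 3 as a stand-in for Domlette; the importNode signature is the same as the one listed above.

```python
from xml.dom import minidom

source = minidom.parseString("<a><b>text</b></a>")
target = minidom.parseString("<c/>")

b = source.documentElement.firstChild

# importNode makes a copy owned by the target document.
# The copy has no parent until it is attached somewhere.
b_imported = target.importNode(b, True)
had_parent = b_imported.parentNode is not None

target.documentElement.appendChild(b_imported)
result = target.documentElement.toxml()
print(result)
```

The source document is left unchanged, because the imported node is a copy and not the original.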
Document nodes also have a number of properties that are not found
on other nodes. These properties are summarized in the following
list.
Properties available on Document
objects
- doctype
-
This is a DocumentType object that
encapsulates info about the document's "type", as described in its
DOCTYPE tag. In Domlette, which doesn't use such objects, the value
of the doctype property will always be
None.
- documentElement
-
This is the root element of the document.
- documentURI
-
This is the URI that identifies the document.
- implementation
-
This is the DOMImplementation that
created the document.
- publicId
-
This Domlette-specific property is the public ID of the DTD of
this document.
- rootNode
-
This refers to the current instance.
- systemId
-
This Domlette-specific property is the system ID of the DTD of
this document.
- unparsedEntities
-
This is the list of unparsed entities in the current
document.
Attributes (Attr objects) do not have any
special methods, but they do have a few additional properties. These
properties are summarized in the following list.
Properties available on Attr
objects
- name
-
This is the qualified name of the current instance.
- nodeName
-
This is a synonym for the name
property.
- ownerElement
-
This is a synonym for the parentNode
property.
- specified
-
You will probably never need this property. It is always
1. DOM says it should be 0 if
the attribute is present through defaulting, rather than explicitly specified
in the document. This is only possible if the DOM implementation
preserves certain details from DTD processing, which 4Suite never
does. Therefore the value is always 1.
- value
-
This is a synonym for the nodeValue
property.
Since attributes can only be attached to elements,
Element objects have a set of special methods for
managing which attributes are attached to them. We describe these methods
below.
Methods available to Element
objects
hasAttributeNS(namespaceURI, localName)
-
This method returns true if the current instance has an
attribute with the given namespace URI and local name, and false
otherwise.
getAttributeNS(namespaceURI, localName)
-
This method returns the attribute value of the attribute with the given
namespace URI and local name, if one exists. If not, this returns
None.
getAttributeNodeNS(namespaceURI, localName)
-
This method returns the Attr object of
the attribute with the given namespace URI and local name, if one
exists. If not, this returns None.
removeAttributeNS(namespaceURI, localName)
-
This method removes the attribute with the given namespace URI
and local name from the current instance element.
removeAttributeNode(node)
-
This method removes the attribute node
from the current instance element.
setAttributeNS(namespaceURI, qualifiedName, value)
-
This method adds an attribute or replaces an attribute with
the specified namespace URI and qualified name and sets the content
of that attribute to value.
setAttributeNodeNS(node)
-
This method adds or replaces an attribute using the
Attr object
node.
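The namespace-aware attribute methods can be tried with xml.dom.minidom under Python 3, used here as a stand-in for Domlette. The namespace URI and the x prefix are hypothetical. One difference to keep in mind: minidom returns an empty string for a missing attribute, whereas this manual describes Domlette as returning None.

```python
from xml.dom import minidom

NS = "http://example.org/ns"  # hypothetical namespace URI

doc = minidom.parseString("<eggs n='1'/>")
eggs = doc.documentElement

had_n = eggs.hasAttributeNS(None, "n")
old_value = eggs.getAttributeNS(None, "n")

eggs.setAttributeNS(None, "n", "2")         # replaces the existing value
eggs.setAttributeNS(NS, "x:kind", "fried")  # adds a namespaced attribute
new_value = eggs.getAttributeNS(None, "n")
kind = eggs.getAttributeNS(NS, "kind")      # looked up by local name

eggs.removeAttributeNS(None, "n")
has_n_after = eggs.hasAttributeNS(None, "n")
```

Note that attributes are set with a qualified name but looked up and removed with a namespace URI and local name.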
Elements also have several properties above
and beyond what they get from being Nodes. See the
list below for details.
Properties available on Element
objects
- nodeName
-
This is the qualified name of the current instance.
- tagName
-
This is a synonym for nodeName.
Both Text and Comment
nodes are also more general CharacterData nodes in
the DOM. CharacterData nodes have several
additional properties and methods for managing the string data that they
contain. The individual Text and
Comment nodes, however, do not add any
functionality to their general CharacterData parent
class. You can find descriptions of the properties and methods offered by
CharacterData objects below.
Properties available on CharacterData
objects
- data
-
This is the string content of the current instance.
- length
-
This is the length of the string content of the current
instance.
- nodeValue
-
This is a synonym for data.
Methods available to CharacterData
objects
insertData(offset, data)
-
This method inserts the string data
into the content of the current instance at the index specified by
offset.
appendData(data)
-
This method appends the string data to
the end of the value of the current instance.
replaceData(offset, count, data)
-
This method replaces count number of
characters found at index offset in the
current instance with the string data.
substringData(offset, count)
-
This method retrieves and returns the part of the string value
of the current instance that begins at index
offset and extends
count characters.
deleteData(offset, count)
-
This method deletes the part of the string value of the
current instance that begins at index offset
and extends count characters.
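The CharacterData methods are all expressed in terms of offsets and counts, which a short sequence of edits makes concrete. The sketch below uses a text node from xml.dom.minidom under Python 3 as a stand-in for a Domlette Text node.

```python
from xml.dom import minidom

doc = minidom.parseString("<spam/>")
text = doc.createTextNode("spam eggs")

text.appendData("!")              # data is now "spam eggs!"
text.insertData(0, "more ")       # data is now "more spam eggs!"
text.replaceData(5, 4, "ham")     # data is now "more ham eggs!"
piece = text.substringData(5, 3)  # "ham"
text.deleteData(0, 5)             # data is now "ham eggs!"

print(text.data)
print(text.length)
```

Offsets count from zero, so replaceData(5, 4, ...) replaces the four characters that start at the sixth position.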
A few DOM actions are not "owned" by any individual document. In
effect, they are general-purpose operations. They can be found in
DOMImplementation objects. One such precreated
instance can be conveniently found at and used from
Ft.Xml.Domlette.implementation. The general methods
that such a DOMImplementation object offers are
listed below.
DOMImplementation methods:
createDocument(namespaceURI, qualifiedName, doctype)
-
This standard DOM method creates and returns a
Document object associated with the given
DocumentType object, and having a single
element child with the given QName and namespace. Since Domlette
does not use DocumentType objects, the
doctype argument must be given as None.
createRootNode(documentURI)
-
This Domlette-specific method creates a
Document object with the specified document
(base) URI. No document element is created. This method is generally
preferred over createDocument(); see the
following section, 'Building a DOM from scratch'.
hasFeature(feature, version)
-
This method tests whether the DOM implementation implements a
specific feature.
3.2.1 What about getElementsByTagName()?
The getElementsByTagName() method isn't
supported, because there are better options. In particular, you can just
use XPath:
For more possibilities, see getElementsByTagName Alternatives.
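The idea of selecting elements by path rather than through a dedicated method can be tried with xml.etree.ElementTree from the standard library, which supports a limited XPath subset. This is a stand-in under Python 3; with Domlette you would call the node's xpath method instead, and the element names below are made up for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<spam><a/><b><a/></b><c/></spam>")

# ".//a" selects every <a> element below the root, at any depth.
matches = root.findall(".//a")
tags = [element.tag for element in matches]
print(len(matches))
```

Both a elements are found, including the one nested inside b, which is the behavior most callers want from getElementsByTagName.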
3.3 Serializing Domlette nodes
Domlette comes with a couple of very fast printer functions which
also go to great pains to correctly handle character encoding issues:
Print and PrettyPrint.
Here are some serialization examples using the Domlette printers, given a
node 'node' (it doesn't have to be a document
node).
from Ft.Xml.Domlette import Print, PrettyPrint
# basic serialization to sys.stdout
Print(node)
# ... with extra whitespace (indenting)
PrettyPrint(node)
# ... using a single tab, rather than 2 spaces, to indent at each level
PrettyPrint(node, indent='\t')
# serializing to a utf-8 encoded file
f = open('output.xml','w')
Print(node, stream=f)
f.close()
# ... to an iso-8859-1 encoded file
f = open('output.xml','w')
Print(node, stream=f, encoding='iso-8859-1')
f.close()
# ... to an ascii encoded string
import cStringIO
buf = cStringIO.StringIO()
Print(node, stream=buf, encoding='us-ascii')
# read the value before closing; getvalue() fails on a closed buffer
s = buf.getvalue()
buf.close()
# Normally, output syntax (XML or HTML) is chosen based on the DOM type,
# which is automatically detected. A Domlette or XML DOM can be output in
# HTML syntax if the asHtml=1 argument is given.
PrettyPrint(node, asHtml=1)
See also: Serializing XML from DOM or Domlette documents
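The effect of the output encoding on non-ASCII characters can be seen with xml.dom.minidom under Python 3. This sketch is a stand-in: its toxml and toprettyxml methods play roughly the role of Print and PrettyPrint, but they are not the Domlette printers.

```python
import io
from xml.dom import minidom

doc = minidom.parseString("<spam><eggs>caf\u00e9</eggs></spam>".encode("utf-8"))

# Passing an encoding returns bytes and records it in the XML declaration.
latin1_bytes = doc.toxml(encoding="iso-8859-1")

# Characters the target encoding cannot represent become character references.
ascii_bytes = doc.toxml(encoding="us-ascii")

# Indented output, one tab per level.
pretty = doc.toprettyxml(indent="\t")

# Serialized bytes belong in a binary stream.
buf = io.BytesIO()
buf.write(latin1_bytes)
stored = buf.getvalue()
buf.close()
```

The same document yields different bytes for each encoding, which is why the encoding has to be chosen at serialization time rather than when the file is opened.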
3.4 Building a DOM from scratch
As an alternative to parsing a preexisting XML document, you can
also build a document model, with certain limitations, from the ground up.
W3C and Python DOM facilities for doing this are intended mainly for creating
a temporary document whose nodes will be imported into an existing document,
and while Domlette does offer a more convenient document creation method,
it has many of the same limitations. However, for most documents, its
capabilities should be sufficient.
The Ft.Xml.Domlette module
contains a DOMImplementation instance named
implementation which provides a set of methods for
initializing new Documents. The
implementation.createRootNode method takes a base URI
argument and provides a natural approach for creating an XPath model root node.
This is similar to the DOM idea of a document node and even closer to a DOM
document fragment (multiple element children are allowed). The
implementation.createDocument method, on the
other hand, is designed to come close to the DOM interface, although its
doctype argument must be None.
from Ft.Xml.Domlette import implementation
doc = implementation.createRootNode('file:///article.xml')
is the equivalent of
from Ft.Xml import EMPTY_NAMESPACE
doc = implementation.createDocument(EMPTY_NAMESPACE, None, None)
with the added advantage of doc.baseURI being set to
'file:///article.xml', which is not possible to set via standard DOM interfaces
(the baseURI attribute is read-only).
Similarly,
from Ft.Xml import EMPTY_NAMESPACE
from Ft.Xml.Domlette import implementation
doc = implementation.createRootNode('file:///article.xml')
docelement = doc.createElementNS(EMPTY_NAMESPACE, 'article')
doc.appendChild(docelement)
is the equivalent of
from Ft.Xml import EMPTY_NAMESPACE
doc = implementation.createDocument(EMPTY_NAMESPACE, 'article', None)
plus doc.baseURI being set to 'file:///article.xml'.
If you want as much fidelity to the DOM API as Domlette offers, use
implementation.createDocument. If you just want to
create a document or other such root-level node, and never mind the
strange parameters, use
implementation.createRootNode.
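For comparison, the standard DOM interface that createDocument is modeled on can be tried out with Python's built-in xml.dom.minidom. The following is a minimal sketch that uses only the standard library, not Domlette; the helper name build_article is hypothetical, and minidom has no equivalent of createRootNode or a settable baseURI.

```python
from xml.dom.minidom import getDOMImplementation

def build_article(title_text):
    """Build a small document from scratch with the standard DOM API."""
    impl = getDOMImplementation()
    # Arguments: namespace URI, qualified name of the document element, doctype
    doc = impl.createDocument(None, "article", None)
    title = doc.createElementNS(None, "title")
    title.appendChild(doc.createTextNode(title_text))
    doc.documentElement.appendChild(title)
    return doc

doc = build_article("Hello")
xml_text = doc.documentElement.toxml()
print(xml_text)
```

The same sequence of createElementNS, createTextNode and appendChild calls is what you would use on a Domlette document once it has been created.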
3.5 XPath query
You can easily perform XPath queries by using the
xpath method of cDomlette nodes as
follows:
from Ft.Xml.Domlette import NonvalidatingReader
doc = NonvalidatingReader.parseString("<spam>eggs<a/><a/></spam>")
print doc.xpath(u'//a')
print doc.xpath(u'string(/spam)')
Notice: this is nothing like W3C DOM's XPath query module. The
emphasis, as usual with Domlette, is on speed, simplicity and
pythonic-ness.
The API, in brief:
node.xpath(expr[, explicitNss])
-
node - will be used as core of the context for evaluating the
XPath
-
expr - XPath expression in string or compiled form
-
explicitNss - (optional) any additional or overriding namespace
mappings in the form of a dictionary that maps prefixes to namespace
URIs. The base namespace mappings are taken from in-scope declarations
on the given node. This explicit dictionary is superimposed on the
base mappings.
For additional details, see “XPath queries”.
3.6 More on base URIs
For some users, always specifying a base URI feels like an
inconvenience. Perhaps they always generate XML sources from text or
streams without naturally associated URIs, and they have to figure out
schemes to come up with base URIs for the parse. But there is good reason
for this pickiness. Just ask one of the users who
got bitten by carelessness with base URIs in practice. It's better
to always put some amount of thought into base URIs when processing XML,
and 4Suite encourages this.
Note that 4Suite only enforces the requirement for base URIs in
cases where they are needed to make sense of a requested operation. Your
document must have a valid base URI if you use external entities,
XInclude, xsl:import, xsl:include, the XSLT document() function, the EXSLT
exsl:document element, or any other operations that require access to an
external resource. If your main use for URI resolution is XSLT import and
includes, you can avoid having to give valid base URIs by using XSLT
include paths.
A valid base URI starts with a scheme, such as
http:. A simple name, such as "spam" is a valid
relative URI reference, but not a valid base URI. Without a base URI, a
relative reference is no more useful than an apartment number given
without the address of the entire apartment building. Merging a base URI
with a relative reference is a string operation that is undertaken in a
standard manner, and is generally only useful when the base URI is
hierarchical; that is, it is a URL using one of the common schemes that
have slashes as path separators (e.g., http:, ftp:, gopher:, and most
file: URLs). The built-in 4Suite URI resolver
Ft.Lib.Uri.BASIC_RESOLVER knows
how to perform such resolution.
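The merging of a base URI with a relative reference can be illustrated without 4Suite. The following sketch uses urljoin from the Python 3 standard library as a stand-in for the 4Suite resolver; the URIs and the helper name has_scheme are made up for illustration.

```python
from urllib.parse import urljoin, urlsplit

def has_scheme(uri):
    """Rough check that a URI starts with a scheme, as a base URI must."""
    return bool(urlsplit(uri).scheme)

base = "http://example.com/docs/article.xml"

# A relative reference is resolved against the hierarchical base URI
resolved = urljoin(base, "images/fig1.png")
parent = urljoin(base, "../index.xml")

print(resolved)
print(parent)
print(has_scheme(base), has_scheme("spam"))
```

A simple name such as "spam" fails the scheme check, which is why it can serve as a relative reference but not as a base URI.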
3.7 Why does Domlette diverge from the DOM specification?
Domlette is not a complete or fully conformant DOM implementation,
but it does provide an interface very close to W3C DOM Level 2 and the
corresponding Python mapping as laid out in the
xml.dom API docs.
The areas of divergence are inconsequential for most users,
and generally reflect decisions made in the interest of eliminating
redundancy, inefficiency, and, to some degree, un-Pythonic design.
Also, one of the important design principles for Domlette is that
where DOM and XPath disagree, XPath wins; aside from making things
more efficient to implement, this behavior is generally what people
want in an XML document model.
It is also worth noting that in the interest of usability,
all DOM implementations exhibit some degree of variation from the
specs. Coding a completely implementation-agnostic DOM application
is difficult and usually unnecessary.
4 SAX
Saxlette is a fast SAX implementation, all written in C. Its API is
similar to those of Python's
built-in SAX.
from xml import sax
from Ft.Xml import CreateInputSource
class element_counter(sax.ContentHandler):
    def startDocument(self):
        self.ecount = 0
    def startElementNS(self, name, qname, attribs):
        self.ecount += 1
parser = sax.make_parser(['Ft.Xml.Sax'])
handler = element_counter()
parser.setContentHandler(handler)
#'file:ot.xml' or file('ot.xml') or file('ot.xml').read() would work just as well, of course
parser.parse(CreateInputSource('ot.xml'))
print "Elements counted:", handler.ecount
If you don't care about PySax compatibility, you can use the more
specialized API, which involves the following lines in place of the
equivalents above:
from Ft.Xml import Sax
...
class element_counter:
....
parser = Sax.CreateParser()
The biggest API difference between Saxlette and PySax is that
Saxlette only supports SAX 2. For example,
feature_namespaces is hard-wired to
True and feature_namespace_prefixes to
False (which is exactly what SAX2 says is required).
Saxlette also combines all adjacent text events, which eliminates one of the
pain points of PySax.
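To see the namespace difference in practice, here is a sketch of the same element counter running on the parser that ships with Python 3 instead of Saxlette. With the built-in parser, feature_namespaces is off by default, so it has to be switched on before startElementNS is called at all. The function name count_elements is hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

class ElementCounter(ContentHandler):
    def startDocument(self):
        self.ecount = 0

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace URI, local name) tuple
        self.ecount += 1

def count_elements(xml_bytes):
    parser = xml.sax.make_parser()
    # Saxlette behaves as if this were always True; the built-in parser does not
    parser.setFeature(feature_namespaces, True)
    handler = ElementCounter()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.ecount

total = count_elements(b"<a><b/><b/></a>")
print("Elements counted:", total)
```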
The argument to the parse method is a URI, a SAX
input source, or a 4Suite input source. The example above passed an input
source built by CreateInputSource. The following example shows similar code
using the factory in 4Suite's Ft.Xml.InputSource module.
from Ft.Xml import InputSource, Sax
factory = InputSource.DefaultFactory
isrc = factory.fromUri("file:ot.xml")
class element_counter:
    def startDocument(self):
        self.ecount = 0
    def startElementNS(self, name, qname, attribs):
        self.ecount += 1
parser = Sax.CreateParser()
handler = element_counter()
parser.setContentHandler(handler)
parser.parse(isrc)
print "Elements counted:", handler.ecount
4.1 Validating a document while parsing it using SAX
To enable validation of your documents while otherwise parsing them
normally with SAX, set the
xml.sax.handler.feature_validation feature to
True on your parser using a line similar to
parser.setFeature(xml.sax.handler.feature_validation, True).
The parser will then throw an
xml.sax._exceptions.SAXParseException exception if
it determines that the document is invalid, and it will stop parsing the
document. Handlers for document components that have been parsed will be
called, however. The following example illustrates these concepts.
from Ft.Xml import InputSource, Sax
factory = InputSource.DefaultFactory
XML = """<!DOCTYPE a [
<!ELEMENT a (b, b)>
<!ELEMENT b EMPTY>
]>
<a><b/><b/></a>"""
isrc = factory.fromString(XML, 'urn:x-example:valid-a')
class element_counter:
    def startDocument(self):
        self.scount = 0
        self.ecount = 0
    def startElementNS(self, name, qname, attribs):
        self.scount += 1
    def endElementNS(self, name, qname):
        self.ecount += 1
parser = Sax.CreateParser()
handler = element_counter()
parser.setContentHandler(handler)
# And now, to enable validation...
import xml.sax.handler
parser.setFeature(xml.sax.handler.feature_validation, True)
parser.parse(isrc)
print "Saw", handler.scount, "start tags"
print "Saw", handler.ecount, "end tags"
# And now we show what happens on an invalid document:
XML = """<!DOCTYPE a [
<!ELEMENT a (b, b)>
<!ELEMENT b EMPTY>
]>
<a><b/><b/><b/></a>"""
isrc = factory.fromString(XML, 'urn:x-example:invalid-a')
try:
    parser.parse(isrc)
except xml.sax.SAXParseException:
    # Raised when the parser finds the document invalid; parsing stops here
    print "Document is invalid"
print "Saw", handler.scount, "start tags"
print "Saw", handler.ecount, "end tags"
# The above document is invalid; it has one more `b` element than is
# allowed by the DTD. The handlers have still been called for those
# parts of the document that have been parsed.
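The pattern of catching SAXParseException while keeping the results gathered so far can be tried with the Python 3 standard library alone. Note the assumption: the built-in parser does not validate against a DTD, so this sketch triggers the exception with a well-formedness error (a mismatched end tag) instead of a validity error. The names TagCounter and parse_counting are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler

class TagCounter(ContentHandler):
    def startDocument(self):
        self.scount = 0
        self.ecount = 0

    def startElement(self, name, attrs):
        self.scount += 1

    def endElement(self, name):
        self.ecount += 1

def parse_counting(xml_bytes):
    """Return (parsed_ok, start_tags_seen, end_tags_seen)."""
    handler = TagCounter()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    try:
        parser.parse(io.BytesIO(xml_bytes))
        ok = True
    except xml.sax.SAXParseException:
        # Parsing stopped, but the handler keeps what it saw before the error
        ok = False
    return ok, handler.scount, handler.ecount

good = parse_counting(b"<a><b/><b/></a>")
bad = parse_counting(b"<a><b/><b/></c></a>")
print(good)
print(bad)
```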
4.2 Walking a DOM to fire SAX events
Saxlette has the ability to walk a Domlette tree, firing off events
to a handler as if from a source document parse. This ability used to be
too well hidden, so an API addition now makes it more
readily available:
Ft.Xml.Domlette.SaxWalker. The following example
shows how to use it:
from Ft.Xml.Domlette import SaxWalker
from Ft.Xml import Parse
XML = "<a><b/><b/></a>"
class element_counter:
    def startDocument(self):
        self.ecount = 0
    def startElementNS(self, name, qname, attribs):
        self.ecount += 1
#First get a Domlette document node
doc = Parse(XML)
#Then SAX "parse" it
parser = SaxWalker(doc)
handler = element_counter()
parser.setContentHandler(handler)
#You can set any properties or features, or do whatever
#you would to a regular SAX2 parser instance here
parser.parse() #called without any argument
print "Elements counted:", handler.ecount
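To make the idea of walking a tree to fire SAX events more concrete, the following sketch does the same thing by hand over a standard-library minidom tree. It is not how SaxWalker is implemented; the walk function is a hypothetical helper and handles only documents, elements and text.

```python
from xml.dom import minidom, Node
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesNSImpl

def walk(node, handler):
    """Recursively fire SAX2 events for a minidom node."""
    if node.nodeType == Node.DOCUMENT_NODE:
        handler.startDocument()
        for child in node.childNodes:
            walk(child, handler)
        handler.endDocument()
    elif node.nodeType == Node.ELEMENT_NODE:
        attrs, qnames = {}, {}
        for i in range(node.attributes.length):
            attr = node.attributes.item(i)
            key = (attr.namespaceURI, attr.localName)
            attrs[key] = attr.value
            qnames[key] = attr.name
        name = (node.namespaceURI, node.localName)
        handler.startElementNS(name, node.tagName, AttributesNSImpl(attrs, qnames))
        for child in node.childNodes:
            walk(child, handler)
        handler.endElementNS(name, node.tagName)
    elif node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        handler.characters(node.data)

class ElementCounter(ContentHandler):
    def startDocument(self):
        self.ecount = 0

    def startElementNS(self, name, qname, attrs):
        self.ecount += 1

counter = ElementCounter()
walk(minidom.parseString("<a><b/><b/></a>"), counter)
print("Elements counted:", counter.ecount)
```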
4.3 Building a Domlette from SAX events
Saxlette includes a convenience ContentHandler
(Ft.Xml.Sax.DomBuilder) which listens for SAX
events and constructs Domlette Documents.
4.4 Feeding a generator from SAX events
Python's generators are special functions that can produce a series
of partial results within the course of running. The calling program can
start up a generator, which is suspended when a partial result is yielded,
and resumed explicitly by the program when the next result is required.
This capability is mirrored in the Expat parser that is the basis of
Saxlette. Saxlette has a feature, FEATURE_GENERATOR,
which you can set on a parser object to enable generator semantics. If
this feature is set, the parse() method returns an
iterator. This iterator yields results set by the SAX handlers. The
handlers specify the partial results by setting the property
PROPERTY_YIELD_RESULT to the value to be yielded. As
an example, the following code reports the names of all attributes used in
the document.
from Ft.Xml import Sax, CreateInputSource
class report_attributes:
    def __init__(self, parser):
        self.parser = parser
        return
    def startElementNS(self, name, qname, attribs):
        self.parser.setProperty(Sax.PROPERTY_YIELD_RESULT, attribs)
        return
parser = Sax.CreateParser()
parser.setFeature(Sax.FEATURE_GENERATOR, True)
handler = report_attributes(parser)
parser.setContentHandler(handler)
attribs_iterator = parser.parse(CreateInputSource('test.xhtml'))
for attribs in attribs_iterator:
    for name in attribs.keys():
        print name
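If 4Suite is not at hand, a similar pull-style loop can be sketched with iterparse from the Python 3 standard library. This is a different API from Saxlette's generator feature, but it shows the same idea of the caller iterating over partial results while the parse is still in progress. The function name iter_attribute_names and the sample document are made up.

```python
import io
import xml.etree.ElementTree as ET

def iter_attribute_names(source):
    """Yield attribute names one at a time as start tags are parsed."""
    for event, elem in ET.iterparse(source, events=("start",)):
        for name in elem.attrib:
            yield name

sample = io.BytesIO(b'<a x="1"><b y="2" z="3"/></a>')
names = list(iter_attribute_names(sample))
for name in names:
    print(name)
```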
4.5 SAX filters
In SAX processing, the parser passes to the application a stream of events that represents the XML content. An important aspect of SAX is the user's ability to create SAX filters, which accept a stream of SAX events and pass on a modified stream. For example, you might use a SAX filter to look for DocBook sect1, sect2, etc. elements and rename them to section elements before passing them on for further processing (presumably by a SAX handler that only understands how to deal with the latter form). You can chain SAX filters as well. The idea behind SAX filters is usually reuse across a broad array of applications, with each filter focused on a single task that can be cleanly separated from upstream and downstream processing. SAX filters can thus be useful building blocks for XML pipelines.
from xml import sax
from xml.sax.saxutils import XMLFilterBase
from Ft.Xml import CreateInputSource, XML_NAMESPACE as XMLNS
from Ft.Xml.Sax import SaxPrinter
XML = """<?xml version="1.0" encoding="utf-8"?>
<menu>
<item id="A" xml:lang="en">Orange juice</item>
<item id="A" xml:lang="es">Jugo de naranja</item>
<item id="B" xml:lang="en">Toast</item>
<item id="B" xml:lang="es">Pan tostada
<note xml:lang="en">Wheat bread only, please</note>
</item>
</menu>
"""
#Define constants for the two states we care about
ALLOW_CONTENT = 1
SUPPRESS_CONTENT = 2
class english_only_filter(XMLFilterBase):
    def __init__(self, downstream):
        XMLFilterBase.__init__(self, downstream)
        return
    def startDocument(self):
        #Set the initial state, and set up the stack of states
        self._state_stack = [ALLOW_CONTENT]
        XMLFilterBase.startDocument(self)
        return
    def startElementNS(self, name, qname, attrs):
        #Check if there is any language attribute
        lang = attrs.get((XMLNS, 'lang'))
        if lang:
            #Set the state as appropriate
            if lang[:2] == 'en':
                self._state_stack.append(ALLOW_CONTENT)
            else:
                self._state_stack.append(SUPPRESS_CONTENT)
        else:
            #Always update the stack with the current state,
            #even if it has not changed
            self._state_stack.append(self._state_stack[-1])
        #Only forward the event if the state warrants it
        if self._state_stack[-1] == ALLOW_CONTENT:
            XMLFilterBase.startElementNS(self, name, qname, attrs)
        return
    def endElementNS(self, name, qname):
        #Leave this element's scope, remembering the state it had
        state = self._state_stack.pop()
        #Only forward the event if the state warrants it
        if state == ALLOW_CONTENT:
            XMLFilterBase.endElementNS(self, name, qname)
        return
    def characters(self, content):
        #Only forward the event if the state warrants it
        if self._state_stack[-1] == ALLOW_CONTENT:
            XMLFilterBase.characters(self, content)
        return
if __name__ == "__main__":
    parser = sax.make_parser(['Ft.Xml.Sax'])
    #SaxPrinter is a special SAX handler that merely writes
    #SAX events back into an XML document
    filtered_parser = english_only_filter(parser)
    handler = SaxPrinter()
    filtered_parser.setContentHandler(handler)
    filtered_parser.parse(CreateInputSource(XML))
Most SAX handlers operate as state machines, meaning they manage some variables based on the stream of events that come in, and change behavior based on these variables. english_only_filter is set up to be in one of two states: one in which content is passed on to the downstream handler, and one in which content is suppressed. This state is marked in the self._state_stack. The state is initially set to ALLOW_CONTENT, and changed to SUPPRESS_CONTENT if the filter encounters an xml:lang attribute that represents a language other than English (which can be done by checking the first two characters of the value, according to the rules of standard language codes). It has to be a stack because XML language specifications are scoped, so that in the example XML at the top of the listing the string "Pan tostada" is within the scope of the element with the attribute xml:lang="es", and so it is marked as being in Spanish. The entire note element, however, is marked as being in English by an overriding xml:lang="en" attribute.
The SAX handler is set to Ft.Xml.Sax.SaxPrinter, which channels the final SAX events to a 4Suite printer that creates a serialized XML document. It's quite easy to chain filters. If you wanted the parser to send events to a filter of class some_other_filter, which then passed events on to english_only_filter, the relevant line would look as follows:
filtered_parser = english_only_filter(some_other_filter(parser))
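The DocBook renaming filter mentioned at the start of this section can be sketched in a few lines. The version below is an illustration that runs on the Python 3 standard library rather than Saxlette, so it uses the non-namespace startElement and endElement events and XMLGenerator in place of SaxPrinter. The names SectionRenamer and rename_sections are hypothetical.

```python
import io
import re
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

# DocBook numbered sections: sect1 through sect5
SECT = re.compile(r"^sect[1-5]$")

class SectionRenamer(XMLFilterBase):
    """Rename sectN elements to section and pass everything else through."""
    def startElement(self, name, attrs):
        if SECT.match(name):
            name = "section"
        XMLFilterBase.startElement(self, name, attrs)

    def endElement(self, name):
        if SECT.match(name):
            name = "section"
        XMLFilterBase.endElement(self, name)

def rename_sections(xml_text):
    out = io.StringIO()
    filtered_parser = SectionRenamer(xml.sax.make_parser())
    # XMLGenerator writes the filtered events back out as XML text
    filtered_parser.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    filtered_parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()

result = rename_sections("<article><sect1><sect2>x</sect2></sect1></article>")
print(result)
```

Because the filter only overrides the two events it cares about, it can be placed anywhere in a chain without affecting the other filters.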
4.6 Streaming canonicalization
The combination of streaming parsing using Saxlette and streaming serialization using Ft.Xml.Lib.XmlPrinter.CanonicalXmlPrinter allows for
very efficient XML canonicalization (c14n).
import sys
from xml import sax
from Ft.Xml import CreateInputSource
from Ft.Xml.Sax import SaxPrinter
from Ft.Xml.Lib.XmlPrinter import CanonicalXmlPrinter
parser = sax.make_parser(['Ft.Xml.Sax'])
handler = SaxPrinter(CanonicalXmlPrinter(sys.stdout))
parser.setContentHandler(handler)
parser.parse(CreateInputSource(' <a><b b="1" a="2"/></a> '))
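To get a feel for what canonicalization does to the sample input, the following sketch runs a similar document through the canonicalize function in the Python standard library (available from Python 3.8). It assumes nothing about 4Suite and implements C14N 2.0, so details may differ from what CanonicalXmlPrinter produces, but the attribute ordering and the expansion of empty elements are visible either way.

```python
from xml.etree.ElementTree import canonicalize

source = '<a><b b="1" a="2"/></a>'

# Attributes are sorted and the empty element is written with an explicit end tag
canonical = canonicalize(source)
print(canonical)
```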
5 XPath queries
4Suite provides an XPath processing engine, compliant with the W3C XPath 1.0 specification.
This query engine is accessible through Ft.Xml.XPath.
5.1 The quickest option
If you are using Domlette, as described above, the quickest and
easiest way to use the XPath facility in 4Suite is the
xpath() method, which any Domlette
Node supports:
from Ft.Xml.Domlette import NonvalidatingReader
doc = NonvalidatingReader.parseString("<spam>eggs<a/><a/></spam>")
doc2 = NonvalidatingReader.parseString("<spam>eggs<eggs n='1'> and ham</eggs></spam>")
print doc.xpath(u'(//a)[1]')
print doc.xpath(u'string(/spam)')
print doc2.xpath(u'string(//eggs/@n)')
The line
print doc.xpath(u'(//a)[1]')
is actually a shortcut for the following more involved construct,
which is described in detail in “Advanced use”:
from Ft.Xml.XPath import Evaluate
print Evaluate(u'(//a)[1]', contextNode=doc)
This example prints three lines. The first line shows a string
representation of a list containing a single element. As we see from this
line, an XPath selection of nodes returns a Python list. In this case, it
is a list containing a single element—the first element with a local name
of a, which has no attributes and no
children. The second line shows the correct string value of the selected
spam element, and the third line shows
the correct string value of the n
attribute.
[<Element at 0xb7d10bb4: name u'a', 0 attributes, 0 children>]
eggs
1
5.2 Type mappings
4Suite XPath functions return results with Python types that depend
on the XPath data model type of the query result. The following list shows
how the five XPath result types (string, number, boolean, node-set and
object) are mapped to Python types:
-
XPath string: Python unicode type
-
XPath number: Python float type (int or long also accepted), or
instance of Ft.Lib.number.nan (for NaN) or Ft.Lib.number.inf (for
Infinity)
-
XPath boolean: Ft.Lib.boolean instance
-
XPath node-set: Python list of Domlette nodes, in document
order, with no duplicates
-
XPath foreign object: any other Python object (you will very
rarely encounter this case)
5.3 Advanced use
XPath expressions can refer to both variables and qualified names
(QNames) that must be defined by the environment that is executing the
XPath expression. This section describes how to use these advanced
features of XPath using the 4Suite interface.
4Suite's XPath implementation uses a Domlette node as the context
node for XPath operations. The document must be parsed
before XPath can be used to access it. The following example parses an
XML document and explicitly sets up an XPath context to run an XPath
query that extracts content from it.
XML = """
<ham>
<eggs n='1'/>
This is the string content with <em>emphasized text</em> text
</ham>"""
from Ft.Xml import Parse
from Ft.Xml.XPath.Context import Context
from Ft.Xml.XPath import Evaluate
doc = Parse(XML)
ctx = Context(doc)
nodes = Evaluate(u'//em', ctx)
# The return value, a node set, comes back as a Python list of nodes,
# which may be iterated over
for n in nodes:
    # print dir(n)
    print n.tagName
    print n.firstChild.nodeValue
XPath always requires a context for execution; a common XPath
context is the root of the target document, such as we did in the above
example. Think about an XPath query being executed from some location in
an XML document. This location in the document is a necessary component of
using XPath.
There is more to an XPath context than just the context node, but if
your needs are as straightforward as that of the above example, there is
an abbreviated version of the Evaluate method for
this purpose. For example, the following fragment is equivalent to the two
lines creating a context and evaluating the expression in the above
example.
# No need to create a context object
Evaluate(u'//em', contextNode=doc)
If your source document uses XML Namespaces you will likely need to
use QNames in your XPath expressions. For this to work, you'll need to
introduce namespace mappings into your XPath context. For example, if the
elements of our XML document above are in an XML namespace, then we must
set up our context slightly differently.
XML = """<ham xmlns="http://example.com/ns#">
<eggs n='1'/>
This is the string content with <em type='bold'>emphasized Namespaced Text</em> text
</ham>"""
from Ft.Xml import Parse
from Ft.Xml.XPath.Context import Context
from Ft.Xml.XPath import Evaluate
NSS = {u'ex': u'http://example.com/ns#'}
doc = Parse(XML)
ctx = Context(doc, processorNss=NSS)
nodes = Evaluate(u'//ex:em', ctx)
for n in nodes:
    # print dir(n)
    print n.tagName
    print n.firstChild.nodeValue
You define XPath namespace prefixes through a Python dictionary
(NSS in the above example) which maps these prefixes,
such as 'ex' in the above example, to the appropriate
namespace URI, such as 'http://example.com/ns#' in the
above example. This prefix mapping is added to your XPath context using
the processorNss parameter to the
Context function.
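The prefix-to-URI dictionary is a common pattern beyond 4Suite. As a rough illustration, the sketch below runs a comparable query with ElementTree from the Python standard library, which supports only a small subset of XPath and is not a substitute for Ft.Xml.XPath. It shows why the prefix is needed: the unprefixed path matches nothing, because the elements are in a namespace.

```python
import xml.etree.ElementTree as ET

XML = """<ham xmlns="http://example.com/ns#">
<eggs n='1'/>
This is the string content with <em type='bold'>emphasized Namespaced Text</em> text
</ham>"""

# The prefix is arbitrary; only the namespace URI has to match the document
NSS = {"ex": "http://example.com/ns#"}

root = ET.fromstring(XML)
prefixed = [em.text for em in root.findall(".//ex:em", NSS)]
unprefixed = root.findall(".//em")

print(prefixed)
print(len(unprefixed))
```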
In a similar way, you can also pass in variable bindings, which may
be used as values later in your XPath expressions. In this case, however,
the dictionary keys are Python tuples containing the namespace URI and local name of
the variable.
ctx = Context(node, varBindings=
{(EMPTY_NAMESPACE, u'date'): u'2003-06-20'})
Evaluate('event[@date = $date]', context=ctx)
This creates a variable named 'date' in no namespace, with
a value of '2003-06-20'; this is then used for
comparison with the date attribute in the XPath expression.
XPath variable names are QNames, so you pass in variable names as
namespace/local name tuples. The values can be numbers, unicode objects or
boolean objects:
from Ft.Xml.XPath import boolean
ctx = Context(node, varBindings={(EMPTY_NAMESPACE, u'test'): boolean.true})
This sets the variable 'test' to the boolean value true (remember
that this is for the XPath environment, not the Python one), and again
this may be used as in any XSLT stylesheet.
If you only want a value once, you may of course still use string
constants, as in
nodes=Evaluate(u'//testPrefix:em[@type="bold"]',ctx)
Note the quotes used? These must be balanced, hence the literal
value uses double quotes.
5.4 Reusing parsed XPath queries
Sometimes you want to reuse an XPath expression and namespace
mapping multiple times, for efficiency and convenience. The following
example shows how:
from Ft.Xml.XPath.Context import Context
from Ft.Xml.XPath import Compile, Evaluate
from Ft.Xml.Domlette import NonvalidatingReader
DOCS = ["<spam xmlns='http://spam.com'>eggs</spam>",
        "<spam xmlns='http://spam.com'>grail</spam>",
        "<spam xmlns='http://spam.com'>nicht</spam>",
       ]
# Pre-compile for efficiency and convenience
expr = Compile(u"/a:spam[contains(., 'i')]")
ctx = Context(None, processorNss={u"a": u"http://spam.com"})
i = 1
for doc in DOCS:
    doc = NonvalidatingReader.parseString(doc.encode('UTF-8'),
                                          "http://spam.com/base")
    retval = Evaluate(expr, doc, ctx)
    if len(retval):
        print "Document", i, "meets our criteria"
    i += 1
Which should display:
Document 2 meets our criteria
Document 3 meets our criteria
5.5 Migration from PyXML's XPath
There is a usable XPath module in PyXML (warning: PyXML's XSLT
implementation is not usable: use 4Suite if you need XSLT), but there are
a lot of updates and improvements in the XPath library version in
4Suite.
If you are familiar with PyXML, you may have used a different form
of imports to load in XPath and XSLT features. The imports are different
under 4Suite.
Usage example:
-
PyXML usage (do not use with 4Suite):
import xml.xslt
import xml.xpath
-
4Suite usage (use these imports):
import Ft.Xml.XPath
import Ft.Xml.Xslt
6 XSLT processing
6.1 The super-simple XSLT API
For basic XSLT transform needs, or to get started quickly, the
Ft.Xml.Xslt module offers a quick
way to apply transforms to XML documents and get back a simple string
result. Within this module, the function of interest is
Transform.
Transform(fname_or_uri, string_stream_fname_uri_isrc, [param], [output])
-
The Transform function takes two required
arguments and two optional ones. The first is the source XML for the transform. The
second is the XSLT document. Both are given as a string, an object like an
open file, a local file path on your computer, an absolute URI, or
an InputSource object. The optional params is a dictionary of stylesheet parameters, the keys of
which may be given as unicode objects if they have no namespace,
or as (uri, localname) tuples if they do. The values are the overridden parameter values. If you do not supply the optional output parameter, the return value is a string with the result
of this transform. If you do supply this parameter, it must be a file-like object to which the output will be written, and the return value is then None.
XML = """
<ham>
<eggs n='1'/>
This is the string content with <em>emphasized text</em> text
</ham>"""
from Ft.Xml.Xslt import Transform
# URL for the identity transform: reproduces the input XML in the result
ID_TRANSFORM = 'http://cvs.4suite.org/viewcvs/*checkout*/4Suite/Ft/Data/identity.xslt'
result = Transform(XML, ID_TRANSFORM)
print result
# If the above XML document were located in the file
# "target.xml", we could have used `Transform("target.xml", ID_TRANSFORM)`.
#It's more efficient to redirect the processor output to an output stream. The following does so:
import sys
result = Transform(XML, ID_TRANSFORM, output=sys.stdout)
#With an output stream supplied, the return value is None
print result
6.2 Full XSLT processing API
Here is the general procedure for using the Python API for XSLT
processing:
-
Create an Ft.Xml.Xslt.Processor.Processor
instance.
-
Prepare Ft.Xml.InputSource instances (via
their factory) for the source XML and stylesheet.
-
Call the Processor's appendStylesheet
method, passing it the stylesheet's
InputSource.
-
Call the Processor's run method,
passing it the source document's
InputSource.
For input to our transform, we will use the namespaced example as in
the last section.
$ cat testNS.xml
<ham xmlns="http://example.com/ns#">
<eggs n='1'/>
This is the string content with
<em type='bold' f='2'>emphasized Namespaced Text</em>
text
</ham>
For our stylesheet, we will again use one of the simplest useful
examples, the identity stylesheet.
$ cat identity.xsl
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="@*|node()">
<xsl:copy>
|