There are two levels of correctness of an XML document:
- Well-formedness. A well-formed document conforms to all of XML's syntax rules. For example, a document in which a start-tag appears without a corresponding end-tag is not well-formed. A document that is not well-formed is not considered to be XML.
Sample characteristics of a well-formed document:
- XML documents must have a root element
- XML elements must have a closing tag
- XML tags are case sensitive
- XML elements must be properly nested
- XML attribute values must always be quoted
- Validity. A valid document additionally conforms to a set of semantic rules. The rules are defined in a schema, most commonly a DTD or an XML Schema. A document is invalid if, for example, a required attribute or element is missing; the document contains an undeclared element; an element that is meant to appear only once appears more than once; or the value of an attribute does not conform to the defined pattern or data type.
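The well-formedness rules listed above can be checked with any conforming parser, because a parser must reject input that breaks them. The following is a minimal sketch using the JDK's built-in DOM parser; the class name `WellFormedCheck` and the sample strings are made up for illustration, and no DTD or schema validation takes place.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of the original samples.
final class WellFormedCheck {

    /** Returns true if the text parses as well-formed XML (validity is not checked). */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler, which prints fatal errors to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // well-formedness violations are reported as fatal errors
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<book><title>XML</title></book>")); // properly nested
        System.out.println(isWellFormed("<book><title>XML</book>"));         // missing closing tag
        System.out.println(isWellFormed("<Book></book>"));                   // tag case mismatch
        System.out.println(isWellFormed("<book id=1/>"));                    // unquoted attribute value
    }
}
```

Only the first sample string satisfies all five rules; the other three each break one of them and are rejected by the parser.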
XML validation mechanisms include DTD and XML schema languages such as W3C XML Schema and RELAX NG.
Document Type Definition (DTD)
A DTD defines the tags and attributes used in an XML or HTML document. Elements defined in a DTD can be used along with the predefined tags and attributes of each markup language. DTD support is ubiquitous due to its inclusion in the XML 1.0 standard.
DTD Advantages: | DTD Disadvantages: |
Easy to read and write (plain text file with a simple semi-XML format). | No type definition system. |
Can be used as an in-line definition inside the XML document. | No means of element and attribute content definition and validation. |
Provides analogues of #define, #include, and #ifdef: entities for shorthand abbreviations, external entities for external content, and conditional sections for some conditional parsing. | |
A sample DTD document
1 <?xml version="1.0" encoding="UTF-8"?>
2 <!ELEMENT publications (book*)>
3 <!ELEMENT book (title, author+, copyright, publisher, isbn,
4 description?)>
5 <!ELEMENT title (#PCDATA)>
6 <!ELEMENT author (#PCDATA)>
7 <!ELEMENT copyright (#PCDATA)>
8 <!ELEMENT publisher (#PCDATA)>
9 <!ELEMENT isbn (#PCDATA)>
10 <!ELEMENT description (#PCDATA)>
11 <!ATTLIST book id ID #REQUIRED image CDATA #IMPLIED>
12 <!ATTLIST isbn kind (10|13) #REQUIRED >
Line 2: publications element has 0...unbounded number of book elements inside it. |
Lines 3 – 4: book element has exactly one title, one or more author elements, exactly one each of copyright, publisher and isbn, and 0 or 1 description elements inside it, in that order. |
Line 11: book element has two attributes: a mandatory one named id of type ID, and an optional one named image of type CDATA. |
Line 12: isbn element has an attribute named kind which can have 10 or 13 as its value. |
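To see rules like these enforced, the declarations can be placed in an internal DTD subset and the parser switched to validating mode. The sketch below assumes a trimmed-down version of the sample DTD (copyright, publisher, description and the image attribute are left out to keep it short); the class name `DtdValidation` and the test documents are hypothetical. Validity errors are recoverable, so they are collected through an `ErrorHandler` rather than thrown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of the original samples.
final class DtdValidation {

    // Shortened variant of the sample DTD, embedded as an internal subset.
    static final String DOCTYPE =
          "<!DOCTYPE publications [\n"
        + "  <!ELEMENT publications (book*)>\n"
        + "  <!ELEMENT book (title, author+, isbn)>\n"
        + "  <!ELEMENT title (#PCDATA)>\n"
        + "  <!ELEMENT author (#PCDATA)>\n"
        + "  <!ELEMENT isbn (#PCDATA)>\n"
        + "  <!ATTLIST book id ID #REQUIRED>\n"
        + "  <!ATTLIST isbn kind (10|13) #REQUIRED>\n"
        + "]>\n";

    /** Parses DOCTYPE + body with DTD validation on and returns the error messages. */
    static List<String> validate(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enables DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (SAXException e) {
            errors.add(e.getMessage()); // not well-formed: parsing stopped here
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }
}
```

A call such as `DtdValidation.validate("<publications/>")` should return an empty list, since `book*` allows zero books, while a book without an author or with `kind='11'` should produce at least one message.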
DTD Attribute Types
DTD Attribute Type | Description |
CDATA | Any character string acceptable in XML |
NMTOKEN | Similar to an XML name, but with looser rules for the first character |
NMTOKENS | One or more NMTOKEN tokens separated by white space |
Enumeration | List of the only allowed values for an attribute |
ENTITY | Name of an unparsed entity declared in the DTD |
ENTITIES | White-space-separated list of ENTITY names |
ID | XML name unique within the entire document |
IDREF | Reference to an ID attribute within the document |
IDREFS | White-space-separated list of IDREF tokens |
NOTATION | Name of a declared notation, which associates the name with information used by the client |
What a DTD can validate |
Element nesting |
Element occurrence |
Permitted attributes of an element |
Attribute types and default values |
XML Schema Definition (XSD)
XSD provides the syntax for defining how elements and attributes can be represented in an XML document. It also lets a schema require that the document's content follow a specific format and data type. XSD is a W3C Recommendation for defining the structure of an XML document. XSD documents are themselves written in XML.
XSD Advantages: | XSD Disadvantages: |
XSD has a much richer language for describing what element or attribute content "looks like." This is related to the type system. | Verbose language, hard to read and write. |
XSD supports inheritance, where one type can be derived from another. This is a great feature because it provides the opportunity for re-usability. | Provides no mechanism for the user to add new primitive data types; new types can only be derived from existing ones. |
It is namespace aware and provides the ability to define new data types derived from the existing data types. | |
A sample XSD document
1 <?xml version="1.0" encoding="UTF-8"?>
2 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
3     xmlns:extras="http://xml.dzone.org/schema/publications"
4     attributeFormDefault="unqualified" elementFormDefault="unqualified"
5     xmlns="http://xml.dzone.org/schema/publications"
6     targetNamespace="http://xml.dzone.org/schema/publications"
7     version="4">
8   <xs:element name="publications">
9     <xs:complexType>
10       <xs:sequence>
11         <xs:element minOccurs="0" maxOccurs="unbounded"
12             ref="book"/>
13       </xs:sequence>
14     </xs:complexType>
15   </xs:element>
16   <xs:element name="book">
17     <xs:complexType>
18       <xs:sequence>
19         <xs:element ref="title"/>
20         <xs:element minOccurs="1" maxOccurs="unbounded"
21             ref="author"/>
22         <xs:element ref="copyright"/>
23         <xs:element ref="publisher"/>
24         <xs:element ref="isbn"/>
25         <xs:element minOccurs="0" ref="description"/>
26       </xs:sequence>
27       <xs:attributeGroup ref="attlist.book"/>
28     </xs:complexType>
29   </xs:element>
30   <xs:element name="title" type="xs:string"/>
31   <xs:element name="author" type="xs:string"/>
32   <xs:element name="copyright" type="xs:string"/>
33   <xs:element name="publisher" type="xs:string"/>
34   <xs:element name="isbn">
35     <xs:complexType mixed="true">
36       <xs:attributeGroup ref="attlist.isbn"/>
37     </xs:complexType>
38   </xs:element>
39   <xs:element name="description" type="xs:string"/>
40   <xs:attributeGroup name="attlist.book">
41     <xs:attribute name="id" use="required" type="xs:ID"/>
42     <xs:attribute name="image"/>
43   </xs:attributeGroup>
44   <xs:attributeGroup name="attlist.isbn">
45     <xs:attribute name="kind" use="required">
46       <xs:simpleType>
47         <xs:restriction base="xs:token">
48           <xs:enumeration value="10"/>
49           <xs:enumeration value="13"/>
50         </xs:restriction>
51       </xs:simpleType>
52     </xs:attribute>
53   </xs:attributeGroup>
54 </xs:schema>
Lines 2 – 7: Line 2 binds the xs prefix to the XML Schema namespace. Line 3 binds the extras prefix to the publications namespace, so the schema can refer to its own vocabulary by prefix. Line 4 specifies whether locally declared elements and attributes are namespace qualified or not; a locally declared element is an element declared directly inside a complexType (not by reference). Line 5 declares the default namespace for this schema document. Line 6 sets the target namespace, which is the namespace an XML document must use for its elements to be validated against this schema, and line 7 gives the schema itself a version label. |
Lines 8 – 15: The element named publications contains a sequence of zero or more book elements. |
Lines 16 – 29: The element named book contains a sequence of elements, including author, which must appear at least once (line 20), and description, which has a minimum occurrence of 0 (line 25). Its maximum occurrence is the default value, which is 1. |
Lines 34 – 38: The isbn element has a group of attributes referenced as attlist.isbn. This attribute group (lines 44 – 53) includes one attribute named kind, whose value is a simple type (lines 46 – 51). The type carries a restriction which requires the value to be one of the enumerated values included in the definition. |
The separation of an element declaration and its use. We declared our elements separately from where we reference them. A ref attribute points to a declaration with the same name. Using this technique, we can keep separate XSD files, each containing the definitions and declarations related to one specific package. We can also import or include them in other XSD documents, if needed.
Import and include. The import and include elements help to construct a schema from multiple documents and namespaces. The import element brings in a schema from a different namespace, while the include element brings in a schema from the same namespace. When include is used, the target namespace of the included schema must be the same as the target namespace of the including schema. In the case of import, the target namespace of the imported schema must be different.
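To validate XML files using an external XSD, replace lines 17 – 20 of the DOM sample with:
// Needs javax.xml.transform.Source, javax.xml.transform.stream.StreamSource
// and javax.xml.validation.SchemaFactory on the import list.
factory.setValidating(false); // DTD validation off; the schema set below is used instead
factory.setNamespaceAware(true);
SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
factory.setSchema(schemaFactory.newSchema(new Source[]{new StreamSource("src/publication.xsd")}));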
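The fragment above only makes sense inside the DOM sample it patches. As a self-contained sketch of the same configuration, the class below compiles a small schema from a string instead of `src/publication.xsd` and reports whether a document passes. The class name `SchemaDomValidation` and the cut-down, namespace-free schema are assumptions made for illustration, not the schema shown earlier.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of the original samples.
final class SchemaDomValidation {

    // Illustrative schema: a book with one title, one or more authors, and a required id.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='book'>"
        + "<xs:complexType>"
        + "<xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "<xs:element name='author' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence>"
        + "<xs:attribute name='id' type='xs:ID' use='required'/>"
        + "</xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    static Schema compileSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself failed to compile", e);
        }
    }

    /** Parses the document with the schema attached; any reported error means "not valid". */
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);    // no DTD validation
            factory.setNamespaceAware(true); // required for schema validation
            factory.setSchema(compileSchema());
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Schema violations are recoverable errors, so turn them into exceptions.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The error handler matters here: without one, the DOM parser would report schema violations but still return a document, so the caller could not tell a valid file from an invalid one.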
XML Schema validation factors
Validation factor | Description |
length, minLength, maxLength | Enforces an exact, minimum, or maximum length for a string-derived value |
maxExclusive, maxInclusive, minExclusive, minInclusive | Enforces an upper or lower bound on the value, either exclusive or inclusive |
enumeration | Restricts values to a member of a defined list |
totalDigits, fractionDigits | Enforces total digits in a number; signs and decimal points skipped. Enforces total fractional digits in a fractional number |
whiteSpace | Used to preserve, replace, or collapse document white space |
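The effect of these facets can be tried out with the `javax.xml.validation` API. The sketch below is illustrative only: the class name `FacetDemo`, the element names and the chosen limits are assumptions, and the schema declares three global elements so that each can be validated as a tiny document of its own.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the original samples.
final class FacetDemo {

    // Three global elements, each constrained only by facets.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        // title: string of 1 to 20 characters (minLength, maxLength)
        + "<xs:element name='title'><xs:simpleType><xs:restriction base='xs:string'>"
        + "<xs:minLength value='1'/><xs:maxLength value='20'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        // year: integer between 1900 and 2100 (minInclusive, maxInclusive)
        + "<xs:element name='year'><xs:simpleType><xs:restriction base='xs:integer'>"
        + "<xs:minInclusive value='1900'/><xs:maxInclusive value='2100'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        // kind: one of two listed tokens (enumeration)
        + "<xs:element name='kind'><xs:simpleType><xs:restriction base='xs:token'>"
        + "<xs:enumeration value='10'/><xs:enumeration value='13'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:schema>";

    static Schema compileSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself failed to compile", e);
        }
    }

    /** Validates a one-element document against the facet schema. */
    static boolean isValid(String xml) {
        try {
            // With no ErrorHandler set, the validator throws on the first violation.
            compileSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For instance, `FacetDemo.isValid("<year>2010</year>")` should pass, while `<year>1899</year>` falls below the `minInclusive` bound and `<kind>11</kind>` is not in the enumeration.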
XML Schema built-in types
Type | Description |
anyURI | Uniform Resource Identifier |
base64Binary | Base64-encoded binary value |
boolean | true, false or 1, 0 |
byte | Signed quantity >= -128 and <= 127 |
dateTime | An absolute date and time |
integer | Signed integer |
string | Unicode string |
ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, NMTOKEN, NMTOKENS | Same definitions as those in DTD |
language | "xml:lang" values from XML 1.0 Recommendation |
Name | An XML name |
DTD and XSD validation capabilities
W3C XML Schema Features | DTD Features |
Namespace-qualified element and attribute declarations | Element nesting |
Simple and complex data types | Element occurrence |
Type derivation and inheritance | Permitted attributes of an element |
Element occurrence constraints | Attribute types and default values |