Monday, July 21, 2008

The basics of using XML Schema to define elements






Get started using XML Schema instead of DTDs for defining the structure of XML documents

The new XML Schema system, now nearing acceptance as a W3C recommendation, aims to provide a rich grammatical structure for XML documents that overcomes the limitations of the DTD (see the sidebar, Limitations of DTDs). This article demonstrates the flexibility of schemas and shows how to define the most fundamental building block of XML documents, the element, in the XML Schema system.
XML Schema is more powerful than DTDs. To illustrate the power of the XML Schema mechanism, the first three listings briefly compare the different ways of representing elements. Listing 1 shows an excerpt of an XML document, Listing 2 shows its two elements declared in DTD syntax, and Listing 3 shows the corresponding XML Schema declarations. Note that Listing 3 is itself written in XML syntax. Through the schema, a validating parser can verify that the element InvoiceNo is a positive integer and that the element ProductID consists of one letter between A and Z followed by six digits. By contrast, a validating parser referring to the DTD can only verify that both elements contain strings.
Listing 1: An XML document fragment
<InvoiceNo>123456789</InvoiceNo>
<ProductID>J123456</ProductID>
Listing 2: DTD fragment describing elements in Listing 1

Listing 3: XML Schema fragment describing elements in Listing 1
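To make the difference concrete, here is a rough Python sketch of the checks a schema-validating parser performs on the values in Listing 1, next to the much weaker DTD check. It assumes that Listing 3 declares InvoiceNo as positive-integer and ProductID as ProductCode, and that Listing 2 declares both elements as (#PCDATA). The function names are hypothetical. A real parser derives these checks from the schema itself, not from hand-written code.

```python
import re

# Hypothetical stand-ins for the declarations described for Listing 3:
# InvoiceNo has type positive-integer; ProductID has the user-defined
# type ProductCode (one letter A-Z followed by six digits).
PRODUCT_CODE = re.compile(r"[A-Z][0-9]{6}")
POSITIVE_INTEGER = re.compile(r"\+?[0-9]+")


def schema_accepts_invoice_no(text: str) -> bool:
    """Rough positive-integer check: optional '+', digits, value >= 1."""
    text = text.strip()  # non-string types ignore surrounding whitespace
    return POSITIVE_INTEGER.fullmatch(text) is not None and int(text) >= 1


def schema_accepts_product_id(text: str) -> bool:
    """Schema patterns must match the whole value, so use fullmatch."""
    return PRODUCT_CODE.fullmatch(text) is not None


def dtd_accepts(text: str) -> bool:
    """Assumed (#PCDATA) declaration: any character data is acceptable."""
    return True


for tag, check, value in [
    ("InvoiceNo", schema_accepts_invoice_no, "123456789"),
    ("InvoiceNo", schema_accepts_invoice_no, "-5"),
    ("ProductID", schema_accepts_product_id, "J123456"),
    ("ProductID", schema_accepts_product_id, "j12345"),
]:
    print(f"{tag}={value!r}: schema {check(value)}, DTD {dtd_accepts(value)}")
```

Both values from Listing 1 pass the schema checks. The malformed values pass only the DTD check.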
Using namespaces in XML Schema
In collaborative settings, one person may process documents from many other parties, and those parties may want to represent their data elements differently. Moreover, a single document may need to refer separately to elements that have the same name but were created by different parties. How can you distinguish between such definitions? XML Schema uses namespaces to distinguish them.
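As an illustration, here is a small sketch of how a namespace-aware tool keeps two same-named elements apart. It uses Python's standard xml.etree.ElementTree. The two namespace names come from this article. The order document and the second party's product code are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two parties both use the local name ProductID.
DOC = """<order xmlns:acct="http://www.SampleStore.com/Account"
       xmlns:parts="http://www.PartnerStore.com/PartsCatalog">
  <acct:ProductID>J123456</acct:ProductID>
  <parts:ProductID>P-998877</parts:ProductID>
</order>"""

root = ET.fromstring(DOC)
for child in root:
    # ElementTree reports each name as {namespace-name}local-name,
    # so the two ProductID elements remain distinguishable.
    print(child.tag, child.text)
```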
Limitations of DTDs
Although DTDs have served SGML and HTML developers well for 20 years as a mechanism for describing structured information, they are severely limited compared to XML Schema.
DTDs call for elements to consist of one of three things:
A text string
A text string with other child elements mixed together
A set of child elements
DTDs do not use XML syntax and offer only limited support for types and namespaces.
A given XML Schema defines a set of new names, such as the names of elements, types, attributes, and attribute groups, whose definitions and declarations are written in the schema. Listing 3 defines the names InvoiceNo, ProductID, and ProductCode.
The names defined in a schema are said to belong to its target namespace. A namespace itself has a fixed but arbitrary name that must follow URI syntax. For example, you can set the name of the namespace for the schema excerpted in Listing 3 to http://www.SampleStore.com/Account.
The syntax of namespace names can be confusing. Even though this namespace name starts with http://, it does not refer to a file at that URL containing the schema definition. In fact, http://www.SampleStore.com/Account need not refer to any file at all. It is simply an assigned name.
Definitions and declarations in a schema can refer to names that belong to other namespaces. In this article, we call those namespaces source namespaces. Each schema has one target namespace and possibly many source namespaces. In fact, every name in a given schema belongs to some namespace. Namespace names can be fairly long, but they can be abbreviated through xmlns declarations in the XML Schema document. Listing 4 extends the example schema to illustrate these concepts.
Listing 4: Target and source namespaces
In the XML Schema in Listing 4, the target namespace is http://www.SampleStore.com/Account, which contains the names InvoiceNo, ProductID, and ProductCode. The names schema, element, simpleType, pattern, string, and positive-integer belong to the source namespace http://www.w3.org/1999/XMLSchema, which is abbreviated as xsd through an xmlns declaration. There is nothing special about the prefix xsd; we could have chosen any name. For convenience and simplicity in the rest of this article, we use xsd to refer to the namespace http://www.w3.org/1999/XMLSchema, and we omit the xsd qualification in some code snippets. In this example, the target namespace also happens to be one of the source namespaces, because the name ProductCode is used in defining other names.
Figure 1: Namespaces for Listing 4
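A quick way to see that the prefix is only an abbreviation is to parse the same fragment written with two different prefixes and compare the expanded names. Here is a minimal sketch using Python's xml.etree.ElementTree. The one-element schema fragment is a hypothetical, heavily abbreviated stand-in for Listing 4.

```python
import xml.etree.ElementTree as ET

XSD_1999 = "http://www.w3.org/1999/XMLSchema"

# The same hypothetical fragment, once with prefix xsd and once with prefix q.
with_xsd = f'<xsd:schema xmlns:xsd="{XSD_1999}"><xsd:element name="InvoiceNo"/></xsd:schema>'
with_q = f'<q:schema xmlns:q="{XSD_1999}"><q:element name="InvoiceNo"/></q:schema>'


def expanded_names(text):
    """Return the {namespace}local name of every element, in document order."""
    return [el.tag for el in ET.fromstring(text).iter()]


print(expanded_names(with_xsd))
print(expanded_names(with_q))
```

ElementTree resolves prefixes only in element and attribute names. A prefix inside an attribute value, such as type="xsd:string", is resolved by the schema processor rather than by the XML parser.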
The schema fragment in Listing 4 does not need to specify the locations of source schema files. For the overall "schema of schemas," http://www.w3.org/1999/XMLSchema, you need not specify a location because it is well known. For the source namespace http://www.SampleStore.com/Account, you need not specify a location either, because it is also the target namespace being defined in this file. To better understand how to specify a schema location and use the default namespace, consider the extension of the example in Listing 5.
Listing 5: Multiple source namespaces, importing a namespace

Listing 5 includes one more namespace reference: http://www.PartnerStore.com/PartsCatalog. This namespace differs from both the target namespace and the standard XML Schema namespace, so it must be imported with the import element, whose schemaLocation attribute specifies the location of the file that contains its schema. The default namespace is http://www.w3.org/1999/XMLSchema, declared by an xmlns attribute that has no prefix. Every unprefixed element name, such as schema and element, belongs to the default namespace http://www.w3.org/1999/XMLSchema. If your schema refers to many names from one namespace, it is more convenient to designate that namespace as the default.
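Here is a minimal Python sketch of how a namespace-aware parser treats a default namespace. It parses a hypothetical fragment in the style described for Listing 5, where the XML Schema namespace is declared with an unprefixed xmlns attribute. The fragment is made up and is not the lost listing.

```python
import xml.etree.ElementTree as ET

XSD_1999 = "http://www.w3.org/1999/XMLSchema"

# Hypothetical fragment: xmlns without a prefix sets the default namespace.
fragment = f"""<schema xmlns="{XSD_1999}"
        targetNamespace="http://www.SampleStore.com/Account">
  <element name="InvoiceNo"/>
</schema>"""

root = ET.fromstring(fragment)
element = root.find(f"{{{XSD_1999}}}element")

print(root.tag)        # unprefixed element name, placed in the default namespace
print(element.attrib)  # unprefixed attribute names stay in no namespace
```

The second line of output shows a detail worth knowing. The default namespace applies to unprefixed element names, but not to unprefixed attribute names such as name and targetNamespace.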
An XML instance document may refer to names of elements from multiple namespaces that are defined in multiple schemas. To refer to and abbreviate the name of a namespace, again use xmlns declarations. To specify the file locations, we use the schemaLocation attribute from the XML Schema instance namespace. Note that this attribute differs from the attribute of the same name on the import element in Listing 5.
Listing 6: Using multiple namespace names from multiple schemas
123456789
Figure 2: Namespaces for Listings 5 and 6
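The instance-level schemaLocation attribute holds whitespace-separated pairs: a namespace name, followed by the location of a schema document for that namespace. Here is a minimal Python sketch that reads those pairs. It assumes the instance namespace name http://www.w3.org/2001/XMLSchema-instance from the final W3C Recommendation, so substitute whatever name your processor expects. The instance document, the .xsd file names, and the helper schema_locations are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: instance namespace name from the final W3C Recommendation.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance document; the .xsd file names are made up.
instance = f"""<acct:Invoice xmlns:acct="http://www.SampleStore.com/Account"
    xmlns:parts="http://www.PartnerStore.com/PartsCatalog"
    xmlns:xsi="{XSI}"
    xsi:schemaLocation="http://www.SampleStore.com/Account Account.xsd
                        http://www.PartnerStore.com/PartsCatalog PartsCatalog.xsd">
  <acct:InvoiceNo>123456789</acct:InvoiceNo>
</acct:Invoice>"""


def schema_locations(root):
    """Split xsi:schemaLocation into a {namespace name: location} mapping.

    The value is a whitespace-separated list of pairs: a namespace name
    followed by the location of a schema document for that namespace.
    """
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    if len(tokens) % 2:
        raise ValueError("schemaLocation must contain namespace/location pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))


root = ET.fromstring(instance)
for namespace, location in schema_locations(root).items():
    print(namespace, "->", location)
```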
Defining elements
Defining an element means defining its name and its content model. In XML Schema, an element's content model is defined by its type; the instance elements in an XML document can then have only values that fit the types defined in the schema.
Simple types
The XML Schema specification defines a number of predefined simple types for values, as listed in Table 2 (Predefined simple types of values).
A type can be simple or complex. A simple type cannot contain elements or attributes in its value. A complex type can create the effect of embedding elements in other elements, or it can associate attributes with an element. Up to this point, the examples in this article have used user-defined simple types such as ProductCode. The XML Schema specification also includes predefined simple types (see the sidebar Simple types). A derived simple type constrains the values of its base type. For example, the values of the derived simple type ProductCode are a subset of the values of its base type, string.
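The subset relationship follows from how derivation by restriction works. A derived type accepts a value only if its base type accepts it and every added facet accepts it. The following Python sketch models that rule. The SimpleType class and the variable names are hypothetical illustrations, not part of any XML Schema API.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SimpleType:
    """Hypothetical model of a simple type: a name, an optional base type,
    and the facet checks that restrict the base type's values."""
    name: str
    base: Optional["SimpleType"] = None
    facets: List[Callable[[str], bool]] = field(default_factory=list)

    def accepts(self, value: str) -> bool:
        # A value must satisfy the base type AND every facet added by the
        # restriction, so the derived value space is a subset of the base's.
        if self.base is not None and not self.base.accepts(value):
            return False
        return all(check(value) for check in self.facets)


# Built-in string: any character data (modelled as a type with no facets).
string_type = SimpleType("string")

# ProductCode restricts string with a pattern facet: one letter A-Z, six digits.
product_code_type = SimpleType(
    "ProductCode",
    base=string_type,
    facets=[lambda v: re.fullmatch(r"[A-Z][0-9]{6}", v) is not None],
)

for value in ["J123456", "hello"]:
    print(value, string_type.accepts(value), product_code_type.accepts(value))
```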
Simple, non-nested elements have a simple type
An element that contains neither attributes nor other elements can be defined to be of a simple type, either predefined or user-defined, such as string, integer, decimal, time, or ProductCode.
Listing 7: Some simple types for elements
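As a rough illustration of what it means for an element to have a simple type, here is a sketch that checks an element's text against a declared type. It uses Python's xml.etree.ElementTree and regular expressions. The element-to-type table is hypothetical; only price and ProductID appear in this article. The lexical checks are simplified and ignore whitespace normalization and value ranges.

```python
import re
import xml.etree.ElementTree as ET

# Simplified lexical checks for a few simple types named in the text.
SIMPLE_TYPES = {
    "string": lambda s: True,
    "integer": lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None,
    "decimal": lambda s: re.fullmatch(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)", s) is not None,
    "time": lambda s: re.fullmatch(
        r"[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})?", s
    ) is not None,
    "ProductCode": lambda s: re.fullmatch(r"[A-Z][0-9]{6}", s) is not None,
}

# Hypothetical declarations: element name -> simple type name.
DECLARATIONS = {"title": "string", "quantity": "integer", "price": "decimal",
                "shipTime": "time", "ProductID": "ProductCode"}


def check_simple_element(el):
    """True if el is declared, has no attributes or child elements,
    and its text fits its declared simple type."""
    type_name = DECLARATIONS.get(el.tag)
    if type_name is None or el.attrib or len(el):
        return False  # undeclared, or not a simple, non-nested element
    return SIMPLE_TYPES[type_name](el.text or "")


item = ET.fromstring(
    "<item><price>45.50</price><quantity>3</quantity>"
    "<ProductID>J123456</ProductID></item>"
)
for el in item:
    print(el.tag, check_simple_element(el))
```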
Elements with attributes must have a complex type
Now try adding the attribute currency to the simple element price from Listing 7. You can't: an element of a simple type cannot have attributes. To add an attribute, you must define price with a complex type. In the example in Listing 8, we define what is called an anonymous type, a complex type that has no explicit name. In other words, the complexType element has no name attribute.
Listing 8: A complex element type
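Here is a Python sketch of the kind of check such a complex type implies for price: decimal text content plus a declared currency attribute, and nothing else. The function check_price and the allowed-attribute set are hypothetical. Treating currency as optional is also an assumption, because Listing 8 itself is not preserved here.

```python
import re
import xml.etree.ElementTree as ET

DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def check_price(el, allowed_attributes=frozenset({"currency"})):
    """Hypothetical check mirroring a complex type for price:
    decimal text content plus an optional currency attribute."""
    if el.tag != "price" or len(el):
        return False  # wrong element, or it contains child elements
    if not set(el.attrib) <= allowed_attributes:
        return False  # an attribute the type does not declare
    # decimal ignores surrounding whitespace, hence strip()
    return DECIMAL.fullmatch((el.text or "").strip()) is not None


print(check_price(ET.fromstring('<price currency="US">45.50</price>')))
print(check_price(ET.fromstring('<price unit="kg">45.50</price>')))
```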
Elements that embed other elements must have a complex type
In an XML document, an element may contain other elements. A DTD expresses this nesting directly in the element's declaration. XML Schema instead declares an element with a type, and that type can contain declarations of other elements and attributes. See Table 1 for a simple example.
Table 1: A comparison of complex data types in DTD and XML Schema
XML document
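The rules in the last three sections can be summarized as a small classifier. An element with child elements or attributes needs a complex type, and any other element can use a simple type. Here is a Python sketch of that classifier, written against a hypothetical invoice fragment whose element names are illustrative only. Mixed content (text interleaved with child elements) also needs a complex type; this sketch files it under element content.

```python
import xml.etree.ElementTree as ET


def required_type_kind(el):
    """Classify an instance element by the kind of type its declaration needs."""
    if len(el):
        return "complex (element content)"
    if el.attrib:
        return "complex (attributes)"
    return "simple"


# Hypothetical invoice fragment; the element names are illustrative only.
invoice = ET.fromstring(
    "<Invoice><InvoiceNo>123456789</InvoiceNo>"
    '<price currency="US">45.50</price>'
    "<Item><ProductID>J123456</ProductID></Item></Invoice>"
)

for el in invoice.iter():
    print(el.tag, "->", required_type_kind(el))
```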

