My Pages

Thursday, 17 March 2011

XML Practice

Introduction


XML stands for eXtensible Markup Language, but the three-letter acronym does little to reveal the real potential behind it. People with a good grasp of XML know its importance, but it can be quite confusing to those just starting out, because the large number of technologies and specifications involved makes the subject hard to pin down. XML is used in many areas of technology, and web development is only one of them. Its uses range from application configuration files to distributed computing over the internet (for example, Microsoft .NET Remoting).


It is a common misconception to think of XML as a single markup language. XML is better described as a set of rules for defining markup languages, such as XHTML, XSLT, SOAP and RSS. All of these languages are defined using XML and inherit the properties of an XML document, such as being well-formed and structured. XML plays an important role in interconnecting applications, especially applications running on different platforms. Most of the large software corporations, such as Microsoft and Sun, have adopted XML in their systems and retail software because of its usefulness in interconnecting applications and its independence from any particular platform or medium.


XML's syntax is quite easy to understand. The most basic XML document consists of the following parts (the declaration is optional in XML 1.0, but recommended):
  • XML declaration, e.g. <?xml version="1.0" ?>
  • opening tag of the root element, e.g. <student>
  • closing tag of the root element, e.g. </student>
An XML document is composed of a number of nested elements, each of which can have zero or more attributes. Attributes are optional. Because an attribute is written inside the element's opening tag rather than as a separate child element, the resulting markup is slightly smaller. XML must be well-formed, which means that the syntax of the document must conform to a number of basic rules:
  • Every element must be closed. An element that contains neither child elements nor data can be closed in its opening tag, such as:
    • <br />
    • <input type="text" id="btnSubmit" />
    • <student />
  • Elements must be properly nested: an element opened inside another element must be closed before its parent is closed, so tags may never overlap.
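
As a rough illustration of these rules, here is a minimal sketch that uses the parser in Python's standard library to check well-formedness. The helper name is_well_formed is my own invention for this example, not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# An empty element closed in its opening tag is well-formed.
print(is_well_formed("<student />"))
# An element that is never closed is not.
print(is_well_formed("<student>"))
# Overlapping tags break the nesting rule.
print(is_well_formed("<project><student></project></student>"))
```

The check only covers well-formedness; it says nothing about whether the document follows a particular schema.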
Apart from these rules, which apply to every XML document, further rules can be added by validating the document against a schema, as is done for XHTML. A schema defines the following constructs for a type of XML document:
  • Which elements may appear
  • Which attributes each element has
  • Constraints (whether an element or attribute is optional or required)
The schema can be written as a DTD (Document Type Definition) or as an XML Schema (XSD). Later in this post we will create a DTD and apply it to an XML document.


Task Overview

This week's exercise relates to XML. These were the tasks assigned to get better acquainted with the subject:
  • Create an XML file to keep the following data about a student project:
    • student name, student ID, project title, project category, abstract, date submitted
  • Use elements and attributes to define the student project
  • Validate the XML using a W3C Validator
  • Create a DTD Schema for the XML file and validate the XML against the schema using W3C DTD Validator


Create and validate XML file using both elements and attributes

There are a lot of XML editors out there which can help with development, since writing an XML document by hand is a daunting task prone to errors. Altova XMLSpy is one of my personal favorites for a number of reasons, such as the grid view, which shows a tabular structure of the XML, and the validation and well-formedness checks.


Altova XMLSpy Grid View




The first element in the file is the root node of the document, which in this case is <projects>. I made projects the root node, as opposed to students, to allow for multiple students working on the same project and the same student working on more than one project (a many-to-many relationship). The projects node holds a number of projects, each denoted by a <project> element, and the project title is an attribute of that element. Each project contains a parent node for its students, which holds one student element per student. Each student element carries the student ID as an attribute and the student name as a child element. The remaining elements are the project category, abstract and date submitted, which are all placed directly within the project element. The final markup without any data will look like this in an XML editor:
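
To make the structure concrete, here is a small sketch that builds it with Python's standard library. The element and attribute names (projects, project, title, students, student, id, name, category, abstract, date_submitted) and the sample values are my assumptions based on the description above, not necessarily the exact names used in the original file.

```python
import xml.etree.ElementTree as ET

# Names and values below are illustrative assumptions, not the original file.
projects = ET.Element("projects")
project = ET.SubElement(projects, "project", {"title": "Sample Project"})

# Parent node holding one <student> element per student on the project.
students = ET.SubElement(project, "students")
student = ET.SubElement(students, "student", {"id": "S001"})
ET.SubElement(student, "name").text = "Jane Doe"

# Remaining details sit directly inside <project>.
ET.SubElement(project, "category").text = "Web Development"
ET.SubElement(project, "abstract").text = "A short summary of the project."
ET.SubElement(project, "date_submitted").text = "2011-03-17"

# Pretty-print where available (ET.indent was added in Python 3.9).
if hasattr(ET, "indent"):
    ET.indent(projects)

document = ET.tostring(projects, encoding="unicode")
print(document)
```

Keeping the title and ID as attributes and the rest as child elements mirrors the task requirement to use both elements and attributes.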


XML Representing Student Project




Validating the XML document at this stage only involves checking that the document is well-formed. The character data itself will not be checked, unless the data is placed outside the root element, which is an error.


Validating XML Document




Create a DTD Schema for the XML file and validate against the DTD Schema


A DTD is primarily a list of the elements and attributes allowed in the documents that use it. A DTD can be written inside the XML document or referenced externally.




Referencing an external DTD from an XML document

The DTD declares the elements that make up the XML document, such as projects, which in turn contains multiple project elements. Each element is declared with !ELEMENT, and the + suffix denotes one or more occurrences, as with the project element.
The child elements of an element are listed within parentheses. If an element contains only text, #PCDATA is stated instead. Attributes are declared slightly differently, using !ATTLIST, and if the attribute is required, #REQUIRED is stated.
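
The sketch below shows what such a DTD might look like, written inside the document as an internal subset. An external reference would instead use a line such as <!DOCTYPE projects SYSTEM "projects.dtd">, where the file name is hypothetical. All element and attribute names are my assumptions based on the description in this post. Note that Python's standard library parser reads the DTD but does not validate against it, so this only confirms that the document and DTD together are well-formed.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset followed by a document that uses it.
# All element and attribute names are illustrative assumptions.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE projects [
  <!ELEMENT projects (project+)>
  <!ELEMENT project (students, category, abstract, date_submitted)>
  <!ATTLIST project title CDATA #REQUIRED>
  <!ELEMENT students (student+)>
  <!ELEMENT student (name)>
  <!ATTLIST student id CDATA #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT category (#PCDATA)>
  <!ELEMENT abstract (#PCDATA)>
  <!ELEMENT date_submitted (#PCDATA)>
]>
<projects>
  <project title="Sample Project">
    <students>
      <student id="S001"><name>Jane Doe</name></student>
    </students>
    <category>Web Development</category>
    <abstract>A short summary of the project.</abstract>
    <date_submitted>2011-03-17</date_submitted>
  </project>
</projects>
"""

# ElementTree parses past the DOCTYPE without validating against it.
root = ET.fromstring(DOCUMENT)
print(root.tag, len(root.findall("project")))
```

To actually enforce the DTD, a validating parser or an online DTD validator is needed.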

DTD for the Student Project XML document

Validating the XML document against the DTD triggers a procedure in which the document is matched against the rules set in the schema. Essentially, the validator is a validating XML parser which, on encountering an error, reports the appropriate message, for example through the parseError object in Microsoft's MSXML parser.
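
As a loose analogy for how a parser reports errors, here is a sketch using Python's standard library. Since that parser does not validate against a DTD, the example triggers a well-formedness error instead; the point is only the reporting mechanism. The helper name describe_error is made up for this example.

```python
import xml.etree.ElementTree as ET


def describe_error(text: str):
    """Return None for well-formed input, else (message, line, column)."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple for the error.
        line, column = err.position
        return (str(err), line, column)


# <project> is never closed, so the closing </projects> tag does not match.
broken = "<projects>\n  <project>\n</projects>"
print(describe_error(broken))
print(describe_error("<projects />"))
```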


Validating XML against DTD

Conclusion


Throughout this post we focused on the creation of an XML document, its rules, and the use of a DTD to apply additional rules. There are a vast number of XML-based technologies which can build on this, for example XSL (eXtensible Stylesheet Language), a family which consists of XSLT, XPath and XSL-FO. Using these technologies together with what we discussed makes it possible to keep a document's structure, style and data separate.
