Thursday, October 15, 2009

Manipulating XML Programmatically



Chapter 4 -
XML and XSLT
Microsoft Visual Studio 2008 Programming
by Jamie Plenderleith and Steve Bunn 
McGraw-Hill/Osborne © 2009























Manipulating XML Programmatically


The .NET Framework allows developers to query and manipulate XML data in many ways, from very high-level idioms to low-level classes that allow very granular data manipulation. Examples include:




  • Low-level functionality that uses classes like XmlDocument in the System.Xml namespace, available since .NET 1.x




  • Extensible Stylesheet Language Transformations (XSLT)




  • Higher-level functionality that uses classes like XElement in the System.Xml.Linq namespace, available since .NET 3.5




  • Language Integrated Query (LINQ)




Each of these methods has its pros and cons. The low-level functions give the developer incredible flexibility but often require a great deal of code to be written. A downside to some of the higher-level constructs, such as LINQ, is that they are available only in later versions of the .NET libraries toolset.


The first two items in the preceding list are explained in this chapter, and the second two are explained in the next chapter, which discusses LINQ.




Enumerating XML Documents


To get the ball rolling, the following is a snippet of code required to read the previous fragment of XML (describing the company directory) into memory using the first generation of XML classes and methods in the System.Xml namespace:





Dim EmployeeList As New List(Of Employee)
Dim myXMLDocument As New XmlDocument

'' Load the contents of XMLFile1.xml into memory
myXMLDocument.Load("XMLFile1.xml ")
'' Retrieve a list of nodes from the XML Document in memory
Dim myXMLNodes = myXMLDocument.SelectNodes("StaffDirectory/StaffMember")

'' Enumerate the list of nodes
For Each myNode As XmlNode In myXMLNodes
Dim myEmployee As New Employee
'' Populate the employee instance with some data from the XML file
With myEmployee
.Name = myNode.SelectSingleNode("Name").InnerText
.Department = myNode.SelectSingleNode("Department").InnerText
.Manager = myNode.SelectSingleNode("Manager").InnerText
End With
'' Add the employee instance to the List(Of Employee)
EmployeeList.Add(myEmployee)
Next



The following code shows a sample Employee class that you can use with the preceding code:




Public Class Employee

Private _Name As String
Private _Department As String
Private _Manager As String

Public Property Name() As String
Get
Return _Name
End Get
Set(ByVal value As String)
_Name = value
End Set
End Property

Public Property Department() As String
Get
Return _Department
End Get
Set(ByVal value As String)
_Department = value
End Set
End Property

Public Property Manager() As String
Get
Return _Manager
End Get
Set(ByVal value As String)
_Manager = value
End Set
End Property

End Class


The Employee class here is a simple class with some public properties for Name, Department, and Manager, with respective backing fields.


Because of the ubiquity of XML, nearly everything “talks” XML, including SQL Server, Oracle, SAP, MySQL, ASP.NET, XML web services, and even AutoCAD.


XML is concerned with what the data is that we’re trying to describe, not with how to display the data. We can use XML to store staff lists, stocks lists, a representation of a database table, a fragment of data to pass between systems, and so forth. Representation and display of information encapsulated in XML is beyond the scope of XML itself and is the territory of technologies such as XSLT.


The .NET Framework offers great XML support and is a first-class citizen used by numerous .NET objects such as the DataSet object from the System.Data namespace. (If you come from an ASP background, the DataSet object is somewhat similar in principle to the RecordSet object.)


To get data into your DataSet, you can load the data from either a data source or an XML file. That the DataSet object can both read and save XML from and to a file or stream is a useful feature. The following is an example of grabbing some records from a database and saving the data to an XML file. Note that we don’t need to do any special formatting of the data or loop through rows of data, preparing them to be saved.





string SQLConnectionString = "Data Source=mssql01.local; " +
"Integrated Security=True; Initial Catalog=TestDB; " +
"Pooling=false;";
string Query = "EXEC GetRecentOrders";
IDataAccess SQLAccessObject = IDataAccess.CreateNew(
ProviderType.SQLClient, SQLConnectionString);

System.Data.DataSet MyData = SQLAccessObject.ExecuteDataSet(Query);
MyData.WriteXml("data.xml");



In the preceding code example, we are using a simple data access layer component. We have a connection string to a database server on the network, in our case a Microsoft SQL Server 2005 Database. The code example executes a stored procedure called GetRecentOrders, which simply returns a list of recent orders.


When this stored procedure executes, it returns a table of results. We then use a SQLDataAdapter object to import this table into a DataSet object. This data is now stored in the “MyData” DataSet. Once the data is in this object, we can do anything we want with it. We could convert it to XML and send it to someone via e-mail; post it to a web service running on a remote machine; print it; save it to a network share; transform it (using XSLT) to HTML; and so on. In our case, we simply saved it to a file called data.xml.


To retrieve XML from a database query, our data access component fills a DataSet object and uses the DataSet to convert its contents to XML. If you are not using data access in your applications, you can skip this section. It is also possible to retrieve XML directly from the SQL Server database. There are two options that we will look at for retrieving data from your database, both of which you can tack onto the end of your SELECT statements: FOR XML RAW and FOR XML AUTO.


The following code examples demonstrate retrieving employee records from the database. In our fictitious company, an employee can be a member of multiple departments. There are three main database tables that we are concerned with: Employees, Departments, and EmployeeDepartments. The Employees and Departments tables just store information describing employees and departments, respectively. The EmployeeDepartments table stores a list of departments that each employee works in. Look at the following two queries:





SELECT
Employees.FirstName, Employees.LastName, Departments.DepartmentName
FROM
Employees
INNER JOIN
EmployeeDepartments ON Employees.EmployeeID =
EmployeeDepartments.EmployeeID
INNER JOIN
Departments ON EmployeeDepartments.DepartmentID =
Departments.DepartmentID FOR XML RAW;
SELECT
Employees.FirstName, Employees.LastName, Departments.DepartmentName
FROM
Employees
INNER JOIN
EmployeeDepartments ON Employees.EmployeeID =
EmployeeDepartments.EmployeeID
INNER JOIN
Departments ON EmployeeDepartments.DepartmentID =
Departments.DepartmentID FOR XML AUTO;



They’re both very simple SQL queries that perform an INNER JOIN between tables. Executed without the FOR XML options at the end, they’d return a typical result set. However, by adding the FOR XML options, we get the results as chunks of XML. Listings 4-1 and 4-2 show the outputs from the two queries.




Listing 4-1: Using FOR XML RAW







<row FirstName="Jamie" LastName="Plenderleith" DepartmentName="Marketing" />
<row FirstName="Jamie" LastName="Plenderleith" DepartmentName="IT
Department" />
<row FirstName="Justyna" LastName="Smith" DepartmentName="Sales" />
<row FirstName="Paul" LastName="Owens" DepartmentName="Marketing" />
<row FirstName="Paul" LastName="Owens" DepartmentName="IT Department" />
<row FirstName="Steve" LastName="Bunn" DepartmentName="IT Department" />
<row FirstName="Mike" LastName="McGrath" DepartmentName="Sales" />
<row FirstName="Sean" LastName="Timmons" DepartmentName="Marketing" />
<row FirstName="Sean" LastName="Timmons" DepartmentName="Sales" />
<row FirstName="Sean" LastName="Timmons" DepartmentName="IT Department" />















Listing 4-2: Using FOR XML AUTO







<Employees FirstName="Jamie" LastName="Plenderleith">
<Departments DepartmentName="Marketing" />
<Departments DepartmentName="IT Department" />
</Employees>
<Employees FirstName="Justyna" LastName="Smith">
<Departments DepartmentName="Sales" />
</Employees>
<Employees FirstName="Paul" LastName="Owens">
<Departments DepartmentName="Marketing" />
<Departments DepartmentName="IT Department" />
</Employees>
<Employees FirstName="Steve" LastName="Bunn">
<Departments DepartmentName="IT Department" />
</Employees>
<Employees FirstName="Mike" LastName="McGrath">
<Departments DepartmentName="Sales" />
</Employees>
<Employees FirstName="Sean" LastName="Timmons">
<Departments DepartmentName="Marketing" />
<Departments DepartmentName="Sales" />
<Departments DepartmentName="IT Department" />
</Employees>














The RAW option just tells us that we have received back a certain number of rows, and shows us the fields that each row contains. There’s nothing necessarily wrong with this. However, look at what the AUTO option has done. It has split the data up into a hierarchical structure. Therefore, we can see that, for example, the employee Sean Timmons is a member of three different departments, instead of seeing different instances of “Sean Timmons,” each with a different department name.


This section has demonstrated how easy it is to get data into XML format from your database. However, what do we do once we have the list of recent orders in XML format? Do we really want to display raw structured data to a nontechie? Not if we value our jobs! So how do we get the data from that raw state into something more aesthetically pleasing? The answer is described next.





Transforming XML Using XSLT


XSLT, short for Extensible Stylesheet Language Transformations, is a way to convert your XML into another form. Usually it is used to convert an XML document or fragment into HTML, but it can be used anywhere you need to convert XML to another format or structure. So let’s say you’ve pulled back a dataset from a database (or any other data source) and you want to display this data to the user. Assume that it’s an ASP.NET web application. You can perform the transformation on the web server in server-side code and output the rendered HTML to the user’s web browser, or you can output both the XML and the XSLT code to the user’s web browser and let the browser do the transformation. XSLT is an entirely different and tricky programming language in its own right. It’s also very powerful. A full description of XSLT is beyond the scope of this book.




Server-Side XSLT Transformations


If you are performing the transformation on the web server, then you do not need to be overly careful about what data is pulled back from your database. You can just choose what you want to display to the user. However, if you are sending the XML and XSLT data to the user and letting their browser perform the transformation, this could expose some of your database to the user. So be careful about what you’re putting into the XML file—pull only the fields you would like the user to see.






Note 

The user wouldn’t normally see the raw XML, but if they are tech-savvy, then it wouldn’t be difficult for them to see it.







Note 

XHTML is an application of XML, whereas HTML is an application of SGML. You can think of XHTML as a cross between XML and HTML. It defines HTML documents that conform to XML syntax.



Here is an example that demonstrates the XSLT transformation in server-side code and sends the transformed XHTML to the user’s browser. We’re using a table called Employees, with the following creation script:





CREATE TABLE [dbo].[Employees](
[EmployeeID] [uniqueidentifier] NOT NULL
CONSTRAINT [DF_Employees_EmployeeID] DEFAULT (newid()),
[FirstName] [nvarchar](50) NOT NULL,
[LastName] [nvarchar](50) NOT NULL,
[Department] [uniqueidentifier] NOT NULL
CONSTRAINT [DF_Employees_Department] DEFAULT (newid()),
CONSTRAINT [PK_Employees] PRIMARY KEY CLUSTERED
(
[EmployeeID] ASC
) ) ON [PRIMARY]



Try executing the following code, and the debugger will stop on the third code line:






Tip 

If you’re using VB.NET, using the word Stop has the same effect.






IDataAccess MyDataAccess =
IDataAccess.CreateNew(ProviderType.SQLClient, "Data
Source=mssql01.local; " +
"Integrated Security=True; Initial Catalog=TestDB;
Pooling=false;");
XmlDocument Employees =
MyDataAccess.ExecuteXML("SELECT
[EmployeeID],[FirstName],[LastName] "+
"FROM Employees");
System.Diagnostics.Debugger.Break();



If you hover over the Employees variable and inspect the InnerXml property, you will see the chunk of XML. In this example, the XML document looks as follows:





<NewDataSet>
<Table>
<EmployeeID>35fb48fb-d5fc-4ba6-9828-3c04f51b1a45</EmployeeID>
<FirstName>Jamie</FirstName>
<LastName>Plenderleith</LastName>
</Table>
<Table>
<EmployeeID>efff6ad7-6697-4067-8bad-3e17dbf8e4b9</EmployeeID>
<FirstName>Justyna</FirstName>
<LastName>Stevens</LastName>
</Table>
<Table>
<EmployeeID>0a68e3c5-f7d3-4aaa-9801-4dae426ef9c2</EmployeeID>
<FirstName>Steve</FirstName>
<LastName>Bunn</LastName>
</Table>
</NewDataSet>



The easiest thing to do at this point is to save this chunk of XML in an XML file in your project. You can much more easily inspect and analyze the XML if it’s in a file than if it’s coming from your database. Once you’ve saved the XML in a file (or, if you prefer, have it coming from the database), you can start transforming it into something else. In this example, we’re going to spruce up the XML a bit and display it in the user’s browser. Next we’re going to add the following XSLT file to our solution:





<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<html>
<body>
<table>
<tr>
<th>Name</th>
<th>Email Address</th>
</tr>
<xsl:for-each select="NewDataSet/Table">
<tr>
<td>
<xsl:value-of select="LastName" />,
<xsl:value-of select="FirstName" />
</td>
<td>
<xsl:value-of select="FirstName" />.
<xsl:value-of select="LastName" />
@company.com
</td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>



As you can see, it’s HTML, but with extra bits and pieces thrown in. These extra bits and pieces are the XSL that performs the transformations. Once you’ve added the XSLT file, add the following code. This code connects to the SQL Server database and queries the Employees table for the list of employees. It then creates a new instance of an XslCompiledTransform object. This object enables us to transform XML data using an XSLT stylesheet. It is a member of the System.Xml.Xsl namespace. The XSLT transform file we’re using is above.





IDataAccess MyDataAccess =
IDataAccess.CreateNew("ProviderType.SQLClient, " +
"Data Source=mssql01.local; " +
"Integrated Security=True; Initial Catalog=TestDB; " +
"Pooling=false;");
XmlDocument Employees =
MyDataAccess.ExecuteXML(
"SELECT [EmployeeID],[FirstName],[LastName] FROM Employees"
);


XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(Server.MapPath("XSLTFile.xslt"));

xslt.Transform(Server.MapPath("XMLFile.xml"),
Server.MapPath("books.html"));




For this to execute properly, the IIS worker process needs write permissions to the web folder. We’re outputting the transformed XML into a file for the sake of simplicity. Doing this in a production environment isn’t recommended, because the IIS worker process would be writing/overwriting the same file over and over. Write to a file with a different name and send that to the client, or send the transformation output directly to the client. We want to open it and see what it contains, and whether it contains the right stuff. It should contain something like this:




<?xml version="1.0" encoding="utf-8"?>
<html>
<body>
<table>
<tr>
<th>Name</th>
<th>Email Address</th>
</tr>
<tr>
<td>Plenderleith, Jamie</td>
<td>Jamie.Plenderleith@company.com
</td>
</tr>
<tr>
<td>Stevens, Justyna</td>
<td>Justyna.Stevens@company.com
</td>
</tr>
<tr>
<td>Bunn, Steve</td>
<td>Steve.Bunn@company.com
</td>
</tr>
</table>
</body>
</html>


You could use something similar to the preceding code in a Windows Forms application for producing reports. Just pull back some XML from a data source, transform it using XSL, and you now have a formatted report for someone to view.


More to the point, however, instead of dumping the transformed XML into a file, let’s write it directly to the screen. One of the overloads of the Transform() method of the XslCompiledTransform class allows us to write the XML directly to an XmlWriter object:





IDataAccess MyDataAccess = IDataAccess.CreateNew(
ProviderType.SQLClient, "Data Source=mssql01.local; " +
"Integrated Security=True; Initial Catalog=TestDB; " +
"Pooling=false;");

XmlDocument Employees =
MyDataAccess.ExecuteXML(
"SELECT [EmployeeID],[FirstName],[LastName] " +
"FROM Employees");

XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(Server.MapPath("XSLTFile.xslt"));

StringBuilder sb = new StringBuilder();
StringWriter sw = new StringWriter(sb);

xslt.Transform(Server.MapPath("XMLFile.xml"), null, sw);
litToDisplay.Text = sb.ToString();



This code example assumes that we have an ASP.NET Literal control on the ASP.NET WebForm called litToDisplay. The important part of the code, though, is that we have stored the transformed XML document in a StringBuilder object called sb. Once it is in the StringBuilder object, one obvious thing we could do is call ToString() on it and use the result somewhere.


So you have seen an example of performing the XSL transformation using server-side code. If you would prefer, you can use the ASP.NET Xml control to do this for you. Drop one from your toolbox onto your ASP.NET WebForm, populate its DocumentSource and TransformSource properties, and it will automatically display the transformation for you.


You can let the user’s browser do the transformation, as described next. As mentioned earlier, the user will be able to see the raw XML, as well as the transformation file. This may or may not—depending on what is in these files—be a security issue to you.





Client-Side XSLT Transformations


To allow a user’s browser to transform the XML document itself, just include the following at the top of the XML document:




<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="XSLTFile.xslt"?>


That is about as complicated as it gets! The user’s browser will download the content of the XML file, download the content of the XSLT file, and then apply the XSLT transformation to the XML file.


The problem with doing transformations in this way is that you will also have the usual HTML output for an ASP.NET WebForm, something like this:





<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>
Untitled Page
</title></head>
<body>
<form name="form1" method="post" action="Default.aspx" id="form1">
<div>
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE"
value="VGhlIHRpbWUgaGFzIGNvbWUgdGhlIHdhbHJ1cyBzYWlk" />
</div>
<div>

</div>
</form>
</body>
</html>








Note 

The __VIEWSTATE hidden input box with its cryptic value parameter is added by ASP.NET to all ASP.NET pages to maintain page state. It has nothing to do with XML/XSLT.



Therefore, if you perform a transformation, you will end up with your data in the middle of the HTML document, or at the end or start of it, depending on how you do it. What should you do if you want only to output the transformed data? Fans of the World of Warcraft game by Blizzard Entertainment likely have seen the “Armory” (http://armory.wow-europe.com, for example). For those not familiar with the game, the Armory website allows a player to see their own profile, as well as the profiles of other players in the game. It outputs a chunk of XML to the user’s browser and lets their browser pretty up the page. It’s a very interesting example of a large-scale usage of XML and XSLT. Each of the statistics pages on this website is just one small chunk of XML. You do not get this impression, though, when visiting the site, because the simple XML has been turned into beautiful web pages, adorned with images, styles, borders, headers, and so forth.


So, here is an example of what is sent to the user’s browser. If you visit the website, you’ll see something very different appear (images, backgrounds, borders, styles, fonts, JavaScript rollovers, and so forth) on your screen, because of the XSLT transformation. Following are only some small snippets from the XML data, because some of the lines are very long.





<baseStats>
<strength attack="73" base="61" block="-1" effective="83"/>
<agility armor="1494" attack="737" base="179" critHitPercent="17.14"
effective="747"/>
<stamina base="107" effective="440" health="4220" petBonus="132"/>
<intellect base="77" critHitPercent="6.83" effective="258" mana="3590"
petBonus="-1"/>
<spirit base="83" effective="101" healthRegen="16" manaRegen="75"/>
<armor base="5743" effective="5743" percent="35.23" petBonus="2010"/>
</baseStats>
<ranged>
<weaponSkill rating="0" value="350"/>
<damage dps="313.4" max="802" min="669" percent="0" speed="2.35"/>
<speed hastePercent="0.00" hasteRating="0" value="2.35"/>
<power base="912" effective="1995" increasedDps="142.0" petAt-
tack="438.90"
petSpell="256.76"/>
<hitRating increasedHitPercent="2.98" value="47"/>
<critChance percent="28.86" plusPercent="3.71" rating="82"/>
</ranged>




We’re going to do something similar with our employees listing. Granted, it’s not going to be quite as spectacular as Blizzard Entertainment’s site, but it should give you a good idea of how you would go about doing this:





IDataAccess MyDataAccess =
IDataAccess.CreateNew(ProviderType.SQLClient,
"Data Source=mssql01.local; Integrated Security=True; " +
"Initial Catalog=TestDB; Pooling=false;");
XmlDocument Employees =
MyDataAccess.ExecuteXML(
"SELECT [EmployeeID],[FirstName],[LastName] " +
"FROM Employees");

Response.Clear();
Response.ClearHeaders();
Response.ClearContent();

Response.ContentType = "text/xml";
Response.Write("<?xml version=\"1.0\" encoding=\"utf-8\" ?>");
Response.Write(
"<?xml-stylesheet type=\"text/xsl\" href=\"XSLTFile.xslt\"?>"
);
Response.Write(Employees.InnerXml);
Response.End();



The preceding code will display the transformed piece of XML in the user’s browser. What is interesting to note is that when we look at the HTML source code of the page, we see XML source code, and not the transformed and improved HTML source code. The web browser applies the XSLT transform to the XML data, and renders this HTML source code in memory.






Note 

If you were writing the preceding code in VB.NET, because of its support for inline XML, you could write it as follows:






Response.ContentType = "text/xml"
Response.Write(<?xml version="1.0" encoding="utf-8\"?>
<?xml-stylesheet type="text/xsl" href="XSLTFile.xslt"?>
<%= Employees.InnerXml %>)






Comparing Server-Side and Client-Side Transformations


We have developed a small Windows Forms application to test the performance of doing XSLT transformations on the server side and on the client side. The application will hit two different ASP.NET WebForms. Both ASP.NET WebForms pull the same set of data from the database. One ASP.NET WebForm dumps the data as XML to the user’s browser, along with an XSLT link to transform it to HTML. The other ASP.NET WebForm uses a ListView control, with a DataSet object bound to it. The application performs 20 tests against each ASP.NET WebForm and records the time it takes to run the test. Each test consists of 200 calls to the particular ASP.NET WebForm. Table 4-1 shows the results of these tests. Remember, the XSLT transformation is performed on the client side, whereas the ListView transformation is performed on the server side.

























Table 4-1: Comparison of XSLT and ListView Transformations of Data
 

XSLT



ListView



Average duration



740 ms



833 ms



Shortest duration



688 ms



766 ms



Longest duration



812 ms



906 ms



Data sent



2076 bytes



9900 bytes



As Table 4-1 demonstrates, simply giving the data to the user’s browser results in much less work for the web server to do. What’s more striking, however, is the amount of data sent to the user’s browser. Allowing a browser to perform its own transformation causes the web server to send (in this example anyway) about 79 percent less data.




























No comments:

Post a Comment