|
RSS XML from SQL Server 2005
7/3/2007 2:00:00 AM
Introduction
I recently read an article in a magazine about creating an RSS feed in ASP.Net. In this article, data was retrieved from SQL Server tables, then processed in .Net code to create the RSS XML file. This is all well and good, but if you could do the same thing with less processing code, and have it perform just as well, if not better, wouldn't you?
It turns out you can do it, if you are using SQL Server 2005. SQL Server had XML capabilities before, but in the 2005 version, they are expanded, allowing you to write queries that produce XML in a very flexible way, without all the explicit definitions required before.
So given that you have data about your articles in a table (and I will include categories as well) you can have SQL Server 2005 produce RSS XML for you that is ready to go and doesn't need any further processing. This article will show you how I did it.
A Little Background Info
First, here is a little bit of information about the features in SQL Server 2005 that make this possible. This will help you understand how the query works, so you can add to it, or do similar queries with other types of data. This is not in any way a complete explanation of using XML in SQL Server; I will just focus on the pieces we need. See the link on using the PATH mode in the References section for a more complete reference.
The key to all this is that SQL Server 2005 supports the XML data type. So rather than be a string data type (Varchar), XML returned in a query can be strongly typed as XML. A single column in a SELECT statement can be of type XML. And this XML column can contain any amount of XML, or in other words, any amount of data. This is important because it allows you to nest queries that return XML inside each other. This, along with the PATH mode allows you to build an XML hierarchy in a more intuitive way.
When you write a query that produces XML, you use the FOR XML clause in a SELECT statemment to tell SQL Server that you want the results returned as XML. In the FOR XML clause, you specify the mode, which tells SQL Server how to construct the XML output. Previously, if you wanted full control over the hierarchy and layout of the results, you had to use the EXPLICIT mode and write the query in a very strict, cumbersome way. Now, with PATH mode, the query is more intuitive to write and with nesting, you can build the hierarchy any way you want. The column names or aliases help determine how the data is placed in the results, either as attributes or sub-elements. This will be illustrated in the examples below.
The Table Structures
There are two tables involved, one to hold information about the articles, and one to hold categories. Articles are organized into categories, which is supported by the RSS Spec. The articles table has a foreign key that joins to the categories table to indicate the category for that article.
The articles table looks like this:
[dbo].[tblArticles](
[ArticlesID] [int] IDENTITY(1,1) NOT NULL,
[Title] [varchar](200) NULL,
[URL] [varchar](200) NULL,
[CategoriesID] [int] NULL,
[Description] [varchar](500) NULL,
[PublishDate] [smalldatetime] NULL
)
And the categories table looks like this:
[dbo].[tblCategories](
[CategoriesID] [int] IDENTITY(1,1) NOT NULL,
[Category] [varchar](50)
)
The RSS Query
Here is the complete RSS query, as written in a T-SQL stored procedure:
ALTER PROCEDURE [dbo].[usp_GetArticles_rss]
@link varchar(50),
@desc varchar(200)
AS
BEGIN
SET NOCOUNT ON;
SELECT '2.0' AS "@version",
(SELECT @desc AS title,
@link AS link,
@desc AS description,
'en-us' AS [language],
LEFT( DATENAME( dw, GetDate()), 3 ) + ', ' +
CONVERT(varchar(20),GetDate(),113) + ' GMT' AS lastBuildDate,
(SELECT Title AS title,
@link + A.URL AS link,
A.Description AS description,
C.Category AS category,
LEFT( DATENAME( dw, A.PublishDate), 3 ) + ', ' +
CONVERT(varchar(20),A.PublishDate,113) + ' GMT' AS pubDate
FROM tblarticles A INNER JOIN tblCategories C
ON A.CategoriesID = C.CategoriesID
FOR XML PATH('item'), type)
FOR XML PATH('channel'), type)
FOR XML PATH('rss')
END
GO
This query is composed of three SELECT statements; two nested inside of the third, outer statement. I'll break it down, going from outside in, to show you how each part works.
First, you need to build the outer RSS element that wraps the whole feed. This outer SQL statement provides that element.
SELECT '2.0' AS "@version"
FOR XML PATH('rss')
Results:
<rss version="2.0"/>
Notice that this statement isn't getting anything from tables. It just provides the root element with a hard coded RSS version number. In the FOR XML clause, you specify that the mode is PATH, and that the element name for each row returned (in this case, only one) is 'rss'. This creates the single XML RSS element.
In XML, data is either shown as text between XML elements, or as an attribute inside an element. In the RSS element, the version number should be shown as an attribute. To do this in T-SQL, you alias the column with a name that begins with @. ('2.0' AS "@version"). This causes the data to be placed in an attribute inside the RSS tag. The alias is double-quoted so that it won't be confused with a variable name.
Inside the RSS element is a channel element that provides some top level information about the RSS feed. This is represented by the middle SELECT statement.
SELECT @desc AS title,
@link AS link,
@desc AS description,
'en-us' AS [language],
LEFT( DATENAME( dw, GetDate()), 3 ) + ', ' +
CONVERT(varchar(20),GetDate(),113) + ' GMT' AS lastBuildDate
FOR XML PATH('channel'), type
Results:
<channel>
<title>desc</title>
<link>link</link>
<description>desc</description>
<language>en-us</language>
<lastBuildDate>Fri, 29 Jun 2007 10:57:51 GMT</lastBuildDate>
</channel>
The channel information comes from the stored procedure parameters that are passed in, but you could just as easily put it in a table. This time the data should be shown as elements, so the column aliases (without the @) become tag names within the 'channel' tag. Note that order is important; the order of the columns in the SELECT statement is the order they are arranged in the XML output. Also, the element names are output exactly as the aliases, including case (note the capital letters in lastBuildDate). The lastBuildDate is formatted to match the required RFC 822 format. Since this SELECT statement is nested in the outer one, the resulting XML is nested within the RSS element.
Remember that this nesting works because of the TYPE directive in the inner SQL statement. The entire statement returns something that is of type XML, which then can be placed as a whole unit in the outer XML. Or in other words, one column of the outer SQL statement is an XML value, which is the complete results of the inner statement.
Now you should be starting to see the power of this. You can create an XML hierarchy the way you want, no matter how the data is organized in your database. And you don't need to write any application processing code or ugly XSLT to get there.
The inner-most SQL statement is where we actually get the data for the published articles.
SELECT Title AS title,
@link + A.URL AS link,
A.Description AS description,
C.Category AS category,
LEFT( DATENAME( dw, A.PublishDate), 3 ) + ', ' +
CONVERT(varchar(20),A.PublishDate,113) + ' GMT' AS pubDate
FROM tblarticles A INNER JOIN tblCategories C
ON A.CategoriesID = C.CategoriesID
FOR XML PATH('item'), type
Results:
<item>
<title>aaaaArticle Test</title>
<link>http://vbwebdev.net/articles/aaaatest.aspx</link>
<description>article description</description>
<category>Cat1</category>
<pubDate>Mon, 01 Jan 2007 00:00:00 GMT</pubDate>
</item>
...
This statement creates 'item' elements for each row returned, with sub-elements for each column. Again, the whole result set of this statement is an XML value that is nested in the outer SQL statement. With all three parts put together, you have a complete RSS XML document.
Conclusion
The technique shown here will give you a correct, basic RSS XML feed. If you look at the RSS Spec. you will see that there are many other attributes that you can include in your RSS. You can add them to the appropriate places in the query, either passing in the values as parameters, or adding columns to the tables. You probably noticed that there are two places where a date is formatted in the RFC 822 format. I left the code in line for simplicity, but you can consider making that format code a User Defined Function to eliminate code repetition.
Your web application can call this query to retrieve the RSS. Unless your web site is constantly changing, you probably don't want to make a database call every time someone requests the RSS feed. You can cache the feed, or write it to disk periodically. Even the big players like Yahoo don't update their RSS every second.
This technique can be used to output any sort of XML from your SQL Server data. No procedural code is necessary to manipulate the XML contents in the business or data layer of your application, and even better, no writing of that powerful but ugly language, XSLT.
References
|