Chris Lovett wrote a wonderful .NET library called SGMLReader which, when
passed a regular HTML file, will spit out XHTML. The library relies on
SGML parsing and uses a DTD (Document Type Definition) file to parse
unformatted HTML.
I've been playing with Chris's library this
week and trying to convert HTML to XHTML on the fly using an ASP.NET
Http Filter. The code below consists of the following:
- An HttpModule, which sets the filter property of the response object for all ASPX page requests.
- An Http filter, which is a custom stream, to manipulate the HTML content for the requested page.
- The entry in the web.config file, required for the module to operate.
So, how does the code work?
To understand my code, you should know a little about how Http modules
work in ASP.NET. I am not going to broach this subject in this post,
but you can find out all you need to know about Http modules on
MSDN.
My module code intercepts each incoming web request sent through
IIS, checks to see if an ASPX page has been requested, and if so, sets
the filter property of the response object to a new instance of
XhtmlFilter.
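A minimal sketch of a module along these lines is shown here. It is not the original listing: the choice of the BeginRequest event, the ".aspx" extension check, and the two-argument XhtmlFilter constructor (downstream stream plus response encoding) are all assumptions made for illustration.

```csharp
using System;
using System.Web;

// Illustrative sketch only: attaches an XhtmlFilter to responses for .aspx requests.
public class XhtmlFilterModule : IHttpModule
{
    public void Init(HttpApplication application)
    {
        // BeginRequest fires for every request routed through ASP.NET.
        application.BeginRequest += OnBeginRequest;
    }

    private static void OnBeginRequest(object sender, EventArgs e)
    {
        HttpContext context = ((HttpApplication)sender).Context;
        HttpResponse response = context.Response;

        // Only page requests are filtered; everything else passes through untouched.
        string path = context.Request.FilePath;
        if (path.EndsWith(".aspx", StringComparison.OrdinalIgnoreCase))
        {
            // Wrap the current filter so converted output is forwarded to it.
            // The XhtmlFilter constructor signature is assumed, not the author's.
            response.Filter = new XhtmlFilter(response.Filter, response.ContentEncoding);
        }
    }

    public void Dispose()
    {
        // Nothing to release in this sketch.
    }
}
```

Reading response.Filter before assigning it keeps the existing filter in the chain, so the custom stream can hand its output on rather than replacing the default sink.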
When ASP.NET is ready to push processed HTML content back to the client
browser, via IIS, it uses the filter property in the response
object. The default filter used by ASP.NET is
System.Web.HttpResponseStreamFilterSink. However, it is possible
to assign a custom filter to the filter property, which will perform some
additional processing before forwarding the content to the default
filter. This is exactly what my filtering code does. My XhtmlFilter
class is a stream-derived class, which captures incoming HTML data from
ASP.NET, converts it to XHTML using SGMLReader, and then forwards the
changed content to the default filter.
The XhtmlFilter class inherits from a base class, HttpFilterBase, which
abstracts away the stream functions not supported by Http filters. Inside
my filter class I make use of a MemoryStream object to capture all the
HTML content pushed by the framework before processing the content with
SGMLReader. As SGMLReader parses the HTML data, the converted XHTML is
streamed out to the default filter, and out to the client browser.
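The buffering-then-converting pattern described above could be sketched roughly as follows. This is a reconstruction under assumptions, not the original code: doing the conversion in Close(), the constructor parameters, and the Sink property are my own choices, and the SgmlReader members used (DocType, CaseFolding, InputStream, EOF) are taken from the library's commonly published samples and should be checked against the version you have.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using Sgml; // namespace of the SgmlReader library

// Response filters are write-only, so the base class rejects reads and seeks.
public abstract class HttpFilterBase : Stream
{
    private readonly Stream sink;

    protected HttpFilterBase(Stream sink) { this.sink = sink; }

    // The downstream (default) filter that receives the final output.
    protected Stream Sink { get { return sink; } }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Flush() { sink.Flush(); }
}

public class XhtmlFilter : HttpFilterBase
{
    private readonly MemoryStream buffer = new MemoryStream();
    private readonly Encoding encoding;
    private bool closed;

    public XhtmlFilter(Stream sink, Encoding encoding) : base(sink)
    {
        this.encoding = encoding;
    }

    // ASP.NET may call Write many times; each chunk is only buffered here.
    public override void Write(byte[] data, int offset, int count)
    {
        buffer.Write(data, offset, count);
    }

    // Once the page is complete, parse the buffered HTML and stream XHTML downstream.
    public override void Close()
    {
        if (closed) return;
        closed = true;

        buffer.Position = 0;
        SgmlReader sgml = new SgmlReader();
        sgml.DocType = "HTML";                  // use the library's HTML DTD
        sgml.CaseFolding = CaseFolding.ToLower; // XHTML element names are lowercase
        sgml.InputStream = new StreamReader(buffer, encoding);

        // The writer may emit a byte-order mark for encodings that define one.
        XmlTextWriter writer = new XmlTextWriter(Sink, encoding);
        sgml.Read();
        while (!sgml.EOF)
        {
            writer.WriteNode(sgml, true);
        }
        writer.Flush();

        Sink.Close();
        base.Close();
    }
}
```

Buffering the whole page first costs memory in proportion to the page size, but it gives the parser a complete document, which matters because ASP.NET can split tags across Write calls.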
The code follows....
XhtmlFilterModule.cs
XhtmlFilter.cs
HttpFilterBase.cs
Module entry in web.config
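For reference, a module registration in the classic ASP.NET pipeline generally has the shape below. The namespace and assembly names (MyWebApp) are placeholders rather than the names used in the original project.

```xml
<configuration>
  <system.web>
    <httpModules>
      <!-- type is "Namespace.ClassName, AssemblyName"; both names here are placeholders -->
      <add name="XhtmlFilterModule" type="MyWebApp.XhtmlFilterModule, MyWebApp" />
    </httpModules>
  </system.web>
</configuration>
```

Under the IIS 7 integrated pipeline the equivalent entry goes in the modules section of system.webServer instead.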