Ph: 20030728

The XAO of Pooh: XML Access Objects as a New Pattern for Web Development

Posted on Wednesday, July 30th, 2003 10:29 PM

Okay, I lied. One last thing before I take a break - I've been working on this post for a while and wanted to publish this idea I've had and some sample code, but haven't gotten the code done. So here's just the idea, samples later.

Instead of actually producing anything over the past few months, I've been instead trying to find the perfect system for producing web content. This is, of course, my way to procrastinate. I get to play with a bunch of different technologies, learn a bunch of stuff, write a bunch of code and at the end I actually don't have anything accomplished. However, finally, I think I may have stumbled upon the system I've been looking for and I'll explain it here.

First you have to understand my opinions about web development. Doing web dev is not brain surgery. What you're doing is simply reading data from a database and formatting to present it to a browser, maybe taking some data given to you sometimes and popping it back in the db. That's it. Everything else is just abstractions from this basic data read/write paradigm. And that's the stuff that drives me crazy: abstractions.

Russell's #1 rule for app development: For every layer that you put in between you and your data you better have a damn good reason.

However, that said, there are some good reasons out there. Slapping all your code into .jsp pages is definitely not great from a maintenance standpoint at all. My website has survived some decent size blasts from Slashdot, Wired News and averages over 750,000 hits a month so it's not a bad way of doing things from a performance standpoint (as many people on the current MVC bandwagons would lead you to believe) but JSP can quickly turn into spaghetti code. So there's one reason for putting a layer on your website: Separating out your presentation from your logic.

For this, I decided I'm just using Struts. Why? Because it has been blessed by Sun and it has an incredible amount of docs and support. There's like, what, 5 books, endless numbers of websites and constant work being done to improve it. I'm not Struts' biggest fan, but I got it to work like I wanted it to with some tweaking so I'm sticking with it. I looked at developing my own simplified MVC, at JPublish, Cocoon and various Python systems, but decided in the end that Struts did what I needed it to do and for the most part got out of my way, so I went that way. There's a lot to be said for standards and Struts is becoming the MVC standard. (Don't jaw at me about WebWork. I looked at it and it just had *too many abstractions*. Bad.)

In this process I took a serious and hard look at Cocoon. I decided that it was too heavy. Too much of a big black box whose inner workings were a complete mystery to me. And waaaay too much use of XSL. However, I *like* Cocoon's way of thinking a lot. I learned a ton by messing with it and came away with two must-haves for any future web dev I do.

First, the URL that's sent to the web server has to be completely separated from the logic and data that's returned. Cocoon provides a great mapping layer where you can define the URLs you want to respond to using RegEx expressions and it works great. This allows you to present your url like http://www.russellbeattie.com/notebook/20030728 and have it correspond to a dynamic query. There's no reason for anyone out on the web to know you're using Java or Python or who knows what because of the URL you send. You should have complete control over the URL.

Cocoon is a lot more powerful than Struts in this respect and uses a lot of server-side magic to make this happen. At first I tried to get Struts to work the same way by just hacking a servlet mapping with /* into the web.xml of my app and then from there handling everything in a custom servlet, but it's a lot of work and more processing - doing it this way, you're responsible for all the .jpgs and .gifs and .css and every other static file as well. The *other* reason is that you can't use .jsp pages with a /* mapping because eventually when you do a Forward to the .jsp page (after you've done your logic on the back end) the servlet container balks because of the cyclical reference. In other words .jsp matches /* as well... There are some workarounds for Tomcat - you can define a /*.jsp first and point it at Jasper, however it's not a portable solution and again you have to deal with all the static documents so I decided not to go that route. Instead I decided that I would map *.html and *.xml to my custom servlet. This servlet simply throws the original path into a request attribute and then forwards the request off to a Struts action. Like this:

import javax.servlet.*;
import javax.servlet.http.*;
import java.io.*;


public class URLProcessorServlet extends HttpServlet{

    public void doGet(HttpServletRequest request, HttpServletResponse response)
               throws IOException, ServletException
    {
        execute(request, response);
    }

    public void doPost(HttpServletRequest request, HttpServletResponse response)
                throws IOException, ServletException
    {
        execute(request, response);
    }

    public void execute(HttpServletRequest request, HttpServletResponse response)
                throws IOException, ServletException
    {
        request.setAttribute("originalPath", request.getServletPath());
        String redirectUrl = "/do/URLProcessor";

        RequestDispatcher requestdispatcher = request.getRequestDispatcher(redirectUrl);
        requestdispatcher.forward(request, response);
    }

}

Then the Struts action takes a look at the URL and processes it in a big if statement, forwarding the request to specific actions that do the real work before forwarding to the .jsp pages. I use a standard header on all the pages, so to make sure that someone doesn't call a .jsp page directly, I simply add this bit of JSTL at the top of the .jsp page:

<c:if test="${empty originalPath}">
        <c:redirect url="/index.html" />
</c:if>

Do you see how that works? Basically, if the .jsp page didn't get called through the URLProcessor servlet (which is actually badly named because all it does is forward the request on... but I wanted it to match up with the Action for clarity) then the "originalPath" attribute isn't set, and the page gets sent back to the front page.
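The "big if statement" in the Struts action isn't shown, so here is a rough sketch of what that dispatch step could look like in plain Java. The class name, the URL shapes and the forward names are all made up for illustration; only the idea of inspecting the saved "originalPath" value comes from the setup above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: turns the saved "originalPath" into a logical forward name. */
final class PathRouter {

    // Assumed URL shapes; the real site's patterns are not shown in the post.
    private static final Pattern DAY = Pattern.compile("^/notebook/(\\d{8})\\.html$");
    private static final Pattern POST = Pattern.compile("^/notebook/(\\d{8})-(\\d{6})\\.html$");

    /** Returns a forward name such as "index", "day:20030728" or "notFound". */
    static String resolve(String originalPath) {
        // A missing path is treated like the front page, mirroring the JSTL guard.
        if (originalPath == null || originalPath.equals("/index.html")) {
            return "index";
        }
        Matcher post = POST.matcher(originalPath);
        if (post.matches()) {
            return "post:" + post.group(1) + "-" + post.group(2);
        }
        Matcher day = DAY.matcher(originalPath);
        if (day.matches()) {
            return "day:" + day.group(1);
        }
        if (originalPath.endsWith(".xml")) {
            return "rss";
        }
        return "notFound";
    }
}
```

In an action, the result of PathRouter.resolve(...) would pick which forward to use. Keeping the matching in one small class means the URL scheme can change without touching the pages.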

So that's the first thing I got from Cocoon - the desire to have full control over the URIs that my server sends out. Having to have *.html at the end is a bit of a hack, and if you read TBL, he says it's not a future-proof URI, but it's still standard enough that most people won't even look at the URL.

The second thing I got from Cocoon is the idea that XML is *the* only way to use data. The first thing you learn in Cocoon is that you need to start with XML before you can do anything else. Step one in the Cocoon presentation process is done by using a variety of "generators" which produce valid XML. Query a DB? It has to come out as XML. Files, URLs, whatever, before you can start manipulating it with Cocoon it has to start as XML.

This is a *fantastic* way of thinking. At first I fought the concept because, well, XML is a pain in the ass. Producing valid XML from random data sources is a real bitch. But if you want to eventually produce different types of XML like WML, XHTML, XHTML-MP, SVG, etc. the only real way is to start with valid XML and go from there. With XML it's really garbage-in, garbage out so you've got to start clean.

The idea is that you have to really have bought into the idea of XML as a flexible data transport. If you think XML is just pure crap, then there's no reason to continue reading because you'll think that everything from this point on is a pointless exercise. But if you get the idea that once data is within XML that it becomes super portable and transformable and incredibly useful and worth the inconveniences, please continue.

So even after I decided not to use Cocoon, I still wanted to keep this style of programming because I think it just makes incredible sense. It passes my layer test: Yes, I'm putting a layer of XML processing in between me and the database, but the benefits of this extra layer (which I'll explain shortly) justify any drawbacks.

Happily, once I moved back to just "plain" development with .jsps, I discovered the joys of development with JSTL. It's *really* well done. Those of you with long memories might remember me almost a year ago on this blog bitching about how JSTL is horrible and ugly and a misuse of tags, but I was completely wrong. That was a reactionary opinion from someone who spent years with embedded Java in their JSP. The JSTL is done very well and is quick and easy to produce clean JSP pages. I really look forward to JSP 2.0 when the Expression Language is used throughout the page, but for now using just the tags is fine. It's still an abstraction - all the tags are doing is producing a servlet at the end, but it's an abstraction worth using as well (as long as you don't bump into the dreaded 64k class limit. Ugh.)

The absolute *best* thing about JSTL is the XML processing tags. They are INCREDIBLY powerful! I can't believe how cool they are. Using a simple import tag, you can grab your XML from anywhere (file or http request) and then use the other XML tags to loop through the XML and present it to your users. I don't really like XSLT - it's okay, but it's difficult for me to use. I mean, I do like how XSL is separate from a server implementation - an XSL stylesheet you write can be used by any XSL processor, but other than its portability, XSLT is just a bitch to use. Errors are cryptic and the results are often incredibly mysterious. Add any sort of complexity to the transform and the XSL page can be *HUGE* as well.

The one thing I like about XSL is its use of XPath. And happily, the JSTL uses it as well! AWESOME! So now you can use the power of loops and if logic on your page like you would normally do, but instead of doing it with SQL data, you're doing it with XML data queried via XPath. It really works well and the development time (when compared with XSL) is insanely quick. But hey, if you want to use XSL, JSTL supports using XSL stylesheets just as easily, so you don't have to choose one or the other. Some things will naturally be easier to do via logical loops and ifs, other stuff will be more straightforward transformations.
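For anyone who hasn't used XPath outside of XSL, here is a small sketch of the same "loop over the nodes an XPath selects" idea in plain Java, using the javax.xml.xpath API that ships with current JDKs. The document shape (document/entry/title) and the class name are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical example class; the element names are assumed. */
final class EntryTitles {

    /** Selects every /document/entry and collects its title text, in document order. */
    static List<String> titles(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList entries = (NodeList) xpath.evaluate(
                    "/document/entry",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < entries.getLength(); i++) {
                Node entry = entries.item(i);
                // Relative XPath, evaluated against the current entry node.
                result.add(xpath.evaluate("title", entry));
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate XPath", e);
        }
    }
}
```

The JSTL tags do roughly this behind the scenes: one XPath picks the node set to loop over, and relative XPaths pull values out of each node.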

So now I've come really, really close to having everything I need to quickly and easily produce modern web content via XML, but without much excess baggage. I have control over the URLs, separation of logic and presentation and presentation based on XML manipulation or transformation. The only thing left is the XML itself. And that's where XML Access Objects come in (finally). It's a revelation that just recently came to me and I'm still working out the issues, but hopefully you'll see the core of the idea and why it's so cool.

You can think of XAOs as Data Access Objects for XML. They separate the application from the source of the data (which can be a database, an xml page, or whatever), but instead of returning some sort of Map or Collection of Beans like DAOs do, these classes only produce XML.

Here's the XAO interface:

public interface XAO {
        
        public String getXML();
        
        public String getXML(String xml) throws XAOException;
        
        public void setXML(String xml) throws XAOException;
        
        public String getSchema();
        
        public String getParamSchema();

}


What this does is create a standard interface for dealing with XML data in your application, but without specifying the implementation. Don't be misled at first glance - XAOs aren't an object representation of XML or anything like that, they are simply XML generators. Once you've got that XML in your hands (generated from any source) your options are incredibly broad.

In fact, there's so many cool things you can do with XAOs I almost don't know where to start. XAOs aren't simply interfaces to XML data, they're also XML encapsulation of *any* data. Though the name is related to Data Access Objects, in use they're more like "XML Java Beans". Instead of encapsulating your data in a million varied JavaBeans, Collections and Maps, you instead use XML and reap a ton of benefits. (Maybe I should call them XEOs).

The first rule of XAOs is to keep them simple. The second is that they produce only XML Strings. The third is that they only consume XML Strings. That's it - anything more is too complex and too abstracted. (Well, I say "only", but since I've implemented them as an Interface, it's really just a rule that applies when you're working within the framework of XAO.)

Here's how I see the interface working. A getXML() query always returns a default XML document. This can be dynamically generated from a database lookup, a local file store, an http request to another server, an XQuery to an XMLDB, or if the interface is set on top of a regular JavaBean or even an EJB, it can return the contents of the bean itself as XML as well.
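To make the "bean contents as XML" case concrete, here is a minimal sketch of a XAO sitting on top of a plain set of properties. The interface is restated so the example compiles on its own; the XAOException class, the PropertiesXAO name and the bean wrapper element are assumptions, since none of them are spelled out in the post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Assumed shape: the post never shows XAOException, so this is a guess. */
class XAOException extends Exception {
    XAOException(String message) { super(message); }
}

/** The interface from the post, restated so the sketch is self-contained. */
interface XAO {
    String getXML();
    String getXML(String xml) throws XAOException;
    void setXML(String xml) throws XAOException;
    String getSchema();
    String getParamSchema();
}

/** Hypothetical XAO that exposes a fixed set of properties as XML. */
final class PropertiesXAO implements XAO {
    private final Map<String, String> properties = new LinkedHashMap<>();

    PropertiesXAO(Map<String, String> initial) { properties.putAll(initial); }

    /** Default document: one element per property (keys must be valid XML names). */
    public String getXML() {
        StringBuilder sb = new StringBuilder("<bean>");
        for (Map.Entry<String, String> e : properties.entrySet()) {
            sb.append('<').append(e.getKey()).append('>')
              .append(escape(e.getValue()))
              .append("</").append(e.getKey()).append('>');
        }
        return sb.append("</bean>").toString();
    }

    public String getXML(String xml) throws XAOException {
        throw new XAOException("this XAO takes no parameters");
    }

    public void setXML(String xml) throws XAOException {
        throw new XAOException("this XAO is read-only");
    }

    public String getSchema() { return null; }      // null: no schema published
    public String getParamSchema() { return null; }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The caller only ever sees a String of XML, so swapping this for a database-backed or HTTP-backed XAO later wouldn't change any calling code.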

But what does that XML look like? It can be any XML data you want, but you need to be able to communicate what it's supposed to look like for other classes to use the XAO, right? So that's the job for the getSchema() method which will return a valid XML Schema document which describes the XML returned from the getXML() method in detail. (If you haven't used XML Schema yet, well, neither have I... but it's the best way to mimic JavaBean-level of detail in the data returned). I honestly haven't gotten much into this yet and have added it in for completeness - but I think it's important. Another option is to return a URI or a DTD instead, but that breaks rule #2. But since I just made the rules up, this may not be a problem. :-)

This same schema is also used when you want to push data to the XAO as well with the setXML(String xml) method. The XAO is expecting a document that it knows what to do with, so it has to be the same as the document that's returned with the getXML() methods.

There are obviously many, many times when a default data lookup isn't going to be enough, and that's where the getXML(String xml) method comes in. This will allow you to pass an XML document of parameters to the getXML() method to focus your "query". What does the XML that's passed to the XAO need to look like? Well that's where the getParamSchema() method comes in. This method returns a definition of the XML that the XAO knows how to parse in the getXML(String xml) method.
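As a rough illustration of what a XAO might do with a parameter document, here is a sketch that reads one into a Map using the JDK's DOM parser. It assumes a parameters/param layout with a name attribute on each param; the class name is hypothetical, and a real XAO would presumably throw XAOException where this sketch throws IllegalArgumentException.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper for the receiving side of getXML(String xml). */
final class ParamReader {

    /** Reads every param element into a name -> value map, in document order. */
    static Map<String, String> read(String paramXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(paramXml)));
            NodeList params = doc.getElementsByTagName("param");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < params.getLength(); i++) {
                Element p = (Element) params.item(i);
                result.put(p.getAttribute("name"), p.getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            // Malformed XML ends up here; a real XAO would raise its own exception type.
            throw new IllegalArgumentException("not a usable parameter document", e);
        }
    }
}
```

With the map in hand, the XAO can decide which query to run without the caller knowing anything about the data source.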

Now if all this schema stuff makes this seem complicated, it's because it is. The caveat I've decided on to make this process the most flexible is that if the schema methods return null, then *any* xml document can be returned or set. If you send something that the XAO doesn't like, it'll just throw a XAOException which you'll need to deal with programmatically. But by implementing the XAOs using the schema methods, you can then programmatically parse and validate the XML. For many applications this level of detail might be required, otherwise it's just a data free-for-all.
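Here is one way the "null schema means anything goes" rule could be checked, sketched with the javax.xml.validation API from the JDK. The SchemaCheck class and its accepts method are made-up names; the schema string would be whatever getSchema() or getParamSchema() returns.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

/** Hypothetical helper: validates a document against a XAO's published schema. */
final class SchemaCheck {

    /** Returns true if xml is acceptable; a null schema accepts any document. */
    static boolean accepts(String schemaXsd, String xml) {
        if (schemaXsd == null) {
            return true; // no schema published, so nothing to check against
        }
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(schemaXsd)));
            // validate() throws if the document does not match the schema.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // Invalid or unreadable input: treat both as a rejection.
            return false;
        }
    }
}
```

A XAO could call something like this at the top of setXML(String xml) and throw a XAOException when it returns false.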

So here's some examples of how using XAOs can be insanely powerful. Here's what I'm doing for the next rev of the software that runs this site. First, I create a new XAO class called PostXAO which will allow me to grab my weblog posts from a MySQL db. Here's what the getXML() looks like (for example):


public String getXML(){

    StringBuffer xmlBuf = new StringBuffer("");
    String sql = "select * from miniblog where parentId = 0 order by created desc";

    Connection conn = null;
    Statement s = null;
    ResultSet rs = null;

    try {
        Context ctx = new InitialContext();
        DataSource ds = (DataSource) ctx.lookup("jdbc/RussellBeattiePooledDS");
        conn = ds.getConnection();

        s = conn.createStatement();
        rs = s.executeQuery(sql);

        xmlBuf.append("<document>\n");

        // only the 20 most recent posts go into the default document
        int count = 0;
        while(rs.next()){
            count++;
            if(count > 20){
                break;
            }
            xmlBuf.append("<entry>\n");
            xmlBuf.append("<id>" + rs.getString("id") + "</id>\n");

            String title = rs.getString("title");
            String content = rs.getString("content");

            // tidy is assumed to be a JTidy (org.w3c.tidy.Tidy) field on this class;
            // it cleans up the stored HTML and writes the result to sout
            ByteArrayOutputStream sout = new ByteArrayOutputStream();
            org.w3c.dom.Document doc = tidy.parseDOM(new StringBufferInputStream(content), sout);
            String contentClean = sout.toString();
            // escape bare ampersands, then undo any double escaping
            contentClean = contentClean.replaceAll("&","&amp;");
            contentClean = contentClean.replaceAll("&amp;amp;","&amp;");

            xmlBuf.append("<title>" + title + "</title>\n");
            xmlBuf.append("<content><![CDATA[" + contentClean + "]]></content>\n");
            xmlBuf.append("<created>" + rs.getString("created") + "</created>\n");
            xmlBuf.append("</entry>\n");
        }

        xmlBuf.append("</document>\n");

    } catch (Exception ex) {
        ex.printStackTrace();
    } finally {
        // close in reverse order, ignoring failures on cleanup
        try{
            if(rs != null) rs.close();
            if(s != null) s.close();
            if(conn != null) conn.close();
        }catch(Exception e){

        }
    }

    return xmlBuf.toString();

}

What that does - in a very ugly hand-drawn way - is produce an XML document that looks sorta like this:

<document>
        <entry>
                <id>1003490</id>
                <title>Test</title>
                <content><![CDATA[This is testcontent]]></content>
                <created>2003-07-03 04:08:24.0</created>
        </entry>
</document>
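For comparison, the same document shape can be written with an XML library instead of string concatenation. This sketch uses the StAX XMLStreamWriter bundled with current JDKs, which is just one option and not something the post itself uses; the EntryXmlWriter name and method signature are invented for the example.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical writer producing a one-entry document in the shape shown above. */
final class EntryXmlWriter {

    static String write(String id, String title, String content, String created) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("document");
            w.writeStartElement("entry");
            element(w, "id", id);
            element(w, "title", title);   // writeCharacters escapes & and < for us
            w.writeStartElement("content");
            w.writeCData(content);        // content stays as-is inside a CDATA section
            w.writeEndElement();
            element(w, "created", created);
            w.writeEndElement();          // entry
            w.writeEndElement();          // document
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return out.toString();
    }

    private static void element(XMLStreamWriter w, String name, String text)
            throws XMLStreamException {
        w.writeStartElement(name);
        w.writeCharacters(text);
        w.writeEndElement();
    }
}
```

The main win is that escaping and tag balancing become the library's problem, at the cost of a bit more ceremony than a StringBuffer.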

I think it'd be better to use one of the XML libraries like JDOM, DOM4J or even Sun's stuff for producing the XML, but many times it's just easier to hand draw it. Now, here's how a XAO is used to grab the XML in a Struts Action before passing it to the display:

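import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

public class IndexAction extends Action {

    public ActionForward execute(ActionMapping mapping,
                                 ActionForm form,
                                 HttpServletRequest request,
                                 HttpServletResponse response)
        throws IOException, ServletException {

        // grab the default XML document from the XAO
        PostXAO postXAO = new PostXAO();
        String xml = postXAO.getXML();

        // hand it to the page as a request attribute
        request.setAttribute("xml", xml);

        String forward = "index";

        return (mapping.findForward(forward));
    }
}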

What that does is instantiate a new PostXAO and grab the default XML document, which just happens to be the 20 most recent posts, stuff it into a request variable and pass it on to the index page. Once it gets to the index page, I use JSTL's XML tags to process the XML on the page for display:

<x:parse xml="${xml}" var="mainXML"/>

<x:forEach select="$mainXML/document/entry">
        <fmt:parseDate pattern="yyyy-MM-dd HH:mm:ss.S" var="created"><x:out select="created"/></fmt:parseDate>
        <c:if test="${empty latestDate}">
                <c:set var="latestDate" value="${created}"/>
        </c:if>
         <div class="post">
                <a name="<c:catch><fmt:formatDate value="${created}" pattern="HHmmss"/></c:catch>"/>
                <h2><x:out select="title" escapeXml="false"/></h2>
                <h3><c:catch><fmt:formatDate value="${created}" pattern="EEEE, MMMM dd, yyyy  h:mm a"/></c:catch></h3>
                <x:out select="content" escapeXml="false"/>
                <p class="postlinks">
                        <a href="<c:catch><fmt:formatDate value="${created}" pattern="yyyyMMdd'.html#'HHmmss"/></c:catch>">Permalink</a> |
                        <a href="<c:catch><fmt:formatDate value="${created}" pattern="yyyyMMdd-HHmmss'.html'"/></c:catch>">Comments [<x:out select="count"/>]</a>
                </p>
         </div>
         <p />
</x:forEach>

If, however, I wanted to produce an RSS document, I could write some more JSTL and create something fun, but instead I'd rather just use good old XSLT. In that case, I could just write instead:

<c:import url="/xsl/rss.xsl" var="xslt"/>

<x:transform xml="${xml}" xslt="${xslt}" />

Which would grab the rss.xsl file from my local server and apply it to the XML data stored in the ${xml} variable. (JSTL looks up the stack for values, starting with Page level variables, then to Request, then Session and then Application. Sadly, Param variables seem to be in their own world, outside this stack.)

Now this is all well and good, but it only shows a bit of what you can do with XAOs. Now that you've got the basics, you can start adding some neat things. First, let's imagine that you *hate* Struts and think it's a bloated nightmare of an app, and you just want to use XAOs from within your JSP pages - but still don't want to mess with scriptlet code. Well, here's a custom XAO tag that I'm working on now. It would instantiate a new XAO based on the XAO named and grab the XML based on the interface. Here's what I think they look like now:

<xao:getXML name="PostXAO" var="xml"/>

<xao:getXML name="PostXAO" var="xml">
        <?xml version="1.0"?>
        <parameters>
                <param name="type">getPostAndComments</param>
                <param name="id">1003768</param>
        </parameters>
</xao:getXML>



<xao:setXML name="PostXAO" value="${xmlValue}" />

<xao:setXML name="PostXAO">
        <?xml version="1.0"?>
        <test>
                <post>New Post</post>
        </test>
</xao:setXML>

I think this'll work because as I see the XAOs they don't hold state, though if they did, you'd have to mess with a method to instantiate the bean beforehand. I'll admit the tags aren't done yet (as I'm using the above method primarily) but I'm working on them. I've written the params that are passed to the XAOs in longhand, but I'm also working on a XAOHelper object that will take basic Java objects and transform them into standard parameter xml files. XAOHelper.getParames(Map map) for example, will produce XML that looks similar to the parameter XML passed to the xao:getXML tag above.
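Since the helper isn't written yet, here is only a guess at what the Map-to-parameters step could look like. The class and method names below are placeholders (not the eventual XAOHelper API), and the output layout simply copies the parameters/param shape used in the tag example above.

```java
import java.util.Map;

/** Placeholder sketch of a Map -> parameter XML helper; names are hypothetical. */
final class XAOHelperSketch {

    /** Builds a parameters document with one param element per map entry. */
    static String getParams(Map<String, String> map) {
        StringBuilder sb = new StringBuilder("<parameters>");
        for (Map.Entry<String, String> e : map.entrySet()) {
            sb.append("<param name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</param>");
        }
        return sb.append("</parameters>").toString();
    }

    // Escapes the characters that would break element or attribute content.
    private static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

Passing a LinkedHashMap keeps the params in insertion order, which matters if the parameter XML doubles as a cache key.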

Another neat extension of basic XAOs is a caching layer. If you're concerned about how XAOs are going to hold up under a massive number of hits - because processing XML does take time - then you might want to cache the data that's being returned. You can easily in your JSTL pages cache the results via OSCache or Jakarta's Taglib Cache tags. Or - because XAOs *always* return XML, you can cache the results at the source instead.

What I've done is copied much of the code from Jakarta's Cache taglibs into another singleton class I've called XAOCache. Now when I instantiate a XAO, it grabs a reference to the main XAOCache and uses it to improve performance like this:


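public String getXML(String xml){

        boolean exp = xaoCache.expired(xml);

        if(exp){
                String returnXML = slowHttpGrabber.getAllMyShit();
                // store the fresh result so the next call is a cache hit
                // (putXML is a placeholder name for whatever XAOCache exposes)
                xaoCache.putXML(xml, returnXML);
                return returnXML;
        } else {
                return xaoCache.getXML(xml);
        }

}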


I use the XML parameters that are passed to the XAO as the cache key. That's all I've gotten working so far, but the idea that I'm trying to work on is making the XAOCache centrally controlled so that you can set expiration times dynamically from a central properties file. This allows web designers not to worry their pretty little heads about performance and allows the back-end developers to have more control over the data. Since the data that was usually spread all over the place in various beans and collections is now encapsulated in XML, you can control it a lot easier from both a macro and micro standpoint. Remember, the XAO cache can sit in front of *any* sort of data query that you can use the XAO for, including fun things like XSL transforms.
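The XAOCache internals aren't shown, so the following is a from-scratch sketch of an expiring cache keyed by the parameter XML, not the Jakarta-derived code described above. The ExpiringXmlCache name, the single time-to-live setting and the injectable clock are all assumptions made to keep the example small and testable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical stand-in for a XAO cache: key is the parameter XML, value the result XML. */
final class ExpiringXmlCache {

    private static final class Entry {
        final String xml;
        final long storedAt;
        Entry(String xml, long storedAt) { this.xml = xml; this.storedAt = storedAt; }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    ExpiringXmlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached XML for key, calling loader only when missing or stale. */
    String get(String key, Supplier<String> loader) {
        long now = clock.getAsLong();
        Entry e = entries.get(key);
        if (e == null || now - e.storedAt >= ttlMillis) {
            // Two threads may both reload at once; acceptable for a sketch.
            e = new Entry(loader.get(), now);
            entries.put(key, e);
        }
        return e.xml;
    }
}
```

Inside a XAO this would wrap the slow lookup, something like cache.get(xml, () -> fetchSlowly(xml)), where fetchSlowly stands for whatever the XAO normally does. Reading the time-to-live from a properties file would give the central control mentioned above.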

And that's the final thought. Instead of passing the XML down to the JSP page to do the transform like in the example above, you can use XAOs as cacheable "transform objects" as well. The designer still does their work on the XSL style sheet, just the control of that transform is pulled up to the back-end, instead of being on the web page. Combined with custom XAO tags, this could be a way to improve XSLT performance across a website. This is how it would work: using the getXML(String xml) method, instead of passing an XML document with parameters, you instead pass the XML document that you want to transform. It would look something like this in a Struts Action before it's passed to the JSP page:


        TransformXAO transformXAO = new TransformXAO();
        PostXAO postXAO = new PostXAO();
        
        String newXML = transformXAO.getXML(postXAO.getXML());


You can see that you could do this sort of encapsulation indefinitely. And with proper caching on the server side, it might even be a reasonable thing to do based on your app. For example, you could pass off much of the client-capabilities (browser, mobiles, Palm, etc.) detection to another library, then return a XAO that best transforms your data based on the results of that analysis. Lots of things like that.
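TransformXAO itself isn't shown, so here is a bare-bones sketch of the "XML in, transformed XML out" step using the javax.xml.transform API from the JDK. The XsltStep class is a hypothetical stand-in and takes the stylesheet as a String purely to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical stand-in for a transform-style XAO. */
final class XsltStep {

    private final String stylesheet;

    XsltStep(String stylesheet) {
        this.stylesheet = stylesheet;
    }

    /** Same shape as getXML(String xml): takes a document, returns the transform result. */
    String getXML(String xml) {
        try {
            // Compiled on every call for simplicity; a real version could reuse a Templates object.
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transform failed", e);
        }
    }
}
```

Because input and output are both Strings, steps like this chain naturally, and each one is a candidate for caching keyed on its input.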

Okay, that's all I've done so far. As I play with this idea more, I'll post more thoughts. Your comments welcome (though I'll probably be reading them from my phone on vacation. :-) )

-Russ

