RELAX NG Implementation Cook Book: Day 2

$Id: ShortRNG.docbook,v 1.2 2001-08-16 22:16:50-07 bear Exp bear $

Abstract

On the second day, we'll extend our grammar model and implement the datatype-related parts.


Table of Contents

Datatype Implementation
Datatype Library Interface
Built-in Datatype Library
DataExp
ValueExp
Name Class
ElementExp / AttributeExp
A Little Bit of A Hidden Flavor
End of the Day

Datatype Implementation

RELAX NG has two datatype-related patterns: <data> and <value>. Our grammar model has corresponding classes: DataExp and ValueExp.

On our second day, we'll implement the code related to datatype libraries and see how it works.

Datatype Library Interface

RELAX NG supports arbitrary datatype libraries. At a minimum you have to implement the built-in datatype library, and you may also want to support (at least part of) W3C XML Schema Part 2, because many schemas use it.

So a good start is to implement a set of interfaces for datatype libraries, so that you can add the datatype implementations later.

The RELAX NG SourceForge project hosts the datatype interface set for several languages (currently Java, C#, C++ and COM). If you choose another language, you may want to mimic those interfaces (and please host your version at SourceForge for other people!).

Built-in Datatype Library

Every conforming validator must implement the built-in datatype library, which occupies the default namespace "". Fortunately, it is very easy to implement: it has only two datatypes (string and token), and neither takes any parameters. In short, any string is valid for both datatypes. The only difference is the createValue method: the string type should return the input literal as-is, and the token type should return the whitespace-normalized string.

Section 6.2.10 of the spec describes it formally.

From now on, I'll call the datatype interface Datatype.
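As a rough sketch of the built-in library described above, the following Java code assumes a hypothetical two-method Datatype interface. Only createValue is named in the text; isValid, BuiltinDatatypeLibrary, createDatatype and normalize are names made up for this illustration, and the interfaces hosted by the RELAX NG SourceForge project are richer and may be shaped differently.

```java
// Hypothetical minimal datatype interface (an assumption of this sketch).
interface Datatype {
    // Returns true if the literal is a valid lexical form of this datatype.
    boolean isValid(String literal);

    // Returns the value object used for comparison, or null if invalid.
    Object createValue(String literal);
}

// Hypothetical holder for the two built-in datatypes.
final class BuiltinDatatypeLibrary {
    // "string": every literal is valid and the value is the literal as-is.
    static final Datatype STRING = new Datatype() {
        public boolean isValid(String literal) { return true; }
        public Object createValue(String literal) { return literal; }
    };

    // "token": every literal is valid; the value is whitespace-normalized.
    static final Datatype TOKEN = new Datatype() {
        public boolean isValid(String literal) { return true; }
        public Object createValue(String literal) { return normalize(literal); }
    };

    // Looks up a built-in datatype by name; returns null for unknown names.
    static Datatype createDatatype(String name) {
        if ("string".equals(name)) return STRING;
        if ("token".equals(name)) return TOKEN;
        return null;
    }

    // Drops leading/trailing XML whitespace and collapses inner runs
    // of whitespace into a single space.
    static String normalize(String s) {
        StringBuilder sb = new StringBuilder();
        boolean pendingSpace = false;
        for (char c : s.toCharArray()) {
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                pendingSpace = sb.length() > 0;
            } else {
                if (pendingSpace) sb.append(' ');
                pendingSpace = false;
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With this sketch, BuiltinDatatypeLibrary.TOKEN.createValue("  a   b ") yields "a b", while the string type hands back the literal untouched.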

DataExp

DataExp needs to have two fields: one is a Datatype object whose purpose is obvious; the other is an Expression which represents the <except> clause. As a whole it looks like this: [1]

class DataExp : Expression
{
	Datatype	datatype;
	Expression	except;
}

ValueExp

ValueExp represents the <value> pattern. It also needs at least two fields: one is a Datatype object, which represents the type of this value pattern; the other is a generic object that is compared with the literal found in instance documents.

The notion of the "generic object" differs from language to language. For example, Java has the "Object" type. In C++, you'll probably choose "void*".

The following code illustrates ValueExp in Java.

class ValueExp extends Expression
{
	Datatype	datatype;	// the type of this value pattern
	Object	value;		// compared with the literal in the instance
}
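To make the two classes above a little more concrete, here is a hedged Java sketch that keeps them immutable by using final fields set in a constructor. The one-method Datatype interface, the empty Expression base class and the matches helper are stand-ins invented for this example. Comparing values with equals is also an assumption, since a real datatype interface may supply its own equality test.

```java
// Hypothetical stand-ins for the types used in the text.
interface Datatype {
    Object createValue(String literal); // null if the literal is invalid
}

abstract class Expression { }

// <data>: final fields make the object immutable once constructed.
final class DataExp extends Expression {
    final Datatype datatype;
    final Expression except; // the <except> clause

    DataExp(Datatype datatype, Expression except) {
        this.datatype = datatype;
        this.except = except;
    }
}

// <value>: holds the datatype and the value to compare against.
final class ValueExp extends Expression {
    final Datatype datatype;
    final Object value;

    ValueExp(Datatype datatype, Object value) {
        this.datatype = datatype;
        this.value = value;
    }

    // Hypothetical helper: true if the literal, converted by the
    // datatype, equals the stored value.
    boolean matches(String literal) {
        Object v = datatype.createValue(literal);
        return v != null && v.equals(value);
    }
}
```

The value stored in a ValueExp would normally be produced once, by calling createValue on the text of the <value> element while the schema is loaded.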

Name Class

In RELAX NG, tag names and attribute names are checked against "name classes". The main idea of the name class is that it either accepts or rejects names.

RELAX NG has four types of name class. As the base interface, the following is sufficient:

interface NameClass
{
	boolean accepts( string namespaceURI, string localName );
}

Name classes are very easy to implement. Here are some tips.

  1. AnyNameClass represents <anyName> and accepts any name. You need one child NameClass to hold the contents of the <except> clause.

  2. NsNameClass represents <nsName>. Since it matches names in a particular namespace, one string field is necessary to keep the target namespace URI. Since <nsName> can have an <except> clause, this class also needs the "except" field.

  3. SimpleNameClass represents <name>. This class should have two string fields to keep the target namespace URI and local name. You don't need the except clause for this kind of name class.

  4. ChoiceNameClass represents <choice>. For simplicity, you can restrict this class to have two child name classes and binarize it (just like we did for ChoiceExp). Or you may choose to allow an arbitrary number of children.

  5. NotAllowedNameClass, which accepts nothing. The RELAX NG surface syntax does not have this name class, but it becomes useful when you see an <anyName> or <nsName> without the <except> clause. In that case, you can treat it as

    <anyName>
      <except>
        <notAllowed/>
      </except>
    </anyName>
    
    [2]

Unlike Expression classes, there is little need to reuse name class objects, so just implement these classes and that's it. Of course, you can implement a NameClassBuilder if you feel like it.
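The five tips above can be sketched in Java roughly as follows. The class names come from the list. The field names, constructors and the shared INSTANCE constant are assumptions of this sketch, and it follows the NotAllowedNameClass approach rather than a null except field.

```java
interface NameClass {
    boolean accepts(String namespaceURI, String localName);
}

// Accepts nothing; stands in for a missing <except> clause.
final class NotAllowedNameClass implements NameClass {
    static final NameClass INSTANCE = new NotAllowedNameClass();
    public boolean accepts(String ns, String local) { return false; }
}

// <anyName>: accepts every name that the except clause rejects.
final class AnyNameClass implements NameClass {
    final NameClass except;
    AnyNameClass(NameClass except) { this.except = except; }
    public boolean accepts(String ns, String local) {
        return !except.accepts(ns, local);
    }
}

// <nsName>: accepts names in one namespace, minus the except clause.
final class NsNameClass implements NameClass {
    final String namespaceURI;
    final NameClass except;
    NsNameClass(String namespaceURI, NameClass except) {
        this.namespaceURI = namespaceURI;
        this.except = except;
    }
    public boolean accepts(String ns, String local) {
        return namespaceURI.equals(ns) && !except.accepts(ns, local);
    }
}

// <name>: accepts exactly one (namespace URI, local name) pair.
final class SimpleNameClass implements NameClass {
    final String namespaceURI;
    final String localName;
    SimpleNameClass(String namespaceURI, String localName) {
        this.namespaceURI = namespaceURI;
        this.localName = localName;
    }
    public boolean accepts(String ns, String local) {
        return namespaceURI.equals(ns) && localName.equals(local);
    }
}

// <choice>, binarized: accepts a name if either child accepts it.
final class ChoiceNameClass implements NameClass {
    final NameClass nc1;
    final NameClass nc2;
    ChoiceNameClass(NameClass nc1, NameClass nc2) {
        this.nc1 = nc1;
        this.nc2 = nc2;
    }
    public boolean accepts(String ns, String local) {
        return nc1.accepts(ns, local) || nc2.accepts(ns, local);
    }
}
```

For example, new AnyNameClass(new SimpleNameClass("", "foo")) accepts every name except the unqualified name foo.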

ElementExp / AttributeExp

By using a name class, ItemExp (the base class of ElementExp/AttributeExp) can be characterized:

class ItemExp : Expression
{
	NameClass	name;
	Expression	contentModel;
}

No additional field is necessary for ElementExp and AttributeExp.

A Little Bit of A Hidden Flavor

There are some extras that you can do so that you feel more comfortable with what you've implemented.

The first thing is the visitor design pattern support for expressions and name classes. You can implement the whole validator without them, but the visitor pattern will become valuable in the later days of this course.

The second is the expression dumper. Since an expression is organized as a tree, it is hard to tell what it is from a debugger screen. Thus a function that converts an expression tree into a human-readable string (like (A,B),C) would be very helpful.

If you've implemented the visitor support, the expression dumper can easily be implemented as a visitor of expressions like this:

class ExpDumper : ExpVisitor {
    // A choice is written as "(left|right)"; "," is kept for sequences.
    string onChoice( ChoiceExp exp ) {
        return "(" + exp.exp1.visit(this) + "|" + exp.exp2.visit(this) + ")";
    }
    ....
}

If you've skipped the visitor support, then you can just as easily implement the expression dumper as a method of Expression like this:

class ChoiceExp : BinaryExp {
    ....
    
    // A choice is written as "(left|right)"; "," is kept for sequences.
    string dump() {
        return "(" + exp1.dump() + "|" + exp2.dump() + ")";
    }
}

The above dumper produces a string with a lot of brackets. You need to sweat more if you want to remove those brackets.
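One cheap way to lose most of those brackets, sketched here in Java, is to parenthesize a binary expression only when its parent uses a different operator. LeafExp, SequenceExp, operator() and dumpAsChildOf are names invented for this example, and writing choice as "|" and sequence as "," is a presentation choice of the sketch, not something the grammar model requires.

```java
abstract class Expression {
    // Dumps this expression without surrounding parentheses.
    abstract String dump();

    // Dumps this expression as a child of a binary expression that
    // uses parentOp. Leaves never need parentheses.
    String dumpAsChildOf(String parentOp) { return dump(); }
}

// Hypothetical leaf, e.g. an element reference shown by its label.
final class LeafExp extends Expression {
    final String label;
    LeafExp(String label) { this.label = label; }
    String dump() { return label; }
}

abstract class BinaryExp extends Expression {
    final Expression exp1;
    final Expression exp2;
    BinaryExp(Expression exp1, Expression exp2) {
        this.exp1 = exp1;
        this.exp2 = exp2;
    }

    // The separator printed between the two children.
    abstract String operator();

    String dump() {
        String op = operator();
        return exp1.dumpAsChildOf(op) + op + exp2.dumpAsChildOf(op);
    }

    // Parentheses are needed only under a different operator.
    @Override
    String dumpAsChildOf(String parentOp) {
        return parentOp.equals(operator()) ? dump() : "(" + dump() + ")";
    }
}

final class ChoiceExp extends BinaryExp {
    ChoiceExp(Expression e1, Expression e2) { super(e1, e2); }
    String operator() { return "|"; }
}

final class SequenceExp extends BinaryExp {
    SequenceExp(Expression e1, Expression e2) { super(e1, e2); }
    String operator() { return ","; }
}
```

With this scheme a nested sequence such as (A,B),C is printed flat, while a choice inside a sequence keeps its parentheses so the output stays unambiguous.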

End of the Day

Your grammar model is now complete and fully functional. After all, it wasn't that hard, was it? The corresponding part of the source code of Jing is available here.

Tomorrow, we'll tackle the hardest part --- the validation engine. Stay tuned!



[1] Don't forget to make them immutable.
[2] Alternatively, you can set the except field to null to indicate that the <except> clause was not specified. If you do it this way, you don't need NotAllowedNameClass.