Defining logical relationships between documents, schemata, URIs, resources and entities

Jonathan Borden, The Open Healthcare Group, May 4, 2001

This paper forms the foundation for a schema-independent type framework. The relationships between URIs, Resources and Entities are formally defined. XML Namespaces are defined using tuples.

We define a schema generically through a validity predicate, which tests an instance with respect to a schema and thereby defines the set of Instances of that schema.

[1] valid(a,A) := a document a is valid with respect to a schema A

The predicate valid(a,A) is true for all documents a which are valid with respect to a schema A. Validity with respect to a particular schema is defined by the particular schema specification.

[2] Instances(A) := for all a valid(a,A) => a in Instances(A)

Instances(A) is the set of all documents valid with respect to a schema A.
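Definitions [1] and [2] can be illustrated with a minimal Python sketch. It assumes that a schema is modelled by its validity predicate and that the universe of candidate documents is finite. The names `instances`, `has_title` and `UNIVERSE`, and the sample documents, are hypothetical and not part of this paper.

```python
from typing import Callable, FrozenSet, Iterable

# Assumption: a document is a plain string and a schema is modelled
# by its validity predicate.
Document = str
Schema = Callable[[Document], bool]


def valid(a: Document, A: Schema) -> bool:
    """[1] valid(a,A): document a is valid with respect to schema A."""
    return A(a)


def instances(A: Schema, universe: Iterable[Document]) -> FrozenSet[Document]:
    """[2] Instances(A), restricted to a finite universe of candidates."""
    return frozenset(a for a in universe if valid(a, A))


def has_title(doc: Document) -> bool:
    """Hypothetical schema: a document is valid if it contains a <title> tag."""
    return "<title>" in doc


UNIVERSE = [
    "<doc><title>x</title></doc>",
    "<doc/>",
    "<doc><title>y</title><p/></doc>",
]
TITLED = instances(has_title, UNIVERSE)
```

A real validator (DTD, XML Schema, TREX) would replace `has_title`; the set construction stays the same.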

Definition: schema equality

[3] equal(A,B) := Instances(A) = Instances(B)

The schemata A and B are equal if the set of documents valid with respect to A is equal to the set of documents valid with respect to B. Note that the schemata may have different specifications (e.g. DTD vs. XML Schema). This predicate provides a way to test for equality of schemata having different specifications (e.g. an XML Schema equal to a TREX schema).
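As a hedged illustration of [3], the sketch below compares two schemata that are specified differently (one as a regular expression, one procedurally) but accept the same documents. Comparing over a finite sample only approximates the definition, which ranges over all documents. All names and the sample universe are assumptions made for the example.

```python
import re
from typing import Callable, FrozenSet, Iterable

Schema = Callable[[str], bool]


def instances(A: Schema, universe: Iterable[str]) -> FrozenSet[str]:
    """Instances(A) over a finite universe of candidate documents."""
    return frozenset(a for a in universe if A(a))


def equal(A: Schema, B: Schema, universe: Iterable[str]) -> bool:
    """[3] equal(A,B), checked only over the given finite universe."""
    docs = list(universe)
    return instances(A, docs) == instances(B, docs)


_DIGITS = re.compile(r"[0-9]+")


def regex_schema(doc: str) -> bool:
    """Hypothetical schema specified as a regular expression."""
    return _DIGITS.fullmatch(doc) is not None


def procedural_schema(doc: str) -> bool:
    """The same language, specified procedurally."""
    return len(doc) > 0 and all(ch in "0123456789" for ch in doc)


def short_schema(doc: str) -> bool:
    """A different language: digit strings of at most two characters."""
    return regex_schema(doc) and len(doc) <= 2


UNIVERSE = ["", "7", "42", "123", "4a", "abc"]
```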

Definition: schema equivalence

[4] equivalent(A,B) <=> There exists <t,t'> such that dataPreserving(t) and for all a in Instances(A) there exists b in Instances(B) such that b = t(a) and a = t'(b) and for all b in Instances(B) there exists a in Instances(A) such that b = t(a) and a = t'(b).

Two schemata are equivalent if there exists a pair of transforms <t,t'>, with t data preserving, capable of transforming instances of A into instances of B and vice versa.

[4a] dataPreserving(t) => for all d Data(d) = Data(t(d))

[4d] Data(d) := for all n in Nodes(d) text(n) or attribute(n) => value(n) in Data(d)
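The following sketch shows one possible reading of Data(d) and dataPreserving(t) for XML documents, using Python's standard `xml.etree.ElementTree`. It assumes that whitespace-only text is not data and that Data(d) is a set (so repeated values collapse). The transforms `t` and `t_inv`, which move values between attributes and child elements, are hypothetical examples of a pair <t,t'>.

```python
import xml.etree.ElementTree as ET
from typing import Callable, FrozenSet, Iterable


def data(xml_text: str) -> FrozenSet[str]:
    """Data(d): values of all text nodes and attributes in the document."""
    values = set()
    for elem in ET.fromstring(xml_text).iter():
        values.update(elem.attrib.values())
        for text in (elem.text, elem.tail):
            # Assumption: whitespace-only text carries no data.
            if text and text.strip():
                values.add(text.strip())
    return frozenset(values)


def data_preserving(t: Callable[[str], str], docs: Iterable[str]) -> bool:
    """dataPreserving(t), checked only over the given sample documents."""
    return all(data(d) == data(t(d)) for d in docs)


def t(a: str) -> str:
    """Hypothetical transform: turn each root attribute into a child element."""
    src = ET.fromstring(a)
    dst = ET.Element(src.tag)
    for key, value in src.attrib.items():
        ET.SubElement(dst, key).text = value
    return ET.tostring(dst, encoding="unicode")


def t_inv(b: str) -> str:
    """Hypothetical inverse: turn each child element back into an attribute."""
    src = ET.fromstring(b)
    dst = ET.Element(src.tag, {child.tag: child.text or "" for child in src})
    return ET.tostring(dst, encoding="unicode")


def same_attributes(x: str, y: str) -> bool:
    """Compare two attribute-form documents independent of serialization."""
    ex, ey = ET.fromstring(x), ET.fromstring(y)
    return ex.tag == ey.tag and ex.attrib == ey.attrib


SAMPLE = '<person name="Ada" born="1815"/>'
```

The round trip is compared after re-parsing rather than as raw strings, because a serializer may write the same document with different spacing.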

We move on to define a generic type hierarchy built on membership in schema instance sets.

Definition: schema restriction

[5] restriction(A,B) := Instances(A) < Instances(B)

Definition: schema extension

[6] extension(A,B) := restriction(B,A)

Extension is the inverse of restriction.

Definition: Class subType restriction

[7] typeOf(x,c) := x in Instances(c)

[7a] restriction(c,cr) <=> Instances(c) < Instances(cr)

A class may be a subType of another class. There are two types of subType class relationships: extension and restriction. In the restriction subType relationship (derivation by restriction), the set of instances of a subClass is a proper subset (written <) of the set of instances of the parent class.

Definition: Class subType extension

[8] extension(c,ce) <=> Instances(ce) < Instances(c)

[9] subClassOf(cs,c) <=> Instances(cs) <= Instances(c)
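Definitions [5] through [9] map directly onto set comparison. The sketch below assumes each schema or class is represented by its (finite) instance set, and uses Python's `<` and `<=` operators on `frozenset`, which test proper subset and subset respectively. The sample sets are hypothetical.

```python
from typing import FrozenSet

InstanceSet = FrozenSet[str]


def type_of(x: str, c: InstanceSet) -> bool:
    """[7] x is of type c if x is a member of Instances(c)."""
    return x in c


def restriction(a: InstanceSet, b: InstanceSet) -> bool:
    """[5] restriction(A,B): Instances(A) is a proper subset of Instances(B)."""
    return a < b


def extension(a: InstanceSet, b: InstanceSet) -> bool:
    """[6] extension(A,B) := restriction(B,A)."""
    return restriction(b, a)


def sub_class_of(cs: InstanceSet, c: InstanceSet) -> bool:
    """[9] subClassOf(cs,c): Instances(cs) is a subset of Instances(c)."""
    return cs <= c


# Hypothetical instance sets for two schemata.
ANY_NUMBER: InstanceSet = frozenset({"1", "2", "3", "4"})
EVEN_NUMBER: InstanceSet = frozenset({"2", "4"})
```

Note that `sub_class_of` holds for a class and itself, whereas `restriction` does not, since it requires a proper subset.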

URIs, Resources and Entities

In this section we define a simple set of relationships between URIs, Resources and Entities. A Resource is defined in RFC 2396 as the conceptual mapping of a URI. A URI may be resolved into an entity which represents the resource at a particular point in time. A URI is thus mapped to a set of entities which may vary over time and/or with the conditions under which the entity is retrieved given the URI (e.g. content negotiation).

Definition: URI Resource Equivalence

[10] equivalent(URIa,URIb) := Entities(URIa) = Entities(URIb) and cardinality(Entities(URIa)) > 0

Two URIs are equivalent when they map to the same non-empty set of entities.

[11] equivalent(A,B) <=> exists URIa such that A = resource(URIa) and exists URIb such that B = resource(URIb) and equivalent(URIa,URIb)

Two resources A and B are equivalent if the sets of entities given by URIa and URIb are equal, where URIa identifies A and URIb identifies B.
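A minimal sketch of [10], assuming the mapping from URIs to entity sets is available as an in-memory table (in practice it would be observed by dereferencing the URI over time and under different request conditions). The table `ENTITIES`, its URIs and its entity bodies are hypothetical.

```python
from typing import Dict, FrozenSet

# Assumption: Entities(URI) is modelled as a lookup table of entity bodies.
ENTITIES: Dict[str, FrozenSet[str]] = {
    "http://example.org/a": frozenset({"<html>hello</html>", "hello"}),
    "http://example.org/b": frozenset({"<html>hello</html>", "hello"}),
    "http://example.org/c": frozenset({"goodbye"}),
    "http://example.org/empty1": frozenset(),
    "http://example.org/empty2": frozenset(),
}


def entities(uri: str) -> FrozenSet[str]:
    """Entities(URI): the set of entities the URI has been resolved into."""
    return ENTITIES.get(uri, frozenset())


def equivalent_uris(uri_a: str, uri_b: str) -> bool:
    """[10] Equal entity sets, and the set must not be empty."""
    ea = entities(uri_a)
    return len(ea) > 0 and ea == entities(uri_b)
```

The cardinality condition matters: two URIs that resolve to nothing have equal (empty) entity sets but are not considered equivalent.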

An issue arises given the mapping of a URI to a set of entities and of a URI reference to a particular node within an entity. Some usages of the term resource do not distinguish between the entity retrieved from resolution of a URI and the node obtained from resolution of the URI and a fragment identifier (together a URI reference).

[13] QName := <URIreference,localname>

[14] URI(QName) := URIPart(URIReference(QName))+'#'+localname(QName)

A namespace qualified name (QName) is a pair consisting of a namespace URI reference and a localname. The URI reference corresponding to a QName is formed by composing the URI part of the namespace URI reference with the localname as a fragment identifier.
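Definitions [13] and [14] can be sketched as follows, using `urllib.parse.urldefrag` from the Python standard library to obtain the URI part of the namespace URI reference. The class and function names, and the second sample namespace, are assumptions made for illustration.

```python
from typing import NamedTuple
from urllib.parse import urldefrag


class QName(NamedTuple):
    """[13] QName := <URIreference,localname>."""
    uri_reference: str
    localname: str


def qname_to_uri(qname: QName) -> str:
    """[14] URI part of the namespace URI reference + '#' + localname."""
    # urldefrag splits off any existing fragment identifier.
    uri_part, _fragment = urldefrag(qname.uri_reference)
    return uri_part + "#" + qname.localname


XHTML_P = QName("http://www.w3.org/1999/xhtml", "p")
```

Because only the URI part is used, a namespace URI reference that already ends in `#` (or carries a fragment) yields the same result as one without it.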

Definition: XML Namespace

[15] resourceDescription(id) := <title,nature,purpose,lang,href>

[16] Namespace(URI) := for each id in Ids(Entities(URI)) => resourceDescription(id) in Namespace(URI)

Note: Is this correct syntax to express "the set of resource descriptions identified by the set of ids in the entity obtained from the namespace URI"?

According to the RDDL specification a namespace is formally defined as a set of tuples, each of which defines a resource description. A resource description has an id, a title, a nature, a purpose, a language and refers to a URI which identifies the resource being described.
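One way to read [15] and [16] is as a set comprehension over the ids found in the entities of the namespace URI. The sketch below assumes each entity has already been parsed into a mapping from id to resource description; it does not parse RDDL itself. The namespace URI, ids and field values are hypothetical placeholders.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class ResourceDescription(NamedTuple):
    """[15] resourceDescription(id) := <title,nature,purpose,lang,href>."""
    title: str
    nature: str
    purpose: str
    lang: str
    href: str


# Assumption: each entity is pre-parsed into {id: resource description}.
Entity = Dict[str, ResourceDescription]

ENTITIES: Dict[str, List[Entity]] = {
    "http://example.org/ns": [
        {
            "schema": ResourceDescription(
                "Example schema", "http://example.org/nature/schema",
                "http://example.org/purpose/validation", "en",
                "http://example.org/ns/schema.xsd"),
            "docs": ResourceDescription(
                "Example documentation", "http://example.org/nature/html",
                "http://example.org/purpose/reference", "en",
                "http://example.org/ns/docs.html"),
        }
    ]
}


def namespace(uri: str) -> FrozenSet[ResourceDescription]:
    """[16] The resource descriptions identified by the ids in Entities(URI)."""
    return frozenset(
        description
        for entity in ENTITIES.get(uri, [])
        for description in entity.values()
    )
```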

Definition: Hierarchical URIs

[17] hierarchical(URI) => Exists Children(URI) such that for each uri in Children(URI) => startsWithEquivalent(URI,uri)

A hierarchical URI has a set of child URIs, each of which starts with a URI prefix equivalent to the parent URI.
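The sketch below illustrates [17] under a simplifying assumption: "starts with an equivalent prefix" is approximated by plain string comparison at a path-segment boundary, rather than by the entity-based URI equivalence of [10]. It also collects every descendant found in a list of known URIs, not only direct children. The function names and sample URIs are hypothetical.

```python
from typing import FrozenSet, Iterable


def starts_with_equivalent(parent: str, child: str) -> bool:
    """Assumption: prefix equivalence approximated by string comparison."""
    prefix = parent if parent.endswith("/") else parent + "/"
    # Require something after the prefix so a URI is not its own child.
    return child.startswith(prefix) and len(child) > len(prefix)


def children(parent: str, candidates: Iterable[str]) -> FrozenSet[str]:
    """Children(URI), restricted to a known list of candidate URIs."""
    return frozenset(u for u in candidates if starts_with_equivalent(parent, u))


def has_children(parent: str, candidates: Iterable[str]) -> bool:
    """True when at least one child URI is found among the candidates."""
    return len(children(parent, candidates)) > 0


KNOWN = [
    "http://example.org/docs",
    "http://example.org/docs/a",
    "http://example.org/docs/a/b",
    "http://example.org/docsets/x",
    "http://example.org/other",
]
```

Matching at a segment boundary keeps `http://example.org/docsets/x` from being treated as a child of `http://example.org/docs`.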

The next section describes the relationships between URIs, fragment identifiers, and what they identify. A URI identifies a resource. A URI is rendered, at various points in time and under various conditions such as content negotiation, into a set of entities. A rendered entity is typically associated with a MIME media type which defines the document format. Formats are typically specified using grammars such as EBNF. Generally a grammar defines a parse tree or directed labelled graph in which an entity defines a set of related nodes. In the absence of a well-defined logical structure, an entity transferred over a network as a stream of characters can be represented as a root node and a series of ordered child nodes, one for each character. Generically, a document is represented as a set of nodes.

Definition: Node Identifier (Fragment Identifier)

[18] node(id,e) := for all e in Entities(URI) node(id,e) => id in Ids(e)

Every identifier id in the set of identifiers of an entity (Ids(e)) identifies a node.

[19] rootNode(e) := node("",e)

[20] AbstractNodeSet(URIref) := for all e in Entities(uriPart(URIref)) exists n = node(fragmentPart(URIref),e) => n in AbstractNodeSet(URIref)

A URI reference is defined to identify an abstract node. The node is termed abstract because a URI identifies a single abstract resource yet references a set of entities. For each entity in this set, the fragment identifier identifies a single node; hence the abstract node is instantiated as this set of concrete nodes. Just as a URI indicates a single resource and a set of entities, a URI reference indicates a single abstract node and a set of nodes. The relationships between URI, URI reference, Resource, Entity, Abstract Node and Node are represented by the following table:

URI         URI reference
Resource    Abstract Node
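Definitions [18] through [20] can be sketched by modelling each entity as a mapping from identifier to node, with the empty identifier naming the root node. The entity table, the URI and the node contents are hypothetical; the sketch only shows how one URI reference yields a set of concrete nodes, one per entity that contains the identifier.

```python
from typing import Dict, FrozenSet, List, Optional
from urllib.parse import urldefrag

# Assumption: an entity is modelled as {id: node}; "" identifies the root.
Entity = Dict[str, str]

ENTITIES: Dict[str, List[Entity]] = {
    "http://example.org/doc": [
        {"": "html rendition", "intro": "<p id='intro'>Hello</p>"},
        {"": "text rendition", "intro": "Hello"},
        {"": "summary rendition"},  # this entity has no 'intro' node
    ]
}


def node(identifier: str, entity: Entity) -> Optional[str]:
    """[18] The node identified by id within entity e, if id is in Ids(e)."""
    return entity.get(identifier)


def root_node(entity: Entity) -> Optional[str]:
    """[19] rootNode(e) := node("", e)."""
    return node("", entity)


def abstract_node_set(uri_ref: str) -> FrozenSet[str]:
    """[20] Concrete nodes that instantiate the abstract node of a URI reference."""
    uri_part, fragment = urldefrag(uri_ref)
    return frozenset(
        entity[fragment]
        for entity in ENTITIES.get(uri_part, [])
        if fragment in entity
    )
```

A URI reference without a fragment identifier selects the root node of every entity, which matches the reading of [19].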

Definition: Class Identification

[21] node(URIref,S) <=> exists e in Entities(uriPart(URIref)) such that e in Instances(S) and exists n = node(fragmentPart(URIref),e)

A node may be subclassed with respect to a schema.
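A hedged reading of [21]: a URI reference identifies a node of class S when at least one entity of the URI is an instance of S and contains the node named by the fragment identifier. The sketch assumes entities carry a format label and a node table, and that schemata are predicates over entities. All names and data are hypothetical.

```python
from typing import Callable, Dict, List
from urllib.parse import urldefrag

# Assumption: an entity records its format and its {id: node} table.
Entity = Dict[str, object]
Schema = Callable[[Entity], bool]

ENTITIES: Dict[str, List[Entity]] = {
    "http://example.org/doc": [
        {"format": "html", "nodes": {"": "html root", "intro": "<p>Hello</p>"}},
        {"format": "text", "nodes": {"": "text root"}},
    ]
}


def is_html(entity: Entity) -> bool:
    """Hypothetical schema S: the entity is an HTML document."""
    return entity["format"] == "html"


def is_text(entity: Entity) -> bool:
    """Hypothetical schema S: the entity is a plain-text document."""
    return entity["format"] == "text"


def node_of_class(uri_ref: str, schema: Schema) -> bool:
    """[21] Some entity is an instance of S and contains the identified node."""
    uri_part, fragment = urldefrag(uri_ref)
    return any(
        schema(entity) and fragment in entity["nodes"]
        for entity in ENTITIES.get(uri_part, [])
    )
```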