Problems with Dynamically Assembled Document Portions, and Some Solutions

Steven J. DeRose

Chief Scientist

Inso Corporation
Electronic Publishing Solutions
One Richmond Square
Providence, Rhode Island 02906
USA

Christopher R. Maden

Senior Tools Specialist

O'Reilly & Associates
90 Sherman Street
Cambridge, Massachusetts 02140
USA

http://www.oreilly.com/people/staff/crism/

Abstract

The SGML community has had increasing interest in the capability of assembling document displays from multiple referenced parts. Elements in such documents reference other elements or objects that are retrieved on the fly and transparently displayed much as if their content had actually occurred inline. This is commonly called ``dynamic document assembly'' or ``boilerplating'' and traces back to the older notion of transclusion (dynamic data inclusion). Transclusion has previously been very theoretical, but is becoming less so daily, especially with the coming use of XML. This presentation looks at reasons for transclusion, possible problems, and some proposed solutions.

Introduction

Most text processing systems provide a way to ``include'' parts of one document, or independent fragments, into another. The methods, however, vary widely. Many systems provide a ``mail merge'' that can embed small parts repeatedly, or a raw copy-when-referenced facility like most programming languages, for pulling in prerequisite files, macro packages, and so on. This is very useful, for a variety of basic reasons:

Economy

Avoiding duplication of data that is often re-used, such as abbreviations or stock phrases.

Reliability

Ensuring that such duplications stay the same after editing, by ensuring that there really is only one ``normative'' copy.

Dynamic update

Always seeing the latest copy of referenced data, even though it may be getting updated on an entirely different schedule than the referencing data, or even created on-the-fly (such as a stock quote or inventory count).

This mechanism has been around since long before SGML, having been standard equipment in programming and macro languages for ages. It was also prominent in the pioneering hypertext and hypermedia systems developed in the late 1960s:

More recently we have seen it in less dynamic forms; what we now call ``dynamic document assembly'' is simply a great deal of transclusion, except that it is typically done either as a batch process, or by a heavy server assembling things for a client (rather than being woven right into the thread of every transaction). Much specialized web-server processing is also faking transclusion (since HTML didn't contemplate it except for one case described below). So transclusion is becoming quite widespread; but only in certain specific guises, and with many unnecessary constraints. Here we discuss some of the general principles and processes underlying all these different guises of transclusion, and how those can be addressed in SGML and XML systems.

Definitions

Transclusion is the dynamic inclusion of data from one document in another. In essence this is quotation, but with a wide range of advantages (and challenges) that arise from making it dynamic and online.

We call the document which references other data the ``referring document,'' and the place in it where the reference occurs the ``referring context.'' Technically, there is no need for any kind of reference to be expressed inline, and many hypertext systems (including HyTime, TEI, Microcosm, and XLL) provide ways to specify links externally; we ignore this distinction here for simplicity.

We call the data that is referenced, accessed, and then shown largely as if it had occurred in the referencing context the ``referenced data.'' The place where it originally occurs is the ``referenced'' or ``original'' context, which may or may not bear much resemblance to the referencing context.

The definition of transclusion is not limited to including entire documents or to including documents of similar type. In its original sense, the inclusion is meant to be real-time, and to be able to accommodate issues of dynamic updates and versioning. It is, however, crucial to transclusion that the referenced data retain its identity: the reader can tell that something is a quotation instead of the work of the referring document's author; and moreover can readily access the referenced information in its referenced/original context.

For example, an abbreviation (such as an SGML entity used to expand an oft-repeated acronym such as NASA) is not properly a transclusion: it does not achieve the required level of identity and referenced-context meaning. In contrast, a quotation element typically does represent a transclusion if the quotation is done by live reference rather than data-copying. If done by copying but with an accompanying link to the referenced context, it is a marginal case of transclusion: the identity is expressed, but the quote isn't really live; this renders moot some of the semantic issues, but introduces its own problems such as undetectable obsolescence of the copy.

In HTML

Until recently, HTML did not contemplate dynamic inclusion. There was no provision for including anything on the fly, with one exception: graphics. That is, the behavior of these two HTML constructs is radically different, although both are links:

<A HREF="foo.htm">
<IMG SRC="bar.gif">

The A element in HTML has built-in semantics that include a requirement that the reader actuate the element's content before the referenced data is retrieved. Of course a program might choose to access the data in anticipation as a way of optimizing performance, but logically the link is not followed until the reader requests it. Therefore, this is not transclusion as we have defined it.

In contrast, the IMG element is typically retrieved as soon as it is encountered, with no user intervention. This is essentially a transclusion, except that there is no (standard) way of transcluding only a portion of a graphic, or any kind of data other than a graphic (making the SRC attribute of IMG point to HTML does not have the effect one might wish for).

So both transclusion and inclusion exist in HTML, but that distinction is conflated: referenced HTML can only be included, whereas images can only be transcluded, leaving some cases hard to achieve. Since IMG transfers can be slow, most browsers provide a global option setting to change IMG's behavior to resemble that of A; but only globally, rather than on a per-link or per-link-type basis. This leaves some useful functionality unachievable, and is one of the reasons the XLL linking specification adds these particular semantics directly.

HTML's FRAME and IFRAME elements are another way to achieve transclusion. These function somewhat like SGML SUBDOC entities: they are able to include only entire documents, and they create a separate address space, making it difficult to link into transcluded content. (See further discussion of SUBDOC below.)

In SGML

SGML can do parts of transclusion in several ways. The most obvious is direct entity references, which can be used to pull in almost any data. However, entities are too powerful in some ways, and not powerful enough in others:

A second mechanism SGML provides is the SUBDOC entity. This has the advantage of being synchronous, and of isolating the parsing contexts of the referring and referenced documents (thus an unmatched delimiter in one doesn't radically modify the other's parse). However, a SUBDOC entity can only be an entire SGML document, making quotation again difficult; promoting sub-elements to function as entire documents is sometimes easy, but sometimes requires re-coding the SGML to avoid all inclusion exceptions, #CURRENT attributes, USEMAPs, and other non-local features. Since SUBDOC also separates the ID spaces, linking across the transclusion boundary requires more work.

In the end, the best way to do transclusion with SGML is to treat it like other semantics of hypertext: build it as specific applications rather than using intrinsic SGML features.

In XML

XML rules out most of the constructs that make it hard to implement transclusion with SGML, such as asynchronous entities and document portions that can parse differently depending on where they are referenced. This makes truly dynamic transclusion processors more feasible. Also, XLL provides a powerful convention for referring to subtrees of documents on the fly (based on TEI extended pointer notation), making it very easy to transclude very specific document portions as required for quotation. XLL also provides transclusion semantics as a specific property that can be set for individual links: a link can be declared to require on-the-fly retrieval and display, or inline display on demand, or new-window display on demand, and so on, thus allowing all the needed combinations of behavior.
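The combinations of behavior listed above can be modeled as two independent choices: how the retrieved data is shown, and what triggers the retrieval. The following is a minimal sketch under that assumption; the names Show, Actuate, LinkBehavior, and is_on_the_fly_inline are hypothetical labels chosen for illustration, not syntax taken from the XLL specification.

```python
from dataclasses import dataclass
from enum import Enum


class Show(Enum):
    """Where the referenced data appears (hypothetical names)."""
    EMBED = "embed"      # displayed inline, at the point of reference
    REPLACE = "replace"  # replaces the current display
    NEW = "new"          # displayed in a new window


class Actuate(Enum):
    """What triggers retrieval (hypothetical names)."""
    AUTO = "auto"  # retrieved as soon as the link is encountered
    USER = "user"  # retrieved only when the reader requests it


@dataclass(frozen=True)
class LinkBehavior:
    show: Show
    actuate: Actuate

    def is_on_the_fly_inline(self) -> bool:
        # On-the-fly retrieval with inline display: the behavior the
        # paper attributes to HTML's IMG element.
        return self.show is Show.EMBED and self.actuate is Actuate.AUTO


# Rough analogues of the two HTML constructs discussed earlier.
IMG_LIKE = LinkBehavior(Show.EMBED, Actuate.AUTO)
A_LIKE = LinkBehavior(Show.REPLACE, Actuate.USER)

# Every combination is expressible per link, not only the two HTML offers.
ALL_BEHAVIORS = [LinkBehavior(s, a) for s in Show for a in Actuate]
```

Treating the two choices as separate properties is what lets a single link ask for, say, inline display on demand, a case that HTML's A and IMG cannot express.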

Classes of problems

Transclusion introduces a number of problems and design decisions. Most of them arise from a single characteristic: one piece of data exists in multiple contexts. In SGML, an entity referenced from several different contexts has no real ``identity'' or structure apart from each particular context. If it is referenced once in a normal parsing context, once in an RCDATA element context, once in an RCDATA marked section context, once inside an attribute value, and once in a context where there happen to be five NET-enabling start-tags still pending, it might parse to totally different element structures in each case. But in SGML terms this is not a problem: the entity is defined to have no structure apart from a given context, and no rule says it must have similar meanings in different contexts.

The issue with transclusion is that the referenced data's meaning, structure, and content from its original context must be maintained in the referencing context; remember that our definition included maintaining the data object's identity, and making the original context available on demand. Also, it is clearly not a fair quotation if the parsing or even content of the quoted material can be changed to an arbitrary degree, merely by the act of quotation.

Below, we discuss several examples of issues raised by transclusion. These and, we believe, any problem involving transcluded data can be considered as various combinations of no more than three classes of problems: presentational problems, addressing problems, and modification of the transcluded data.

Styling transcluded data

A simple but pervasive challenge in handling transclusions involves stylesheets. The ancestors of the transcluded element are typically quite different in the two contexts.

Why is the ancestry important? Mainly because stylesheet mechanisms for SGML (though not for low-end word processors) use inheritance down through ancestors to determine many formatting characteristics. For example, a P's font is usually not set on P at all, but inherited from a distant ancestor such as BOOK. This is also typically true of the font size for P, except that some kinds of intervening ancestors change it when present: the text of a P inside a FOOTNOTE or QUOTE may be smaller because the FOOTNOTE or QUOTE sets the smaller size, which is then inherited.

Typical style inheritance

Under the hood, what goes on is commonly that the formatter calculates fonts, geometry, colors, and linking semantics for each ancestor in turn, working down from the root element to the element being formatted. As each ancestor's specification is calculated, it becomes the ``basis'' for the next one down: most or all properties get re-applied to the next descendant, unless overridden. This is not a process specific to SGML applications; Brian Reid's Scribe system included it from the beginning.[2] Others also described similar algorithms.[3] But what happens when there are two sets of ancestors? Authors frequently transclude a paragraph that was not (in its originating context) buried within a FOOTNOTE or other special construct, to a referencing context where it is so buried. Which context wins?

[2] For retrospective information about Scribe see ``A high-level approach to computer document formatting,'' in the Conference Record of the Seventh Annual ACM Symposium on Principles of Programming Languages, January, 1980. In addition, Reid's dissertation is an absolute must-read: Scribe: A Document Specification Language and its Compiler, Ph.D. thesis, Carnegie-Mellon University, Pittsburgh, PA. Available as Technical Report CMU-CS-81-100.

[3] For example, John B. Smith and Stephen F. Weiss, ``Formatting Texts Accessed Randomly,'' in Software - Practice and Experience (SPE) 17(1), 1987.
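The inheritance process described above can be sketched as a walk from the root element down to the element being formatted. This is only an illustration: the stylesheet contents, the property names, and the function compute_style are assumptions made for the example, and real formatters track many more properties.

```python
from typing import Dict, List

Style = Dict[str, object]

# Hypothetical stylesheet: each element type name (GI) lists only the
# properties it overrides; everything else is inherited.
STYLESHEET: Dict[str, Style] = {
    "BOOK": {"font-family": "serif", "font-size": 12},
    "FOOTNOTE": {"font-size": 9},
    "QUO": {"font-style": "italic"},
}


def compute_style(fqgi: List[str], stylesheet: Dict[str, Style]) -> Style:
    """Compute the effective style for the last element of an FQGI.

    Each ancestor's computed style becomes the basis for the next
    element down, which may override individual properties.
    """
    style: Style = {}
    for gi in fqgi:
        style = {**style, **stylesheet.get(gi, {})}
    return style


# A P directly in a chapter inherits everything from BOOK.
plain_p = compute_style(["BOOK", "CHAP", "P"], STYLESHEET)

# The same P inside a FOOTNOTE picks up the smaller size on the way down.
footnote_p = compute_style(["BOOK", "CHAP", "P", "FOOTNOTE", "P"], STYLESHEET)
```

Because the result depends on the whole ancestor list, the same paragraph yields different styles under different ancestries, which is exactly what makes two competing contexts a problem.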

For example, consider this transclusion:

In the referenced context, the STANZA has a FQGI (fully-qualified generic identifier, or the list of element type names for all its ancestors in order) of COLL/POET/POEM/STANZA. In the referencing context that STANZA ends up with quite a different FQGI: perhaps BOOK/CHAP/SEC/P/FOOTNOTE/QUO/STANZA (though see below for more details on this). Any such difference may lead to formatting differences, given typical state-of-the-art style mechanisms.

How should it work?

To calculate the ``right'' font and other layout parameters, we must decide how to relate the two contexts and use each to contribute to the appropriate style. From the reader's point of view several requirements must be met in the referencing display context, even though they may not always fit well together. Formatting that indicates logical qualities of the referenced data must be accurate for the referenced context, but the formatting must also reflect the referenced data's status as a transcluded object within a certain context.

For the moment, make the simplifying assumption that an element's FQGI is the only thing given to a stylesheet to determine the element's formatting. That is, the stylesheet can set up layout parameters with knowledge of an element's ancestry, but no knowledge of its siblings, attributes, or other information. This is clearly insufficient for some applications, even ones so simple as auto-numbering; but it will do for the moment. Given that, the question reduces to ``what is the effective (or virtual) FQGI of a transcluded element?'' For the example above, we see at least these possibilities:

In general we believe the goals can be best achieved by combining the two contexts intelligently rather than choosing one or the other (the first two choices), since otherwise, depending on the author's intent, something important may be lost.

For example, any formatting associated with STANZA elements probably should still apply, since the transcluded data, after all, is one. On the other hand, the last-stated goal above, meshing formatting with that of the referencing document, cannot be achieved at all unless at least some of that document's context is applied.

Other combinations are possible, such as trying to combine things at a finer granularity, but this leads to many odd cases: any proposal involving taking some parameters (say, font) from one context, but others (such as color) from the other seems destined to become far too complex to explain. It would also share the known problems of proposals to combine stylesheets on a piecemeal basis.

On the whole, it seems the best compromise may be the short combination shown fourth above: calculate the style as usual in the referencing context, and then allow the referenced data to override it based on the GI at the root of the transcluded data (and of course internal ones if any), but not to inherit from the referenced context outside of the referenced data itself.
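A minimal sketch of this compromise follows, using the STANZA example from earlier. The function name virtual_fqgi is hypothetical, and the sketch keeps the simplifying assumption that the FQGI alone determines formatting.

```python
from typing import List


def virtual_fqgi(referencing_context: List[str],
                 referenced_fqgi: List[str]) -> List[str]:
    """Build the effective FQGI of the root of the transcluded data.

    The ancestors come from the referencing context; only the GI at the
    root of the transcluded data is kept from the referenced context.
    """
    if not referenced_fqgi:
        raise ValueError("the referenced FQGI must name at least one element")
    transcluded_root_gi = referenced_fqgi[-1]
    return list(referencing_context) + [transcluded_root_gi]


referenced = "COLL/POET/POEM/STANZA".split("/")
referencing = "BOOK/CHAP/SEC/P/FOOTNOTE/QUO".split("/")

effective = "/".join(virtual_fqgi(referencing, referenced))
```

Here the STANZA keeps any formatting tied to its own type, while COLL, POET, and POEM contribute nothing; everything inherited comes from the referencing document.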

Before proceeding, we note that there are some cases where an extreme solution may be necessary: these are akin to other cases where formatting is critical for a particular purpose, and the usual strengths of descriptive markup systems such as SGML and XML may be overshadowed by other concerns. First, some documents have format as a part of their intrinsic meaning, such as concrete poetry where the layout is indeed part of the poem. Second, certain legal environments may require absolute format fidelity, such as mandatory warnings in aircraft manuals, where fine details of icon placement, geometry, font size, and the like may be regulated. In such settings of course the designer's options are more limited.

Apart from the issue of having two element contexts, there is a second issue. Typically, there are two style sheets active: one applicable to the referencing document, and one to the referenced document. It may be that the STANZA element has no style definition in the referencing context's applicable stylesheet, in which case it seems clear that the referenced context's should be used (there isn't any other obvious choice). But what if STANZA is defined in both? A couple of issues arise:

The first issue, name conflict, is insoluble in principle without some universal namespace to which local names such as STANZA can relate. XML is currently developing such mechanisms by providing syntax to declare a name as belonging to a particular ``namespace'' defined elsewhere. An alternative for the transclusion case is to modify the ``foreign'' names somehow, so that the referencing stylesheet's own name usage cannot conflict: if it wants to affect transcluded STANZA elements, it would have to do it explicitly, such as by defining a style for QUO/#TRANSCLUDED#STANZA.

On the second issue, one can hardly ``average'' the colors, indents, or fonts specified, although some properties could be combined in a manner akin to Cascading Style Sheets. Assuming it can be determined that both definitions do refer to the same STANZA type conceptually, the easiest solution is to say that the referencing stylesheet wins. This allows local override, and is analogous to the usual solution for variable-name scoping in other contexts such as hierarchically scoped programming languages.
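Both suggestions can be sketched briefly. In the illustration below, stylesheets are plain dictionaries keyed by element type name; the functions resolve_rule and foreign_name are hypothetical, and only the #TRANSCLUDED# prefix comes from the example above.

```python
from typing import Dict, Optional

Style = Dict[str, object]
Stylesheet = Dict[str, Style]


def resolve_rule(gi: str,
                 referencing_sheet: Stylesheet,
                 referenced_sheet: Stylesheet) -> Optional[Style]:
    """Pick the style rule for a transcluded element type.

    The referencing stylesheet wins when both define the type, much as
    an inner scope shadows an outer one; otherwise fall back to the
    referenced document's stylesheet.
    """
    if gi in referencing_sheet:
        return referencing_sheet[gi]
    return referenced_sheet.get(gi)


def foreign_name(gi: str) -> str:
    """Mark a name as foreign so local rules cannot match it by accident."""
    return "#TRANSCLUDED#" + gi


referencing_sheet: Stylesheet = {"STANZA": {"indent": 2}}
referenced_sheet: Stylesheet = {"STANZA": {"indent": 4}, "LINE": {"break": True}}

stanza_rule = resolve_rule("STANZA", referencing_sheet, referenced_sheet)
line_rule = resolve_rule("LINE", referencing_sheet, referenced_sheet)
```

The two ideas are alternatives rather than a pipeline: with renaming, the referencing stylesheet matches transcluded stanzas only if it defines a rule for the name produced by foreign_name("STANZA").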

Stylesheet control of transclusion interpretation

A general solution for determining the virtual FQGI like the one above can work for many cases, but there will still be occasions in which a document designer will want to control interpretation of transcluded data on a case-by-case basis.

As one example, an author may transclude the same element into two locations, expecting two different presentations. One transclusion may be a cross-reference, and the author may desire that only the section's title be presented:

<PARA>In <XREF HREF="some.url#CHILD(3,CHAPTER)(2,SECTION)"/>,
the author suggests that pigs can fly.</PARA>

(This is very common in SGML systems.) But on another occasion, the author may actually desire that the entire section be presented to his reader:

<PARA>Here, see for yourself:</PARA>
<TRANSCLUDE HREF="some.url#CHILD(3,CHAPTER)(2,SECTION)"/>

Both of these examples are transclusions, since in both instances the author desires to include an object in another document as it exists at the time of the reader's access.

Similarly, a list item may be referred to in its original context, and the author may wish to have the item's original number preserved (``See step 5.''). At another point, the author may wish to include that list item in her own list, numbered in the context of its new home. (These cases and those in the preceding paragraph are examples of the philosophical ``use vs. mention'' distinction.)

A stylesheet language designed with transclusion in mind can give the designer control over the nature of transcluded data's presentation. For the section reference above, a designer might specify that XREF elements should display a canonical object number (such as ``Section 3.2'') followed by the TITLE child of the transcluded object, and that the text should serve as a link to that object; whereas TRANSCLUDE elements should simply display the entire transcluded object.
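The designer's choice described above amounts to a presentation rule keyed on the referring element type. The sketch below illustrates the idea only; the Section class, the present function, the sample title, and the exact output layout are assumptions, and the link behavior of XREF is left out.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """A stand-in for the transcluded object (hypothetical structure)."""
    number: str  # canonical object number, e.g. "3.2"
    title: str   # content of the TITLE child
    body: str    # everything else in the section


def present(referring_gi: str, target: Section) -> str:
    """Render the same target differently depending on the referring element."""
    if referring_gi == "XREF":
        # Mention: canonical number followed by the title only.
        return f"Section {target.number}, {target.title}"
    if referring_gi == "TRANSCLUDE":
        # Use: the entire transcluded object.
        return f"{target.title}\n{target.body}"
    raise ValueError(f"no presentation rule for {referring_gi}")


section = Section("3.2", "Porcine Aviation", "Pigs can fly.")
as_xref = present("XREF", section)
as_transclusion = present("TRANSCLUDE", section)
```

Both calls address the same object; only the stylesheet rule differs, which is why the decision belongs to the document designer rather than to the addressing mechanism.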

A stylesheet mechanism like this can also address problems like those of the STANZA above. Instead of an application providing a single rule for determining the virtual FQGI of transcluded data, it might permit the stylesheet to state what aspects of presentation should come from which context.

Addressing and transclusion

Another interesting problem presented by transclusion is addressing - linking to documents that use transclusion, or even transcluding parts thereof. Let us consider what happens if a user selects a range in a document that uses transclusion. Let's take an example in which the user selects a range beginning outside transcluded information and ending within the transclusion, and marks that text for copying or exporting.

Exporting the text of the selection is not an interesting problem. But what if the selection is to be copied into a hypertext document?

The first case we consider is that an author makes the selection in another's document, and pastes the information into her own. The most likely behavior in this situation is to simply transclude the selection within the new document; recursive transclusions are no more complicated than a single one.

If the pasting operation is within a single document, transclusion would still be an option, but a more likely desired effect would be to copy the content and markup to the new location. The information native to the document could be copied as-is; the transcluded information, if selected in its entirety, could be referenced from the new location; if the selection is partial, the transclusion reference can be copied and then modified by the application, such that the reference now reflects the desired extent of the target information.

We also consider the special case in which the user has selected exactly the extent of a transcluded object (modulo some whitespace). The application's behavior here presents a potential field for market differentiation. One behavior would be to dumbly transclude the user's selection, creating a two-level transclusion. A more intelligent approach would be to recognize that the selection was composed solely of a transclusion, and create a transclusion in the new document directly to the original source of the information. In the latter case, the author might prefer that the reference actually be to the ``younger'' document; an application might prompt the user for the intended source of the information.
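The ``more intelligent'' behavior can be sketched as following a chain of transclusions back to its origin. This assumes the application keeps a table mapping each transcluding object's address to the address it transcludes; the table, the addresses, and the function original_source are all hypothetical.

```python
from typing import Dict


def original_source(address: str, transclusions: Dict[str, str]) -> str:
    """Follow transclusion references until reaching non-transcluded data."""
    seen = set()
    while address in transclusions:
        if address in seen:
            raise ValueError("transclusion cycle at " + address)
        seen.add(address)
        address = transclusions[address]
    return address


# The "younger" document b.xml quotes a.xml; the user selects exactly
# that quotation in b.xml and pastes it into a new document.
transclusions = {"b.xml#quote1": "a.xml#para3"}

selected = "b.xml#quote1"
direct_target = original_source(selected, transclusions)   # one-level result
two_level_target = selected                                # "dumb" behavior
```

An application that prompts the user would offer both direct_target and two_level_target as candidate sources for the new reference.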

Transclusion with modification

A still more interesting case is the modification of transcluded objects. It is not at all uncommon to wish to edit quoted texts: to elide unnecessary verbiage, or to replace ambiguous pronouns. Since such modifications will typically be different for every instance of modified transclusion, we suggest that instructions for modifying the transcluded information could be contained directly within the transcluding document.

For instance, DSSSL expression syntax or a similar syntax might be used:

<PARA>Then, the author ludicrously suggests:

<TRANSCLUDE HREF="some.url#CHILD(3,PARA)">
  <REPLACE>
    (list (list (node-list-first (children (current-node)))
                "[The author]")
          (list (node-list-tail (children (current-node))
                                89)
                (string #\[ #\horizontal-ellipsis #\])))
  </REPLACE>
</TRANSCLUDE>

</PARA>

which would replace the first character with the string ``[The author]'', and everything after a certain point with ``[...]''.

A hypertext language might also supply architectural forms or known semantics for a simple patching instruction set:

<TRANSCLUDE HREF="some.url#CHILD(3,PARA)">
  <REPLACE HREF="some.url#CHILD(3,PARA)STRING(1,'I',0)">
[The author]</REPLACE>
  <REPLACE
HREF="some.url#CHILD(3,PARA)STRING(23,'t',0)..DITTO()STRING(-1,'.',0)">
[&hellip;]</REPLACE>
</TRANSCLUDE>

which performs the same substitution with XLL addressing syntax.[4]

[4] As of 31 July 1997.

Most modification operations would be covered by a simple set of three operators: insert, delete, and replace, applicable to elements or strings therein. A hypertext language would provide the means of addressing regions to be modified.
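A sketch of such an operator set, applied to a string, follows. It assumes that regions are addressed by character offsets into the referenced text and that edits do not overlap; the tuple layout, the sample sentence, and the function apply_edits are hypothetical. The referenced data itself is never changed; the edits live with the referring document.

```python
from typing import List, Tuple

# (operator, start offset, end offset, replacement text)
Edit = Tuple[str, int, int, str]


def apply_edits(source: str, edits: List[Edit]) -> str:
    """Return a modified copy of source; non-overlapping edits assumed."""
    result = source
    # Work from the rightmost edit so earlier offsets remain valid.
    for op, start, end, text in sorted(edits, key=lambda e: e[1], reverse=True):
        if op == "insert":
            result = result[:start] + text + result[start:]
        elif op == "delete":
            result = result[:start] + result[end:]
        elif op == "replace":
            result = result[:start] + text + result[end:]
        else:
            raise ValueError("unknown operator: " + op)
    return result


referenced = "I think that pigs can fly. They have wings."
cut = referenced.index(" They")

quoted = apply_edits(referenced, [
    ("replace", 0, 1, "[The author]"),           # clarify the pronoun
    ("replace", cut, len(referenced), " [...]"),  # elide the rest
])
```

Because the edits are stored as instructions rather than as a modified copy, the quotation stays live: if the referenced text changes, the application can reapply or flag the edits instead of silently showing stale data.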

Modification of transcluded information opens a variety of issues in user interfaces. Editing tools can distinguish themselves in how efficiently they can record the author's modifications. One particularly interesting area is that of reader modification. If a reader has permission to edit both the referring document and the referenced data, and changes transcluded information, which context should reflect the change? It is our suggestion that in this ambiguous case, a user interface should ask the user which was intended, since it seems unlikely that any general rule could anticipate the user's intention.

Conclusion

We believe that any case of transclusion can be analyzed by considering these three aspects (display, addressing, modification) separately. Market differentiation can occur based on how an application analyzes a transclusion problem. For instance, if a modified transclusion is copied to another document, an application might simply transclude the referenced section; it might copy the transclusion and modification information; or it might prompt the user to decide between those two options and the option to modify the original information again.

Some actions present complications of these aspects. For instance, out-of-line presentational aspects, such as footnotes, can pose a challenge to a user interface. Should the footnotes be presented along with those of the referring document, merged into the numbering? Should the original document's footnotes be displayed directly after the transcluded content - and if so, should they be numbered from 1, or should their original numbers be used? Should they be included at all?

There is also the thorny issue of information ownership. If someone is determined to violate the law or simply good taste by stealing information, there is little way to prevent it. With the current state of the Web, the pirate can simply copy from a document's source; unauthorized transclusion can at least let a curious reader determine the actual source of the information. It is important that user agents retain the original context of transcluded objects, so that information about the object can be determined, such as the ownership declared in metadata associated with the document's root element, cataloging information, and associated alternative stylesheets.

Financial issues must also be considered carefully. First, if transcluded information is only available to paying customers, a user agent should prompt the user before fetching it. (One would hope that a user agent would prompt before initiating any financial transaction.) And what if the transcluded information is subject to royalties? The creator of a transclusion would do well to examine the subject of her links; if her page proves to be popular, she may well become responsible for large sums to the information's owner.

In the highly non-static arena of the Web, there is also the issue of moving targets. It relates closely to ambiguities we encounter in speech: if someone says ``I want to meet the President,'' it seems obvious enough what is meant. But in fact the speaker may mean either of two things, and we cannot tell which: a wish to meet the particular person who held the office at the time of speaking (even if that person is no longer President later), or, instead, whoever happens to hold the office when the meeting occurs. In much the same way, a reference to given data may have different intents, which are hard to tease apart until something changes later (but then may become quite important).

We believe that no one solution can be proposed that will handle all forms of transclusion, but that there exists an excellent opportunity for user agents to distinguish themselves, as XLL makes transclusion an integral part of our Internet lives.

About the Authors

Steven J. DeRose

Steve DeRose has worked with document and hypermedia systems since 1979 when he joined the FRESS project. In 1989 he completed his Ph.D. at Brown University and co-founded Electronic Book Technologies (now part of Inso Corporation), developing DynaText and other products. In his role as Chief Scientist he is active in many standards committees such as TEI and XML, and is co-editor of XLL. He is a frequent speaker in industry and academe, and has written many papers and two books: Making Hypermedia Work: A User's Guide to HyTime (with David Durand), and The SGML FAQ Book. He lives, works, and ice-skates in Rhode Island with his wife Laurie and sons Todd and Brian.

Christopher R. Maden

Chris Maden holds a degree in electrical engineering from Brown University, where he first became interested in semantic markup. He began his career as a Technical Support Representative with Electronic Book Technologies, now a part of Inso Corporation, and now works heavily with SGML and DSSSL for O'Reilly, publishers of the popular Nutshell series of computer books. He was an active participant in the development of HTML 2.0 and of XML, and now sits on the XSL Working Group. He is writing a book on XML for O'Reilly, and lives in Newton, Massachusetts with his wife, Ellie.