Before reading this, you may wish to download and print out the formatted RTF output of the sample SGML and stylesheet.

An Ad-Hoc Proposal for Extensible Semantic Mathematical Markup

A common problem with mathematical markup schemas is that it is difficult to reconcile the presentational and semantic aspects of math. Semantic markup is desirable so that modeling programs can access the mathematics. Mathematical tradition, however, relies heavily on presentational features to communicate information to readers.

Another problem is that mathematicians and scientists are constantly inventing new notations to explain new ideas. Because of this, any fixed mathematical DTD is doomed to fall into disuse.

This poster presents an informal suggestion for a method of using extensible semantic markup together with a stylesheet to control presentational features. I'll use Maxwell's equations of electrodynamics for a demonstration.
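For reference, Maxwell's equations in differential form are shown below in SI units. The sample files may use another unit system; Gaussian units, for instance, replace the constants with factors of c and 4π. Treat the constants here as an assumption. The structure of divergences, curls, and time derivatives is the same either way.

```latex
\begin{aligned}
\nabla \cdot \mathbf{E}  &= \frac{\rho}{\varepsilon_0} \\
\nabla \cdot \mathbf{B}  &= 0 \\
\nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t} \\
\nabla \times \mathbf{B} &= \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}
\end{aligned}
```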

Each function is represented as a single element whose children are its arguments. At the first level, we introduce generic functions: multiplication, negation, fractions, addition, dot products, and cross products. (See the first <eqns> element in the sample SGML.)
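The following is a minimal Python sketch of the first level, with XML standing in for SGML because Python's standard library has no SGML parser. The element names <eqn>, <vec>, <var>, <op>, <cross>, <neg>, <frac>, and <mul> are hypothetical and need not match the sample DTD. The small render function plays the role of the stylesheet for Faraday's law, and it writes the derivative crudely as a fraction of two products.

```python
import xml.etree.ElementTree as ET

# Hypothetical first-level markup for Faraday's law, curl E = -dB/dt.
# Every function is an element whose children are its arguments;
# nabla is marked up as an ordinary vector, and the derivative is
# spelled out as a fraction of two products.
FIRST_LEVEL = """<eqn>
  <cross><vec>∇</vec><vec>E</vec></cross>
  <neg>
    <frac>
      <mul><op>∂</op><vec>B</vec></mul>
      <mul><op>∂</op><var>t</var></mul>
    </frac>
  </neg>
</eqn>"""

# Generic functions that simply place a symbol between their arguments.
INFIX = {"eqn": " = ", "add": " + ", "mul": "", "dot": " · ", "cross": " × "}


def render(el):
    """Turn a tree of generic functions into a linear text string."""
    if el.tag in ("vec", "var", "op"):
        return el.text.strip()
    args = [render(child) for child in el]
    if el.tag in INFIX:
        return INFIX[el.tag].join(args)
    if el.tag == "neg":
        return "-" + args[0]
    if el.tag == "frac":
        return f"{args[0]}/{args[1]}"
    raise ValueError(f"no rendering rule for <{el.tag}>")


print(render(ET.fromstring(FIRST_LEVEL)))  # ∇ × E = -∂B/∂t
```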

As a further level of abstraction, we introduce more specific operations. For instance, instead of representing the curl of a vector as the cross product of nabla and the vector, we introduce a <curl> element that takes the vector as its single child. (See the second <eqns> element in the SGML.)
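One way to see that <curl> is an abstraction over the first level is to write a transformation that rewrites it back into generic elements. The Python sketch below does this. It uses XML in place of SGML and assumes hypothetical <cross> and <vec> elements for the generic level. The function name lower_curl is also hypothetical.

```python
import xml.etree.ElementTree as ET


def lower_curl(el):
    """Rewrite every <curl>v</curl> below el into the generic form
    <cross><vec>∇</vec>v</cross>, working bottom-up. Returns el.
    (A <curl> at the root itself is not rewritten.)"""
    for i, child in enumerate(list(el)):
        lower_curl(child)
        if child.tag == "curl":
            cross = ET.Element("cross")
            nabla = ET.SubElement(cross, "vec")
            nabla.text = "∇"
            cross.extend(list(child))  # move the curl's single argument
            cross.tail = child.tail    # keep any surrounding whitespace
            el[i] = cross
    return el


second = ET.fromstring(
    "<eqn><curl><vec>E</vec></curl>"
    "<neg><frac><mul><op>∂</op><vec>B</vec></mul>"
    "<mul><op>∂</op><var>t</var></mul></frac></neg></eqn>"
)
print(ET.tostring(lower_curl(second), encoding="unicode"))
```

The reverse direction does not work: once the curl is flattened into a cross product, a processor cannot be sure the author meant "curl" rather than an arbitrary cross product. That is the argument for marking up the specific operation.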

One immediately apparent benefit is that because the nabla is generated by the semantics of the <curl> element, it is no longer italicized like other vectors, giving a more appropriate appearance. (Compare the first and second equation groups in the output.)
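The following minimal Python sketch shows why this happens. It assumes a stylesheet whose only rule for <vec> is to italicize its content, shown here as HTML <i> tags. Apart from <curl>, the element names are hypothetical, and XML stands in for SGML.

```python
import xml.etree.ElementTree as ET


def style(el):
    """Render to HTML-like text; every marked-up vector is italicized."""
    if el.tag == "vec":
        return f"<i>{el.text}</i>"
    args = [style(child) for child in el]
    if el.tag == "cross":
        return " × ".join(args)
    if el.tag == "curl":
        # The nabla comes from this rule, not from the markup, so the
        # vector rule never sees it and it stays upright.
        return "∇ × " + args[0]
    raise ValueError(f"no rule for <{el.tag}>")


first = ET.fromstring("<cross><vec>∇</vec><vec>E</vec></cross>")
second = ET.fromstring("<curl><vec>E</vec></curl>")
print(style(first))   # <i>∇</i> × <i>E</i>
print(style(second))  # ∇ × <i>E</i>
```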

The third level of abstraction does not introduce any new element types or stylesheet dependencies, but leads to even more semantic markup. I have defined entities for each variable; by referring to the entity instead of marking up a letter, the SGML is more easily interpreted even in the absence of a stylesheet. (See the third <eqns> element and the internal subset in the SGML.)
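The entity mechanism can be tried with Python's standard XML parser. The sketch below uses XML syntax as a stand-in for SGML. The entity names efield, bfield, and time, and the markup they expand to, are hypothetical. The point is that after parsing, a stylesheet sees the same tree it would see if each letter had been tagged by hand.

```python
import xml.etree.ElementTree as ET

# Third-level markup: each variable is declared once as an entity in the
# internal subset, and the equations refer to the entities by name.
DOC = """<!DOCTYPE eqns [
  <!ENTITY efield "<vec>E</vec>">
  <!ENTITY bfield "<vec>B</vec>">
  <!ENTITY time "<var>t</var>">
]>
<eqns>
  <eqn>
    <curl>&efield;</curl>
    <neg><frac><mul><op>∂</op>&bfield;</mul><mul><op>∂</op>&time;</mul></frac></neg>
  </eqn>
</eqns>"""

root = ET.fromstring(DOC)

# The parser has replaced each entity reference with its markup.
for vec in root.iter("vec"):
    print(vec.text)
```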

The DTD is simple. Elements are grouped by how many arguments they can take, and users can and should extend this list as their needs require. The stylesheet provides examples of inserting the mathematical symbols that correspond to the markup: addition places a plus sign between each pair of adjacent children, and the dot product places a dot between its two children.
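The arity grouping and the separator rules can be expressed together as a lookup table. The Python sketch below shows one way to do this. The table entries and element names are assumptions for illustration, not a transcription of the actual DTD or stylesheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table:
# element name -> (minimum arguments, maximum arguments or None, separator)
RULES = {
    "add":   (2, None, " + "),  # any number of terms
    "dot":   (2, 2, " · "),     # exactly two vectors
    "cross": (2, 2, " × "),     # exactly two vectors
}
LEAVES = {"var", "vec"}


def render(el):
    """Check each function's argument count, then join its rendered
    arguments with the function's symbol."""
    if el.tag in LEAVES:
        return el.text
    if el.tag not in RULES:
        raise ValueError(f"unknown element <{el.tag}>")
    lo, hi, sep = RULES[el.tag]
    n = len(el)  # number of child elements, i.e. arguments
    if n < lo or (hi is not None and n > hi):
        raise ValueError(f"<{el.tag}> cannot take {n} arguments")
    return sep.join(render(child) for child in el)


print(render(ET.fromstring("<add><var>a</var><var>b</var><var>c</var></add>")))  # a + b + c
print(render(ET.fromstring("<dot><vec>A</vec><vec>B</vec></dot>")))              # A · B
```

Extending the notation then means adding a row to the table. In the SGML setting, that corresponds to adding an element to the appropriate arity group in the DTD and a matching rule in the stylesheet.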

Grouping is not dealt with here. In theory, because DSSSL is very powerful, it should be possible for the stylesheet to calculate where parentheses are necessary. However, that would be extremely complex, so I suggest simply using a <group> or <paren> element to contain anything that needs to be grouped.
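The following Python sketch shows explicit grouping. It uses a <paren> element that wraps its single child in parentheses; the other element names are hypothetical. Because grouping is marked up, the renderer never has to reason about operator precedence. The second example shows how the output misleads when the grouping is omitted.

```python
import xml.etree.ElementTree as ET

# Juxtaposition for multiplication, a plus sign for addition.
INFIX = {"add": " + ", "mul": ""}


def render(el):
    if el.tag == "var":
        return el.text
    args = [render(child) for child in el]
    if el.tag == "paren":
        return "(" + args[0] + ")"  # <paren> wraps its single child
    return INFIX[el.tag].join(args)


grouped = "<mul><paren><add><var>a</var><var>b</var></add></paren><var>c</var></mul>"
ungrouped = "<mul><add><var>a</var><var>b</var></add><var>c</var></mul>"
print(render(ET.fromstring(grouped)))    # (a + b)c
print(render(ET.fromstring(ungrouped)))  # a + bc
```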

This stylesheet uses tables to format groups of equations, but only because Jade did not yet support DSSSL's math flow objects when this was developed. The math flow objects would be a much better choice for the stylesheet.
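One plausible reading of this table layout, offered as an assumption since the source does not spell out the columns, is a three-column grid: right-aligned left sides, a column of equals signs, and left-aligned right sides. The following plain-text Python sketch produces that layout. The function name equation_table is hypothetical.

```python
def equation_table(rows):
    """Lay out (left side, right side) pairs as three columns:
    right-aligned left sides, an '=' column, left-aligned right sides."""
    width = max(len(lhs) for lhs, _ in rows)
    return [f"{lhs.rjust(width)} = {rhs}" for lhs, rhs in rows]


for line in equation_table([("a", "b + c"), ("x + y", "z")]):
    print(line)
```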

For comparison, I've marked up the same equations in MathML, both presentational and semantic. Here is a critique of MathML in particular, and math DTD efforts in general.

All files discussed here, plus an additional document instance and formatted output with some examples from Donald Knuth's The TeXbook, can be found in this zip file.


Last updated and validated 19 February 1998 by Chris Maden.