Migrating from LaTeX and MathJax to Wolfram Language and MathML
• Posted in: Code, Web, Wolfram Language
Throughout the development of this blog, I have read a lot about the HTML standard specification. In fact, I consider HTML one of the greatest inventions in human history. What I love most about HTML is its semantics: I strongly support the separation of content (semantics) and presentation (appearance). While typography helps hold the reader's attention and conveys the message more effectively, from a computational standpoint, elements like font size and color do not provide unequivocal information about a text's intended meaning.
In LaTeX, for example, I prefer the \emph command to the \textit command. In this case, \emph (emphasis) conveys the author’s intent, contrary to \textit (text italic). In general, I try to write custom LaTeX commands with semantically meaningful names that apply the desired formatting. Another example from my LaTeX writing style is defining the following mathematical command:
\newcommand{\order}[1]{O\bigl(#1\bigr)}
Using it like \order{x} denotes the order of magnitude of x and is printed using big-O notation, i.e., O(x).
Even in Microsoft Word, I create and use styles according to their meaning, although Word was not designed around the separation of content and presentation.
Also, when I write code, I try to name variables and functions after physical and mathematical concepts instead of memory addresses and bits. In this task, the Wolfram Language helps a lot because it has a higher level of abstraction than, e.g., Python.
Wolfram Language’s “raw” syntax (i.e., without syntactic sugar) is similar to M-expressions, providing a strictly regular syntax for expressing operations and transformations on data, which is very useful for writing mathematics. Wolfram Language went further by defining StandardForm, a unique and unambiguous representation of “raw” Wolfram Language expressions, i.e., FullForm. I would venture to say that StandardForm is superior to classical mathematical notation—or at the very least, its intent is highly valuable for human knowledge.
Let's consider the quadratic polynomial ax² + bx + c, which can be written in Wolfram Language as a*x^2 + b*x + c. By using the functions StandardForm and FullForm on that expression, we get the following:
(* StandardForm[a*x^2 + b*x + c] *)
c + b x + a x²
(* FullForm[a x^2 + b x + c] *)
Plus[c, Times[b, x], Times[a, Power[x, 2]]]
+, *, and ^ are syntactic sugar for actual function applications: Plus, Times, and Power. It is the same with other operators and notations such as the space between variable names (x y, multiplication) and the superscripts (x², power).
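The correspondence between operators and function applications can be checked directly. The following is a minimal sketch, assuming the symbols a, b, c, x, and y have no assigned values in the current session:

```wolfram
(* Operators are shorthand for function applications. *)
FullForm[x y]    (* Times[x, y] *)
FullForm[x^2]    (* Power[x, 2] *)

(* The head of the whole polynomial is its outermost function. *)
Head[a*x^2 + b*x + c]    (* Plus *)

(* The sugared and the explicit forms are the same expression. *)
a*x^2 + b*x + c === Plus[c, Times[b, x], Times[a, Power[x, 2]]]    (* True *)
```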
LaTeX notation
As a LaTeX user who prefers a semantic style of markup, I find TeX’s mathematical notation to be a significant shortcoming. The syntax used to write mathematical formulas in TeX is purely presentational, offering little semantic structure and considerable ambiguity. While it effectively typesets expressions in classical mathematical notation—allowing them to be correctly interpreted by readers—the source code itself lacks unambiguous meaning. Understanding a TeX expression by reading its code is a heuristic process: you recognize the author's intent through familiarity with the TeX language, but it is not inherently or reliably parsable by a computer.
In the TeX code ax^2 + bx + c, for example, the term ax^2 can be read as a times x^2 or as ax squared. No unambiguous specification states how the expression maps to an actual mathematical operation. Still, one could argue that the ^ symbol applies only to x, and that two adjacent letters denote implicit multiplication.
The ambiguity, however, becomes more complex and difficult to justify in an expression like Re², where Re represents the Reynolds number. For example, an inexperienced LaTeX user may write it as Re^2, but it would be rendered as Re² in math italic, meaning R multiplied by e².
With knowledge of math formatting commands, one might write \mathrm{Re}^2, where the braces imply grouping of Re as a single variable identifier. However, the \mathrm command merely applies math Roman font style, which does not convey semantic meaning.
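For contrast, here is a minimal sketch of how the same two readings look in Wolfram Language, where the parser decides the structure. The symbol names reynolds, r, and e are hypothetical and chosen only for illustration; Hold is used so that the expressions are shown exactly as parsed, without evaluation or reordering:

```wolfram
(* One multi-letter symbol raised to a power. *)
FullForm[Hold[reynolds^2]]    (* Hold[Power[reynolds, 2]] *)

(* Two symbols separated by a space: a product, with the power binding to e only. *)
FullForm[Hold[r e^2]]         (* Hold[Times[r, Power[e, 2]]] *)
```

Whether a sequence of letters is one identifier or a product is settled by the syntax itself, not by the font in which it is typeset.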
MathJax
In some posts of this blog, such as Wave Spectrum in Wolfram Language, I wrote math formulas in TeX and configured MathJax to render them. To use MathJax, the following script must be loaded in the HTML head element:
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js"></script>
When loaded, the script renders the text between delimiters \(...\) and \[...\] as TeX equations, so the HTML looks like this:
<p>
Finally, we can verify (approximately!) that the zeroth moment
of the spectrum, \(\int_0^\infty S(f)\,df\), equals the variance
of the surface elevation. If we take the square root, we obtain
a metric of wave height in meters.
</p>
In the rendered web page, \(\int_0^\infty S(f)\,df\) is replaced by an mjx-container element that displays the equation with a TeX font. For instance:
Finally, we can verify (approximately!) that the zeroth moment of the spectrum, ∫₀^∞ S(f) df, equals the variance of the surface elevation. If we take the square root, we obtain a metric of wave height in meters.
The mjx-container element is non-standard HTML, and using TeX language inside the HTML source hinders the semantics of the web page by introducing foreign presentational notation. Furthermore, I’m trying to avoid using JavaScript on my website to ensure simple, secure, and lightweight content.
MathML
MathML is an XML-based mathematical markup language for the web. It comes in two complementary “flavors”: Presentation MathML and Content MathML.
Back to our quadratic polynomial, the Presentation MathML is the following:
<math xmlns='http://www.w3.org/1998/Math/MathML' display="block">
<mrow>
<mrow>
<mi>a</mi>
<mo>⁢</mo>
<msup>
<mi>x</mi>
<mn>2</mn>
</msup>
</mrow>
<mo>+</mo>
<mrow>
<mi>b</mi>
<mo>⁢</mo>
<mi>x</mi>
</mrow>
<mo>+</mo>
<mi>c</mi>
</mrow>
</math>
The corresponding Content MathML code is below. Note that it resembles the Lisp language:
<math xmlns='http://www.w3.org/1998/Math/MathML'>
<apply>
<plus />
<apply>
<times />
<ci>a</ci>
<apply>
<power />
<ci>x</ci>
<cn type='integer'>2</cn>
</apply>
</apply>
<apply>
<times />
<ci>b</ci>
<ci>x</ci>
</apply>
<ci>c</ci>
</apply>
</math>
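Because Content MathML is plain XML with a regular structure, a program can inspect it without heuristics. The following is a minimal sketch, assuming the markup above is pasted into a string and imported as symbolic XML. The variable names are hypothetical, and the pattern accepts both plain and namespace-qualified tag names because the exact form of the imported tags is an assumption here:

```wolfram
(* The Content MathML of the quadratic polynomial, as a single string. *)
content = "<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><plus/>" <>
  "<apply><times/><ci>a</ci><apply><power/><ci>x</ci><cn type='integer'>2</cn></apply></apply>" <>
  "<apply><times/><ci>b</ci><ci>x</ci></apply><ci>c</ci></apply></math>";

(* Import as symbolic XML: nested XMLElement[tag, attributes, children] expressions. *)
xml = ImportString[content, "XML"];

(* Collect the names of all identifiers (ci elements) at any depth. *)
identifiers = Cases[xml, XMLElement["ci" | {_, "ci"}, _, {name_String}] :> name, Infinity];

(* Each variable of the polynomial, listed once. *)
DeleteDuplicates[identifiers]
```

The same kind of query against the TeX source would require guessing which letters form an identifier.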
The math element can contain a mathematical expression in both types of MathML, conveying both appearance and semantics:
<math xmlns='http://www.w3.org/1998/Math/MathML' display="block">
<semantics>
<mrow>
<mrow>
<mi>a</mi>
...
</mrow>
<annotation-xml encoding='MathML-Content'>
<apply>
<plus />
...
</apply>
</annotation-xml>
</semantics>
</math>
Note that all the content of the math element is wrapped in a semantics element, and the Content MathML is wrapped in an annotation-xml element following the Presentation MathML.
Writing MathML by hand is rarely easy, especially for complex expressions. This is where Wolfram Language helps: the ExportString function can generate the Presentation and Content MathML of a given expression (see the documentation). For example:
ExportString[x^2, "MathML", "Content" -> True]
(* Results in:
<math xmlns='http://www.w3.org/1998/Math/MathML'>
<semantics>
<msup>
<mi>x</mi>
<mn>2</mn>
</msup>
<annotation-xml encoding='MathML-Content'>
<apply>
<power />
<ci>x</ci>
<cn type='integer'>2</cn>
</apply>
</annotation-xml>
</semantics>
</math>
*)
I can easily copy and paste the resulting MathML code into an HTML document. The math element can be placed inside p elements, flowing naturally with the text. Additionally, the display="block" attribute in a math element renders the math on a separate line, even in the middle of a p element:
The expression [inline equation] doesn’t have the attribute. Now, with the attribute, the equation is displayed as [block equation], where both are in the same p element, and the second equation is followed by a comma, being part of the sentence.
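The export and the display attribute can be combined in one step. This is a minimal sketch, assuming the exported string begins with `<math xmlns=` as in the ExportString example above. The variable names are hypothetical, and the string replacement is only a convenience, not a documented export option:

```wolfram
(* Export the quadratic polynomial with both Presentation and Content MathML. *)
mathml = ExportString[a*x^2 + b*x + c, "MathML", "Content" -> True];

(* Add display="block" to the opening math tag; the 1 limits it to the first match. *)
blockMathml = StringReplace[mathml, "<math " -> "<math display=\"block\" ", 1];

blockMathml
```

The resulting string can then be pasted into a p element as described above.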
Alternatively, there are online MathML editors, such as iMathEQ, but they typically generate only Presentation MathML.
From now on, all the math on my website will be written in MathML, incorporating both Presentation and Content components. However, I still need to study how to style MathML content using CSS.