Physics Derivation Graph: Comparison of Design Options Documentation

Physics Derivation Graph navigation

Recommendation: Read the user documentation and FAQ first. This page assumes familiarity with the jargon used in the Physics Derivation Graph.

This page compares syntax options for the Physics Derivation Graph (PDG).

Comparison of Syntax Options

Here methods of capturing mathematical syntax required to describe derivations in physics are compared. This survey covers \(\rm\LaTeX\), Mathematical Markup Language (MathML), Mathematica, and SymPy. For MathML, both Presentation and Content forms are included.

Although \(\rm\LaTeX\) is intuitive for scientists and is concise, it is a typesetting language and not well suited for the web or use in Computer Algebra Systems (CAS). Mathematica is also concise and has wide spread use by scientists, though its cost limits accessibility. Mathematica is also proprietary, which limits the ability to explore the correctness of this CAS

\(\rm\LaTeX\) is concise and is widely used in the scientific community. It does not work well for portability to other representations and is ill-suited for use by CAS. For the initial phases of development for the Physics Derivation Graph, portability and compatibilty with a CAS are not the priority. Since getting content into the graph is the priority, the Latex representation will be used.

For this survey it is assumed that users cares about the learning curve, leveraging previous experience, how wide spread use in their community, speed, ease of input, presentation (rendering), ability to access content across devices, OS independence, ease of setup. To evaluate criteria relevant to users, including the ability to manually input syntax, the ability to transform between representations, and the ability to audit correctness.

Use of a single syntax for the graph content is important. To illustrate why, consider the approach where each syntax is used for its intended purpose -- Latex for rendering equation, SymPy for the CAS, and MathML for portability. This introduces a significant source of error when a single equation requires three distinct representation. The manual entry could result in the three representations not being sychronized. Thus, a single representation satisfying multiple criteria is needed. If no single syntax meets all the needs of the Physics Derivation Graph, then the requirements must be prioritized.

This comparison is between syntax methods which do not serve the same purpose. Latex is a type-setting language, while Mathematica and SymPy are Computer Algebra Systems (CAS). The reason these approaches for rendering and CAS were picked is twofold: they are widely used in the scientific community and they address requirements for the Physics Derivation Graph.

We can ignore syntax methods which do not support notation necessary for describing physics. Example of this include ASCII and HTML. Storage of the generated content (essentially a knowledge base for all of physics) isn't expected to exceed a Gigabyte, so compactness in terms of storage isn't a criterion in this evaluation.

Test Cases

to demonstrate the variety of uses in distinct domains of Physics, a set of test cases are provided in this section. These cases are not meant to be exhaustive of either the syntax or the scientific domain. Rather, they are examples of both capability requirements of the Physics Derivation Graph and of the syntax methods.

Case 1 is a second order polynomial. Algebra \begin{equation} a x^2 + b x + c = 0 \label{eq:polynomial_case1_body} \end{equation}

Case 2, Stokes' theorem, includes integrals, cross products, and vectors. Calculus \begin{equation} \int \int_{\sum} \vec{\nabla} \times \vec{F} \dot d\sum = \oint_{\partial \sum} \vec{F}\dot d\vec{r} \label{eq:stokes_case2_body} \end{equation}

Case 3: Tensor analysis. Einstein notation: contravariant = superscript, covariant = subscript. Used in electrodynamics \begin{equation} Y^i(X_j) = \delta^i_{\ j} \label{eq:tensor_analysis_case3_body} \end{equation}

Case 4 covers notation used in Quantum Mechanics. Symbols such as \(\hbar\) and Dirac notation are typically used.

Case 4a is the creation operator \begin{equation} \hat{a}^+ |n\rangle = \sqrt{n+1} |n+1\rangle \label{eq:creation_operator_case4a_body} \end{equation}

Case 4b is the uncertainty principle \begin{equation} \sigma_x \sigma_p \geq \frac{\hbar}{2} \label{eq:uncertainty_principle_case4b_body} \end{equation}

Case 4c: Lüders projection \begin{equation} |\psi\rangle \rightarrow \sum_n |c_n|^2 P_n,\ \rm{where}\ P_n = \sum_i |\psi_{ni}\rangle \langle \psi_{ni}| \label{eq:Luders_projection_case4c_body} \end{equation}

Quantitative Comparison of Test Cases

Name	case 1	case 2	case 3	case 4a	case 4b	case 4c
Latex	20	101	26	45	39	110
PMathML	324	538	348	372	250
CMathML	381
Mathematica
SymPy

MathML is comprised of empty elements (symbols, e.g., <plus/>), token elements (both ASCII and entities), and annotation elements. The token elements in Presentation MathML include mi=identifiers, mn=numbers, mo=operators. The token elements in Content MathML include ci=identifiers and cn=numbers.

Character count for the MathML was carried out using

wc -m mathML_presentation_case*.xml

Qualitative Comparisons of Syntax Methods

Latex, MathML, and SymPy are free and open source. Mathematica is proprietary and not free.

For Physicists comfortable writing journal articles in Latex or exploring ideas in Mathematica, these are natural syntax methods. Both Latex and Mathematica are concise, making them intuitive to read and quick to enter. MathML is a verbose syntax which is lengthy to manually enter and yield difficult to read the native XML.

Unicode is needed to support Dirac notation and any other non-ASCII text in MathML

Transforming between Syntax options

Wolfram Research offers the ability to convert from Mathematica expressions to MathML on their site www.mathmlcentral.com/Tools/ToMathML.jsp. A CAS typically produces output syntax such as Latex or MathML in a single format. However, there are often many ways to represent the same math, equation \ref{eq:example_partial_derivative_representations}.

Auditing the Correctness of Derivations

Latex and Presentation MathML are intended for rendering equations and are not easily parsed consistently by a CAS. For example, scientists and mathematicians often render the same partial differential operation in multiple ways: \begin{equation} \frac{\partial^2}{\partial t^2}F = \frac{\partial}{\partial t}\frac{\partial F}{\partial t} = \frac{\partial^2 F}{\partial t^2} = \frac{\partial}{\partial t}\dot{F} = \frac{\partial \dot{F}}{\partial t} = \ddot{F}. \label{eq:example_partial_derivative_representations} \end{equation} All of these are equivalent.