Comparison of Database Options for the Physics Derivation Graph

Recommendation: Read the user documentation and FAQ first. This page assumes familiarity with the jargon used in the Physics Derivation Graph.

This page compares databases that have been used and could be used for the Physics Derivation Graph (PDG).

Historical design evolution

The Physics Derivation Graph has progressed through multiple architectures, with data structure changes keeping pace with the developer's knowledge.

  1. plain text: databases for comments, connections, equations, operators. Perl script to convert database content to images. One line per entry in each database.
  2. XML:
  3. CSV:

Each of these has required rewriting the code from scratch, as well as writing transfer code (to move from version n to n+1). The author didn't know about property graphs when implementing v1, v2, and v3.

Within a given implementation, there are design decisions with trade-offs to evaluate. Knowing all the options or consequences is not feasible until one or more are implemented. Then the inefficiencies can be observed. Knowledge gained through evolutionary iteration is expensive and takes a lot of time.

Databases NOT used

A few storage methods were considered and then rejected without a full implementation.

NetworkX

NetworkX supports LaTeX for node labels on graphs; see https://networkx.org/documentation/stable/auto_examples/drawing/plot_labels_and_colors.html

There may have been a problem with long LaTeX strings, but I don't have an example showing that.

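import matplotlib.pyplot as plt
import networkx as nx

# Directed graph; each edge is a (source, target) pair of numeric node IDs.
G = nx.DiGraph()
G.add_edge(8332941, 8482459)
G.add_edge(8482459, 6822583)
G.add_edge(5749291, 6822583)
G.add_edge(6822583, 8341200)
G.add_edge(8341200, 9483715)
G.add_edge(8837284, 9483715)
G.add_edge(9483715, 9380032)
G.add_edge(9380032, 8345721)

# Draw the graph with node IDs as labels, then display the matplotlib figure.
nx.draw(G, with_labels=True)
plt.show()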

GraphML

https://en.wikipedia.org/wiki/GraphML

http://graphml.graphdrawing.org/primer/graphml-primer.html

https://dl.acm.org/doi/10.1016/j.entcs.2004.12.037: "GXL to GraphML and Vice Versa with XSLT"

See GraphML file format.
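
To give a feel for the format, here is a minimal sketch that writes a few directed edges as GraphML using only the Python standard library. The function name edges_to_graphml, the "n" prefix on node IDs, and the choice of sample edges (the first three from the NetworkX example above) are assumptions made for illustration; this is not how the Physics Derivation Graph stored its data.

```python
import xml.etree.ElementTree as ET

# Namespace used by GraphML documents.
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def edges_to_graphml(edges):
    """Return a GraphML string for a directed graph given (source, target) pairs."""
    ET.register_namespace("", GRAPHML_NS)  # serialize without a namespace prefix
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    graph = ET.SubElement(
        root, f"{{{GRAPHML_NS}}}graph", id="G", edgedefault="directed"
    )
    # Declare every node once, then every edge.
    node_ids = sorted({node for edge in edges for node in edge})
    for node_id in node_ids:
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", id=f"n{node_id}")
    for source, target in edges:
        ET.SubElement(
            graph,
            f"{{{GRAPHML_NS}}}edge",
            source=f"n{source}",
            target=f"n{target}",
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample: three edges between numeric expression IDs.
sample_edges = [(8332941, 8482459), (8482459, 6822583), (5749291, 6822583)]
graphml_text = edges_to_graphml(sample_edges)
print(graphml_text)
```

The verbosity of the XML compared with the edge list it encodes is one of the trade-offs of this format.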

GXL (Graph eXchange Language)

https://en.wikipedia.org/wiki/GXL

http://www.gupro.de/GXL/

Trivial Graph Format

https://en.wikipedia.org/wiki/Trivial_Graph_Format
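
Trivial Graph Format is simple enough to sketch in a few lines: one "id label" line per node, a "#" separator line, then one "source target" line per edge. The helper name to_tgf and the sample nodes below are hypothetical and only illustrate the layout.

```python
def to_tgf(node_labels, edges):
    """Serialize a graph as Trivial Graph Format text.

    node_labels maps node ID to label; edges is a list of (source, target) pairs.
    """
    lines = [f"{node_id} {label}" for node_id, label in node_labels.items()]
    lines.append("#")  # separator between the node list and the edge list
    lines.extend(f"{source} {target}" for source, target in edges)
    return "\n".join(lines)


# Hypothetical sample data: an expression, an inference rule, and a result.
node_labels = {
    1: "y=mx+b",
    2: "multiply both sides by",
    3: "2*y = 2*m*x + 2*b",
}
edges = [(1, 2), (2, 3)]
tgf_text = to_tgf(node_labels, edges)
print(tgf_text)
```

The format has no place for node or edge properties beyond a single label, which limits what can be stored per step.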

RDF/OWL

The Physics Derivation Graph can be expressed in RDF.

Each step in a derivation could be put in the subject–predicate–object triple form. For example, suppose the step is

Input 1: y=mx+b
inference rule: multiply both sides by
feed: 2
output 2: 2*y = 2*m*x + 2*b

Putting this in RDF,

step 1 | has input | y=mx+b
step 1 | has inference rule | multiply both sides by
step 1 | has feed | 2
step 1 | has output | 2*y = 2*m*x + 2*b

While the conversion is easy, I am unaware of the advantages of using RDF. The Physics Derivation Graph is oriented towards visualization. SPARQL is the query language for RDF. Using RDF is independent of using a computer algebra system to validate the step.
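
As a rough sketch of the triple form, the step above can be held as plain Python tuples and filtered by pattern. This uses no RDF library; the match helper is a hypothetical stand-in for the kind of pattern matching a SPARQL query would perform, not an implementation of SPARQL.

```python
# Each triple is (subject, predicate, object), copied from the step above.
triples = [
    ("step 1", "has input", "y=mx+b"),
    ("step 1", "has inference rule", "multiply both sides by"),
    ("step 1", "has feed", "2"),
    ("step 1", "has output", "2*y = 2*m*x + 2*b"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Find the output expression(s) of step 1.
outputs = [o for (_, _, o) in match(triples, subject="step 1", predicate="has output")]
print(outputs)
```

Nothing in this representation checks that the output actually follows from the input; that validation would still need a computer algebra system.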

Property Graph databases NOT used

https://db-engines.com/en/ranking/graph+dbms

See also "Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries", https://arxiv.org/pdf/1910.09017.pdf

Example comparison: https://db-engines.com/en/system/Blazegraph%3BNeo4j%3BOrientDB
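
For contrast with the RDF triples above, here is a minimal sketch of the same step in a property graph style, where nodes and edges each carry a label and a dictionary of properties. The node IDs, labels, property names, and the targets helper are all assumptions made for illustration; none of them come from the Physics Derivation Graph schema or from any of the databases listed here.

```python
# Nodes: each has a label and a dictionary of properties.
nodes = {
    "step1": {"label": "Step", "properties": {"inference_rule": "multiply both sides by"}},
    "expr1": {"label": "Expression", "properties": {"latex": "y=mx+b"}},
    "feed1": {"label": "Feed", "properties": {"latex": "2"}},
    "expr2": {"label": "Expression", "properties": {"latex": "2*y = 2*m*x + 2*b"}},
}

# Edges: directed, labeled, and able to carry their own properties.
edges = [
    {"source": "expr1", "target": "step1", "label": "INPUT_TO", "properties": {"index": 1}},
    {"source": "feed1", "target": "step1", "label": "FEED_TO", "properties": {}},
    {"source": "step1", "target": "expr2", "label": "OUTPUTS", "properties": {"index": 2}},
]


def targets(node_id, edge_label):
    """Return IDs of nodes reached from node_id along edges with the given label."""
    return [
        edge["target"]
        for edge in edges
        if edge["source"] == node_id and edge["label"] == edge_label
    ]


# Look up the LaTeX of every expression that step1 outputs.
step_outputs = [nodes[n]["properties"]["latex"] for n in targets("step1", "OUTPUTS")]
print(step_outputs)
```

Here the inference rule is stored as a property of the step node rather than as a separate triple; whether to model it as a property or as its own node is one of the design decisions a property graph leaves open.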

TinkerPop

Gremlin is the query language for TinkerPop.

See https://stackoverflow.com/questions/13824962/neo4j-cypher-vs-gremlin-query-language
and https://www.nebula-graph.io/posts/graph-query-language-comparison-cypher-gremlin-ngql

https://tinkerpop.incubator.apache.org/

memgraph.com

Described as a "Drop-in replacement for Neo4j graph database with full Cypher and Bolt compatibility." It has free, open-source repositories: https://github.com/memgraph/memgraph and https://github.com/memgraph/documentation

4 years old? No entry on Wikipedia. Not much traffic on https://news.ycombinator.com beyond https://news.ycombinator.com/item?id=23165172 and https://news.ycombinator.com/item?id=24651091

GraphDB

Not free; not open source

OrientDB

Open source and free: https://github.com/orientechnologies/orientdb

Titan (INACTIVE)

https://github.com/thinkaurelius/titan/

Aimed at multi-node scalability

Blazegraph

Supports RDF and SPARQL APIs

Blazegraph TinkerPop3 Implementation (blazegraph-gremlin)

"The concept behind blazegraph-gremlin is that property graph (PG) data can be loaded and accessed via the TinkerPop3 API, but underneath the hood the data will be stored as RDF using the PG data model described in this document. Once PG data has been loaded you can interact with it just like you would interact with ordinary RDF - you can run SPARQL queries or interact with the data via the SAIL API."

Status: no activity since 2020.