Graph generation/parsing

Currently, the graphs are updated manually. That is error-prone, because the information shown in them is also defined in the dockerfiles, the configuration YAML file and the CI workflows. Ideally, the graphs would be auto-generated by parsing/analysing those sources. However, the graphs are non-trivial: there are many tools for automatically generating diagrams from large datasets, but few with features for visualizing complex hierarchical nets. The requirements we want to satisfy are the following:

  • Open source tool and format(s).

  • Generates diagrams programmatically from a text representation and/or through a Python or Golang API.

  • Supports hierarchical DAGs with multiple levels of hierarchy. That is, (nested) clusters which are themselves nodes and have ports.

  • Generates SVG of the hierarchical DAGs.

  • Supports node styles (shape and colour).

  • Ideally:

    • Clusters are collapsible when shown in an HTML site.

    • Text in the nodes can be links to web sites.

So far, the following options were analysed:

Graphviz and Gephi are probably the best-known tools for generating all kinds of diagrams. Unfortunately, neither of them supports hierarchical clusters as required in this use case. The same limitation applies to aafigure and graphdracula.

yEd is a very interesting product, and fits most (if not all) of the technical constraints. However, it’s not open source: the editor is freeware and the SDK is paid, and the SDK is required for programmatic generation. The draw.io/mxgraph toolkit seems to be the most similar open source solution. However, using SDKs for building custom interactive diagramming tools feels like overkill in this use case. We don’t need graphs to be editable through a GUI!

Similarly, TikZ and SVG generation libraries (such as svgo) can be used to get the work done, but they require writing a non-negligible amount of plumbing, which would increase the maintenance burden instead of reducing it. This would be a last resort.

Although d3-hwschematic and netlistsvg target very different use cases, the JSON format used in elkjs might be suitable.

dagre-d3 is meant for DAGs and it supports nested clusters (experimentally, see Dagre D3 Demo: Clusters). Although clusters do not seem to have ports, it might be an easy update from the current solution. Since it’s a client-side JS library, it does not write an SVG file to disk by default, but achieving that should be straightforward.

As a result, it seems that the most suitable solution might be using the JSON format from elkjs, either with elkjs or with dagre-d3. Yet, generating an SVG programmatically seems not to be as straightforward as using other solutions such as Graphviz’s dot. The following references illustrate advanced features for building custom views/GUIs/editors:

However, it seems that writing the JSON is cumbersome. On the one hand, some nodes need an explicit size in order to be shown. On the other hand, it does not seem possible to draw edges across hierarchies directly; ports need to be explicitly defined for that purpose. Therefore, generating the JSON from a given set of nodes, edges and clusters is non-trivial.
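To make the two difficulties concrete, here is a minimal Python sketch that builds a JSON structure in the style of the ELK JSON format for a single level of clusters. It is only an illustration under assumptions: the function name elk_graph, the default node and port sizes, and the port naming scheme are hypothetical, and the placement of edges (inside the cluster that owns both endpoints, or at the root for cluster-to-cluster links) follows our understanding of the format rather than a verified implementation.

```python
import json

# Hypothetical default sizes; leaf nodes and ports get explicit dimensions.
NODE_WIDTH, NODE_HEIGHT = 120, 40
PORT_SIZE = 6


def elk_graph(clusters, edges):
    """Build an ELK-style graph from {cluster: [node, ...]} and (src, dst) edges.

    Only one level of hierarchy is handled in this sketch.
    """
    parent = {node: cluster for cluster, nodes in clusters.items() for node in nodes}
    children = [
        {
            "id": cluster,
            "labels": [{"text": cluster}],
            "children": [
                {"id": n, "width": NODE_WIDTH, "height": NODE_HEIGHT,
                 "labels": [{"text": n}]}
                for n in nodes
            ],
            "ports": [],
            "edges": [],
        }
        for cluster, nodes in clusters.items()
    ]
    by_id = {c["id"]: c for c in children}
    root = {
        "id": "root",
        "layoutOptions": {"elk.algorithm": "layered"},
        "children": children,
        "edges": [],
    }
    for i, (src, dst) in enumerate(edges):
        src_c, dst_c = by_id[parent[src]], by_id[parent[dst]]
        if src_c is dst_c:
            # Both endpoints live in the same cluster: a plain edge is enough.
            src_c["edges"].append({"id": f"e{i}", "sources": [src], "targets": [dst]})
            continue
        # Crossing clusters: route the edge through one port on each cluster.
        out_port, in_port = f"{src_c['id']}.out{i}", f"{dst_c['id']}.in{i}"
        src_c["ports"].append({"id": out_port, "width": PORT_SIZE, "height": PORT_SIZE})
        dst_c["ports"].append({"id": in_port, "width": PORT_SIZE, "height": PORT_SIZE})
        src_c["edges"].append({"id": f"e{i}a", "sources": [src], "targets": [out_port]})
        root["edges"].append({"id": f"e{i}b", "sources": [out_port], "targets": [in_port]})
        dst_c["edges"].append({"id": f"e{i}c", "sources": [in_port], "targets": [dst]})
    return root


if __name__ == "__main__":
    graph = elk_graph(
        {"base": ["build"], "pkg": ["ghdl", "yosys"]},
        [("build", "ghdl"), ("ghdl", "yosys")],
    )
    print(json.dumps(graph, indent=2))
```

Even in this reduced form, a single edge between clusters becomes three edges and two ports, which is where most of the bookkeeping comes from. Nested clusters would additionally require walking up to the lowest common ancestor of both endpoints.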

Note

Branch utils/pyHDLC/map.py@pymap contains work in progress. First, GenerateMap builds a DAG by parsing the dockerfiles. Then, report prints the content in the terminal, for debugging purposes. Last, dotgraph generates a Graphviz dot diagram. The dot diagram does not have clusters. We want to add those by parsing the GitHub Actions workflows (see below). However, we want to first reproduce the dot output using elkjs. See function elkjsgraph in utils/pyHDLC/map.py@pymap. Do you want to give it a try? Let us know or join the chat!
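For readers who have not looked at the branch, the following sketch shows the kind of flat, cluster-less dot output described in the note. It is not the code in map.py: the function name dot_graph, the input shape (a dictionary mapping each image to the images it depends on) and the styling attributes are assumptions made for illustration.

```python
def dot_graph(dag, name="images"):
    """Render {node: [dependency, ...]} as Graphviz dot text, without clusters."""
    lines = [f'digraph "{name}" {{', "  rankdir=LR;", "  node [shape=box];"]
    for node in sorted(dag):
        # Declare the node explicitly so that isolated nodes are drawn too.
        lines.append(f'  "{node}";')
        for dep in sorted(dag[node]):
            lines.append(f'  "{dep}" -> "{node}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(dot_graph({"build/base": [], "ghdl": ["build/base"]}))
```

The output can be piped to the dot command line tool (for instance with -Tsvg) to obtain an SVG.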

Reading dockerfiles

One of the two sources of information for the graph is the dockerfiles. As far as we are aware, there is no tool for generating a DAG from the stages of a dockerfile. However, gh:asottile/dockerfile is an interesting Python module which wraps docker/moby’s Go parser. Hence, it can be used for getting the stages and the COPY --from or --mount statements, which together define the hierarchy. See utils/pyHDLC/map.py.
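The idea of deriving the hierarchy from stages and their COPY --from or --mount references can be sketched as follows. Note that this sketch deliberately does not use gh:asottile/dockerfile: it relies on simplified regular expressions from the standard library so that it runs without extra dependencies, and it will miss cases that a real parser handles (heredocs, build arguments in image names, and so on). The function name stage_dependencies is hypothetical.

```python
import re

# Simplified patterns; a real parser is more robust than these.
FROM_RE = re.compile(r"^FROM\s+(?:--\S+\s+)*(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE)
COPY_FROM_RE = re.compile(r"^COPY\s+.*?--from=(\S+)", re.IGNORECASE)
MOUNT_FROM_RE = re.compile(r"--mount=\S*?\bfrom=([^,\s]+)")


def stage_dependencies(text):
    """Map each stage of a dockerfile to the set of stages/images it depends on."""
    # Join lines that end with a backslash continuation.
    joined = re.sub(r"\\[ \t]*\n", " ", text)
    deps = {}
    current = None
    for raw in joined.splitlines():
        line = raw.strip()
        match = FROM_RE.match(line)
        if match:
            base, alias = match.groups()
            # Unnamed stages are keyed by their index, as Docker allows.
            current = alias or str(len(deps))
            deps[current] = {base}
            continue
        if current is None:
            continue  # instructions before the first FROM (e.g. ARG)
        match = COPY_FROM_RE.match(line)
        if match:
            deps[current].add(match.group(1))
        elif line.upper().startswith("RUN"):
            deps[current].update(MOUNT_FROM_RE.findall(line))
    return deps
```

The resulting dictionary maps each stage to its base image plus any stage it copies or mounts from, which is the edge list needed for the DAG.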

Reading GitHub Actions workflow files

The second source of information is the CI workflow files. Since YAML is used, reading them from any language is trivial. However, semantic analysis needs to be done; in particular, variables from matrix need to be expanded/replaced. gh:nektos/act is written in Go, and it allows executing GitHub Actions workflows locally. Therefore, it might have the required features. However, as far as we are aware, it’s not meant to be used as a library.
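As a rough illustration of the matrix expansion mentioned above, the sketch below computes the Cartesian product of the matrix axes and substitutes ${{ matrix.<key> }} expressions in a string. It assumes the workflow has already been loaded into a Python dictionary by some YAML reader, and it ignores include/exclude entries as well as any other expression context; the names expand_matrix and substitute are hypothetical.

```python
import itertools
import re

# Matches expressions such as ${{ matrix.os }} or ${{matrix.os}}.
MATRIX_RE = re.compile(r"\$\{\{\s*matrix\.([\w-]+)\s*\}\}")


def expand_matrix(matrix):
    """Return one dict per combination of the matrix axes (no include/exclude)."""
    keys = list(matrix)
    return [
        dict(zip(keys, values))
        for values in itertools.product(*(matrix[k] for k in keys))
    ]


def substitute(template, combo):
    """Replace matrix expressions with values from combo; leave unknown keys as-is."""
    return MATRIX_RE.sub(lambda m: str(combo.get(m.group(1), m.group(0))), template)


if __name__ == "__main__":
    matrix = {"os": ["buster", "bullseye"], "tool": ["ghdl", "yosys"]}
    for combo in expand_matrix(matrix):
        print(substitute("${{ matrix.tool }}/${{ matrix.os }}", combo))
```

Each expanded string could then be matched against the image names found in the dockerfiles in order to decide which cluster a node belongs to.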