Concerning the ongoing discussions of a data model and the semantics of an HDF5 data layout, here is a first attempt at comparing John Shalf's data model and mine. Basically, we agree on a lot of points, but once it comes to details there are some differences. I try to concentrate just on these, without reviewing both data models in detail. Some differences are just different names; others concern which features are supported natively and which are not. I'll also try to include how missing features can be worked around in each model. If things are left unclear (I'm sure some are, as quite a lot comes here in very compressed, incompletely explained form), let's discuss them in detail. Any comments welcome.

John, can you add how the mentioned data sets/data types would be formulated in your model? Especially Thomas, but also Andre et al.: do you see any constraints on the suggested layouts coming from a possible implementation in HDF5?

Usage of words:

  JS          WB
  Shape       Grid + Topology
  Timestep    Slice

In JS's model, timesteps can be ordered hierarchically:

/T=0.0
/T=0.2
/T=0.4
/T=0.6
/T=0.8
/T=1.0
/T=1.2
/T=1.4
/T=1.6
/T=1.8
/T=2.0
...

In the WB model, this is not done in its standard form, but the model can be extended to do so. The advantage is that timesteps/slices can be searched over a coarse range faster, i.e. without traversing every single timestep/slice. However, this only becomes significant for thousands of timesteps per file; for fewer than about a thousand timesteps it is probably not worth the effort (JS). So we can concentrate on the layout per timestep/slice, as the upper-level layout coincides so far.
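The following minimal Python sketch illustrates why a coarse level speeds up searches. It assumes that timesteps are grouped into hypothetical coarse buckets of fixed width, and that each bucket keeps its fine timesteps sorted. A range query opens only the buckets that overlap the range instead of traversing every timestep. The bucketing rule and the function names are illustrative assumptions, not part of either model.

```python
from bisect import bisect_left, bisect_right


def group_timesteps(times, width):
    """Group timesteps into coarse buckets [k*width, (k+1)*width).

    Returns a dict mapping each bucket's start time to its sorted timesteps.
    The fixed bucket width is a hypothetical choice for illustration.
    """
    groups = {}
    for t in sorted(times):
        start = (t // width) * width
        groups.setdefault(start, []).append(t)
    return groups


def timesteps_in_range(groups, t0, t1):
    """Return all timesteps in [t0, t1], opening only buckets that overlap it."""
    starts = sorted(groups)  # only the coarse keys are sorted here
    # The bucket that may contain t0 is the last one starting at or before t0.
    first = max(bisect_right(starts, t0) - 1, 0)
    # Buckets starting after t1 cannot contain anything in the range.
    last = bisect_right(starts, t1)
    result = []
    for start in starts[first:last]:
        fine = groups[start]
        result.extend(fine[bisect_left(fine, t0):bisect_right(fine, t1)])
    return result
```

Take the timesteps T=0.0 ... T=2.0 listed above with a bucket width of 1.0. A query for the range 1.1 to 1.5 opens only the bucket starting at 1.0 and returns [1.2, 1.4].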
Let us assume that we have:
 * a 3D data set of gxx and gyy
 * a 2D slice of gxx and gyy (both slices identical)
 * an isosurface of gxx

The JS model orders data by variables (grid functions):

/gxx
  /Unigrid3D
    /Shape (BBox, dimensions)
    /Data
  /Slice2D
    /Shape (BBox, dimensions)
    /Data
  /Isosurface,level=1.0
    /Shape (connectivity, vertices)
/gyy
  /Unigrid3D
    /Shape (BBox, dimensions) [link to /gxx/Unigrid3D/Shape/ ]
    /Data
  /Slice2D
    /Shape (BBox, dimensions) [link to /gxx/Slice2D/Shape/ ]
    /Data

The WB model orders data by grids:

/Unigrid3D
  /Topology
    /Neighbourhood (Dims 128x128x128)
    /Cartesian
      /Positions (BBox)
      /gxx
      /gyy
/Slice2D
  /Topology
    /Neighbourhood (Dims 64x64)
    /Cartesian
      /Positions (BBox)
      /gxx
      /gyy
/Isosurface,level=1.0
  /Topology
    /Cartesian
      /Positions (vertices)
  /Topology
    /[Topology] (reference to Topology)
      /Positions (connectivity)

The implementation of a JS `Shape' object is suppressed here. It might include more hierarchy levels and might be similar to the WB Grid+Topology structure. At this point only the `ordering by' issue is relevant, not the specification of a shape. I only expanded the shape in detail in the WB model because I know better how to do it there.

So, from the native layout:
 * the JS model is optimized to find all shapes (grids) for a specific gridfunction
 * the WB model is optimized to find all gridfuncs for a specific grid (shape)

This leads to the following `critical' questions for each model, where we assume the following axiom is true:

Axiom: Following and handling links is faster than iterating through a hierarchy to find objects.

For the JS model:

Q: How to detect that the shapes of two gridfunctions are equal?
A: Use HDF5 links so that one and the same shape is used for multiple gridfunctions, then compare whether the shapes of the two gridfuncs point to the same object.
Disadvantage: the data model depends on the `link' feature of the underlying IO layer. It can't easily be ported to other IO libs/layers that lack this feature; it is quite bound to HDF5 and similarly powerful libs.
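The minimal Python sketch below makes the link argument concrete. It assumes that an in-memory dict stands in for the file: paths map to objects, and a `link' is simply a second path bound to the same object. The `Shape' class, its fields and the helper names are hypothetical and not part of either model. The sketch shows the writer-side rule: only the first write stores a shape, and later writes of the same shape only create links to it. It also contrasts the cheap identity check with the property-by-property comparison needed when shapes are copied.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    """Hypothetical shape object: bounding box and grid dimensions."""
    bbox: tuple
    dims: tuple


def write_shape(tree, path, shape, written):
    """Store `shape` once; later writes of the same in-memory shape become links.

    `tree` maps HDF5-like paths to objects; `written` remembers the path where
    each shape object was first stored, keyed by id(). This assumes the caller
    keeps the shape objects alive while writing, so ids are not reused.
    """
    key = id(shape)
    if key in written:
        tree[path] = tree[written[key]]  # link: a second path, same object
    else:
        tree[path] = shape               # first write stores the object itself
        written[key] = path


def same_shape_linked(tree, p, q):
    """Link-based test: are both paths bound to the very same object?"""
    return tree[p] is tree[q]


def same_shape_copied(a, b):
    """Without links: compare two shape copies property by property."""
    return a == b  # the dataclass __eq__ compares every field
```

The identity test costs the same however large the shape is, while the copied-shape test has to compare every property.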
If the `link' feature is *not* used, the shape has to be copied for each variable/gridfunc. Equality detection then requires comparing each property of the shape instead of just comparing the link target.

Q (to Thomas & Andre): Any comments on the complexity that this `link' feature adds to readers/writers? E.g. when writing multiple 3D gridfuncs into a file, only the first write may actually write the shape object; succeeding writes must only generate links to that first shape object.

For the WB model:

Q: How to find out in which shapes a specific grid function exists?
A: Natively, iterate through the hierarchy to find the same gridfunc in the various shapes. By the above axiom, we want to avoid this. (Note that the axiom is no longer relevant once issues like on-the-fly recombination of chunked data arise, because hierarchy iteration then costs little compared to the data recombination.)
Fix: Add links between gridfunc data objects.
 -> once one gridfunc is found, the others can be found by traversing the link list instead of searching the hierarchy
 -> disadvantage: a gridfunc can't be saved independently of the others; once a gridfunc is saved on a new shape/grid topology, the links stored with that gridfunc on all other shapes/grid topologies have to be updated (is this a problem?)
 -> better: create a global linktable per gridfunction, without data
 -> then we can define: two data sets belong to the same gridfunc if they refer to the same gridfunc linktable
 -> this is independent of the dataset name

More Q&A for the JS and WB models:

Q: How to save a grid without data? How does this fit into the scheme? E.g. a tetrahedral grid, but no gridfunc.
JS: don't know, as there is nothing associated with a gridfunc (John??)
WB: Just save Grid + Topology, without data (natively supported)

Q: How to relate data given on various aspects of the same grid? E.g.,
given a tetrahedral grid plus:
 -> A, some data on each vertex
 -> B, some data on each tetrahedron
 -> C, some data on each face
 -> D, some data on each edge
 -> E, some data on a subset of tetrahedrons

We need information about which faces and edges belong to the same tetrahedron, and also about which tetrahedrons share the same faces. Technically, these three data sets are all arrays of integers.

JS: John, how is this formulated in your model?
WB:

/tetrahedral
  /Topology
    /Cartesian
      /Points (vertices)
      /A
  /Topology
    /[Topology] (reference to Topology)
      /Points (connectivity)
      /B
    /[Topology] (reference to Topology)
      /Points (which faces belong to each tetrahedron)
    /[Topology] (reference to Topology)
      /Points (which edges belong to each tetrahedron)
  /Topology
    /[Topology] (reference to Topology)
      /C
    /[Topology]
      /Points (which tetrahedrons share each face)
  /Topology
    /[Topology] (reference to Topology)
      /D

Q: How to formulate data given on an isosurface, e.g. the isosurface of gxx at level 1.0, plus psi4 data given on each vertex?
JS: ?
WB:

/Isosurface,level=1.0
  /Topology
    /Cartesian
      /Positions (vertices)
      /psi4 (data of psi4 given on each vertex)
  /Topology
    /[Topology]
      /Positions (connectivity)

Q: How to formulate `geodesics' (or streamlines)? These are basically unconnected vertices, but their number might change with time, and in the worst case the numbering might change, too. So we need to store some `future/past' information that relates a vertex at T=1.0 to a vertex at T=2.0.
JS:
WB:

/T=1.0
  /geodesics
    /Topology
      /Cartesian
        /Positions (vertices)
        /alpha (proper time)
      /[T=2.0/geodesics/Topology] (reference to...)
        /Positions (which vertex is the future vertex of a vertex at T=1.0)
/T=2.0
  /geodesics
    /Topology
      /Cartesian
        /Positions (vertices)
        /alpha (proper time)
      /[T=1.0/geodesics/Topology] (reference to...)
        /Positions (for each vertex, which vertex at T=1.0 was its past vertex)
      /[T=3.0/geodesics/Topology] (reference to...)
        /Positions (which vertex is the future vertex of a vertex at T=2.0)

This file is also available via CVS in TIKSL/datamodel/datamodel.txt.
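The minimal Python sketch below shows how the `future/past' index arrays of the geodesics layout could be used. It assumes that each timestep stores, for every vertex, the index of its future vertex at the next timestep. The value -1 is a hypothetical marker for a geodesic that ends there. Tracing one geodesic then just follows these arrays from timestep to timestep. The `past' array of the next timestep can be derived by inverting the `future' array. The function names and the -1 convention are illustrative assumptions.

```python
def trace_geodesic(future_maps, start_index):
    """Follow per-timestep `future vertex' index arrays.

    future_maps[k][i] is the index, at timestep k+1, of the vertex that is
    vertex i at timestep k; -1 (hypothetical convention) means the geodesic
    ends. Returns the vertex index of the geodesic at each timestep reached.
    """
    path = [start_index]
    for future in future_maps:
        nxt = future[path[-1]]
        if nxt < 0:
            break
        path.append(nxt)
    return path


def past_from_future(future, n_next):
    """Invert a `future' array into the `past' array of the next timestep.

    Vertices at the next timestep that no earlier vertex maps to get -1,
    i.e. geodesics that first appear there.
    """
    past = [-1] * n_next
    for i, j in enumerate(future):
        if j >= 0:
            past[j] = i
    return past
```

For example, suppose three vertices at T=1.0 have the future array [2, 0, 3] into four vertices at T=2.0. Inverting it gives the past array [1, -1, 0, 2], so vertex 1 at T=2.0 is a geodesic that first appears at T=2.0.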