Geff specification
The graph exchange file format is zarr-based. A graph is stored in a zarr group, which can have any name. This allows storing multiple geff graphs inside the same zarr root directory. A geff group is identified by the presence of a geff_version attribute in the .zattrs file. Other geff metadata is also stored in the .zattrs file of the geff group. The geff group must contain a nodes group and an edges group.
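The identification rule above can be sketched with the standard library alone. This is a minimal illustration, not part of the geff tooling: it assumes the zarr root is a plain directory store in which each group's attributes live in a `.zattrs` JSON file, and the names `find_geff_groups` and `demo` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def find_geff_groups(zarr_root):
    """Return the paths of all groups whose .zattrs file has a geff_version key."""
    found = []
    for zattrs in sorted(Path(zarr_root).rglob(".zattrs")):
        try:
            attrs = json.loads(zattrs.read_text())
        except json.JSONDecodeError:
            continue  # unreadable attributes: cannot be identified as geff
        if isinstance(attrs, dict) and "geff_version" in attrs:
            found.append(zattrs.parent)
    return found


def demo():
    # Build a throwaway layout: two geff groups and one unrelated group.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "to.zarr"
        layout = {
            "tracking_graph": {"geff_version": "0.1"},
            "other_graph": {"geff_version": "0.1"},
            "raw": {"note": "no geff_version here"},
        }
        for name, attrs in layout.items():
            group = root / name
            group.mkdir(parents=True)
            (group / ".zattrs").write_text(json.dumps(attrs))
        return [path.name for path in find_geff_groups(root)]


print(demo())
```

In practice the attributes would normally be read through a zarr library rather than by parsing the file directly; the direct read is used here only to keep the sketch dependency-free.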
Geff metadata
geff_metadata
Type: object
Geff metadata schema to validate the attributes json file in a geff zarr.
Geff Version
Type: string
Must match regular expression: (0\.0)|(0\.1)
Directed
Type: boolean
Roi Min
Type: array of number
No additional items.
Roi Max
Type: array of number
No additional items.
Position Attr
Type: string
Default: "position"
Axis Names
Default: null
No additional items.
Axis Units
Default: null
No additional items.
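A rough sketch of how these constraints might be checked on an attributes dictionary is given below. Only `geff_version` is confirmed as a key name by the text above; the keys `directed`, `roi_min`, `roi_max` and `position_attr` are assumed snake_case forms of the schema titles, and treating the first four as required is also an assumption. The function `check_geff_metadata` is hypothetical, and it uses `re.fullmatch`, which is stricter than the unanchored matching that JSON Schema applies to patterns.

```python
import re

GEFF_VERSION_PATTERN = r"(0\.0)|(0\.1)"  # regular expression from the schema


def _is_number(x):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def check_geff_metadata(attrs):
    """Return a list of problems found in a geff attributes dict (empty if none)."""
    problems = []

    version = attrs.get("geff_version")
    if not isinstance(version, str) or not re.fullmatch(GEFF_VERSION_PATTERN, version):
        problems.append("geff_version must be a string matching the pattern")

    # Assumed key name for the "Directed" field.
    if not isinstance(attrs.get("directed"), bool):
        problems.append("directed must be a boolean")

    # Assumed key names for the "Roi Min" and "Roi Max" fields.
    for key in ("roi_min", "roi_max"):
        value = attrs.get(key)
        if not isinstance(value, list) or not all(_is_number(v) for v in value):
            problems.append(f"{key} must be an array of numbers")

    # Assumed key name for "Position Attr"; falls back to the schema default.
    if not isinstance(attrs.get("position_attr", "position"), str):
        problems.append("position_attr must be a string")

    return problems


example = {"geff_version": "0.1", "directed": True,
           "roi_min": [0, 0.0], "roi_max": [10, 20.5]}
print(check_geff_metadata(example))
```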
The nodes group
The nodes group will contain an ids array and an attrs group.
The ids array
The nodes\ids array is a 1D array of node IDs of length N > 0, where N is the number of nodes in the graph. Node IDs must be unique. Node IDs can have any type supported by zarr, but we recommend integer dtypes. For large graphs, uint64 might be necessary to provide enough range for every node to have a unique ID.
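The two constraints on node IDs (N > 0 and uniqueness) and the remark about integer range can be illustrated as follows. This is a sketch only: a plain Python list stands in for the zarr array, and `check_node_ids` and `smallest_uint_dtype` are hypothetical helper names.

```python
def check_node_ids(ids):
    """Validate a 1D sequence of node IDs and return N, the number of nodes."""
    ids = list(ids)
    if len(ids) == 0:
        raise ValueError("the ids array must contain at least one node (N > 0)")
    seen = set()
    duplicates = set()
    for node_id in ids:
        if node_id in seen:
            duplicates.add(node_id)
        seen.add(node_id)
    if duplicates:
        raise ValueError(f"node IDs must be unique, repeated: {sorted(duplicates)}")
    return len(ids)


def smallest_uint_dtype(max_id):
    """Name the narrowest unsigned integer dtype that can hold max_id."""
    for bits in (8, 16, 32, 64):
        if 0 <= max_id <= 2**bits - 1:
            return f"uint{bits}"
    raise ValueError("ID is negative or does not fit in uint64")


print(check_node_ids([10, 11, 12]))
print(smallest_uint_dtype(2**32))
```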
The attrs group and node attribute groups
The nodes\attrs group will contain one or more node attribute groups, each with a values array and an optional missing array.
- values arrays can be any zarr supported dtype, and can be N-dimensional. The first dimension of the values array must have the same length as the node ids array, such that each row of the attribute values array stores the attribute for the node at that index in the ids array.
- The missing array is an optional one-dimensional boolean array to support attributes that are not present on all nodes. A 1 at an index in the missing array indicates that the value of that attribute for the node at that index is None, and the value in the values array at that index should be ignored. If the missing array is not present, all nodes have values for the attribute.
Note
When writing a graph with missing attributes to the geff format, you must fill in a dummy value in the values array for the nodes that are missing the attribute, in order to keep the indices aligned with the node ids.
- The position group is a special node attribute group that must be present and does not allow missing attributes.
- The seg_id group is an optional, special node attribute group that stores the segmentation label for each node. The seg_id values do not need to be unique, in case labels are repeated between time points. If the seg_id group is not present, it is assumed that the graph is not associated with a segmentation.
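The values/missing convention for attributes that are absent on some nodes can be sketched as a pair of helpers. Plain lists stand in for the zarr arrays, the function names are hypothetical, and the choice of dummy fill value is left to the caller since the text above does not prescribe one.

```python
def encode_attribute(per_node_values, fill_value):
    """Split a list with None entries into aligned values and missing lists."""
    values = []
    missing = []
    for value in per_node_values:
        is_missing = value is None
        missing.append(is_missing)
        # A dummy value keeps indices aligned with the node ids.
        values.append(fill_value if is_missing else value)
    return values, missing


def decode_attribute(values, missing=None):
    """Rebuild per-node values; entries flagged as missing become None."""
    if missing is None:
        # No missing array: every node has a value for this attribute.
        return list(values)
    if len(missing) != len(values):
        raise ValueError("values and missing must have the same length")
    return [None if flag else value for value, flag in zip(values, missing)]


values, missing = encode_attribute([0.5, None, 0.9], fill_value=0.0)
print(values, missing)
print(decode_attribute(values, missing))
```

A reader that ignores the missing array would see the dummy value as real data, which is why the decode step consults the mask before returning anything.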
The edges group
Similar to the nodes group, the edges group will contain an ids array and an attrs group. If there are no edges in the graph, the edge group is not created.
The ids array
The edges\ids array is a 2D array with the same dtype as the nodes\ids array. It has shape (E, 2), where E is the number of edges in the graph. All elements in the edges\ids array must also be present in the nodes\ids array.
Each row represents an edge between two nodes. For directed graphs, the first column holds the source nodes and the second column holds the target nodes. For undirected graphs, the order is arbitrary.
Edges should be unique (no multiple edges between the same two nodes) and edges from a node to itself are not supported.
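The edge constraints (endpoints must be known nodes, no self-edges, no repeated edges) might be checked along these lines. A list of pairs stands in for the edge ids array, one pair per edge, and `check_edge_ids` is a hypothetical name. One interpretation is assumed and not stated in the text: for directed graphs, (a, b) and (b, a) are treated as different edges, while for undirected graphs they count as the same edge.

```python
def check_edge_ids(edge_ids, node_ids, directed):
    """Return a list of problems found in the edge list (empty if none)."""
    known = set(node_ids)
    seen = set()
    problems = []
    for index, (source, target) in enumerate(edge_ids):
        if source not in known or target not in known:
            problems.append(f"edge {index}: endpoint not present in node ids")
        if source == target:
            problems.append(f"edge {index}: edges from a node to itself are not supported")
        # Assumption: direction matters for uniqueness only in directed graphs.
        key = (source, target) if directed else frozenset((source, target))
        if key in seen:
            problems.append(f"edge {index}: duplicate edge")
        seen.add(key)
    return problems


nodes = [1, 2, 3]
print(check_edge_ids([(1, 2), (2, 3)], nodes, directed=True))
print(check_edge_ids([(1, 2), (2, 1)], nodes, directed=False))
```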
The attrs group and edge attribute groups
The edges\attrs group will contain zero or more edge attribute groups, each with a values array and an optional missing array.
- values arrays can be any zarr supported dtype, and can be N-dimensional. The first dimension of the values array must have the same length as the edges\ids array, such that each row of the attribute values array stores the attribute for the edge at that index in the ids array.
- The missing array is an optional one-dimensional boolean array to support attributes that are not present on all edges. A 1 at an index in the missing array indicates that the value of that attribute for the edge at that index is missing, and the value in the values array at that index should be ignored. If the missing array is not present, all edges have values for the attribute.
If you do not have any edge attributes, the edges\attrs group should still be present, but empty.
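The alignment rule for attribute groups (first dimension of values, and length of missing, equal to the number of ids) can be sketched as a small check. Here a nested dictionary of lists is a hypothetical stand-in for an attrs group, and `check_attrs_alignment` is not a real geff function; an empty dictionary corresponds to the present-but-empty attrs group described above.

```python
def check_attrs_alignment(num_ids, attrs):
    """Return a list of problems for attribute groups that do not align with the ids."""
    problems = []
    for name, group in attrs.items():
        if "values" not in group:
            problems.append(f"{name}: values array is required")
            continue
        if len(group["values"]) != num_ids:
            problems.append(f"{name}: values has length {len(group['values'])}, expected {num_ids}")
        # The missing array is optional, but must align when present.
        if "missing" in group and len(group["missing"]) != num_ids:
            problems.append(f"{name}: missing has length {len(group['missing'])}, expected {num_ids}")
    return problems


edge_attrs = {
    "distance": {"values": [1.5, 2.0]},
    "score": {"values": [0.9, 0.0], "missing": [False, True]},
}
print(check_attrs_alignment(2, edge_attrs))
print(check_attrs_alignment(2, {}))
```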
Example file structure and metadata
TODO: Example metadata for this file structure
/path/to.zarr
    /tracking_graph
        .zattrs  # graph metadata with `geff_version`
        nodes/
            ids  # shape: (N,) dtype: uint64
            attrs/
                position/
                    values  # shape: (N, 3) dtype: float16
                color/
                    values  # shape: (N, 4) dtype: float16
                    missing  # shape: (N,) dtype: bool
        edges/
            ids  # shape: (E, 2) dtype: uint64
            attrs/
                distance/
                    values  # shape: (E,) dtype: float16
                score/
                    values  # shape: (E,) dtype: float16
                    missing  # shape: (E,) dtype: bool
    # optional:
    /segmentation
    # unspecified, but totally okay:
    /raw