COMPUTING MULTIDIMENSIONAL PERSISTENCE*

Gunnar Carlsson, ${ }^{\dagger}$ Gurjeet Singh, ${ }^{\ddagger}$ and Afra Zomorodian ${ }^{\S}$

Abstract. The theory of multidimensional persistence captures the topology of a multifiltration - a multiparameter family of increasing spaces. Multifiltrations arise naturally in the topological analysis of scientific data. In this paper, we give a polynomial time algorithm for computing multidimensional persistence. We recast this computation as a problem within computational commutative algebra, and utilize algorithms from this area to solve it. While the resulting problem is EXPSPACE-complete and the standard algorithms take doubly-exponential time, we exploit the structure inherent within multifiltrations to yield practical algorithms. We implement all algorithms in the paper and provide statistical experiments to demonstrate their feasibility.

1 Introduction

In this paper, we give a polynomial time algorithm for computing the persistent homology of a multifiltration. The computed solution is compact and complete, but not a topological invariant. Theoretically, this is the best one may hope for because compact complete invariants do not exist for multidimensional persistence [2]. We discuss computing an incomplete invariant, the rank invariant, and give an algorithm for reading off this invariant from the complete solution. We implement all algorithms in the paper and provide statistical experiments to demonstrate their feasibility.

1.1 Motivation

Intuitively, a multifiltration models a growing space that is parameterized along multiple dimensions. For example, the complex with coordinate $(3,2)$ in Figure 1 is filtered along the horizontal and vertical dimensions, giving rise to a bifiltration. Multifiltrations arise naturally in topological analysis of scientific data. Often, such data is in the form of a finite set of noisy samples from some underlying topological space. Our goal is to robustly recover the lost connectivity of the underlying space. If the sampling is dense enough, we may approximate the space as a union of balls by placing $\epsilon$-balls around each point. As we increase $\epsilon$, we obtain a growing family of spaces, a 1-dimensional multifiltration, also called a filtration. This approximation is the central idea behind many methods for computing the topology of a point set [22]. Often, however, the input point set is filtered via multiple functions. For instance, in analyzing the structure of natural images, we filter the data according to density [1]. We now have multiple dimensions along which our space is filtered. That is, we have a multifiltration.

Figure 1: A bifiltration. The complex with labeled vertices is at coordinate (3,2). Simplices are highlighted and named at the critical coordinates at which they appear.

*The authors were partially supported by the following grants: G. C. by NSF DMS-0354543; A. Z. by DARPA HR 0011-06-1-0038, ONR N 00014-08-1-0908, and NSF CCF-0845716; all by DARPA HR 0011-05-1-0007. A portion of the work was done while the second author was at Stanford University.
${ }^{\dagger}$ Stanford University, gunnar@math.stanford.edu
${ }^{\ddagger}$ Ayasdi, Inc., gurjeet@ayasdi.com
${ }^{\S}$ Dartmouth College, afra@cs.dartmouth.edu

We characterize a multifiltration through invariants. An invariant is a function that assigns identical objects to isomorphic structures. The trivial invariant assigns the same object to all structures, and is useless. The complete invariant assigns different objects to non-isomorphic structures, and is powerful. We want to obtain a discrete invariant: an invariant that yields a finite description and is not dependent on the underlying field of computation [2]. Therefore, in the ideal setting, we would like a complete discrete invariant for the structure of a multifiltration.

1.2 Prior Work

For one-dimensional filtrations, the theory of persistent homology provides a complete discrete invariant called a barcode, a multiset of intervals [23]. Each interval in the barcode corresponds to the lifetime of a single topological feature within the filtration. Since intrinsic features have long lives, while noise is short-lived, a quick examination of the intervals gives a robust estimation of the topology. The existence of a complete discrete invariant, as well as efficient algorithms and fast implementations have led to successful applications of persistent homology to a variety of problems, such as shape description [5], denoising volumetric density data [12], detecting holes in sensor networks [7], analyzing neural activity in the visual cortex [18], and analyzing the structure of natural images [1], to name a few.

For multifiltrations of dimension higher than one, the situation is much more complicated. The theory of multidimensional persistence shows that no complete discrete invariant exists. Instead, the authors propose an incomplete invariant, the rank invariant, which captures some persistent information. Unfortunately, this invariant is not compact, requiring large storage, so its direct computation using the one-dimensional persistence algorithm is not feasible. A variant of the problem of multidimensional persistence appears in computer vision [10]. There is also a partial solution called vineyards [4]. A full solution, however, has not been attempted by any prior work.

1.3 Contributions

In this paper, we provide a complete solution to the problem of computing multidimensional persistence.

We recast the computation to a problem within computational commutative algebra, allowing us to utilize algorithms from this area.
We exploit the structure provided by a multifiltration to greatly simplify the algorithms.
We show that the resulting algorithms are polynomial time, unlike their original counterparts, which are EXPSPACE-complete, requiring exponential space and time.
We also provide algorithms to read off the rank invariant from our solution.
We implement all algorithms and show that the multidimensional setting requires different data structures than the one-dimensional case for efficient computation. In particular, the change in approach allows for parallelization.
We analyze the running time of our implementations with a suite of statistical tests with random multifiltrations.

As we shall see, our approach gives a specification of multidimensional persistence in terms of a set of polynomials. While this specification is complete, it is not an invariant, so our results are in line with the previous result that showed the non-existence of a complete invariant [2]. The lack of invariance means that we may not use our solution to compare multifiltrations directly. Instead, we give polynomial-time algorithms for reading off the rank invariant from our solution. As expected, the rank invariant is incomplete. Moreover, its direct computation requires exponential time and space.

We begin with background on multidimensional persistence in Section 2. In Section 3, we formalize the input to our algorithms and justify it. In Section 4, we reinterpret the problem of computing multidimensional persistence within computational commutative algebra. Having recast the problem, we use algorithms from this area to solve the problem. This gives us a computationally intractable solution. In Section 5, we simplify the algorithms by using the structure within multifiltrations. This simplification allows us to derive a polynomial time algorithm from the original doubly-exponential time algorithms. In Section 7, we describe our implementations and furnish experiments to validate our work in practice.

2 Background

In this section, we review the theory of multidimensional persistence. We begin by formalizing multifiltrations. We then review a sequence of theories of homology: simplicial, persistent, and multidimensional. We end with a description of the rank invariant.

2.1 Multifiltrations

Let $\mathbb{N} \subseteq \mathbb{Z}$ be the set of non-negative integers and $\mathbb{R}$ be the set of real numbers. For vectors in $\mathbb{N}^{n}$ or $\mathbb{R}^{n}$ , we say $u \leq v$ if $u_{i} \leq v_{i}$ for all $1 \leq i \leq n$ , and define the $\geq$ relation similarly. The relations $\leq, \geq$ form partial orders on $\mathbb{N}^{n}$ and $\mathbb{R}^{n}$ . A topological space $X$ is multifiltered if we are given a family of subspaces $\left\{X_{u}\right\}_{u}$ , where $u \in \mathbb{N}^{n}$ , so that $X_{u} \subseteq X_{v}$ whenever $u \leq v$ . We call the family of subspaces $\left\{X_{u}\right\}_{u}$ a multifiltration. A one-dimensional multifiltration is called a filtration.
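The partial order and the nesting condition above can be checked mechanically. The following is a minimal sketch, assuming a multifiltration is stored as a dictionary from coordinates to sets of cells; the function names and the toy data are illustrative and not part of the paper.

```python
from typing import Dict, Sequence, Set, Tuple


def leq(u: Sequence[int], v: Sequence[int]) -> bool:
    """Return True if u <= v in the product partial order (u_i <= v_i for all i)."""
    if len(u) != len(v):
        raise ValueError("coordinates must have the same dimension")
    return all(ui <= vi for ui, vi in zip(u, v))


def comparable(u: Sequence[int], v: Sequence[int]) -> bool:
    """Two coordinates are comparable if one is <= the other."""
    return leq(u, v) or leq(v, u)


def is_multifiltration(subspaces: Dict[Tuple[int, ...], Set[str]]) -> bool:
    """Check that X_u is a subset of X_v whenever u <= v."""
    return all(
        subspaces[u] <= subspaces[v]
        for u in subspaces
        for v in subspaces
        if leq(u, v)
    )


# Hypothetical toy bifiltration on a 2 x 2 grid (assumed data, for illustration only).
toy = {
    (0, 0): {"b"},
    (1, 0): {"b", "d"},
    (0, 1): {"b"},
    (1, 1): {"a", "b", "d"},
}
```

Note that `(1, 0)` and `(0, 1)` are incomparable, which is exactly what distinguishes the multidimensional setting from a one-dimensional filtration, where the order is total.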

If $X$ is a cell complex, all subsets $X_{u}$ must also be cell complexes, as shown for a bifiltered simplicial complex in Figure 1. A critical coordinate $u$ for cell $\sigma \in X$ is a minimal coordinate, with respect to the partial order $\leq$ , such that $\sigma \in X_{u}$ . In a multifiltration, any path with monotonically increasing coordinates is a filtration, such as any row or column in the figure. Multifiltrations constitute the input to our algorithms. We motivate their use as a model for scientific data as well as formalize them in the next section.

2.2 Homology

Given a topological space, homology is a topological invariant that is often used in practice as it is easily computable. Here, we describe simplicial homology briefly, referring the reader to Hatcher [13] as a resource. We assume our input is a simplicial complex $K$ , such as the complexes in Figure 1. We note, however, that our results carry over to arbitrary cell complexes, such as simplicial sets [9], $\Delta$ -complexes [13], and cubical complexes [14].

The $i$th chain group $C_{i}(K)$ of $K$ is the free Abelian group on $K$'s set of oriented $i$-simplices. An element $c \in C_{i}(K)$ is an $i$-chain, $c=\sum_{j} n_{j}\left[\sigma_{j}\right], \sigma_{j} \in K$, with coefficients $n_{j} \in \mathbb{Z}$. Given such a chain $c$, the boundary operator $\partial_{i}: C_{i}(K) \rightarrow C_{i-1}(K)$ is a homomorphism defined linearly by its action on any simplex $\sigma=\left[v_{0}, v_{1}, \ldots, v_{i}\right] \in c$,

$$\partial_{i} \sigma=\sum_{j}(-1)^{j}\left[v_{0}, \ldots, \hat{v}_{j}, \ldots, v_{i}\right]$$

where $\hat{v}_{j}$ indicates that $v_{j}$ is deleted from the vertex sequence. The boundary operator connects the chain groups into a chain complex $C_{*}$ :

$$\cdots \rightarrow C_{i+1}(K) \xrightarrow{\partial_{i+1}} C_{i}(K) \xrightarrow{\partial_{i}} C_{i-1}(K) \rightarrow \cdots \tag{1}$$

Using the boundary operator, we may define subgroups of $C_{i}$: the cycle group $\operatorname{ker} \partial_{i}$ and the boundary group $\operatorname{im} \partial_{i+1}$. Since $\partial_{i} \circ \partial_{i+1} \equiv 0$, we have $\operatorname{im} \partial_{i+1} \subseteq \operatorname{ker} \partial_{i} \subseteq C_{i}(K)$. The $i$th homology group is

$$H_{i}(K)=\operatorname{ker} \partial_{i} / \operatorname{im} \partial_{i+1} \tag{2}$$

and the $i$th Betti number is $\beta_{i}(K)=\operatorname{rank} H_{i}(K)$. Over a field of coefficients $k$, $H_{i}$ is a $k$-vector space of dimension $\beta_{i}$.
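As an illustration of these definitions, the sketch below computes Betti numbers over $\mathbb{Z}_{2}$, where the signs in the boundary formula vanish and $\beta_{i}=\left(m_{i}-\operatorname{rank} \partial_{i}\right)-\operatorname{rank} \partial_{i+1}$ with $m_{i}$ the number of $i$-simplices. It assumes simplices are given as sorted vertex tuples; the helper names and the toy complexes are illustrative, and this is plain Gaussian elimination rather than any algorithm from the paper.

```python
from itertools import combinations


def rank_gf2(matrix):
    """Rank over Z_2 of a 0/1 matrix given as a list of rows."""
    rows = [sum(bit << j for j, bit in enumerate(row)) for row in matrix]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def boundary_matrix(faces, simplices):
    """Rows are indexed by (i-1)-simplices, columns by i-simplices."""
    index = {face: r for r, face in enumerate(faces)}
    matrix = [[0] * len(simplices) for _ in faces]
    for col, simplex in enumerate(simplices):
        for face in combinations(simplex, len(simplex) - 1):
            matrix[index[face]][col] = 1
    return matrix


def betti_numbers(by_dim):
    """by_dim[i] lists the i-simplices of a simplicial complex."""
    top = len(by_dim)
    ranks = [0]  # the boundary of a vertex is zero
    for i in range(1, top):
        ranks.append(rank_gf2(boundary_matrix(by_dim[i - 1], by_dim[i])))
    ranks.append(0)  # there are no simplices above the top dimension
    return [len(by_dim[i]) - ranks[i] - ranks[i + 1] for i in range(top)]


# Toy complexes (assumed data): a hollow triangle and a filled triangle.
vertices = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]
hollow = [vertices, edges]
filled = [vertices, edges, [(0, 1, 2)]]
```

The hollow triangle has one component and one loop, while filling in the triangle kills the loop.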

2.3 Persistent Homology

Given a multifiltration $\left\{X_{u}\right\}_{u}$ , the homology of each subspace $X_{u}$ over a field $k$ is a vector space. For instance, the bifiltered complex in Figure 1 has zeroth homology vector spaces isomorphic to the commutative diagram

where the dimension of the vector space counts the number of components of the complex, and the maps between the homology vector spaces are induced by the inclusion maps relating the subspaces. Persistent homology captures information contained in the induced maps. There are two equivalent definitions that we use in this paper. The first definition was originally for filtrations only [8], but was later extended to multifiltrations [2]. The key idea is to relate the homologies of a pair of complexes. For each pair $u, v \in \mathbb{N}^{n}$ with $u \leq v$ , $X_{u} \subseteq X_{v}$ by definition, so $X_{u} \hookrightarrow X_{v}$ . This inclusion, in turn, induces a linear map $\iota_{i}(u, v)$ at the $i$ th homology level $H_{i}\left(X_{u}\right) \rightarrow H_{i}\left(X_{v}\right)$ that maps a homology class within $X_{u}$ to the one that contains it within $X_{v}$ . The $i$ th persistent homology is $\operatorname{im} \iota_{i}$ , the image of $\iota_{i}$ for all pairs $u \leq v$ . This definition also enables the definition of an incomplete invariant. The $i$ th rank invariant is

$$\rho_{i}(u, v)=\operatorname{rank} \iota_{i}(u, v)$$

for all pairs $u \leq v \in \mathbb{N}^{n}$ , where $\iota_{i}$ is defined above [2]. While this definition provides intuition, it is inexpedient for theoretical development. For most of our paper, we use a second definition of persistence that is grounded in algebraic topology, allowing us to utilize tools from commutative algebra for computation [23, 2].
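For $i=0$ the rank invariant has a concrete combinatorial reading: $\rho_{0}(u, v)$ counts the connected components of $X_{v}$ that contain at least one vertex of $X_{u}$. The sketch below computes this with a union-find structure. It is a direct illustration of the definition, not the algorithm developed in this paper. The critical coordinates are those of the bifiltration in Figure 1 as they can be read off from Examples 3 and 4 later in the text; treat them as an assumption about the figure.

```python
def leq(u, v):
    """Product partial order on coordinates."""
    return all(a <= b for a, b in zip(u, v))


def rank_invariant_0(vertices, edges, u, v):
    """rho_0(u, v) when every cell has a single critical coordinate.

    vertices: dict vertex -> critical coordinate
    edges:    dict (vertex, vertex) -> critical coordinate
    """
    if not leq(u, v):
        raise ValueError("the rank invariant is defined only for u <= v")
    # Union-find over the vertices present in X_v.
    parent = {w: w for w, coord in vertices.items() if leq(coord, v)}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for (p, q), coord in edges.items():
        if leq(coord, v):
            parent[find(p)] = find(q)

    # Count components of X_v hit by vertices already present in X_u.
    return len({find(w) for w, coord in vertices.items() if leq(coord, u)})


# Critical coordinates inferred from Examples 3 and 4 (an assumption).
VERTICES = {"a": (1, 1), "b": (0, 0), "c": (0, 0),
            "d": (1, 0), "e": (0, 0), "f": (0, 0)}
EDGES = {("a", "b"): (1, 2), ("b", "c"): (2, 2), ("c", "d"): (1, 0),
         ("d", "e"): (1, 0), ("e", "f"): (2, 0), ("a", "f"): (1, 2),
         ("b", "f"): (0, 2), ("c", "e"): (0, 1)}
```

Tabulating this function over all pairs $u \leq v$ is exactly the direct computation whose storage cost motivates the approach taken in the rest of the paper.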

2.4 Multidimensional Persistence

The key insight for the second definition below is that the persistent homology of a multifiltration is the homology of a single algebraic entity. This object encodes all the complexes via polynomials that track cells through the multifiltration. To define our algebraic structure, we need to first review graded modules over polynomials. A monomial in $x_{1}, \ldots, x_{n}$ is a product of the form

$$x_{1}^{v_{1}} \cdot x_{2}^{v_{2}} \cdots x_{n}^{v_{n}}$$

with $v_{i} \in \mathbb{N}$. We denote it $x^{v}$, where $v=\left(v_{1}, \ldots, v_{n}\right) \in \mathbb{N}^{n}$. A polynomial $f$ in $x_{1}, \ldots, x_{n}$ with coefficients in a field $k$ is a finite linear combination of monomials, $f=\sum_{v} c_{v} x^{v}$, with $c_{v} \in k$. The set of all such polynomials is denoted $k\left[x_{1}, \ldots, x_{n}\right]$. For instance, $5 x_{1} x_{2}^{2}-7 x_{3}^{3} \in k\left[x_{1}, x_{2}, x_{3}\right]$. An $n$-graded ring is a ring $R$ equipped with a decomposition of Abelian groups $R \cong \oplus_{v} R_{v}, v \in \mathbb{N}^{n}$, so that multiplication has the property $R_{u} \cdot R_{v} \subseteq R_{u+v}$. Elements in a single group $R_{u}$ are called homogeneous. The set of polynomials forms the $n$-graded polynomial ring, denoted $A^{n}$. This ring is graded by $A_{v}^{n}=k x^{v}, v \in \mathbb{N}^{n}$. An $n$-graded module over an $n$-graded ring $R$ is an Abelian group $M$ equipped with a decomposition $M \cong \oplus_{v} M_{v}, v \in \mathbb{N}^{n}$, together with an $R$-module structure so that $R_{u} \cdot M_{v} \subseteq M_{u+v}$. An $n$-graded module is finitely generated if it admits a finite generating set. Also, recall the notion of a free module on an $n$-graded set and a basis for such a module [2].

Given a multifiltration $\left\{X_{u}\right\}_{u}$, the $i$th dimensional homology is the following $n$-graded module over $A^{n}$

$$\bigoplus_{u} H_{i}\left(X_{u}\right)$$

where the $k$ -module structure is the direct sum structure and $x^{v-u}: H_{i}\left(X_{u}\right) \rightarrow H_{i}\left(X_{v}\right)$ is the induced homomorphism $\iota_{i}(u, v)$ we described in the previous section. This view of homology yields two important results. In one dimension, the persistent homology of a filtration is easily classified and parameterized by the barcode, and there is an efficient algorithm for its computation [23]. In higher dimensions, no similar classification exists [2]. Instead, we may utilize an incomplete invariant. One such invariant, the rank invariant defined above, is provably equivalent to the barcode, and therefore complete, in one dimension, but it is incomplete in higher dimensions.

3 One-Critical Multifiltrations

We are interested in persistent homology as a tool for analyzing the topology of scientific data. In this section, we begin by formalizing such data. We then show that topological analysis of scientific data naturally generates multifiltrations. In particular, the process generates multifiltrations with the following property.

Definition 1 (one-critical). A multifiltered complex $K$ where each cell $\sigma$ has a unique critical coordinate $u_{\sigma}$ is one-critical.

In the rest of this paper, we assume that our input multifiltrations are one-critical. General multifiltrations, however, may not have this property. Therefore, we end this section by describing a classic construction that eliminates multiple critical coordinates in such input.

3.1 Model for Scientific Data

We are often given scientific data in the form of a set of noisy samples from some underlying geometric space. At each sample point, we may also have measurements from the ambient space. For example, a fundamental goal in graphics is to render objects under different lighting from different camera positions. One approach is to construct a digitized model using data from a range scanner, which employs multiple cameras to sense 3D positions on an object’s surface, as well as estimated normals and texture information [19]. An alternate approach samples the four-dimensional light field of a surface directly and interpolates to render the object without explicit surface reconstruction [15]. Either approach gives us a set of noisy samples with measurements. Similarly, a node in a wireless sensor network has sensors on board that measure physical attributes of the local environment, such as pressure and temperature [21]. The GPS coordinates of the nodes constitute a set of samples at which several functions are sampled.

Formally then, we assume we have a manifold $X$ with $n-1$ Morse functions defined on it [16]. In practice, $X$ is often embedded within a high-dimensional Euclidean space $\mathbb{R}^{d}$ , although this is not required. As such, we model the data using the following definition.

Definition 2 (multifiltered dataset). A multifiltered dataset is $\left(S,\left\{f_{j}\right\}_{j}\right)$ , where $S$ is a finite set of $d$ -dimensional points with $n-1$ real-valued functions $f_{j}: S \rightarrow \mathbb{R}$ defined on it, for $n>1$ .

3.2 Construction

We now assume our data is a multifiltered dataset $\left(S,\left\{f_{j}\right\}_{j}\right)$ . We begin by approximating the underlying space of $S$ with a combinatorial representation, a complex, built on $S$ . There are a variety of methods for building such complexes, all of which have a scale parameter $\epsilon$ [22]. As we increase $\epsilon$ , a complex grows larger, and fixing a maximum scale $\epsilon_{\max }$ gives us a filtered complex $K$ . Each cell $\sigma \in K$ enters $K$ at scale $\epsilon(\sigma)$ . We formalize this type of complex next.

Definition 3 (scale-filtered complex). A scale-filtered complex is the tuple $(K, \epsilon)$, where $K$ is a finite complex, $\epsilon: K \rightarrow \mathbb{R}$, and the complexes $K_{\mu}=\{\sigma \mid \epsilon(\sigma) \leq \mu\}$ form a one-dimensional filtration for $K$.

We assume we have a scale-filtered complex $(K, \epsilon)$ defined on our input point set $S$ . To incorporate the functions $f_{j}$ into our data analysis, we first extend them to the cells in the complex. For $\sigma \in K$ and $f_{j}$ , let $f_{j}(\sigma)$ be the maximum value $f_{j}$ takes on $\sigma$ 's vertices; that is, $f_{j}(\sigma)=\max _{v \in \sigma} f_{j}(v)$ , where $v \in S$ . This extension defines $n-1$ functions on the complex, $f_{j}: K \rightarrow \mathbb{R}$ . We combine all filtration functions into a single multivariate function $F: K \rightarrow \mathbb{R}^{n}$ , where

$$F(\sigma)=\left(f_{1}(\sigma), f_{2}(\sigma), \ldots, f_{n-1}(\sigma), \epsilon(\sigma)\right)$$

We multifilter $K$ via the excursion sets $\left\{K_{u}\right\}_{u}$ of $F$ for $u \in \mathbb{R}^{n}$:

$$K_{u}=\{\sigma \in K \mid F(\sigma) \leq u\}$$

Each simplex $\sigma$ enters $K_{u}$ at $u=F(\sigma)$ and will remain in the complex for all $u \geq F(\sigma)$ . Equivalently, $F(\sigma)$ is the unique critical coordinate at which $\sigma$ enters the filtered complex. That is, the multifiltrations built by the above process are always one-critical.

Example 1 (bifiltration criticals). The bifiltration in Figure 1 is one-critical, since each simplex enters at a single critical coordinate. For instance, $F(a)=(1,1), F(c d e)=(3,1)$ , and $F(a f)=(1,2)$ .
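The construction of $F$ and of the excursion sets $K_{u}$ can be sketched in a few lines. The example below assumes cells are stored as tuples of vertex names, vertex functions as dictionaries, and the scale function $\epsilon$ as a dictionary on cells; all names and the toy values are hypothetical.

```python
def extend_to_cell(f, cell):
    """f_j(sigma): the maximum of f_j over the vertices of sigma."""
    return max(f[v] for v in cell)


def critical_coordinate(cell, vertex_functions, scale):
    """F(sigma) = (f_1(sigma), ..., f_{n-1}(sigma), eps(sigma))."""
    values = tuple(extend_to_cell(f, cell) for f in vertex_functions)
    return values + (scale[cell],)


def excursion_set(F, u):
    """K_u = {sigma : F(sigma) <= u} in the product partial order."""
    return {cell for cell, coord in F.items()
            if all(c <= b for c, b in zip(coord, u))}


# Toy multifiltered dataset (assumed values): one function, so n = 2.
density = {"p": 0.2, "q": 0.5, "r": 0.9}
scale = {("p",): 0.0, ("q",): 0.0, ("r",): 0.0,
         ("p", "q"): 1.0, ("q", "r"): 2.0}

# The compact representation {(sigma, F(sigma))} as a dictionary.
F = {cell: critical_coordinate(cell, [density], scale) for cell in scale}
```

Because each cell receives exactly one value of $F$, the dictionary `F` is the compact representation of a one-critical multifiltration.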

Since $K$ is finite, we have a finite set of critical coordinates that we may project on each dimension $j$ to get a finite set of critical values $C_{j}$. We restrict ourselves to the Cartesian product $C_{1} \times \ldots \times C_{n}$ of the critical values, parameterizing the resulting discrete grid using $\mathbb{N}$ in each dimension. This parameterization yields an $n$-dimensional multifiltration $\left\{K_{v}\right\}_{v}$ with $v \in \mathbb{N}^{n}$.
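The reparameterization onto an integer grid amounts to replacing each real critical value by its rank among the critical values on its axis. A minimal sketch follows, assuming the critical coordinates are given as a dictionary from cells to real tuples; the function name and the toy input are illustrative.

```python
def to_integer_grid(F):
    """Map real critical coordinates to grid indices in N^n.

    Returns the re-indexed coordinates together with the sorted
    critical values C_1, ..., C_n for each axis.
    """
    n = len(next(iter(F.values())))
    critical_values = [sorted({coord[j] for coord in F.values()})
                       for j in range(n)]
    index = [{value: i for i, value in enumerate(axis)}
             for axis in critical_values]
    grid = {cell: tuple(index[j][coord[j]] for j in range(n))
            for cell, coord in F.items()}
    return grid, critical_values


# Toy critical coordinates (assumed values).
F_real = {"s": (0.2, 0.0), "t": (0.5, 0.0), "st": (0.5, 1.0)}
grid, axes = to_integer_grid(F_real)
```

The map is order-preserving on each axis, so the partial order among critical coordinates is unchanged.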

We end by noting that one-critical multifiltrations may be represented compactly by the set of tuples

$$\{(\sigma, F(\sigma)) \mid \sigma \in K\}$$

This representation is the main input to our algorithms in Section 4.3.

3.3 Mapping Telescope

In general, multifiltrations are not one-critical since a cell may enter at multiple incomparable critical coordinates, viewing $\leq$ as a partial order on $\mathbb{N}^{n}$ . For example, in Figure 1, the vertex $d$ that enters at $(1,0)$ may also enter at $(0,1)$ as the two coordinates are incomparable. For such multifiltrations, we may utilize the mapping telescope, a standard algebraic construction, to ensure that each cell has a unique critical coordinate [13]. Intuitively, this construction introduces additional shadow cells into the multifiltration without changing its topology. We will not detail this construction here as none of the multifiltrations we encounter in practice require the conversion. We should note, however, that the mapping telescope increases the size of the multifiltration, depending on the number of cells with multiple critical points. In the worst case, the growth is exponential.

4 Using Computational Commutative Algebra

Having described our input, we next recast the problem of computing multidimensional persistence as a problem within computational commutative algebra. We then describe standard algorithms from this area that solve our problem. While this process gives us a solution, this solution is not practical as the algorithms are computationally intractable. In the next section, we refine them to derive polynomial-time algorithms.

4.1 Multigraded Homology

We begin by extending homology to multifiltered cell complexes. We then convert the computation of the latter to standard questions in computational commutative algebra.

Definition 4 (chain module). Given a multifiltered cell complex $\left\{K_{u}\right\}_{u}$ , the ith chain module is the $n$ -graded module over $A^{n}$

$$C_{i}=\bigoplus_{u} C_{i}\left(K_{u}\right)$$

where the $k$ -module structure is the direct sum structure and $x^{v-u}: C_{i}\left(K_{u}\right) \rightarrow C_{i}\left(K_{v}\right)$ is the inclusion $K_{u} \hookrightarrow K_{v}$ .

Note that we overload notation to reduce complexity by having $C_{i}=C_{i}\left(\left\{K_{u}\right\}_{u}\right)$ when the multifiltration is clear from context. The module $C_{i}$ is $n$-graded as for any $u \in \mathbb{N}^{n}$, $\left(C_{i}\right)_{u}=C_{i}\left(K_{u}\right)$. That is, the chain complex in grade $u$ of the module is the chain complex of $K_{u}$, the cell complex with coordinate $u$.

Example 2 (bifiltration module). Consider the vertex $d$ in the bifiltered complex in Figure 1. This vertex has critical coordinate $(1,0)$, so copies of this vertex exist in 9 complexes $K_{u}$ for $u \geq(1,0)$. The inclusion maps relate these copies within the complexes. In turn, polynomials relate the chain groups in the different grades of the module. Let $d$ be the copy of the vertex in coordinate $(1,0)$. Then, within $C_{i}$, we have $d$ in grade $(1,0)$, $x_{1} d$ in grade $(2,0)$, $x_{2} d$ in grade $(1,1)$, $x_{1}^{2} x_{2}^{2} d$ in grade $(3,2)$, and so on, as required by the definition of an $n$-graded module. In other words, a simplex has different names in different grades.

The graded chain modules $C_{i}$ are finitely generated, so we may choose bases for them.

Definition 5 (standard basis). The standard basis for the $i$th chain module $C_{i}$ is the set of $i$-simplices in critical grades.

Example 3 (bifiltration bases). For our bifiltration in Figure 1, the highlighted and named simplices constitute the standard bases. For example, the standard basis for $C_{0}$ is

| grade | $(0,0)$ | $(1,0)$ | $(1,1)$ |
| --- | --- | --- | --- |
| simplices | $b, c, e, f$ | $d$ | $a$ |

Note that in doing so, we have made a choice of ordered basis. Unlike for chain groups, this choice has an important consequence: Our resulting calculations will not be invariant but depend on the initial ordered basis.

Recall that our multifiltrations are one-critical. The graded chain groups of one-critical multifiltrations are free: Since each cell enters only once, the resulting chain groups do not require any relations. Since our graded chain groups are free, the boundary operator is simply a homomorphism between free graded modules. Given standard bases, we may write the boundary operator $\partial_{i}: C_{i} \rightarrow C_{i-1}$ explicitly as a matrix with polynomial entries.

Example 4 (boundary matrix). For the bifiltration in Figure 1, $\partial_{1}$ has the matrix

| | $ab$ | $bc$ | $cd$ | $de$ | $ef$ | $af$ | $bf$ | $ce$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $a$ | $x_{2}$ | 0 | 0 | 0 | 0 | $x_{2}$ | 0 | 0 |
| $b$ | $x_{1} x_{2}^{2}$ | $x_{1}^{2} x_{2}^{2}$ | 0 | 0 | 0 | 0 | $x_{2}^{2}$ | 0 |
| $c$ | 0 | $x_{1}^{2} x_{2}^{2}$ | $x_{1}$ | 0 | 0 | 0 | 0 | $x_{2}$ |
| $d$ | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| $e$ | 0 | 0 | 0 | $x_{1}$ | $x_{1}^{2}$ | 0 | 0 | $x_{2}$ |
| $f$ | 0 | 0 | 0 | 0 | $x_{1}^{2}$ | $x_{1} x_{2}^{2}$ | $x_{2}^{2}$ | 0 |

where we assume we are computing over $\mathbb{Z}_{2}$.

As in standard homology, the boundary operator connects the graded chain modules into a chain complex $C_{*}$ (Equation (1)) and the $i$th homology module is defined exactly as before (Equation (2)):

$$H_{i}=\operatorname{ker} \partial_{i} / \operatorname{im} \partial_{i+1}$$
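For a one-critical multifiltration, each nonzero entry of the boundary matrix in Example 4 is the monomial $x^{v-u}$, where $u$ is the critical coordinate of the face and $v$ that of the cell. The sketch below rebuilds the entries of $\partial_{1}$ over $\mathbb{Z}_{2}$ from critical coordinates. The vertex coordinates come from Example 3 and $F(af)=(1,2)$ from Example 1; the remaining edge coordinates are inferred from the matrix itself, so treat them as an assumption. Entries are stored as exponent vectors, with `None` standing for 0.

```python
def boundary_entry(face_coord, cell_coord):
    """Exponent vector v - u of the monomial x^(v-u)."""
    diff = tuple(c - f for f, c in zip(face_coord, cell_coord))
    if min(diff) < 0:
        raise ValueError("a face must enter no later than its coface")
    return diff


def boundary_matrix_1(vertices, edges):
    """Matrix of the first boundary operator over Z_2.

    Rows are vertices, columns are edges named by their two
    single-character endpoints (e.g. "ab").
    """
    return {v: {e: boundary_entry(vertices[v], edges[e]) if v in e else None
                for e in edges}
            for v in vertices}


def monomial_str(exponent):
    """Render an exponent vector as a monomial in x1, x2, ..."""
    if exponent is None:
        return "0"
    factors = [f"x{j + 1}" + (f"^{e}" if e > 1 else "")
               for j, e in enumerate(exponent) if e > 0]
    return "*".join(factors) or "1"


VERTICES = {"a": (1, 1), "b": (0, 0), "c": (0, 0),
            "d": (1, 0), "e": (0, 0), "f": (0, 0)}
EDGES = {"ab": (1, 2), "bc": (2, 2), "cd": (1, 0), "de": (1, 0),
         "ef": (2, 0), "af": (1, 2), "bf": (0, 2), "ce": (0, 1)}

M = boundary_matrix_1(VERTICES, EDGES)
```

For instance, `monomial_str(M["b"]["ab"])` renders the entry in row $b$, column $ab$ of Example 4.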

4.2 Recasting the Problem

Our goal is to compute homology modules. Following the definition, we have three tasks:

1. Compute the boundary module $\operatorname{im} \partial_{i+1}$.
2. Compute the cycle module $\operatorname{ker} \partial_{i}$.
3. Compute the quotient $H_{i}$.

We next translate these three tasks into problems in computational commutative algebra. Both the boundary and cycle modules turn out to be submodules of free and finitely generated modules that consist of vectors of polynomials. For the rest of this paper, we assume that we are computing homology over the field $k$ . Recall from Section 2.4 that our module is defined over the $n$ -graded polynomial ring $A^{n}=k\left[x_{1}, \ldots, x_{n}\right]$ with standard grading $A_{v}^{n}=k x^{v}, v \in \mathbb{N}^{n}$ . For notational simplicity, we will use $R=A^{n}$ to denote this ring for the remainder of this section. Let $R^{m}$ be the Cartesian product of $m$ copies of $R$ . In other words, $R^{m}$ consists of all column $m$ -vectors of polynomials:

$$R^{m}=\left\{\left[f_{1}, \ldots, f_{m}\right]^{\mathrm{T}} \mid f_{i} \in R, 1 \leq i \leq m\right\}$$

To distinguish elements of $R^{m}$ from polynomials, we adopt the standard practice of placing them in bold format, so that $\mathbf{f} \in R^{m}$ is a vector of polynomials, but $f \in R$ is a polynomial. We use this practice exclusively for elements of $R^{m}$ and not for other vectors, such as elements of $\mathbb{N}^{n}$ . We now recast the three problems:

1. The boundary module is a submodule of the polynomial module. The matrix $M_{i+1}$ for $\partial_{i+1}$ has $m_{i}$ rows and $m_{i+1}$ columns, where $m_{j}$ denotes the number of $j$-simplices in the complex. Let $F=\left(\mathbf{f}_{\mathbf{1}}, \ldots, \mathbf{f}_{\mathbf{m}_{\mathbf{i+1}}}\right), \mathbf{f}_{\mathbf{i}} \in R^{m_{i}}$, where $\mathbf{f}_{\mathbf{i}}$ is the $i$th column in $M_{i+1}$. This tuple of polynomial vectors generates a submodule of $R^{m_{i}}$:

$$\langle F\rangle=\left\{\sum_{j=1}^{m_{i+1}} q_{j} \mathbf{f}_{\mathbf{j}} \mid q_{j} \in R\right\}$$

The Submodule Membership Problem asks whether a polynomial vector $\mathbf{f}$ is in a submodule $M$ , such as $\langle F\rangle$ . That is, the problem asks whether we may write $\mathbf{f}$ in terms of some basis $F$ as above. A solution to this problem would complete our first task.
2. The cycle submodule is also a submodule of the polynomial module. The matrix for $\partial_{i}$ has $m_{i-1}$ rows and $m_{i}$ columns. Let $F=\left(\mathbf{f}_{\mathbf{1}}, \ldots, \mathbf{f}_{\mathbf{m}_{\mathbf{i}}}\right), \mathbf{f}_{\mathbf{i}} \in R^{m_{i-1}}$, where $\mathbf{f}_{\mathbf{i}}$ is the $i$th column in the matrix. Given $F$, the set of all $\left[q_{1}, \ldots, q_{m_{i}}\right]^{\mathrm{T}}, q_{i} \in R$ such that

$$\sum_{i=1}^{m_{i}} q_{i} \mathbf{f}_{\mathbf{i}}=\mathbf{0}$$

is an $R$-submodule of $R^{m_{i}}$ called the (first) syzygy module of $\left(\mathbf{f}_{\mathbf{1}}, \ldots, \mathbf{f}_{\mathbf{m}_{\mathbf{i}}}\right)$, denoted $\operatorname{Syz}\left(\mathbf{f}_{\mathbf{1}}, \ldots, \mathbf{f}_{\mathbf{m}_{\mathbf{i}}}\right)$. A set of generators for this submodule would complete our second task.

3. Our final task is simple, once we have completed the first two tasks. All we need to do is test whether the generators of the syzygy submodule, our cycles, are in the boundary submodule. As we shall see, the tools which allow us to complete the first two tasks also resolve this question.
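To make the syzygy condition concrete, the sketch below verifies one syzygy among the columns $cd$, $de$, and $ce$ of the boundary matrix in Example 4, working over $\mathbb{Z}_{2}$. It only checks a candidate; it does not compute generators of the syzygy module. The representation is an assumption made for brevity: a polynomial over $\mathbb{Z}_{2}$ is a frozenset of exponent tuples, so addition is symmetric difference.

```python
ZERO = frozenset()


def mono(*exponent):
    """The monomial x^exponent as a Z_2 polynomial."""
    return frozenset({exponent})


def add(p, q):
    """Sum over Z_2: monomials appearing in both summands cancel."""
    return p ^ q


def mul(p, q):
    """Product over Z_2 of two polynomials."""
    result = ZERO
    for a in p:
        for b in q:
            result ^= mono(*(x + y for x, y in zip(a, b)))
    return result


def combine(coeffs, columns):
    """Return sum_j q_j * f_j as a list of polynomials, one per row."""
    total = [ZERO] * len(columns[0])
    for q, column in zip(coeffs, columns):
        total = [add(t, mul(q, entry)) for t, entry in zip(total, column)]
    return total


def is_syzygy(coeffs, columns):
    """True if the combination of the columns vanishes identically."""
    return all(entry == ZERO for entry in combine(coeffs, columns))


ONE, X1, X2 = mono(0, 0), mono(1, 0), mono(0, 1)

# Columns cd, de, ce of Example 4; rows ordered a, b, c, d, e, f.
CD = [ZERO, ZERO, X1, ONE, ZERO, ZERO]
DE = [ZERO, ZERO, ZERO, ONE, X1, ZERO]
CE = [ZERO, ZERO, X2, ZERO, X2, ZERO]
```

The coefficients $(x_{2}, x_{2}, x_{1})$ give a syzygy: $x_{2} \cdot cd+x_{2} \cdot de+x_{1} \cdot ce=\mathbf{0}$. Each of the three terms has grade $(1,1)$, which is consistent with the cycle through $c$, $d$, $e$ first closing up at that coordinate.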

4.3 Algorithms

In this section, we begin by reviewing concepts from commutative algebra that involve the polynomial module $R^{m}$. We then look at algorithms for solving the submodule membership problem and computing generators for the syzygy submodule. In our treatment, we follow Chapter 5 of Cox, Little, and O’Shea [6].

The standard basis for $R^{m}$ is $\left\{\mathbf{e}_{\mathbf{1}}, \ldots, \mathbf{e}_{\mathbf{m}}\right\}$ , where $\mathbf{e}_{\mathbf{i}}$ is the standard basis vector with constant polynomials 0 in all positions except 1 in position $i$ . We use the “top down” order on the standard basis vectors, so that $\mathbf{e}_{\mathbf{i}}>\mathbf{e}_{\mathbf{j}}$ whenever $i<j$ . A monomial $\mathbf{m}$ in $R^{m}$ is an element of the form $x^{u} \mathbf{e}_{\mathbf{i}}$ for some $i$ and we say $\mathbf{m}$ contains $\mathbf{e}_{\mathbf{i}}$ .

For algorithms, we need to order monomials in both $R$ and $R^{m}$. For $u, v \in \mathbb{N}^{n}$, we say $u>_{\text {lex }} v$ if, in the vector difference $u-v \in \mathbb{Z}^{n}$, the leftmost nonzero entry is positive. The lexicographic order $>_{\text {lex }}$ is a total order on $\mathbb{N}^{n}$. For example, $(1,3,0)>_{\text {lex }}(1,2,1)$ since $(1,3,0)-(1,2,1)=(0,1,-1)$ and the leftmost nonzero entry is 1. Now, suppose $x^{u}$ and $x^{v}$ are monomials in $R$. We say $x^{u}>_{\text {lex }} x^{v}$ if $u>_{\text {lex }} v$. This gives us a monomial order on $R$. We next extend $>_{\text {lex }}$ to a monomial order on $R^{m}$ using the “position-over-term” (POT) rule: $x^{u} \mathbf{e}_{\mathbf{i}}>x^{v} \mathbf{e}_{\mathbf{j}}$ if $i<j$, or if $i=j$ and $x^{u}>_{\text {lex }} x^{v}$. Every element $\mathbf{f} \in R^{m}$ may be written, in a unique way, as a $k$-linear combination of monomials $\mathbf{m}_{\mathbf{i}}$,

$$\mathbf{f}=\sum_{i} c_{i} \mathbf{m}_{\mathbf{i}} \tag{5}$$

where $c_{i} \in k, c_{i} \neq 0$ and the monomials $\mathbf{m}_{\mathbf{i}}$ are ordered according to the monomial order. We define:

Each $c_{i} \mathbf{m}_{\mathbf{i}}$ is a term of $\mathbf{f}$ .
The leading coefficient of $\mathbf{f}$ is $\operatorname{LC}(\mathbf{f})=c_{1} \in k$ .
The leading monomial of $\mathbf{f}$ is $\operatorname{LM}(\mathbf{f})=\mathbf{m}_{\mathbf{1}}$ .
The leading term of $\mathbf{f}$ is $\operatorname{LT}(\mathbf{f})=c_{1} \mathbf{m}_{\mathbf{1}}$ .

Example 5. Let $\mathbf{f}=\left[5 x_{1} x_{2}^{2}, 2 x_{1}-7 x_{3}^{3}\right]^{\mathrm{T}} \in R^{2}$ . Then, we may write $\mathbf{f}$ in terms of the standard basis (Equation (5)):

\begin{aligned} \mathbf{f} & =5\left[x_{1} x_{2}^{2}, 0\right]^{\mathrm{T}}+2\left[0, x_{1}\right]^{\mathrm{T}}-7\left[0, x_{3}^{3}\right]^{\mathrm{T}} \\ & =5 x_{1} x_{2}^{2} \mathbf{e}_{1}+2 x_{1} \mathbf{e}_{2}-7 x_{3}^{3} \mathbf{e}_{2} \end{aligned}

From the second line, the monomials corresponding to sum (5) are $\mathbf{m}_{\mathbf{1}}=x_{1} x_{2}^{2} \mathbf{e}_{\mathbf{1}}$, $\mathbf{m}_{\mathbf{2}}=x_{1} \mathbf{e}_{2}$, and $\mathbf{m}_{\mathbf{3}}=x_{3}^{3} \mathbf{e}_{2}$, listed in decreasing order since $x_{1}>_{\text {lex }} x_{3}^{3}$. The second term of $\mathbf{f}$ is $2\left[0, x_{1}\right]^{\mathrm{T}}$ and we have $\operatorname{LC}(\mathbf{f})=5$, $\operatorname{LM}(\mathbf{f})=x_{1} x_{2}^{2} \mathbf{e}_{1}$, and $\operatorname{LT}(\mathbf{f})=5 x_{1} x_{2}^{2} \mathbf{e}_{1}$.
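The two orders can be checked mechanically. The following Python sketch is only an illustration: the encoding of a monomial $x^{u} \mathbf{e}_{\mathbf{i}}$ as a pair `(i, u)` and the helper names `pot_key` and `leading_term` are assumptions made here, not part of the implementation described in this paper. It relies on the fact that Python compares tuples lexicographically, which coincides with $>_{\text {lex }}$ on $\mathbb{N}^{n}$.

```python
# A monomial x^u e_i is stored as (i, u): 1-based position i, exponent tuple u.
def pot_key(monomial):
    """Sort key for the position-over-term order built on lex.

    A larger key means a larger monomial: a smaller position wins first,
    then the lexicographically larger exponent vector.
    """
    position, exponents = monomial
    return (-position, exponents)


def leading_term(f):
    """Return (LC, LM) of f, given as a dict {monomial: non-zero coefficient}."""
    lm = max(f, key=pot_key)
    return f[lm], lm


# The vector of Example 5: f = [5*x1*x2^2, 2*x1 - 7*x3^3]^T in R^2, n = 3.
f = {
    (1, (1, 2, 0)): 5,    # 5 * x1 * x2^2 * e1
    (2, (1, 0, 0)): 2,    # 2 * x1 * e2
    (2, (0, 0, 3)): -7,   # -7 * x3^3 * e2
}

ordered = sorted(f, key=pot_key, reverse=True)  # monomials, largest first
lc, lm = leading_term(f)
```

Here `ordered` lists the three monomials of $\mathbf{f}$ from largest to smallest, and `lc`, `lm` hold the leading coefficient and leading monomial.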

Finally, we extend division and least common multiple to monomials in $R$ and $R^{m}$. Given monomials $x^{u}, x^{v} \in R$, if $v \leq u$ componentwise, then $x^{v}$ divides $x^{u}$ with quotient $x^{u} / x^{v}=x^{u-v}$. Now, define $w \in \mathbb{N}^{n}$ by $w_{i}=\max \left(u_{i}, v_{i}\right)$ and define the monomial $x^{w}$ to be the least common multiple of $x^{u}$ and $x^{v}$, denoted $\operatorname{LCM}\left(x^{u}, x^{v}\right)=x^{w}$. Next, given monomials $\mathbf{m}=x^{u} \mathbf{e}_{\mathbf{i}}$ and $\mathbf{n}=x^{v} \mathbf{e}_{\mathbf{j}}$ in $R^{m}$, we say $\mathbf{n}$ divides $\mathbf{m}$ iff $i=j$ and $x^{v}$ divides $x^{u}$, and define the quotient to be $\mathbf{m} / \mathbf{n}=x^{u} / x^{v}=x^{u-v}$. In addition, we define

\operatorname{LCM}\left(x^{u} \mathbf{e}_{\mathbf{i}}, x^{v} \mathbf{e}_{\mathbf{j}}\right)= \begin{cases}\operatorname{LCM}\left(x^{u}, x^{v}\right) \mathbf{e}_{\mathbf{i}}, & i=j \\ 0, & \text { otherwise }\end{cases}

Clearly, the LCM of two monomials is a monomial in $R$ and $R^{m}$, respectively.

Example 6. Let $\mathbf{f}=\left[x_{1}, x_{1} x_{2}\right]^{\mathrm{T}}$ and $\mathbf{g}=\left[x_{2}, 0\right]^{\mathrm{T}}$ be elements of $R^{2}$. Then, the LCM of their leading monomials is:

\begin{aligned} \operatorname{LCM}(\operatorname{LM}(\mathbf{f}), \operatorname{LM}(\mathbf{g})) & =\operatorname{LCM}\left(x_{1} \mathbf{e}_{1}, x_{2} \mathbf{e}_{1}\right) \\ & =x_{1} x_{2} \mathbf{e}_{1} \end{aligned}
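Divisibility, quotient, and LCM of monomials reduce to componentwise operations on exponent vectors. The sketch below assumes the same hypothetical `(i, u)` encoding of $x^{u} \mathbf{e}_{\mathbf{i}}$ (position `i`, exponent tuple `u`) and uses `None` to stand for the zero case of Equation (6). The function names are illustrative.

```python
def divides(n, m):
    """True iff the monomial n = (j, v) divides m = (i, u) in R^m."""
    (j, v), (i, u) = n, m
    return i == j and all(b <= a for a, b in zip(u, v))


def quotient(m, n):
    """Exponent vector of m / n, which is a monomial of R."""
    if not divides(n, m):
        raise ValueError("n does not divide m")
    return tuple(a - b for a, b in zip(m[1], n[1]))


def lcm(m, n):
    """LCM of two monomials of R^m, or None when their positions differ."""
    (i, u), (j, v) = m, n
    if i != j:
        return None
    return (i, tuple(max(a, b) for a, b in zip(u, v)))


# Example 6: LM(f) = x1 e1 and LM(g) = x2 e1, with n = 2 variables.
lm_f, lm_g = (1, (1, 0)), (1, (0, 1))
h = lcm(lm_f, lm_g)  # x1 * x2 * e1
```

The quotients `quotient(h, lm_f)` and `quotient(h, lm_g)` are the multipliers $x_{2}$ and $x_{1}$ that bring the two leading monomials to their common multiple.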

Recall the Submodule Membership Problem: Given a polynomial vector $\mathbf{f}$ and a set $F$ of $t$ polynomial vectors, is $\mathbf{f} \in\langle F\rangle$? We may divide $\mathbf{f}$ by $F$ using the division algorithm Divide in Figure 2. After division, we have $\mathbf{f}=\left(\sum_{i=1}^{t} q_{i} \mathbf{f}_{\mathbf{i}}\right)+\mathbf{r}$, so if the remainder $\mathbf{r}=\mathbf{0}$, then $\mathbf{f} \in\langle F\rangle$. This condition, however, is not necessary for modules over multivariate polynomials, as we may get a non-zero remainder even when $\mathbf{f} \in\langle F\rangle$.
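To make the last remark concrete, here is a minimal Python sketch of the division algorithm of Figure 2 over the rationals. It is an illustration under stated assumptions, not the implementation used for the experiments in this paper: vectors are dictionaries from `(position, exponent tuple)` to coefficient, the first divisor whose leading term divides $\operatorname{LT}(\mathbf{p})$ is the one used, and all names are hypothetical. The example divides $x_{1} x_{2}^{2}-x_{1}$, which lies in $\left\langle x_{1} x_{2}+1, x_{2}^{2}-1\right\rangle$, by the two generators in both orders.

```python
from fractions import Fraction


def pot_key(mono):
    """Sort key for the POT order: smaller position first, then lex."""
    pos, exps = mono
    return (-pos, exps)


def leading(p):
    """Leading monomial and leading coefficient of a non-zero vector p."""
    lm = max(p, key=pot_key)
    return lm, p[lm]


def add_scaled(p, coeff, shift, f):
    """Return p + coeff * x^shift * f with zero coefficients removed."""
    out = dict(p)
    for (pos, exps), c in f.items():
        mono = (pos, tuple(a + b for a, b in zip(exps, shift)))
        value = out.get(mono, 0) + coeff * c
        if value:
            out[mono] = value
        else:
            out.pop(mono, None)
    return out


def divide(f, divisors):
    """Divide f by the tuple of divisors; return (quotients, remainder)."""
    p, r = dict(f), {}
    quotients = [{} for _ in divisors]
    while p:
        (pos, exps), c = leading(p)
        for q, fi in zip(quotients, divisors):
            (fpos, fexps), fc = leading(fi)
            if fpos == pos and all(b <= a for a, b in zip(exps, fexps)):
                shift = tuple(a - b for a, b in zip(exps, fexps))
                factor = Fraction(c) / fc
                q[shift] = q.get(shift, 0) + factor
                p = add_scaled(p, -factor, shift, fi)
                break
        else:  # no leading term divides LT(p): move it to the remainder
            r[(pos, exps)] = c
            del p[(pos, exps)]
    return quotients, r


# f = x1*x2^2 - x1 equals x1 * (x2^2 - 1), so f lies in <f1, f2>.
f = {(1, (1, 2)): 1, (1, (1, 0)): -1}
f1 = {(1, (1, 1)): 1, (1, (0, 0)): 1}    # x1*x2 + 1
f2 = {(1, (0, 2)): 1, (1, (0, 0)): -1}   # x2^2 - 1
_, r_first = divide(f, (f1, f2))    # remainder -x1 - x2
_, r_second = divide(f, (f2, f1))   # remainder 0
```

The remainder depends on the order of the divisors, so a non-zero remainder by itself does not certify that $\mathbf{f} \notin\langle F\rangle$.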

Let $M$ be a submodule and $\operatorname{LT}(M)$ be the set of leading terms of elements of $M$. A Gröbner basis is a basis $G \subseteq M$ such that $\langle\operatorname{LT}(G)\rangle=\langle\operatorname{LT}(M)\rangle$. If $\mathbf{f} \in\langle F\rangle$, we always get $\mathbf{r}=\mathbf{0}$ after division of $\mathbf{f}$ by a Gröbner basis for $\langle F\rangle$, so we have solved the membership problem. The Buchberger algorithm in Figure 3 computes a Gröbner basis $G$ starting from any basis $F$. The algorithm utilizes $S$-polynomials on line 4 to eliminate the leading

Divide \(\left(\mathbf{f},\left(\mathbf{f}_{\mathbf{1}}, \ldots, \mathbf{f}_{\mathbf{t}}\right)\right)\)
    1  \(\mathbf{p} \leftarrow \mathbf{f}, \mathbf{r} \leftarrow \mathbf{0}\)
    2  for \(i \leftarrow 1\) to \(t\)
    3      do \(q_{i} \leftarrow 0\)
    4  while \(\mathbf{p} \neq \mathbf{0}\)
    5      do if \(\operatorname{LT}\left(\mathbf{f}_{\mathbf{i}}\right)\) divides \(\operatorname{LT}(\mathbf{p})\) for some \(i\)
    6          then \(q_{i} \leftarrow q_{i}+\operatorname{LT}(\mathbf{p}) / \operatorname{LT}\left(\mathbf{f}_{\mathbf{i}}\right)\)
    7               \(\mathbf{p} \leftarrow \mathbf{p}-\left(\operatorname{LT}(\mathbf{p}) / \operatorname{LT}\left(\mathbf{f}_{\mathbf{i}}\right)\right) \mathbf{f}_{\mathbf{i}}\)
    8          else \(\mathbf{r} \leftarrow \mathbf{r}+\operatorname{LT}(\mathbf{p})\)
    9               \(\mathbf{p} \leftarrow \mathbf{p}-\operatorname{LT}(\mathbf{p})\)
    10 return \(\left(\left(q_{1}, \ldots, q_{t}\right), \mathbf{r}\right)\)

Figure 2: The Divide algorithm divides $\mathbf{f} \in R^{m}$ by a $t$-tuple $\left(\mathbf{f}_{\mathbf{1}}, \ldots, \mathbf{f}_{\mathbf{t}}\right), \mathbf{f}_{\mathbf{i}} \in R^{m}$ to get $\mathbf{f}=\left(\sum_{i=1}^{t} q_{i} \mathbf{f}_{\mathbf{i}}\right)+\mathbf{r}$, where $q_{i} \in R$ and $\mathbf{r} \in R^{m}$.

\(\operatorname{Buchberger}(F)\)
    1  \(G \leftarrow F\)
    2  repeat \(G^{\prime} \leftarrow G\)
    3      foreach pair \(\mathbf{f} \neq \mathbf{g} \in G\)
    4          \(\left(\left(q_{1}, \ldots, q_{t}\right), \mathbf{r}\right) \leftarrow \operatorname{Divide}(S(\mathbf{f}, \mathbf{g}), G)\)
    5          if \(\mathbf{r} \neq \mathbf{0}\)
    6              then \(G \leftarrow G \cup\{\mathbf{r}\}\)
    7  until \(G=G^{\prime}\)
    8  return \(G\)

Figure 3: The algorithm Buchberger completes a given basis $F$ to a Gröbner basis $G$ by incrementally adding the remainders of $S$-polynomials (Equation (7)) divided by the current basis.

terms of polynomial vectors and complete the given basis to a Gröbner basis. The syzygy polynomial vector or $S$-polynomial $S(\mathbf{f}, \mathbf{g}) \in R^{m}$ of $\mathbf{f}$ and $\mathbf{g}$ is

\begin{aligned} S(\mathbf{f}, \mathbf{g}) & =\frac{\mathbf{h}}{\mathrm{LT}(\mathbf{f})} \mathbf{f}-\frac{\mathbf{h}}{\mathrm{LT}(\mathbf{g})} \mathbf{g}, \quad \text { where } \\ \mathbf{h} & =\operatorname{LCM}(\operatorname{LM}(\mathbf{f}), \operatorname{LM}(\mathbf{g})) \end{aligned}
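As a worked instance, take the vectors of Example 6, $\mathbf{f}=\left[x_{1}, x_{1} x_{2}\right]^{\mathrm{T}}$ and $\mathbf{g}=\left[x_{2}, 0\right]^{\mathrm{T}}$. Under the POT order their leading terms are $x_{1} \mathbf{e}_{1}$ and $x_{2} \mathbf{e}_{1}$, so $\mathbf{h}=x_{1} x_{2} \mathbf{e}_{1}$ and

\begin{aligned} S(\mathbf{f}, \mathbf{g}) & =\frac{x_{1} x_{2} \mathbf{e}_{1}}{x_{1} \mathbf{e}_{1}} \mathbf{f}-\frac{x_{1} x_{2} \mathbf{e}_{1}}{x_{2} \mathbf{e}_{1}} \mathbf{g} \\ & =x_{2}\left[x_{1}, x_{1} x_{2}\right]^{\mathrm{T}}-x_{1}\left[x_{2}, 0\right]^{\mathrm{T}} \\ & =\left[0, x_{1} x_{2}^{2}\right]^{\mathrm{T}}=x_{1} x_{2}^{2} \mathbf{e}_{2} \end{aligned}

The leading terms in $\mathbf{e}_{1}$ cancel, leaving a vector whose leading monomial contains $\mathbf{e}_{2}$.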

A Gröbner basis generated by the algorithm is neither minimal nor unique. A reduced Gröbner basis is a Gröbner basis $G$ such that for all $\mathbf{g} \in G$, $\operatorname{LC}(\mathbf{g})=1$ and no monomial of $\mathbf{g}$ lies in $\langle\operatorname{LT}(G-\{\mathbf{g}\})\rangle$. A reduced Gröbner basis is both minimal and unique. We may compute a reduced Gröbner basis by reducing each polynomial vector in $G$ in turn, replacing $\mathbf{g} \in G$ with the remainder of $\operatorname{Divide}(\mathbf{g}, G-\{\mathbf{g}\})$. Since the algorithm is rather simple, we do not present pseudo-code for it. The Divide, Buchberger, and reduction algorithms together solve the submodule membership problem and, in turn, our first task of computing $\operatorname{im} \partial_{i+1}$.

We next compute generators for the syzygy submodule to complete our second task. We begin by computing a Gröbner basis $G=\left\{\mathbf{g}_{1}, \ldots, \mathbf{g}_{s}\right\}$ for $\langle F\rangle$ , where the vectors are ordered by monomial order $>_{\text {lex }}$ . We then compute $\operatorname{Divide}\left(S\left(\mathbf{g}_{\mathbf{i}}, \mathbf{g}_{\mathbf{j}}\right), G\right)$ for each pair of Gröbner basis elements. Since $G$ is a Gröbner basis, the remainder of this division is $\mathbf{0}$ , giving us

S\left(\mathbf{g}_{\mathbf{i}}, \mathbf{g}_{\mathbf{j}}\right)=\sum_{k=1}^{s} q_{i j k} \mathbf{g}_{\mathbf{k}}

Let $\epsilon_{1}, \ldots, \epsilon_{\mathbf{s}}$ be the standard basis vectors in $R^{s}$ and let

\begin{aligned} & \mathbf{h}_{\mathbf{i j}}=\operatorname{LCM}\left(\operatorname{LT}\left(\mathbf{g}_{\mathbf{i}}\right), \operatorname{LT}\left(\mathbf{g}_{\mathbf{j}}\right)\right) \\ & \mathbf{q}_{\mathbf{i j}}=\sum_{k=1}^{s} q_{i j k} \epsilon_{\mathbf{k}} \in R^{s} \end{aligned}

For pairs $(i, j)$ such that $\mathbf{h}_{\mathbf{i j}} \neq 0$ , we define $\mathbf{s}_{\mathbf{i j}} \in R^{s}$ by

\mathbf{s}_{\mathbf{i j}}=\frac{\mathbf{h}_{\mathbf{i j}}}{\operatorname{LT}\left(\mathbf{g}_{\mathbf{i}}\right)} \epsilon_{\mathbf{i}}-\frac{\mathbf{h}_{\mathbf{i j}}}{\operatorname{LT}\left(\mathbf{g}_{\mathbf{j}}\right)} \epsilon_{\mathbf{j}}-\mathbf{q}_{\mathbf{i j}} \in R^{s}

with $\mathbf{s}_{\mathbf{i j}}=\mathbf{0}$ otherwise. Schreyer’s Theorem states that the set $\left\{\mathbf{s}_{\mathbf{i j}}\right\}_{i j}$ forms a Gröbner basis for $\operatorname{Syz}\left(\mathbf{g}_{\mathbf{1}}, \ldots, \mathbf{g}_{\mathbf{s}}\right)$ [6, Chapter 5, Theorem 3.3]. Clearly, we may compute this basis using Divide. We use this basis to find generators for $\operatorname{Syz}\left(\mathbf{f}_{\mathbf{1}}, \ldots, \mathbf{f}_{\mathbf{t}}\right)$.

Let $M_{F}$ and $M_{G}$ be the $m \times t$ and $m \times s$ matrices in which the $\mathbf{f}_{\mathbf{i}}$ 's and $\mathbf{g}_{\mathbf{i}}$ 's are columns, respectively. As both bases generate the same module, there is a $t \times s$ matrix $A$ and an $s \times t$ matrix $B$ such that $M_{G}=M_{F} A$ and $M_{F}=M_{G} B$ . To compute $A$ , we initialize $A$ to be the identity matrix and add a column to $A$ for each division on line 4 of Buchberger that records the pair involved in the $S$ -polynomial. The matrix $B$ may be computed by using the division algorithm. To see how, notice that each column of $M_{F}$ is divisible by $M_{G}$ since $M_{G}$ is a Gröbner basis for $M_{F}$ . Now there is a column in $B$ for each column $\mathbf{f}_{\mathbf{i}} \in M_{F}$ , which is obtained by division of $\mathbf{f}_{\mathbf{i}}$ by $M_{G}$ . Let $\mathbf{s}_{\mathbf{1}}, \ldots, \mathbf{s}_{\mathbf{t}}$ be the columns of the $t \times t$ matrix $I_{t}-A B$ . Then,

\operatorname{Syz}\left(\mathbf{f}_{\mathbf{1}}, \ldots, \mathbf{f}_{\mathbf{t}}\right)=\left\langle A \mathbf{s}_{\mathbf{i j}}, \mathbf{s}_{\mathbf{1}}, \ldots, \mathbf{s}_{\mathbf{t}}\right\rangle

giving us the syzygy generators [6, Chapter 5, Proposition 3.8]. We refer to the algorithm sketched above as Schreyer’s algorithm. This algorithm completes the second task.

The third task is to compute the quotient $H_{i}$ given $\operatorname{im} \partial_{i+1}=\langle G\rangle$ and $\operatorname{ker} \partial_{i}=$ $\operatorname{Syz}\left(\mathbf{f}_{\mathbf{1}}, \ldots, \mathbf{f}_{\mathbf{t}}\right)$ . We simply need to find whether the columns of $\operatorname{ker} \partial_{i}$ can be represented as a combination of the basis for $\operatorname{im} \partial_{i+1}$ . The modules $H_{i}$ may be computed using the division algorithm. We divide every column in $\operatorname{ker} \partial_{i}$ by $\operatorname{im} \partial_{i+1}$ using the Divide algorithm. If the remainder is non-zero, we add the remainder both to $\operatorname{im} \partial_{i+1}$ and $H_{i}$ so as to count only unique cycles.

A Gröbner basis of a module depends on the choice of the ordered basis, so our resulting specification of homology is not unique up to the module, and therefore, not an invariant. This means, for instance, that we cannot compare two Gröbner bases to determine if they represent the same module. That is, while our solution is complete, it is not an invariant. For this reason, we give polynomial time algorithms to read off a discrete invariant in Section 6 from our results. This invariant is, however, incomplete as predicted by prior work [2].

While the above algorithms solve the membership problem, they have not been used in practice due to their complexity. The submodule membership problem is a generalization of the Polynomial Ideal Membership Problem (PIMP) which is EXPSPACE-complete, requiring exponential space and time [17, 20]. Indeed, the Buchberger algorithm, in its original form, is doubly-exponential and is therefore not practical.

5 Multigraded Algorithms

In this section, we show that multifiltrations provide additional structure that may be exploited to simplify the algorithms from the previous section. These simplifications convert these intractable algorithms into polynomial time algorithms. Throughout this section, the field $k$ of coefficients is the field with two elements $\mathbb{Z}_{2}$, for simplicity. Our treatment, however, generalizes to an arbitrary field.

5.1 Exploiting Homogeneity

The key property that we exploit for simplification is homogeneity.

Definition 6 (homogeneous). Let $M$ be an $m \times n$ matrix. The matrix $M$ is homogeneous iff

every column (row) $\mathbf{f}$ of $M$ is associated with a coordinate $u_{\mathbf{f}}$ and corresponding monomial $x^{u_{\mathbf{f}}}$ ,
every non-zero element $M_{j k}$ may be expressed as the quotient of the monomials associated with column $k$ and row $j$ , respectively.

Any vector $\mathbf{f}$ endowed with a coordinate $u_{\mathbf{f}}$ that may be written as above is homogeneous, e.g. the columns of $M$ .

If the field $k$ is not $\mathbb{Z}_{2}$ , we insert an element of $k$ as a coefficient for each monomial in the matrix. Our approach is as follows. We will show that all boundary matrices $\partial_{i}$ may be written as homogeneous matrices initially, and the algorithms for computing persistence only produce homogeneous matrices and vectors. That is, we maintain homogeneity as an invariant throughout the computation. We begin with our first task.

Lemma 1. For a one-critical multifiltration, the matrix of $\partial_{i}: C_{i} \rightarrow C_{i-1}$ written in terms of the standard bases is homogeneous.

Proof. Recall that we may write the boundary operator $\partial_{i}: C_{i} \rightarrow C_{i-1}$ explicitly as an $m_{i-1} \times m_{i}$ matrix $M$ in terms of the standard bases for $C_{i}$ and $C_{i-1}$, as shown in matrix (4) for $\partial_{1}$. From Definition 5, the standard basis for $C_{i}$ is the set of $i$-simplices in critical grades. In a one-critical multifiltration, each simplex $\sigma$ has a unique critical coordinate $u_{\sigma}$ by Definition 1. In turn, we may represent this coordinate by the monomial $x^{u_{\sigma}}$. For instance, simplex $a$ in Figure 1 has critical grade $(1,1)$ and monomial $x^{(1,1)}=x_{1} x_{2}$. We order these monomials using $>_{\text {lex }}$ and use this ordering to rewrite the matrix for $\partial_{i}$. The matrix entry $M_{j k}$ relates $\sigma_{k}$, the $k$th basis element for $C_{i}$, to $\hat{\sigma}_{j}$, the $j$th basis element for $C_{i-1}$. If $\hat{\sigma}_{j}$ is not a face of $\sigma_{k}$, then $M_{j k}=0$. Otherwise, $\hat{\sigma}_{j}$ is a face of $\sigma_{k}$. Since a face must precede a co-face in a multifiltration, we have $u_{\hat{\sigma}_{j}} \leq u_{\sigma_{k}}$, so $x^{u_{\hat{\sigma}_{j}}}$ divides $x^{u_{\sigma_{k}}}$ and $M_{j k}=x^{u_{\sigma_{k}}} / x^{u_{\hat{\sigma}_{j}}}=x^{u_{\sigma_{k}}-u_{\hat{\sigma}_{j}}}$. That is, the matrix is homogeneous.

Corollary 1. For a one-critical multifiltration, the boundary matrix $\partial_{i}$ in terms of the standard bases has monomial entries.

Proof. The result is immediate from the proof of the previous lemma. The matrix entry is either 0 , a monomial, or $x^{u\left(\sigma_{k}\right)-u\left(\hat{\sigma}_{j}\right)}$ , a monomial.

Example 7. We show the homogeneous matrix for $\partial_{1}$ below, where we augment the matrix with the simplices and their associated monomials. For example, $\hat{\sigma}_{1}=a$ is a face of $\sigma_{1}=a b$ , so $M_{11}=x_{1} x_{2}^{2} / x_{1} x_{2}=x_{2}$ . Again, we assume we are computing over $\mathbb{Z}_{2}$ :

\left[\begin{array}{cc|cccccccc} & & a b & b c & c d & d e & e f & a f & b f & c e \\ & & x_{1} x_{2}^{2} & x_{1}^{2} x_{2}^{2} & x_{1} & x_{1} & x_{1}^{2} & x_{1} x_{2}^{2} & x_{2}^{2} & x_{2} \\ \hline a & x_{1} x_{2} & x_{2} & 0 & 0 & 0 & 0 & x_{2} & 0 & 0 \\ d & x_{1} & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\ b & 1 & x_{1} x_{2}^{2} & x_{1}^{2} x_{2}^{2} & 0 & 0 & 0 & 0 & x_{2}^{2} & 0 \\ c & 1 & 0 & x_{1}^{2} x_{2}^{2} & x_{1} & 0 & 0 & 0 & 0 & x_{2} \\ e & 1 & 0 & 0 & 0 & x_{1} & x_{1}^{2} & 0 & 0 & x_{2} \\ f & 1 & 0 & 0 & 0 & 0 & x_{1}^{2} & x_{1} x_{2}^{2} & x_{2}^{2} & 0 \end{array}\right]
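The construction in the proof of Lemma 1 can be replayed on this example. The Python sketch below is illustrative only: it reads the critical grades of the vertices and edges off the augmented matrix above, stores each non-zero entry as the exponent vector $u_{\sigma_{k}}-u_{\hat{\sigma}_{j}}$, and omits coefficients because we work over $\mathbb{Z}_{2}$. The dictionary layout and the name `boundary_matrix` are assumptions made for the illustration.

```python
# Critical grades read off the augmented matrix of Example 7.
vertex_grade = {"a": (1, 1), "d": (1, 0), "b": (0, 0),
                "c": (0, 0), "e": (0, 0), "f": (0, 0)}
edge_grade = {"ab": (1, 2), "bc": (2, 2), "cd": (1, 0), "de": (1, 0),
              "ef": (2, 0), "af": (1, 2), "bf": (0, 2), "ce": (0, 1)}


def boundary_matrix(faces, cofaces):
    """Return {(face, coface): exponent vector of the monomial entry}.

    Zero entries are simply absent. Coefficients are omitted because the
    sketch works over Z_2, where every non-zero coefficient is 1.
    """
    entries = {}
    for sigma, u_sigma in cofaces.items():
        for tau, u_tau in faces.items():
            if tau in sigma:  # vertex tau is an endpoint of edge sigma
                diff = tuple(a - b for a, b in zip(u_sigma, u_tau))
                if min(diff) < 0:
                    raise ValueError(f"face {tau} enters after {sigma}")
                entries[(tau, sigma)] = diff
    return entries


M = boundary_matrix(vertex_grade, edge_grade)
```

For instance, `M[("a", "ab")]` is `(0, 1)`, the exponent vector of the entry $M_{11}=x_{2}$ computed in the example.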

We next focus on the second task, showing that given a homogeneous matrix as input, the algorithms produce homogeneous vectors and matrices. Let $F$ be an $m \times n$ homogeneous matrix. Let $\left\{\mathbf{e}_{1}, \ldots, \mathbf{e}_{\mathbf{m}}\right\}$ and $\left\{\hat{\mathbf{e}}_{\mathbf{1}}, \ldots, \hat{\mathbf{e}}_{\mathbf{n}}\right\}$ be the standard bases for $R^{m}$ and $R^{n}$ , respectively. A homogeneous matrix associates a coordinate and monomial to the row and column basis elements. For example, since $x_{1}$ is the monomial for row 2 of matrix (9), we have $u_{\mathbf{e}_{2}}=(1,0)$ and $x^{u_{\mathbf{e}_{2}}}=x_{1}$ . Each column $\mathbf{f}$ in $F$ is homogeneous and may be written in terms of rows:

\mathbf{f}=\sum_{i=1}^{m} c_{i} \frac{x^{u_{\mathbf{f}}}}{x^{u_{\mathbf{e}_{\mathbf{i}}}}} \mathbf{e}_{\mathbf{i}}

where $c_{i} \in k$ and we allow $c_{i}=0$ when a row is not used. For instance, column $\mathbf{g}$ representing the edge $a b$ in the bifiltration shown in Figure 1 may be written as:

\begin{aligned} \mathbf{g} & =x_{2} \mathbf{e}_{1}+x_{1} x_{2}^{2} \mathbf{e}_{3} \\ & =\frac{x_{1} x_{2}^{2}}{x_{1} x_{2}} \mathbf{e}_{1}+\frac{x_{1} x_{2}^{2}}{1} \mathbf{e}_{3} \\ & =\frac{x^{u_{\mathbf{g}}}}{x^{u_{\mathbf{e}_{1}}}} \mathbf{e}_{1}+\frac{x^{u_{\mathbf{g}}}}{x^{u_{\mathbf{e}_{3}}}} \mathbf{e}_{3}=\sum_{i \in\{1,3\}} \frac{x^{u_{\mathbf{g}}}}{x^{u_{\mathbf{e}_{\mathbf{i}}}}} \mathbf{e}_{\mathbf{i}} \end{aligned}

Consider the Buchberger algorithm in Figure 3. The algorithm repeatedly computes $S$ -polynomials of homogeneous vectors on line 4.

Lemma 2. The $S$ -polynomial $S(\mathbf{f}, \mathbf{g})$ of homogeneous vectors $\mathbf{f}$ and $\mathbf{g}$ is homogeneous.

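Proof. A zero $S$-polynomial is trivially homogeneous. A non-zero $S$-polynomial $S(\mathbf{f}, \mathbf{g})$ implies that $\mathbf{h}$ in Equation (8) is non-zero. By the definition of LCM in Equation (6), $\mathbf{h}$ being non-zero implies that the leading monomials of $\mathbf{f}$ and $\mathbf{g}$ contain the same basis element $\mathbf{e}_{\mathbf{j}}$. We have:

\begin{aligned} \mathbf{h} & =\operatorname{LCM}(\operatorname{LM}(\mathbf{f}), \operatorname{LM}(\mathbf{g})) \\ & =\operatorname{LCM}\left(\frac{x^{u_{\mathbf{f}}}}{x^{u_{\mathbf{e}_{\mathbf{j}}}}} \mathbf{e}_{\mathbf{j}}, \frac{x^{u_{\mathbf{g}}}}{x^{u_{\mathbf{e}_{\mathbf{j}}}}} \mathbf{e}_{\mathbf{j}}\right) \\ & =\frac{\operatorname{LCM}\left(x^{u_{\mathbf{f}}}, x^{u_{\mathbf{g}}}\right)}{x^{u_{\mathbf{e}_{\mathbf{j}}}}} \mathbf{e}_{\mathbf{j}} \end{aligned}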

Let $x^{\ell}=\operatorname{LCM}\left(x^{u_{\mathbf{f}}}, x^{u_{\mathbf{g}}}\right)=x^{\operatorname{LCM}\left(u_{\mathbf{f}}, u_{\mathbf{g}}\right)}$, giving us $\mathbf{h}=\frac{x^{\ell}}{x^{u_{\mathbf{e}_{\mathbf{j}}}}} \mathbf{e}_{\mathbf{j}}$. We now have

\begin{aligned} \frac{\mathbf{h}}{\operatorname{LT}(\mathbf{f})} & =\frac{\frac{x^{\ell}}{x^{u_{\mathbf{e}_{\mathbf{j}}}}} \mathbf{e}_{\mathbf{j}}}{c_{\mathbf{f}} \frac{x^{u_{\mathbf{f}}}}{x^{u_{\mathbf{e}_{\mathbf{j}}}}} \mathbf{e}_{\mathbf{j}}} \\ & =\frac{x^{\ell}}{c_{\mathbf{f}} x^{u_{\mathbf{f}}}} \end{aligned}

where $c_{\mathbf{f}} \neq 0$ is the field constant in the leading term of $\mathbf{f}$. Similarly, we get

\frac{\mathbf{h}}{\operatorname{LT}(\mathbf{g})}=\frac{x^{\ell}}{c_{\mathbf{g}} x^{u_{\mathbf{g}}}}, \quad c_{\mathbf{g}} \neq 0

Putting it together, we have

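\begin{aligned} S(\mathbf{f}, \mathbf{g}) & =\frac{\mathbf{h}}{\operatorname{LT}(\mathbf{f})} \mathbf{f}-\frac{\mathbf{h}}{\operatorname{LT}(\mathbf{g})} \mathbf{g} \\ & =\frac{x^{\ell}}{c_{\mathbf{f}} x^{u_{\mathbf{f}}}} \sum_{i=1}^{m} c_{i} \frac{x^{u_{\mathbf{f}}}}{x^{u_{\mathbf{e}_{\mathbf{i}}}}} \mathbf{e}_{\mathbf{i}}-\frac{x^{\ell}}{c_{\mathbf{g}} x^{u_{\mathbf{g}}}} \sum_{i=1}^{m} c_{i}^{\prime} \frac{x^{u_{\mathbf{g}}}}{x^{u_{\mathbf{e}_{\mathbf{i}}}}} \mathbf{e}_{\mathbf{i}} \\ & =\sum_{i=1}^{m}\left(\frac{c_{i}}{c_{\mathbf{f}}}-\frac{c_{i}^{\prime}}{c_{\mathbf{g}}}\right) \frac{x^{\ell}}{x^{u_{\mathbf{e}_{\mathbf{i}}}}} \mathbf{e}_{\mathbf{i}} \\ & =\sum_{i=1}^{m} d_{i} \frac{x^{\ell}}{x^{u_{\mathbf{e}_{\mathbf{i}}}}} \mathbf{e}_{\mathbf{i}} \end{aligned}

where $c_{i}$ and $c_{i}^{\prime}$ are the coefficients of $\mathbf{f}$ and $\mathbf{g}$ when written in the form of Equation (10), and $d_{i}=c_{i} / c_{\mathbf{f}}-c_{i}^{\prime} / c_{\mathbf{g}}$. Comparing with Equation (10), we see that $S(\mathbf{f}, \mathbf{g})$ is homogeneous with $u_{S(\mathbf{f}, \mathbf{g})}=\ell$.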

Having computed the $S$ -polynomial, Buchberger next divides it by the current homogeneous basis $G$ on line 4 using a call to the Divide algorithm in Figure 2.

Lemma 3. Divide $\left(\mathbf{f},\left(\mathbf{f}_{\mathbf{1}}, \ldots, \mathbf{f}_{\mathbf{t}}\right)\right)$ returns a homogeneous remainder vector $\mathbf{r}$ for homogeneous vectors $\mathbf{f}, \mathbf{f}_{\mathbf{i}} \in R^{m}$ .

Proof. On line 1, $\mathbf{r}$ and $\mathbf{p}$ are initialized to be $\mathbf{0}$ and $\mathbf{f}$ , respectively, and are both trivially homogeneous. We will show that each iteration of the while loop starting on line 4 maintains the homogeneity of these two vectors. On line 5, since both $\mathbf{f}_{\mathbf{i}}$ and $\mathbf{p}$ are homogeneous, we have

\begin{aligned} & \mathbf{f}_{\mathbf{i}}=\sum_{j=1}^{m} c_{i j} \frac{x^{u_{\mathbf{f}_{\mathbf{i}}}}}{x^{u_{\mathbf{e}_{\mathbf{j}}}}} \mathbf{e}_{\mathbf{j}} \\ & \mathbf{p}=\sum_{j=1}^{m} d_{j} \frac{x^{u_{\mathbf{p}}}}{x^{u_{\mathbf{e}_{\mathbf{j}}}}} \mathbf{e}_{\mathbf{j}} \end{aligned}

Since $\operatorname{LT}\left(\mathbf{f}_{\mathbf{i}}\right)$ divides $\operatorname{LT}(\mathbf{p})$, the terms must share basis element $\mathbf{e}_{\mathbf{k}}$ and we have

\begin{aligned} \operatorname{LT}\left(\mathbf{f}_{\mathbf{i}}\right) & =c_{i k} \frac{x^{u_{\mathbf{f}_{\mathbf{i}}}}}{x^{u_{\mathbf{e}_{\mathbf{k}}}}} \mathbf{e}_{\mathbf{k}} \\ \operatorname{LT}(\mathbf{p}) & =d_{k} \frac{x^{u_{\mathbf{p}}}}{x^{u_{\mathbf{e}_{\mathbf{k}}}}} \mathbf{e}_{\mathbf{k}} \\ \operatorname{LT}(\mathbf{p}) / \operatorname{LT}\left(\mathbf{f}_{\mathbf{i}}\right) & =\frac{d_{k}}{c_{i k}} \cdot \frac{x^{u_{\mathbf{p}}}}{x^{u_{\mathbf{f}_{\mathbf{i}}}}} \end{aligned}

On line 7, $\mathbf{p}$ is assigned to

\begin{aligned} \mathbf{p}-\left(\operatorname{LT}(\mathbf{p}) / \operatorname{LT}\left(\mathbf{f}_{\mathbf{i}}\right)\right) \mathbf{f}_{\mathbf{i}} & =\sum_{j=1}^{m} d_{j} \frac{x^{u_{\mathbf{p}}}}{x^{u_{\mathbf{e}_{\mathbf{j}}}}} \mathbf{e}_{\mathbf{j}}-\left(\frac{d_{k}}{c_{i k}} \cdot \frac{x^{u_{\mathbf{p}}}}{x^{u_{\mathbf{f}_{\mathbf{i}}}}}\right) \sum_{j=1}^{m} c_{i j} \frac{x^{u_{\mathbf{f}_{\mathbf{i}}}}}{x^{u_{\mathbf{e}_{\mathbf{j}}}}} \mathbf{e}_{\mathbf{j}} \\ & =\sum_{j=1}^{m}\left(d_{j}-\frac{d_{k} \cdot c_{i j}}{c_{i k}}\right) \frac{x^{u_{\mathbf{p}}}}{x^{u_{\mathbf{e}_{\mathbf{j}}}}} \mathbf{e}_{\mathbf{j}} \\ & =\sum_{j=1}^{m} d_{j}^{\prime} \frac{x^{u_{\mathbf{p}}}}{x^{u_{\mathbf{e}_{\mathbf{j}}}}} \mathbf{e}_{\mathbf{j}} \end{aligned}

where $d_{j}^{\prime}=d_{j}-d_{k} \cdot c_{i j} / c_{i k}$ and $d_{k}^{\prime}=0$, so the subtraction eliminates the $k$th term. The final sum means that $\mathbf{p}$ is a new homogeneous polynomial vector with the same coordinate $u_{\mathbf{p}}$ as before. Similarly, $\operatorname{LT}(\mathbf{p})$ is added to $\mathbf{r}$ on line 8 and subtracted from $\mathbf{p}$ on line 9, and neither action changes the homogeneity of either vector. Both remain homogeneous with coordinate $u_{\mathbf{p}}$.

The lemmas combine to give us the desired result.

Theorem 1 (homogeneous Gröbner). Given a homogeneous basis, the Buchberger algorithm computes a homogeneous Gröbner basis.

Proof. Initially, the algorithm sets $G$ to be the input basis $F$, which is homogeneous. On line 4, it computes the $S$-polynomial of homogeneous vectors $\mathbf{f}, \mathbf{g} \in G$. By Lemma 2, the $S$-polynomial is homogeneous. It then divides the $S$-polynomial by $G$. Since the input is homogeneous, Divide produces a homogeneous remainder $\mathbf{r}$ by Lemma 3. Since only homogeneous vectors are added to $G$ on line 6, $G$ remains homogeneous.

We may extend this result easily to the reduced Gröbner basis. Using similar arguments, we may show the following result, whose proof we omit here.

Theorem 2 (homogeneous syzygy). For a homogeneous matrix, all matrices encountered in the computation of the syzygy module are homogeneous.

5.2 Data Structures and Optimizations

We have shown that the structure inherent in a multifiltration allows us to compute using homogeneous vectors and matrices whose entries are monomials only. We next explore the consequences of this restriction on both the data structures and complexity of the algorithms.

By Definition 6, an $m \times n$ homogeneous matrix naturally associates monomials to the standard bases for $R^{m}$ and $R^{n}$. Moreover, every non-zero entry of the matrix is a quotient of these monomials as the matrix is homogeneous. Therefore, we do not need to store the matrix entries, but simply the field elements of the matrix along with the monomials for the bases. We may modify two standard data structures to represent the matrix.

linked list: Each column stores its monomial as well as a linked-list of its non-zero entries in sorted order. The non-zero entries are represented by the row index and the field element. The matrix is simply a list of these columns in sorted order. Figure 4 displays matrix (9) in this data structure.
matrix: Each column stores its monomial as well as the column of field coefficients. If we are computing over a finite field, we may pack bits for space efficiency.

The linked-list representation is appropriate for sparse matrices as it is space-efficient at the price of linear access time. This is essentially the representation used for computing in the one-dimensional setting [23]. In contrast, the matrix representation is appropriate for dense matrices as it provides constant access time at the cost of storing all zero entries. The multidimensional setting provides us with denser matrices, as we shall see, so the matrix representation becomes a viable structure.

Figure 4: The linked list representation of the boundary matrix $\partial_{1}$ of Equation (4), for the bifiltration shown in Figure 1, in column sorted order. Note that the columns in Equation (4) are not ordered while they are sorted correctly here.

In addition, the matrix representation is optimally suited to computing over the field $\mathbb{Z}_{2}$, the field commonly employed in topological data analysis. The matrix entries each take one bit and the column entries may be packed into machine words. Moreover, the only operation required by the algorithms is symmetric difference, which may be implemented as a binary XOR operation provided by the chip. This approach gives us bit-level parallelism for free: On a 64-bit machine, we perform symmetric difference 64 times faster than on the list. The combination of these techniques allows the matrix structure to perform better than the linked-list representation in practice.
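A minimal Python sketch of the packed representation over $\mathbb{Z}_{2}$ follows. It is an assumption-laden illustration rather than our implementation: the `Column` class, the helper `shift_to`, and the use of a Python integer as the bit vector are all hypothetical choices. A column stores only its grade and one bit per row, and adding two columns of equal grade is a single XOR. The rows follow the order $a, d, b, c, e, f$ of matrix (9), with row $j$ stored in bit $j-1$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    grade: tuple  # exponent vector u_f of the column's monomial
    bits: int     # bit j-1 is set iff the coefficient in row j is 1


def add_columns(f, g):
    """Sum over Z_2 of two homogeneous columns with the same grade."""
    if f.grade != g.grade:
        raise ValueError("columns must share a grade before they are added")
    return Column(f.grade, f.bits ^ g.bits)


def shift_to(f, grade):
    """Multiply f by x^(grade - u_f); the coefficient bits are unchanged."""
    if any(a > b for a, b in zip(f.grade, grade)):
        raise ValueError("target grade must dominate the column's grade")
    return Column(grade, f.bits)


# Columns ab, af, bf of matrix (9); bit order (low to high): a, d, b, c, e, f.
ab = Column((1, 2), 0b000101)  # rows a and b
af = Column((1, 2), 0b100001)  # rows a and f
bf = Column((0, 2), 0b100100)  # rows b and f

# ab + af + x1 * bf vanishes, so the three edges bound a cycle at grade (1, 2).
total = add_columns(add_columns(ab, af), shift_to(bf, (1, 2)))
```

Because the entries of a homogeneous column are determined by its grade and its coefficients, multiplying by a monomial only changes the stored grade, which is why `shift_to` leaves the bits untouched.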

We may also exploit homogeneity to speed up the computation of new vectors and their insertion into the basis. We demonstrate this briefly using the Buchberger algorithm. We order the columns of input matrix $G$ using the POT rule for vectors as introduced in Section 4. Suppose we have $\mathbf{f}, \mathbf{g} \in G$ with $\mathbf{f}>\mathbf{g}$. If $S(\mathbf{f}, \mathbf{g}) \neq \mathbf{0}$, $\operatorname{LT}(\mathbf{f})$ and $\operatorname{LT}(\mathbf{g})$ contain the same basis element, which the $S$-polynomial eliminates. So, we have $S(\mathbf{f}, \mathbf{g})<\mathbf{g}<\mathbf{f}$. This implies that when dividing $S(\mathbf{f}, \mathbf{g})$ by the vectors in $G$, we need only consider vectors that are smaller than $\mathbf{g}$. Since the vectors are in sorted order, we consider each in turn until we can no longer divide. By the POT rule, we may insert the new remainder column here into the basis $G$. This gives us a constant time insertion operation for maintaining the ordering, as well as faster computation of the Gröbner basis.

5.3 Complexity

In this section, we give simple polynomial bounds on our multigraded algorithms. These bounds imply that we may compute multidimensional persistence in polynomial time.

Lemma 4. Let $F$ be an $m \times n$ homogeneous matrix of monomials. The Gröbner basis $G$ contains $O\left(n^{2} m\right)$ vectors in the worst case. We may compute $G$ using Buchberger in $O\left(n^{4} m^{3}\right)$ worst-case time.

Proof. In the worst case, $F$ contains $n m$ unique monomials. Each column $\mathbf{f} \in F$ may have any of the $n m$ monomials as its monomial when included in the Gröbner basis $G$ . Therefore, the total number of columns in $G$ is $O\left(n^{2} m\right)$ . In computing the Gröbner basis, we compare all columns pairwise, so the total number of comparisons is $O\left(n^{4} m^{2}\right)$ . Dividing the $S$ -polynomial takes $O(m)$ time. Therefore, the worst-case running time is $O\left(n^{4} m^{3}\right)$ .

In practice, the number of unique monomials in the matrix is lower than the worst case. In computing persistence, for example, we may control the number of unique monomials by ignoring close pairs of gradings. The following lemma bounds the basis size and running time in this case.

Lemma 5. Let $F$ be an $m \times n$ homogeneous matrix with $h$ unique monomials. The Gröbner basis $G$ contains $O(h n)$ vectors and may be computed in time $O\left(n^{3} h^{2}\right)$.

The proof is identical to that of the previous lemma.

Lemma 6. Let $F$ be an $m \times n$ homogeneous matrix of monomials and $G$ be the Gröbner basis of $F$. The syzygy module $S$ for $G$ may be computed using Schreyer’s algorithm in $O\left(n^{4} m^{3}\right)$ worst-case time.

Proof. In computing the syzygy module, we compare all columns of $G$ pairwise, so the total number of comparisons is $O\left(n^{4} m^{2}\right)$. Dividing the $S$-polynomial takes $O(m)$ time. Therefore, the worst-case running time is $O\left(n^{4} m^{3}\right)$.

Theorem 3. Multidimensional persistence may be computed in polynomial time.

Proof. Multidimensional persistence is represented by the Gröbner bases and the syzygy modules of all the homogeneous boundary matrices $\partial_{i}$ for a given multifiltration. In the previous lemmas, we have shown that both the Gröbner basis and the syzygy module can be computed in polynomial time. Therefore, one can compute multidimensional persistence in polynomial time.

In other words, our optimizations in this section turn the exponential-time algorithms from the last section into polynomial-time algorithms.

6 Computing the Rank Invariant

Having described our algorithms, in this section we discuss the computation of the rank invariant. Recall that our solution is complete, but not an invariant. In contrast, the rank invariant is incomplete, but is an invariant and may be used, for instance, as a descriptor in order to compare and match multifiltrations. We begin with direct computation that computes the invariant for each pair independently, giving us an intractable algorithm. We then discuss alternate approaches using posets and vineyards. We end this section by giving a polynomial time algorithm for reading off the rank invariant from the solution computed using our multigraded algorithms.

6.1 Direct Computation

We assume we are given an $n$-dimensional multifiltration of a cell complex $K$ with $m$ cells. Recall the rank invariant, Equation (3), from Section 2. Observe that any pair $u \leq v \in \mathbb{N}^{n}$ defines a one-dimensional filtration with a new parameter $t$, where we map $u$ to $t=0$ and $v$ to $t=1$, obtaining a two-level filtration. We then use the persistence algorithm to obtain barcodes [23]. The invariant $\rho_{i}(u, v)$ may be read off from the $\beta_{i}$-barcode: It is the number of intervals that contain both 0 and 1. The persistence algorithm is $\Theta\left(m^{3}\right)$ in the worst case, so we have a cubic time algorithm for computing the rank invariant for a single pair of coordinates.
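For $i=0$ the rank for a single pair can also be counted without barcodes: $\rho_{0}(u, v)$ equals the number of connected components of $K_{v}$ that contain at least one vertex of $K_{u}$. The Python sketch below uses this shortcut with a union-find structure in place of the persistence algorithm, so it illustrates the quantity being computed rather than the method above. The vertex and edge grades are those read off matrix (9), and the function names are hypothetical.

```python
# Critical grades of the vertices and edges, as read off matrix (9).
vertex_grade = {"a": (1, 1), "d": (1, 0), "b": (0, 0),
                "c": (0, 0), "e": (0, 0), "f": (0, 0)}
edge_grade = {"ab": (1, 2), "bc": (2, 2), "cd": (1, 0), "de": (1, 0),
              "ef": (2, 0), "af": (1, 2), "bf": (0, 2), "ce": (0, 1)}


def leq(u, v):
    """Componentwise partial order on coordinates."""
    return all(a <= b for a, b in zip(u, v))


def rank0(u, v):
    """rho_0(u, v): components of K_v that contain a vertex of K_u."""
    if not leq(u, v):
        raise ValueError("need u <= v")
    # Union-find over the vertices present in K_v.
    parent = {x: x for x, g in vertex_grade.items() if leq(g, v)}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (s, t), g in edge_grade.items():
        if leq(g, v):  # edge present in K_v, so both endpoints are too
            parent[find(s)] = find(t)
    return len({find(x) for x, g in vertex_grade.items() if leq(g, u)})
```

For example, `rank0((0, 0), (1, 0))` counts the components of $K_{(1,0)}$ that are reached by the four vertices $b, c, e, f$ present at the origin.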

To fully compute the rank invariant, we need to consider all distinct pairs of complexes in a multifiltration. It may seem, at first, that we need to only consider critical coordinates, such as $(1,1)$ and $(2,0)$ in the bifiltration in Figure 1. However, note that the complex at coordinate $(2,1)$ is also distinct even though no simplex is introduced at that coordinate. Inspired by this example, we may devise the following worst-case construction: We place $m / n$ cells on each of the $n$ axes to generate $(m / n)^{n}=\Theta\left(m^{n}\right)$ distinct complexes.

Simple calculation shows that there are $\Theta\left(m^{2 n}\right)$ comparable coordinates with distinct complexes. For each pair, we may compute the rank invariant using our method above for a total of $O\left(m^{2 n+3}\right)$ running time. To store the rank invariant, we also require $\Theta\left(m^{2 n}\right)$ space.

6.2 Alternate Approaches

Our naive algorithm above computes the invariant for each pair of coordinates independently. In practice, we may read off multiple ranks from the same barcode for faster calculation. Any monotonically increasing path from the origin to the coordinate of the full complex is a one-dimensional filtration, such as the following path in Figure 1.

(0,0) \rightarrow(1,1) \rightarrow(2,2) \rightarrow(3,2)

Having computed persistence, we may read off the ranks for all six comparable pairs within this path. We may formalize this approach using language from the theory of partially ordered sets. The path described above is a maximal chain in the multifiltration poset: a maximal set of pairwise comparable complexes. We require a set of maximal chains such that each pair of comparable elements (here, complexes) is in at least one chain. Each maximal chain requires a single one-dimensional persistence computation. We now require an algorithm that computes the smallest set of such chains. We know of no algorithm for this computation. Furthermore, it is not clear whether this approach would be faster than the direct approach in the worst case.
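Reading several ranks off one barcode can be sketched as follows. The code assumes a barcode given as half-open intervals $[b, d)$ indexed by position along the chain, with $d=\infty$ for classes that never die; the rank for positions $s \leq t$ is then the number of intervals containing both. The barcode used here is made up for illustration and is not the barcode of the path in Figure 1.

```python
import math
from itertools import combinations


def ranks_along_chain(chain, barcode):
    """Map each pair (chain[s], chain[t]) with s < t to the number of
    intervals [b, d) that contain both positions s and t."""
    return {
        (chain[s], chain[t]): sum(1 for b, d in barcode if b <= s and t < d)
        for s, t in combinations(range(len(chain)), 2)
    }


chain = [(0, 0), (1, 1), (2, 2), (3, 2)]
barcode = [(0, math.inf), (0, 2), (1, 3)]  # hypothetical intervals

ranks = ranks_along_chain(chain, barcode)
print(len(ranks))  # 6 comparable pairs from a single chain
for pair, r in ranks.items():
    print(pair, r)
```

A chain with four complexes yields all six pairwise ranks from one persistence computation, which is the saving the chain-based approach aims for.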

Another approach is to use vineyards, as introduced in [4]. The vineyards method applies to the specific situation of a function of the form $h(t, x)=(t, t f(x)+(1-t) g(x))$, where $x$ is a point in a manifold or space. One then considers the two-variable persistence based on the function $h$. The rank invariants are then computed for pairs of points using the single-variable method. The method does not permit the computation of the full 2-dimensional persistence.

6.3 Multigraded Approach

Full computation of the rank invariant is hampered by the exponential storage requirement. Instead, we may first compute multidimensional persistence using our multigraded algorithms in Section 5. We then simply read off the rank invariant using the Rank algorithm, as shown in Figure 5. We describe the algorithm in the proof of the following theorem.

Theorem 4. $\operatorname{Rank}(Z, B, u, v)$ computes the rank invariant $\rho_{i}(u, v)$, if $Z$ is the set of syzygies of $\partial_{i}$ and $B$ is the Gröbner basis for $\partial_{i+1}$.

Proof. The algorithm uses two simple helper procedures. The procedure Promote takes a matrix $M$ and coordinate $u$ as input. It then finds the columns $\mathbf{f} \in M$ whose associated coordinate $u_{\mathbf{f}}$ precedes $u$, and promotes them to coordinate $u$ by a simple shift. The procedure Quotient finds the quotient of the input matrices by division: If the remainder $\mathbf{r}$ is non-zero, it adds $\mathbf{r}$ to the quotient $Q$, also adding it to $B$ so that it only finds unique cycles.

RANK$(Z, B, u, v)$
1 $Z_{u} \leftarrow \operatorname{Promote}(Z, u)$
2 $B_{u} \leftarrow \operatorname{Promote}(B, u)$
3 $H_{u} \leftarrow \operatorname{Quotient}\left(Z_{u}, B_{u}\right)$
4 $Z_{u v} \leftarrow \operatorname{Promote}\left(H_{u}, v\right)$
5 $B_{v} \leftarrow \operatorname{Promote}(B, v)$
6 $H_{u v} \leftarrow \operatorname{Quotient}\left(Z_{u v}, B_{v}\right)$
7 return $\left|H_{u v}\right|$

QUOTIENT$(Z, B)$
1 $Q \leftarrow \emptyset$
2 foreach $\mathbf{f} \in Z$
3 $\quad$ do $\left(\left(q_{1}, \ldots, q_{t}\right), \mathbf{r}\right) \leftarrow \operatorname{Divide}(\mathbf{f}, B)$
4 $\quad$ if $\mathbf{r} \neq \mathbf{0}$
5 $\quad\quad$ then $Q \leftarrow Q \cup\{\mathbf{r}\}$
6 $\quad\quad\quad$ $B \leftarrow B \cup\{\mathbf{r}\}$
7 return $Q$

PROMOTE$(M, u)$
1 return $\left\{\frac{u}{u_{\mathbf{f}}} \mathbf{f} \mid \mathbf{f} \in M, u_{\mathbf{f}} \leq u\right\}$

Figure 5: The algorithm RANK computes the rank invariant $\rho_{i}(u, v)$ if $Z$ is the set of syzygies of $\partial_{i}$ and $B$ is the Gröbner basis for $\partial_{i+1}$ . The procedure Quotient finds the quotient of $Z$ by $B$ using the Divide algorithm. The procedure Promote promotes cycles that exist before time $u$ to that time.

Now assume the inputs are as in the statement of the theorem. By the definition of the rank invariant, we need to count homology cycles that exist at $u$ and persist until $v$. The RANK algorithm implements this. We compute homology $H_{u}$ at $u$ on the first three lines. On line 4, we promote these cycles to coordinate $v$. We then quotient with boundaries $B_{v}$ at $v$ to find homology cycles $H_{u v}$ that exist at $u$ and persist until $v$. The cardinality of this set is the rank invariant by definition.
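The structure of RANK can be sketched in a simplified setting. Once every column has been promoted to a common coordinate, each column is homogeneous of that degree and is therefore determined by its $\mathbb{Z}_{2}$ coefficients alone, so the sketch below replaces Divide with Gaussian elimination on coefficient bitmasks. This is a simplification for illustration, not the multigraded division of Section 5. The encoding of a column as a pair (grade, coefficient bitmask) and the toy inputs are assumptions.

```python
def leq(a, b):
    """Partial order on grades: a <= b in every coordinate."""
    return all(x <= y for x, y in zip(a, b))


def promote(M, u):
    """Keep the columns whose grade precedes u. After the shift to grade u a
    column is fixed by its Z_2 coefficients, so only the bitmask is kept."""
    return [coeffs for grade, coeffs in M if leq(grade, u)]


def reduce_gf2(v, pivots):
    """Remainder of v after elimination against the pivot vectors."""
    while v:
        h = v.bit_length() - 1
        if h not in pivots:
            return v
        v ^= pivots[h]
    return 0


def quotient(Z, B):
    """Non-zero remainders of the columns of Z modulo B. Each remainder is
    also added to the divisors so that only independent classes are kept."""
    pivots = {}
    for b in B:
        r = reduce_gf2(b, pivots)
        if r:
            pivots[r.bit_length() - 1] = r
    Q = []
    for f in Z:
        r = reduce_gf2(f, pivots)
        if r:
            Q.append(r)
            pivots[r.bit_length() - 1] = r
    return Q


def rank(Z, B, u, v):
    """Follows lines 1-7 of RANK for u <= v."""
    H_u = quotient(promote(Z, u), promote(B, u))
    Z_uv = H_u  # every class in H_u lives at grade u <= v, so all are promoted
    H_uv = quotient(Z_uv, promote(B, v))
    return len(H_uv)


# Hypothetical input: one 1-cycle on three edges born at (1, 1),
# bounded by a 2-cell that enters at (2, 1).
Z = [((1, 1), 0b111)]
B = [((2, 1), 0b111)]
print(rank(Z, B, (1, 1), (1, 1)))  # 1
print(rank(Z, B, (1, 1), (2, 1)))  # 0
```

Adding each remainder to the divisors mirrors line 6 of Quotient: a later column that differs from an earlier one only by a boundary then reduces to zero and is not counted twice.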

7 Experiments

In this section, we describe our implementation as well as initial quantitative experiments that show the performance of our algorithms in practice. We end with a last look at our example bifiltration in Figure 1: computing its rank invariant using our multigraded algorithms.

7.1 Implementation

We initially used the software packages CoCoA [3] and Macaulay 2 [11], which contain standard implementations of the algorithms. These packages were immensely helpful during our software development as they allowed for quick and convenient testing of the basic algorithms. In practice, there are two problems in using these packages for large datasets. First, these packages are slow since they are general and not customized for homogeneous matrices. Second, these packages produce verbose output that must be parsed for further computation.

Our experience led us to implement our algorithms for computation over $\mathbb{Z}_{2}$, optimizing the code for this field. Our implementation is in Java and was tested under Mac OS X 10.5.6 running on a 2.3 GHz Intel Quad-Core Xeon Mac Pro computer with 6 GB RAM.

7.2 Data

We generate $n \times n$ , random, bifiltered, homogeneous matrices, to simulate the boundary matrix $\partial_{k-1}$ of a random bifiltered complex with $n$ simplices in dimensions $k-1$ and $k-2$ . We use the following procedure:

1. Randomly generate $n$ monomials $\left\{m_{1}, \ldots, m_{n}\right\}$ corresponding to the monomials associated with the basis elements of the rows.
2. For each column $\mathbf{f}$, generate $k$ integers indexing the non-zero rows.
3. Set the column monomial to be $\operatorname{LCM}\left(m_{j}\right)$, where $\left\{m_{j}\right\}_{j}$ are the monomials of the rows with non-zero entries in $\mathbf{f}$.

Each column in this matrix has $k$ non-zero elements and is homogeneous by construction. We also generate random matrices but limit the number of unique monomials in the matrix to be $O\left(h^{2}\right)$ for different values of $h$ . The basic idea behind these tests is that the range of the filtrations in a cell complex can typically be divided into smaller discrete intervals. For generation, we replace the first step of the procedure above with the following two steps:
0. Randomly generate $h$ unique monomials $\left\{l_{1}, \ldots, l_{h}\right\}$.
1. Generate $n$ monomials $\left\{m_{1}, \ldots, m_{n}\right\}$ corresponding to the monomials associated with the basis elements of the rows such that $m_{i} \in\left\{l_{1}, \ldots, l_{h}\right\}$.

After executing steps 2 and 3 above, our resulting matrix has homogeneous columns with $k$ non-zero elements and at most $h^{2}$ unique monomials.
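The generation procedure, including the variant with a limited monomial pool, can be sketched as follows. Monomials in two variables are encoded as exponent pairs, so the LCM is a componentwise maximum. The exponent bound `max_exp`, the seed handling, and the return format are assumptions of this sketch; the text above does not specify them.

```python
import random


def lcm(a, b):
    """LCM of two monomials given as exponent tuples."""
    return tuple(max(x, y) for x, y in zip(a, b))


def random_bifiltered_matrix(n, k, max_exp=10, h=None, seed=None):
    """Return (rows, columns): rows[j] is the monomial of row j, and each
    column is (grade, entries) with entries[j] = grade / rows[j]."""
    rng = random.Random(seed)

    def rand_mono():
        return (rng.randint(0, max_exp), rng.randint(0, max_exp))

    if h is None:
        rows = [rand_mono() for _ in range(n)]            # step 1
    else:
        if h > (max_exp + 1) ** 2:
            raise ValueError("not enough distinct monomials for this h")
        pool = set()
        while len(pool) < h:                              # step 0
            pool.add(rand_mono())
        pool = sorted(pool)
        rows = [rng.choice(pool) for _ in range(n)]       # modified step 1

    columns = []
    for _ in range(n):
        support = rng.sample(range(n), k)                 # step 2
        grade = rows[support[0]]
        for j in support[1:]:
            grade = lcm(grade, rows[j])                   # step 3
        entries = {j: tuple(g - r for g, r in zip(grade, rows[j]))
                   for j in support}
        columns.append((grade, entries))
    return rows, columns


rows, cols = random_bifiltered_matrix(n=20, k=3, h=4, seed=1)
print(len({grade for grade, _ in cols}) <= 4 ** 2)  # True
```

With two variables, each column grade takes its $x_{1}$-exponent and its $x_{2}$-exponent from one of the $h$ pool monomials, which is why at most $h^{2}$ distinct column monomials can occur.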

7.3 Size & Timings

According to Lemma 4, the number of columns in the Gröbner basis for a random matrix may grow as $O\left(n^{3}\right)$, as we have $n=m$ here. Figure 6(a) shows that the growth of the Gröbner basis is less in practice, about linear for $k=2$ and quadratic for $k=4$, and increases as the matrix becomes denser. Similarly, the theoretical running time for this matrix is $O\left(n^{7}\right)$. Figure 6(b) demonstrates that the actual running time matches this bound quite well: about $O\left(n^{6}\right)$ in these tests. The matrix method, however, is considerably more efficient, about 20 times faster for our largest test here.

Figure 6: Random $n \times n$ matrices with $k$ non-zero entries in each column. (a) The number of columns $|G|$ in Gröbner basis $G$. (b) Running time in seconds for computing multidimensional persistence using list (l) or matrix (m) data structures.

We next limit the number of unique monomials in the input matrices. Figures 7 and 8 give the size and running time for matrices with at most $h^{2}$ monomials for $h=20$ and $h=100$ , respectively. We see that the growth of the basis is about linear for different values of $k$ and $h$ , and the running time matches the theoretical $O\left(n^{3}\right)$ bound in Lemma 5 quite well.

7.4 Rank Invariant

We end this paper by revisiting our motivating bifiltration from Figure 1 and computing its multidimensional persistence and rank invariants using our algorithms. Using the natural ordering on the simplices, one can write the boundary matrices $M_{1}$ and $M_{2}$ for $\partial_{1}$ and $\partial_{2}$ , respectively, as:

M_{1}=\left[\begin{array}{cccccccc} 0 & x_{1} x_{2}^{2} & 0 & 0 & 0 & x_{2}^{2} & 0 & x_{1}^{2} x_{2}^{2} \\ 0 & 0 & 0 & x_{1} & 0 & 0 & x_{2} & x_{1}^{2} x_{2}^{2} \\ 0 & 0 & x_{1} & 0 & x_{1}^{2} & 0 & x_{2} & 0 \\ x_{1} x_{2}^{2} & 0 & 0 & 1 & x_{1}^{2} & x_{2}^{2} & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ x_{2} & x_{2} & 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right]

Figure 7: Random $n \times n$ matrices with $k$ non-zero entries in each column and a total of $h^{2}$ monomials for $h=20$ . (a) The number of columns $|G|$ in Gröbner basis $G$ . (b) Running time in seconds for computing multidimensional persistence using list (l) or matrix (m) data structures.

M_{2}=\left[\begin{array}{cc} 0 & x_{1}^{3} \\ x_{1}^{2} & 0 \\ 0 & x_{1}^{2} x_{2} \\ 0 & x_{1}^{2} x_{2} \\ x_{1} & 0 \\ x_{1} & 0 \\ 0 & 0 \\ 0 & 0 \end{array}\right]

The Gröbner basis $\left(G_{1}\right)$ and the set of syzygies $\left(Z_{1}\right)$ for $\partial_{1}$ are:

G_{1}=\left[\begin{array}{ccccccccc} 0 & x_{1} x_{2}^{2} & 0 & 0 & 0 & x_{2}^{2} & 0 & 0 & x_{1}^{2} x_{2}^{2} \\ 0 & 0 & 0 & x_{1} & 0 & 0 & x_{1} & x_{2} & x_{1}^{2} x_{2}^{2} \\ 0 & 0 & x_{1} & 0 & x_{1}^{2} & 0 & x_{1} & x_{2} & 0 \\ x_{1} x_{2}^{2} & 0 & 0 & 0 & x_{1}^{2} & x_{2}^{2} & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ x_{2} & x_{2} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right]

Z_{1}=\left[\begin{array}{ccc} 0 & 0 & x_{1} \\ x_{1}^{2} & x_{1} & 0 \\ x_{1} x_{2}^{2} & 0 & x_{2} \\ x_{1} x_{2}^{2} & 0 & x_{2} \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ x_{2}^{2} & 0 & 0 \\ 1 & 0 & 0 \end{array}\right]

Figure 8: Random $n \times n$ matrices with $k$ non-zero entries in each column and a total of $h^{2}$ monomials for $h=100$ . (a) The number of columns $|G|$ in Gröbner basis $G$ . (b) Running time in seconds for computing multidimensional persistence using list (l) or matrix (m) data structures.

Note that each row in the syzygy matrix corresponds to an edge in the appropriate order. Finally, the Gröbner basis for $\partial_{2}$ is $G_{2}=M_{2}$ , as the only possible $S$ -polynomial is identically 0 .

Using $G_{1}, Z_{1}$ and $G_{2}$ , one can read off the rank invariants for various $u$ and $v$ using the Rank algorithm in Section 6.3. A few interesting rank invariants for this example are:

$u$	$v$	$\rho_{0}(u, v)$	$\rho_{1}(u, v)$
$[0,0]$	$[1,1]$	3	0
$[1,0]$	$[2,1]$	2	0
$[1,1]$	$[1,2]$	2	1
$[2,2]$	$[3,2]$	1	1

8 Conclusion

In this paper, we fully examine the computation of multidimensional persistence, from theory to algorithms, implementation, and experiments. We develop polynomial time algorithms by recasting the problem into computational commutative algebra. Although the recast problem is EXPSPACE-complete, we exploit the multigraded setting to develop practical algorithms. The Gröbner bases we construct allow us to reconstruct the entire multidimensional persistence vector space, providing us a convenient way to compute the rank invariant. We implement all algorithms in the paper and show that the calculations are feasible due to the sparsity of the boundary matrices.

For additional speedup, we plan to parallelize the computation by batching and threading the XOR operations. We also plan to apply our algorithms toward studying scientific data. For instance, for zero-dimensional homology, multidimensional persistence corresponds to clustering multiparameterized data. This fresh perspective, as well as a new arsenal of computational tools, allows us to attack an old and significant problem in data analysis.

References

[1] G. Carlsson, T. Ishkhanov, V. de Silva, and A. Zomorodian. On the local behavior of spaces of natural images. International Journal of Computer Vision, 76(1):1-12, 2008.
[2] G. Carlsson and A. Zomorodian. The theory of multidimensional persistence. Discrete & Computational Geometry, 42(1):71-93, 2009.
[3] CoCoATeam. CoCoA: a system for doing Computations in Commutative Algebra. http://cocoa.dima.unige.it.
[4] D. Cohen-Steiner, H. Edelsbrunner, and D. Morozov. Vines and vineyards by updating persistence in linear time. In Proc. ACM Symposium on Computational Geometry, pages 119-126, 2006.
[5] A. Collins, A. Zomorodian, G. Carlsson, and L. Guibas. A barcode shape descriptor for curve point cloud data. Computers and Graphics, 28:881-894, 2004.
[6] D. A. Cox, J. Little, and D. O’Shea. Using algebraic geometry, volume 185 of Graduate Texts in Mathematics. Springer, New York, second edition, 2005.
[7] V. de Silva, R. Ghrist, and A. Muhammad. Blind swarms for coverage in 2-D. In Proceedings of Robotics: Science and Systems, 2005. http://www.roboticsproceedings.org/rss01/.
[8] H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and simplification. Discrete & Computational Geometry, 28:511-533, 2002.
[9] S. Eilenberg and J. A. Zilber. Semi-simplicial complexes and singular homology. Annals of Mathematics, 51(3):499-513, 1950.
[10] P. Frosini and M. Mulazzani. Size homotopy groups for computation of natural size distances. Bull. Belg. Math. Soc. Simon Stevin, 6(3):455-464, 1999.
[11] D. R. Grayson and M. E. Stillman. Macaulay 2, a software system for research in algebraic geometry. http://www.math.uiuc.edu/Macaulay2/.
[12] A. Gyulassy, V. Natarajan, V. Pascucci, P. T. Bremer, and B. Hamann. Topology-based simplification for feature extraction from 3D scalar fields. In Proc. IEEE Visualization, pages 275-280, 2005.
[13] A. Hatcher. Algebraic Topology. Cambridge University Press, New York, NY, 2002. http://www.math.cornell.edu/~hatcher/AT/ATpage.html.
[14] T. Kaczynski, K. Mischaikow, and M. Mrozek. Computational Homology. Springer-Verlag, New York, NY, 2004.
[15] M. Levoy and P. Hanrahan. Light field rendering. In Proc. SIGGRAPH, pages 31-42, 1996.
[16] Y. Matsumoto. An Introduction to Morse Theory, volume 208 of Iwanami Series in Modern Mathematics. American Mathematical Society, Providence, RI, 2002.
[17] E. W. Mayr. Some complexity results for polynomial ideals. Journal of Complexity, 13(3):303-325, 1997.
[18] G. Singh, F. Memoli, T. Ishkhanov, G. Sapiro, G. Carlsson, and D. L. Ringach. Topological analysis of population activity in visual cortex. Journal of Vision, 8(8):1-18, 6 2008.
[19] G. Turk and M. Levoy. Zippered polygon meshes from range images. In Proc. SIGGRAPH, pages 311-318, 1994.
[20] J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, Cambridge, UK, second edition, 2003.
[21] F. Zhao and L. J. Guibas. Wireless Sensor Networks: An Information Processing Approach. Morgan-Kaufmann, San Francisco, CA, 2004.
[22] A. Zomorodian. Computational topology. In M. Atallah and M. Blanton, editors, Algorithms and Theory of Computation Handbook, volume 2, chapter 3. Chapman & Hall/CRC Press, Boca Raton, FL, second edition, 2010.
[23] A. Zomorodian and G. Carlsson. Computing persistent homology. Discrete & Computational Geometry, 33(2):249-274, 2005.