Computing multidimensional persistence
2010, Journal of Computational Geometry
Sign up for access to the world's latest research
Abstract
The theory of multidimensional persistence captures the topology of a multifiltration-a multiparameter family of increasing spaces. Multifiltrations arise naturally in the topological analysis of scientific data. In this paper, we give a polynomial time algorithm for computing multidimensional persistence. We recast this computation as a problem within computational commutative algebra, and utilize algorithms from this area to solve it. While the resulting problem is Expspace-complete and the standard algorithms take doubly-exponential time, we exploit the structure inherent within multifiltrations to yield practical algorithms. We implement all algorithms in the paper and provide statistical experiments to demonstrate their feasibility.
Related papers
ArXiv, 2015
An algorithm is presented that constructs an acyclic partial matching on the cells of a given simplicial complex from a vector-valued function defined on the vertices and extended to each simplex by taking the least common upper bound of the values on its vertices. The resulting acyclic partial matching may be used to construct a reduced filtered c omplex with the same multidimensional persistent homology as the original simplicial complex filtered by the sublevel sets of the function. Numeri cal tests show that in practical cases the rate of reduction in the number of cells achieved by the algorithm is substantial. This promises to be useful for the computation of multidimensional persistent homology of simplicial complexes filtered by sublevel sets of vector-valued functions.
Foundations of Computational Mathematics, 2016
In this paper we study multidimensional persistence modules via what we call tame functors and noise systems. A noise system leads to a pseudo-metric topology on the category of tame functors. We show how this pseudo-metric can be used to identify persistent features of compact multidimensional persistence modules. To count such features we introduce the feature counting invariant and prove that assigning this invariant to compact tame functors is a 1-Lipschitz operation. For 1-dimensional persistence, we explain how, by choosing an appropriate noise system, the feature counting invariant identifies the same persistent features as the classical barcode construction.
2011
Topological Persistence has proven to be a promising framework for dealing with problems concerning shape analysis and comparison. In this contexts, it was originally introduced by taking into account 1-dimensional properties of shapes, modeled by real-valued functions. More recently, Topological Persistence has been generalized to consider multidimensional properties of shapes, coded by vector-valued functions.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2013
Multidimensional persistent modules do not admit a concise representation analogous to that provided by persistence diagrams for real-valued functions. However, there is no obstruction for multidimensional persistent Betti numbers to admit one. Therefore, it is reasonable to look for a generalization of persistence diagrams concerning those properties that are related only to persistent Betti numbers. In this paper, the persistence space of a vector-valued continuous function is introduced to generalize the concept of persistence diagram in this sense. Furthermore, it is presented a method to visualize topological features of a shape via persistence spaces. Finally, it is shown that this method is resistant to perturbations of the input data.
2017
Given a simplicial complex and a vector-valued function on its vertices, we present an algorithmic construction of an acyclic partial matching on the cells of the complex. This construction is used to build a reduced filtered complex with the same multidimensional persistent homology as of the original one filtered by the sublevel sets of the function. A number of numerical experiments show a substantial rate of reduction in the number of cells achieved by the algorithm.
ArXiv, 2019
Persistence has proved to be a valuable tool to analyze real world data robustly. Several approaches to persistence have been attempted over time, some topological in flavor, based on the vector space-valued homology functor, other combinatorial, based on arbitrary set-valued functors. To unify the study of topological and combinatorial persistence in a common categorical framework, we give axioms for a generalized rank function on objects in a target category, so that functors to that category induce persistence functions. We port the interleaving and bottleneck distances to this novel framework and generalize classical equalities and inequalities. Unlike sets and vector spaces, in many categories the rank of an object does not identify it up to isomorphism: to preserve information about the structure of persistence modules, we define colorable ranks, persistence diagrams and prove the equality between multicolored bottleneck distance and interleaving distance in semisimple Abelian...
2012
The extraction of significant structures in arbitrary high-dimensional data sets is a challenging task. Moreover, classifying data points as noise in order to reduce a data set bears special relevance for many application domains. Standard methods such as clustering serve to reduce problem complexity by providing the user with classes of similar entities. However, they usually do not highlight relations between different entities and require a stopping criterion, e.g. the number of clusters to be detected. In this paper, we present a visualization pipeline based on recent advancements in algebraic topology. More precisely, we employ methods from persistent homology that enable topological data analysis on high-dimensional data sets. Our pipeline inherently copes with noisy data and data sets of arbitrary dimensions. It extracts central structures of a data set in a hierarchical manner by using a persistence-based filtering algorithm that is theoretically well-founded. We furthermore introduce persistence rings, a novel visualization technique for a class of topological features-the persistence intervals-of large data sets. Persistence rings provide a unique topological signature of a data set, which helps in recognizing similarities. In addition, we provide interactive visualization techniques that assist the user in evaluating the parameter space of our method in order to extract relevant structures. We describe and evaluate our analysis pipeline by means of two very distinct classes of data sets: First, a class of synthetic data sets containing topological objects is employed to highlight the interaction capabilities of our method. Second, in order to affirm the utility of our technique, we analyse a class of high-dimensional real-world data sets arising from current research in cultural heritage.
The intrinsic connection between lattice theory and topology is fairly well established, For instance, the collection of open subsets of a topological subspace always forms a distributive lattice. Persistent homology has been one of the most prominent areas of research in computational topology in the past 20 years. In this paper we will introduce an alternative interpretation of persistence based on the study of the order structure of its correspondent lattice. Its algorithmic construction leads to two operations on homology groups which describe a diagram of spaces as a complete Heyting algebra, which is a generalization of a Boolean algebra. We investigate some of the properties of this lattice, the algorithmic implications of it, and some possible applications.
2019
Nowadays, data generation, representation and analysis occupy central roles in human society. Therefore, it is necessary to develop frameworks of analysis able of adapting to diverse data structures with minimal effort, much as guaranteeing robustness and stability. While topological persistence allows to swiftly study simplicial complexes paired with continuous functions, we propose a new theory of persistence that is easily generalizable to categories other than topological spaces and functors other than homology. Thus, in this framework, it is possible to study complex objects such as networks and quivers without the need of auxiliary topological constructions. We define persistence functions by directly considering relevant features (even discrete) of the objects of the category of interest, while maintaining the properties of topological persistence and persistent homology that are essential for a robust, stable and agile data analysis.
We introduce a mathematical framework for encoding dimensional information that preserves structural properties beyond numerical values. The framework addresses a specific limitation in persistent homology: the inability to distinguish between topological features of the same dimension but different geometric nature. We define dimension objects that encode both numerical dimension and structural invariants, provide a metric structure on the space of such objects, and prove that this leads to enhanced persistence diagrams with improved discriminatory power. We demonstrate the framework's utility through explicit computations on standard datasets.
COMPUTING MULTIDIMENSIONAL PERSISTENCE*
Gunnar Carlsson, † Gurjeet Singh, ‡ and Afra Zomorodian §
Abstract. The theory of multidimensional persistence captures the topology of a multifiltration - a multiparameter family of increasing spaces. Multifiltrations arise naturally in the topological analysis of scientific data. In this paper, we give a polynomial time algorithm for computing multidimensional persistence. We recast this computation as a problem within computational commutative algebra, and utilize algorithms from this area to solve it. While the resulting problem is EXPSPACE-complete and the standard algorithms take doubly-exponential time, we exploit the structure inherent within multifiltrations to yield practical algorithms. We implement all algorithms in the paper and provide statistical experiments to demonstrate their feasibility.
1 Introduction
In this paper, we give a polynomial time algorithm for computing the persistent homology of a multifiltration. The computed solution is compact and complete, but not a topological invariant. Theoretically, this is the best one may hope for because compact complete invariants do not exist for multidimensional persistence [2]. We discuss computing an incomplete invariant, the rank invariant, and give an algorithm for reading off this invariant from the complete solution. We implement all algorithms in the paper and provide statistical experiments to demonstrate their feasibility.
1.1 Motivation
Intuitively, a multifiltration models a growing space that is parameterized along multiple dimensions. For example, the complex with coordinate (3,2) in Figure 1 is filtered along the horizontal and vertical dimensions, giving rise to a bifiltration. Multifiltrations arise naturally in topological analysis of scientific data. Often, such data is in the form of a finite set of noisy samples from some underlying topological space. Our goal is to robustly recover the lost connectivity of the underlying space. If the sampling is dense enough, we may approximate the space as a union of balls by placing ϵ-balls around each point. As we increase ϵ, we obtain a growing family of spaces, a 1-dimensional multifiltration,
*The authors were partially supported by the following grants: G. C. by NSF DMS-0354543; A. Z. by DARPA HR 0011-06-1-0038, ONR N 00014-08-1-0908, and NSF CCF-0845716; all by DARPA HR 0011-05-1-0007. A portion of the work was done while the second author was at Stanford University.
† Stanford University, gunnar@math.stanford.edu
‡ Ayasdi, Inc., gurjeet@ayasdi.com
§Dartmouth College, afra@cs.dartmouth.edu ↩︎
Figure 1: A bifiltration. The complex with labeled vertices is at coordinate (3,2). Simplices are highlighted and named at the critical coordinates that they appear.
also called a filtration. This approximation is the central idea behind many methods for computing the topology of a point set [22]. Often, however, the input point set is filtered via multiple functions. For instance, in analyzing the structure of natural images, we filter the data according to density [1]. We now have multiple dimensions along which our space is filtered. That is, we have a multifiltration.
We characterize a multifiltration through invariants. An invariant is a function that assigns identical objects to isomorphic structures. The trivial invariant assigns the same object to all structures, and is useless. The complete invariant assigns different objects to non-isomorphic structures, and is powerful. We want to obtain a discrete invariant: an invariant that yields a finite description and is not dependent on the underlying field of computation [2]. Therefore, in the ideal setting, we would like a complete discrete invariant for the structure of a multifiltration.
1.2 Prior Work
For one-dimensional filtrations, the theory of persistent homology provides a complete discrete invariant called a barcode, a multiset of intervals [23]. Each interval in the barcode corresponds to the lifetime of a single topological feature within the filtration. Since intrinsic features have long lives, while noise is short-lived, a quick examination of the intervals gives a robust estimation of the topology. The existence of a complete discrete invariant, as well as efficient algorithms and fast implementations have led to successful applications of persistent homology to a variety of problems, such as shape description [5], denoising volumetric density data [12], detecting holes in sensor networks [7], analyzing neural activity in the visual cortex [18], and analyzing the structure of natural images [1], to name a few.
For multifiltrations of dimension higher than one, the situation is much more complicated. The theory of multidimensional persistence shows that no complete discrete invariant exists. Instead, the authors propose an incomplete invariant, the rank invariant, which captures some persistent information. Unfortunately, this invariant is not compact, requiring
large storage, so its direct computation using the one-dimensional persistence algorithm is not feasible. A variant of the problem of multidimensional persistence appears in computer vision [10]. There is also a partial solution called vineyards [4]. A full solution, however, has not been attempted by any prior work.
1.3 Contributions
In this paper, we provide a complete solution to the problem of computing multidimensional persistence.
- We recast the computation to a problem within computational commutative algebra, allowing us to utilize algorithms from this area.
- We exploit the structure provided by a multifiltration to greatly simplify the algorithms.
- We show that the resulting algorithms are polynomial time, unlike their original counterparts, which are EXPSPACE-complete, requiring exponential space and time.
- We also provide algorithms to read off the rank invariant from our solution.
- We implement all algorithms and show that the multidimensional setting requires different data structures than the one-dimensional case for efficient computation. In particular, the change in approach allows for parallelization.
- We analyze the running time of our implementations with a suite of statistical tests with random multifiltrations.
As we shall see, our approach gives a specification of multidimensional persistence in terms of a set of polynomials. While this specification is complete, it is not an invariant, so our results are in line with the previous result that showed the non-existence of a complete invariant [2]. The lack of invariance means that we may not use our solution to compare multifiltrations directly. Instead, we give polynomial-time algorithms for reading off the rank invariant from our solution. As expected, the rank invariant is incomplete. Moreover, its direct computation requires exponential time and space.
We begin with background on multidimensional persistence in Section 2. In Section 3, we formalize the input to our algorithms and justify it. In Section 4, we reinterpret the problem of computing multidimensional persistence within computational commutative algebra. Having recast the problem, we use algorithms from this area to solve the problem. This gives us a computationally intractable solution. In Section 5, we simplify the algorithms by using the structure within multifiltrations. This simplification allows us to derive a polynomial time algorithm from the original doubly-exponential time algorithms. In Section 7, we describe our implementations of and furnish experiments to validate our work in practice.
2 Background
In this section, we review the theory of multidimensional persistence. We begin by formalizing multifiltrations. We then review a sequence of theories of homology: simplicial, persistent, and multidimensional. We end with a description of the rank invariant.
2.1 Multifiltrations
Let N⊆Z be the set of non-negative integers and R be the set of real numbers. For vectors in Nn or Rn, we say u≤v if ui≤vi for all 1≤i≤n, and define the ≥ relation similarly. The relations ≤,≥ form partial orders on Nn and Rn. A topological space X is multifiltered if we are given a family of subspaces {Xu}u, where u∈Nn, so that Xu⊆Xv whenever u≤v. We call the family of subspaces {Xu}u a multifiltration. A one-dimensional multifiltration is called a filtration.
If X is a cell complex, all subsets Xu must also be cell complexes, as shown for a bifiltered simplicial complex in Figure 1. A critical coordinate u for cell σ∈X is a minimal coordinate, with respect to the partial order ≤, such that σ∈Xu. In a multifiltration, any path with monotonically increasing coordinates is a filtration, such as any row or column in the figure. Multifiltrations constitute the input to our algorithms. We motivate their use as a model for scientific data as well as formalize them in the next section.
2.2 Homology
Given a topological space, homology is a topological invariant that is often used in practice as it is easily computable. Here, we describe simplicial homology briefly, referring the reader to Hatcher [13] as a resource. We assume our input is a simplicial complex K, such as the complexes in Figure 1. We note, however, that our results carry over to arbitrary cell complexes, such as simplicial sets [9], Δ-complexes [13], and cubical complexes [14].
The i th chain group Ci(K) of K is the free Abelian group on K 's set of oriented i simplices. An element c∈Ci(K) is an i-chain, c=∑jnj[σj],σj∈K with coefficients nj∈ Z. Given such a chain c, the boundary operator ∂i:Ci(K)→Ci−1(K) is a homomorphism defined linearly by its action on any simplex σ=[v0,v1,…,vi]∈c,
∂iσ=j∑(−1)j[v0,…,v^j,…,vi]
where v^j indicates that vj is deleted from the vertex sequence. The boundary operator connects the chain groups into a chain complex C∗ :
⋯→Ci+1(K)∂i+1Ci(K)∂iCi−1(K)→⋯
Using the boundary operator, we may define subgroups of Ci : the cycle group ker∂i and the the boundary group im∂i+1. Since ∂i∘∂i+1≡0, then im∂i+1⊆ker∂i⊆Ci(K). The i th homology group is
Hi(K)=ker∂i/im∂i+1
and the i th Betti number is βi(K)=rankHi(K). Over field coefficients k,Hi is a k-vector space of dimension βi.
2.3 Persistent Homology
Given a multifiltration {Xu}u, the homology of each subspace Xu over a field k is a vector space. For instance, the bifiltered complex in Figure 1 has zeroth homology vector spaces isomorphic to the commutative diagram
where the dimension of the vector space counts the number of components of the complex, and the maps between the homology vector spaces are induced by the inclusion maps relating the subspaces. Persistent homology captures information contained in the induced maps. There are two equivalent definitions that we use in this paper. The first definition was originally for filtrations only [8], but was later extended to multifiltrations [2]. The key idea is to relate the homologies of a pair of complexes. For each pair u,v∈Nn with u≤v, Xu⊆Xv by definition, so Xu↪Xv. This inclusion, in turn, induces a linear map ιi(u,v) at the i th homology level Hi(Xu)→Hi(Xv) that maps a homology class within Xu to the one that contains it within Xv. The i th persistent homology is imιi, the image of ιi for all pairs u≤v. This definition also enables the definition of an incomplete invariant. The i th rank invariant is
ρi(u,v)=rankιi(u,v)
for all pairs u≤v∈Nn, where ιi is defined above [2]. While this definition provides intuition, it is inexpedient for theoretical development. For most of our paper, we use a second definition of persistence that is grounded in algebraic topology, allowing us to utilize tools from commutative algebra for computation [23, 2].
2.4 Multidimensional Persistence
The key insight for the second definition below is that the persistent homology of a multifiltration is the homology of a single algebraic entity. This object encodes all the complexes via polynomials that track cells through the multifiltration. To define our algebraic structure, we need to first review graded modules over polynomials. A monomial in x1,…,xn is a product of the form
x1v1⋅x2v2⋯xnvn
with vi∈N. We denote it xv, where v=(v1,…,vn)∈Nn. A polynomial f in x1,…,xn and coefficients in field k is a finite linear combination of monomials, f=∑vcvxv, with cv∈k. The set of all such polynomials is denoted k[x1,…,xn]. For instance, 5x1x22−7x33∈
k[x1,x2,x3]. An n-graded ring is a ring R equipped with a decomposition of Abelian groups R≅⊕vRv,v∈Nn so that multiplication has the property Ru⋅Rv⊆Ru+v. Elements in a single group Ru are called homogeneous. The set of polynomials form the n-graded polynomial ring, denoted An. This ring is graded by Avn=kxv,v∈Nn. An n-graded module over an n-graded ring R is an Abelian group M equipped with a decomposition M≅⊕vMv,v∈Nn together with a R-module structure so that Ru⋅Mv⊆Mu+v. An n-graded module is finitely generated if it admits a finite generating set. Also, recall the notion of a free module on an n-graded set and a basis for such a module [2].
Given a multifiltration {Xu}u, the i th dimensional homology is the following n graded module over An
u⨁Hi(Xu)
where the k-module structure is the direct sum structure and xv−u:Hi(Xu)→Hi(Xv) is the induced homomorphism ιi(u,v) we described in the previous section. This view of homology yields two important results. In one dimension, the persistent homology of a filtration is easily classified and parameterized by the barcode, and there is an efficient algorithm for its computation [23]. In higher dimensions, no similar classification exists [2]. Instead, we may utilize an incomplete invariant. One such invariant, the rank invariant defined above, is provably equivalent to the barcode, and therefore complete, in one dimension, but it is incomplete in higher dimensions.
3 One-Critical Multifiltrations
We are interested in persistent homology as a tool for analyzing the topology of scientific data. In this section, we begin by formalizing such data. We then show that topological analysis of scientific data naturally generates multifiltrations. In particular, the process generates multifiltrations with the following property.
Definition 1 (one-critical). A multifiltered complex K where each cell σ has a unique critical coordinate uσ is one-critical.
In the rest of this paper, we assume that our input multifiltrations are one-critical. General multifiltrations, however, may not have this property. Therefore, we end this section by describing a classic construction that eliminates multiple critical coordinates in such input.
3.1 Model for Scientific Data
We are often given scientific data in the form of a set of noisy samples from some underlying geometric space. At each sample point, we may also have measurements from the ambient space. For example, a fundamental goal in graphics is to render objects under different lighting from different camera positions. One approach is to construct a digitized model using data from a range scanner, which employs multiple cameras to sense 3D positions on an object’s surface, as well as estimated normals and texture information [19]. An alternate
approach samples the four-dimensional light field of a surface directly and interpolates to render the object without explicit surface reconstruction [15]. Either approach gives us a set of noisy samples with measurements. Similarly, a node in a wireless sensor network has sensors on board that measure physical attributes of the local environment, such as pressure and temperature [21]. The GPS coordinates of the nodes constitute a set of samples at which several functions are sampled.
Formally then, we assume we have a manifold X with n−1 Morse functions defined on it [16]. In practice, X is often embedded within a high-dimensional Euclidean space Rd, although this is not required. As such, we model the data using the following definition.
Definition 2 (multifiltered dataset). A multifiltered dataset is (S,{fj}j), where S is a finite set of d-dimensional points with n−1 real-valued functions fj:S→R defined on it, for n>1.
3.2 Construction
We now assume our data is a multifiltered dataset (S,{fj}j). We begin by approximating the underlying space of S with a combinatorial representation, a complex, built on S. There are a variety of methods for building such complexes, all of which have a scale parameter ϵ [22]. As we increase ϵ, a complex grows larger, and fixing a maximum scale ϵmax gives us a filtered complex K. Each cell σ∈K enters K at scale ϵ(σ). We formalize this type of complex next.
Definition 3 (scale-filtered complex). A scale-filtered complex is the tuple (K,ϵ), where K is a finite complex, ϵ:K→R, and the complexes Kμ={σ∣ϵ(σ)≤μ} form a onedimensional filtration for K.
We assume we have a scale-filtered complex (K,ϵ) defined on our input point set S. To incorporate the functions fj into our data analysis, we first extend them to the cells in the complex. For σ∈K and fj, let fj(σ) be the maximum value fj takes on σ 's vertices; that is, fj(σ)=maxv∈σfj(v), where v∈S. This extension defines n−1 functions on the complex, fj:K→R. We combine all filtration functions into a single multivariate function F:K→Rn, where
F(σ)=(f1(σ),f2(σ),…,fn−1(σ),ϵ(σ))
We multifilter K via the excursion sets {Ku}u of F for u∈Rn :
Ku={σ∈K∣F(σ)≤u}
Each simplex σ enters Ku at u=F(σ) and will remain in the complex for all u≥F(σ). Equivalently, F(σ) is the unique critical coordinate at which σ enters the filtered complex. That is, the multifiltrations built by the above process are always one-critical.
Example 1 (bifiltration criticals). The bifiltration in Figure 1 is one-critical, since each simplex enters at a single critical coordinate. For instance, F(a)=(1,1),F(cde)=(3,1), and F(af)=(1,2).
Since K is finite, we have a finite set of critical coordinates that we may project on each dimension j to get a finite set of critical values Cj. We restrict ourselves to the Cartesian product C1×…×Cn of the critical values, parameterizing the resulting discrete grid using N in each dimension. This parameterization yields a a d-dimensional multifiltration {Kv}v with v∈Nn.
We end by noting that one-critical multifiltrations may be represented compactly by the set of tuples
{(σ,F(σ))∣σ∈K}
This representation is the main input to our algorithms in Section 4.3.
3.3 Mapping Telescope
In general, multifiltrations are not one-critical since a cell may enter at multiple incomparable critical coordinates, viewing ≤ as a partial order on Nn. For example, in Figure 1, the vertex d that enters at (1,0) may also enter at (0,1) as the two coordinates are incomparable. For such multifiltrations, we may utilize the mapping telescope, a standard algebraic construction, to ensure that each cell has a unique critical coordinate [13]. Intuitively, this construction introduces additional shadow cells into the multifiltration without changing its topology. We will not detail this construction here as none of the multifiltrations we encounter in practice require the conversion. We should note, however, that the mapping telescope increases the size of the multifiltration, depending on the number of cells with multiple critical points. In the worst case, the growth is exponential.
4 Using Computational Commutative Algebra
Having described our input, we next recast the problem of computing multidimensional persistence as a problem within computational commutative algebra. We then describe standard algorithms from this area that solve our problem. While this process gives us a solution, this solution is not practical as the algorithms are computationally intractable. In the next section, we refine them to derive polynomial-time algorithms.
4.1 Multigraded Homology
We begin by extending homology to multifiltered cell complexes. We then convert the computation of the latter to standard questions in computational commutative algebra.
Definition 4 (chain module). Given a multifiltered cell complex {Ku}u, the ith chain module is the n-graded module over An
Ci=u⨁Ci(Ku)
where the k-module structure is the direct sum structure and xv−u:Ci(Ku)→Ci(Kv) is the inclusion Ku↪Kv.
Note that we overload notation to reduce complexity by having Ci=Ci({Ku}u) when the multifiltration is clear from context. The module Ci is n-graded as for any u∈Nn,(Ci)u=Ci(Ku). That is, the chain complex in grade u of the module is the chain complex of Ku, the cell complex with coordinate u.
Example 2 (bifiltration module). Consider the vertex d in the bifiltered complex in Figure 1. This vertex has critical coordinate (1,0), so copies of this vertex exist in 9 complexes Ku for u≥(1,0). The inclusion maps relate these copies within the complexes. In turn, polynomials relate the chain groups in the different grades of the module. Let d be the copy of the vertex in coordinate (1,0). Then, within Ci, we have d in grade (1,0),x1d in grade (2,0),x2d in grade (1,1),x12x22d in grade (3,2) and so on, as required by the definition of an n-graded module. In other words, a simplex has different names in different grades.
The graded chain modules Ci are finitely generated, so we may choose bases for them.
Definition 5 (standard basis). The standard basis for the ith chain module Ci is the set of i-simplices in critical grades.
Example 3 (bifiltration bases). For our bifiltration in Figure 1, the highlighted and named simplices constitute the standard bases. For example, the standard basis for C0 is
grade | (0,0) | (1,0) | (1,1) |
---|---|---|---|
simplices | b,c,e,f | d | a |
Note that in doing so, we have made a choice of ordered basis. Unlike for chain groups, this choice has an important consequence: Our resulting calculations will not be invariant but depend on the initial ordered basis.
Recall that our multifiltrations are one-critical. The graded chain groups of onecritical multifiltrations are free: Since each cell enters only once, the resulting chain groups do not require any relations. Since our graded chain groups are free, the boundary operator is simply a homomorphism between free graded modules. Given standard bases, we may write the boundary operator ∂i:Ci→Ci−1 explicitly as a matrix with polynomial entries.
Example 4 (boundary matrix). For the bifiltration in Figure 1, ∂1 has the matrix
ab | bc | cd | de | ef | af | bf | ce | |
---|---|---|---|---|---|---|---|---|
a | x2 | 0 | 0 | 0 | 0 | x2 | 0 | 0 |
b | x1x22 | x12x22 | 0 | 0 | 0 | 0 | x22 | 0 |
c | 0 | x12x22 | x1 | 0 | 0 | 0 | 0 | x2 |
d | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
e | 0 | 0 | 0 | x1 | x12 | 0 | 0 | x2 |
f | 0 | 0 | 0 | 0 | x12 | x1x22 | x22 | 0 |
where we assume we are computing over Z2.
As in standard homology, the boundary operator connects the graded chain modules into a chain complex C∗ (Equation (1)) and the ith homology module is defined exactly as before (Equation (2)):
Hi=ker∂i/im∂i+1
4.2 Recasting the Problem
Our goal is to compute homology modules. Following the definition, we have three tasks:
- Compute the boundary module im∂i+1.
- Compute the cycle module ker∂i.
- Compute the quotient Hi.
We next translate these three tasks into problems in computational commutative algebra. Both the boundary and cycle modules turn out to be submodules of free and finitely generated modules that consist of vectors of polynomials. For the rest of this paper, we assume that we are computing homology over the field k. Recall from Section 2.4 that our module is defined over the n-graded polynomial ring An=k[x1,…,xn] with standard grading Avn=kxv,v∈Nn. For notational simplicity, we will use R=An to denote this ring for the remainder of this section. Let Rm be the Cartesian product of m copies of R. In other words, Rm consists of all column m-vectors of polynomials:
Rm={[f1,…,fm]T∣fi∈R,1≤i≤m}
To distinguish elements of Rm from polynomials, we adopt the standard practice of placing them in bold format, so that f∈Rm is a vector of polynomials, but f∈R is a polynomial. We use this practice exclusively for elements of Rm and not for other vectors, such as elements of Nn. We now recast the three problems:
- The boundary module is a submodule of the polynomial module. The matrix Mi+1 for ∂i+1 has mi rows and mi+1 columns, where mj denotes the number of j-simplices in the complex. Let F=(f1,…,fmi+1),fi∈Rmi, where fi is the i th column in Mi+1. This tuple of polynomial vectors generate a submodule of Rmi :
⟨F⟩={j=1∑mi+1qjfj∣qj∈R}
The Submodule Membership Problem asks whether a polynomial vector f is in a submodule M, such as ⟨F⟩. That is, the problem asks whether we may write f in terms of some basis F as above. A solution to this problem would complete our first task.
2. The cycle submodule is also a submodule of the polynomial module. The matrix for ∂i has mi−1 rows and mi columns. Let F=(f1,…,fmi),fi∈Rmi−1, where fi is the i th column in the matrix. Given F, the set of all [q1,…,qmi]T,qi∈R such that
i=1∑miqifi=0
is a R-submodule of Rmi called the (first) syzygy module of (f1,…,fmi), denoted Syz(f1,…,fmi). A set of generators for this submodule would complete our second task.
- Our final task is simple, once we have completed the first two tasks. All we need to do is test whether the generators of the syzygy submodule, our cycles, are in the boundary submodule. As we shall see, the tools which allow us to complete the first two tasks also resolve this question.
4.3 Algorithms
In this section, we begin by reviewing concepts from commutative algebra that involve the polynomial module Rm We then look at algorithms for solving the submodule membership problem and computing generators for the syzygy submodule. In our treatment, we follow Chapter 5 of Cox, Little, and O’Shea [6].
The standard basis for Rm is {e1,…,em}, where ei is the standard basis vector with constant polynomials 0 in all positions except 1 in position i. We use the “top down” order on the standard basis vectors, so that ei>ej whenever i<j. A monomial m in Rm is an element of the form xuei for some i and we say m contains ei.
For algorithms, we need to order monomials in both R and Rm. For u,v∈Nn, we say u>lex v if the vector difference u−v∈Zn, the leftmost nonzero entry is positive. The lexicographic order >lex is a total order on Nn. For example, (1,3,0)>lex (1,2,1) since (1,3,0)−(1,2,1)=(0,1,−1) and the leftmost nonzero entry is 1 . Now, suppose xu and xv are monomials in R. We say xu>lex xv if u>lex v. This gives us a monomial order on R. We next extend >lex to a monomial order on Rm using the “position-over-term” (POT) rule: xuei>xvej if i<j, or if i=j and xu>lex xv. Every element f∈Rm may be written, in a unique way, as a k-linear combination of monomials mi,
f=i∑cimi
where ci∈k,ci=0 and the monomials mi are ordered according to the monomial order. We define:
- Each cimi is a term of f.
- The leading coefficient of f is LC(f)=c1∈k.
- The leading monomial of f is LM(f)=m1.
- The leading term of f is LT(f)=c1m1.
Example 5. Let f=[5x1x22,2x1−7x33]T∈R2. Then, we may write f in terms of the standard basis (Equation (5)):
f=5[x1x22,0]T−7[0,x33]T+2[0,x1]T=5x1x23e1−7x33e2+2x1e2
From the second line, the monomials corresponding to sum (5) are m1=x1x2e1,m2= x33e2, and m3=x1e2. The second term of f is 7[0,x33] and we have LC(f)=5,LM(f)=x1x22, and LT(f)=5x1x22.
Finally, we extend division and least common multiple to monomials in R and Rm. Given monomials xu,xv∈R, if v≤u, then xv divides xu with quotient xu/xv=xu−v. Now, let w∈Nn by wi=max(ui,vi) and define the monomial xw to be the least common multiple of xu and xv, denoted LCM(xu,xv)=xw. Next, given monomials m=xuei and n=xvej in Rm, we say n divides m iff i=j and xv divides xu, and define the quotient to be m/n=xu/xv=xu−v. In addition, we define
LCM(xuei,xvej)={LCM(xu,xv)ei,0,i=j otherwise
Clearly, the LCM of two monomials is a monomial in R and Rm, respectively.
Example 6. Let f=[x1,x1x2]T and g=[x2,0]T be elements of R2. Then, the LCM of their leading monomials is:
LCM(LM(f),LM(g))=LCM(x1e1,x2e1)=x1x2e1
Recall the Submodule Membership Problem: Given a polynomial vector f and a set of t polynomials F, is f∈⟨F⟩ ? We may divide f by F using the division algorithm Divide in Figure 2. After division, we have f=(∑i=1tqifi)+r, so if the remainder r=0, then f∈⟨F⟩. This condition, however, is not necessary for modules over multivariate polynomials as we may get a non-zero remainder even when f∈⟨F⟩.
Let M be an submodule and LT(M) be the set of leading terms of elements of M. A Gröbner basis is a basis G⊆M such that ⟨LT(G)⟩=⟨LT(M)⟩. If f∈⟨F⟩, we always get r=0 after division of f by a Gröbner basis for ⟨F⟩, so we have solved the membership problem. The Buchberger algorithm in Figure 3 computes a Gröbner basis G starting from any basis F. The algorithm utilizes S-polynomials on line 4 to eliminate the leading
Divide \(\left(\mathbf{f},\left(\mathbf{f}_{\mathbf{1}}, \ldots, \mathbf{f}_{\mathbf{t}}\right)\right)\)
\(1 \mathbf{p} \leftarrow \mathbf{f}, \mathbf{r} \leftarrow 0\)
2 for \(i \leftarrow 1\) to \(t\)
do \(q_{i} \leftarrow 0\)
while \(\mathbf{p} \neq \mathbf{0}\)
do if \(\operatorname{LT}\left(\mathbf{f}_{\mathbf{i}}\right)\) divides \(\operatorname{LT}(\mathbf{p})\) for some \(i\)
then \(q_{i} \leftarrow q_{i}+\operatorname{LT}(\mathbf{p}) / \operatorname{LT}\left(\mathbf{f}_{\mathbf{i}}\right)\)
\(\mathbf{p} \leftarrow \mathbf{p}-\left(\operatorname{LT}(\mathbf{p}) / \operatorname{LT}\left(\mathbf{f}_{\mathbf{i}}\right)\right) \mathbf{f}_{\mathbf{i}}\)
else \(\mathbf{r} \leftarrow \mathbf{r}+\operatorname{LT}(\mathbf{p})\)
\(\mathbf{p} \leftarrow \mathbf{p}-\operatorname{LT}(\mathbf{p})\)
return \(\left(\left(q_{1}, \ldots, q_{t}\right), \mathbf{r}\right)\)
Figure 2: The Divide algorithm divides f∈Rm by an t-tuple (f1,…,ft),fi∈Rm to get f=(∑i=1mqifi)+r, where qi∈R and r∈Rm.
\(\operatorname{BuChBerger}(F)\)
\(G \leftarrow F\)
repeat \(G^{\prime} \leftarrow G\)
foreach pair \(\mathbf{f} \neq \mathbf{g} \in G\)
\(\left(\left(q_{1}, \ldots, q_{t}\right), \mathbf{r}\right) \leftarrow \operatorname{Divide}(S(\mathbf{f}, \mathbf{g}), G)\)
if \(\mathbf{r} \neq \mathbf{0}\)
then \(G \leftarrow G \cup\{\mathbf{r}\}\)
until \(G=G^{\prime}\)
return \(G\)
Figure 3: The algorithm Buchberger completes a given basis F to a Gröbner basis G by incrementally adding the remainders of S-polynomials (Equation (7)) divided by the current basis.
terms of polynomial vectors and complete the given basis to a Gröbner basis. The syzygy polynomial vector or S-polynomial S(f,g)∈Rm of f and g is
S(f,g)h=LT(f)hf−LT(g)hg, where =LCM(LM(f),LM(g))
A Gröbner basis generated by the algorithm is neither minimal nor unique. A reduced Gröbner basis is a Gröbner basis G such that for all g∈G,LC(g)=1 and no monomial of g lies in ⟨LT(G−{g}⟩. A reduced Gröbner basis is both minimal and unique. We may compute a reduced Gröbner basis by reducing each polynomial in G in turn, replacing g∈G with the remainder of Divide(g,G−{g}). Since the algorithm is rather simple, we do not present pseudo-code for it. The Divide, Buchberger, and the reduction algorithms together solve the submodule membership problem and, in turn, our first task of computing im∂i+1.
We next compute generators for the syzygy submodule to complete our second task. We begin by computing a Gröbner basis G={g1,…,gs} for ⟨F⟩, where the vectors are ordered by monomial order >lex . We then compute Divide(S(gi,gj),G) for each pair of Gröbner basis elements. Since G is a Gröbner basis, the remainder of this division is 0, giving us
S(gi,gj)=k=1∑sqijkgk
Let ϵ1,…,ϵs be the standard basis vectors in Rs and let
hij=LCM(LT(gi,gj))qij=k=1∑sqijkϵk∈Rs
For pairs (i,j) such that hij=0, we define sij∈Rs by
sij=LT(gi)hijϵi−LT(gj)hijϵj−qij∈Rs
with sij=0, otherwise. Schreyer’s Theorem states that the set {sij}ij form a Gröbner basis for Syz(g1,…,gs) [6, Chapter 5, Theorem 3.3]. Clearly, we may compute this basis using Divide. We use this basis to find generators for Syz(f1,…,ft).
Let MF and MG be the m×t and m×s matrices in which the fi 's and gi 's are columns, respectively. As both bases generate the same module, there is a t×s matrix A and an s×t matrix B such that MG=MFA and MF=MGB. To compute A, we initialize A to be the identity matrix and add a column to A for each division on line 4 of Buchberger that records the pair involved in the S-polynomial. The matrix B may be computed by using the division algorithm. To see how, notice that each column of MF is divisible by MG since MG is a Gröbner basis for MF. Now there is a column in B for each column fi∈MF, which is obtained by division of fi by MG. Let s1,…,st be the columns of the t×t matrix It−AB. Then,
Syz(f1,…,ft)=⟨Asij,s1,…,st⟩
giving us the syzygy generators [6, Chapter 5, Proposition 3.8]. We refer to the algorithm sketched above as Schreyer’s algorithm. This algorithm completes the second task.
The third task is to compute the quotient Hi given im∂i+1=⟨G⟩ and ker∂i= Syz(f1,…,ft). We simply need to find whether the columns of ker∂i can be represented as a combination of the basis for im∂i+1. The modules Hi may be computed using the division algorithm. We divide every column in ker∂i by im∂i+1 using the Divide algorithm. If the remainder is non-zero, we add the remainder both to im∂i+1 and Hi so as to count only unique cycles.
A Gröbner basis of a module depends on the choice of the ordered basis, so our resulting specification of homology is not unique up to the module, and therefore, not an invariant. This means, for instance, that we cannot compare two Gröbner bases to determine if they represent the same module. That is, while our solution is complete, it is not an invariant. For this reason, we give polynomial time algorithms to read off a discrete invariant in Section 6 from our results. This invariant is, however, incomplete as predicted by prior work [2].
While the above algorithms solve the membership problem, they have not been used in practice due to their complexity. The submodule membership problem is a generalization of the Polynomial Ideal Membership Problem (PIMP) which is EXPSPACE-complete, requiring exponential space and time [17, 20]. Indeed, the Buchberger algorithm, in its original form, is doubly-exponential and is therefore not practical.
5 Multigraded Algorithms
In this section, we show that multifiltrations provide additional structure that may be exploited to simplify the algorithms from the previous section. These simplifications convert
these intractable algorithms into polynomial time algorithms. Throughout this section, the field k of coefficients is the field with two elements Z2, for simplicity. Our treatment, however, generalizes to any arbitrary field.
5.1 Exploiting Homogeneity
The key property that we exploit for simplification is homogeneity.
Definition 6 (homogeneous). Let M be an m×n matrix. The matrix M is homogeneous iff
- every column (row) f of M is associated with a coordinate uf and corresponding monomial xuf,
- every non-zero element Mjk may be expressed as the quotient of the monomials associated with column k and row j, respectively.
Any vector f endowed with a coordinate uf that may be written as above is homogeneous, e.g. the columns of M.
If the field k is not Z2, we insert an element of k as a coefficient for each monomial in the matrix. Our approach is as follows. We will show that all boundary matrices ∂i may be written as homogeneous matrices initially, and the algorithms for computing persistence only produce homogeneous matrices and vectors. That is, we maintain homogeneity as an invariant throughout the computation. We begin with our first task.
Lemma 1. For a one-critical multifiltration, the matrix of ∂i:Ci→Ci−1 written in terms of the standard bases is homogeneous.
Proof. Recall that we may write the boundary operator ∂i:Ci→Ci−1 explicitly as a mi−1×mi matrix M in terms of the standard bases for Ci and Ci−1, as shown in matrix (4) for ∂1. From Definition 5, the standard basis for Ci is the set of i-simplices in critical grades. In a one-critical multifiltration, each simplex σ has a unique critical coordinate uσ by Definition 1. In turn, we may represent this coordinate by the monomial xuσ. For instance, simplex a in Figure 1 has critical grade (1,1) and monomial x(1,1)=x1x2. We order these monomials using >lex and use this ordering to rewrite the matrix for ∂i. The matrix entry Mjk relates σk, the k th basis element for Ci to σ^j, the j th basis element for Ci−1. If σ^j is not a face of σk, then Mjk=0. Otherwise, σ^j is a face of σk. Since a face must precede a co-face in a multifiltration, we have uσ^j≤uσk, so xuσ^j divides xuσk and Mjk=xuσk/xuσ^j=xuσk−uσ^j. That is, the matrix is homogeneous.
Corollary 1. For a one-critical multifiltration, the boundary matrix ∂i in terms of the standard bases has monomial entries.
Proof. The result is immediate from the proof of the previous lemma. The matrix entry is either 0 , a monomial, or xu(σk)−u(σ^j), a monomial.
Example 7. We show the homogeneous matrix for ∂1 below, where we augment the matrix with the simplices and their associated monomials. For example, σ^1=a is a face of σ1=ab, so M11=x1x22/x1x2=x2. Again, we assume we are computing over Z2 :
abadbcefbcx1x22x1x2x11111cdx12x22x20x1x22000dex100x12x22x12x2200efx1010x100afx120100x10bfx1x220000x12x12cex22x20000x1x22x200x2200x22000x2x20
We next focus on the second task, showing that given a homogeneous matrix as input, the algorithms produce homogeneous vectors and matrices. Let F be an m×n homogeneous matrix. Let {e1,…,em} and {e^1,…,e^n} be the standard bases for Rm and Rn, respectively. A homogeneous matrix associates a coordinate and monomial to the row and column basis elements. For example, since x1 is the monomial for row 2 of matrix (9), we have ue2=(1,0) and xue2=x1. Each column f in F is homogeneous and may be written in terms of rows:
f=i=1∑mcixue1xufei
where ci∈k and we allow ci=0 when a row is not used. For instance, column g representing the edge ab in the bifiltration shown in Figure 1 may be written as:
g=x2e1+x2x22e3=x1x2x2x32e1+1x2x32e3=xue1xuge1+xue3xuge3=i∈{1,3}∑xue1xugei
Consider the Buchberger algorithm in Figure 3. The algorithm repeatedly computes S-polynomials of homogeneous vectors on line 4.
Lemma 2. The S-polynomial S(f,g) of homogeneous vectors f and g is homogeneous.
Proof. A zero S-polynomial is trivially homogeneous. A non-zero S-polynomial S(f,g) implies that h in Equation (8) is non-zero. By the definition of LCM in Equation (6), h being non-zero implies that the leading monomials of f and g contain the same basis
element ej. We have:
ParseError: KaTeX parse error: Unexpected end of input in a macro argument, expected '}' at end of input: … \end{aligned}
Let xℓ=LCM(xuf,xug)=xLCM(uf,ug), giving us h=xuejejxℓ. We now have
LT(f)h=cfxuejejxufxuejejxℓ=cfxufxℓ
where cf=0 is the field constant in the leading term of f. Similarly, we get
LT(g)h=cgxugxℓ,cg=0
Putting it together, we have
ParseError: KaTeX parse error: Unexpected end of input in a macro argument, expected '}' at end of input: … \end{aligned}
where di=ci/cf−ci′/cg. Comparing with Equation (10), we see that S(f,g) is homogeneous with uS(f,g)=ℓ.
Having computed the S-polynomial, Buchberger next divides it by the current homogeneous basis G on line 4 using a call to the Divide algorithm in Figure 2.
Lemma 3. Divide (f,(f1,…,ft)) returns a homogeneous remainder vector r for homogeneous vectors f,fi∈Rm.
Proof. On line 1, r and p are initialized to be 0 and f, respectively, and are both trivially homogeneous. We will show that each iteration of the while loop starting on line 4 maintains the homogeneity of these two vectors. On line 5, since both fi and p are homogeneous, we have
fi=j=1∑mcijxuejejxufip=j=1∑mdjxuejejxup
Since LT(fi) divides LT(p), the terms must share basis element ek and we have
LT(fi)LT(p)LT(p)/LT(fi)=cikxuekekxufi=dkxuekekxup=cikdk⋅xufixup
On line 7,p is assigned to
p−(LT(p)/LT(fi))fi=j=1∑mdjxuejejxup−(cikdk⋅xufiejxup)j=1∑mcijxuejejxufi=j=1∑m(dj−cikdk⋅cij)xuejejxup=j=1∑mdj′xuejejxup
where dj′=dj−dk⋅cij/cik and dk′=0, so the subtraction eliminates the k th term. The final sum means that p is a new homogeneous polynomial with the same coordinate up as before. Similarly, LT (p) is added to r on line 8 and subtracted from p on line 9 , and neither action changes the homogeneity of either vector. Both remain homogeneous with coordinate up.
The lemmas combine to give us the desired result.
Theorem 1 (homogeneous Gröbner). Given a homogeneous basis, the BuchBERGER algorithm computes a homogeneous Gröbner basis.
Proof. Initially, the algorithm sets G to be the input basis F, which is homogeneous. On line 4, it computes the S-polynomial of homogeneous vectors f,g∈G. By Lemma 2, the S-polynomial is homogeneous. It then divides the S-polynomial by G. Since the input is homogeneous, Divide produces a homogeneous remainder r by Lemma 3. Since only homogeneous vectors are added to G on line 6,G remains homogeneous.
We may extend this result easily to the reduced Gröbner basis. Using similar arguments, we may show the following result, whose proof we omit here.
Theorem 2 (homogenous syzygy). For a homogeneous matrix, all matrices encountered in the computation of the syzygy module are homogeneous.
5.2 Data Structures and Optimizations
We have shown that the structure inherent in a multifiltration allows us to compute using homogeneous vectors and matrices whose entries are monomials only. We next explore the consequences of this restriction on both the data structures and complexity of the algorithms.
By Definition (6), an m×n homogeneous matrix naturally associates monomials to the standard bases for Rm and Rn. Moreover, every non-zero entry of the matrix is a quotient of these monomials as the matrix is homogeneous. Therefore, we do not need to store the matrix entries, but simply the field elements of the matrix along with the monomials for the bases. We may modify two standard data structures to represent the matrix.
- linked list: Each column stores its monomial as well as a linked-list of its non-zero entries in sorted order. The non-zero entries are represented by the row index and the field element. The matrix is simply a list of these columns in sorted order. Figure 4 displays matrix (9) in this data structure.
- matrix: Each column stores its monomial as well as the column of field coefficients. If we are computing over a finite field, we may pack bits for space efficiency.
The linked-list representation is appropriate for sparse matrices as it is space-efficient at the price of linear access time. This is essentially the representation used for computing in the one-dimensional setting [23]. In contrast, the matrix representation is appropriate for dense matrices as it provides constant access time at the cost of storing all zero entries. The multidimensional setting provides us with denser matrices, as we shall see, so the matrix representation becomes a viable structure.
In addition, the matrix representation is optimally suited to computing over the field Z2, the field often commonly employed in topological data analysis. The matrix entries each
Figure 4: The linked list representation of the boundary matrix ∂1 of Equation (4), for the bifiltration shown in Figure 1, in column sorted order. Note that the columns in Equation (4) are not ordered while they are sorted correctly here.
take one bit and the column entries may be packed into machine words. Moreover, the only operation required by the algorithms is symmetric difference which may be implemented as a binary XOR operation provided by the chip. This approach gives us bit-level parallelism for free: On a 64-bit machine, we perform symmetric difference 64 times faster than on the list. The combination of these techniques allow the matrix structure to perform better than the linked-list representation in practice.
We may also exploit homogeneity to speed up the computation of new vectors and their insertion into the basis. We demonstrate this briefly using the Buchberger algorithm. We order the columns of input matrix G using the POT rule for vectors as introduced in Section 4. Suppose we have f,g∈G with f>g. If S(f,g)=0, LT (f) and LT(g) contain the same basis element, which the S-polynomial eliminates. So, we have S(f,g)<g<f. This implies that when dividing S(f,g) by the vectors in G, we need only consider vectors that are smaller than g. Since the vectors are in sorted order, we consider each in turn until we can no longer divide. By the POT rule, we may insert the new remainder column here into the basis G. This gives us a constant time insertion operation for maintaining the ordering, as well as faster computation of the Gröbner basis.
5.3 Complexity
In this section, we give simple polynomial bounds on our multigraded algorithms. These bounds imply that we may compute multidimensional persistence in polynomial time.
Lemma 4. Let F be an m×n homogeneous matrix of monomials. The Gröbner basis G contains O(n2m) vectors in the worst case. We may compute G using Buchberger in O(n4m3) worst-case time.
Proof. In the worst case, F contains nm unique monomials. Each column f∈F may have any of the nm monomials as its monomial when included in the Gröbner basis G. Therefore, the total number of columns in G is O(n2m). In computing the Gröbner basis, we compare all columns pairwise, so the total number of comparisons is O(n4m2). Dividing the S-polynomial takes O(m) time. Therefore, the worst-case running time is O(n4m3).
In practice, the number of unique monomials in the matrix is lower than the worst case. In computing persistence, for example, we may control the number of unique monomials by ignoring close pairs of gradings. The following lemma bounds the basis size and running time in this case.
Lemma 5. Let F be an m×n homogeneous matrix with h of unique monomials. The Gröbner basis G contains O(hn) vectors and may be computed in time O(n3h2).
The proof is identical to the previous lemma.
Lemma 6. Let F be an m×n homogeneous matrix of monomials and G be the Gröbner basis of F. The syzygy module S for G may be computed using Schreyer’s algorithm in O(n4m2) worst-case time.
Proof. In computing the syzygy Module, we compare all columns of G pairwise, so the total number of comparisons is O(n4m2). Dividing the S-polynomial takes O(m) time. Therefore, the worst-case running time is O(n4m3).
Theorem 3. Multidimensional persistence may be computed in polynomial time.
Proof. Multidimensional persistence is represented by the Gröbner bases and the syzygy moduli of all the homogeneous boundary matrices ∂i for a given multifiltration. In the previous lemmas, we have shown that both the Gröbner basis and the syzygy module can be computed in polynomial time. Therefore, one can compute multidimensional persistence in polynomial time.
In other words, our optimizations in this section turn the exponential-algorithms from the last section into polynomial-time algorithms.
6 Computing the Rank Invariant
Having described our algorithms, in this section we discuss the computation of the rank invariant. Recall that our solution is complete, but not an invariant. In contrast, the rank invariant is incomplete, but is an invariant and may be used, for instance, as a descriptor in order to compare and match multifiltrations. We begin with direct computation that computes the invariant for each pair independently, giving us an intractable algorithm. We then discuss alternate approaches using posets and vineyards. We end this section by giving a polynomial time algorithm for reading off the rank invariant from the solution computed using our multigraded algorithms.
6.1 Direct Computation
We assume we are given a n-dimensional multifiltration of a cell complex K with m cells. Recall the rank invariant, Equation (3), from Section 2. Observe that any pair u≤v∈Nn defines a one-dimensional filtration with a new parameter t, where we map u to t=0,v to t=1, obtaining a two-level filtration. We then use the persistence algorithm to obtain barcodes [23]. The invariant ρi(u,v) may be read off from the βi-barcode: It is the number of intervals that contain both 0 and 1 . The persistence algorithm is Θ(m3) in the worstcase, so we have a cubic time algorithm for computing the rank invariant for a single pair of coordinates.
To fully compute the rank invariant, we need to consider all distinct pairs of complexes in a multifiltration. It may seem, at first, that we need to only consider critical coordinates, such as (1,1) and (2,0) in the bifiltration in Figure 1. However, note that the complex at coordinate (2,1) is also distinct even though no simplex is introduced at that coordinate. Inspired by this example, we may devise the following worst-case construction: We place m/n cells on each of the n axis to generate (m/n)n=Θ(mn) distinct complexes.
Simple calculation shows that there are Θ(m2n) comparable coordinates with distinct complexes. For each pair, we may compute the rank invariant using our method above for a total of O(m2n+3) running time. To store the rank invariant, we also require Θ(m2n) space.
6.2 Alternate Approaches
Our naive algorithm above computes the invariant for each pair of coordinates independently. In practice, we may read off multiple ranks from the same barcode for faster calculation. Any monotonically increasing path from the origin to the coordinate of the full complex is a one-dimensional filtration, such as the following path in Figure 1.
(0,0)→(1,1)→(2,2)→(3,2)
Having computed persistence, we may read off the ranks for all six comparable pairs within this path. We may formalize this approach using language from the theory of partially ordered sets. The path described above is a maximal chain in the multifiltration poset: a maximal set of pairwise comparable complexes. We require a set of maximal chains such that each pair of comparable elements (here, complexes) are in at least one chain. Each maximal chain requires a single one-dimensional persistence computation. We now require an algorithm that computes the smallest set of such chains. We know of no algorithm for this computation. Furthermore, it is not clear whether this approach would be faster than the direct approach in the worst case.
Another approach is to use vineyards as introduced in [4]. The vineyards method applies to the specific situation of a function of the form h(t,x)=(t,tf(x)+(1−t)g(x)), where x is a point in a manifold or space. One then considers the two variable persistence based on the function h. The rank invariants are then computed based for pairs of points using single variable method. The method does not permit the computation of the full 2-dimensional persistence.
6.3 Multigraded Approach
Full computation of the rank invariant is hampered by the exponential storage requirement. Instead, we may first compute multidimensional persistence using our multigraded algorithms in Section 5. We then simply read off the rank invariant using the Rank algorithm, as shown in Figure 5. We describe the algorithm in the proof of the following theorem.
Theorem 4. Rank(Z,B,u,v) computes the rank invariant ρi(u,v), if Z is the syzygies of ∂i and B is the Gröbner basis for ∂i+1.
Proof. The algorithm uses two simple helper procedures. The procedure Promote takes a matrix M and coordinate u as input. It then finds the columns f∈M whose associated coordinate uf precedes u, and promotes them to coordinate u by a simple shift. The procedure Quotient finds the quotient of the input matrices by division: If the remainder r is non-zero, it adds r to the quotient Q, also adding it to B so it only find unique cycles.
RANK (Z,B,u,v)
1Zu←PromOte(Z,u)
2Bu←PromOte(B,u)
3Hu←Quotient(Zu,Bu)
4Zuv←PromOte(Hu,v)
5Bv←PromOte(B,v)
6Huv←Quotient(Zuv,Bv)
7 return ∣Huv∣
QUotient (Z,B)
1Q←∅
2 foreach f∈Z
3 do ((q1,…,qt),r)←Divide(f,B)
4 if r=0
5 then Q←Q∪{r}
6B←B∪{r}
7 return Q
PromOte(M,u)
1 return {ufuf∣f∈M,uf≤u}
Figure 5: The algorithm RANK computes the rank invariant ρi(u,v) if Z is the set of syzygies of ∂i and B is the Gröbner basis for ∂i+1. The procedure Quotient finds the quotient of Z by B using the Divide algorithm. The procedure Promote promotes cycles that exist before time u to that time.
Now assume the input are as in the statement of the theorem. By the definition of the rank invariant, we need to count homology cycles that exist at u and persist until v. The RANK algorithm implements this. We compute homology Hu at u on the first three lines. On line 4, we promote these cycles to coordinate v. We then quotient with boundaries Bv at v to find homology cycles Huv that exist at u and persist until v. The cardinality of this set is the rank invariant by definition.
7 Experiments
In this section, we describe our implementation as well as initial quantitative experiments that show the performance of our algorithms in practice. We end with a last look at our example bifiltration in Figure 1: computing its rank invariant using our multigraded algorithms.
7.1 Implementation
We initially used software packages CoGaA[3] and Macaulay [11], which contain standard implementations of the algorithms. These packages were immensely helpful during our software development as they allowed for quick and convenient testing of the basic algorithms. In practice, there are two problems in using these packages for large datasets. First, these packages are slow since they are general and not customized for homogeneous matrices. Second, these packages produce verbose output that must be parsed for further computation.
Our experience led us to implement our algorithms for computation over Z2, optimizing the code for this field. Our implementation is in Java and and was tested under Mac OS X 10.5.6 running on a 2.3 Ghz Intel Quad-Core Xeon MacPro computer with 6 GB RAM.
7.2 Data
We generate n×n, random, bifiltered, homogeneous matrices, to simulate the boundary matrix ∂k−1 of a random bifiltered complex with n simplices in dimensions k−1 and k−2. We use the following procedure:
- Randomly generate n monomials {m1,…,mn} corresponding to the monomials associated with the basis elements of the rows.
- For each column f generate k integers indexing the non-zero rows.
- Set the column monomial to be LCM(mj), where {mj}j are the monomials of rows with non-zero
Each column in this matrix has k non-zero elements and is homogeneous by construction. We also generate random matrices but limit the number of unique monomials in the matrix to be O(h2) for different values of h. The basic idea behind these tests is that the range of the filtrations in a cell complex can typically be divided into smaller discrete intervals. For generation, we replace the first step of the procedure above with the following two steps:
0 . Randomly generate h unique monomials {l1,…,lh}.
- Generate n monomials {m1,…,mn} corresponding to the monomials associated with the basis elements of the rows such that mi∈{l1,…,lh}.
After executing step 2 and 3 above, our resulting matrix has homogeneous columns with k non-zero elements and at most h2 unique monomials.
7.3 Size & Timings
According to Lemma 4, the number of columns in the Gröbner basis for a random matrix may grow O(n3) as we have n=m here. Figure 6(a) shows that the growth of the Gröbner
Figure 6: Random n×n matrices with k non-zero entries in each column. (a) The number of columns ∣G∣ in Gröbner basis G (b) Running time in seconds for computing multidimensional persistence using list (l) or matrix (m) data structures.
basis is less in practice, about linear for k=2 and quadratic for k=4, and increases as the matrix becomes denser. Similarly, the theoretical running time for this matrix is O(n7). Figure 6(b) demonstrates that the actual running time matches this bound quite well: about O(n6) in these tests. The matrix method, however, is considerably more efficient, about 20 times faster for our largest test here.
We next limit the number of unique monomials in the input matrices. Figures 7 and 8 give the size and running time for matrices with at most h2 monomials for h=20 and h=100, respectively. We see that the growth of the basis is about linear for different values of k and h, and the running time matches the theoretical O(n3) bound in Lemma 5 quite well.
7.4 Rank Invariant
We end this paper by revisiting our motivating bifiltration from Figure 1 and computing its multidimensional persistence and rank invariants using our algorithms. Using the natural ordering on the simplices, one can write the boundary matrices M1 and M2 for ∂1 and ∂2, respectively, as:
M1=000x1x220x2x1x220000x200x10100x1010000x12x1200x2200x22000x2x2000x12x22x12x220000
Figure 7: Random n×n matrices with k non-zero entries in each column and a total of h2 monomials for h=20. (a) The number of columns ∣G∣ in Gröbner basis G. (b) Running time in seconds for computing multidimensional persistence using list (l) or matrix (m) data structures.
M2=0x1200x1x100x130x12x2x12x20000
The Gröbner basis (G1) and the set of syzygies (Z1) for ∂1 are:
G1=000x1x220x2x1x220000x200x10100x1001000x12x1200x2200x22000x1x10000x2x2000x12x22x12x220000
Z1=0x12x1x22x1x2200x2210x1001100x10x2x20000
Figure 8: Random n×n matrices with k non-zero entries in each column and a total of h2 monomials for h=100. (a) The number of columns ∣G∣ in Gröbner basis G. (b) Running time in seconds for computing multidimensional persistence using list (l) or matrix (m) data structures.
Note that each row in the syzygy matrix corresponds to an edge in the appropriate order. Finally, the Gröbner basis for ∂2 is G2=M2, as the only possible S-polynomial is identically 0 .
Using G1,Z1 and G2, one can read off the rank invariants for various u and v using the Rank algorithm in Section 6.3. A few interesting rank invariants for this example are:
u | v | ρ0(u,v) | ρ1(u,v) |
---|---|---|---|
[0,0] | [1,1] | 3 | 0 |
[1,0] | [2,1] | 2 | 0 |
[1,1] | [1,2] | 2 | 1 |
[2,2] | [3,2] | 1 | 1 |
8 Conclusion
In this paper, we fully examine the computation of multidimensional persistence, from theory to algorithms, implementation, and experiments. We develop polynomial time algorithms by recasting the problem into computational commutative algebra. Although the recast problem is EXPSPACE-complete, we exploit the multigraded setting to develop practical algorithms. The Gröbner bases we construct allow us to reconstruct the entire multidimensional persistence vector space, providing us a convenient way to compute the rank invariant. We implement all algorithms in the paper and show that the calculations are feasible due to the sparsity of the boundary matrices.
For additional speedup, we plan to parallelize the computation by batching and threading the XOR operations. We also plan to apply our algorithms toward studying scientific data. For instance, for zero-dimensional homology, multidimensional persistence
corresponds to clustering multiparameterized data, This fresh perspective, as well as a new arsenal of computational tools, allows us to attack an old and significant problem in data analysis.
References
[1] G. Carlsson, T. Ishkhanov, V. de Silva, and A. Zomorodian. On the local behavior of spaces of natural images. International Journal of Computer Vision, 76(1):1-12, 2008.
[2] G. Carlsson and A. Zomorodian. The theory of multidimensional persistence. Discrete E Computational Geometry, 42(1):71-93, 2009.
[3] CoCoATeam. CoCoA: a system for doing Computations in Commutative Algebra. http://cocoa.dima.unige.it.
[4] D. Cohen-Steiner, H. Edelsbrunner, and D. Morozov. Vines and vineyards by updating persistence in linear time. In Proc. ACM Symposium on Computational Geometry, pages 119−126,2006.
[5] A. Collins, A. Zomorodian, G. Carlsson, and L. Guibas. A barcode shape descriptor for curve point cloud data. Computers and Graphics, 28:881-894, 2004.
[6] D. A. Cox, J. Little, and D. O’Shea. Using algebraic geometry, volume 185 of Graduate Texts in Mathematics. Springer, New York, second edition, 2005.
[7] V. de Silva, R. Ghrist, and A. Muhammad. Blind swarms for coverage in 2-D. In Proceedings of Robotics: Science and Systems, 2005. http://www.roboticsproceedings.org/rss01/.
[8] H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and simplification. Discrete E Computational Geometry, 28:511-533, 2002.
[9] S. Eilenberg and J. A. Zilber. Semi-simplicial complexes and singular homology. Annals of Mathematics, 51(3):499-513, 1950.
[10] P. Frosini and M. Mulazzani. Size homotopy groups for computation of natural size distances. Bull. Belg. Math. Soc. Simon Stevin, 6(3):455-464, 1999.
[11] D. R. Grayson and M. E. Stillman. Macaulay 2, a software system for research in algebraic geometry. http://www.math.uiuc.edu/Macaulay2/.
[12] A. Gyulassy, V. Natarajan, V. Pascucci, P. T. Bremer, and B. Hamann. Topology-based simplification for feature extraction from 3D scalar fields. In Proc. IEEE Visualization, pages 275−280,2005.
[13] A. Hatcher. Algebraic Topology. Cambridge University Press, New York, NY, 2002. http://www.math.cornell.edu/ hatcher/AT/ATpage.html.
[14] T. Kaczynski, K. Mischaikow, and M. Mrozek. Computational Homology. SpringerVerlag, New York, NY, 2004.
[15] M. Levoy and P. Hanrahan. Light field rendering. In Proc. SIGGRAPH, pages 31-42, 1996.
[16] Y. Matsumoto. An Introduction to Morse Theory, volume 208 of Iwanami Series in Modern Mathematics. American Mathematical Society, Providence, RI, 2002.
[17] E. W. Mayr. Some complexity results for polynomial ideals. Journal of Complexity, 13(3):303-325, 1997.
[18] G. Singh, F. Memoli, T. Ishkhanov, G. Sapiro, G. Carlsson, and D. L. Ringach. Topological analysis of population activity in visual cortex. Journal of Vision, 8(8):1-18, 6 2008.
[19] G. Turk and M. Levoy. Zippered polygon meshes from range images. In Proc. SIGGRAPH, pages 311-318, 1994.
[20] J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, Cambridge, UK, second edition, 2003.
[21] F. Zhao and L. J. Guibas. Wireless Sensor Networks: An Information Processing Approach. Morgan-Kaufmann, San Francisco, CA, 2004.
[22] A. Zomorodian. Computational topology. In M. Atallah and M. Blanton, editors, Algorithms and Theory of Computation Handbook, volume 2, chapter 3. Chapman & Hall/CRC Press, Boca Raton, FL, second edition, 2010.
[23] A. Zomorodian and G. Carlsson. Computing persistent homology. Discrete E Computational Geometry, 33(2):249-274, 2005.
References (24)
- of Computational Geometry jocg.org corresponds to clustering multiparameterized data, This fresh perspective, as well as a new arsenal of computational tools, allows us to attack an old and significant problem in data analysis.
- G. Carlsson, T. Ishkhanov, V. de Silva, and A. Zomorodian. On the local behavior of spaces of natural images. International Journal of Computer Vision, 76(1):1-12, 2008.
- G. Carlsson and A. Zomorodian. The theory of multidimensional persistence. Discrete & Computational Geometry, 42(1):71-93, 2009.
- CoCoATeam. CoCoA: a system for doing Computations in Commutative Algebra. http://cocoa.dima.unige.it.
- D. Cohen-Steiner, H. Edelsbrunner, and D. Morozov. Vines and vineyards by updating persistence in linear time. In Proc. ACM Symposium on Computational Geometry, pages 119 -126, 2006.
- A. Collins, A. Zomorodian, G. Carlsson, and L. Guibas. A barcode shape descriptor for curve point cloud data. Computers and Graphics, 28:881-894, 2004.
- D. A. Cox, J. Little, and D. O'Shea. Using algebraic geometry, volume 185 of Graduate Texts in Mathematics. Springer, New York, second edition, 2005.
- V. de Silva, R. Ghrist, and A. Muhammad. Blind swarms for cov- erage in 2-D. In Proceedings of Robotics: Science and Systems, 2005. http://www.roboticsproceedings.org/rss01/.
- H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and sim- plification. Discrete & Computational Geometry, 28:511-533, 2002.
- S. Eilenberg and J. A. Zilber. Semi-simplicial complexes and singular homology. Annals of Mathematics, 51(3):499-513, 1950.
- P. Frosini and M. Mulazzani. Size homotopy groups for computation of natural size distances. Bull. Belg. Math. Soc. Simon Stevin, 6(3):455-464, 1999.
- D. R. Grayson and M. E. Stillman. Macaulay 2, a software system for research in algebraic geometry. http://www.math.uiuc.edu/Macaulay2/.
- A. Gyulassy, V. Natarajan, V. Pascucci, P. T. Bremer, and B. Hamann. Topology-based simplification for feature extraction from 3D scalar fields. In Proc. IEEE Visualization, pages 275-280, 2005.
- A. Hatcher. Algebraic Topology. Cambridge University Press, New York, NY, 2002. http://www.math.cornell.edu/~hatcher/AT/ATpage.html.
- T. Kaczynski, K. Mischaikow, and M. Mrozek. Computational Homology. Springer- Verlag, New York, NY, 2004.
- M. Levoy and P. Hanrahan. Light field rendering. In Proc. SIGGRAPH, pages 31-42, 1996.
- Y. Matsumoto. An Introduction to Morse Theory, volume 208 of Iwanami Series in Modern Mathematics. American Mathematical Society, Providence, RI, 2002.
- E. W. Mayr. Some complexity results for polynomial ideals. Journal of Complexity, 13(3):303-325, 1997.
- G. Singh, F. Memoli, T. Ishkhanov, G. Sapiro, G. Carlsson, and D. L. Ringach. Topo- logical analysis of population activity in visual cortex. Journal of Vision, 8(8):1-18, 6 2008.
- G. Turk and M. Levoy. Zippered polygon meshes from range images. In Proc. SIG- GRAPH, pages 311-318, 1994.
- J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, Cambridge, UK, second edition, 2003.
- F. Zhao and L. J. Guibas. Wireless Sensor Networks: An Information Processing Approach. Morgan-Kaufmann, San Francisco, CA, 2004.
- A. Zomorodian. Computational topology. In M. Atallah and M. Blanton, editors, Algorithms and Theory of Computation Handbook, volume 2, chapter 3. Chapman & Hall/CRC Press, Boca Raton, FL, second edition, 2010.
- A. Zomorodian and G. Carlsson. Computing persistent homology. Discrete & Compu- tational Geometry, 33(2):249-274, 2005.