We consider the minimum equivalent digraph problem, its maximum optimization variant, and some non-trivial extensions of these two types of problems motivated by biological and social network applications. We provide 3/2-approximation algorithms for all the minimization problems and 2-approximation algorithms for all the maximization problems using appropriate primal-dual polytopes. We also show lower bounds on the integrality gap of the polytope to provide some intuition on the final limit of such approaches. Furthermore, we provide an APX-hardness result for all those problems even if the length of all simple cycles is bounded by 5.
Motivated by the widespread proliferation of wireless networks employing directional antennas, we study some capacitated covering problems arising in these networks. Geometrically, the area covered by a directional antenna with parameters α, ρ, R is the set of points with polar coordinates (r, θ) such that r ≤ R and α ≤ θ ≤ α + ρ. Given a set of customers, their positions in the plane, and their bandwidth demands, the capacitated covering problem considered here is to cover all the customers with the minimum number of directional antennas such that the total demand of the customers assigned to an antenna stays within a bound. We consider two settings of this capacitated cover problem arising in wireless networks. In the first setting, where the antennas have variable angular range, we present an approximation algorithm with ratio 3. In the setting where the angular range of antennas is fixed, we improve this approximation ratio to 1.5. These results also apply to a related problem of bin packing with deadlines. In this problem we are given a set of items, each with a weight, arrival time, and deadline, and we want to pack each item into a bin after it arrives but before its deadline. The objective is to minimize the number of bins used. We present a 3-approximation algorithm for this problem, and a 1.5-approximation algorithm for the special case when each difference between a deadline and the corresponding arrival time is the same.
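The coverage condition in the abstract above can be illustrated with a small membership test. This is only a sketch under assumptions not stated in the abstract: the antenna sits at the origin, the radius bound is passed as `R`, angles are in radians, and the sector is allowed to wrap around 2π. The function name `covers` is hypothetical.

```python
import math

def covers(alpha, rho, R, point):
    """Return True if the sector with start angle alpha, angular range rho,
    and radius R (antenna assumed at the origin) contains the point (x, y)."""
    x, y = point
    r = math.hypot(x, y)
    if r > R:
        return False
    theta = math.atan2(y, x) % (2 * math.pi)
    # Angle measured from the sector's starting direction; the modulo
    # handles sectors that wrap past 2*pi (an assumption of this sketch).
    return (theta - alpha) % (2 * math.pi) <= rho
```

A capacitated cover would additionally require that the demands of the customers assigned to each antenna sum to at most the capacity bound; that assignment step is not sketched here.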
A feedback vertex set of a graph is a subset of vertices that contains at least one vertex from every cycle in the graph. The problem considered is that of finding a minimum feedback vertex set given a weighted and undirected graph. We present a simple and efficient approximation algorithm with performance ratio of at most 2, improving previous best bounds for either weighted or unweighted cases of the problem. Any further improvement on this bound, matching the best constant factor known for the vertex cover problem, is deemed challenging. The approximation principle, underlying the algorithm, is based on a generalized form of the classical local ratio theorem, originally developed for approximation of the vertex cover problem, and a more flexible style of its application.
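The definition of a feedback vertex set can be checked directly: a vertex set qualifies exactly when deleting it leaves an acyclic graph. The sketch below verifies a candidate set with union-find. It is not the local-ratio approximation algorithm of the paper, and the function name and the vertex numbering 0..n-1 are assumptions made for illustration.

```python
def is_feedback_vertex_set(n, edges, removed):
    """Check that deleting `removed` from an undirected graph on
    vertices 0..n-1 leaves no cycle (i.e. leaves a forest)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        if u in removed or v in removed:
            continue  # edge disappears together with a removed endpoint
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge closes a cycle among remaining vertices
        parent[ru] = rv
    return True
```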
Given a strong match between regions of two sequences, how far can the match be meaningfully extended if gaps are allowed in the resulting alignment? The aim is to avoid searching beyond the point that a useful extension of the alignment is likely to be found. Without loss of generality, we can restrict attention to the suffixes of the sequences that follow the strong match, which leads to the following formal problem. Given two sequences and a fixed X > 0, align initial portions of the sequences subject to the constraint that no section of the alignment scores below -X. Our results indicate that computing an optimal alignment under this constraint is very expensive. However, less rigorous conditions on the alignment can be guaranteed by quite efficient algorithms. One of these variants has been implemented in a new release of the Blast suite of database search programs.
For a genomic region containing a tandem gene cluster, a proper set of alignments needs to align only orthologous segments, i.e., those separated by a speciation event. Otherwise, methods for finding regions under evolutionary selection will not perform properly. Conversely, the alignments should indicate every orthologous pair of genes or genomic segments. Attaining this goal in practice requires a technique for avoiding a combinatorial explosion in the number of local alignments. To better understand this process, we model it as a graph problem of finding a minimum cardinality set of cliques that contain all edges. We provide an upper bound for an important class of graphs (the problem is NP-hard and very difficult to approximate in the general case), and use the bound and computer simulations to evaluate two heuristic solutions. An implementation of one of them is evaluated on mammalian sequences from the α-globin gene cluster.
In this paper we investigate the protein sequence design (PSD) problem (also known as the inverse protein folding problem) under the Canonical model on 2D and 3D lattices [12, 25]. The Canonical model is specified by (i) a geometric representation of a target protein structure with amino acid residues via its contact graph, (ii) a binary folding code in which the amino acids are classified as hydrophobic (H) or polar (P), (iii) an energy function Φ defined in terms of the target structure that should favor sequences with a dense hydrophobic core and penalize those with many solvent-exposed hydrophobic residues (in the Canonical model, the energy function Φ gives an H-H residue contact in the contact graph a value of -1 and all other contacts a value of 0), and (iv) to prevent the solution from being a biologically meaningless all-H sequence, a limit on the number of H residues in the sequence S, imposed by fixing an upper bound λ on the ratio between H and P amino acids. The sequence S is designed by specifying which residues are H and which ones are P in a way that realizes the global minimum of the energy function Φ. In this paper, we prove the following results:
We give a 1.25-approximation algorithm for the Steiner Tree Problem with distances one and two, improving on the best known bound for that problem. We give a new approximation algorithm for the problem of finding a minimum Steiner tree for metric spaces with distances one and two. It improves over the best known approximation factor for that problem of 1.279. Moreover, unlike the result of Robins and Zelikovsky, our method yields a single algorithm, whereas theirs gives an approximation scheme. A metric with distances 1 and 2 can be represented as a graph, so that edges are pairs at distance 1 and non-edges are pairs at distance 2. We will denote by STP[1,2] the Steiner Tree Problem restricted to such metrics. The problem instance of STP[1,2] is a graph G = (V, E) that defines a metric in this way, and a set R ⊂ V of terminal nodes. A valid solution is a set of unordered node pairs T such that R is contained in a connected component of (V, T). We minimize |T ∩ E| + 2|T − E|.
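The STP[1,2] objective can be made concrete with a short sketch that evaluates the cost |T ∩ E| + 2|T − E| of a candidate solution and checks that the terminals are connected. Connectivity is read here as connectivity in the graph whose edges are the chosen pairs T, which is an assumption of this sketch; the function names are hypothetical, and this is not the 1.25-approximation algorithm itself.

```python
from collections import defaultdict, deque

def stp12_cost(E, T):
    """Cost of a set T of unordered node pairs: 1 per pair that is an
    edge of G, 2 per pair that is a non-edge."""
    edges = {frozenset(e) for e in E}
    pairs = {frozenset(p) for p in T}
    return sum(1 if p in edges else 2 for p in pairs)

def connects_terminals(T, R):
    """True if all terminals in R lie in one connected component of the
    graph formed by the pairs in T (breadth-first search from a terminal)."""
    terminals = list(R)
    if len(terminals) <= 1:
        return True
    adj = defaultdict(set)
    for u, v in T:
        adj[u].add(v)
        adj[v].add(u)
    seen = {terminals[0]}
    queue = deque([terminals[0]])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return all(t in seen for t in terminals)
```

For a star graph with center 3 and terminals 0, 1, 2, the three star edges cost 3, while connecting the terminals directly through two non-edges costs 4.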
We design a 3/2-approximation algorithm for the Generalized Steiner Tree problem (GST) in metrics with distances 1 and 2. This is the first polynomial-time approximation algorithm for a wide class of non-geometric metric GST instances with approximation factor below 2.
Threats to the stability of a financial system may severely affect the functioning of the entire economy, and thus considerable emphasis is placed on analyzing the cause and effect of such threats. The financial crisis in the current and past decade has shown that one important cause of instability in global markets is the so-called financial contagion, namely the spreading of instabilities or failures of individual components of the network to other, perhaps healthier, components. This leads to a natural question of whether the regulatory authorities could have predicted and perhaps mitigated the current economic crisis by effective computations of some stability measure of the banking networks. Motivated by such observations, we consider the problem of defining and evaluating stabilities of both homogeneous and heterogeneous banking networks against propagation of synchronous idiosyncratic shocks given to a subset of banks. We formalize the homogeneous banking network model of Nier et al. [46] and its corresponding heterogeneous version, formalize the synchronous shock propagation procedures outlined in prior work, define two appropriate stability measures, and investigate the computational complexities of evaluating these measures for various network topologies and parameters of interest. Our results and proofs also shed some light on the properties of topologies and parameters of the network that may lead to higher or lower stabilities.
In database searches for sequence similarity, matches to a distinct sequence region (e.g. protein domain) are frequently obscured by numerous matches to another region of the same sequence. In order to cope with this problem, algorithms are developed to discard redundant matches. One model for this problem begins with a list of intervals, each with an associated score; each interval gives the range of positions in the query sequence that align to a database sequence, and the score is that of the alignment. If interval I is contained in interval J, and I's score is less than J's, then I is said to be dominated by J. The problem is then to identify each interval that is dominated by at least K other intervals, where K is a given level of "tolerable redundancy." An algorithm is developed to solve the problem in O(N log N) time and O(N*) space, where N is the number of intervals and N* is a precisely defined value that never exceeds N and is frequently much smaller. This criterion for discarding database hits has been implemented in the Blast program, as illustrated herein with examples. Several variations and extensions of this approach are also described.
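The domination criterion can be stated directly as code. The sketch below is a quadratic-time reference check, not the O(N log N) algorithm of the paper; the tuple layout `(start, end, score)` and the function name are assumptions made for illustration.

```python
def redundant_intervals(intervals, K):
    """Return the indices of intervals dominated by at least K others.

    Each interval is a (start, end, score) tuple. Interval I is dominated
    by J when I is contained in J and I's score is strictly less than J's.
    """
    result = []
    for i, (s1, e1, score1) in enumerate(intervals):
        dominators = 0
        for j, (s2, e2, score2) in enumerate(intervals):
            if i != j and s2 <= s1 and e1 <= e2 and score1 < score2:
                dominators += 1
        if dominators >= K:
            result.append(i)
    return result
```

With K = 1 every dominated hit is discarded; larger K keeps hits that are covered by only a few better-scoring matches.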
Proceedings of the forty-sixth annual ACM symposium on Theory of computing, 2014
We initiate a systematic study of sublinear algorithms for approximately testing properties of real-valued data with respect to Lp distances. Such algorithms distinguish datasets which either have (or are close to having) a certain property from datasets which are far from having it with respect to Lp distance. For applications involving noisy real-valued data, using Lp distances allows algorithms to withstand noise of bounded Lp norm. While the classical property testing framework developed with respect to Hamming distance has been studied extensively, testing with respect to Lp distances has received little attention. We use our framework to design simple and fast algorithms for classic problems, such as testing monotonicity, convexity and the Lipschitz property, and also distance approximation to monotonicity. In particular, for functions over the hypergrid domains [n]^d, the complexity of our algorithms for all these properties does not depend on the linear dimension n. This is impossible in the standard model. Most of our algorithms require minimal assumptions on the choice of sampled data: either uniform or easily samplable random queries suffice. We also show connections between the Lp-testing model and the standard framework of property testing with respect to Hamming distance. Some of our results improve existing bounds for Hamming distance.
We consider the following problem motivated by applications to nonoverlapping local alignment problems in computational molecular biology: we are given a set of n positively weighted axis-parallel rectangles such that, for each axis, the projection of a rectangle on this axis does not enclose that of another, and our goal is to select a subset of independent rectangles from the given set of rectangles of total maximum weight, where two rectangles are independent provided that, for each axis, the projection of one rectangle does not overlap that of another. We use the two-phase technique of [3] to provide a simple approximation algorithm for this problem that runs in O(n log n) time with a worst-case performance ratio of 3. We also discuss extension and analysis of the algorithm in d dimensions.
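The independence condition can be illustrated with a small predicate together with an exhaustive search for tiny instances. This is a sketch only: it is exponential-time and is not the two-phase O(n log n) algorithm of the paper. The rectangle layout `((x_lo, x_hi), (y_lo, y_hi), weight)`, the treatment of projections as closed intervals, and all function names are assumptions.

```python
from itertools import combinations

def overlaps(a, b):
    """Closed intervals (lo, hi) overlap if they share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def independent(r1, r2):
    """Two rectangles are independent when their projections are
    disjoint on both the x axis and the y axis."""
    return not overlaps(r1[0], r2[0]) and not overlaps(r1[1], r2[1])

def best_independent_subset(rects):
    """Brute-force maximum-weight set of pairwise independent rectangles.
    Returns (total_weight, tuple_of_indices)."""
    best_weight, best = 0, ()
    for k in range(1, len(rects) + 1):
        for subset in combinations(range(len(rects)), k):
            if all(independent(rects[i], rects[j])
                   for i, j in combinations(subset, 2)):
                weight = sum(rects[i][2] for i in subset)
                if weight > best_weight:
                    best_weight, best = weight, subset
    return best_weight, best
```

Such an exhaustive search is only useful as a correctness reference against which an approximation algorithm's output can be compared on small inputs.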
We consider the weighted feedback vertex set problem for undirected graphs. It is shown that a generalized local ratio strategy leads to an efficient approximation with the performance guarantee of twice the optimal, improving the previous results for both weighted and unweighted cases. We further elaborate our approach to treat the case when graphs are of bounded degree, and show that it achieves even better performance, 2 − 2/(3Δ − 2), where Δ is the maximum degree of the graph.
Proceedings of the twenty-ninth annual ACM symposium on Theory of computing - STOC '97, 1997
The generalized Steiner tree problem is defined as follows. Given a graph with non-negative weights and a set of pairs of vertices, find the minimum network of edges such that each pair of vertices is in the same connected component. We present an algorithm for the on-line Generalized Steiner Tree (GST) problem, and two other problems: Rectilinear Steiner Arborescence (RSA) and Symmetric Rectilinear Steiner Arborescence (SRSA). For each of these problems we provide polynomial-time algorithms with performance ratios of O(log n). The constant factors hidden in the O-notation are small; in the case of the GST, we are within a factor of 2 of the proven lower bound. The previous best on-line GST algorithm (Awerbuch et al.) was O(log² n).
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2009
We give a 1.25-approximation algorithm for the Steiner Tree Problem with distances one and two, improving on the best known bound for that problem. We give a new approximation algorithm for the problem of finding a minimum Steiner tree for metric spaces with distances one and two. It improves over the best known approximation factor for that problem of 1.279. Moreover, unlike the result of Robins and Zelikovsky, our method yields a single algorithm, whereas theirs gives an approximation scheme. A metric with distances 1 and 2 can be represented as a graph, so that edges are pairs at distance 1 and non-edges are pairs at distance 2. We will denote by STP[1,2] the Steiner Tree Problem restricted to such metrics. The problem instance of STP[1,2] is a graph G = (V, E) that defines a metric in this way, and a set R ⊂ V of terminal nodes. A valid solution is a set of unordered node pairs T such that R is contained in a connected component of (V, T). We minimize |T ∩ E| + 2|T − E|.
Papers by Piotr Berman