Papers by Mema Roussopoulos

Digital repositories, either digital preservation systems or archival systems, periodically check the integrity of stored objects to assure users of their correctness. To do so, prior solutions calculate integrity metadata and require the repository to store it alongside the actual data objects. This integrity metadata is essential for regularly verifying the correctness of the stored data objects. To safeguard and detect damage to this metadata, prior solutions rely on widely visible media, that is, unaffiliated third parties, to store and provide back digests of the metadata to verify it is intact. However, they do not address recovery of the integrity metadata in case of damage or attack by an adversary. In essence, they do not preserve this metadata. We introduce IntegrityCatalog, a system that collects all integrity-related metadata in a single component and treats them as first-class objects, managing both their integrity and their preservation. We introduce a treap-based persistent authenticated dictionary managing arbitrary-length key/value pairs, which we use to store all integrity metadata, accessible simply by object name. Additionally, IntegrityCatalog is a distributed system that includes a network protocol that manages both corruption detection and preservation of this metadata, using administrator-selected network peers with two possible roles. Verifiers store and offer attestations on digests and have minimal storage requirements, while preservers efficiently synchronize a complete copy of the catalog to assist in recovery in case of a detected catalog compromise on the local system. We describe our prototype implementation of IntegrityCatalog, measure its performance empirically, and demonstrate its effectiveness in real-world situations, with a worst measured throughput of approximately 1K insertions per second and 2K verified search operations per second.
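At the heart of this design is the treap-based authenticated dictionary. The minimal Python sketch below shows the underlying idea, an ordinary treap whose nodes also carry Merkle-style digests; the names and hashing scheme are illustrative assumptions, and the actual catalog additionally supports persistence (versioned snapshots) and proof generation.

    import hashlib
    import random

    class Node:
        def __init__(self, key: bytes, value: bytes):
            self.key, self.value = key, value
            self.priority = random.random()   # random heap priority keeps expected depth O(log n)
            self.left = self.right = None
            self.digest = b""

    def _digest(node):
        return node.digest if node else b""

    def _update(node):
        # Merkle-style label: hash over the entry and both child digests, so the
        # root digest authenticates the entire dictionary.
        node.digest = hashlib.sha256(
            node.key + node.value + _digest(node.left) + _digest(node.right)
        ).digest()
        return node

    def _rotate_right(n):
        l = n.left
        n.left, l.right = l.right, n
        _update(n)
        return _update(l)

    def _rotate_left(n):
        r = n.right
        n.right, r.left = r.left, n
        _update(n)
        return _update(r)

    def insert(node, key, value):
        """Insert by key (BST order), then rotate to restore the heap property."""
        if node is None:
            return _update(Node(key, value))
        if key < node.key:
            node.left = insert(node.left, key, value)
            if node.left.priority > node.priority:
                return _rotate_right(node)
        elif key > node.key:
            node.right = insert(node.right, key, value)
            if node.right.priority > node.priority:
                return _rotate_left(node)
        else:
            node.value = value
        return _update(node)

    root = None
    for name in [b"obj-17", b"obj-03", b"obj-42"]:
        root = insert(root, name, b"sha256-of-" + name)
    print(root.digest.hex())   # publishing this digest lets verifiers attest to the catalog state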
Alleviating the Topology Mismatch Problem in Distributed Overlay Networks: A Survey
Journal of Systems and Software, 2015
Technology-induced challenges in Privacy and Data Protection in Europe

Lecture Notes in Computer Science, 2012
Byzantine Fault Tolerant (BFT) systems are considered by the systems research community to be state of the art with regard to providing reliability in distributed systems. BFT systems provide safety and liveness guarantees under reasonable assumptions, amongst a set of nodes where at most f nodes exhibit arbitrary, incorrect behavior, known as Byzantine faults. Despite this, BFT systems are still rarely used in practice. In this paper we describe our experience, from an application developer's perspective, trying to leverage the publicly available and highly tuned "PBFT" middleware (by Castro and Liskov) to provide provable reliability guarantees for an electronic voting application with high security and robustness needs. The PBFT middleware has been the focus of most BFT research efforts over the past twelve years; all direct descendant systems depend on its initial code base.
The LOCKSS system is a tool librarians can use to preserve long-term access to content published on the web. It has three main functions: it collects the content by crawling the publishers' web sites, it distributes the content by acting as a proxy for readers' browsers, and it preserves the content through a cooperative process of damage detection and repair. The system uses the hard disk holding the copy used for access as a preservation medium; the cooperative damage detection and repair mechanism eliminates the need for off-line backups on removable media. We describe the LOCKSS system as an example of the techniques needed to use hard disks as a medium for long-term preservation.
The problem of digital preservation is widely acknowledged, but the underlying assumptions implicit in the design of systems that address this problem have not been analyzed explicitly. We identify two basic approaches to addressing the problem of digital preservation using peer-to-peer systems: conservation and consensus. We highlight the design tradeoffs involved in using the two general approaches, and we provide a framework for analyzing the characteristics of peer-to-peer preservation systems in general. In addition, we propose a novel conservation-based protocol for achieving preservation, and we analyze its effectiveness with respect to our framework.
arXiv preprint cs/0411078, 2004
The design of the defenses Internet systems can deploy against attack, especially adaptive and resilient defenses, must start from a realistic model of the threat. This requires an assessment of the capabilities of the adversary. The design typically evolves through a process of simulating both the system and the adversary. This requires the design and implementation of a simulated adversary based on the capability assessment. Consensus on the capabilities of a suitable adversary is not evident. Part of the recent redesign of the protocol used by peers in the LOCKSS digital preservation system included a conservative assessment of the adversary's capabilities. We present our assessment and the implications we drew from it as a step towards a reusable adversary specification.
The LOCKSS project has developed and deployed in a world-wide test a peer-to-peer system for preserving access to journals and other archival information published on the Web. It consists of a large number of independent, low-cost, persistent Web caches that cooperate to detect and repair damage to their content by voting in "opinion polls." Based on this experience, we present a design for and simulations of a novel protocol for voting in systems of this kind. It incorporates rate limitation and intrusion detection to ensure that even some very powerful adversaries attacking over many years have only a small probability of causing irrecoverable damage before being detected.
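To make the voting idea concrete, here is a drastically simplified sketch of an opinion poll with rate limitation, in Python; the thresholds, the sampling strategy, and the peer interface are illustrative assumptions rather than the protocol's actual parameters.

    import hashlib
    import random
    import time

    AGREE_THRESHOLD = 0.8        # illustrative "landslide agreement" threshold
    DISAGREE_THRESHOLD = 0.2     # illustrative "landslide disagreement" threshold
    MIN_POLL_INTERVAL = 3600.0   # rate limitation: at most one poll per hour on this object

    def run_poll(my_copy: bytes, peers, sample_size: int, last_poll_time: float):
        """Compare the hash of our copy against a random sample of peers' votes."""
        now = time.time()
        if now - last_poll_time < MIN_POLL_INTERVAL:
            return "rate-limited", last_poll_time
        my_hash = hashlib.sha256(my_copy).hexdigest()
        voters = random.sample(peers, min(sample_size, len(peers)))
        agree = sum(1 for p in voters if p.vote() == my_hash)   # each peer votes with the hash of its own copy
        ratio = agree / len(voters)
        if ratio >= AGREE_THRESHOLD:
            return "copy intact", now
        if ratio <= DISAGREE_THRESHOLD:
            return "repair: fetch copy from an agreeing peer", now
        return "raise alarm: inconclusive poll suggests attack", now

The inconclusive-poll alarm is what connects the voting mechanism to the intrusion detection mentioned in the abstract above.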
The data cube is an aggregate operator which has been shown to be very powerful for On-Line Analytical Processing (OLAP) in the context of data warehousing. It is, however, very expensive to compute, access, and maintain. In this paper we define the "cubetree" as a storage abstraction of the cube and realize it using packed R-trees for most efficient cube queries. We then reduce the problem of creation and maintenance of the cube to sorting and bulk incremental merge-packing of cubetrees. This merge-pack has been implemented to use separate storage for writing the updated cubetrees, therefore allowing cube queries to continue even during maintenance. Finally, we characterize the size of the delta increment for achieving good bulk update schedules for the cube. The paper includes experiments with various data sets measuring query and bulk update performance.
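For readers unfamiliar with the operator itself, the Python sketch below shows what the data cube computes: aggregates over every subset of the dimension columns, the way the CUBE operator generalizes GROUP BY. It illustrates only the operator, not the cubetree/packed R-tree storage scheme the paper proposes.

    from itertools import combinations
    from collections import defaultdict

    def data_cube(rows, dims, measure):
        """rows: list of dicts; dims: dimension column names; measure: column to sum."""
        cube = {}
        for k in range(len(dims) + 1):
            for group in combinations(dims, k):            # every subset of dimensions
                totals = defaultdict(float)
                for row in rows:
                    key = tuple(row[d] for d in group)     # the empty group () is the grand total
                    totals[key] += row[measure]
                cube[group] = dict(totals)
        return cube

    sales = [
        {"store": "A", "product": "x", "amount": 10},
        {"store": "A", "product": "y", "amount": 5},
        {"store": "B", "product": "x", "amount": 7},
    ]
    print(data_cube(sales, ["store", "product"], "amount")[("store",)])
    # {('A',): 15.0, ('B',): 7.0}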

We are facing a growing user demand for ubiquitous Internet access. As a result, network ports and wireless LANs are becoming common in public spaces inside buildings such as lounges, conference rooms and lecture halls. This introduces the problem of protecting networks accessible through these public ports from unauthorized use. In this paper, we study the problem of access control through public network ports. We view this problem as a special case of the more general problem of access control for a service on a network. We present an access control model on which we base our solution. This model has three components: authentication, authorization, and access verification. We describe the design and implementation of a system that allows secure network access through public network ports and wireless LANs. Our design requires no special hardware or custom client software, resulting in minimal deployment cost and maintenance overhead. Our system has a user-friendly, web-based interface, offers good security, and scales to a campus-sized community.
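A minimal sketch of the three-component model, assuming a simple username/password credential store and per-port bindings (the names and the binding mechanism are illustrative assumptions, not the system's actual design):

    # Step 1: authentication establishes who is at the port.
    # Step 2: authorization decides whether that identity may use the network.
    # Step 3: access verification periodically re-checks that traffic on the port
    #         still belongs to the user who authenticated.

    CREDENTIALS = {"alice": "secret1", "bob": "secret2"}    # illustrative credential store
    AUTHORIZED_USERS = {"alice"}                            # illustrative policy

    def authenticate(username: str, password: str) -> bool:
        return CREDENTIALS.get(username) == password

    def authorize(username: str) -> bool:
        return username in AUTHORIZED_USERS

    def verify_access(port_id: str, username: str, bindings: dict) -> bool:
        return bindings.get(port_id) == username

    bindings = {}
    if authenticate("alice", "secret1") and authorize("alice"):
        bindings["port-3F-12"] = "alice"          # open this public port for the session
    assert verify_access("port-3F-12", "alice", bindings)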
Computing Research Repository, 2002
Recently the problem of indexing and locating content in peer-to-peer networks has received much attention. Previous work suggests caching index entries at intermediate nodes that lie on the paths taken by search queries, but until now there has been little focus on how to maintain these intermediate caches. This paper proposes CUP, a new comprehensive architecture for Controlled Update Propagation.
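As a rough illustration of the setting, the sketch below caches index entries at every node along a lookup path and pushes updates back down to the nodes that cached them. This is only a naive push-everything baseline; CUP's contribution, per the abstract, is controlling when and how far such propagation happens, which this sketch does not attempt.

    class IndexNode:
        def __init__(self, name, parent=None):
            self.name = name
            self.parent = parent        # next hop toward the node authoritative for the index
            self.cache = {}             # key -> value cached from earlier lookups
            self.interested = {}        # key -> set of downstream nodes that cached it via us

        def lookup(self, key, child=None):
            """Resolve a key, caching the entry at every node on the query path."""
            if child is not None:
                self.interested.setdefault(key, set()).add(child)
            if key in self.cache:
                return self.cache[key]
            value = self.parent.lookup(key, child=self) if self.parent else None
            if value is not None:
                self.cache[key] = value
            return value

        def publish(self, key, value):
            """The authoritative node updates an entry and pushes it downstream."""
            self.cache[key] = value
            self._propagate(key, value)

        def _propagate(self, key, value):
            for child in self.interested.get(key, ()):
                child.cache[key] = value        # refresh the downstream cache...
                child._propagate(key, value)    # ...and keep pushing toward the leaves

    root = IndexNode("root")
    middle = IndexNode("middle", parent=root)
    leaf = IndexNode("leaf", parent=middle)
    root.cache["song.mp3"] = "peer-17"          # root is authoritative for this index entry
    print(leaf.lookup("song.mp3"))              # "peer-17", now cached at leaf and middle
    root.publish("song.mp3", "peer-42")         # update propagates to the interested caches
    print(leaf.cache["song.mp3"])               # "peer-42"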
The emergence of sensor networks and distributed applications that generate data streams has created a need for Internet overlays designed for streaming data. Such stream-based overlay networks (SBONs) consist of a set of Internet hosts that collect, process, and deliver stream-based data to multiple applications. A key challenge in the design and implementation of SBONs is efficient path
Cobra: Content-based Filtering and Aggregation of Blogs and RSS Feeds
Networked Systems Design and Implementation, 2007
Blogs and RSS feeds are becoming increasingly popular. The blogging site LiveJournal has over 11 million user accounts, and according to one report, over 1.6 million postings are made to blogs every day. The "Blogosphere" is a new hotbed of Internet-based media that represents a shift from mostly static content to dynamic, continuously-updated discussions. The problem is that finding and
The emergence of computationally-enabled sensors and the applications that use sensor data introduces the need for a software infrastructure designed specifically to enable the rapid development and deployment of applications that draw upon data from multiple, heterogeneous sensor networks. We present the Hourglass infrastructure, which addresses this need.
Predicting Adversary Infiltration in the LOCKSS System
2 P2P or Not 2 P2P?
In the hope of stimulating discussion, we present a heuristic decision tree that designers can use to judge how suitable a P2P solution might be for a particular problem. It is based on characteristics of a wide range of P2P systems from the literature, both proposed and deployed. These include budget, resource relevance, trust, rate of system change, and criticality.
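Purely as an illustration of what such a heuristic decision tree looks like in code, here is a Python sketch; the ordering of the questions and the outcomes are assumptions, not the tree the paper actually presents.

    def p2p_suitability(budget_limited, resources_relevant_to_peers,
                        mutual_trust, rate_of_change, criticality):
        if not budget_limited:
            return "a centralized solution is likely simpler"
        if not resources_relevant_to_peers:
            return "peers have little incentive to contribute; P2P is a poor fit"
        if criticality == "high" and not mutual_trust:
            return "P2P is risky: critical system with untrusted peers"
        if rate_of_change == "high":
            return "P2P is possible, but churn management dominates the design"
        return "good candidate for a P2P solution"

    print(p2p_suitability(budget_limited=True, resources_relevant_to_peers=True,
                          mutual_trust=False, rate_of_change="low", criticality="low"))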
Enabling ubiquitous reachability

IEEE INFOCOM 2008 - The 27th Conference on Computer Communications, 2008
In an n-way broadcast application each one of n overlay nodes wants to push its own distinct large data file to all other n-1 destinations as well as download their respective data files. BitTorrent-like swarming protocols are ideal choices for handling such massive data volume transfers. The original BitTorrent targets one-to-many broadcasts of a single file to a very large number of receivers and thus, by necessity, employs an almost random overlay topology. n-way broadcast applications, on the other hand, owing to their inherent n-squared nature, are realizable only in small to medium-scale networks. In this paper, we show that we can leverage this scale constraint to construct optimized overlay topologies that take into consideration the end-to-end characteristics of the network and as a consequence deliver far superior performance compared to random and myopic (local) approaches. We present the Max-Min and Max-Sum peer-selection policies used by individual nodes to select their neighbors. The first one strives to maximize the available bandwidth to the slowest destination, while the second maximizes the aggregate output rate. We design a swarming protocol suitable for n-way broadcast and operate it on top of overlay graphs formed by nodes that employ Max-Min or Max-Sum policies. Using trace-driven simulation and measurements from a PlanetLab prototype implementation, we demonstrate that the performance of swarming on top of our constructed topologies is far superior to the performance of random and myopic overlays. Moreover, we show how to modify our swarming protocol to allow it to accommodate selfish nodes.
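The two peer-selection policies are easy to picture with a small sketch; the scoring below is a deliberately simplified assumption (each node sees only its own bandwidth estimates), whereas the paper's policies work over measured end-to-end characteristics and interact with the swarming protocol.

    from itertools import combinations

    def choose_neighbors(bandwidth, k, policy="max-min"):
        """bandwidth: dict peer -> estimated end-to-end bandwidth from this node."""
        best_set, best_score = None, float("-inf")
        for candidate in combinations(bandwidth, k):
            rates = [bandwidth[p] for p in candidate]
            # Max-Min scores a candidate set by its slowest link; Max-Sum by its total rate.
            score = min(rates) if policy == "max-min" else sum(rates)
            if score > best_score:
                best_set, best_score = candidate, score
        return set(best_set)

    bw = {"p1": 10.0, "p2": 3.0, "p3": 8.0, "p4": 6.0}
    print(choose_neighbors(bw, 2, "max-min"))   # favors the set whose slowest link is fastest
    print(choose_neighbors(bw, 2, "max-sum"))   # favors the largest aggregate rate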
Tossing NoSQL-Databases Out to Public Clouds
2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing, 2014
The Mobile People Architecture
Mobile Computing and Communications Review, 1996
People are the outsiders in the current communications revolution. Computer hosts, pager terminals, and telephones are addressable entities throughout the Internet and telephony systems. Human beings, however, still need application-specific tricks to be identified, like email addresses, telephone numbers, and ICQ IDs. The key challenge today is to find people and communicate with them personally, as opposed to communicating merely