Papers by Matthias Wiesmann
Message from the Symposium Chair
... The PC meeting relied on the assistance of several Ph.D. students, including Vamsi Boppana, Nuno Neves, Kwo-Feng Ssu, and Ching-Han Tsai. ... JB Dugan S. Dutt K. Echtle EN Elnozahy P. Ezhilchelvan J.-C. Fabre B. Fleisch G. Franks R. Friedman WK Fuchs ...
Database replication protocols based on group communication primitives have recently emerged as a promising technology to improve database fault tolerance and performance. Roughly speaking, this approach consists in exploiting the order and atomicity properties provided by group communication primitives or, more specifically, Atomic Broadcast, to guarantee transaction properties. This paper proposes a systematic classification of non-voting database replication algorithms based on Atomic Broadcast.
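As a rough illustration of why ordered, all-or-nothing delivery helps with replica consistency, the sketch below applies the same deterministic transactions to every replica in the same order. It is not one of the algorithms classified in the paper: `Replica`, `atomic_broadcast`, `deposit`, and `apply_interest` are hypothetical names, and the broadcast is a simple in-process loop standing in for a real Atomic Broadcast primitive.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica holding a small key-value database."""
    data: dict = field(default_factory=dict)

    def deliver(self, txn):
        # Each transaction is a deterministic function of the current state.
        txn(self.data)


def atomic_broadcast(replicas, txns):
    """Toy stand-in for Atomic Broadcast (an assumption, not a real protocol):
    every replica receives every transaction, in the same order."""
    for txn in txns:
        for replica in replicas:
            replica.deliver(txn)


def deposit(account, amount):
    def txn(data):
        data[account] = data.get(account, 0) + amount
    return txn


def apply_interest(account, percent):
    def txn(data):
        # Integer arithmetic keeps the result identical on every replica.
        data[account] = data.get(account, 0) * (100 + percent) // 100
    return txn


replicas = [Replica() for _ in range(3)]
atomic_broadcast(
    replicas,
    [deposit("a", 100), apply_interest("a", 10), deposit("a", 5)],
)
states = [replica.data for replica in replicas]
```

Because the transactions do not commute (a deposit followed by interest differs from interest followed by a deposit), the replicas end in the same state only because they all see the same delivery order.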

Group communication provides communication primitives with various semantics, and their use greatly simplifies the development of highly available services. However, despite tremendous advances in research and numerous prototypes, group communication remains confined to small niches and academic prototypes. In contrast, message-oriented middleware such as the Java Message Service (JMS) is widely used and has become a de facto standard. We believe that the lack of a well-defined and easily understandable standard is what hinders the deployment of group communication systems. Since JMS is a well-established technology, an interesting solution is to extend JMS by adding group communication primitives to it. Foremost, this requires extending the traditional semantics of group communication in order to take into account various features of JMS, e.g., durable/nondurable subscriptions and persistent/non-persistent messages. The resulting new group communication specification, together with the corresponding API, defines group communication primitives compatible with JMS.
This report describes the design of a Replication Framework that facilitates the implementation and comparison of database replication techniques. Furthermore, it discusses the implementation of a Database Replication Prototype and compares the performance measurements of two replication techniques based on the Atomic Broadcast communication primitive: pessimistic active replication and optimistic active replication.

IEEE Transactions on Knowledge and Data Engineering, 2005
In this paper, we present a performance comparison of database replication techniques based on total order broadcast. While the performance of total order broadcast-based replication techniques has been studied in previous papers, this paper presents many new contributions. First, it directly compares techniques that were previously presented and evaluated separately, usually against a classical replication scheme such as distributed locking. Second, the evaluation is done using a finer network model than previous studies. Third, the paper compares techniques that offer the same consistency criterion (one-copy serializability) in the same environment using the same settings. The paper shows that, while networking performance has little influence in a LAN setting, the cost of synchronizing replicas is quite high. Because of this, total order broadcast-based techniques are very promising as they minimize synchronization between replicas.

IEEE Transactions on Knowledge and Data Engineering, 2003
Atomic broadcast primitives are often proposed as a mechanism to allow fault-tolerant cooperation between sites in a distributed system. Unfortunately, the delay incurred before a message can be delivered makes it difficult to implement high performance, scalable applications on top of atomic broadcast primitives. Recently, a new approach has been proposed for atomic broadcast which, based on optimistic assumptions about the communication system, reduces the average delay for message delivery to the application. In this paper, we develop this idea further and show how applications can take even more advantage of the optimistic assumption by overlapping the coordination phase of the atomic broadcast algorithm with the processing of delivered messages. In particular, we present a replicated database architecture that employs the new atomic broadcast primitive in such a way that communication and transaction processing are fully overlapped, providing high performance without relaxing transaction correctness. * A preliminary version of this paper appeared in . In this paper we provide a more comprehensive protocol and study the performance through simulation.
... for some time, they are still not used much in actual systems. We believe that one reason for this is the lack of standardisation of group communication system interfaces. The paper proposes an architecture, using the standard decomposition into services, where services are based on standard interfaces: both interactions between services and interactions with the application use existing, open standards. A decomposition of the group communication into services is presented, along with a description of applicable standards. As an example, a group membership service based on the LDAP standard is discussed.
The paper presents a fail-safe mobility management and a collision prevention platform for a group of asynchronous cooperative mobile robots. The fail-safe platform consists of a time-free collision prevention protocol, which guarantees that no collision can occur between robots, independently of timeliness properties of the system, and even in the presence of timing errors in the environment. The collision prevention protocol is based on a distributed path reservation system. Each robot in the system knows the composition of the group, and can communicate with all robots of the group. A performance analysis of the protocol provides insights for a proper dimensioning of system parameters in order to maximize the average effective speed of the robots.
Journal of Networks, 2007
This paper presents a fail-safe platform on which cooperative mobile robots rely for their motion. The platform consists of a collision prevention protocol for a dynamic group of cooperative mobile robots with asynchronous communications. The collision prevention protocol is time-free, in the sense that it never relies on physical time, which makes it extremely robust to the timing uncertainty common in wireless networks. It guarantees that no two robots ever collide, regardless of the respective activities of the robots. The protocol is based on a fully distributed path reservation system.
This paper presents a distributed path reservation system for a group of "blind" mobile robots. The protocol assumes a mobile ad hoc network formed by the robots themselves, and takes advantage of the inherent locality of the problem in order to reduce communication. In contrast with other work, our protocol requires neither initial nor complete knowledge of the composition of the group. The protocol makes only very weak timing assumptions regarding both communication and movement, and relies instead on a well-defined neighborhood discovery primitive.
In this paper, we present an anonymous, stable, communication-efficient, stabilizing leader election algorithm that works using anonymous communication primitives. The algorithm offers properties similar to those of the Ω failure detector, with the added property of totally ordering the sequence of proposed leaders. The algorithm does not need to know beforehand the identity or the number of processes in the system, and operates using a constant amount of memory. We present the algorithm, discuss performance issues and optimizations, and present experimental results of a prototype implementation.
End-to-end consensus ensures delivery of the same value to the application layer running in distributed processes. Deliveries that have not been acknowledged by the application before a failure are delivered again. End-to-end primitives are important for applications that need to enforce persistency. We present an algorithm that solves the end-to-end consensus problem. Our approach is to build end-to-end consensus using a new type of communication channel, end-to-end channels.
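The redelivery rule stated above (a delivery not acknowledged before a failure is delivered again) can be sketched as follows. This is only an illustration of that one rule, not the paper's algorithm or its end-to-end channels: `EndToEndLog` is a hypothetical name, and a Python list stands in for stable storage that would survive a crash.

```python
class EndToEndLog:
    """Hypothetical log of decided values; assumed to survive crashes."""

    def __init__(self):
        self.decided = []  # values in decision order
        self.acked = 0     # number of deliveries the application acknowledged

    def decide(self, value):
        self.decided.append(value)

    def pending(self):
        # Everything not yet acknowledged must be (re)delivered.
        return self.decided[self.acked:]

    def ack(self):
        # Acknowledge the oldest unacknowledged delivery, if there is one.
        if self.acked < len(self.decided):
            self.acked += 1


log = EndToEndLog()
log.decide("v1")
log.decide("v2")

first_attempt = log.pending()   # both values are handed to the application
log.ack()                       # the application finishes processing "v1" ...
# ... and then crashes before acknowledging "v2".
after_recovery = log.pending()  # only the unacknowledged value is redelivered
```

The application may therefore see a value more than once, which is why processing in such a scheme is usually made idempotent.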

In this paper, we present the SNMP-FD service, a novel failure detection service entirely based on the Simple Network Management Protocol (SNMP). This approach promises better interoperability with external tools and failure information sources, including network equipment and cluster management tools. We first show how the SNMP standard can be used to build a failure detection service. We describe the already standardized interfaces that can be reused and introduce the interfaces that need to be added. SNMP is used extensively in the service: for messaging, process status description, configuration, service statistics, and delivering failure detection information to applications. We then present our implementation and an evaluation of performance and quality of service. ... management usage. Application-level failure detection is typically handled using fixed timeouts, if it is handled at all.
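For contrast with a dedicated failure detection service, the fixed-timeout approach mentioned above might look like the following minimal sketch. It does not use SNMP and is not the SNMP-FD implementation: `FixedTimeoutDetector` and its methods are hypothetical, and timestamps are passed in explicitly (rather than read from a clock) to keep the example deterministic.

```python
class FixedTimeoutDetector:
    """Hypothetical heartbeat-based detector with a single fixed timeout."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # process name -> time of last heartbeat

    def heartbeat(self, process, now):
        self.last_seen[process] = now

    def suspected(self, now):
        # A process is suspected once its last heartbeat is older than the timeout.
        return sorted(
            p for p, seen in self.last_seen.items() if now - seen > self.timeout
        )


fd = FixedTimeoutDetector(timeout=5.0)
fd.heartbeat("p1", now=0.0)
fd.heartbeat("p2", now=0.0)
fd.heartbeat("p1", now=4.0)      # p1 keeps sending heartbeats, p2 goes silent
suspects = fd.suspected(now=6.0)
```

A single hard-coded timeout trades detection speed against false suspicions and cannot use failure information from other sources.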