ACM Transactions on Programming Languages and Systems, Sep 1, 1997
Some of the most difficult questions to answer when designing a distributed application are related to mobility: what information to transfer between sites and when and how to transfer it. Network-transparent distribution, the property that a program's behavior is independent of how it is partitioned among sites, does not directly address these questions. Therefore we propose to extend all language entities with a network behavior that enables efficient distributed programming by giving the programmer simple and predictable control over network communication patterns. In particular, we show how to give objects an arbitrary mobility behavior that is independent of the object's definition. In this way, the syntax and semantics of objects are the same regardless of whether they are used as stationary servers, mobile agents, or simply as caches. These ideas have been implemented in Distributed Oz, a concurrent object-oriented language that is state aware and has dataflow synchronization. We prove that the implementation of objects in Distributed Oz is network transparent. To satisfy the predictability condition, the implementation avoids forwarding chains through intermediate sites. The implementation is an extension to the publicly available DFKI Oz 2.0 system.
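To make this concrete, here is a minimal sketch in Oz (assuming the Mozart Connection module for exchanging references between sites; the Counter class and ticket variables are our own illustration, not code from the paper). The class is written exactly as in centralized Oz; its mobility behavior is orthogonal to this definition:

   declare
   class Counter
      attr val
      meth init val := 0 end
      meth inc val := @val + 1 end
      meth get(X) X = @val end
   end
   C = {New Counter init}
   % Site 1 can offer the object and site 2 can take it:
   %    T = {Connection.offer C}     % site 1: returns a ticket
   %    C2 = {Connection.take T}     % site 2: {C2 inc} looks identical
   {C inc}
   local X in {C get(X)} {Browse X} end   % displays 1

Whether C behaves as a mobile (cached) object or a stationary server is decided by its network behavior, not by changing this class.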
This paper presents a practical approach to transforming a traditional mature university course into a MOOC. The approach has been applied to LFSAB1402 Informatics 2, a second-year bachelor-level course on programming paradigms worth 5 ECTS credits, taught at Université catholique de Louvain (UCL) to about 300 engineering students. The transformation was done in three steps spread over two years. A SPOC limited to our students was first created, covering part of the material of the traditional course. It was then opened worldwide as a MOOC. Finally, two MOOCs, followed at the same time by our students and by worldwide learners and covering all the material of the traditional course, were created. In addition to our 300 students, we had about 7000 (resp. 4000) external students for the first (resp. second) MOOC. About 90% of on-site students and about 4% of registered external students obtained a certificate at the end of the course. This gradual transformation of the traditional...
Today's Internet technologies greatly exacerbate threats to the healthy functioning of a democratic society. They enormously amplify the problems of alternative facts, i.e., false information, and echo chambers, i.e., environments where one only hears what one agrees with. The ease and rapidity with which any individual can disseminate their opinions and connect with others have never been greater, given the enormous success of social networks such as Facebook, micro-messaging tools such as Twitter, and blogging software that allows anyone to publish anything. In the 1970s, the Internet was conceived by a community of idealists who believed that an open and trusting environment would be to the benefit of everyone. Now we realize that this ideal was not realistic: today many groups use the Internet as a terrain on which to battle ruthlessly to advance their own goals.
Electronic Proceedings in Theoretical Computer Science
Designing distributed systems to have predictable performance under high load is difficult because of resource exhaustion, non-linearity, and stochastic behaviour. Timeliness, i.e., delivering results within defined time bounds, is a central aspect of predictable performance. In this paper, we focus on timeliness using the ∆Q Systems Development paradigm (∆QSD, developed by PNSol), which computes timeliness by modelling systems observationally using so-called outcome expressions. An outcome expression is a compositional definition of a system's observed behaviour in terms of its basic operations. Given the behaviour of the basic operations, ∆QSD efficiently computes the stochastic behaviour of the whole system, including its timeliness. This paper formally proves useful algebraic properties of outcome expressions w.r.t. timeliness. We prove which algebraic structures the set of outcome expressions forms under the different ∆QSD operators and demonstrate why those operators do not form richer structures. We prove or disprove the set of all possible distributivity results on outcome expressions. On the way to disproving 8 of those distributivity results, we develop a technique called properisation, which gives rise to the first body of mathematics for improper random variables. Finally, we also prove 14 equivalences that have been used in the past in the practice of ∆QSD. An immediate benefit is rewrite rules that can be used for design exploration under established timeliness equivalences. This work is part of an ongoing project to disseminate and build tool support for ∆QSD. The ability to rewrite outcome expressions is essential for efficient tool support.
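For concreteness, the basic ∆QSD operators compose timeliness as follows (a sketch using the standard ∆QSD definitions; here ∆Q_A(t) is the possibly improper CDF of the delay of outcome A, with the deficit 1 - ∆Q_A(∞) giving A's failure probability, and A and B assumed independent):

   \Delta Q_{A ; B}(t) = (\Delta Q_A * \Delta Q_B)(t)                                          (sequential composition: convolution)
   \Delta Q_{A \wedge B}(t) = \Delta Q_A(t) \cdot \Delta Q_B(t)                                (all-to-finish)
   \Delta Q_{A \vee B}(t) = \Delta Q_A(t) + \Delta Q_B(t) - \Delta Q_A(t) \cdot \Delta Q_B(t)  (first-to-finish)
   \Delta Q_{A \oplus_p B}(t) = p \, \Delta Q_A(t) + (1-p) \, \Delta Q_B(t)                    (probabilistic choice)

The distributivity questions studied in the paper ask, for instance, whether sequential composition distributes over probabilistic choice; each proved equivalence is a rewrite rule between such expressions.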
As Internet applications become larger and more complex, the task of managing them becomes overwhelming. "Abnormal" events such as software updates, failures, attacks, and hotspots become frequent. The SELFMAN project is tackling this problem by combining two technologies, namely structured overlay networks and advanced component models, to make the system self-managing. Structured overlay networks (SONs) developed out of peer-to-peer systems and provide robustness, scalability, communication guarantees, and efficiency. Component models provide the framework to extend the self-managing properties of SONs over the whole system. SELFMAN is building a self-managing transactional storage and using it for two application demonstrators: a distributed Wiki and an on-demand media streaming service. This paper provides an introduction to and motivation for the ideas underlying SELFMAN and a snapshot of its contributions midway through the project. We explain our methodology for building self-managing systems as networks of interacting feedback loops. We then summarize the work we have done to make SONs a practical basis for our architecture: using an advanced component model, handling network partitions, handling failure suspicions, and doing range queries with load balancing. Finally, we show the design of a self-managing transactional storage on a SON.
Theory and Practice of Logic Programming, Nov 1, 2003
Oz is a multiparadigm language that supports logic programming as one of its major paradigms. A multiparadigm language is designed to support different programming paradigms (logic, functional, constraint, object-oriented, sequential, concurrent, etc.) with equal ease. This article has two goals: to give a tutorial of logic programming in Oz and to show how logic programming fits naturally into the wider context of multiparadigm programming. Our experience shows that there are two classes of problems, which we call algorithmic and search problems, for which logic programming can help formulate practical solutions. Algorithmic problems have known efficient algorithms. Search problems do not have known efficient algorithms but can be solved with search. The Oz support for logic programming targets these two problem classes specifically, using the concepts needed for each. This is in contrast to the Prolog approach, which targets both classes with one set of concepts, resulting in less than optimal support for each class. We give examples that can be run interactively on the Mozart system, which implements Oz. To explain the essential difference between algorithmic and search programs, we define the Oz execution model. This model subsumes both concurrent logic programming (committed-choice-style) and search-based logic programming (Prolog-style). Furthermore, as consequences of its multiparadigm nature, the model supports new abilities such as first-class top levels, deep guards, active objects, and sophisticated control of the search process. Instead of Horn clause syntax, Oz has a simple, fully compositional, higher-order syntax that accommodates the abilities of the language. We give a brief history of Oz that traces the development of its main ideas and we summarize the lessons learned from this work. Finally, we give many entry points into the Oz literature. (This article is a much-extended version of the tutorial talk "Logic Programming in Oz with Mozart" given at the International Conference on Logic Programming, Las Cruces, New Mexico, Nov. 1999. Some knowledge of traditional logic programming, with Prolog or concurrent logic languages, is assumed.)
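To give a flavor of the two problem classes, here is a small sketch that should run on the Mozart system (assuming its standard Browse tool and the Search.base.all search engine):

   % Algorithmic: deterministic list append using dataflow variables.
   declare
   proc {Append Xs Ys Zs}
      case Xs
      of nil then Zs = Ys
      [] X|Xr then Zr in
         Zs = X|Zr
         {Append Xr Ys Zr}
      end
   end
   {Browse {Append [1 2] [3] $}}      % displays [1 2 3]

   % Search: a relational program with a choice point, run under
   % encapsulated search rather than a Prolog-style global top level.
   declare
   proc {Digit X}
      choice X = 1 [] X = 2 [] X = 3 end
   end
   {Browse {Search.base.all Digit}}   % displays [1 2 3]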
2017 IEEE 16th International Symposium on Network Computing and Applications (NCA)
State-machine replication (SMR) is a fundamental technique for implementing fault-tolerant services. Recently, various works have aimed at enhancing the scalability of SMR by exploiting partial replication techniques. By sharding the state machine across disjoint partitions, and replicating each partition over independent groups of processes, a Partially Replicated State Machine (PRSM) can process operations that involve a single partition by only requiring synchronization among the replicas of that partition, achieving higher scalability than SMR. Unfortunately, though, existing PRSM systems rely on inefficient mechanisms to coordinate the execution of multi-partition operations, which either impose global coordination across all nodes in the system or require inter-partition synchronization on the critical path of execution of operations. As such, the performance and scalability of existing PRSM systems are severely hindered in the presence of even a small fraction of multi-partition operations. This paper tackles this issue by presenting Genepi, a PRSM protocol that introduces a novel, highly efficient mechanism for regulating the execution of multi-partition operations. We show via an experimental evaluation based on both synthetic benchmarks and TPC-C that Genepi can achieve throughput gains of up to 5.5× over existing PRSM systems, with only negligible latency overhead at low load.
Proceedings of the 2017 Symposium on Cloud Computing
Online services are often deployed over geographically scattered data centers (geo-replication), which allows services to be highly available and reduces access latency. On the downside, to provide ACID transactions, global certification (i.e., across data centers) is needed to detect conflicts between concurrent transactions executing at different data centers. The global certification phase reduces throughput because transactions need to hold pre-commit locks, and it increases client-perceived latency because global certification lies on the critical path of transaction execution.
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018
This work presents Speculative Transaction Replication (STR), a protocol that exploits transparent speculation techniques to enhance the performance of geo-distributed, partially replicated transactional data stores. In addition, we define a new consistency model, Speculative Snapshot Isolation (SPSI), that extends the semantics of Snapshot Isolation (SI) to shelter applications from the subtle anomalies that can arise from using speculative transaction processing. SPSI extends SI in an intuitive and rigorous fashion by specifying desirable atomicity and isolation guarantees that must hold when using speculative execution. STR provides a form of speculation that is fully transparent to programmers (it does not expose the effects of misspeculations to clients). Since the speculation techniques employed by STR satisfy SPSI, they can be leveraged by application programs in a transparent way, without requiring any source-code modification to applications designed to operate using SI. STR combines two key techniques: speculative reads, which allow transactions to observe pre-committed versions, reducing the 'effective duration' of pre-commit locks and enhancing throughput; and Precise Clocks, a novel timestamping mechanism that uses per-item timestamps with physical clocks, which together greatly enhance the probability of successful speculation. We assess STR's performance on up to nine geo-distributed Amazon EC2 data centers, using both synthetic benchmarks and realistic benchmarks (TPC-C and RUBiS). Our evaluation shows that STR achieves throughput gains of up to 11× and latency reductions of up to 10× in workloads characterized by low inter-data center contention. Furthermore, thanks to a self-tuning mechanism that dynamically and transparently enables and disables speculation, STR offers robust performance even when faced with unfavourable workloads that suffer from high misspeculation rates.
This work presents STR, a geo-distributed, partially replicated transactional data store that leverages novel speculative techniques to mask the inter-replica synchronization latency. The theoretical foundation on top of which we built STR is a novel consistency criterion, which we call SPeculative Snapshot Isolation (SPSI). SPSI extends the well-known Snapshot Isolation semantics in an intuitive yet rigorous way, by specifying desirable atomicity and isolation guarantees that shelter applications from subtle anomalies that can arise when adopting speculative transaction processing techniques. We assess STR's performance on up to nine geo-distributed Amazon EC2 data centers, using both synthetic benchmarks and complex benchmarks (TPC-C and RUBiS). Our experimental study highlights that STR achieves throughput gains of up to 6× and latency reductions of up to 100× in workloads characterized by low inter-data center contention. Furthermore, thanks to self-tuning techniques ...
We present an operational semantics as a simplified model for edge computing: nodes can leave and join at any time; each node performs computations independently, but can also wait for values to arrive from peers; nodes can multicast to one another; and, most importantly, values can only grow monotonically. We prove an eventual consistency result for our operational semantics. (Presented at the 28th Nordic Workshop on Programming Theory, Rold StorKro, North Jutland, Denmark, 2016.)
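To sketch what monotonic growth buys (our notation, not necessarily the paper's): if node states live in a join-semilattice (S, ⊔, ⊑) and receiving a multicast value v updates a state s to s ⊔ v, then a state can only grow, and because the join is commutative, associative, and idempotent, neither the order nor the duplication of deliveries matters:

   s \sqsubseteq s \sqcup v, \qquad (s \sqcup v_1) \sqcup v_2 = (s \sqcup v_2) \sqcup v_1, \qquad (s \sqcup v) \sqcup v = s \sqcup v

Hence every node that eventually receives the same set V of values ends in the same state \bigsqcup_{v \in V} v, which is the shape of the eventual consistency result.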
2017 IEEE International Conference on Edge Computing (EDGE), 2017
Distributed applications will break down or perform poorly when there are too many failures (of nodes and/or communication) in the operating environment. Failures happen frequently on edge networks, including mobile and ad hoc networks, but are also unexpectedly common on the general Internet. We propose an approach for designing stress-aware distributed applications that can take environment stress into account to improve their behavior. We give a concrete illustration of the approach by targeting applications built on top of a Structured Overlay Network (SON). Our underlying SON is Reversible and Phase-Aware. A system is Reversible if the set of operations it provides is a function (called the reversibility function) of its current stress (i.e., all perturbing effects of the environment, including faults), and does not depend on past stress. Reversibility generalizes standard fault tolerance with nested fault models: when the fault rate goes outside the scope of one model, it is still inside the next one. In order to approximate the reversibility function we introduce the concept of Phase, which is a per-node property that gives a qualitative measure of the available system operations under the current stress. Phase can be determined with no extra distributed operations. We show that making the phase available to applications allows them to improve their behavior in environments with high and variable stress. We propose a Phase API and we design an application, a collaborative graphic editor, that takes advantage of phase to enhance its self-adaptation and self-optimization properties. Furthermore, we analyze how the application itself can achieve reversibility in the application-level semantics. Using the phase of the underlying node, the application provides an indication to the user regarding its behavior. Thus the application has improved behavior with respect to the user, i.e., the user can better understand and decide what to do in a high-stress environment.
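As a hypothetical sketch (in Oz; the paper's actual Phase API and phase names may differ), an application such as the collaborative editor can map the local node's phase to the operations it still offers, warning the user when stress is high:

   declare
   fun {AvailableOps Phase}
      case Phase
      of low    then [edit share sync]   % low stress: full service
      [] medium then [edit sync]         % degraded: no new shared documents
      [] high   then [edit]              % local edits only; warn the user
      end
   end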
Proceedings of the 19th International Symposium on Principles and Practice of Declarative Programming, 2017
Programming models for building large-scale distributed applications assist the developer in reasoning about consistency and distribution. However, many of the programming models for weak consistency, which promise the largest scalability gains, have little in the way of evaluation to demonstrate the promised scalability. We present an experience report on the implementation and large-scale evaluation of one of these models, Lasp, originally presented at PPDP '15, which provides a declarative, functional programming style for distributed applications. We demonstrate the scalability of Lasp's prototype runtime implementation up to 1024 nodes in the Amazon cloud computing environment. It achieves high scalability by uniquely combining hybrid gossip with a programming model based on convergent computation. We report on the engineering challenges of this implementation and its evaluation, specifically related to operating research prototypes in a production cloud environment.
Proceedings of the ACM on Programming Languages, 2020
Oz is a programming language designed to support multiple programming paradigms in a clean, factored way that is easy to program despite its broad coverage. It started in 1991 as a collaborative effort by the DFKI (Germany) and SICS (Sweden) and led to an influential system, Mozart, that was released in 1999 and widely used in the 2000s for practical applications and education. We give the history of Oz as it developed from its origins in logic programming, starting with Prolog, followed by concurrent logic programming and constraint logic programming, and leading to its two direct precursors, the concurrent constraint model and the Andorra Kernel Language (AKL). We give the lessons learned from the Oz effort, including successes and failures, and we explain the principles underlying the Oz design. Oz is defined through a kernel language, which is a formal model similar to a foundational calculus, but that is designed to be directly useful to the programmer. The kernel language is organized...
A cyber-physical system (CPS) is a smart mechanical environment, developed by an amalgamation of computation, networking, and physical dimensions. Each CPS consists of a network of devices, often limited in computing, storage, or bandwidth resources. Moreover, the frequent small-scale communications between the various counterparts of a CPS require the data and computation of the CPS to be deployed close to each other, with the ability to support micro-executions. Due to these operational requirements, CPS faces several inherent challenges uncommon in traditional computational environments. In this paper, we describe software-defined cyber-physical systems (SD-CPS), a CPS framework built by extending and adapting the design principles of software-defined networking (SDN) to CPS. We realize the support for CPS operation as a workflow of microservices, possibly in continuous or cyclic execution. SD-CPS coordinates each CPS execution step, performed by a microservice, through an extended SDN controller architecture. By creating, placing, deploying, migrating, and managing the computation processes of CPS as service workflows at the edge, SD-CPS orchestrates the entire lifecycle of the CPS effectively and efficiently. SD-CPS thus addresses the general challenges of CPS, concerning modeling, development, performance, management, communication and coordination, scalability, and fault tolerance, through its software-defined approach. Our evaluations highlight the efficiency of the SD-CPS framework and the scalability of its SDN controller in managing complex CPS environments. Keywords: Cyber-physical system (CPS) · Software-defined networking (SDN) · Message-oriented middleware (MOM) · Software-defined systems (SDS)
2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), 2017
We provide a lightweight decentralized publish-subscribe framework for supporting large-scale actor communication on edge networks. Our framework, called Loquat, does not depend on any reliable central nodes (e.g., data centers), provides reliability in the face of massive node failures and network partitioning, and provides scalability as the number of nodes increases. We consider high reliability, i.e., that send operations reach close to 100% of live destination nodes, to be a critical property for communication frameworks on edge networks. But reliability is difficult to achieve in a scalable way on edge networks because of the network's dynamicity, i.e., frequent node failures and partitioning. For example, both Internet of Things networks and mobile phone networks consist of devices that are often offline. To achieve reliability, our framework is based on two hybrid gossip algorithms, namely HyParView and Plumtree. Hybrid gossip algorithms combine gossip with other distributed algorithms to achieve both efficiency and high resilience. Our current implementation is written in Erlang and has demonstrated scalability up to 1024 nodes in Amazon's cloud computing environment.
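As a toy illustration of the broadcast layer (an Oz sketch of naive flooding with deduplication, not the Erlang implementation; MakeNode, GetPeers, and Deliver are hypothetical names): Plumtree starts from exactly this kind of eager push and prunes duplicate-delivery links into a spanning tree, while HyParView maintains the partial view that GetPeers would return here.

   declare
   proc {MakeNode GetPeers Deliver P}
      S
      Seen = {NewCell nil}   % ids of messages already delivered
   in
      {NewPort S P}
      thread
         for M in S do
            case M of msg(Id Data) then
               if {Member Id @Seen} then skip   % duplicate: drop it
               else
                  Seen := Id|@Seen
                  {Deliver Data}                          % application callback
                  for Q in {GetPeers} do {Send Q M} end   % eager push to peers
               end
            else skip end
         end
      end
   end

Deduplication by message id guarantees that flooding terminates: each node forwards a given id at most once.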