Papers by Kees Van Reeuwijk
Performance of communication patterns on distributed memory systems
EUROSIM, 1996
We describe Ruler, a flexible language for network traffic inspection and rewriting. Ruler was designed to support anonymisation at high link rates. As anonymisation requires a trade-off between privacy and usefulness of the anonymised data, flexibility is essential. For this purpose, Ruler allows matching of arbitrary traffic patterns by means of regular expressions, and construction of arbitrary output packets from fragments of the input packets. However, we show that Ruler
Parallel Computing, 2003
We describe a set of language extensions to Java to support parallel programming with distribution annotations. The system provides an integrated system of placement annotations on both code and data. This allows the programmer to freely mix data-parallel programming similar to HPF, and task-parallel programming similar to OpenMP. To evaluate the effectiveness of our parallel programming model, we did
MXIbis: Ibis-based Communication over Myrinet Express
The Ibis project aims at creating an efficient Java-based software platform for distributed computing. Within this project, the Ibis Portability Layer offers a platform-independent communication interface for distributed applications. Currently, Java (and thus Ibis) only supports communication using the widely available TCP and UDP network protocols. As a consequence, Ibis-based communication over specialized high-performance networks is sub-optimal.
Code generation techniques for the task-parallel programming language Spar
In this paper we describe a compilation scheme to translate implicitly parallel programs in the programming language Spar (an extension to Java) to efficient code for distributed-memory parallel computer systems. The compilation scheme is formulated as a set of transformation rules. In Spar, the language constructs for parallelization have been designed for comfortable use by the programmer, not
CardGuard is a signature detection system for intrusion prevention that scans the entire payload of packets for suspicious patterns and is implemented in software on a network card. The hardware that is used on the card consists of an Intel IXP and various memories. One card can be used to protect either a single host,
In parallel programming, the nature of the distribution of the data over the processors, and the assignment of work to the processors in the system, strongly influence the performance of the program.
Modern Compiler Design (2nd edition)
The second, highly reorganised edition of a popular textbook that explains the basics of compiler construction. It covers parsing, interpretation, code generation, optimisation, linking, run-time systems, and the compilation of various programming paradigms, such as imperative, object-oriented, functional, and logic programming.
Implementing HPF distributed arrays on a message-passing parallel computer
Lecture Notes in Computer Science, 2001
Fortran is still a very dominant language for scientific computation. However, it lacks features of modern programming languages, such as strong typing and object orientation. Therefore, there is increasing interest among scientists in object-oriented languages like Java. In this paper, we discuss a number of prospects and problems in Java for scientific computation.
Parallel and Distributed Programs
Modern Compiler Design, 2012
Code Generation
Modern Compiler Design, 2012
ENSEMBLE
Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems - OM '01, 2001
ENSEMBLE
ACM SIGPLAN Notices, 2001
The ENSEMBLE communication library exploits overlapping of message aggregation (computation) and DMA transfers (communication) for embedded multi-processor systems. In contrast to traditional communication libraries, ENSEMBLE operates on n-dimensional data descriptors that can be used to specify often-occurring data access patterns in n-dimensional arrays. This allows ENSEMBLE to set up a three-stage pack-transfer-unpack pipeline, effectively overlapping message aggregation and DMA transfers
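A minimal sketch of the descriptor idea, assuming a hypothetical descriptor made of a start offset plus one (count, stride) pair per dimension over a flat buffer. The names `descriptor_indices`, `pack` and `unpack` are illustrative only and are not ENSEMBLE's API. The sketch covers just the pack and unpack stages; the DMA transfer stage and the overlap between stages are not modelled.

```python
from itertools import product
from typing import List, Sequence


def descriptor_indices(offset: int, counts: Sequence[int], strides: Sequence[int]):
    """Yield the flat-buffer positions covered by a (hypothetical) n-dimensional descriptor.

    Element (i_0, ..., i_{n-1}) lives at offset + sum(i_k * strides[k]);
    itertools.product makes the last dimension vary fastest.
    """
    for idx in product(*(range(c) for c in counts)):
        yield offset + sum(i * s for i, s in zip(idx, strides))


def pack(buffer: Sequence[int], offset: int, counts: Sequence[int],
         strides: Sequence[int]) -> List[int]:
    """Pack stage: gather the described elements into one contiguous message."""
    return [buffer[j] for j in descriptor_indices(offset, counts, strides)]


def unpack(message: Sequence[int], buffer: List[int], offset: int,
           counts: Sequence[int], strides: Sequence[int]) -> None:
    """Unpack stage: scatter a contiguous message back into the described positions."""
    for value, j in zip(message, descriptor_indices(offset, counts, strides)):
        buffer[j] = value
```

For a 4 x 6 row-major array stored flat, column 2 is described by offset 2, counts [4], strides [6], and the 2 x 2 block starting at row 1, column 1 by offset 7, counts [2, 2], strides [6, 1]. Because one descriptor covers a whole strided pattern, a library can compute the full message layout up front, which is what makes a pipelined pack-transfer-unpack scheme possible.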
Proceedings 11th International Parallel Processing Symposium, 1997
In this paper we present a generalized forall statement for parallel languages. The forall statement occurs in many (data) parallel languages and specifies which computations can be performed independently. Many different definitions of such a construct can be found in the literature, with different conditions and execution models. We show how the forall constructs of a wide class of parallel languages can be mapped to this generalized forall statement. In addition, the forall statement we propose can spawn more complex independent activities than those found in these languages.
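To make the independence property concrete, the sketch below contrasts the conventional single-assignment FORALL semantics of HPF and Fortran 95, where every right-hand side is evaluated before any element is assigned, with an ordinary sequential loop. It illustrates the classic construct only, not the generalized forall proposed in the paper; `forall_assign` and `sequential_assign` are hypothetical helper names.

```python
from typing import Callable, Iterable, List

Rhs = Callable[[List[int], int], int]


def forall_assign(a: List[int], indices: Iterable[int], rhs: Rhs) -> List[int]:
    """FORALL-style assignment: evaluate rhs for every index, then assign."""
    # All reads happen before any write, so iterations cannot observe each
    # other's results and could be executed in any order or in parallel.
    values = [(i, rhs(a, i)) for i in indices]
    for i, v in values:
        a[i] = v
    return a


def sequential_assign(a: List[int], indices: Iterable[int], rhs: Rhs) -> List[int]:
    """Ordinary loop: each iteration sees the writes of earlier iterations."""
    for i in indices:
        a[i] = rhs(a, i)
    return a
```

With `a = [1, 2, 3, 4]` and the shift `a[i] = a[i - 1]` for `i` in 1..3, `forall_assign` yields `[1, 1, 2, 3]`, whereas the sequential loop propagates the first value and yields `[1, 1, 1, 1]`.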
Lecture Notes in Computer Science, 1997
The forall statement is an important language construct in many (data) parallel languages [1], [2], [3], [6], [8], [9]. It indicates to the compiler which computations can be performed independently.

Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems - ANCS '07, 2007
Programming specialized network processors (NPUs) is inherently difficult. Unlike mainstream processors, where architectural features such as out-of-order execution and caches hide most of the complexities of efficient program execution, programmers of NPUs face a 'bare-metal' view of the architecture. They have to deal with a multithreaded environment with a high degree of parallelism, pipelining and multiple, heterogeneous, execution units and memory banks. Software development on such architectures is expensive. Moreover, different NPUs, even within the same family, differ considerably in their architecture, making portability of the software a major concern. At the same time, expensive network processing applications based on deep packet inspection are both increasingly important and increasingly difficult to realize due to high link rates. They could potentially benefit greatly from the hardware features offered by NPUs, provided they were easy to use. We therefore propose to use more abstract programming models that hide much of the complexity of 'bare-metal' architectures from the programmer. In this paper, we present one such programming model: Ruler, a flexible high-level language for deep packet inspection (DPI) and packet rewriting that is easy to learn, platform-independent, and lets the programmer concentrate on the functionality of the application. Ruler provides packet matching and rewriting based on regular expressions. We describe our implementation on the Intel IXP2xxx NPU and show how it provides versatile packet processing at gigabit line rates.
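Ruler's own rule syntax is not reproduced here. As a rough illustration of the match-then-rewrite model (a regular expression over packet bytes, with the output assembled from matched fragments), the sketch below uses Python's `re` on `bytes` with a hypothetical anonymisation rule: for an IPv4 header without options, it zeroes the last octet of the source address. It assumes the packet starts at the IP header and deliberately does not recompute the IPv4 header checksum, which a real rewriter would have to do.

```python
import re
from typing import Optional

# Hypothetical rule (not Ruler syntax): an IPv4 header without options
# (first byte 0x45), bytes 1-11, the first three source-address octets,
# the last source-address octet (byte 15, discarded), and everything after it.
RULE = re.compile(rb"(\x45.{11})(.{3}).(.*)", re.DOTALL)


def anonymise(packet: bytes) -> Optional[bytes]:
    """Rewrite a matching packet; return None if the rule does not match."""
    m = RULE.fullmatch(packet)
    if m is None:
        return None  # a real system might drop or forward such packets unchanged
    head, src_prefix, rest = m.groups()
    # Build the output packet from fragments of the input packet, substituting
    # a constant for the anonymised octet. NOTE: the header checksum is left stale.
    return head + src_prefix + b"\x00" + rest
```

The regular expression does all the parsing, and the rewrite step only concatenates captured fragments with new bytes, which mirrors the "construction of output packets from fragments of the input packets" described in the abstracts.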

Lecture Notes in Computer Science, 2006
Current intrusion detection systems have a narrow scope. They target flow aggregates, reconstructed TCP streams, individual packets or application-level data fields, but no existing solution is capable of handling all of the above. Moreover, most systems that perform payload inspection on entire TCP streams are unable to handle gigabit link rates. We argue that network-based intrusion detection systems should consider all levels of abstraction in communication (packets, streams, layer-7 data units, and aggregates) if they are to handle gigabit link rates in the face of complex application-level attacks such as those that use evasion techniques or polymorphism. For this purpose, we developed a framework for network-based intrusion prevention at the network edge that is able to cope with all levels of abstraction and can be easily extended with new techniques. We validate our approach by making available a practical system, SafeCard, capable of reconstructing and scanning TCP streams at gigabit rates while preventing polymorphic buffer-overflow attacks, using (up to) layer-7 checks. Such performance makes it applicable in-line as an intrusion prevention system. SafeCard merges multiple solutions, some new and some known. We made specific contributions in the implementation of deep-packet inspection at high speeds and in detecting and filtering polymorphic buffer overflows.

Parallel Computing, 1998
In this paper, we analyze the properties and efficiency of three basic local enumeration and three storage compression schemes for cyclic(m) data distributions in High Performance Fortran (HPF). The methods are presented in a unified framework, showing the relations between the various methods. We show that for array accesses that are affine functions of the loop bounds, efficient local enumeration and storage compression schemes can be derived. Furthermore, the basic set enumeration and storage techniques are shown to be orthogonal, if the local storage compression scheme is collapsible. This allows choosing the most appropriate method in parts of the computation and communication phases of parallel loops. Performance figures of the methods show that programs with cyclic(m) data distributions can be executed efficiently even without compile-time knowledge of the relevant access, alignment, and distribution parameters.
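For reference, the standard cyclic(m) mapping with P processors and 0-based indices places global index i on processor (i div m) mod P, at local offset (i div (m*P))*m + (i mod m) in that processor's compressed storage. The sketch below encodes this mapping and its inverse, plus a deliberately naive enumeration of a strided access. The paper's point is that such enumeration can be done efficiently without scanning every index, which this sketch does not attempt; all function names are hypothetical.

```python
from typing import List


def owner(i: int, m: int, P: int) -> int:
    """Processor owning global index i under cyclic(m) over P processors."""
    return (i // m) % P


def local_index(i: int, m: int, P: int) -> int:
    """Offset of global index i in its owner's compressed local storage."""
    course = i // (m * P)         # which round of m*P consecutive elements
    return course * m + i % m     # each processor stores m elements per round


def global_index(p: int, l: int, m: int, P: int) -> int:
    """Inverse mapping: local offset l on processor p back to a global index."""
    course, offset = divmod(l, m)
    return course * m * P + p * m + offset


def naive_local_accesses(p: int, lo: int, hi: int, stride: int,
                         m: int, P: int) -> List[int]:
    """Global indices lo, lo+stride, ... (< hi) owned by processor p.

    Naive baseline: tests every accessed index instead of computing
    the local pattern directly.
    """
    return [i for i in range(lo, hi, stride) if owner(i, m, P) == p]
```

With m = 2 and P = 3, processor 0 owns global indices 0, 1, 6, 7, ... at local offsets 0, 1, 2, 3, and `naive_local_accesses(0, 0, 12, 3, 2, 3)` returns `[0, 6]`.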

IXA Education Summit, …, 2005
CardGuard is a signature detection system for intrusion prevention that scans the entire payload of packets for suspicious patterns and is implemented in software on a network card. The hardware that is used on the card consists of an Intel IXP and various memories. One card can be used to protect either a single host, or a small group of machines connected to a switch. CardGuard is non-intrusive in the sense that no cycles of the host CPUs are used for signature detection and the system still operates at realistic link rates. It currently employs a parallelised version of an efficient string matching algorithm at the lowest level of the processing hierarchy. It is used for detecting the signatures corresponding to intrusion attempts in the packets' payloads. A new version supporting an advanced regular expression algorithm in the microengines of an Intel IXP network processor is under development. For TCP flows, CardGuard first reconstructs the TCP byte stream before applying the pattern matching engine. The system exploits the memory hierarchy of the network card by storing frequently needed data in fast on-chip memory, while data that is rarely accessed is kept in slower off-chip memory.
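The abstract does not name the string matching algorithm CardGuard uses. As a stand-in, the sketch below uses Aho-Corasick, a standard multi-pattern matcher; this is an assumption for illustration, not a description of CardGuard. It also shows why reassembling the TCP byte stream matters: the matcher keeps its state across chunks, so a signature split over two segments is still found.

```python
from collections import deque
from typing import Dict, Iterable, Iterator, List, Tuple


class AhoCorasick:
    """Multi-pattern byte-string matcher (stand-in, not CardGuard's engine)."""

    def __init__(self, patterns: Iterable[bytes]) -> None:
        self.goto: List[Dict[int, int]] = [{}]  # trie edges: state -> {byte: state}
        self.fail: List[int] = [0]              # failure links
        self.out: List[List[bytes]] = [[]]      # patterns recognised at each state
        for pat in patterns:
            s = 0
            for b in pat:
                if b not in self.goto[s]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[s][b] = len(self.goto) - 1
                s = self.goto[s][b]
            self.out[s].append(pat)
        # Breadth-first pass: depth-1 states keep fail = 0 (the root);
        # deeper states fall back to the longest proper suffix in the trie.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for b, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and b not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(b, 0)
                # Inherit matches that end at the suffix state as well.
                self.out[t] = self.out[t] + self.out[self.fail[t]]

    def scan(self, chunks: Iterable[bytes]) -> Iterator[Tuple[int, bytes]]:
        """Scan a byte stream delivered in chunks; yield (end_offset, pattern).

        The automaton state survives chunk boundaries, so matches spanning
        two segments of a reassembled stream are reported.
        """
        s, pos = 0, 0
        for chunk in chunks:
            for b in chunk:
                while s and b not in self.goto[s]:
                    s = self.fail[s]
                s = self.goto[s].get(b, 0)
                pos += 1
                for pat in self.out[s]:
                    yield pos, pat
```

For example, scanning the chunks `b"us"` and `b"hers"` with the patterns `he`, `she`, `his` and `hers` reports `she` and `he` ending at offset 4 and `hers` ending at offset 6, exactly as for the unsplit stream `b"ushers"`. Scanning packets independently, without carrying the state over, would miss such split signatures.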