Papers by Pascal Wolkotte
In this article we present the results of partitioning the OFDM baseband processing of a DRM receiver into smaller independent processes. Furthermore, we give a short introduction to the relevant parts of the DRM standard. Based on the number of multiplications and additions, we can map individual processes onto a heterogeneous multi-tile architecture. This architecture can meet both the computational demands and the restricted energy budget.
Energy model of network-on-chip and a bus

IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures (ISVLSI'06)
This paper presents an on-chip network for a run-time reconfigurable System-on-Chip. The network uses packet switching with virtual channels. It can provide guaranteed services as well as best-effort services. The guaranteed services are based on virtual channel allocation, in contrast to other on-chip networks where guarantees are provided by time-division multiplexing. The network is particularly suitable for systems in which the traffic is dominated by streams. We model the data traffic in the system and simulate the behaviour of the network with this model. The results show that the network is capable of handling the system traffic and can provide the required guarantees.
Advances in silicon technology bring, among others, two problems that chip designers have to face: high design complexity and signal integrity [1], [2]. The first problem is that the complexity of a system that fits on a single chip is getting so high that the time needed to design a completely new system with current design methods and tools is becoming impractically long. For that reason it is foreseen that future Systems-on-Chip (SoCs) will be based mostly on pre-designed IP blocks, relying on extensive IP reuse. To be practical, such a design methodology needs to be complemented with a unified and simple solution for interconnecting and integrating IP blocks in a system. Currently, on-chip buses offer such a solution, but since the bus bandwidth does not scale with the number of IP cores on the chip, the bus will soon become a system bottleneck. The second problem, signal integrity, arises because with technology scaling transistors get smaller and faster while wires get thinner and slower. Wire delay becomes proportional to the wire length, and a few long wires on a chip can degrade the performance of the entire chip. Thus, the on-chip interconnects become a limiting factor for SoC performance and their physical parameters
* This research is supported by the research program of the Dutch organisation for Scientific Research NWO (project number 612.064.103) and the EU-FP6 project 4S (IST 001908

International Conference on Field Programmable Logic and Applications, 2005.
A Network-on-Chip (NoC) is an energy-efficient on-chip communication architecture for Multi-Processor System-on-Chip (MPSoC) architectures. In an earlier paper we proposed an energy-efficient reconfigurable circuit-switched NoC to reduce the energy consumption compared to a packet-switched NoC. In this paper we investigate a chordal slotted ring and a bus architecture that can be used to handle the best-effort traffic in the system and to configure the circuit-switched network. Both architectures are compared on their latency behavior and power consumption. At the same clock frequency, the chordal ring has the major benefit of a lower latency and higher throughput, whereas the bus has a lower overall power consumption. However, if we tune the frequency of each network to meet the throughput requirements of the control network, the ring consumes less energy per transported bit.

Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006
Digital Down Conversion (DDC) is an algorithm used to lower the number of samples per second by selecting a limited frequency band out of a stream of samples. A possible DDC algorithm consists of two simple Cascaded Integrator-Comb (CIC) filters and a Finite Impulse Response (FIR) filter, preceded by a modulator that is controlled by a Numerically Controlled Oscillator (NCO). Implementations of the algorithm have been made for five architectures: two Application Specific Integrated Circuits (ASICs), a General Purpose Processor (GPP), a Field Programmable Gate Array (FPGA), and the Montium Tile Processor (TP). All architectures are functionally capable of performing the algorithm. The differences between the architectures are their performance, flexibility and energy consumption. In this paper we compare the energy consumption of the architectures when performing the DDC algorithm. The ASIC is the best solution if digital down conversion is constantly required. When digital down conversion is needed only part of the time, the Altera Cyclone II is the best solution due to its smaller technology size. In the spare time, the reconfigurable architectures can be reconfigured for other tasks of today's multimedia devices.
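The DDC chain described in this abstract (NCO-controlled modulator, two CIC filters, then an FIR filter) can be illustrated with a minimal floating-point sketch. It is not the implementation from the paper: the function names, the single-stage CIC structure, the decimation rates `r1` and `r2`, and the FIR taps are all assumptions chosen for illustration.

```python
import math


def nco_mix(samples, f_center, f_sample):
    """Modulator: multiply by a complex oscillator so f_center moves to 0 Hz."""
    mixed = []
    for n, x in enumerate(samples):
        phase = -2.0 * math.pi * f_center * n / f_sample
        mixed.append(x * complex(math.cos(phase), math.sin(phase)))
    return mixed


def cic_decimate(samples, rate, stages=1):
    """CIC decimator: integrators, downsampling by `rate`, then combs.

    The differential delay is 1 (an assumption); the output is scaled by
    rate**stages so that the DC gain is 1.
    """
    data = list(samples)
    for _ in range(stages):          # integrator stages at the input rate
        acc = 0
        integrated = []
        for x in data:
            acc += x
            integrated.append(acc)
        data = integrated
    data = data[rate - 1::rate]      # keep every rate-th sample
    for _ in range(stages):          # comb stages at the output rate
        prev = 0
        combed = []
        for x in data:
            combed.append(x - prev)
            prev = x
        data = combed
    gain = rate ** stages
    return [x / gain for x in data]


def fir_filter(samples, taps):
    """Direct-form FIR filter with zero initial state."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def ddc(samples, f_center, f_sample, r1, r2, taps):
    """Hypothetical DDC chain: mixer, CIC, CIC, FIR."""
    mixed = nco_mix(samples, f_center, f_sample)
    stage1 = cic_decimate(mixed, r1)
    stage2 = cic_decimate(stage1, r2)
    return fir_filter(stage2, taps)
```

With a single stage, each CIC filter here reduces to a moving average over `rate` samples followed by downsampling, which is why the structure needs only additions and subtractions apart from the final scaling.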
Lecture Notes in Computer Science, 2006
Virtual channel reservation is a simple approach for providing guaranteed throughput services in a virtual channel network-on-chip. However, its performance is limited by the number of virtual channels per physical channel. In this paper we explore the limits of the approach and investigate how these limits depend on the routing algorithm, the traffic locality, the network topology and the network size. The results show that the approach can be applied in a network of 10-by-10 nodes with four virtual channels per physical channel. The traffic locality has a strong influence on the performance limits of the approach and can also help in reducing the communication energy cost by 50% to 70%. The type of routing algorithm has practically no influence on the performance limits.
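The idea of reserving one virtual channel per physical channel along a route, and the way the number of virtual channels per physical channel bounds the number of guaranteed connections, can be sketched as follows. This is only an illustration under stated assumptions: the mesh topology, the dimension-ordered XY routing, and the names `xy_route` and `VcReservationTable` are hypothetical and are not taken from the paper.

```python
def xy_route(src, dst):
    """Return the directed links of an XY (dimension-ordered) path on a mesh.

    Nodes are (x, y) tuples; a link is a (from_node, to_node) pair.
    """
    x, y = src
    links = []
    while x != dst[0]:               # travel along X first
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:               # then along Y
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


class VcReservationTable:
    """Tracks how many virtual channels are reserved on each physical link."""

    def __init__(self, vcs_per_link=4):
        self.vcs_per_link = vcs_per_link
        self.used = {}               # link -> number of reserved VCs

    def reserve(self, src, dst):
        """Reserve one VC on every link of the path, or nothing at all."""
        links = xy_route(src, dst)
        if any(self.used.get(link, 0) >= self.vcs_per_link for link in links):
            return False             # some link has no free virtual channel
        for link in links:
            self.used[link] = self.used.get(link, 0) + 1
        return True
```

In this sketch a connection is rejected as soon as any link on its path has all virtual channels reserved, so shorter (more local) routes occupy fewer links and leave more capacity for other connections.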
2007 International Conference on Field Programmable Logic and Applications, 2007
This paper describes the mapping of a two-dimensional inverse discrete cosine transform (2-D IDCT) onto a word-level reconfigurable Montium® processor. It shows that the IDCT can be mapped onto the Montium tile processor (TP) with reasonable effort and presents performance numbers in terms of energy consumption, speed and silicon costs. The Montium results are compared with IDCT implementations on three other architectures: TI DSP, ASIC and ARM.
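For readers unfamiliar with the transform itself, a 2-D IDCT can be computed as 1-D IDCTs over the rows followed by 1-D IDCTs over the columns. The sketch below is a plain reference version using the orthonormal DCT-II/DCT-III convention; it is an assumption-laden illustration and says nothing about how the paper maps the computation onto the Montium.

```python
import math


def idct_1d(coeffs):
    """Orthonormal 1-D inverse DCT (DCT-III) of a list of coefficients."""
    n = len(coeffs)
    out = []
    for x in range(n):
        total = 0.0
        for u, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos((2 * x + 1) * u * math.pi / (2 * n))
        out.append(total)
    return out


def idct_2d(block):
    """2-D IDCT via row-column decomposition of a square block."""
    rows = [idct_1d(row) for row in block]               # transform each row
    cols = [idct_1d(list(col)) for col in zip(*rows)]    # then each column
    return [list(row) for row in zip(*cols)]             # transpose back
```

For example, an 8-by-8 block whose only non-zero coefficient is a DC value of 8.0 yields a flat block of ones, because each 1-D pass scales the DC term by the square root of 1/8.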

19th IEEE International Parallel and Distributed Processing Symposium
Network-on-Chip (NoC) is an energy-efficient on-chip communication architecture for multi-tile System-on-Chip (SoC) architectures. The SoC architecture, including its run-time software, can replace inflexible ASICs for future ambient systems. These ambient systems have to be flexible as well as energy-efficient. To find an energy-efficient solution for the communication network, we analyze three wireless applications. Based on their communication requirements, we observe that revisiting circuit-switching techniques is beneficial. In this paper we propose a new energy-efficient reconfigurable circuit-switched Network-on-Chip. By physically separating the concurrent data streams we reduce the overall energy consumption. The circuit-switched router has been synthesized and analyzed for its power consumption in 0.13 µm technology. A 5-port circuit-switched router has an area of 0.05 mm² and runs at 1075 MHz. The proposed architecture consumes 3.5 times less energy than its packet-switched equivalent.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2009

EURASIP Journal on Embedded Systems, 2007
We focus on architectures for streaming DSP applications such as wireless baseband processing and image processing. We aim at a single generic architecture that is capable of dealing with different DSP applications. This architecture has to be energy efficient and fault tolerant. We introduce a heterogeneous tiled architecture and present the details of a domain-specific reconfigurable tile processor called Montium. This reconfigurable processor has a small footprint (1.8 mm² in a 130 nm process), is power efficient and exploits the locality of reference principle. Reconfiguring the device is very fast; for example, loading the coefficients for a 200-tap FIR filter is done within 80 clock cycles. The tiles of the tiled architecture are connected to a Network-on-Chip (NoC) via a network interface (NI). Two NoCs have been developed: a packet-switched and a circuit-switched version. Both provide two types of services: guaranteed throughput (GT) and best effort (BE). For both NoCs, estimates of power consumption are presented. The NI synchronizes data transfers, and configures, starts and stops the tile processor. For dynamically mapping applications onto the tiled architecture, we introduce a run-time mapping tool.

This presentation will focus on algorithms and reconfigurable tiled architectures for streaming DSP applications. The tile concept will be applied not only at chip level but also at board level and system level. The tile concept has a number of advantages: (1) depending on the requirements, more or fewer tiles can be switched on or off; (2) the tile structure fits well with future IC process technologies: more tiles will be available in advanced process technologies, but the complexity per tile stays the same; (3) the tile concept is fault tolerant, since faulty tiles can be discarded; and (4) tiles can be configured in parallel. Because processing and memory are combined in the tiles, tasks can be executed efficiently on tiles (locality of reference). There are a number of application domains that can be considered as streaming DSP applications: for example wireless baseband processing (for HiperLAN/2, WiMax, DAB, DRM, DVB), multimedia processing (e.g. MPEG, MP3 coding/decoding), medical image pr...