We demonstrate that a small library of customizable interconnect components permits low-area, high-performance, reliable communication tuned to an application, by analogy with the way designers customize their compute. Whilst soft cores for standard protocols (Ethernet, RapidIO, Infiniband, Interlaken) are a boon for FPGA-to-other-system interconnect, we argue that they are inefficient and unnecessary for FPGA-to-FPGA interconnect. Using the example of BlueLink, our lightweight pluggable interconnect library, we describe how to construct reliable FPGA clusters from hundreds of lower-cost commodity FPGA boards. Utilizing the increasing number of serial links on FPGAs demands efficient use of soft logic, making domain-optimized custom interconnect attractive for some time to come.
Prototyping large SoCs (Systems on Chip) using multiple FPGAs introduces a risk of errors on inter-FPGA links, which raises the question of how we can demonstrate the correctness of a SoC prototyped in this way. We propose using high-speed serial interconnect between FPGAs, with a transparent error detection and correction protocol working on a link-by-link basis. Our inter-FPGA interconnect has an interface that resembles that of a network-on-chip, providing a consistent interface to a prototype SoC and masking the difference between on-chip and off-chip interconnect. Low-latency communication and low area usage are favoured at the expense of a little bandwidth inefficiency, a trade-off we believe is appropriate given the high bandwidth of inter-FPGA links.
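The abstract does not spell out the error detection and correction mechanism, so the following Python sketch is only an illustration of the kind of link-level scheme described: per-flit CRC and sequence numbering with retransmission on failure. The framing, field widths and go-back-N recovery are assumptions for exposition, not the published design.

```python
# Illustrative model of transparent link-level error detection and retransmission.
# The 8-bit sequence number, CRC-8 and go-back-N recovery are assumptions made
# for exposition only; they are not taken from the published design.

def crc8(data: bytes, poly: int = 0x07) -> int:
    """Simple CRC-8 over the flit body."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

def make_flit(seq: int, payload: bytes) -> bytes:
    """Prepend a sequence number and append a CRC so the receiver can detect corruption."""
    body = bytes([seq & 0xFF]) + payload
    return body + bytes([crc8(body)])

def receive_flit(flit: bytes, expected_seq: int):
    """Return (payload, ack_seq) on success, or (None, last_good_seq) to request retransmission."""
    body, crc = flit[:-1], flit[-1]
    if crc8(body) != crc or body[0] != (expected_seq & 0xFF):
        return None, (expected_seq - 1) & 0xFF   # NAK: sender rewinds to this point
    return body[1:], expected_seq & 0xFF          # ACK

if __name__ == "__main__":
    good = make_flit(5, b"hello")
    corrupted = bytes([good[0]]) + b"hellp" + good[-1:]   # single-byte error in flight
    print(receive_flit(good, 5))        # (b'hello', 5)
    print(receive_flit(corrupted, 5))   # (None, 4) -> sender retransmits from seq 5
```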
Managing the memory wall is critical for massively parallel FPGA applications where data-sets are large and external memory must be used. We demonstrate that a soft vector processor can efficiently stream data from external memory whilst running computation in parallel. A non-trivial neural computation case study illustrates that multi-core vector processing coupled with careful layout of data structures performs similarly to an elaborate full-custom memory controller and execution pipeline. The vector processing version was far simpler to code, so we encourage others to consider vector machines before contemplating a full-custom architecture on FPGA.
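As a software analogy of the streaming technique described above, the sketch below overlaps fetching of data chunks with computation using double buffering. The chunk size, the background reader thread and the stand-in computation are illustrative assumptions; the soft vector processor realises the equivalent overlap in hardware.

```python
# Software analogy of overlapping external-memory streaming with computation
# (double buffering). The buffer size and the background thread are illustrative
# assumptions; the paper's soft vector processor performs the same overlap in
# hardware with streamed memory accesses.
import threading
import queue

CHUNK = 4096  # illustrative chunk size, in elements

def stream_reader(data, out_q):
    """Producer: models the memory controller streaming chunks from external memory."""
    for i in range(0, len(data), CHUNK):
        out_q.put(data[i:i + CHUNK])
    out_q.put(None)  # end-of-stream marker

def process(data):
    """Consumer: computes on one chunk while the next is being fetched."""
    total = 0
    q = queue.Queue(maxsize=2)          # two buffers in flight: one loading, one computing
    reader = threading.Thread(target=stream_reader, args=(data, q))
    reader.start()
    while (chunk := q.get()) is not None:
        total += sum(x * x for x in chunk)   # stand-in for the vector computation
    reader.join()
    return total

if __name__ == "__main__":
    print(process(list(range(100_000))))
```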
"Reverse-engineering the brain is one of the US National Academy of Engineering’s “Grand Challeng... more "Reverse-engineering the brain is one of the US National Academy of Engineering’s “Grand Challenges.” The structure of the brain can be examined at many different levels, spanning many disciplines from low-level biology through psychology and computer science. This thesis focusses on real-time computation of large neural networks using the Izhikevich spiking neuron model.
Neural computation has been described as “embarrassingly parallel” as each neuron can be thought of as an independent system, with behaviour described by a mathematical model. However, the real challenge lies in modelling neural communication. While the connectivity of neurons has some parallels with that of electrical systems, its high fan-out results in massive data processing and communication requirements when modelling neural communication, particularly for real-time computations.
It is shown that memory bandwidth is the most significant constraint on the scale of real-time neural computation, followed by communication bandwidth. This leads to a decision to implement a neural computation system on a platform based on a network of Field Programmable Gate Arrays (FPGAs), using commercial off-the-shelf components with some custom supporting infrastructure. This brings implementation challenges, particularly the lack of on-chip memory, but also many advantages, particularly high-speed transceivers. An algorithm to model neural communication that makes efficient use of memory and communication resources is developed and then used to implement a neural computation system on the multi-FPGA platform.
Finding suitable benchmark neural networks for a massively parallel neural computation system proves to be a challenge. A synthetic benchmark with biologically-plausible fan-out, spike frequency and spike volume is proposed and used to evaluate the system, which is shown to be capable of computing the activity of a network of 256k Izhikevich spiking neurons, each with a fan-out of 1k, in real-time using a network of 4 FPGA boards. This compares favourably with previous work, with the added advantage of scalability to larger neural networks using more FPGAs.
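To give a feel for the memory-bandwidth constraint identified above, a rough estimate at the benchmark scale (256k neurons, fan-out of 1k) is sketched below. The mean firing rate and data-structure sizes are assumed values chosen for illustration, not figures taken from the thesis.

```python
# Back-of-envelope estimate of why memory bandwidth dominates at the benchmark
# scale above (256k neurons, fan-out 1k). The 10 Hz mean firing rate, 4-byte
# synaptic entries and 16-byte neuron state are illustrative assumptions.
NEURONS      = 256_000
FAN_OUT      = 1_000
FIRING_RATE  = 10        # Hz, a biologically plausible mean rate (assumed)
SYNAPSE_SIZE = 4         # bytes fetched per target synapse when a neuron spikes
STATE_SIZE   = 16        # bytes of neuron state read+written per 1 ms timestep
TIMESTEPS    = 1_000     # timesteps per second of real time

synaptic_bw = NEURONS * FIRING_RATE * FAN_OUT * SYNAPSE_SIZE   # bytes/s
state_bw    = NEURONS * TIMESTEPS * STATE_SIZE                 # bytes/s

print(f"synaptic fetch traffic: {synaptic_bw / 1e9:.1f} GB/s")   # ~10.2 GB/s
print(f"neuron state traffic:   {state_bw / 1e9:.1f} GB/s")      # ~4.1 GB/s
```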
It is concluded that communication must be considered as a first-class design constraint when implementing massively parallel neural computation systems.
Bluehive is a custom 64-FPGA machine targeted at scientific simulations with demanding communication requirements. Bluehive is designed to be extensible, with a reconfigurable communication topology suited to algorithms with demanding high-bandwidth and low-latency communication, something which is unattainable with commodity GPGPUs and CPUs. We demonstrate that a spiking neuron algorithm can be efficiently mapped to Bluehive using Bluespec SystemVerilog by taking a communication-centric approach. This contrasts with many FPGA-based neural systems, which are very focused on parallel computation, resulting in inefficient use of FPGA resources. Our design allows 64k neurons with 64M synapses per FPGA and is scalable to a large number of FPGAs.
Communication on- and off-chip now dominates the power and performance of modern electronic circuits. We propose the use of modern field programmable gate arrays (FPGAs) to investigate the communication properties of systems capable of simulating one billion neurons. Each FPGA provides gigabits per second of chip-to-chip communication bandwidth and on- and off-chip memory bandwidth. The FPGA structure allows us to control the allocation of this bandwidth in great detail, allowing optimisations and analysis to be performed. We present our architectural explorations and initial findings.
In this paper we demonstrate how error-correcting addition and multiplication can be performed using self-checking modules. Our technique is based on the observation that a suitably designed full adder, in the presence of any single stuck-at fault, produces the fault-free complement of the desired output when fed with the complement of its functional input. We initially apply conventional parity-based error detection in arithmetic modules; upon detection of a fault, this is followed by input inversion, recomputation, and suitable output inversion. We present adder, register and multiplier designs that can be used in this context. We also design a large-scale circuit using this technique (an elliptical filter), outlining the area savings with respect to traditional triple modular redundancy.
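The observation the technique rests on is the self-duality of the full adder: complementing all of its inputs complements both outputs, so a single stuck-at fault that corrupts the normal computation leaves the complemented recomputation fault-free. The Python sketch below demonstrates this recovery on a ripple-carry adder with an injected stuck-at fault; the 8-bit width and the fault-injection mechanism are illustrative choices, and the parity-based detection step from the paper is not modelled.

```python
# Demonstration of the core observation above: when a single stuck-at fault
# corrupts the normal addition, recomputing with complemented operands (and
# complemented carry-in) through the same self-dual structure, then inverting
# the result, recovers the fault-free sum. Width and fault model are illustrative.

WIDTH = 8
MASK = (1 << WIDTH) - 1

def ripple_add(a, b, cin=0, stuck=None):
    """stuck = (bit_index, node, value) forces the 'sum' or 'carry' node of one
    full adder to a constant, modelling a single stuck-at fault."""
    s, carry = 0, cin
    for i in range(WIDTH):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        sum_bit = ai ^ bi ^ carry
        carry_out = (ai & bi) | (ai & carry) | (bi & carry)
        if stuck and stuck[0] == i:
            if stuck[1] == "sum":
                sum_bit = stuck[2]
            else:
                carry_out = stuck[2]
        s |= sum_bit << i
        carry = carry_out
    return s, carry

def add_with_recovery(a, b, fault):
    normal, _ = ripple_add(a, b, stuck=fault)
    # Recompute with complemented operands and carry-in; inverting the output of
    # this self-dual structure yields the fault-free sum.
    redo, _ = ripple_add(a ^ MASK, b ^ MASK, cin=1, stuck=fault)
    return normal, redo ^ MASK

if __name__ == "__main__":
    a, b = 0x5A, 0x36
    # Fault chosen so it manifests in the normal computation (correct bit 4 is 1);
    # in the paper's scheme this manifestation is what the parity check detects.
    fault = (4, "sum", 0)                    # sum bit 4 stuck-at-0
    normal, recovered = add_with_recovery(a, b, fault)
    print(f"correct   : {(a + b) & MASK:#04x}")   # 0x90
    print(f"faulty    : {normal:#04x}")           # 0x80, corrupted by the fault
    print(f"recovered : {recovered:#04x}")        # 0x90, matches the correct sum
```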
Lecture given to Year 10 (15 year old) students to explain some of what Computer Science is about, particularly focussing on the basic operation of a processor.