Papers by Mokhtar Aboelaze

A New Hierarchical Small-Node Degree Interconnection Network
Applied Informatics, 1999
Performance Evaluation of a Call Admission Control Protocol for Cellular Networks
International Conference on Wireless Networks, 2004
Call admission protocols play a central role in determining both the performance of any network and the revenue of the network. The call admission protocol must decide either to accept the call or reject it, thus having an impact on both the quality of calls and the network rev- ...
Iranian journal of electrical and computer engineering, 2003
This paper presents a new recursive formulation for Walsh-Hadamard Transform (WHT) that allows the generation of higher order (longer size) multidimensional (m-d) WHT architectures from m 2 lower order (shorter sizes) WHT architectures. The objective of our work is to derive a unified framework and a design methodology that allows direct mapping of the proposed algorithms into modular VLSI architectures. Our methodology is based on manipulating tensor product forms so that they can be mapped directly into modular parallel architectures. The resulting WHT circuits have very simple modular structure and regular topology.
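To illustrate the tensor-product idea behind building a longer WHT from shorter ones, here is a minimal sketch using the standard Sylvester recursion H_2n = H_2 ⊗ H_n. It is not the paper's formulation or architecture; the function name `wht` and the choice of an unnormalized, Hadamard-ordered 1-D transform are assumptions made for illustration.

```python
def wht(x):
    """Unnormalized Walsh-Hadamard transform in Hadamard (natural) order.

    Illustrative sketch only: a size-n transform is assembled from two
    size-n/2 transforms, mirroring H_2n = H_2 (tensor) H_n.
    """
    n = len(x)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a positive power of two")
    if n == 1:
        return list(x)
    half = n // 2
    top = wht(x[:half])      # H_n applied to the first half
    bottom = wht(x[half:])   # H_n applied to the second half
    # One butterfly stage combines the two half-size results.
    sums = [a + b for a, b in zip(top, bottom)]
    diffs = [a - b for a, b in zip(top, bottom)]
    return sums + diffs


if __name__ == "__main__":
    print(wht([1, 2, 3, 4]))
```

Applying `wht` twice returns the input scaled by its length, which is a quick sanity check on the recursion.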
This letter presents an improved Toom's algorithm that allows hardware savings without slowing down the processing speed. We derive formulae for the number of multiplications and additions required to compute the linear convolution of size = 2 . We demonstrate the computational advantage of the proposed improved algorithm when compared to previous algorithms, such as the original matrix-vector multiplication and the FFT algorithms.
Predictive Line Buffer: A Fast, Energy Efficient
Two of the most important factors in the design of any processor are speed and energy consumption. Depending on the application and the processor type, one of these two factors is generally more important than the other. In this paper, we propose a new cache architecture. Our proposed architecture does not require any changes to the processor architecture; it only assumes the existence of a BTB, and it adds a few gates and multiplexers for the prediction mechanism. Using the Simplescalar simulator, the CACTI 3.2 power simulator, and the SPEC2000, Mediabench, and Mibench benchmarks, we tested our proposed architecture on a wide variety of the programs in these three benchmarks. Our results show that our proposed architecture consumes less energy, and has a better memory access time, than many existing cache architectures.
An Efficient Methodology for Mapping Algorithms to Scalable Embedded Architectures
This paper presents a general approach for generating higher order (longer size) multidimensional (m-d) architectures from

Systematic design of computational arrays
In this thesis we discuss some aspects of the design of a system of systolic arrays, from the VLSI layout level to the system level. First, we discuss 3-D VLSI layout, where tighter lower and upper bounds for the volume and maximum wire length for the layout of the different families of graphs in a 3-D environment were developed. Except in two cases, all the bounds for the volume are optimal. The first case is the one-active-layer layout of the planar graphs, the other is the unrestricted layout for graphs with separators $N^{q}$, $q = 2/3$. A cost model for reflecting the real cost of the layout, instead of taking the volume as a measure of cost, was also developed. In Chapter 3, we develop a methodology for designing a systolic array starting from recurrence equations. The idea of Control Flow Systolic Arrays to handle uniform, as well as nonuniform recurrence equations, is developed. This methodology is basically a search for a heuristic solution in the space of all the possi...
A hardware in the loop emulator for a satellite control system
International Journal of Embedded Systems, 2018

Implementation of multiple PID controllers on FPGA
2015 IEEE International Conference on Electronics, Circuits, and Systems (ICECS), 2015
Proportional-integral-derivative (PID) control is one of the most widely used control techniques. Its main advantages are simplicity of design and ease of implementation. Although many other control techniques have been proposed and used, the PID controller is the workhorse of the industry. Usually, PID controllers are implemented on microcontrollers. However, with the increasing use of FPGAs, and especially when a large number of controllers must control the same plant (albeit many processes), FPGAs seem a very good alternative. One point, though: today's FPGA chips run at a frequency of 50-100 MHz, or even more for high-end chips. That is far higher than the frequency required for most PID controllers. Our goal is to utilize the FPGA chip resources to implement multiple PID controllers in the same chip. In this paper, we present a technique to implement multiple PID controllers on the same FPGA chip using the computational resources required by only 1 PID core.
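The sharing idea can be sketched in software: one arithmetic routine is reused in turn for several controllers, each of which keeps only its own gains and state. This is a behavioural illustration, not the paper's hardware design; the names `PidState`, `pid_core`, and `run_time_multiplexed`, and the textbook discrete PID form used here, are assumptions. As a rough budget under an assumed 1 kHz sample rate, a 50 MHz clock leaves 50,000 cycles per sample period for the shared core to serve all controllers.

```python
from dataclasses import dataclass


@dataclass
class PidState:
    """Per-controller gains and memory (hypothetical layout)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0


def pid_core(state, setpoint, measurement, dt):
    """The single shared compute routine: one textbook PID update."""
    error = setpoint - measurement
    state.integral += error * dt
    derivative = (error - state.prev_error) / dt
    state.prev_error = error
    return state.kp * error + state.ki * state.integral + state.kd * derivative


def run_time_multiplexed(states, setpoints, measurements, dt):
    """Serve every controller once per sample period with the same core."""
    return [
        pid_core(state, sp, meas, dt)
        for state, sp, meas in zip(states, setpoints, measurements)
    ]


if __name__ == "__main__":
    controllers = [PidState(2.0, 0.0, 0.0), PidState(1.0, 1.0, 0.0)]
    print(run_time_multiplexed(controllers, [1.0, 1.0], [0.0, 0.5], dt=0.5))
```

Only the state records grow with the number of controllers; the arithmetic is written once, which is the software analogue of using the resources of a single PID core.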

Canadian Conference on Electrical and Computer Engineering, 2005.
Computational grids are believed to be an effective and scalable solution to the problem of resource sharing over large, heterogeneous networks of computing devices. Since grids are highly distributed in nature, one of the most challenging problems is the discovery of dynamic resources in a grid. In this paper we use ideas from P2P systems to propose a solution for the problem. Specifically, we classify nodes as consumers and producers, depending on whether they consume or produce more jobs. Our algorithm connects all producer nodes using an overlay network that is a small-world graph (the graph is produced by adding "shortcut" chords to a circle). The consumer nodes hang off the small-world graph. The producer nodes are forced to take part in resource cataloging and discovery. This has three distinct advantages: first, it prevents "freeloading" by forcing producers to do useful work; second, it frees the consumers to only do computations; third, the low diameter of the overlay graph ensures that all resources are within a small number of hops. We simulate our algorithm under realistic traffic conditions and evaluate its performance using metrics such as the average time to answer a query, the average number of requests that were dropped, and the average number of hops traveled by query packets. Our experiments show that our algorithm performs well with thousands of nodes.
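The overlay construction described above (a circle plus "shortcut" chords) can be sketched as follows, together with a breadth-first search that measures the hop diameter. This is only an illustration: the function names, the uniform random choice of chord endpoints, and the fixed seed are assumptions, not the paper's algorithm.

```python
import random
from collections import deque


def small_world_ring(n, shortcuts, seed=0):
    """Ring of n nodes plus `shortcuts` random chords (hypothetical helper)."""
    if n < 3:
        raise ValueError("need at least 3 nodes for a ring")
    if shortcuts > n * (n - 1) // 2 - n:
        raise ValueError("more chords requested than free node pairs")
    rng = random.Random(seed)
    adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    added = 0
    while added < shortcuts:
        a, b = rng.sample(range(n), 2)
        if b not in adj[a]:          # skip pairs that are already linked
            adj[a].add(b)
            adj[b].add(a)
            added += 1
    return adj


def diameter(adj):
    """Largest shortest-path hop count, via BFS from every node."""
    worst = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    print(diameter(small_world_ring(64, 0)), diameter(small_world_ring(64, 16)))
```

Adding chords can only shorten paths, so the second diameter printed is never larger than the plain ring's.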
Proceedings of the IEEE SoutheastCon 2006
Two of the most important factors in the design of any processor are speed and energy consumption. In this paper, we propose a new cache architecture that results in faster memory access and lower energy consumption. Our proposed architecture does not require any changes to the processor architecture; it only assumes the existence of a BTB. Using Mediabench, a benchmark used for embedded applications, the Simplescalar simulator, and the CACTI power simulator, we show that our proposed architecture consumes less energy, and has a better memory access time, than many existing cache architectures.

Lecture Notes in Computer Science
Energy efficiency plays a crucial role in the design of embedded processors, especially for portable devices with their limited energy source in the form of batteries. Since memory access (either cache or main memory) consumes a significant portion of the energy of a processor, the design of fast low-energy caches has become a very important aspect of modern processor design. In this paper, we present a novel cache architecture to reduce the dynamic energy in the instruction cache. Our proposed cache architecture consists of the L1 cache, multiple line buffers, and a prediction mechanism to predict which line buffer, or L1 cache, to access next. We used simulation to evaluate our proposed architecture and to compare it with the HotSpot cache, Filter cache, Predictive line buffer cache and Way-Halting cache. Simulation results show that our approach can reduce instruction cache energy consumption, on average, by 75% (compared to the baseline architecture) without sacrificing performance.

2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, 2006
The cache memory plays a crucial role in the performance of any processor. The cache memory (SRAM), especially the on-chip cache, is 3-4 times faster than the main memory (DRAM). It can vastly improve the processor performance and speed. The cache also consumes much less energy than the main memory, which leads to a huge power saving that is very important for embedded applications. In today's processors, although the cache memory reduces the energy consumption of the processor, the energy consumption in the on-chip cache accounts for almost 40% of the total energy consumption of the processor. In this paper, we propose a cache architecture, for the instruction cache, that is a modification of the hotspot architecture. Our proposed architecture consists of a small filter cache in parallel with the hotspot cache, between the L1 cache and the main memory. The small filter cache holds the code that was not captured by the hotspot cache. We also propose a prediction mechanism to steer the memory access to either the hotspot cache, the filter cache, or the L1 cache. Our design has both a faster access time and less energy consumption compared to both the filter cache and the hotspot cache architectures. We use the Mibench and Mediabench benchmarks, together with the Simplescalar simulator, to evaluate the performance of our proposed architecture and compare it with the filter cache and the hotspot cache architectures. The simulation results show that our design outperforms both the filter cache and the hotspot cache in both the average memory access time and the energy consumption.

IET Computers & Digital Techniques, 2008
Modern microprocessors dedicate a large portion of the chip area to the cache. Decreasing the energy consumption of the microprocessor, which is a very important design goal especially for small, battery-powered devices, depends on decreasing the energy consumption of the memory/cache system in the microprocessor. The authors investigate the energy consumption in caches and present a novel cache architecture for reduced-energy instruction caches. The cache architecture consists of the L1 cache, multiple line buffers and a prediction mechanism to predict which line buffer, or L1 cache, to access next. In the proposed technique, the authors use the multiple line buffers as a continuous small filter cache that can catch most of the cache accesses, but they access only a single line buffer, thus reducing the energy consumption of the cache. They used simulation to evaluate the proposed architecture and to compare it with the HotSpot cache, filter cache and single line buffer cache. Simulation results show that the approach is slightly faster than the above-mentioned caches, and it consumes considerably less energy than any of these cache architectures.
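A small trace-driven sketch shows why line buffers in front of the L1 cache can absorb most instruction fetches. It is a simplification, not the authors' design: it omits the prediction mechanism entirely, checks all buffers on every fetch, and assumes LRU replacement, 32-byte lines, and the hypothetical function name `line_buffer_hits`.

```python
from collections import OrderedDict


def line_buffer_hits(addresses, line_size=32, num_buffers=4):
    """Count fetches served by the line buffers versus sent on to L1.

    Assumed model: each buffer holds one cache line, replaced in LRU order.
    Returns (buffer_hits, l1_accesses).
    """
    buffers = OrderedDict()   # cache line number -> None, oldest first
    hits = 0
    for addr in addresses:
        line = addr // line_size
        if line in buffers:
            buffers.move_to_end(line)        # mark as most recently used
            hits += 1
        else:
            buffers[line] = None             # fill from L1
            if len(buffers) > num_buffers:
                buffers.popitem(last=False)  # evict least recently used
    return hits, len(addresses) - hits


if __name__ == "__main__":
    # A 32-instruction loop body (4-byte instructions) executed twice.
    loop = list(range(0, 128, 4)) * 2
    print(line_buffer_hits(loop, num_buffers=4))
    print(line_buffer_hits(loop, num_buffers=1))
```

In this toy trace, four buffers hold the whole loop, so the second iteration never reaches L1; with a single buffer, every line change costs an L1 access.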

Optical networks consist of switches that are connected using fiber-optic links. Each link consists of a set of wavelengths, and each wavelength can be used by one or more users to transmit information between two switches. In order to establish a connection between the source and destination nodes, a set of switches and links must be efficiently selected. This is known as the routing problem. A wavelength is then assigned in each selected link to establish the connection. This is known as the wavelength assignment problem. The problem of routing and wavelength assignment (RWA) in optical networks has been shown to be NP-Complete. In this paper, we propose a new approach to solving the RWA problem using advanced Boolean satisfiability (SAT) techniques. SAT has been heavily researched in the last few years. Significant advances have been proposed and have led to the development of powerful SAT solvers that can handle very large problems. SAT solvers use intelligent search algorithms that can traverse the search space and efficiently prune parts that contain no solutions. These solvers have recently been used to solve many challenging problems in Engineering and Computer Science. In this paper, we show how to formulate the RWA problem as a SAT instance and evaluate several advanced SAT techniques in solving the problem. Our approach is verified on various network topologies. The results are promising and indicate that using the proposed approach can improve on previous techniques.
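As a rough illustration of casting wavelength assignment as SAT, the sketch below builds CNF clauses for a simplified version of the problem and checks them by exhaustive search. Several things here are assumptions rather than the paper's encoding: routes are taken as already fixed, one connection uses one wavelength end to end, two connections sharing a link may not share a wavelength, and the brute-force checker stands in for a real SAT solver.

```python
from itertools import combinations, product


def rwa_cnf(routes, num_wavelengths):
    """CNF clauses (lists of signed ints) for a simplified assignment problem.

    Variable var(c, w) is true when connection c uses wavelength w.
    """
    def var(c, w):
        return c * num_wavelengths + w + 1

    clauses = []
    for c in range(len(routes)):
        # Each connection gets at least one wavelength ...
        clauses.append([var(c, w) for w in range(num_wavelengths)])
        # ... and at most one.
        for w1, w2 in combinations(range(num_wavelengths), 2):
            clauses.append([-var(c, w1), -var(c, w2)])
    for c1, c2 in combinations(range(len(routes)), 2):
        if set(routes[c1]) & set(routes[c2]):   # the routes share a link
            for w in range(num_wavelengths):
                clauses.append([-var(c1, w), -var(c2, w)])
    return clauses


def brute_force_sat(clauses, num_vars):
    """Return a satisfying assignment as a tuple of bools, or None."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None


if __name__ == "__main__":
    routes = [["A-B", "B-C"], ["B-C", "C-D"], ["A-B"]]  # hypothetical links
    for k in (1, 2):
        model = brute_force_sat(rwa_cnf(routes, k), len(routes) * k)
        print(k, "wavelength(s):", "satisfiable" if model else "unsatisfiable")
```

The clause lists follow the signed-integer convention used by DIMACS CNF files, so the same encoding could be handed to an off-the-shelf solver instead of the exhaustive check.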
Routing and Wavelength Assignment in Optical Networks Using Boolean Satisfiability
2008 5th Ieee Consumer Communications and Networking Conference, 2008
Single copy vs. multiple copies cache coherence protocols for hierarchical bus multiprocessors
Conference Proceedings of the 1996 IEEE Fifteenth Annual International Phoenix Conference on Computers and Communications, 1996
An FPGA based low power multiplier for FFT in OFDM systems using precomputations
2013 International Conference on ICT Convergence (ICTC), 2013