Dara Rahmati

Institute for Research in Fundamental Sciences, Computer Science, Post-Doc

Papers by Dara Rahmati

GPU Acceleration of LS-SVM, Based on Fractional Orthogonal Functions

Industrial and applied mathematics, 2023

Hardware Efficient FIR Filter Architectures Using Accurate Unary Stochastic Computing

2022 IEEE 40th International Conference on Computer Design (ICCD)

Financial Market Prediction Using Deep Neural Networks with Hardware Acceleration

2022 12th International Conference on Computer and Knowledge Engineering (ICCKE), Nov 17, 2022

MCILS: Monte-Carlo Interpolation Least-Square Algorithm for Approximation of Edge-Reliability Polynomial

2019 9th International Conference on Computer and Knowledge Engineering (ICCKE)

The edge reliability problem has many applications in different fields of science and engineering, such as cognitive science, neuroscience, electrical engineering, and network science. The major challenge in this problem is the time complexity of the exact algorithm: computing the reliability of a network is an NP-hard problem, so computing the reliability of a large-scale network is challenging. In this paper, we present a novel algorithm based on a hybrid of Monte-Carlo, interpolation, and least-squares methods to approximate the reliability of a network. The presented algorithm is applied to several networks whose exact reliability polynomials are available. The experiments show that the presented algorithm is accurate and robust.
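The abstract names the ingredients of MCILS (Monte-Carlo sampling, interpolation, and least squares) but not how they are combined. As a rough illustration only, the sketch below estimates the all-terminal reliability R(p) by Monte-Carlo sampling on a grid of edge-survival probabilities p, then fits a polynomial to those estimates by least squares. The interpolation step of MCILS is not reproduced; the function names, the 11-point grid, and the capped polynomial degree are assumptions chosen for a small example, and solving the normal equations directly is only reasonable for low degrees.

```python
import random


def connected(n, edges):
    """True if the given edges connect all n vertices (union-find with path halving)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components == 1


def mc_reliability(n, edges, p, samples, rng):
    """Monte-Carlo estimate of all-terminal reliability; each edge survives with probability p."""
    hits = 0
    for _ in range(samples):
        alive = [e for e in edges if rng.random() < p]
        hits += connected(n, alive)
    return hits / samples


def least_squares_poly(xs, ys, degree):
    """Coefficients c[0..degree] minimising sum (y - sum c_k x^k)^2, via the normal equations."""
    m = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    aug = [row + [b] for row, b in zip(ata, aty)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = aug[r][m] - sum(aug[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = s / aug[r][r]
    return coeffs


def approx_reliability_poly(n, edges, samples=10000, seed=1):
    """Sample R(p) at p = 0.0, 0.1, ..., 1.0 and fit a polynomial of degree <= |E|."""
    rng = random.Random(seed)
    ps = [i / 10 for i in range(11)]
    rs = [mc_reliability(n, edges, p, samples, rng) for p in ps]
    degree = min(len(edges), len(ps) - 1)
    return least_squares_poly(ps, rs, degree)


def evaluate(coeffs, p):
    """Evaluate the fitted polynomial at p."""
    return sum(c * p ** k for k, c in enumerate(coeffs))


# Triangle graph: its exact all-terminal reliability is 3p^2 - 2p^3.
triangle = [(0, 1), (1, 2), (0, 2)]
coeffs = approx_reliability_poly(3, triangle)
print(round(evaluate(coeffs, 0.7), 2))  # exact value at p = 0.7 is 0.784
```

The triangle is used because its exact polynomial is easy to check by hand (the graph stays connected when at least two of its three edges survive), mirroring the abstract's evaluation on networks with known reliability polynomials.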

A Study on Non-overlapping Multi-agent Pathfinding

Lecture Notes in Networks and Systems, 2022

Application Specific Networks-on-Chip Synthesis: An Energy Efficient Approach

2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2018

Multiple Supply Voltage (MSV) chip fabrication is considered a viable technique for addressing the power and thermal challenges of modern many-core systems. The efficiency of this technique has been demonstrated in application-specific Networks-on-Chip (NoCs), which have many cores with various operating voltages/frequencies. In this paper, a four-phase synthesis toolchain is proposed and evaluated for the design of multi-voltage application-specific NoCs. The proposed toolchain performs (i) core-to-router allocation, (ii) voltage islanding to match the voltages of cores connected to the same router, (iii) hierarchical floorplanning to reduce the complexity of the power delivery network, and (iv) path allocation to connect routers based on the application requirements. The distinguishing feature of the proposed toolchain is that, for the first time, the router allocation phase is performed prior to voltage islanding. This approach offers more flexibility and efficiency in the multi-voltage NoC synthesis process. Experimental results on real benchmarks show that the toolchain (a) provides 63% less energy consumption and (b) produces twice as many alternative designs satisfying the benchmark requirements when compared to existing approaches. Index Terms: application-specific chip, custom NoC synthesis, partitioning, islanding, floorplanning.

A single layer fractional orthogonal neural network for solving various types of Lane–Emden equation

New Astronomy, 2019

HyperDbg: Reinventing Hardware-Assisted Debugging

Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security

Software analysis, debugging, and reverse engineering have a crucial impact on today's software industry. Efficient and stealthy debuggers are especially relevant for malware analysis. However, existing debugging platforms fail to provide a transparent, effective, and high-performance low-level debugger due to their detectable fingerprints, complexity, and implementation restrictions. In this paper, we present a new hypervisor-assisted debugger for high-performance and stealthy debugging of user and kernel applications. To accomplish this, HyperDbg relies on state-of-the-art hardware features available in today's CPUs, such as VT-x and Extended Page Table (EPT). In contrast to other widely used existing debuggers, we design HyperDbg using a custom hypervisor, making it independent of OS functionality or API. We propose hardware-based instruction-level emulation and OS-level API hooking via extended page tables to increase stealthiness. Our results of the dynamic analysis of 10,853 malware samples show that HyperDbg's stealthiness allows debugging on average 22% and 26% more samples than WinDbg and x64dbg, respectively. Moreover, in contrast to existing debuggers, HyperDbg is not detected by any of the 13 tested packers and protectors. We improve performance over other debuggers by deploying a VMX-compatible script engine, eliminating unnecessary context switches. Our experiment on three concrete debugging scenarios shows that, compared to WinDbg as the only kernel debugger, HyperDbg performs step-in, conditional breaks, and syscall recording 2.98x, 1319x, and 2018x faster, respectively. We finally show real-world applications, such as...

HyperDbg: Reinventing Hardware-Assisted Debugging (Extended Version)

Cornell University - arXiv, May 29, 2022

Software analysis, debugging, and reverse engineering have a crucial impact on today's software industry. Efficient and stealthy debuggers are especially relevant for malware analysis. However, existing debugging platforms fail to provide a transparent, effective, and high-performance low-level debugger due to their detectable fingerprints, complexity, and implementation restrictions. In this paper, we present HyperDbg, a new hypervisor-assisted debugger for high-performance and stealthy debugging of user and kernel applications. To accomplish this, HyperDbg relies on state-of-the-art hardware features available in today's CPUs, such as VT-x and Extended Page Table (EPT). In contrast to other widely used existing debuggers, we design HyperDbg using a custom hypervisor, making it independent of OS functionality or API. We propose hardware-based instruction-level emulation and OS-level API hooking via extended page tables to increase stealthiness. Our results of the dynamic analysis of 10,853 malware samples show that HyperDbg's stealthiness allows debugging on average 22% and 26% more samples than WinDbg and x64dbg, respectively. Moreover, in contrast to existing debuggers, HyperDbg is not detected by any of the 13 tested packers and protectors. We improve performance over other debuggers by deploying a VMX-compatible script engine, eliminating unnecessary context switches. Our experiment on three concrete debugging scenarios shows that, compared to WinDbg as the only kernel debugger, HyperDbg performs step-in, conditional breaks, and syscall recording 2.98x, 1319x, and 2018x faster, respectively. We finally show real-world applications, such as a 0-day analysis, structure reconstruction for reverse engineering, software performance analysis, and code-coverage analysis.
CCS CONCEPTS • Security and privacy → Virtualization and security; Software security engineering; • Software and its engineering → Compilers.

HyperDbg: Reinventing Hardware-Assisted Debugging

Software analysis, debugging, and reverse engineering have a crucial impact on today’s software industry. Efficient and stealthy debuggers are especially relevant for malware analysis. However, existing debugging platforms fail to provide a transparent, effective, and high-performance low-level debugger due to their detectable fingerprints, complexity, and implementation restrictions. In this paper, we present HyperDbg, a new hypervisor-assisted debugger for high-performance and stealthy debugging of user and kernel applications. To accomplish this, HyperDbg relies on state-of-the-art hardware features available in today’s CPUs, such as VT-x and extended page tables. In contrast to other widely used existing debuggers, we design HyperDbg using a custom hypervisor, making it independent of OS functionality or API. We propose hardware-based instruction-level emulation and OS-level API hooking via extended page tables to increase stealthiness. Our results of the dynamic analysis of...

ANDRESTA: An Automated NoC-Based Design Flow for Real-Time Streaming Applications

2020 CSI/CPSSI International Symposium on Real-Time and Embedded Systems and Technologies (RTEST)

A TSX-Based KASLR Break: Bypassing UMIP and Descriptor-Table Exiting

Lecture Notes in Computer Science

In this paper, we introduce a reliable method based on Transactional Synchronization Extensions (TSX) side-channel leakage to break KASLR and reveal the addresses of the Global Descriptor Table (GDT) and Interrupt Descriptor Table (IDT). We show that, by detecting these addresses, one could execute instructions that sidestep Intel’s User-Mode Instruction Prevention (UMIP) and the hypervisor-based mitigation and, consequently, neutralize them. The introduced method works successfully even after the most recent patches for Meltdown and Spectre. Moreover, we demonstrate that combining this method with a call-gate mechanism (available in modern processors) in a chain of events eventually leads to a system compromise, despite the limitations of a super-secure sandboxed environment in the presence of Windows’s proprietary Virtualization Based Security (VBS). Finally, we suggest a software-based mitigation that avoids these issues with an acceptable overhead.

FPGA-orthopoly: a hardware implementation of orthogonal polynomials

Engineering with Computers, 2022

A multi-application approach for synthesizing custom network-on-chips

The Journal of Supercomputing

A low-power hybrid non-volatile cache with asymmetric coding

2017 7th International Conference on Computer and Knowledge Engineering (ICCKE), 2017

Cache memories based on technologies such as magnetic RAM or phase-change memory have come a long way in terms of architecture since their earlier models, and show marked differences in power, performance, access latency, and dynamic/static energy consumption. In our work, we propose a hybrid cache design that exploits the characteristics of the employed cache technologies to achieve better power and area efficiency, together with an asymmetric coding that increases the ratio of 0s to 1s in the cache data by adding some information redundancy to the cache's original data. The hybrid cache architecture combines the strengths of STT-RAM and SRAM technologies to provide a solution that is more energy efficient than conventional cache architectures. An evaluation on programs' cache data from the Splash-2 and PARSEC suites indicates that the hybrid architecture alone reduces total static and dynamic power consumption by 55% compared to SRAM and DRAM caches, and reduces area by 45%. With the proposed coding scheme, the number of set operations issued to the cache decreases by 47%. This reduces the write power of programs by 24%, leading to an overall 14% reduction in the programs' total static and dynamic power consumption.
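The abstract does not describe the asymmetric code itself. One well-known way to bias stored data toward 0s with a single bit of redundancy is a flip-bit (invert) code, similar in spirit to bus-invert coding: if a word contains more 1s than 0s, store its complement and set a flag bit. The sketch below assumes, as the abstract's reduction in set operations suggests, that fewer stored 1s means fewer set operations; it illustrates the general idea only and is not the authors' scheme, and the 8-bit word width is an arbitrary choice.

```python
def popcount(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")


def encode(word, width=8):
    """Flip-bit code: return (stored_bits, flag). Store the complement when 1s are the majority."""
    mask = (1 << width) - 1
    word &= mask
    if popcount(word) > width // 2:
        return (~word) & mask, 1
    return word, 0


def decode(stored, flag, width=8):
    """Undo encode(): complement the stored bits again if the flag is set."""
    mask = (1 << width) - 1
    return (~stored) & mask if flag else stored


def ones_written(words, width=8):
    """Total 1 bits written (data plus flag), used here as a stand-in for set operations."""
    return sum(popcount(s) + f for s, f in (encode(w, width) for w in words))


# Two all-ones bytes: 16 ones unencoded, but only the two flag bits after encoding.
print(ones_written([0xFF, 0xFF]))
```

With this code an 8-bit word never causes more than four 1s to be written, counting the flag bit, at the cost of one extra bit per word.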

High-Average and Guaranteed Performance for Wireless Networks-on-Chip Architectures

2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2018

Network-on-Chip (NoC) is the underlying communication platform for multi-core embedded systems. Wireless NoCs (WNoCs) employ wired and wireless structures simultaneously to facilitate communication. In this paper, we propose an arbitration mechanism that guarantees the performance parameters of real-time traffic flows transferred in the wireless plane, while preserving high average performance for all traffic flows in the wired or wireless sections of a wireless NoC. Different scenarios have been carefully selected, and clear suggestions are provided, using analytic performance models, for effectively using a wireless NoC in the real-time application category. We provide different examples to illustrate the effectiveness of the proposal.

On Using Monte-Carlo Tree Search to Solve Puzzles

2021 7th International Conference on Computer Technology Applications, 2021

Solving puzzles has become increasingly important in artificial intelligence research, since the solutions can be directly applied to real-world or general problems such as pathfinding, path planning, and exploration. Selecting the best approach to solve puzzles has always been an essential issue. Monte-Carlo Tree Search (MCTS) has surged in popularity as a promising approach due to its low run-time and memory complexity, so it is important to know how to employ this method to solve puzzles. In this work, we study the applicability of MCTS to solving puzzles, rather than comparing many MCTS variants. We propose a general classification of puzzles based on their features. This classification consists of four primary classes that provide a mathematical formula for each and their satisfactory
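The abstract does not say which tree policy is used. In most MCTS implementations the selection step uses UCT, i.e. the UCB1 formula, which adds to each child's average reward an exploration bonus that shrinks as the child is visited more often. A minimal sketch of the selection and backpropagation steps, assuming rewards in [0, 1] and the conventional constant c = sqrt(2) (both assumptions, not taken from the paper); the `Node` class is hypothetical:

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node holding visit statistics."""
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def ucb1(child, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    exploitation = child.total_reward / child.visits
    exploration = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploitation + exploration


def select_child(node, c=math.sqrt(2)):
    """Selection step of MCTS: pick the child with the highest UCB1 score."""
    return max(node.children, key=lambda ch: ucb1(ch, node.visits, c))


def backpropagate(path, reward):
    """After a simulation, update every node on the root-to-leaf path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward
```

A larger `c` spends more simulations on rarely visited moves; a smaller one concentrates on moves that have scored well so far.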

Unlucky Explorer: A Complete non-Overlapping Map Exploration

2021 The 3rd World Symposium on Software Engineering, 2021

Nowadays, the field of Artificial Intelligence in Computer Games (AI in Games) is becoming more attractive, since computer games challenge many aspects of AI with a wide range of problems, particularly general problems. One such problem is Exploration, in which an unknown environment must be explored by one or several agents. In this work, we first introduce the Maze Dash puzzle as an exploration problem in which the agent must find a Hamiltonian path visiting all the cells. We then investigate suitable methods, focusing on Monte-Carlo Tree Search (MCTS) and SAT, to solve this puzzle quickly and accurately. An optimization is applied to the proposed MCTS algorithm to obtain a promising result. Also, since the prefabricated test cases of this puzzle are not large enough to assess the proposed method, we propose and employ a technique to generate solvable test cases for evaluating the approaches. Finally, the MCTS-based method is assessed on the auto-generated test cases and compared with our SAT implementation, which is considered a strong rival. Our comparison indicates that the MCTS-based approach is a promising method that handles small and medium test cases with faster run-time than SAT. However, for several discussed reasons, including the features of the problem, the tree search organization, and the MCTS approach in the Simulation step, MCTS takes more time to execute in large scenarios. Consequently, we have identified the bottleneck of the MCTS-based method on large test cases, which could be improved in two real-world problems.
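The Maze Dash rules are not spelled out in the abstract. Assuming a simplified version in which the agent moves one cell at a time to a free 4-neighbour and must visit every free cell exactly once, the underlying question is a Hamiltonian path on a grid graph. The plain depth-first backtracking search below is an exact but exponential-time baseline for tiny grids; it is neither the MCTS nor the SAT approach from the paper, and the grid encoding ('#' for walls, '.' for free cells) and the function name are hypothetical.

```python
def hamiltonian_path(grid, start):
    """grid: list of strings, '#' = wall, '.' = free. Returns a list of (row, col) cells or None."""
    rows, cols = len(grid), len(grid[0])
    free = sum(row.count(".") for row in grid)
    path = [start]
    visited = {start}

    def dfs(r, c):
        if len(path) == free:  # every free cell visited exactly once
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == "." and (nr, nc) not in visited:
                visited.add((nr, nc))
                path.append((nr, nc))
                if dfs(nr, nc):
                    return True
                visited.remove((nr, nc))  # backtrack
                path.pop()
        return False

    return path if dfs(*start) else None


# A 3x3 ring around a central wall can be covered by walking around it.
print(hamiltonian_path(["...", ".#.", "..."], (0, 0)))
```

Exhaustive backtracking of this kind grows exponentially with grid size, which is why heuristic search or a SAT encoding becomes attractive for larger maps.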

Generating High Quality Random Numbers: A High Throughput Parallel Bitsliced Approach

In this work, by employing a bitsliced data representation as the building block of algorithms, we showcase the capability and scalability of our proposed method for a variety of PRNG methods in the categories of block and stream ciphers. While demonstrating the suitability of stream ciphers for high-throughput PRNG, as an example, we implement and investigate a bitsliced MICKEY 2.0 PRNG by altering the paradigm of its internal functions and data structures. The LFSR-based (Linear Feedback Shift Register) nature of the PRNG in our implementation suits the GPU's many-core structure well due to its register-oriented architecture, and allows the bit-slicing technique to further improve performance. In our SIMD-vectorized, fully parallel GPU implementation, each GPU thread generates 32 pseudo-random bits in each LFSR clock cycle. We then compare our implementation with some of the most significant PRNGs that display a satisfactory performan...
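Bitslicing transposes the data layout: instead of one machine word holding one LFSR's state, word j holds bit j of 32 independent LFSRs, one per bit position ("lane"). One clock is then a handful of word-wide XORs plus moving each word one position down the list, which advances all 32 registers at once and yields 32 output bits, consistent with the abstract's figure of 32 pseudo-random bits per thread per LFSR clock. The sketch below shows this with a plain Fibonacci-style LFSR; the register length and tap positions are hypothetical illustrative values, not MICKEY 2.0's, and the self-check compares the bitsliced version against 32 ordinary LFSRs.

```python
import random

LANES = 32            # one LFSR per bit of a 32-bit word
LENGTH = 16           # hypothetical register length
TAPS = (0, 2, 3, 5)   # hypothetical feedback taps, not MICKEY 2.0's


def bitsliced_clock(state):
    """Clock 32 LFSRs at once. state[j] is a 32-bit word holding bit j of every lane."""
    out = state[0]              # bit k of `out` is lane k's output bit
    feedback = 0
    for t in TAPS:
        feedback ^= state[t]    # one XOR computes the feedback for all 32 lanes
    return state[1:] + [feedback], out


def scalar_clock(bits):
    """Reference: clock a single LFSR stored as a list of 0/1 values."""
    out = bits[0]
    feedback = 0
    for t in TAPS:
        feedback ^= bits[t]
    return bits[1:] + [feedback], out


def to_bitsliced(lanes):
    """Transpose 32 scalar states into LENGTH words (bit k of word j = lanes[k][j])."""
    return [sum(lanes[k][j] << k for k in range(LANES)) for j in range(LENGTH)]


def matches_scalar(clocks=100, seed=0):
    """Check that the bitsliced generator reproduces 32 independent scalar LFSRs."""
    rng = random.Random(seed)
    lanes = [[rng.randint(0, 1) for _ in range(LENGTH)] for _ in range(LANES)]
    sliced = to_bitsliced(lanes)
    for _ in range(clocks):
        sliced, out = bitsliced_clock(sliced)
        for k in range(LANES):
            lanes[k], bit = scalar_clock(lanes[k])
            if (out >> k) & 1 != bit:
                return False
    return True
```

On a GPU, each thread would presumably keep these words in registers and use 32-bit XOR instructions; that mapping is an assumption about the general technique, not a description of the paper's kernel.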

A Way Around UMIP and Descriptor-Table Exiting via TSX-based Side-Channel

Nowadays, numerous protection mechanisms in operating systems prevent or limit user-mode applications' access to the kernel's internal information. This is regularly carried out by software-based defenses such as Address Space Layout Randomization (ASLR) and Kernel ASLR (KASLR). They play pronounced roles when the security of sandboxed applications such as web browsers is considered. Armed with arbitrary write access to kernel memory, if these protections are bypassed, an adversary could find a suitable location to write to in order to achieve elevation of privilege or code execution in ring 0. In this paper, we introduce a reliable method based on Transactional Synchronization Extensions (TSX) side-channel leakage to reveal the addresses of the Global Descriptor Table (GDT) and Interrupt Descriptor Table (IDT). We show that, by detecting these addresses, one could execute instructions to sidestep Intel's User-Mode Instruction Prevention (UMIP) and the Hypervisor-based mitigation and...
