Shafigh Parsazad

Data Selection for Semi-Supervised Learning

ArXiv, 2012

The real challenge in pattern recognition task and machine learning process is to train a discrim... more The real challenge in pattern recognition task and machine learning process is to train a discriminator using labeled data and use it to distinguish between future data as accurate as possible. However, most of the problems in the real world have numerous data, which labeling them is a cumbersome or even an impossible matter. Semi-supervised learning is one approach to overcome these types of problems. It uses only a small set of labeled with the company of huge remain and unlabeled data to train the discriminator. In semi-supervised learning, it is very essential that which data is labeled and depend on position of data it effectiveness changes. In this paper, we proposed an evolutionary approach called Artificial Immune System (AIS) to determine which data is better to be labeled to get the high quality data. The experimental results represent the effectiveness of this algorithm in finding these data points.

Download

Fast Simulation of Programmable Network Forwarding Plane Devices

With the evolution of the Internet, the processing of packets at the routers while providing flex... more With the evolution of the Internet, the processing of packets at the routers while providing flexibility in deploying new protocols and services at the same time has become a major concern. Programmable forwarding elements with high processing capability have emerged as a solution. But the main challenge is to find the optimal hardware architecture while taking into account constraints such as different packet processing functions, task scheduling options, electrical power consumption and providing quality-of-service (QoS) guarantees. Therefore, it is essential to investigate methods that help in identifying limitations and bottlenecks before physical fabrication. Having an appropriate model provides designers a progressive path to narrow the design space and establish credible and feasible alternatives before deciding on an implementation. In this thesis, we propose a flexible and fast instruction accurate host-compiled simulator to make it possible to explore wide ranges of archit...

Download

Fast Feature Reduction in intrusion detection datasets

In the most intrusion detection systems (IDS), a system tries to learn characteristics of differe... more In the most intrusion detection systems (IDS), a system tries to learn characteristics of different type of attacks by analyzing packets that sent or received in network. These packets have a lot of features. But not all of them is required to be analyzed to detect that specific type of attack. Detection speed and computational cost is another vital matter here, because in these types of problems, datasets are very huge regularly. In this paper we tried to propose a very simple and fast feature selection method to eliminate features with no helpful information on them. Result faster learning in process of redundant feature omission. We compared our proposed method with three most successful similarity based feature selection algorithm including Correlation Coefficient, Least Square Regression Error and Maximal Information Compression Index. After that we used recommended features by each of these algorithms in two popular classifiers including: Bayes and KNN classifier to measure th...

Download

Automatic firewall rules generator for anomaly detection systems withApriori algorithm

Network intrusion detection systems have become a crucial issue for computer systems security inf... more Network intrusion detection systems have become a crucial issue for computer systems security infrastructures. Different methods and algorithms are developed and proposed in recent years to improve intrusion detection systems. The most important issue in current systems is that they are poor at detecting novel anomaly attacks. These kinds of attacks refer to any action that significantly deviates from the normal behaviour which is considered intrusion. This paper proposed a model to improve this problem based on data mining techniques. Apriori algorithm is used to predict novel attacks and generate real-time rules for firewall. Apriori algorithm extracts interesting correlation relationships among large set of data items. This paper illustrates how to use Apriori algorithm in intrusion detection systems to cerate a automatic firewall rules generator to detect novel anomaly attack. Apriori is the best-known algorithm to mine association rules. This is an innovative way to find association rules on large scale.

Download

A new scheduling algorithm for server farms load balancing

International Conference on Industrial and Information Systems, 2010

This paper describes a new scheduling algorithm to distribute jobs in server farm systems. The pr... more This paper describes a new scheduling algorithm to distribute jobs in server farm systems. The proposed algorithm overcomes the starvation caused by SRPT (Shortest Remaining Processing Time). This algorithm is used in process scheduling in operating system approach. The algorithm was developed to be used in dispatcher scheduling. This algorithm is non-preemptive discipline, similar to SRPT, in which the priority

Download

Improving the K-means algorithm using improved downhill simplex search

the k-means algorithm is one of the well-known and most popular clustering algorithms. K-means se... more the k-means algorithm is one of the well-known and most popular clustering algorithms. K-means seeks an optimal partition of the data by minimizing the sum of squared error with an iterative optimization procedure, which belongs to the category of hill climbing algorithms. As we know hill climbing searches are famous for converging to local optimums. Since kmeans can converge to a local optimum, different initial points generally lead to different convergence cancroids, which makes it important to start with a reasonable initial partition in order to achieve high quality clustering solutions. However, in theory, there exist no efficient and universal methods for determining such initial partitions. In this paper we tried to find an optimum initial partitioning for k-means algorithm. To achieve this goal we proposed a new improved version of downhill simplex search, and then we used it in order to find an optimal result for clustering approach and then compare this algorithm with Genetic Algorithm base (GA), Genetic K-Means (GKM), Improved Genetic K-Means (IGKM) and k-means algorithms.

Download

PFPSim

Proceedings of the 2016 Symposium on Architectures for Networking and Communications Systems, 2016

In this paper, we introduce PFPSim, a host-compiled simulator for early validation and analysis o... more In this paper, we introduce PFPSim, a host-compiled simulator for early validation and analysis of packet processing applications on programmable forwarding plane architectures. The simulation model is automatically generated from a high-level description of the hardware/software architecture of the forwarding device and the behavioral description of the various modules in the architecture. Our high-level architectural description language is capable of defining many-core network processors as well as reconfigurable pipelines. The behavior of the fixed-function processing elements in the architecture is defined in C++. The code targeted for the processor cores, or reconfigurable pipeline stages, is compiled from P4, an emerging programming language for packet processing applications. Application developers can use PFPSim as a virtual prototype to simulate and debug their applications before hardware availability. Moreover, forwarding device architects can use PFPSim to evaluate the trade-offs between different hardware/software design decisions.

Download

Fast Feature Reduction in intrusion detection datasets

In the most intrusion detection systems (IDS), a system tries to learn characteristics of differe... more In the most intrusion detection systems (IDS), a system tries to learn characteristics of different type of attacks by analyzing packets that sent or received in network. These packets have a lot of features. But not all of them is required to be analyzed to detect that specific type of attack. Detection speed and computational cost is another vital matter here, because in these types of problems, datasets are very huge regularly. In this paper we tried to propose a very simple and fast feature selection method to eliminate features with no helpful information on them. Result faster learning in process of redundant feature omission. We compared our proposed method with three most successful similarity based feature selection algorithm including Correlation Coefficient, Least Square Regression Error and Maximal Information Compression Index. After that we used recommended features by each of these algorithms in two popular classifiers including: Bayes and KNN classifier to measure the quality of the recommendations. Experimental result shows that although the proposed method can't outperform evaluated algorithms with high differences in accuracy, but in computational cost it has huge superiority over them.

Download

Data Selection for Semi-Supervised Learning

The real challenge in pattern recognition task and machine learning process is to train a discrim... more The real challenge in pattern recognition task and machine learning process is to train a discriminator using labeled data and use it to distinguish between future data as accurate as possible. However, most of the problems in the real world have numerous data, which labeling them is a cumbersome or even an impossible matter. Semi-supervised learning is one approach to overcome these types of problems. It uses only a small set of labeled with the company of huge remain and unlabeled data to train the discriminator. In semi-supervised learning, it is very essential that which data is labeled and depend on position of data it effectiveness changes. In this paper, we proposed an evolutionary approach called Artificial Immune System (AIS) to determine which data is better to be labeled to get the high quality data. The experimental results represent the effectiveness of this algorithm in finding these data points.

Download

Uploads

Papers by Shafigh Parsazad

Log In