Papers by Hamid Alinejad-Rokny

The dinucleotide CpG is highly underrepresented in the genome of human immunodeficiency virus type 1 (HIV-1). To identify the source of CpG depletion in the HIV-1 genome, we investigated two biological mechanisms: (1) CpG methylation-induced transcriptional silencing and (2) CpG recognition by Toll-like receptors (TLRs). We hypothesized that HIV-1 has been under selective evolutionary pressure by these mechanisms leading to the reduction of CpG in its genome. A CpG depleted genome would enable HIV-1 to avoid methylation-induced transcriptional silencing and/or to avoid recognition by TLRs that identify foreign CpG sequences. We investigated these two hypotheses by determining the sequence context dependency of CpG depletion and comparing it with that of CpG methylation and TLR recognition. We found that in both human and HIV-1 genomes the CpG motifs flanked by T/A were depleted most and those flanked by C/G were depleted least. Similarly, our analyses of human methylome data revealed that the CpG motifs flanked by T/A were methylated most and those flanked by C/G were methylated least. Given that a similar CpG depletion pattern was observed for the human genome within which CpGs are not likely to be recognized by TLRs, we argue that the main source of CpG depletion in HIV-1 is likely host-induced methylation. Analyses of CpG motifs in over 100 viruses revealed that this unique CpG representation pattern is specific to the human and simian immunodeficiency viruses.
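The flanking-context analysis described above can be sketched as a small observed-versus-expected count over the sixteen x-CpG-y 4-mers. This is an illustrative sketch, not the paper's pipeline: the function name and the mononucleotide-independence null model are assumptions made here for demonstration.

```python
from collections import Counter

def cpg_context_ratios(seq):
    """Observed/expected ratio of CpG occurrence for each flanking context.

    Expected counts come from a deliberately simple null model that treats
    the four positions of an x-CG-y window as independent draws from the
    sequence's mononucleotide frequencies.
    """
    seq = seq.upper()
    n = len(seq)
    base_freq = Counter(seq)
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            motif = x + "CG" + y
            observed = sum(seq[i:i + 4] == motif for i in range(n - 3))
            # expected count of the 4-mer under positional independence
            expected = (n - 3) * (base_freq[x] / n) * (base_freq["C"] / n) \
                       * (base_freq["G"] / n) * (base_freq[y] / n)
            ratios[motif] = observed / expected if expected else float("nan")
    return ratios
```

A ratio well below 1 for T/A-flanked contexts and near 1 for C/G-flanked contexts would reproduce the depletion pattern the abstract reports.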
The human genome encodes a family of editing enzymes known as APOBEC3 (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3). They induce context-dependent G-to-A changes, referred to as “hypermutation”, in the genome of viruses such as HIV, SIV, HBV and endogenous retroviruses. Hypermutation is characterized by aligning affected sequences to a reference sequence. We show that indels (insertions/deletions) in the sequences lead to an incorrect assignment of APOBEC3 target and non-target sites. This can result in an incorrect identification of hypermutated sequences and erroneous biological inferences made based on hypermutation analysis.
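The pitfall described above can be reproduced in a few lines: a single unannotated deletion shifts every downstream base, so a naive position-by-position comparison assigns G-to-A changes to the wrong sites, while comparing only the gap-free columns of a proper alignment recovers them. The sequences and function names below are toy examples invented for illustration.

```python
def naive_site_calls(ref, query):
    """Position-by-position G-to-A calls that ignore indels (the pitfall)."""
    return [i for i, (r, q) in enumerate(zip(ref, query))
            if r == "G" and q == "A"]

ref = "TGGATGGGTC"             # toy reference
query_gapped = "TGAA-GGATC"    # true G->A at sites 2 and 7, ref[4] deleted
query_ungapped = query_gapped.replace("-", "")

# ignoring the deletion shifts the second hit from site 7 to site 6
assert naive_site_calls(ref, query_ungapped) == [2, 6]

# honouring the gap (skipping gap columns) recovers the true sites
gapped_calls = [i for i, (r, q) in enumerate(zip(ref, query_gapped))
                if q != "-" and r == "G" and q == "A"]
assert gapped_calls == [2, 7]
```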

CD8+ T cells are important for the control of chronic HIV infection. However, the virus rapidly acquires “escape mutations” that reduce CD8+ T cell recognition and viral control. The timing of when immune escape occurs at a given epitope varies widely among patients and also among different epitopes within a patient. The strength of the CD8+ T cell response, as well as mutation rates, patterns of particular amino acids undergoing escape, and growth rates of escape mutants, may affect when escape occurs. In this study, we analyze the epitope-specific CD8+ T cells in 25 SIV-infected pigtail macaques responding to three SIV epitopes. Two epitopes showed a variable escape pattern and one had a highly monomorphic escape pattern. Despite these very different patterns, immune escape occurred with a similar delay of, on average, 18 days after the epitope-specific CD8+ T cells reached 0.5% of total CD8+ T cells. The most delayed escape occurred in one of the highly variable epitopes, and this was associated with a delayed CD8+ T cell response to that epitope. When we analyzed the kinetics of immune escape, we found that multiple escape mutants emerge simultaneously, implying that a diverse population of potential escape mutants is present during immune selection. Our results suggest that the conservation or variability of an epitope does not affect the timing of immune escape in SIV; instead, the timing of escape is largely determined by the kinetics of the epitope-specific CD8+ T cells.

The influence of major histocompatibility complex class I (MHC-I) alleles on human immunodeficiency virus (HIV) diversity in humans has been well characterized at the population level. MHC-I alleles likely affect viral diversity in the simian immunodeficiency virus (SIV)-infected pig-tailed macaque (Macaca nemestrina) model, but this is poorly characterized. We studied the evolution of SIV in pig-tailed macaques with a range of MHC-I haplotypes. SIVmac251 genomes were amplified from the plasma of 44 pig-tailed macaques infected with SIVmac251 at 4 to 10 months after infection and characterized by Illumina deep sequencing. MHC-I typing was performed on cellular RNA using Roche/454 pyrosequencing. MHC-I haplotypes and viral sequence polymorphisms at both individual mutations and groups of mutations spanning 10-amino-acid segments were linked using in-house bioinformatics pipelines, since cytotoxic T lymphocyte (CTL) escape can occur at different amino acids within the same epitope in different animals. The approach successfully identified 6 known CTL escape mutations within 3 Mane-A1*084-restricted epitopes. The approach also identified over 70 new SIV polymorphisms linked to a variety of MHC-I haplotypes. Using functional CD8 T cell assays, we confirmed that one of these associations, a Mane-B028 haplotype-linked mutation in Nef, corresponded to a CTL epitope. We also identified mutations associated with the Mane-B017 haplotype that were previously described to be CTL epitopes restricted by Mamu-B*017:01 in rhesus macaques. This detailed study of pig-tailed macaque MHC-I genetics and SIV polymorphisms will enable a refined level of analysis for future vaccine design and strategies for treatment of HIV infection.

We used a multivariate data analysis approach to identify motifs associated with HIV hypermutation by different APOBEC3 enzymes. The analysis showed that APOBEC3G targets G mainly within GG, TG, TGG, GGG, TGGG and also GGGT. G nucleotides flanked by a C at the 3′ end (in the +1 and +2 positions) were disfavoured targets of APOBEC3G. G nucleotides within GGGG were targeted at a frequency much lower than expected. We found that the infrequent G-to-A mutation within GGGG is not solely explained by the inaccessibility to APOBEC3 of the poly-G runs in the central and 3′ polypurine tracts (PPTs), which remain double-stranded during HIV reverse transcription: GGGG motifs outside the PPTs were also disfavoured. The motifs GGAG and GAGG were likewise disfavoured targets for APOBEC3. The motif-dependent mutation of G within the HIV genome by members of the APOBEC3 family other than APOBEC3G was limited to GA→AA changes. The results did not show evidence of other types of context-dependent G-to-A changes in the HIV genome.
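The basic context counting underlying such an analysis can be sketched by tallying each G-to-A change by the base immediately 3′ of the mutated G, the usual convention for separating APOBEC3G-type (GG) from APOBEC3F-type (GA) changes. This is a minimal sketch under the assumption of already-aligned, equal-length sequences; it is not the paper's multivariate method.

```python
from collections import Counter

def ga_context_counts(ref, mut):
    """Count G-to-A changes by dinucleotide context (mutated G + its 3' base).

    Assumes `ref` and `mut` are pre-aligned sequences of equal length.
    """
    counts = Counter()
    for i in range(len(ref) - 1):
        if ref[i] == "G" and mut[i] == "A":
            counts["G" + ref[i + 1]] += 1  # 'GG' -> A3G-like, 'GA' -> A3F-like
    return counts
```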

The low fidelity of HIV replication facilitates immune and drug escape. Some reverse transcriptase (RT) inhibitor drug-resistance mutations increase RT fidelity in biochemical assays, but their effect during viral replication is unclear. We investigated the effect of the RT mutations K65R, Q151N and V148I on SIV replication and fidelity in vitro, along with SIV replication in pigtailed macaques. SIVmac239-K65R and SIVmac239-V148I viruses had reduced replication capacity compared to wild-type SIVmac239. Direct virus competition assays demonstrated a rank order of wild-type > K65R > V148I in terms of viral fitness. In single-round in vitro replication assays, SIVmac239-K65R demonstrated significantly higher fidelity than wild-type, and rapidly reverted to wild-type following infection of macaques. In contrast, SIVmac239-Q151N was replication-incompetent in vitro and in pigtailed macaques. Thus, we showed that RT mutants, and specifically the common K65R drug-resistance mutation, had impaired replication capacity and higher fidelity. These results have implications for the pathogenesis of drug-resistant HIV.

Most standard learning algorithms presume, or at least expect, that the class distributions of the dataset at hand are balanced, i.e. that each class contains roughly the same number of data points. They also assume that the misclassification cost of each data point is a fixed value regardless of its class. Such standard algorithms fail to learn from imbalanced datasets, in which the distributions over the classes are not identical. A well-known example domain is automatic patient detection: such systems see many clients, only a few of whom are patients while the rest are healthy, so imbalanced datasets are the norm there. In breast cancer patient detection, a special case of such systems, we try to discriminate patients from healthy clients. Note that the imbalance in a dataset can be relative: the number of samples in the minority class may be high in absolute terms, yet far smaller than the number of samples in the majority class. This paper presents an algorithm well suited to non-relative imbalanced datasets, in both speed and learning efficacy. The experimental results show that the proposed algorithm outperforms some of the best methods in the literature.
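The class-dependent misclassification cost the abstract motivates is often handled with inverse-frequency class weights. The sketch below shows that common remedy, not the paper's own algorithm; the function name is illustrative.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency class weights: rare classes get proportionally
    larger misclassification costs, so a 9:1 imbalance gives the minority
    class nine times the weight of the majority class."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * m) for cls, m in counts.items()}
```

For 90 healthy and 10 patient samples this yields weights of roughly 0.56 and 5.0, which a cost-sensitive learner can plug into its loss.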

One of the important topics in data mining is extracting effective and useful rules from large datasets. To do so, we should select features that are, first, free of noise and, second, only weakly correlated with other features; in other words, we should use instances that are distinctive with respect to the other features. In this paper we therefore present a combined approach that considers how factors such as distinctive features and instances help in extracting rules. In this approach we used a trained neural network to find useful features, clustering to find the best instances in the dataset, and finally an artificial immune system for rule extraction. To evaluate the introduced approach, we applied it to the UCI breast cancer diagnosis dataset. Our experiments demonstrate that the proposed combined approach generates reliable rules and improves accuracy; the results show the proposed method achieves 5.9% better accuracy than the CART method.
One of the most important tasks in pattern recognition, machine learning, and data mining is the classification problem. Introducing a general classifier that can learn any problem's dataset is a challenge for the pattern recognition community. Many classifiers have been proposed so far, but most have their own strengths and weaknesses and are therefore good only for specific problems, and there is no robust way to determine which classifier is better for a given problem.
The influence of MHC-I alleles and CTL immunity on HIV diversity has been well characterised in humans at the population level. MHC-I alleles likely affect viral diversity in the SIV-infected pigtail macaque model, but this is poorly characterised.
Clustering ensembles combine multiple partitions of data into a single clustering solution of better quality. Inspired by the success of supervised bagging and boosting algorithms, we propose non-adaptive and adaptive resampling schemes for the integration of multiple independent and dependent clusterings. We investigate the effectiveness of bagging techniques, comparing the efficacy of sampling with and without replacement, in conjunction with several consensus algorithms. In our adaptive approach, individual partitions in the ensemble are sequentially generated by clustering specially selected subsamples of the given dataset.
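One widely used consensus function for combining the resampled partitions described above is the co-association approach: count how often each pair of points lands in the same cluster across the ensemble, then link strongly co-associated pairs. The sketch below shows that generic scheme as an illustration; it is not claimed to be the specific consensus algorithms evaluated in the paper.

```python
from itertools import combinations

def consensus_clusters(partitions, threshold=0.5):
    """Combine several labelings of the same n points.

    Builds a co-association matrix (fraction of partitions placing two
    points together), links pairs above `threshold`, and returns the
    connected components as the consensus clustering.
    """
    n = len(partitions[0])
    co = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                co[i][j] += 1 / len(partitions)

    # union-find over strongly co-associated pairs
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i, j in combinations(range(n), 2):
        if co[i][j] >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(map(sorted, groups.values()))
```

Given three partitions that mostly agree on splitting four points into {0,1} and {2,3}, the consensus recovers exactly that grouping.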
Inspired by bagging and boosting algorithms in classification, non-weighted and weighted sampling approaches for clustering are proposed and studied in this paper. The effectiveness of the non-weighted sampling technique, comparing the efficacy of sampling with and without replacement in conjunction with several consensus algorithms, is investigated. Experimental results show improved stability and accuracy for clustering structures obtained via bootstrapping, subsampling, and boosting techniques. Subsamples of small size can reduce the computational cost and measurement complexity of many unsupervised data mining tasks with distributed sources of data. This empirical study also compares the performance of boosting and bagging clustering ensembles using different consensus functions on a number of datasets.

One of the main challenges in wireless sensor networks is the energy constraint of sensor nodes, which must be considered carefully when designing algorithms for such networks. Clustering is one of the approaches that can address this challenge. In this paper, an efficient method for clustering wireless sensor networks by means of cellular learning automata (LaClustering) is presented. The proposed method selects cluster heads (CHs) in several stages, each of which considers one parameter affecting the overall performance of the clustering: the energy levels of the sensor nodes, the number of neighbors of each node, network connectivity, and the formation of balanced clusters. To evaluate the performance of the proposed method, several experiments were conducted using the J-sim simulator, and the proposed method was compared with some of the best clustering algorithms reported in the literature. The simulation results show that the proposed algorithm provides a clustering infrastructure of higher overall quality than the existing algorithms, especially in balancing the number of sensor nodes across clusters and selecting CHs with higher energy levels.
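Two of the staged criteria above (residual energy, then neighbor count) can be sketched as successive filters. This is only an illustration of the staged idea, not the paper's cellular-learning-automata mechanism; the field names, energy floor, and quota are invented for the example.

```python
def select_cluster_heads(nodes, energy_floor=0.5, quota=2):
    """Staged CH selection sketch: stage 1 keeps nodes above an energy
    floor, stage 2 ranks the survivors by neighbor count (energy as the
    tie-breaker) so dense areas get the heads."""
    stage1 = [n for n in nodes if n["energy"] >= energy_floor]
    stage2 = sorted(stage1, key=lambda n: (-n["neighbors"], -n["energy"]))
    return [n["id"] for n in stage2[:quota]]
```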

Dynamic optimization, in which global and local optima change over time, is a perennially hot research topic. Particle swarm optimization (PSO) has been shown to work well in dynamic environments. A learning automaton, on the other hand, can be considered an intelligent agent that learns which action is best through interaction with its environment. The great deluge algorithm is another search algorithm applied to optimization problems. All of these algorithms have their own drawbacks and advantages. This paper explores how they can be combined to reach better performance in dynamic spaces. Specifically, one learning automaton is employed per particle in the swarm to decide whether that particle updates its velocity (and consequently its position) using the global best particle position, the local particle position, or a combined position derived from the two. The water level from the deluge algorithm is used to control the progress of the algorithm. Experimental results on different dynamic environments modeled by the moving peaks benchmark show that this combination outperforms the PSO algorithm and the fast multi-swarm method (FMSO), a comparable particle swarm algorithm for dynamic environments, in all tested environments.
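The per-particle learning component described above can be sketched with a standard linear reward-inaction (L_R-I) automaton over the three velocity-update modes. This is a sketch of the learning piece only, under the assumption that the paper's automaton follows the common L_R-I update; the class and action names are illustrative.

```python
import random

class LearningAutomaton:
    """Linear reward-inaction automaton: rewarded actions gain probability
    mass, penalised actions leave the probabilities untouched."""

    def __init__(self, actions, lr=0.1, rng=None):
        self.actions = actions
        self.lr = lr
        self.p = [1 / len(actions)] * len(actions)
        self.rng = rng or random.Random()

    def choose(self):
        """Sample an action index according to the current probabilities."""
        r, acc = self.rng.random(), 0.0
        for i, pi in enumerate(self.p):
            acc += pi
            if r <= acc:
                return i
        return len(self.p) - 1

    def reward(self, i):
        """Shift probability mass toward rewarded action i (sum stays 1)."""
        for j in range(len(self.p)):
            if j == i:
                self.p[j] += self.lr * (1 - self.p[j])
            else:
                self.p[j] *= (1 - self.lr)
```

In the hybrid described above, each particle would call `choose()` to pick between global-best, local-best, or combined updates, and call `reward()` when the chosen update improved its fitness.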

Sensor networks are composed of many inexpensive sensors with limited energy, computational resources, and memory. Each node senses particular information, such as temperature, humidity, or pressure, and sends it to a central station. One of the major challenges in these networks is limited energy, and one way to reduce energy consumption in wireless sensor networks is to reduce the number of packets transmitted. Data aggregation, which combines related data and prevents additional packets from being sent, can be effective in reducing the number of packets sent over the network. In this paper, a data aggregation method based on learning automata is presented. By identifying sensors that are in the same area and produce the same data, it enables sensor nodes to periodically avoid sending redundant packets, significantly saving energy and increasing the lifetime of the network. Simulation results show the strong performance of the proposed method.
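The packet-saving idea above reduces to merging same-region readings into one upstream packet, so k co-located nodes cost one transmission instead of k. The sketch below shows that aggregation step only, not the learning-automata mechanism that identifies the regions; the tuple layout is an assumption made for the example.

```python
def aggregate_readings(readings):
    """Merge readings from nodes in the same region into one averaged
    packet per region. Each reading is a (node_id, region, value) tuple."""
    by_region = {}
    for node, region, value in readings:
        by_region.setdefault(region, []).append(value)
    return [(region, sum(vals) / len(vals))
            for region, vals in sorted(by_region.items())]
```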

In this paper we propose an ensemble-based approach to feature selection, aiming to overcome the parameter sensitivity of existing feature selection approaches. Our algorithm automatically obtains results for each possible threshold value; for each threshold, it produces a subset of features, and every feature in these subsets receives a score. Finally, using the ensemble, we select the features with the highest scores. The method is therefore not parameter sensitive, and we also show that basing it on fuzzy entropy yields more reliably selected features than previous methods. Empirical results show that, while the efficacy of the method is not considerably decreased in most cases (and performance even increases in many), the method is freed from the setting of any parameter.
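The threshold-ensemble scheme described above can be sketched directly: run the selection once per threshold, credit each feature every time it survives, and keep the top-scoring features. This is an illustrative skeleton under the assumption that per-feature relevance scores (e.g. fuzzy-entropy-based) are already computed; names and signature are invented here.

```python
def ensemble_select(scores, thresholds, k):
    """Score-based ensemble feature selection.

    `scores` maps feature name -> relevance score. Each threshold defines
    one selection run; a feature earns one credit per run it survives,
    and the k features with the most credit are returned.
    """
    credit = {f: 0 for f in scores}
    for t in thresholds:
        for f, s in scores.items():
            if s >= t:
                credit[f] += 1
    # rank by credit, alphabetical tie-break for determinism
    return sorted(credit, key=lambda f: (-credit[f], f))[:k]
```

Because the final ranking aggregates over all thresholds, no single threshold has to be tuned, which is the point the abstract makes.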

In this paper, a new approach to multiclass problems is proposed. The main idea is to use pairwise classifiers arranged like a decision tree. First, a multiclass classifier is trained on the multiclass problem. Then the confusion matrix over the classes is derived by applying the trained classifier to a predefined validation set. The approach then searches for a hierarchy of metaclasses such that classifying the validation set by descending through the hierarchy yields minimum error; at each level, the objective is to minimize the error between metaclasses on the validation set. The procedure resembles the construction of a binary tree: the data are repeatedly divided into two metaclasses until no node contains more than one class. Each node corresponds to one classifier that distinguishes the metaclasses (or classes) of its left and right children. A genetic algorithm ensures minimum error with respect to the confusion matrix. Decision Tree, Support Vector Machine, MultiLayer Perceptron and K-Nearest Neighbor classifiers are used as base classifiers. Experimental results demonstrate improved accuracy on a Farsi handwritten digit dataset.
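The quantity optimised at each split above, the confusion that crosses the two metaclasses, can be illustrated with a brute-force version of one split. The paper uses a genetic algorithm for this search; exhaustive enumeration is shown here only because it is transparent and fine for a handful of classes. The confusion-matrix layout is an assumption for the example.

```python
from itertools import combinations

def best_metaclass_split(confusion, classes):
    """Split `classes` into two metaclasses minimising cross-split confusion.

    `confusion[a][b]` is the count of class-a samples predicted as b.
    Returns (cross_confusion, left_metaclass, right_metaclass).
    """
    best = None
    for size in range(1, len(classes) // 2 + 1):
        for left in combinations(classes, size):
            right = tuple(c for c in classes if c not in left)
            cross = sum(confusion[a][b] + confusion[b][a]
                        for a in left for b in right)
            if best is None or cross < best[0]:
                best = (cross, left, right)
    return best
```

For four classes where A/B and C/D are mutually confusable, the split {A,B} vs {C,D} is recovered, mirroring one level of the paper's tree.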

The variety of social networks and virtual communities has created problems for users of different ages and preferences; in addition, since the true nature of groups is not clearly outlined, users are uncertain about joining various virtual groups and often end up joining undesired ones. As a solution, in this study we introduce a hybrid community recommender system that offers customized recommendations based on user preferences. Although techniques such as content-based filtering and collaborative filtering are available, they are not efficient enough and in some cases cause problems and impose limitations on users. Our method combines content-based filtering and collaborative filtering: it selects relevant user features based on supervised entropy and employs association rules and a classification method. Assuming that users in each community or group share similar characteristics, heterogeneous members are identified and removed by hierarchical clustering. Unlike other methods, ours is also applicable to users who have just joined the social network and have no connections or group memberships; in such situations, the method can still offer recommendations.