Papers by Robert Harrison

Protein Engineering Design & Selection, Jun 1, 1999
Drug-resistant mutants of HIV-1 protease limit the long-term effectiveness of current anti-viral therapy. In order to study drug resistance, the wild-type HIV-1 protease and the mutants R8Q, V32I, M46I, V82A, V82I, V82F, I84V, V32I/I84V and M46I/I84V were modeled with the inhibitors saquinavir and indinavir using the program AMMP. A new screen term was introduced to reproduce more correctly the electron distribution of atoms. The atomic partial charge was represented as a delocalized charge distribution instead of a point charge. The calculated protease-saquinavir interaction energies showed a highly significant correlation of 0.79 with free energy differences derived from the measured inhibition constants for all 10 models. Three different protonation states of indinavir were evaluated. The best indinavir model included a sulfate and gave a correlation coefficient of 0.68 between the calculated interaction energies and free energies from inhibition constants for nine models. The exception was R8Q with indinavir, probably due to differences in the solvation energy. No significant correlation was found using the standard molecular mechanics terms. The incorporation of the new screen correction resulted in better prediction of the effects of inhibitors on resistant protease variants and has potential for selecting more effective inhibitors for resistant virus.
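The comparison described in this abstract, calculated interaction energies against free energies derived from inhibition constants, can be sketched as follows. This is a minimal illustration, not the authors' AMMP workflow: it assumes the standard relation ΔG = RT ln(Ki) and a plain Pearson correlation, and the function names `binding_free_energy` and `pearson_r` are hypothetical.

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
R_KCAL = 1.987e-3


def binding_free_energy(ki_molar, temperature_k=298.15):
    """Free energy from an inhibition constant: dG = R * T * ln(Ki).

    Ki is in mol/L; a smaller Ki gives a more negative dG (tighter binding).
    """
    return R_KCAL * temperature_k * math.log(ki_molar)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Given one calculated interaction energy and one measured Ki per protease variant, `pearson_r(energies, [binding_free_energy(k) for k in kis])` yields a coefficient of the kind the abstract reports.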
Biochemistry, Apr 15, 2016
Journal of Biological Chemistry, 1994

Preliminary crystal structure of Acinetobacter glutaminasificans glutaminase-asparaginase
Journal of Biological Chemistry, 1988
The preliminary structure of a glutaminase-asparaginase from Acinetobacter glutaminasificans is reported. The structure was determined at 3.0-Å resolution with a combination of phase information from multiple isomorphous replacement at 4-5-Å resolution and phase improvement and extension by two density modification techniques. The electron density map was fitted by a polypeptide chain that was initially polyalanine. This was subsequently replaced by a polypeptide with an amino acid sequence in agreement with the sizes and shapes of the side chain electron densities. The crystallographic R factor is 0.300 following restrained least squares refinement with data to 2.9-Å resolution. The A. glutaminasificans glutaminase-asparaginase subunit folds into two domains: the amino-terminal domain contains a five-stranded beta sheet surrounded by five alpha helices, while the carboxyl-terminal domain contains three alpha helices and less regular structure. The connectivity is not fully determined at present, due in part to the lack of a complete amino acid sequence. The A. glutaminasificans glutaminase-asparaginase structure has been used successfully to determine the relative orientations of the molecules in crystals of Pseudomonas 7A glutaminase-asparaginase, in crystals of Vibrio succinogenes asparaginase, and in a new crystal form of Escherichia coli asparaginase (space group I222, one subunit per asymmetric unit).
Journal of Medicinal Chemistry, 2013
Journal of Biological Chemistry, 1995
Journal of Biological Chemistry, 1996
Three-Dimensional Quantitative Structure Activity Relationships

Site-directed mutagenesis studies on the determinants of sugar specificity and cooperative behavior of human beta-cell glucokinase
Journal of Biological Chemistry, 1994
The determinants of sugar specificity and cooperative behavior of human beta-cell glucokinase were studied by mutating several active site residues and performing a steady-state kinetic analysis of the purified mutant and wild-type enzymes after their expression in Escherichia coli. Asn-204, Glu-256, and Glu-290 were predicted from molecular modeling to interact with the 3-OH, 4-OH, 2-OH, and 1-OH groups of glucose. Mutation of these residues resulted in enzymes with decreased values of kcat and increased values of Km for glucose, mannose, and 2-deoxyglucose. Lys-56 is also predicted to make an interaction with the side chain of Glu-256 and its mutation increased the Km for glucose, deoxyglucose, mannose, and fructose by 4-, 4-, 3-, and 10-fold, respectively, and also increased the kcat for fructose by 5-fold. The Ki values for N-acetylglucosamine and mannoheptulose for the wild-type enzyme were 0.2 and 0.8 mM, respectively, and mutation of glucose binding residues to alanine resulted in an increase of about 3 orders of magnitude in these Ki values. Mutation of residues that directly hydrogen bond glucose hydroxyls (Asn-204, Glu-256, and Glu-290) to alanine resulted in enzymes that did not exhibit cooperative behavior, but mutation of Lys-56 or other residues that do not directly contact glucose had no effect on the Hill coefficient. Only glucose and deoxyglucose exhibited cooperative behavior. The results 1) confirm the predictions of the model that Asn-204, Glu-256, and Glu-290 are important residues involved in catalysis and hydrogen bonding glucose hydroxyl groups, 2) provide evidence for a role of Lys-56 in hexose binding, and 3) are consistent with the cooperative behavior of glucokinase being mediated by interactions of other regions of the protein with the highly conserved active site glucose binding residues.
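The Hill coefficient mentioned in this abstract quantifies cooperativity in a steady-state rate curve. The sketch below uses the standard Hill equation, v = Vmax·S^h / (S0.5^h + S^h), as a generic illustration; it is not the fitting procedure used in the paper, and the function name `hill_rate` and its parameter names are hypothetical.

```python
def hill_rate(substrate, vmax, s_half, hill_n):
    """Steady-state rate from the Hill equation.

    substrate and s_half share the same concentration units.
    hill_n == 1 reduces to the Michaelis-Menten form (no cooperativity);
    hill_n > 1 gives the sigmoidal curve of positive cooperativity.
    """
    s_pow = substrate ** hill_n
    return vmax * s_pow / (s_half ** hill_n + s_pow)
```

At `substrate == s_half` the rate is half of `vmax` for any Hill coefficient, so a mutant that loses cooperativity changes the shape of the curve around that point rather than the half-saturation value itself.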
Factoring tertiary classification into binary classification improves neural network for protein secondary structure prediction
IGARSS 2004. 2004 IEEE International Geoscience and Remote Sensing (IEEE Cat. No.04CH37612)
Protein secondary structure prediction is one of the most important problems in bioinformatics research. When the traditional tertiary classifier is used in our neural network, 72% accuracy is reached. Since the neural network might not work very well in three-class classification for certain domains, the three-class problem is reduced to six binary class problems for the first time to carry

Knowledge Discovery in Bioinformatics, 2007
The desire to understand protein structure has produced many approaches over the last four decades since Blout et al. (1960) attempted to correlate the sequence information of amino acids with their structural elements (Casbon, 2002). Instead of costly and time-consuming experimental approaches, effective prediction methods have been developed continuously. With the help of growing databases and the evolutionary information available from multiple-sequence alignments, resources for secondary-structure prediction became abundant. Also, progress in machine learning technology provided various advanced tools for prediction. Among the many machine learning approaches, support vector machine (SVM) methods are the most recent to be used for structure prediction. SVMs perform successfully, but unlike other machine learning approaches, there is no systematic review of the SVM approach as applied to secondary-structure prediction. Therefore, this study focuses mainly on methods of predicting secondary structure based on support vector machines. The organization of this chapter is as follows. In Section 1.1, traditional secondary-structure prediction approaches are described. In Section 1.2, various SVM-based prediction methods are introduced. In Section 1.3, the performance of SVM methods is evaluated, and in Section 1.4, problems with the SVM approach and efforts to overcome them are discussed.
International Journal of Data Mining and Bioinformatics, 2011
We modified an existing association rule-based classifier, CPAR, to improve on traditional black-box machine learning approaches to transmembrane (TM) segment prediction. The modified classifier was improved further by combining it with an SVM. The experimental results indicate that this hybrid scheme offers biologically meaningful rules for TM/EM segment prediction while performing almost as well as the SVM method. The evaluation of sturdiness and the Receiver Operating Characteristic (ROC) curve analysis showed that this new scheme is robust and competitive with SVM on TM/EM segment prediction. The prediction server is available at http://bmcc2.cs.gsu.edu/~haeh2/.

Protein Engineering Design and Selection, 2003
Covalent attachment of hydrogen to the donor atom may not be an essential characteristic of stable hydrogen bonds. A positively charged particle (such as a proton), located between the two negatively charged residues, may lead to a stable interaction of the two negative residues. This paper analyzes close Asp-Glu pairs of residues in a large set of protein chains; 840 such pairs of residues were identified, of which 28% were stabilized by a metal ion, 12% by a positive residue nearby and 60% are likely to be stabilized by a proton. The absence of apparent structural constraints, secondary structure preferences, somewhat lower B-factors and a distinct correlation between pH and the minimal O-O distance in carboxylate pairs suggest that most of the abnormally close pairs could indeed be stabilized by a shared proton. Implications for protein stability and modeling are discussed.
International Journal of Computational Biology and Drug Design, 2009
Protein sequence motifs have the potential to determine the conformation, function and activities of proteins. In order to obtain protein sequence motifs that are universally conserved across protein family boundaries, our input dataset is extremely large, unlike those of most popular motif discovery algorithms. As a result, an efficient technique is required. We create two granular computing models to efficiently generate protein motif information that transcends protein family boundaries. We have performed a comprehensive comparison between the two models. In addition, we further combine the results from the FIK and FGK models to generate our best sequence motif information.

IEEE Transactions on Nanobioscience, 2006
Support vector machines (SVMs) have shown strong generalization ability in a number of application areas, including protein structure prediction. However, their poor comprehensibility hinders the success of the SVM for protein structure prediction. An explanation of how a decision is made is important for acceptance of machine learning technology, especially in applications such as bioinformatics. A reasonable interpretation is not only useful to guide the "wet experiments," but the extracted rules are also helpful for integrating computational intelligence with symbolic AI systems for advanced deduction. On the other hand, a decision tree has good comprehensibility. In this paper, a novel approach to rule generation for protein secondary structure prediction by integrating merits of both the SVM and decision tree is presented. This approach combines the SVM with a decision tree into a new algorithm called SVM_DT, which proceeds in three steps. This algorithm first trains an SVM. Then, a new training set is generated through careful selection from the output of the SVM. Finally, the obtained training set is used to train a decision tree learning system and to extract the corresponding rule sets. The results of the experiments of protein secondary structure prediction on the RS126 data set show that the comprehensibility of SVM_DT is much better than that of the SVM. Moreover, the generalization ability of SVM_DT is better than that of C4.5 decision trees and is similar to that of the SVM. Hence, SVM_DT can be used not only for prediction, but also for guiding biological experiments.

IEEE Transactions on Nanobioscience, 2005
Information about local protein sequence motifs is very important to the analysis of biologically significant conserved regions of protein sequences. These conserved regions can potentially determine the diverse conformation and activities of proteins. In this work, recurring sequence motifs of proteins are explored with an improved K-means clustering algorithm on a new dataset. The structural similarity of these recurring sequence clusters to produce sequence motifs is studied in order to evaluate the relationship between sequence motifs and their structures. To the best of our knowledge, the dataset used by our research is the most updated dataset among similar studies for sequence motifs. A new greedy initialization method for the K-means algorithm is proposed to improve traditional K-means clustering techniques. The new initialization method tries to choose suitable initial points, which are well separated and have the potential to form high-quality clusters. Our experiments indicate that the improved K-means algorithm satisfactorily increases the percentage of sequence segments belonging to clusters with high structural similarity. Careful comparison of sequence motifs obtained by the improved and traditional algorithms also suggests that the improved K-means clustering algorithm may discover some relatively weak and subtle sequence motifs, which are undetectable by the traditional K-means algorithms. Many biochemical tests reported in the literature show that these sequence motifs are biologically meaningful. Experimental results also indicate that the improved K-means algorithm generates more detailed sequence motifs representing common structures than previous research. Furthermore, these motifs are universally conserved sequence patterns across protein families, overcoming some weak points of other popular sequence motifs. The satisfactory result
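The idea of a greedy initialization that picks well-separated starting points with good cluster potential can be sketched as below. This is a simplified illustration under stated assumptions, not the procedure from the paper: the minimum-distance threshold `min_dist`, the optional `quality` scoring function, and the name `greedy_init` are all hypothetical, and Euclidean distance stands in for whatever measure the authors used.

```python
import math


def greedy_init(candidates, k, min_dist, quality=None):
    """Greedily pick up to k initial centers that are mutually well separated.

    candidates: sequence of points (tuples of floats).
    min_dist:   a candidate is accepted only if it lies at least this far
                (Euclidean) from every center accepted so far.
    quality:    optional scoring function; higher-scoring candidates are
                tried first. Without it, candidates are tried in input order.
    """
    if quality is not None:
        ordered = sorted(candidates, key=quality, reverse=True)
    else:
        ordered = list(candidates)

    centers = []
    for point in ordered:
        if all(math.dist(point, c) >= min_dist for c in centers):
            centers.append(point)
            if len(centers) == k:
                break
    return centers
```

The returned centers would then seed an ordinary K-means run. If fewer than `k` candidates pass the separation test, the function returns a shorter list, so a caller would need to lower `min_dist` or supply more candidates.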
Journal of Biological Chemistry, 1994

BMC Bioinformatics, 2018
Background: Drug resistance in HIV is the major problem limiting effective antiviral therapy. Computational techniques for predicting drug resistance profiles from genomic data can accelerate the appropriate choice of therapy. These techniques can also be used to select protease mutants for experimental studies of resistance and thereby assist in the development of next-generation therapies. Results: The machine learning produced highly accurate and robust classification of HIV protease resistance. Genotype data were mapped to the enzyme structure and encoded using Delaunay triangulation. Generative machine learning models trained on one inhibitor could classify resistance from other inhibitors with varying levels of accuracy. Generally, the accuracy was best when the inhibitors were chemically similar. Conclusions: Restricted Boltzmann Machines are an effective machine learning tool for classification of genomic and structural data. They can also be used to compare resistance profiles of different protease inhibitors.

A model architecture for Big Data applications using relational databases
2014 IEEE International Conference on Big Data (Big Data), 2014
Effective Big Data applications dynamically handle the retrieval of decisioned results based on stored large datasets efficiently. One effective method of requesting decisioned results, or querying, large datasets is the use of SQL and database management systems such as MySQL. But a problem with using relational databases to store huge datasets is the decisioned result retrieval time, which is often slow largely due to poorly written queries / decision requests. This work presents a model to re-architect Big Data applications in order to efficiently present decisioned results: lowering the volume of data being handled by the application itself, and significantly decreasing response wait times while allowing the flexibility and permanence of a standard relational SQL database, supplying optimal user satisfaction in today's Data Analytics world. In this paper we review a Big Data case study in the telecommunications field and use it to experimentally demonstrate the effectiveness of our approach.

Identifying essential features for the classification of real and pseudo microRNAs precursors using fuzzy decision trees
2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, 2010
MicroRNAs play an important role in post-transcriptional gene regulation. Experimental approaches to identify microRNAs are expensive and time-consuming. Computational approaches have proven to be useful for identifying microRNA candidates. Most approaches rely on features extracted from microRNA precursors (pre-microRNA) and their secondary structure. Selecting the appropriate set of features plays a critical role in improving the prediction accuracy of pre-microRNA candidates. This work aims to investigate the triplet elements encoding scheme and to identify essential features needed for the correct classification of pre-microRNAs. To achieve these goals, an extension of the triplet elements encoding scheme is introduced. Features extracted using the extended scheme were combined with global features introduced in the literature, and a fuzzy decision tree (FDT) is used as a classification and feature selection tool. Unlike previous machine-learning-based approaches, FDT produces a human-comprehensible classification model. The interpretability of the classification model provides a means to identify the essential features needed to recognize microRNA candidates and offers a better understanding of this problem. Our results indicate that the triplet elements scheme is not superior to any of its proposed extensions. Further analysis revealed that including the features extracted using the triplet elements scheme does not add any value to this classification problem but rather introduces some noisy features, and comparable classification results can be achieved by using only the six global features identified by FDT.