Iskander Akhmetov

Iskander Akhmetov is a Machine Learning Engineer at the Institute of Information and Computational Technologies and a second-year Ph.D. student at Kazakh-British Technical University in the specialty "Information Systems," within the Data Science educational program. The topic of his Ph.D. research is "Scientific text summarization approach development," and Dr. Alexander Gelbukh is the foreign supervisor of his Ph.D. work. Iskander Akhmetov holds a master's degree in technical sciences, specializing in "Information Systems."
Supervisors: Alexander Gelbukh and Alexander Pak

Papers by Iskander Akhmetov

The arXive dataset extract with high ROUGE score summaries generated by 5 different methods

The arXiv dataset extract of 17,038 scientific articles with high ROUGE score summaries was generated by five different methods. The study in our paper "Reaching for the Upper Bound of Extractive Summarization Methods ROUGE score" aimed to discover the upper bound of the ROUGE score for Extractive Summarization methods.

Handling data imbalance using CNN and LSTM in financial news sentiment analysis

2021 16th International Conference on Electronics Computer and Computation (ICECCO)

Greedy Optimization Method for Extractive Summarization of Scientific Articles

IEEE Access

This work presents a method for summarizing scientific articles from the arXiv and PubMed datasets using a greedy Extractive Summarization algorithm. We used the approach along with Variable Neighborhood Search (VNS) to estimate the upper bound on Extractive Text Summarization quality in terms of ROUGE scores. The algorithm first selects for the summary the sentences that contain the largest number of words with the highest TFIDF values, with the minimum document frequency parameter tuned for TFIDF vectorization. As a result, the method achieves ROUGE-1/ROUGE-2 scores of 0.43/0.12 and 0.40/0.13 on the arXiv and PubMed datasets, respectively. These results are comparable to those of state-of-the-art models that use complex neural network architectures, substantial computational resources, and large amounts of training data. In contrast, our method uses a straightforward statistical inference methodology.
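The greedy selection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`split_sentences`, `tokenize`, `greedy_summary`), the choice to treat each sentence as a "document" when computing TFIDF, the smoothed IDF formula, and the `top_words` cutoff are all assumptions made for illustration. The VNS step and ROUGE evaluation are not shown.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphabetic tokens only; no stop-word removal in this sketch.
    return re.findall(r"[a-z]+", sentence.lower())


def greedy_summary(text, max_sentences=2, top_words=5, min_df=1):
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: number of sentences that contain each word.
    df = Counter(w for toks in tokens for w in set(toks))
    # Term frequency over the whole text.
    tf = Counter(w for toks in tokens for w in toks)

    # Smoothed TFIDF weight per word; words below min_df are dropped.
    weights = {
        w: tf[w] * (math.log((1 + n) / (1 + df[w])) + 1)
        for w in tf
        if df[w] >= min_df
    }
    # Keep the highest-weighted words (ties broken alphabetically).
    keywords = set(sorted(weights, key=lambda w: (-weights[w], w))[:top_words])

    chosen, covered = [], set()
    for _ in range(min(max_sentences, n)):
        best, best_gain = None, 0
        for i, toks in enumerate(tokens):
            if i in chosen:
                continue
            # Gain: keywords in this sentence that are not yet covered.
            gain = len((set(toks) & keywords) - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break  # no remaining sentence adds a new keyword
        chosen.append(best)
        covered |= set(tokens[best]) & keywords

    # Return selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

Raising `min_df` drops rare words before ranking, which is one plausible reading of the "minimum document frequency parameter tuning" mentioned in the abstract.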

Topic-aware sentiment analysis

Using K-Means and Variable Neighborhood Search for Automatic Summarization of Scientific Articles

An Open-Source Lemmatizer for Russian Language based on Tree Regression Models

Res. Comput. Sci., 2020

In this article, we consider the problem of supervised morphological analysis using an approach that differs from its widely used industry analogs. The article describes a new method of lemmatization based on machine learning algorithms, in particular regression analysis algorithms, trained on the open grammatical dictionary of the Russian language. The obtained results were compared with existing alternative applications currently used for lemmatization in NLP tasks for the Russian language. The proposed method shows some potential for further development, as it has comparable quality but uses a relatively simple machine learning algorithm and, at the same time, is not rule-based and involves no manual work. The source code for our lemmatizer is publicly available.

Determining the Relationship Between the Letters in the Voynich Manuscript Splitting the Text into Parts

Advances in Soft Computing

Plagiarism Detection in Students’ Answers Using FP-Growth Algorithm

Advances in Soft Computing

Hospital Bed Demand Forecasting: A Case Study from Health Industry

POLIBITS, 2020

We describe our approach to hospital bed demand forecasting in Nur-Sultan (Kazakhstan) for new hospital building capacity planning and rationalization. Autoregression was used to project future population size. The hospital bed demand was forecasted using a regression model built on data on past years' population sizes and the respective demand, calculated from disease diagnosis and hospitalization rate information.
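The two-stage idea described in this abstract (project population with autoregression, then regress bed demand on population) can be sketched as follows. This is only an illustration under stated assumptions: a first-order autoregression with intercept, a simple one-variable linear regression, and hypothetical helper names (`fit_line`, `project_ar1`, `forecast_demand`). The numbers in the usage note are synthetic and are not taken from the paper.

```python
def fit_line(x, y):
    # Ordinary least squares for y = slope * x + intercept.
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def project_ar1(series, steps):
    # AR(1) with intercept (assumption): p[t+1] = a * p[t] + b,
    # fitted by regressing each value on its predecessor.
    a, b = fit_line(series[:-1], series[1:])
    projected, last = [], series[-1]
    for _ in range(steps):
        last = a * last + b
        projected.append(last)
    return projected


def forecast_demand(population, demand, steps):
    # Stage 1: project future population sizes.
    future_population = project_ar1(population, steps)
    # Stage 2: regress past demand on past population, then apply
    # the fitted line to the projected population.
    slope, intercept = fit_line(population, demand)
    return [slope * p + intercept for p in future_population]
```

For example, with the synthetic series `population = [100, 110, 120, 130]` and `demand = [50, 55, 60, 65]`, calling `forecast_demand(population, demand, 2)` extends both trends two periods ahead.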

Highly Language-Independent Word Lemmatization Using a Machine-Learning Classifier

Computacion y Sistemas, 2020

Lemmatization is a process of finding the base morphological form (lemma) of a word. It is an important step in many natural language processing, information retrieval, and information extraction tasks, among others. We present an open-source language-independent lemmatizer based on the Random Forest classification model. This model is a supervised machine-learning algorithm with decision trees that are constructed corresponding to the grammatical features of the language. This lemmatizer does not require any manual work for hard-coding of the rules, and at the same time, it is simple and interpretable. We compare the performance of our lemmatizer with that of the UDPipe lemmatizer on twenty-two out of twenty-five languages we work on for which UDPipe has models. Our lemmatization method shows good performance in different languages from various language groups, and it is easily extensible to other languages. The source code of our lemmatizer is publicly available.

An Open-Source Lemmatizer for Russian Language based on Tree Regression Models

Research in Computing Science, 2020
