Papers by Iskander Akhmetov
The arXive dataset extract with high ROUGE score summaries generated by 5 different methods
The arXiv dataset extract of 17,038 scientific articles with high ROUGE score summaries was generated by five different methods. The study in our paper "Reaching for the Upper Bound of Extractive Summarization Methods ROUGE score" aimed to discover the upper bound of the ROUGE score for Extractive Summarization methods.
Handling data imbalance using CNN and LSTM in financial news sentiment analysis
2021 16th International Conference on Electronics Computer and Computation (ICECCO)

IEEE Access
This work presents a method for summarizing scientific articles from the arXiv and PubMed datasets using a greedy Extractive Summarization algorithm. We used the approach along with Variable Neighborhood Search (VNS) to learn what top line exists for Extractive Text Summarization quality in terms of ROUGE scores. The algorithm first selects for the summary the sentences containing the maximum number of words with the higher TF-IDF values, with the minimum document frequency parameter tuned for TF-IDF vectorization. As a result, the method achieves ROUGE-1/ROUGE-2 scores of 0.43/0.12 and 0.40/0.13 on the arXiv and PubMed datasets, respectively. These results are comparable to those of state-of-the-art models that use complex neural network architectures, substantial computational resources, and large amounts of training data. In contrast, our method uses a straightforward statistical inference methodology.
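The greedy selection step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the treatment of each sentence as a "document" for TF-IDF, the smoothed IDF formula, and the `top_words` cutoff are all assumptions made for illustration, and the VNS refinement is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a simple stand-in for real preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_weights(sentences, min_df=1):
    """Weight each word by term frequency times a smoothed IDF.

    Assumption: every sentence is treated as one 'document'. Words found
    in fewer than min_df sentences are dropped, mirroring the minimum
    document frequency parameter mentioned in the abstract.
    """
    tokenized = [tokenize(s) for s in sentences]
    df = Counter(w for tokens in tokenized for w in set(tokens))
    tf = Counter(w for tokens in tokenized for w in tokens)
    n = len(sentences)
    return {
        w: tf[w] * (math.log((1 + n) / (1 + df[w])) + 1)
        for w in tf
        if df[w] >= min_df
    }


def greedy_summary(sentences, k=2, top_words=10, min_df=1):
    """Greedily pick k sentences covering the most high-TF-IDF words."""
    weights = tfidf_weights(sentences, min_df)
    # Hypothetical cutoff: only the top_words heaviest words count.
    remaining = set(sorted(weights, key=weights.get, reverse=True)[:top_words])
    chosen = []
    for _ in range(min(k, len(sentences))):
        best, best_hits = None, set()
        for i, sentence in enumerate(sentences):
            if i in chosen:
                continue
            hits = remaining & set(tokenize(sentence))
            if best is None or len(hits) > len(best_hits):
                best, best_hits = i, hits
        chosen.append(best)
        remaining -= best_hits  # words already covered no longer count
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `greedy_summary(sentences, k=3)` on a list of sentence strings returns three sentences in document order. Raising `min_df` discards rare words before selection, which is one plausible reading of the parameter tuning the abstract mentions.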
Topic-aware sentiment analysis
Using K-Means and Variable Neighborhood Search for Automatic Summarization of Scientific Articles
Res. Comput. Sci., 2020
In this article, we consider the problem of supervised morphological analysis using an approach that differs from the analogs widespread in industry. The article describes a new lemmatization method based on machine learning algorithms, in particular regression analysis, trained on the open grammatical dictionary of the Russian language. The results were compared with existing alternative applications currently used for lemmatization in Russian-language NLP. The proposed method shows some potential for further development: it has comparable quality but uses a relatively simple machine learning algorithm and, at the same time, is not rule-based and involves no manual work. The source code for our lemmatizer is publicly available.
Determining the Relationship Between the Letters in the Voynich Manuscript Splitting the Text into Parts
Advances in Soft Computing
Plagiarism Detection in Students’ Answers Using FP-Growth Algorithm
Advances in Soft Computing
POLIBITS, 2020
We describe our approach to forecasting hospital bed demand in Nur-Sultan (Kazakhstan) for capacity planning and rationalization of new hospital buildings. Autoregression was used to project future population size. Hospital bed demand was forecast using a regression model built on past years' population sizes and the corresponding demand, calculated from disease diagnosis and hospitalization rate information.
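The two-stage forecast outlined in this abstract might look roughly like the following Python sketch. It assumes a first-order autoregression and a single-predictor linear regression; the abstract states neither the model order nor the regression form, so these choices and all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def project_ar1(series, steps):
    """Fit x[t] = a + b * x[t-1] (assumed AR(1)) and iterate it forward."""
    a, b = fit_line(series[:-1], series[1:])
    projected, last = [], series[-1]
    for _ in range(steps):
        last = a + b * last
        projected.append(last)
    return projected


def forecast_demand(population, demand, steps):
    """Regress past demand on past population, then apply the fitted
    line to the autoregressively projected population."""
    a, b = fit_line(population, demand)
    return [a + b * p for p in project_ar1(population, steps)]
```

Here `population` and `demand` are equal-length yearly series, and `forecast_demand(population, demand, steps=5)` returns five projected demand values. A real study would need to validate the autoregression order and check the regression residuals.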

Computacion y Sistemas, 2020
Lemmatization is the process of finding the base morphological form (lemma) of a word. It is an important step in many natural language processing, information retrieval, and information extraction tasks, among others. We present an open-source language-independent lemmatizer based on the Random Forest classification model. This model is a supervised machine-learning algorithm with decision trees that are constructed corresponding to the grammatical features of the language. This lemmatizer does not require any manual work for hard-coding of the rules, and at the same time, it is simple and interpretable. We compare the performance of our lemmatizer with that of the UDPipe lemmatizer on the twenty-two of the twenty-five languages we work on for which UDPipe has models. Our lemmatization method shows good performance in different languages from various language groups, and it is easily extensible to other languages. The source code of our lemmatizer is publicly available.
Research in Computing Science, 2020
In this article, we consider the problem of supervised morphological analysis using an approach that differs from the analogs widespread in industry. The article describes a new lemmatization method based on machine learning algorithms, in particular regression analysis, trained on the open grammatical dictionary of the Russian language. The results were compared with existing alternative applications currently used for lemmatization in Russian-language NLP. The proposed method shows some potential for further development: it has comparable quality but uses a relatively simple machine learning algorithm and, at the same time, is not rule-based and involves no manual work. The source code for our lemmatizer is publicly available.