Pharmaceutical companies operate in a strictly regulated and highly risky environment in which a single slip can lead to serious financial implications. Accordingly, announcements of clinical trial results tend to determine the future course of events and are therefore closely monitored by the public. In this work, we provide statistical evidence that the promulgation of clinical trial results influences the market value of public pharmaceutical companies. Whereas most works focus on retrospective impact analysis, the present research aims to predict the numerical values of announcement-induced changes in stock prices. For this purpose, we develop a pipeline that includes a BERT-based model for extracting the sentiment polarity of announcements, a Temporal Fusion Transformer for forecasting the expected return, a graph convolution network for capturing event relationships, and gradient boosting for predicting the price change. The challenge of the problem lies in the inherently different patterns of responses to positive and negative announcements, reflected in a stronger and more pronounced reaction to negative news. Moreover, the occasional drop in stock prices after positive announcements underscores the counterintuitive nature of the price behavior. Importantly, we identify two crucial factors that should be considered within a predictive framework. The first is the size of the company's drug portfolio: companies with small drug diversification are more susceptible to a single announcement. The second is the network effect of events related to the same company or nosology. All findings and insights are obtained on the basis of one of the largest FDA (Food and Drug Administration) announcement datasets, consisting of 5436 clinical trial announcements from 681 companies over the last five years.
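As a rough illustration of how the final stage of such a pipeline could be wired together, the sketch below feeds precomputed sentiment scores, expected returns, and event-graph features into a gradient boosting regressor. The column names, input file, and the choice of scikit-learn's GradientBoostingRegressor are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch: predicting announcement-induced price change from
# precomputed features (sentiment, expected return, event-graph features).
# Column names, the input file, and the model choice are assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Hypothetical feature table: one row per clinical trial announcement.
df = pd.read_csv("announcements_features.csv")  # assumed file
feature_cols = [
    "sentiment_score",      # BERT-based sentiment polarity of the announcement
    "expected_return",      # forecast from a temporal model (e.g. a TFT)
    "event_graph_embed_0",  # features from a graph model of related events
    "event_graph_embed_1",
    "portfolio_size",       # number of drugs in the company's portfolio
]
X, y = df[feature_cols], df["price_change"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, max_depth=3)
model.fit(X_tr, y_tr)

print("MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```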
By 2050, around 70% of people will live in urban areas. Following target 11.2 of the UN SDG "Sustainable cities and communities", which calls for access to safe, affordable, accessible, and sustainable transport systems for all, the aim of this paper is to investigate the accessibility and connectivity of urban territories by public transport systems. The main emphasis of the research is on transport infrastructure that can be regarded as sustainable, including public transport. The quality of life in a large city is determined by the ability to get from one destination to another quickly and efficiently. To implement this task, a methodology has been developed for assessing the connectivity and accessibility of urban areas. The method, based on an intermodal transport graph, is demonstrated by assessing accessibility and connectivity in different districts of Saint Petersburg (Russia), Helsinki (Finland), Stockholm (Sweden), and Amsterdam (Netherlands). The results are presented as graphs in which clusters of city blocks appear as points. The analysis shows that different areas of a city differ markedly in how well they are connected in terms of travel time. The method can be used to make urban planning decisions about the provision of urban infrastructure, allows for ongoing monitoring of the situation, and helps to identify and fill infrastructure gaps.
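A minimal sketch of the kind of graph-based accessibility computation described above, using networkx: city blocks and stops become nodes, intermodal links (walking, public transport) become edges weighted by travel time, and shortest-path times give a simple per-block accessibility measure. All node names, edge weights, and the scoring rule are invented for illustration.

```python
# Minimal sketch: travel-time accessibility on an intermodal transport graph.
# Nodes represent city blocks / stops; edge weights are travel times in minutes.
# All data below are invented for illustration.
import networkx as nx

G = nx.Graph()
# mode attribute distinguishes walking links from public transport links
G.add_edge("block_A", "metro_1", weight=5, mode="walk")
G.add_edge("metro_1", "metro_2", weight=12, mode="metro")
G.add_edge("metro_2", "block_B", weight=4, mode="walk")
G.add_edge("block_A", "block_B", weight=45, mode="walk")  # direct walk, slow

# Shortest travel time between two blocks over the intermodal graph
t = nx.shortest_path_length(G, "block_A", "block_B", weight="weight")
print(f"block_A -> block_B: {t} min")

# A simple accessibility score for a block: mean travel time to all other blocks
blocks = [n for n in G.nodes if n.startswith("block")]
times = nx.single_source_dijkstra_path_length(G, "block_A", weight="weight")
accessibility = sum(times[b] for b in blocks if b != "block_A") / (len(blocks) - 1)
print("mean travel time from block_A:", accessibility, "min")
```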
The size and complexity of deep neural networks continue to grow exponentially, significantly increasing energy consumption for training and inference by these models. We introduce an open-source package, eco2AI, to help data scientists and researchers track the energy consumption and equivalent CO2 emissions of their models in a straightforward way. In eco2AI we put emphasis on the accuracy of energy consumption tracking and correct regional CO2 emissions accounting. We encourage the research community to search for new optimal Artificial Intelligence (AI) architectures with a lower computational cost. The motivation also comes from the concept of an AI-based greenhouse gas sequestration cycle with both Sustainable AI and Green AI pathways. Keywords: ESG · Sustainable AI · Green AI · Sustainability · Ecology · Carbon footprint · CO2 emissions · GHG
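A minimal usage sketch of the eco2AI tracker as described in the package's documentation; parameter names follow its public API at the time of writing and may differ between versions.

```python
# Minimal sketch of tracking energy consumption and CO2 emissions with eco2AI.
# Parameter names follow the package's documented API; versions may differ.
from eco2ai import Tracker

tracker = Tracker(
    project_name="my_experiment",
    experiment_description="training a small model",
    file_name="emission.csv",   # results are appended to this CSV
)

tracker.start()
# ... place the training or inference code to be measured here ...
tracker.stop()  # records energy (kWh) and CO2-equivalent emissions to the CSV
```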
The study of the stock market using machine learning approaches is a major direction for revealing hidden market regularities. This knowledge contributes to a profound understanding of financial market dynamics and to behavioural insights that could hardly be discovered with traditional analytical methods. Stock prices are inherently interrelated with world events and social perception. Thus, in constructing a model for stock price prediction, the critical stage is to incorporate such information on the outside world, reflected in news and social media posts. To accommodate this, researchers leverage implicit or explicit knowledge representations: (1) sentiments extracted from the texts or (2) raw text embeddings. However, little research attention has been paid to a direct comparison of these approaches in terms of their influence on the predictive power of financial models. In this paper, we aim to close this gap and figure out whether semantic features in the form of contextual embeddings are more valuable than sentiment attributes for forecasting market trends. We consider a corpus of Twitter posts related to the largest companies by capitalization from NASDAQ and their close prices. To start, we demonstrate the connection between tweet sentiments and the volatility of companies' stock prices. Convinced of the existing relationship, we train Temporal Fusion Transformer models for price prediction supplemented with either tweet sentiments or tweet embeddings. Our results show that in the vast majority of cases, the use of sentiment features leads to higher metrics. Notably, the conclusions are justifiable within the considered scenario involving Twitter posts and stocks of the biggest tech companies.
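The comparison described above boils down to training the same forecasting model on two alternative feature sets. The hedged sketch below illustrates that experimental shape with a generic gradient boosting regressor standing in for the Temporal Fusion Transformer; the input file, column names, and split are assumptions.

```python
# Illustrative sketch: comparing sentiment features vs. text-embedding features
# as exogenous inputs for price forecasting. A gradient boosting model stands in
# for the Temporal Fusion Transformer used in the paper; data layout is assumed.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("tweets_and_prices.csv")  # assumed: one row per (ticker, day)

lag_cols = ["close_lag_1", "close_lag_2", "close_lag_3"]          # assumed price lags
sentiment_cols = ["tweet_sentiment_mean", "tweet_sentiment_std"]  # assumed
embedding_cols = [c for c in df.columns if c.startswith("tweet_emb_")]  # assumed

def evaluate(feature_cols):
    split = int(0.8 * len(df))  # simple chronological train/test split
    train, test = df.iloc[:split], df.iloc[split:]
    model = GradientBoostingRegressor().fit(train[feature_cols], train["close"])
    return mean_absolute_error(test["close"], model.predict(test[feature_cols]))

print("sentiment features MAE:", evaluate(lag_cols + sentiment_cols))
print("embedding features MAE:", evaluate(lag_cols + embedding_cols))
```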
Image Processing and Machine Learning Approaches for Petrographic Thin Section Analysis (Russian)
SPE Russian Petroleum Technology Conference, 2017
The article presents a methodology for petrographic thin section analysis that combines image processing and statistical learning algorithms. The methodology includes the structural description of thin sections and rock classification based on images obtained from a polarized optical microscope. To evaluate the properties of structural objects in a thin section (grains, cement, voids, cleavage), the objects are first segmented by a watershed method with advanced noise reduction that preserves grain boundaries. Analysis of the segmentation for test thin sections showed fairly accurate contouring of mineral grains, which makes it possible to automatically compute their key features (size, perimeter, contour features, elongation, orientation, etc.). The paper presents an example of particle size analysis: the definition of grain size classes. The roundness and rugosity coefficients of the grains are also estimated. A statistical analysis of templates for manual determination of roundness and rugosity coefficients revealed drawbacks of the examined templates in terms of statistical accuracy (high dispersion of the coefficients for grains within one template, presence of outliers). Within the classification problem, feature importance analysis and clustering of incorrectly segmented grains are handled. The classifier for rock type definition (sandstone, limestone, dolomite) is trained with a decision tree method, while the classifier of the mineral composition of sandstones (greywackes, arkoses) is trained with the random forest method. Both classifiers are trained in the feature space generated from segmented grains and their evaluated properties. As a result, we demonstrate the possibility of conducting automatic quantitative and qualitative analysis of thin sections using image processing and statistical learning methods.
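A compressed sketch of the segmentation-then-classification flow described above, using scikit-image's watershed and scikit-learn's random forest. The input image, thresholds, feature choices, and the aggregation into one descriptor per image are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: watershed segmentation of a thin-section image followed by
# classification of rock type from grain-level features. The input file,
# thresholds, features, and labels are illustrative only.
import numpy as np
from scipy import ndimage as ndi
from skimage import io, filters, measure, segmentation
from sklearn.ensemble import RandomForestClassifier

image = io.imread("thin_section.png", as_gray=True)  # assumed input image

# Denoise, threshold, and split touching grains with a marker-based watershed
smoothed = filters.gaussian(image, sigma=1.0)
binary = smoothed > filters.threshold_otsu(smoothed)
distance = ndi.distance_transform_edt(binary)
markers, _ = ndi.label(distance > 0.5 * distance.max())
labels = segmentation.watershed(-distance, markers, mask=binary)

# Per-grain features: size, perimeter, elongation, orientation
props = measure.regionprops(labels)
features = np.array(
    [[p.area, p.perimeter, p.eccentricity, p.orientation] for p in props]
)

# Aggregate grain features into one descriptor per image and classify rock type
descriptor = features.mean(axis=0, keepdims=True)
clf = RandomForestClassifier(n_estimators=100)
# X_train / y_train would come from many labelled thin sections (assumed):
# clf.fit(X_train, y_train)
# print(clf.predict(descriptor))
```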
Well Logging Verification Using Machine Learning Algorithms
Well logging analysis plays a crucial role in the design of oil field development. The analysis determines the location of the reservoir and its thickness, which directly defines the estimation of oil reserves. The present paper proposes an approach to the automation and verification of logging studies, namely reservoir identification along the wellbore, based on machine learning methods. Logging data for training were taken from a real oil field in Western Siberia. The paper describes the approach used for data pre-processing and key aspects of the data. In this study, we considered two methodologies for reservoir prediction: per sample, using a gradient boosting method, and per interval, using a one-dimensional convolutional neural network.
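A hedged sketch of the interval-based approach mentioned above: a small one-dimensional convolutional network mapping a window of log curves to per-depth reservoir/non-reservoir logits. The number of log channels, window length, and architecture are assumptions for illustration.

```python
# Sketch: 1D CNN for reservoir identification along the wellbore.
# Input: a window of log curves (channels x depth samples); output: per-depth
# probability of "reservoir". Architecture and shapes are illustrative.
import torch
import torch.nn as nn

class ReservoirCNN(nn.Module):
    def __init__(self, n_logs: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_logs, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(32, 1, kernel_size=1),  # per-depth logit
        )

    def forward(self, x):              # x: (batch, n_logs, depth)
        return self.net(x).squeeze(1)  # (batch, depth) logits

model = ReservoirCNN(n_logs=5)
logs = torch.randn(8, 5, 256)                    # dummy batch of log windows
labels = torch.randint(0, 2, (8, 256)).float()   # dummy reservoir masks
loss = nn.BCEWithLogitsLoss()(model(logs), labels)
loss.backward()
print("loss:", loss.item())
```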
Atomic structure and energy spectrum of Ga(As,P)/GaP heterostructures
Journal of Applied Physics, Oct 15, 2012
The atomic structure and energy spectrum of Ga(As,P)/GaP heterostructures were studied. It was shown that the deposition of GaAs of the same nominal thickness leads to the formation of pseudomorphic GaAs/GaP quantum wells (QW), fully relaxed GaAs/GaP self-assembled quantum dots (SAQDs), or pseudomorphic GaAsP/GaP SAQDs depending on the growth temperature. We demonstrate that the atomic structure of Ga(As,P)/GaP heterostructures is ruled by the temperature dependence of adatom diffusion rate and GaAs-GaP intermixing. The band alignment of pseudomorphic GaAs/GaP QW and GaAsP/GaP SAQDs is shown to be of type II, in contrast to that of fully relaxed GaAs/GaP SAQDs, which have the band alignment of type I with the lowest electronic states at the indirect L valley of the GaAs conduction band.
Advances in Systems Science and Applications, Jun 30, 2020
Distribution mixture models are widely used in cluster analysis. In particular, a mixture of Student t-distributions is often applied for robust data clustering. In this paper, we introduce an EM algorithm for a mixture of Student t-distributions in which variational Bayesian inference is applied at the E-step for parameter estimation. Based on the mixture of Student t-distributions, we construct a machine learning method that allows solving regression problems for any set of features, clustering, and anomaly detection within one model. Each of these problems can be solved by the model even if there are missing values in the data. The proposed method was tested on real data describing the PVT properties of reservoir fluids. The results obtained by the model do not contradict the basic physical properties. In the majority of the conducted experiments, our model gives more accurate results than well-known machine learning methods in terms of MAPE and RMSPE metrics.
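For background, the classical (non-variational) EM updates for a K-component mixture of multivariate Student-t distributions are sketched below; the paper replaces the E-step with variational Bayesian inference, so these formulas describe the standard algorithm rather than the authors' exact variant.

```latex
% E-step: responsibilities and latent precision weights
% (d = data dimension, \delta_{ik} = Mahalanobis distance of x_i to component k)
\tau_{ik} = \frac{\pi_k \, t_{\nu_k}\!\left(x_i \mid \mu_k, \Sigma_k\right)}
                 {\sum_{j=1}^{K} \pi_j \, t_{\nu_j}\!\left(x_i \mid \mu_j, \Sigma_j\right)},
\qquad
u_{ik} = \frac{\nu_k + d}{\nu_k + (x_i - \mu_k)^{\top} \Sigma_k^{-1} (x_i - \mu_k)}.

% M-step: mixture weights, means, and scale matrices
\pi_k = \frac{1}{n}\sum_{i=1}^{n} \tau_{ik},
\qquad
\mu_k = \frac{\sum_i \tau_{ik} u_{ik} \, x_i}{\sum_i \tau_{ik} u_{ik}},
\qquad
\Sigma_k = \frac{\sum_i \tau_{ik} u_{ik} (x_i - \mu_k)(x_i - \mu_k)^{\top}}{\sum_i \tau_{ik}}.
% The degrees of freedom \nu_k are updated numerically by solving a
% one-dimensional equation involving the digamma function.
```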
Verification of Field Data and Forecast Model Based on Variational Autoencoder in the Application to the Mechanized Fund
The importance of quality, consistent data is difficult to overestimate. The better the field data describes the real system, the higher the predictive ability of the models built on it at all levels and the higher the accuracy of production decisions. This issue is particularly relevant for data from the mechanized well stock. The paper presents both methods for real-time data analysis and an approach for retrospective analysis (analysis of historical data) applied to well data from electric submersible pump (ESP) operation. The key advantage of the model presented in the paper is that it considers a complex set of time dependencies, taking into account their mutual influence. To account for the dependencies between physical quantities and time, a model based on probabilistic neural networks has been developed that supports both retrospective filtering and filtering of data in streaming mode. The model follows the principle of the conditional variational autoencoder. This type of neural network can establish the main dependencies in the data with acceptable quality under the given conditions. A special feature of the model is its probabilistic nature: it is able to compute data distributions, as well as distributions of values at certain layers. The distribution at the model output is used to estimate the degree of abnormality of objects. Special data transformations, the introduction of weights, and the addition of key features make it possible to deal with missing values (often observed in field data due to sensor malfunction, peculiarities of measurement procedures, inconsistencies in time scales across measurements, etc.) and to capture important patterns in the data. The key point in working with features is a special scheme for preprocessing time series, namely the introduction of weights that depend on the measurement frequency and the calculation of weighted quantiles over different time intervals.
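A compact sketch of a conditional variational autoencoder of the kind described above, with reconstruction error used as a per-sample anomaly score. Layer sizes, the conditioning variables, and the scoring rule are assumptions for illustration, not the paper's exact model.

```python
# Sketch: conditional VAE for windows of sensor data; reconstruction error is
# used as an anomaly score. Sizes and conditioning are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalVAE(nn.Module):
    def __init__(self, x_dim=32, c_dim=4, z_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, 64), nn.ReLU())
        self.mu, self.logvar = nn.Linear(64, z_dim), nn.Linear(64, z_dim)
        self.dec = nn.Sequential(
            nn.Linear(z_dim + c_dim, 64), nn.ReLU(), nn.Linear(64, x_dim)
        )

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        x_hat = self.dec(torch.cat([z, c], dim=-1))
        return x_hat, mu, logvar

def loss_fn(x, x_hat, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

model = ConditionalVAE()
x = torch.randn(16, 32)   # dummy windows of sensor measurements
c = torch.randn(16, 4)    # dummy conditioning features (e.g. operating mode)
x_hat, mu, logvar = model(x, c)
loss = loss_fn(x, x_hat, mu, logvar)
loss.backward()

# Anomaly score: per-sample reconstruction error (higher = more anomalous)
score = ((x - x_hat) ** 2).mean(dim=1)
print(score.shape)
```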