Pharmaceutical companies operate in a strictly regulated and highly risky environment in which a single slip can lead to serious financial implications. Accordingly, announcements of clinical trial results tend to determine the future course of events and are therefore closely monitored by the public. In this work, we provide statistical evidence that the promulgation of clinical trial results influences the market value of public pharmaceutical companies. Whereas most works focus on retrospective impact analysis, the present research aims to predict the numerical values of announcement-induced changes in stock prices. For this purpose, we develop a pipeline that includes a BERT-based model for extracting the sentiment polarity of announcements, a Temporal Fusion Transformer for forecasting the expected return, a graph convolution network for capturing event relationships, and gradient boosting for predicting the price change. The challenge of the problem lies in the inherently different patterns of responses to positive and negative announcements, reflected in a stronger and more pronounced reaction to negative news. Moreover, the occasional drop in stock prices after positive announcements underscores the counterintuitive nature of the price behavior. Importantly, we identify two crucial factors that should be considered within a predictive framework. The first is the size of the company's drug portfolio: companies with small drug diversification are more susceptible to a single announcement. The second is the network effect of events related to the same company or nosology. All findings and insights are obtained on the basis of one of the largest FDA (Food and Drug Administration) announcement datasets, consisting of 5436 clinical trial announcements from 681 companies over the last five years.
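As a rough illustration of how the final stage of such a pipeline could be wired together, the sketch below feeds precomputed sentiment scores, expected returns, and event-graph features into a gradient boosting regressor. The column names, input file, and the choice of scikit-learn's GradientBoostingRegressor are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch: predicting announcement-induced price change from
# precomputed features (sentiment, expected return, event-graph features).
# Column names, the input file, and the model choice are assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Hypothetical feature table: one row per clinical trial announcement.
df = pd.read_csv("announcements_features.csv")  # assumed file
feature_cols = [
    "sentiment_score",      # BERT-based sentiment polarity of the announcement
    "expected_return",      # forecast from a temporal model (e.g. a TFT)
    "event_graph_embed_0",  # features from a graph model of related events
    "event_graph_embed_1",
    "portfolio_size",       # number of drugs in the company's portfolio
]
X, y = df[feature_cols], df["price_change"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, max_depth=3)
model.fit(X_tr, y_tr)

print("MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```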
By 2050, around 70% of people will live in urban areas. Following target 11.2 of the UN SDG "Sustainable cities and communities", which calls for access to safe, affordable, accessible, and sustainable transport systems for all, the aim of this paper is to investigate the accessibility and connectivity of urban territories by public transport systems. The main emphasis of the research is on transport infrastructure that can be regarded as sustainable, including public transport. The quality of life in a large city is determined by the ability to get from one destination to another quickly and efficiently. To implement this task, a methodology has been developed for assessing the connectivity and accessibility of urban areas. The method, based on an intermodal transport graph, is demonstrated by assessing accessibility and connectivity in different districts of Saint Petersburg (Russia), Helsinki (Finland), Stockholm (Sweden), and Amsterdam (Netherlands). The results are presented as graphs in which clusters of city blocks appear as points. The analysis shows that different areas of a city differ markedly in how well they are connected in terms of travel time. The method can be used to make urban planning decisions about the provision of urban infrastructure, allows for ongoing monitoring of the situation, and helps to identify and fill infrastructure gaps.
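A minimal sketch of the kind of graph-based accessibility computation described above, using networkx: city blocks and stops become nodes, intermodal links (walking, public transport) become edges weighted by travel time, and shortest-path times give a simple per-block accessibility measure. All node names, edge weights, and the scoring rule are invented for illustration.

```python
# Minimal sketch: travel-time accessibility on an intermodal transport graph.
# Nodes represent city blocks / stops; edge weights are travel times in minutes.
# All data below are invented for illustration.
import networkx as nx

G = nx.Graph()
# mode attribute distinguishes walking links from public transport links
G.add_edge("block_A", "metro_1", weight=5, mode="walk")
G.add_edge("metro_1", "metro_2", weight=12, mode="metro")
G.add_edge("metro_2", "block_B", weight=4, mode="walk")
G.add_edge("block_A", "block_B", weight=45, mode="walk")  # direct walk, slow

# Shortest travel time between two blocks over the intermodal graph
t = nx.shortest_path_length(G, "block_A", "block_B", weight="weight")
print(f"block_A -> block_B: {t} min")

# A simple accessibility score for a block: mean travel time to all other blocks
blocks = [n for n in G.nodes if n.startswith("block")]
times = nx.single_source_dijkstra_path_length(G, "block_A", weight="weight")
accessibility = sum(times[b] for b in blocks if b != "block_A") / (len(blocks) - 1)
print("mean travel time from block_A:", accessibility, "min")
```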
The size and complexity of deep neural networks continue to grow exponentially, significantly increasing energy consumption for training and inference by these models. We introduce an open-source package, eco2AI, to help data scientists and researchers track the energy consumption and equivalent CO2 emissions of their models in a straightforward way. In eco2AI we put emphasis on the accuracy of energy consumption tracking and correct regional CO2 emissions accounting. We encourage the research community to search for new optimal Artificial Intelligence (AI) architectures with a lower computational cost. The motivation also comes from the concept of an AI-based greenhouse gas sequestration cycle with both Sustainable AI and Green AI pathways. Keywords: ESG · Sustainable AI · Green AI · Sustainability · Ecology · Carbon footprint · CO2 emissions · GHG
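A minimal usage sketch of the eco2AI tracker as described in the package's documentation; parameter names follow its public API at the time of writing and may differ between versions.

```python
# Minimal sketch of tracking energy consumption and CO2 emissions with eco2AI.
# Parameter names follow the package's documented API; versions may differ.
from eco2ai import Tracker

tracker = Tracker(
    project_name="my_experiment",
    experiment_description="training a small model",
    file_name="emission.csv",   # results are appended to this CSV
)

tracker.start()
# ... place the training or inference code to be measured here ...
tracker.stop()  # records energy (kWh) and CO2-equivalent emissions to the CSV
```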
The study of the stock market using machine learning approaches is a major direction for revealing hidden market regularities. This knowledge contributes to a profound understanding of financial market dynamics and to behavioural insights that could hardly be discovered with traditional analytical methods. Stock prices are inherently interrelated with world events and social perception. Thus, in constructing a model for stock price prediction, the critical stage is to incorporate such information on the outside world, reflected in news and social media posts. To accommodate this, researchers leverage implicit or explicit knowledge representations: (1) sentiments extracted from the texts or (2) raw text embeddings. However, little research attention has been paid to a direct comparison of these approaches in terms of their influence on the predictive power of financial models. In this paper, we aim to close this gap and figure out whether semantic features in the form of contextual embeddings are more valuable than sentiment attributes for forecasting market trends. We consider a corpus of Twitter posts related to the largest companies by capitalization from NASDAQ and their close prices. To start, we demonstrate the connection between tweet sentiments and the volatility of companies' stock prices. Convinced of the existing relationship, we train Temporal Fusion Transformer models for price prediction supplemented with either tweet sentiments or tweet embeddings. Our results show that in the vast majority of cases, the use of sentiment features leads to higher metrics. Notably, the conclusions are justifiable within the considered scenario involving Twitter posts and stocks of the biggest tech companies.
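The comparison described above boils down to training the same forecasting model on two alternative feature sets. The hedged sketch below illustrates that experimental shape with a generic gradient boosting regressor standing in for the Temporal Fusion Transformer; the input file, column names, and split are assumptions.

```python
# Illustrative sketch: comparing sentiment features vs. text-embedding features
# as exogenous inputs for price forecasting. A gradient boosting model stands in
# for the Temporal Fusion Transformer used in the paper; data layout is assumed.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("tweets_and_prices.csv")  # assumed: one row per (ticker, day)

lag_cols = ["close_lag_1", "close_lag_2", "close_lag_3"]          # assumed price lags
sentiment_cols = ["tweet_sentiment_mean", "tweet_sentiment_std"]  # assumed
embedding_cols = [c for c in df.columns if c.startswith("tweet_emb_")]  # assumed

def evaluate(feature_cols):
    split = int(0.8 * len(df))  # simple chronological train/test split
    train, test = df.iloc[:split], df.iloc[split:]
    model = GradientBoostingRegressor().fit(train[feature_cols], train["close"])
    return mean_absolute_error(test["close"], model.predict(test[feature_cols]))

print("sentiment features MAE:", evaluate(lag_cols + sentiment_cols))
print("embedding features MAE:", evaluate(lag_cols + embedding_cols))
```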
Image Processing and Machine Learning Approaches for Petrographic Thin Section Analysis (Russian)
SPE Russian Petroleum Technology Conference, 2017
The article presents a methodology for petrographic thin section analysis that combines image processing and statistical learning algorithms. The methodology includes the structural description of thin sections and rock classification based on images obtained from a polarized optical microscope. To evaluate the properties of structural objects in a thin section (grains, cement, voids, cleavage), the objects are first segmented by a watershed method with advanced noise reduction that preserves grain boundaries. Analysis of the segmentation for test thin sections showed fairly accurate contouring of mineral grains, which makes it possible to automatically compute their key features (size, perimeter, contour features, elongation, orientation, etc.). The paper presents an example of particle size analysis: the definition of grain size classes. The roundness and rugosity coefficients of the grains are also estimated. A statistical analysis of templates for manual determination of roundness and rugosity coefficients revealed drawbacks of the examined templates in terms of statistical accuracy (high dispersion of the coefficients for grains within one template, presence of outliers). Within the classification problem, feature importance analysis and clustering of incorrectly segmented grains are handled. The classifier for rock type definition (sandstone, limestone, dolomite) is trained with a decision tree method, while the classifier of the mineral composition of sandstones (greywackes, arkoses) is trained with the random forest method. Both classifiers are trained in the feature space generated from segmented grains and their evaluated properties. As a result, we demonstrate the possibility of conducting automatic quantitative and qualitative analysis of thin sections using image processing and statistical learning methods.
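A compressed sketch of the segmentation-then-classification flow described above, using scikit-image's watershed and scikit-learn's random forest. The input image, thresholds, feature choices, and the aggregation into one descriptor per image are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: watershed segmentation of a thin-section image followed by
# classification of rock type from grain-level features. The input file,
# thresholds, features, and labels are illustrative only.
import numpy as np
from scipy import ndimage as ndi
from skimage import io, filters, measure, segmentation
from sklearn.ensemble import RandomForestClassifier

image = io.imread("thin_section.png", as_gray=True)  # assumed input image

# Denoise, threshold, and split touching grains with a marker-based watershed
smoothed = filters.gaussian(image, sigma=1.0)
binary = smoothed > filters.threshold_otsu(smoothed)
distance = ndi.distance_transform_edt(binary)
markers, _ = ndi.label(distance > 0.5 * distance.max())
labels = segmentation.watershed(-distance, markers, mask=binary)

# Per-grain features: size, perimeter, elongation, orientation
props = measure.regionprops(labels)
features = np.array(
    [[p.area, p.perimeter, p.eccentricity, p.orientation] for p in props]
)

# Aggregate grain features into one descriptor per image and classify rock type
descriptor = features.mean(axis=0, keepdims=True)
clf = RandomForestClassifier(n_estimators=100)
# X_train / y_train would come from many labelled thin sections (assumed):
# clf.fit(X_train, y_train)
# print(clf.predict(descriptor))
```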
Well Logging Verification Using Machine Learning Algorithms
Well logging analysis plays a crucial role in the design of oil field development. The analysis determines the location of the reservoir and its thickness, which directly defines the estimation of oil reserves. The present paper proposes an approach to the automation and verification of logging studies, namely reservoir identification along the wellbore, based on machine learning methods. Logging data for training were taken from a real oil field in Western Siberia. The paper describes the approach used for data pre-processing and key aspects of the data. In this study, we considered two methodologies for reservoir prediction: per sample, using a gradient boosting method, and per interval, using a one-dimensional convolutional neural network.
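A hedged sketch of the interval-based approach mentioned above: a small one-dimensional convolutional network mapping a window of log curves to per-depth reservoir/non-reservoir logits. The number of log channels, window length, and architecture are assumptions for illustration.

```python
# Sketch: 1D CNN for reservoir identification along the wellbore.
# Input: a window of log curves (channels x depth samples); output: per-depth
# probability of "reservoir". Architecture and shapes are illustrative.
import torch
import torch.nn as nn

class ReservoirCNN(nn.Module):
    def __init__(self, n_logs: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_logs, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(32, 1, kernel_size=1),  # per-depth logit
        )

    def forward(self, x):              # x: (batch, n_logs, depth)
        return self.net(x).squeeze(1)  # (batch, depth) logits

model = ReservoirCNN(n_logs=5)
logs = torch.randn(8, 5, 256)                    # dummy batch of log windows
labels = torch.randint(0, 2, (8, 256)).float()   # dummy reservoir masks
loss = nn.BCEWithLogitsLoss()(model(logs), labels)
loss.backward()
print("loss:", loss.item())
```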
Atomic structure and energy spectrum of Ga(As,P)/GaP heterostructures
Journal of Applied Physics, Oct 15, 2012
The atomic structure and energy spectrum of Ga(As,P)/GaP heterostructures were studied. It was shown that the deposition of GaAs of the same nominal thickness leads to the formation of pseudomorphic GaAs/GaP quantum wells (QW), fully relaxed GaAs/GaP self-assembled quantum dots (SAQDs), or pseudomorphic GaAsP/GaP SAQDs depending on the growth temperature. We demonstrate that the atomic structure of Ga(As,P)/GaP heterostructures is ruled by the temperature dependence of adatom diffusion rate and GaAs-GaP intermixing. The band alignment of pseudomorphic GaAs/GaP QW and GaAsP/GaP SAQDs is shown to be of type II, in contrast to that of fully relaxed GaAs/GaP SAQDs, which have the band alignment of type I with the lowest electronic states at the indirect L valley of the GaAs conduction band.
Advances in Systems Science and Applications, Jun 30, 2020
Distribution mixture models are widely used in cluster analysis. In particular, a mixture of Student t-distributions is often applied for robust data clustering. In this paper, we introduce an EM algorithm for a mixture of Student t-distributions in which variational Bayesian inference is applied at the E-step for parameter estimation. Based on the mixture of Student t-distributions, we construct a machine learning method that allows solving regression problems for any set of features, clustering, and anomaly detection within one model. Each of these problems can be solved by the model even if there are missing values in the data. The proposed method was tested on real data describing the PVT properties of reservoir fluids. The results obtained by the model do not contradict the basic physical properties. In the majority of the conducted experiments, our model gives more accurate results than well-known machine learning methods in terms of MAPE and RMSPE metrics.
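For background, the classical (non-variational) EM updates for a K-component mixture of multivariate Student-t distributions are sketched below; the paper replaces the E-step with variational Bayesian inference, so these formulas describe the standard algorithm rather than the authors' exact variant.

```latex
% E-step: responsibilities and latent precision weights
% (d = data dimension, \delta_{ik} = Mahalanobis distance of x_i to component k)
\tau_{ik} = \frac{\pi_k \, t_{\nu_k}\!\left(x_i \mid \mu_k, \Sigma_k\right)}
                 {\sum_{j=1}^{K} \pi_j \, t_{\nu_j}\!\left(x_i \mid \mu_j, \Sigma_j\right)},
\qquad
u_{ik} = \frac{\nu_k + d}{\nu_k + (x_i - \mu_k)^{\top} \Sigma_k^{-1} (x_i - \mu_k)}.

% M-step: mixture weights, means, and scale matrices
\pi_k = \frac{1}{n}\sum_{i=1}^{n} \tau_{ik},
\qquad
\mu_k = \frac{\sum_i \tau_{ik} u_{ik} \, x_i}{\sum_i \tau_{ik} u_{ik}},
\qquad
\Sigma_k = \frac{\sum_i \tau_{ik} u_{ik} (x_i - \mu_k)(x_i - \mu_k)^{\top}}{\sum_i \tau_{ik}}.
% The degrees of freedom \nu_k are updated numerically by solving a
% one-dimensional equation involving the digamma function.
```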
Verification of Field Data and Forecast Model Based on Variational Autoencoder in the Application to the Mechanized Fund
The importance of quality, consistent data is difficult to overestimate. The better the field data describes the real system, the higher the predictive ability of the models built on it at all levels and the higher the accuracy of production decisions. This issue is particularly relevant for data from the mechanized well stock. The paper presents both methods for real-time data analysis and an approach for retrospective analysis (analysis of historical data) applied to well data from electric submersible pump (ESP) operation. The key advantage of the model presented in the paper is that it considers a complex set of time dependencies, taking into account their mutual influence. To account for the dependencies between physical quantities and time, a model based on probabilistic neural networks has been developed that supports both retrospective filtering and filtering of data in streaming mode. The model follows the principle of the conditional variational autoencoder. This type of neural network can establish the main dependencies in the data with acceptable quality under the given conditions. A special feature of the model is its probabilistic nature: it is able to compute data distributions, as well as distributions of values at certain layers. The distribution at the model output is used to estimate the degree of abnormality of objects. Special data transformations, the introduction of weights, and the addition of key features make it possible to deal with missing values (often observed in field data due to sensor malfunction, peculiarities of measurement procedures, inconsistencies in time scales across measurements, etc.) and to capture important patterns in the data. The key point in working with features is a special scheme for preprocessing time series, namely the introduction of weights that depend on the measurement frequency and the calculation of weighted quantiles over different time intervals.
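A compact sketch of a conditional variational autoencoder of the kind described above, with reconstruction error used as a per-sample anomaly score. Layer sizes, the conditioning variables, and the scoring rule are assumptions for illustration, not the paper's exact model.

```python
# Sketch: conditional VAE for windows of sensor data; reconstruction error is
# used as an anomaly score. Sizes and conditioning are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalVAE(nn.Module):
    def __init__(self, x_dim=32, c_dim=4, z_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, 64), nn.ReLU())
        self.mu, self.logvar = nn.Linear(64, z_dim), nn.Linear(64, z_dim)
        self.dec = nn.Sequential(
            nn.Linear(z_dim + c_dim, 64), nn.ReLU(), nn.Linear(64, x_dim)
        )

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        x_hat = self.dec(torch.cat([z, c], dim=-1))
        return x_hat, mu, logvar

def loss_fn(x, x_hat, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

model = ConditionalVAE()
x = torch.randn(16, 32)   # dummy windows of sensor measurements
c = torch.randn(16, 4)    # dummy conditioning features (e.g. operating mode)
x_hat, mu, logvar = model(x, c)
loss = loss_fn(x, x_hat, mu, logvar)
loss.backward()

# Anomaly score: per-sample reconstruction error (higher = more anomalous)
score = ((x - x_hat) ** 2).mean(dim=1)
print(score.shape)
```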