Papers by Artem Polyvyanyy

CAiSE, 2024
Process discovery is the problem of automatically constructing a process model from an event log of an information system that supports the execution of a business process in an organisation. In this paper, we study how to construct models that, in addition to the control flow of the process, capture the importance, in terms of probabilities, of various execution scenarios of the process. Such probabilistic aspects of the process are instrumental in understanding the process and in predicting aspects of its future. We formally define the problem of stochastic process discovery, which aims to describe the processes captured in the event log. We study several implications of this definition and introduce two discovery techniques that return optimal solutions in the presence and absence of a model of the control flow of the process. The proposed discovery techniques have been implemented and are publicly available. Finally, we evaluate the feasibility and applicability of the new techniques and show that their models outperform models constructed using existing stochastic discovery techniques.
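
As a minimal illustration of the stochastic perspective described above, the sketch below estimates an empirical probability for each trace variant in a toy event log. The log contents and function names are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter

# Toy event log: each trace is a sequence of executed actions.
# Contents are illustrative only.
event_log = [
    ("register", "check", "approve"),
    ("register", "check", "approve"),
    ("register", "check", "reject"),
    ("register", "approve"),
]

def trace_distribution(log):
    """Map each trace variant to its relative frequency in the log."""
    counts = Counter(log)
    total = sum(counts.values())
    return {trace: n / total for trace, n in counts.items()}

for trace, p in trace_distribution(event_log).items():
    print(" -> ".join(trace), f"p = {p:.2f}")
```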

CAiSE Forum, 2024
Process mining studies ways to use event data generated by information systems to understand and improve the business processes of organizations. One of the core problems in process mining is process discovery. A process discovery algorithm takes event data as input and constructs a process model that describes the processes the data-generating system can execute. The discovered model, hence, aims to represent both the historical processes with traces in the data and the yet unseen processes of the system (total generalization). In this paper, we introduce process forecasting as an alternative approach to process discovery. First, given historical event data, the corresponding future event data is forecasted for a requested period in the future (event data forecasting). Then, a process model is constructed from the forecasted data to describe the processes the system is anticipated to execute during the target future period (process model forecasting). The benefits of this alternative approach are at least twofold. Firstly, it divides the problem into two fundamentally different sub-problems that can be studied and mastered separately. Secondly, a forecasted model that describes the processes of the system from a given period, rather than in general (tailored generalization), can help organizations plan future operations and process improvement initiatives.

IEEE Access, 2025
Enterprise systems, such as enterprise resource planning, customer relationship management, and supply chain management systems, are widely used in corporate sectors and are notorious for being large, inflexible, and monolithic. Their many application-specific methods are challenging to decouple manually because they manage asynchronous, user-driven business processes and business objects with complex structural relationships. We present an automated technique for identifying parts of enterprise systems that can run separately as fine-grained microservices in flexible and scalable Cloud systems. Our remodularization technique uses semantic properties of enterprise systems, i.e., domain-level business object and method relationships, together with syntactic features of the methods' code, e.g., their call patterns and structural similarity. Semantically, business objects derived from databases form the basis for prospectively clustering the methods that act on them into modules, while on a syntactic level, structural and interaction details between the methods themselves provide further insights into module dependencies for grouping, based on K-means clustering and optimization. Our technique was prototyped and validated using two open-source enterprise customer relationship management systems, SugarCRM and ChurchCRM. The empirical results demonstrate improved feasibility of remodularizing enterprise systems, inclusive of coded business objects and methods, compared to microservices constructed using class-level decoupling of business objects only. Furthermore, the recommended microservices, integrated with "backend" enterprise systems, demonstrate improvements in execution efficiency, scalability, and availability.
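
To make the grouping step concrete, here is a minimal sketch, assuming hand-crafted method feature vectors, of clustering methods into candidate microservice modules with K-means. The features, method names, and choice of k are illustrative stand-ins, not the technique's actual feature extraction and optimization.

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row is a feature vector for one method: illustrative features only,
# e.g., which business object it touches, outgoing-call count, and a
# structural-similarity score.
methods = ["Invoice.create", "Invoice.send", "Customer.add", "Customer.merge"]
features = np.array([
    [1, 0, 3, 0.9],   # touches Invoice
    [1, 0, 2, 0.9],
    [0, 1, 1, 0.8],   # touches Customer
    [0, 1, 2, 0.8],
])

# Group methods into k candidate microservice modules.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
for method, module in zip(methods, kmeans.labels_):
    print(f"{method} -> module {module}")
```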

CAiSE, 2025
Process discovery studies ways to construct process models from event logs of historical executions of a system. While discovered models aim to describe the system, process model forecasting aims to construct models that faithfully describe the executions the system will perform in a given period in the future, informing timely system improvements. Existing approaches tackle the problem of process model forecasting by decomposing it into multiple univariate time series forecasting problems. They forecast each directly-follows constraint over a pair of process activities separately and then aggregate these individual forecasts into the resulting process model. In this paper, we propose a deep learning-based approach that leverages multivariate time series forecasting to solve the process model forecasting problem. Our method learns dependencies across all activity constraints simultaneously, generating an integrated forecast of the entire model at once. Through an evaluation over industrial event logs, we demonstrate that this approach significantly outperforms existing baselines and statistical multivariate methods in accuracy. Additionally, we introduce a new measure to evaluate the structural correctness of the forecasted models. In the context of information systems engineering, our work addresses the challenge of predicting process models to support future process planning and optimization.
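
The multivariate idea can be illustrated with one of the classical statistical baselines mentioned above: a vector autoregression over the per-interval counts of all directly-follows constraints, forecast jointly. This is a hedged stand-in, not the paper's deep learning architecture, and the series is synthetic.

```python
import numpy as np
from statsmodels.tsa.api import VAR

# Illustrative multivariate series: one column per directly-follows
# constraint (a->b, b->c, a->c), one row per time interval, values are
# counts of how often each constraint was observed in that interval.
rng = np.random.default_rng(0)
history = rng.poisson(lam=[20, 15, 5], size=(60, 3)).astype(float)

# Fit a vector autoregression: each constraint's forecast depends on the
# recent history of *all* constraints, not just its own.
model = VAR(history)
results = model.fit(maxlags=4)

# Forecast the next 8 intervals for all constraints jointly; the
# forecasted counts can then be assembled into a forecasted model.
forecast = results.forecast(history[-results.k_ar:], steps=8)
print(forecast.round(1))
```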

Information and Software Technology, 2025
Context:
With the acknowledged benefits of microservices architectures, such as scalability, flexibility, improved maintenance, and deployment, legacy software systems are increasingly being reengineered into microservices. Recently, a plethora of methods, techniques, tools, and evaluation criteria for reengineering software systems into microservices have been proposed without being systematized.
Objectives:
The objective of this work is to conduct an in-depth systematic literature review to identify and analyze methods, techniques, and tools for reengineering software systems into microservices and the ways for evaluating such reengineering initiatives and their results.
Methods:
A systematic literature review of works on reengineering software systems into microservices was performed, yielding 117 primary studies. The review focused on addressing key research questions concerning the evolution of microservices reengineering, methodologies employed, tools available, and the challenges faced in the reengineering process. We used a taxonomy development method to systematize knowledge in these areas.
Results:
The analysis revealed multiple reengineering approaches: static, dynamic, hybrid, and artifact-driven. Significant evaluation criteria identified include coupling, cohesion, and modularity. Key paradigms for microservices reengineering, such as domain-driven design and interface analysis, were identified and discussed. The study also highlights that incremental and iterative transitions are favored in practice.
Conclusion:
This study provides a structured overview of the current state of research on reengineering software systems into microservices. It highlights challenges in existing reengineering methodologies. Future directions include validating behavioral equivalence of original and reengineered systems, automating microservices generation, and refining database layer partitioning. The findings emphasize the need for further work to enhance the reengineering process and evaluation of the transition between monolithic and microservices architectures.

Information Systems, 2025
Process mining (PM)-based goal recognition (GR) techniques, which infer goals or targets based on sequences of observed actions, have shown efficacy in real-world engineering applications. This study explores the applicability of PM-based GR in identifying target poses for users of powered transhumeral prosthetics. These prosthetics are designed to restore missing anatomical segments below the shoulder, including the hand. In this article, we aim to apply GR techniques to identify the intended movements of users, enabling the motors on a powered transhumeral prosthesis to execute the desired motions precisely. In this way, a powered transhumeral prosthesis can assist individuals with disabilities in completing movement tasks. PM-based GR techniques were initially designed to infer goals from sequences of observed actions, where discrete event names represent actions. However, the electromyography electrodes and kinematic sensors on powered transhumeral prosthetic devices register sequences of continuous, real-valued data measurements. Therefore, we rely on methods that transform sensor data into discrete events and integrate these methods with the PM-based GR system to develop target pose recognition approaches. Two data transformation approaches are introduced. The first relies on clustering data measurements collected before the target pose is reached (the clustering approach). The second uses the time series of measurements collected during dynamic user movement to perform linear discriminant analysis (LDA) classification and identify discrete events (the dynamic LDA approach). These methods are evaluated through offline and human-in-the-loop (online) experiments and compared with established techniques, such as static LDA, an LDA classification based on data collected at static target poses, and GR approaches based on neural networks. Real-time human-in-the-loop experiments further validate the effectiveness of the proposed methods, demonstrating that PM-based GR with the dynamic LDA classifier achieves a superior F1 score and balanced accuracy compared to state-of-the-art techniques.
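
The discretization step can be sketched as follows: an LDA classifier maps windows of continuous sensor readings to discrete event labels that a PM-based goal recognizer can then consume. The synthetic data, feature dimensions, and label names are illustrative assumptions, not the study's EMG/kinematic setup.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustrative stand-in for EMG/kinematic sensor readings: each row is a
# window of continuous measurements, each label a discrete movement event.
rng = np.random.default_rng(1)
X_rest = rng.normal(0.0, 0.3, size=(50, 8))
X_flex = rng.normal(1.0, 0.3, size=(50, 8))
X = np.vstack([X_rest, X_flex])
y = np.array(["rest"] * 50 + ["flex"] * 50)

# LDA turns continuous sensor windows into discrete event labels, which a
# PM-based goal recognizer can consume as an observed action sequence.
clf = LinearDiscriminantAnalysis().fit(X, y)
window = rng.normal(1.0, 0.3, size=(1, 8))
print(clf.predict(window))  # e.g., ['flex'] becomes one discrete event
```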

CAiSE, 2025
Process discovery studies algorithms for constructing process models that describe the control flow of the systems that generated given event logs, where an event log is a recording of executed traces of a system, with each trace captured as a sequence of executed actions. Traditional process discovery relies on an event log recorded and stored in a centralized repository. However, in distributed environments, such as cross-organizational process discovery, this centralization raises concerns about data availability, privacy, and high communication and bandwidth demands. To address these challenges, this paper introduces a novel Federated Stochastic Process Discovery (FSPD) approach. FSPD avoids centralizing event logs by retaining them in decentralized silos, at the organizations where they were originally recorded. Process discovery is then performed locally within each organization on its event log, and the resulting local models are shared with a central server for aggregation into a global model. Our evaluations on industrial event logs demonstrate that FSPD effectively constructs global process models while preserving organizational autonomy and privacy, providing a scalable and robust solution for process discovery in distributed settings.
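
A minimal sketch of the local-discovery-plus-aggregation idea, assuming local models are stochastic directly-follows counts and the server simply sums them. The silo logs and the unweighted merge are illustrative assumptions, not FSPD's actual protocol.

```python
from collections import Counter
from itertools import pairwise  # Python 3.10+

# Each organization keeps its event log in its own silo and shares only
# aggregate directly-follows counts, never the raw traces.
def local_model(log):
    """Discover a local stochastic model as directly-follows counts."""
    counts = Counter()
    for trace in log:
        counts.update(pairwise(trace))
    return counts

silo_a = [("a", "b", "c"), ("a", "b", "b", "c")]
silo_b = [("a", "c"), ("a", "b", "c")]

# The central server aggregates the local models into one global model.
global_model = local_model(silo_a) + local_model(silo_b)
total = sum(global_model.values())
for (x, y), n in sorted(global_model.items()):
    print(f"{x} -> {y}: {n} ({n / total:.2f})")
```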

Knowledge and Information Systems, 2024
Process mining aids organisations in improving their operational processes by providing visualisations and algorithms that turn event data into insights. How often behaviour occurs in a process (the stochastic perspective) is important for simulation, recommendation, enhancement, and other types of analysis. Although the stochastic perspective is important, the focus is often on control flow. Stochastic conformance checking techniques assess the quality of stochastic process models and/or event logs against one another. In this paper, we address three limitations of existing stochastic conformance checking techniques: the inability to handle uncertain event data (e.g., events having only a date), the exponential blow-up in computation time due to the analysis of all interleavings of concurrent behaviour, and the problem of loops that can be unfolded infinitely often. To address these challenges, we provide bounds for conformance measures and use partial orders to encode behaviour. An open-source implementation is provided, which we use to illustrate and evaluate the practical feasibility of the approach.

ICPM Demos, 2024
This paper extends Entropia, a command-line tool for performing conformance checking between process models and corresponding event logs. The extension introduces functionalities for estimating the generalization of a process model represented as a directly-follows graph using the bootstrap generalization method and for evaluating the representativeness of an event log.

PQMI@ICPM 2024, 2024
A sub-field of process mining, conformance checking, quantifies how well the process behavior of a model represents the observed behavior recorded in a log. A stochastic-aware perspective that accounts for the probability of behavior in both the model and the log is necessary to support conformance checking. However, existing stochastic conformance checking measures do not support a broad framework that includes log-to-log (L2L), log-to-model (L2M), and model-to-model (M2M) comparison settings. Therefore, we propose a stochastic conformance checking measure based on the Jensen-Shannon Distance (JSD), which interprets models and logs as probability distributions over traces. It can be applied to perform L2L, L2M, and M2M conformance checking, though the latter requires approximation. Notably, it is the only known stochastic conformance measure that is a metric. JSD has been implemented and is publicly available. Our quantitative evaluations show the feasibility of computing JSD over real-life event logs and that it provides diagnostic results different from those of existing measures. Moreover, experiments in the M2M setting confirm that our measure can be approximated using unbiased sampling.
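
Because the measure interprets logs and models as probability distributions over traces, the core computation can be illustrated with SciPy's Jensen-Shannon distance. The two toy distributions below are illustrative, and the finite trace universe sidesteps the approximation needed in the M2M setting.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Two stochastic languages over a shared (illustrative) trace universe;
# a trace absent from one side simply gets probability 0 there.
traces     = ["a b c", "a c", "a b b c"]
log_dist   = np.array([0.6, 0.3, 0.1])  # relative trace frequencies in a log
model_dist = np.array([0.5, 0.5, 0.0])  # trace probabilities of a model

# Jensen-Shannon distance: the square root of the JS divergence, which is
# symmetric, bounded, and satisfies the triangle inequality (a metric).
d = jensenshannon(log_dist, model_dist, base=2)
for t, p, q in zip(traces, log_dist, model_dist):
    print(f"{t:10} log={p:.1f} model={q:.1f}")
print(f"JSD = {d:.3f}")  # 0 would mean identical stochastic languages
```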

CAiSE, 2024
Starting with a collection of traces generated by process executions, process discovery is the task of constructing a simple model that describes the process, where simplicity is often measured in terms of model size. The challenge of process discovery is that the process of interest is unknown and that, while the input traces constitute positive examples of process executions, no negative examples are available. Many commercial tools discover Directly-Follows Graphs, in which nodes represent the observable actions of the process and directed arcs indicate execution order possibilities over the actions. We propose a new approach for discovering sound Directly-Follows Graphs that is grounded in grammatical inference over the input traces. To promote the discovery of small graphs that also describe the process accurately, we design and evaluate a genetic algorithm that supports the convergence of the inference parameters to the areas that lead to the discovery of interesting models. Experiments over real-world datasets confirm that our new approach can construct smaller models that represent the input traces and their frequencies more accurately than the state-of-the-art technique. Reasoning over the frequencies of encoded traces also becomes possible due to the stochastic semantics of the action graphs we propose, which, for the first time, are interpreted as models that describe the stochastic languages of action traces.

Information Systems, 2023
Process mining studies ways to improve real-world processes using historical event data generated by IT systems that support business processes of organisations. Given an event log of an IT system, process discovery algorithms construct a process model representing the processes recorded in the log, while conformance checking techniques quantify how well the discovered model achieves this objective. State-of-the-art discovery and conformance techniques either completely ignore, or consider but hide from the users, information about the likelihood of process behaviour. That is, the vast majority of existing process discovery algorithms construct process models that are not stochastic-aware. Consequently, few conformance checking techniques can assess how well such discovered models describe the relative likelihoods of traces recorded in the log or how well they represent the likelihood of future traces generated by the same system. Note that this is necessary to support process simulation, prediction, and recommendation. Furthermore, stochastic information can provide business analysts with further actionable insights on frequent and rare conformance issues. This article presents precision and recall measures based on the notion of entropy of stochastic automata, which are capable of quantifying and, hence, differentiating between frequent and rare deviations of an event log and a process model that is enriched with information on the relative likelihoods of the traces it describes. An evaluation over several real-world datasets that uses our open-source implementation of the measures demonstrates the feasibility of using our precision and recall measures in industrial settings. Finally, we propose a range of intuitive desired properties that stochastic precision and recall measures should possess, and we study our and other existing stochastic-aware conformance measures with respect to these properties.
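
To give a feel for the entropy notion underlying these measures, the sketch below computes the Shannon entropy of a finite trace distribution. The article's measures are defined over stochastic automata, which can describe infinite stochastic languages, so this finite sketch is only a simplified illustration.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy (in bits) of a finite trace distribution."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # terms with zero probability contribute nothing
    return float(-(p * np.log2(p)).sum())

# Illustrative relative likelihoods of trace variants in a log.
log_dist = [0.6, 0.3, 0.1]
print(f"H(log) = {entropy(log_dist):.3f} bits")
```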

Information Systems, 2023
Prescriptive process monitoring methods seek to control the execution of a business process by triggering interventions, at runtime, to optimize one or more performance measures such as cycle time or defect rate. Examples of interventions include using a premium shipping service to reduce cycle time in an order-to-cash process or offering better loan conditions to increase the acceptance rate in a loan origination process. Each of these interventions comes with a cost. Thus, it is important to carefully select the set of cases to which an intervention is applied. This paper proposes a prescriptive process monitoring method that incorporates causal inference techniques to estimate the causal effect of triggering an intervention on each ongoing case of a process. Based on this estimate, the method triggers interventions according to a user-defined policy, taking into account the net gain of the interventions. The method is evaluated on four real-life data sets.

arXiv (Cornell University), Sep 20, 2019
Organizations can benefit from the use of practices, techniques, and tools from the area of business process management. Through the focus on processes, they create process models that require management, including support for versioning, refactoring, and querying. Querying thus far has primarily focused on structural properties of models rather than on exploiting behavioral properties that capture aspects of model execution. While the latter is more challenging, it is also more effective, especially when models are used for auditing or process automation. The focus of this paper is to overcome the challenges associated with behavioral querying of process models in order to unlock its benefits. The first challenge concerns determining the decidability of the building blocks of the query language, which are the possible behavioral relations between process tasks. The second challenge concerns achieving acceptable performance of query evaluation. The evaluation of a query may require expensive checks in all process models, of which there may be thousands. In light of these challenges, this paper proposes a special-purpose programming language, namely Process Query Language (PQL), for behavioral querying of process model collections. The language relies on a set of behavioral predicates between process tasks, whose usefulness has been empirically evaluated with a pool of process model stakeholders. This study resulted in a selection of the predicates to be implemented in PQL, whose decidability has also been formally proven. The computational performance of the language has been extensively evaluated through a set of experiments against two large process model collections.

arXiv (Cornell University), Mar 6, 2023
Increasing the success rate of a process, i.e., the percentage of cases that end in a positive outcome, is a recurrent process improvement goal. At runtime, there are often certain actions (a.k.a. treatments) that workers may execute to lift the probability that a case ends in a positive outcome. For example, in a loan origination process, a possible treatment is to issue multiple loan offers to increase the probability that the customer takes a loan. Each treatment has a cost. Thus, when defining policies for prescribing treatments to cases, managers need to consider the net gain of the treatments. Also, the effect of a treatment varies over time: treating a case earlier may be more effective than treating it later. This paper presents a prescriptive monitoring method that automates this decision-making task. The method combines causal inference and reinforcement learning to learn treatment policies that maximize the net gain. The method leverages a conformal prediction technique to speed up the convergence of the reinforcement learning mechanism by separating cases that are likely to end in a positive or negative outcome from uncertain cases. An evaluation on two real-life datasets shows that the proposed method outperforms a state-of-the-art baseline.
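
The case-separation idea can be sketched with split conformal prediction: calibrate a nonconformity threshold on held-out cases, then flag a case as certain when exactly one outcome fits under the threshold. The classifier, synthetic features, and coverage level below are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic ongoing-case features and binary outcomes (illustrative only).
rng = np.random.default_rng(2)
X = rng.normal(size=(600, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=600) > 0).astype(int)
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.3,
                                              random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_fit, y_fit)

# Nonconformity score on the calibration set: 1 - P(true outcome).
cal_probs = clf.predict_proba(X_cal)
scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]
q = np.quantile(scores, 0.9)  # calibrated threshold at 90% coverage

# A case is "certain" if exactly one outcome fits under the threshold;
# uncertain cases are the ones the RL mechanism should focus on.
case = rng.normal(size=(1, 5))
p = clf.predict_proba(case)[0]
prediction_set = [label for label in (0, 1) if 1.0 - p[label] <= q]
print("certain" if len(prediction_set) == 1 else "uncertain", prediction_set)
```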

Prescriptive process monitoring based on causal effect estimation
Information Systems

Lecture notes in business information processing, 2022
User interaction logs allow us to analyze the execution of tasks in a business process at a finer level of granularity than event logs extracted from enterprise systems. The fine-grained nature of user interaction logs opens up a number of use cases. For example, by analyzing such logs, we can identify best practices for executing a given task in a process, or we can elicit differences in performance between workers or between teams. Furthermore, user interaction logs allow us to discover repetitive and automatable routines that occur during the execution of one or more tasks in a process. Along this line, this chapter introduces a family of techniques, called Robotic Process Mining (RPM), for discovering repetitive routines that can be automated using robotic process automation technology. The chapter presents a structured landscape of concepts and techniques for RPM, including techniques for user interaction log preprocessing, techniques for discovering frequent routines, notions of routine automatability, and techniques for synthesizing executable routine specifications for robotic process automation.

arXiv (Cornell University), Dec 2, 2022
The problem of process discovery in process mining studies ways to construct process models that encode the business processes that induced the event data recorded by IT systems. Most existing discovery algorithms are concerned with constructing models that represent the control flow of the processes. Agent system mining argues that business processes often emerge from interactions of autonomous agents and uses event data to construct models of the agents and their interactions. This paradigm shift from control flow discovery to agent system discovery proves beneficial when interacting agents have produced the underlying data. This paper presents an algorithm, called Agent Miner, for discovering models of agents and their interactions that compose the system that generated the business processes recorded in the input event data. An evaluation using our open-source implementation of Agent Miner over publicly available industrial datasets confirms that the approach can unveil insights into the process participants and their interaction patterns, and it often discovers models that describe the data more accurately in terms of precision and recall, and are smaller in size, than the corresponding models discovered using conventional discovery algorithms.

Application and Theory of Petri Nets and Concurrency, 2020
State-of-the-art process discovery methods construct free-choice process models from event logs. Hence, the constructed models do not take into account indirect dependencies between events. Whenever the input behavior is not free-choice, these methods fail to provide a precise model. In this paper, we propose a novel approach for enhancing free-choice process models by adding non-free-choice constructs discovered a posteriori via region-based techniques. This allows us to benefit from both the performance of existing process discovery methods and the accuracy of the employed fundamental synthesis techniques. We prove that the proposed approach preserves fitness with respect to the event log while improving precision when indirect dependencies exist. The approach has been implemented and tested on both synthetic and real-life datasets. The results show its effectiveness in repairing process models discovered from event logs.

Discovering data transfer routines from user interaction logs
Information Systems, 2021
Robotic Process Automation (RPA) is a technology for automating routine work, such as copying data across applications or filling in document templates using data from multiple applications. RPA tools allow organizations to automate a wide range of routines. However, identifying and scoping routines that can be automated using RPA tools is time-consuming. Manual identification of candidate routines via interviews, walk-throughs, or job shadowing allows analysts to identify the most visible routines, but these methods are not suitable when it comes to identifying the long tail of routines in an organization. This article proposes an approach to discover automatable routines from logs of user interactions with IT systems and to synthesize executable specifications for such routines. The proposed approach focuses on discovering routines in which a user transfers data from a set of fields (or cells) in an application to another set of fields in the same or a different application (data transfer routines). The approach starts by discovering frequent routines at a control-flow level (candidate routines). It then determines which of these candidate routines are automatable and synthesizes an executable specification for each such routine. Finally, it identifies semantically equivalent routines so as to output a set of non-redundant routines. The article reports on an evaluation of the approach using a combination of synthetic and real-life logs. The evaluation results show that the approach can discover automatable routines that are known to be present in a UI log and that it discovers routines that users recognize as such in real-life logs.
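
A toy sketch of spotting data transfer routines: pair each copy action with the next paste of the same value and count the resulting field-to-field transfers, so frequent pairs surface as candidate routines. The UI log format and matching-by-value heuristic are illustrative simplifications of the article's approach.

```python
from collections import Counter

# Illustrative UI log: (action, application, field, value) tuples.
ui_log = [
    ("copy",  "Excel", "colA", "Acme Ltd"),
    ("paste", "CRM",   "name", "Acme Ltd"),
    ("copy",  "Excel", "colB", "12 High St"),
    ("paste", "CRM",   "addr", "12 High St"),
    ("copy",  "Excel", "colA", "Beta Inc"),
    ("paste", "CRM",   "name", "Beta Inc"),
]

# Match each paste to the most recent copy of the same value and count
# the resulting (source field -> target field) transfers.
transfers = Counter()
pending = {}
for action, app, field, value in ui_log:
    if action == "copy":
        pending[value] = (app, field)
    elif action == "paste" and value in pending:
        transfers[(pending[value], (app, field))] += 1

for (src, dst), n in transfers.most_common():
    print(f"{src[0]}.{src[1]} -> {dst[0]}.{dst[1]}: seen {n}x")
```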