Failure Recovery

583 papers

8 followers

About this topic

Failure recovery refers to the processes and strategies employed to restore a system, application, or organization to operational status after a failure or disruption. It encompasses the identification of failure causes, implementation of corrective actions, and the establishment of protocols to prevent future occurrences, ensuring resilience and continuity.

Key research themes

1. How can recovery-oriented computing methodologies optimize system failure recovery to improve availability and reduce total cost of ownership?

This theme explores methods of designing computing systems that can recover quickly and efficiently from failures by rethinking recovery as a first-class design goal rather than a secondary concern, thereby enhancing system availability, reducing downtime costs, and lowering the total cost of ownership (TCO). The focus is on recovery-oriented computing (ROC) principles that target networked services with metrics such as availability, rapid scale, and change, analyzing failure causes and developing techniques for automatic and effective failure recovery.

Recovery-oriented computing (ROC): Motivation, definition, techniques, and …

by William Tetzlaff and

2016

Key finding: This foundational paper introduces recovery-oriented computing (ROC) which emphasizes making recovery a primary design goal to significantly improve system availability and reduce downtime costs. It demonstrates that operator...

Automatic Recovery from Runtime Failures

by Mauro Pezzè

2015

Key finding: This work presents a technique exploiting intrinsic redundancy in reusable software components to automatically avoid application field failures without requiring system restarts. By generating alternative workarounds...

Automatic Model-Driven Recovery in Distributed Systems

by Kaustubh 'KJ' Joshi

2025, 24th IEEE Symposium on Reliable Distributed Systems (SRDS'05)

Key finding: This study develops a model-driven, Bayesian and Markov decision process based framework enabling automatic system monitoring and recovery in distributed systems under imperfect and conflicting monitoring conditions. It...

A Software-Based Hardware Fault Tolerance Scheme for Multicomputers

by Eli Gafni

2025

Key finding: This paper details a software-driven fault tolerance scheme for large multicomputer systems executing long jobs, where error detection and recovery are mostly handled by software via paired subsystems executing identical...

2. What are the formal models and programming paradigms that enable systematic recovery and self-healing in software systems after failures?

This theme investigates formal approaches and frameworks for implementing recovery and self-healing capabilities in software systems. It includes transactional compensation models enabling undoing committed transactions without cascading aborts, recovery-oriented programming paradigms embedding monitoring and recovery actions for safety and liveness properties, and systems exhibiting self-healing inspired by biological analogies to autonomously detect, diagnose, and repair faults. The goal is to provide theoretical and practical bases for building software resilient to transient and permanent faults.

A formal approach to recovery by compensating transactions

by Eliezer Levy and

2016, The VLDB Journal

Key finding: This paper formulates a transaction model introducing compensating transactions which semantically undo effects of committed or uncommitted transactions affecting others, thereby avoiding cascading aborts. It formalizes...
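As an illustration of the compensation idea summarized above, the following is a minimal sketch and not the paper's formal model. It assumes a simple sequential workflow: each completed step keeps a compensating action, and when a later step fails, the earlier effects are semantically undone in reverse order instead of being rolled back by aborting other work. The names `Step`, `add`, and `run_with_compensation` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]


@dataclass
class Step:
    name: str
    action: Callable[[State], None]      # forward effect, treated as committed
    compensate: Callable[[State], None]  # semantic undo of that effect


def add(key: str, delta: int) -> Callable[[State], None]:
    """Build an action that adds `delta` to `state[key]`."""
    def apply(state: State) -> None:
        state[key] += delta
    return apply


def run_with_compensation(steps: List[Step], state: State) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    done: List[Step] = []
    for step in steps:
        try:
            step.action(state)
        except Exception:
            log.append(f"fail {step.name}")
            for prev in reversed(done):
                prev.compensate(state)
                log.append(f"compensate {prev.name}")
            break
        done.append(step)
        log.append(f"commit {step.name}")
    return log
```

Compensation here restores the affected values rather than the exact prior database state, which is the sense in which the undo is semantic.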

Recovery Oriented Programming: Runtime Monitoring of Safety and Liveness

by Olga Brukman

2017

Key finding: This research proposes the recovery oriented programming (ROP) paradigm wherein programs integrate monitoring of safety and liveness properties and embed recovery actions upon violation detection. Using a generic...
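A rough sketch of the monitoring idea described above, under simplifying assumptions that are not taken from the paper: safety is a predicate checked after every step, liveness is approximated as "the goal is reached within a bounded number of steps", and the recovery action is a plain callback. The function `run_monitored` and its parameters are hypothetical names.

```python
from typing import Callable, Dict, List

State = Dict[str, int]


def run_monitored(
    step: Callable[[State], None],
    safety: Callable[[State], bool],
    done: Callable[[State], bool],
    recover: Callable[[State], None],
    state: State,
    max_steps: int,
) -> List[str]:
    """Run `step` repeatedly while monitoring safety and bounded liveness."""
    events: List[str] = []
    for i in range(max_steps):
        step(state)
        if not safety(state):
            # Safety violated: record it and run the embedded recovery action.
            events.append(f"safety violation at step {i}")
            recover(state)
        if done(state):
            events.append(f"done at step {i}")
            return events
    # Goal not reached within the bound: treat as a liveness violation.
    events.append("liveness violation: bound exceeded")
    recover(state)
    return events
```

Bounding liveness by a step count is a design shortcut: a true liveness property cannot be refuted by a finite observation, so runtime monitors typically check a bounded variant.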

On conditions for self-healing in distributed software systems

by Naftaly Minsky

2023, 2003 Autonomic Computing Workshop

Key finding: The paper identifies that self-healing in distributed software requires invariant regularities across all system configurations, proposing imposing artificial 'laws' on heterogeneous distributed systems to achieve this. It...

Self-Healing Systems: Application and Methodologies-A Review

by Fidelis Ugwuanyi

2022, International Journal of Research

Key finding: This review systematically categorizes self-healing techniques inspired by biological systems, presenting methodologies such as middleware-based self-adaptive fault tolerance, monitoring frameworks, and hierarchical fault...

Self-Healing Systems: Foundations and Challenges

by Gabi Dreo Rodosek

2016

Key finding: This position paper delineates self-healing as systems autonomously detecting faults and performing recovery steps to restore specified operational modes. It distinguishes self-healing from fault tolerance and related...

3. How can failure recovery be optimized in storage and network systems through algorithmic and architectural techniques to ensure minimum performance degradation during faults?

This theme considers optimizing failure recovery in storage and network infrastructures, focusing on minimizing recovery overhead, ensuring consistency without rollback cascades, and maintaining service continuity under component failures. It covers topics such as I/O optimal recovery schemes for erasure-coded storage minimizing read/write operations needed for reconstruction, failure recovery architectures in cluster computing free from domino effect, and fault-tolerance frameworks in software-defined networking (SDN) and optical transport networks.

In search of I/O-optimal recovery from disk failures

by Osama N Khan

2021

Key finding: This work develops an algorithm to find minimum I/O schedules for recovery from arbitrary numbers of disk failures in XOR-based erasure-coded storage. It introduces a family of codes enabling recovery from up to 11...
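To make XOR-based reconstruction concrete, here is a minimal sketch of single-parity recovery. It is the simplest XOR code, not the paper's code family or its minimum-I/O search: one parity block is the XOR of the data blocks, so any single lost block is rebuilt by XOR-ing all survivors, at a cost of one read per surviving block. The helper names `xor_blocks` and `recover` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing block (None) in a data+parity stripe."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost block")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired
```

Codes that tolerate multiple failures offer several equations for rebuilding the same block, which is what makes the choice of a minimum-I/O recovery schedule non-trivial.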

Impact: an Unreliable Failure Detector Based on Processes' Relevance and the Confidence Degree in the System

by Anubis Graciela de Moraes Rossetto

2025

Key finding: This paper introduces the Impact Failure Detector that assigns impact factors to processes and outputs a trust level for a set of monitored processes rather than individual binary suspicion. By defining thresholds that...
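The following sketch shows one way a set-level trust value could be computed from per-process impact factors. The summation rule and the single-threshold comparison are assumptions made for illustration, since the excerpt above is truncated before it describes them, and the names `trust_level` and `is_trusted` are hypothetical.

```python
from typing import Dict, Set


def trust_level(impact: Dict[str, int], suspected: Set[str]) -> int:
    """Sum the impact factors of all processes that are not suspected."""
    return sum(weight for proc, weight in impact.items() if proc not in suspected)


def is_trusted(impact: Dict[str, int], suspected: Set[str], threshold: int) -> bool:
    """Report the monitored set as trusted while its trust level meets the threshold."""
    return trust_level(impact, suspected) >= threshold
```

With impact factors `{"p1": 5, "p2": 3, "p3": 1}` and a threshold of 7, losing only `p3` keeps the set trusted, while losing `p1` does not, which illustrates how the output reflects process relevance rather than a per-process verdict.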

Domino-Effect Free Crash Recovery for Concurrent Failures in Cluster Federation

by Shahram Rahimi

2025, Lecture Notes in Computer Science

Key finding: The authors propose a recovery approach for multi-cluster federations that handles both inter-cluster orphan and lost messages, ensuring recovery free from the domino effect, thereby minimizing recomputation. By using common...

Fault-Tolerance in the Scope of Software-Defined Networking (SDN)

by Rui Aguiar

2024, IEEE Access

Key finding: This survey details fault tolerance challenges and solutions within SDN architectures, examining detection and recovery mechanisms in data, control, and application planes. It highlights that SDN introduces novel fault...

Disaster resilience of optical networks: State of the art, challenges, and opportunities

by Georgios Ellinas

2025, Optical Switching and Networking

Key finding: This position paper reviews mechanisms enabling optical networks to achieve resilience against disasters including natural events and malicious attacks. It categorizes proactive pre-disaster, preparatory, and reactive...

All papers in Failure Recovery

Single-link failure recovery with or without software-defined networking switches

by Dajin Wang

2025, 2018 International Conference on Information and Computer Technologies (ICICT)

In this paper, we consider IP fast recovery from single-link failures in a given network topology. The basic idea is to replace some existing routers with a designated switch. When a link fails, the affected router will send all the...

Ad Hoc Networks

by Luciano Lenzini

2025

Hotel Overbooking

by Breffni Noone

2025, Journal of Hospitality & Tourism Research

Overbooking represents an important strategy for many service providers that apply revenue management. Although the objective is to overbook such that no customers are denied service, denials may result when the customer no-show rate is...

Congestion Control Based on Distributed Statistical QoS-Aware Routing Management

by Bogdan Rus

2025

In this paper a distributed routing management solution is described that takes into consideration statistical Quality of Service (QoS) information about the state of network links. The goal is to offer dynamic metrics to the routing...

Routing Management Based on Statistical Cross-Layer QoS Information Regarding Link Status

by Bogdan Rus

2025, users.utcluj.ro

This paper presents the design principles and the practical implementation of a routing management solution which takes into account statistical cross-layer Quality of Service information regarding the state of the network. Link...

ZIGZAG: an efficient peer-to-peer scheme for media streaming

by Đức Trần

2025, IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428)

We design a peer-to-peer technique called ZIGZAG for single-source media streaming. ZIGZAG allows the media server to distribute content to many clients by organizing them into an appropriate tree rooted at the server. This...

A Peer-to-Peer Architecture for Media Streaming

by Đức Trần

2025, IEEE Journal on Selected Areas in Communications

Given the fact that the current Internet does not widely support IP Multicast while content-distribution-networks technologies are costly, the concept of peer-to-peer could be a promising start for enabling large-scale streaming systems....

Scalable application layer multicast

by Đức Trần

2025, ACM SIGCOMM Computer Communication Review

We describe a new scalable application-layer multicast protocol, specifically designed for low-bandwidth, data streaming applications with large receiver sets. Our scheme is based upon a hierarchical clustering of the application-layer...

OntoOmnia: A Meta-Operating System for Resilient AI Singularity Management

by YoochulKim ontomotoos

2025

This paper presents OntoOmnia, a new meta-operating system architecture designed to address the challenges and risks of AI singularity. While previous research has focused on embedding ethical principles or ontological frameworks into AI...

A Survey of State Management in Big Data Processing Systems

by Volker Markl

2025, arXiv (Cornell University)

The concept of state and its applications vary widely across big data processing systems. This is evident in both the research literature and existing systems, such as Apache Flink, Apache Heron, Apache Samza, Apache Spark, and Apache...

A survey of state management in big data processing systems

by Volker Markl

2025, The VLDB Journal

Disaster resilience of optical networks: State of the art, challenges, and opportunities

by Georgios Ellinas

2025, Optical Switching and Networking

For several decades, optical networks, due to their high capacity and long-distance transmission range, have been used as the major communication technology to serve network traffic, especially in the core and metro segments of...

A framework for MPLS path setup in unidirectional multicast shared trees

by Chung-horng Lung

2025, Proceedings of SPIE

Establishing multicast communications in MPLS-capable networks is an essential requirement for a wide-scale deployment of MPLS in the Internet. This paper outlines a framework for the setup of a MultiPoint-to-MultiPoint (MP2MP) Label...

MPLS-based Multicast Shared Trees

by Chung-horng Lung

2025

This paper presents a study of our proposed architecture for the setup of a MultiPoint-to-MultiPoint (MP2MP) Label Switched Path (LSP). This form of LSP is needed for establishing uni-directional multicast shared trees. Such trees are...

Using Logical Rings to Solve the Distributed Mutual Exclusion Problem with Fault Tolerance Issues

by Niki Pissinou

2025, The Journal of Supercomputing

In this paper, we investigate distributed mutual exclusion algorithms and delineate the features of a new distributed mutual exclusion algorithm. The basis of the algorithm is the logical ring structure employed in token-based mutual...

A Congestion-Aware Clustering and Routing (CCR) Protocol for Mitigating Congestion in WSN

by Mahmoud Badawy

2025, IEEE Access

Wireless sensor networks (WSN) have been investigated as a powerful distributed sensing application to enhance the efficiency of embedded systems and wireless networking capabilities. Although WSN has offered unique opportunities to set...

Transparent recovery of Mach applications

by Arthur Goldberg

2025

We have built a software layer on top of Mach 2.5 that recovers multitask Mach applications from fail-stop failures. The layer implements Optimistic Recovery (OR), a mechanism for transparent recovery from failing tasks and processors,...

An Empirical Experience with 3DROV Simulator: Testing an Advanced Autonomous Controller for Rover Operations

by Angelo Oddi

2025

The aim of this paper is to convey our experience using the ESA's 3DROV planetary rover simulator as a visualization and validation tool through a dynamic analysis on the performance of an advanced autonomous control architecture: a...

An Integrated Constraint-Based, Power-Aware Control System for Autonomous Rover Mission Operations

by Angelo Oddi

2025

This paper aims at describing an integrated power-aware, model-based autonomous control architecture for planetary rover-based mission operations synthesized in the context of a Ph.D. program on the topic "Autonomy for Interplanetary...

Parallel edge detection using uni-directional multiring on spiral architecture

by Hamid Arabnia

2025, Parallel and Distributed Processing Techniques and Applications

Improving the computation efficiency is the key issue in image processing, especially in edge detection, because edge detection is very computationally intensive. With the development of real-time image processing application, fast...

An Incremental Harmonic Function-based Probabilistic Roadmap Approach to Robot Path Planning

by M.M Kazemi

2025, Proceedings of the 2005 IEEE International Conference on Robotics and Automation

Resource allocation strategies for survivability in WDM optical networks

by Paramjeet Singh

2025, Optical Fiber Technology

WDM optical networks are high-speed networks that provide enormous capacity. Survivability is a very important issue in these networks, and it requires resources for handling failures. So, efficient resource allocation strategy is...

A new roll-forward checkpointing/recovery mechanism for cluster federation

by Shahram Rahimi

2025

In this paper, we have addressed the complex problem of determining a recovery line for cluster federation and proposed an efficient checkpointing / recovery mechanism for it. The main objective of the proposed approach is to advance the...

A Unified Approach to Model-Based Planning and Execution

by Peter Norvig

2025

Composite Web Service Failure Recovery Considering User Non-functional Preferences

by Hossein Rahmani

2025, 2008 4th International Conference on Next Generation Web Services Practices

A composite web service is essentially a combination of smaller services to provide extended functionalities. However, such services are more susceptible to failures than atomic services. This is due to its dependency on other services...

Towards compensation correctness in interactive systems

by Cátia Vaz

2025, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

In the Microbial typing field, the need to have a common understanding of the concepts described and the ability to share results within the community is an increasingly important requisite for the continued development of portable and...

A low-overhead recovery technique using quasi-synchronous checkpointing

by Dakshnamoorthy Manivannan

2024, Proceedings of 16th International Conference on Distributed Computing Systems

In this paper we propose a quasi-synchronous checkpointing algorithm and a low-overhead recovery algorithm based on it. The checkpointing algorithm preserves process autonomy by allowing processes to take checkpoints asynchronously, and it uses communication-induced checkpoint coordination to advance the recovery line, which helps bound rollback propagation during a recovery. Thus, it has the ease and low overhead of asynchronous checkpointing and the recovery-time advantages of synchronous checkpointing. There is no extra message overhead involved during checkpointing, and the additional checkpointing overhead is nominal. The algorithm ensures that a recovery line consistent with the latest checkpoint of any process exists at all times. The recovery algorithm exploits this feature to restore the system to a state consistent with the latest checkpoint of a failed process. The recovery algorithm has no domino effect: a failed process needs only to roll back to its latest checkpoint and request the other processes to roll back to a consistent checkpoint. To avoid the domino effect, it uses selective pessimistic message logging at the receiver end. The recovery is asynchronous for single-process failure. Neither the recovery algorithm nor the checkpointing algorithm requires the channels to be FIFO. We do not use vector timestamps for determining dependency between checkpoints, since vector timestamps generally result in high message overhead during failure-free operation.

... uses the checkpoints and message logs to restore the system to a consistent global state [12]. In the literature, several checkpointing schemes have been proposed for distributed systems. They can be broadly classified into two categories: asynchronous and synchronous. In asynchronous checkpointing [3], processes take checkpoints periodically without any coordination with others. To recover from a failure, a process communicates with other processes to determine if their local states are causally related. If they are, the processes that received the messages responsible for the causal dependencies roll back to eliminate these dependencies. This process is repeated until the local states of all the processes are free from causal dependencies. This approach allows maximum process autonomy and has low checkpointing overhead. However, it may suffer from the domino effect, in which the processes roll back recursively while determining a consistent set of checkpoints. To reduce the domino effect, Kim et al. [9] and Venkatesh et al. [17] use dependency tracking and insert checkpoints before processing a new message that introduces a dependency. Message logging [6, 7, 14, 16] and message reordering [19] have been suggested in the literature to cope with the domino effect.
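The repeated rollback described above for asynchronous checkpointing can be sketched as a fixed-point computation. This is an illustration of that baseline scheme and of how the domino effect arises, not the paper's quasi-synchronous algorithm. The model is an assumption: checkpoint `k` of a process captures everything it did in intervals `0..k-1`, interval `k` lies between checkpoints `k` and `k+1`, and a message is an orphan when its send is undone but its receive is kept. The names `Message` and `recovery_line` are hypothetical.

```python
from typing import Dict, List, Tuple

# (sender, send_interval, receiver, receive_interval)
Message = Tuple[str, int, str, int]


def recovery_line(start: Dict[str, int], messages: List[Message]) -> Dict[str, int]:
    """Roll receivers back until no orphan message remains.

    `start[p]` is the checkpoint index process p would restore initially
    (for a live process, one past its latest checkpoint stands for its
    current state).
    """
    line = dict(start)
    changed = True
    while changed:
        changed = False
        for sender, sent_in, receiver, received_in in messages:
            send_undone = line[sender] <= sent_in
            receive_kept = line[receiver] > received_in
            if send_undone and receive_kept:
                # Orphan message: undo the receive by rolling the receiver back.
                line[receiver] = received_in
                changed = True
    return line
```

For example, with `start = {"P": 2, "Q": 3}` (P failed and restarts from checkpoint 2, Q is live) and a zigzag of four messages crossing every checkpoint interval, each rollback orphans another message and both processes end at checkpoint 0. Bounding exactly this kind of propagation is what checkpoint coordination and message logging aim for.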

Finding consistent global checkpoints in a distributed computation

by Dakshnamoorthy Manivannan

2024, IEEE Transactions on Parallel and Distributed Systems

Finding consistent global checkpoints of a distributed computation is important for analyzing, testing, or verifying properties of these computations. In this paper we present a theoretical foundation for finding consistent global...

Quasi-synchronous checkpointing: Models, characterization, and classification

by Dakshnamoorthy Manivannan

2024, IEEE Transactions on Parallel and Distributed Systems

Checkpointing algorithms are classified as synchronous and asynchronous in the literature. In synchronous checkpointing, processes synchronize their checkpointing activities so that a globally consistent set of checkpoints is always...

Dynamic scheduling of network resources with advance reservations in optical grids

by Harry Perros

2024, International Journal of Network Management

Advance reservation of lightpaths in grid environments is necessary to guarantee QoS and reliability. In this paper, we have evaluated and compared several algorithms for dynamic scheduling of lightpaths using a flexible advance...

Soft errors detection and automatic recovery based on replication combined with different levels of checkpointing

by Emilio Luque

2024, Future Generation Computer Systems

Handling faults is a growing concern in HPC. In future exascale systems, it is projected that silent undetected errors will occur several times a day, increasing the occurrence of corrupted results. In this article, we propose SEDAR,...

Congestion Control Based on Distributed Statistical QoS-Aware Routing Management

by Andrei Rus

2024

An Architecture for IP/LDP Fast-Reroute Using Maximally Redundant Trees

by Mike Shand

2024

This document defines the architecture for IP and LDP Fast Reroute using Maximally Redundant Trees (MRT-FRR). MRT-FRR is a technology that gives link-protection and node-protection with 100% coverage in any network topology that is still...

Hybrid constraints for robust parsing: First experiments and evaluation

by Vito Pirrelli

2024, Proceedings of LREC

Istituto di Linguistica Computazionale – CNR1 Area della Ricerca, via G. Moruzzi 1, 56100 Pisa, Italy {roberto.bartolini, simonetta.montemagni, vito.pirrelli}@ilc.cnr.it ... Università di Pisa, Dipartimento di Linguistica2 via Santa...

A Mechanism to Overcome Link Failures in Single Path Network Architecture

by Asma Parveen

2024

In a single-link network architecture, if a link fails, the system hunts for a substitute link and transmits the data through that link. The system must always search for the reason for the path break and then configure the system again to...

Fault-Tolerance in the Scope of Software-Defined Networking (SDN)

by Rui Aguiar

2024, IEEE Access

Fault-tolerance is an essential aspect of network resilience. Fault-tolerance mechanisms are required to ensure high availability and high reliability in systems. The advent of software-defined networking (SDN) has both presented new...

Multicasting in cognitive radio networks: Algorithms, techniques and protocols

by Asad Ali

2024, Journal of Network and Computer Applications

Multicasting is a fundamental networking primitive utilized by numerous applications. This also holds true for cognitive radio networks (CRNs) which have been proposed as a solution to the problems that emanate from the static...

Availability

by Duc M Nguyen

2024, Quality attribute in: Availability

Availability in software refers to the system's ability to be operational and ready to perform its tasks when required. This concept is broader than reliability, as it includes not only consistent performance but also the system's...
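One common way to quantify availability, offered here as a standard textbook formulation rather than something stated in the excerpt, is steady-state availability: the long-run fraction of time the system is up, expressed through mean time to failure (MTTF) and mean time to repair (MTTR). The numbers in the worked example are illustrative assumptions.

```latex
A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}},
\qquad
\text{e.g. } \mathrm{MTTF} = 999\ \text{h},\ \mathrm{MTTR} = 1\ \text{h}
\;\Rightarrow\;
A = \frac{999}{999 + 1} = 0.999 .
```

The formula shows why recovery speed matters: halving MTTR improves availability about as much as doubling MTTF when MTTR is small relative to MTTF.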

A hybrid procedural/deductive executive for autonomous spacecraft

by Christian Plaunt

2024, Proceedings of the second international conference on Autonomous agents - AGENTS '98

The New Millennium Remote Agent (NMRA) will be the first AI system to control an actual spacecraft. The spacecraft domain places a strong premium on autonomy and requires dynamic recoveries and robust concurrent execution, all in the...

Optimizing Spectrum Sensing by Using Artificial Neural Network in Cognitive Radio Sensor Networks

by Dr.G. Rajakumar, B.E.,M.B.A.,M.E.,Ph.D.,D.Litt.,

2024

Resource allocation is essential in next-generation cognitive radio networks, where such techniques are used to increase network performance. However, it is difficult to apply these techniques in real time...

Fig. 8: Packet delivery ratio graph. Fig. 9: Throughput graph.

To describe the resource allocation technique for a cognitive radio network, a cognitive radio network element is assumed, as Fig. 1 shows, which contains the primary user, the secondary users, and the primary and secondary base stations. Consider an uplink and downlink communication system that contains several secondary users, as shown in Fig. 1. All primary user stations (PUS) and secondary user stations (SUS) serve their users through the same frequency resource. The secondary users located in the coverage area of the main primary cell share the resource with the primary user. The main cell of the primary user is designated the first cell. To avoid severe pilot interference, all primary users in every cell use orthogonal pilots, where p is the pilot signal power (Fig. 2).

Figure 4 shows how neighbourhood nodes are identified and how route request packets are directed to them. The Ad-hoc On-Demand Distance Vector (AODV) protocol initiates the route discovery process only when a source node requires it. The source node sends a Route Request (RREQ) message to its neighbors if it has no route to the destination node. Neighbor nodes rebroadcast the request if they do not have a fresh enough route. Every node maintains an RREQ ID and a sequence number, and a Route Reply (RREP) message is sent back to the source along the reverse route. Figure 5 shows the attacker problem, where the adversary node is identified and an alarm message is sent to the neighbour nodes; node 26 is the attacker node and the ...

Fig. 6: Data communication from source to destination

Dr. T. Aruna obtained her B.E. degree from Thiagarajar College of Engineering, Madurai Kamaraj University, and her M.E. degree from Alagappa Chettiar College of Engineering and Technology, Karaikudi, India, in 1990 and 1998 respectively. She completed her Ph.D. in the area of Mobile Ad Hoc Networks in 2011. She is now working as an Assistant Professor at Thiagarajar College of Engineering, Madurai, India. She has published more than 32 papers in national and international conferences and journals. Her research interests include multi-user MIMO and ad hoc networks.

The Impact of Service Operations Failures on Customer Satisfaction: Evidence on How Failures and Their Source Affect What Matters to Customers

by Shannon Anderson

2024, Manufacturing & Service Operations Management

Research in consumer psychology shows that customers seek reasons for service failures and that attributions of blame moderate the effects of failure on the level of customer satisfaction. This paper extends research on service operations failures by hypothesizing that attributions of blame also affect what matters to the customer during service failures. Specifically, we hypothesize that the relative weights that customers assign to key service elements in reaching an overall assessment of customer satisfaction are affected by customer attributions of blame for service failures. We use the U.S. airline industry as a quasi-experimental research setting to investigate the components of customer satisfaction for three samples of customers who experience (1) routine service, (2) flight delays of external (i.e., weather) origin, and (3) flight delays of internal origin. Although the level of customer satisfaction is lower for all service failures, we find that the key components of satisfaction differ between delayed and routine flights only when customers blame the service provider for the failure. Specifically, when delays are of external origin satisfaction is lower than for routine flights, but there is virtually no difference in the weight that customers assign to the components of customer satisfaction (including employee interactions). In contrast, when delays are of internal origin, satisfaction is lower than for either routine flights or flights delayed by external factors, and employee interactions have a significantly diminished role in customer satisfaction evaluations. Contrary to the popular view that employee interactions take on a greater role in determining customer satisfaction during service failures, we find that the opposite is true if the customer attributes blame to the service provider.
Our findings highlight the important role of customer attributions during service failures and present more nuanced evidence on the role of employee-customer interactions in mitigating the effects of service failures on customer satisfaction.

An architecture for highly available wide-area service composition

by Bhaskaran Raman

2024, Computer Communications

Service composition provides a flexible way to quickly enable new application functionalities in next generation networks. We focus on the scenario where next generation portal providers 'compose' the component services of other...

Evaluation of Lasing Range with a 1.8 m Undulator in KU-FEL

by Hideaki Ohgaki

2024

At KU-FEL (Kyoto University FEL), a 12-14 µm FEL has been available using a 40 MeV S-band linac and a 1.6 m undulator. We are going to install a 1.8 m undulator, which was used at JAEA, to extend the lasing range of KU-FEL. We measured the...

How does testing affect the availability of aging software systems?

by Michael Grottke

2024, Performance Evaluation

This paper proposes an approach to examining how testing affects the operational behavior of aging software systems. Such an approach requires models for the testing phase and the operational phase that explicitly account for crash...

Avoiding Transient Loops Through Interface-Specific Forwarding

by Zifei Zhong

2024, Lecture Notes in Computer Science

Under link-state routing protocols such as OSPF and IS-IS, when there is a change in the topology, propagation of link-state announcements, path recomputation, and updating of forwarding tables (FIBs) will all incur some delay before...

Mitigating transient loops through interface-specific forwarding

by Zifei Zhong

2024, Computer Networks

Under link-state routing protocols such as OSPF and IS-IS, when there is a change in the topology, propagation of link-state advertisements, path recomputation, and updating of forwarding tables (FIBs) will all incur some delay before...

Overlay multicast tree recovery scheme using a proactive approach

by JinHan Jeon

2024, Computer Communications

Overlay multicast scheme has been regarded as an alternative to conventional IP multicast since it can support multicast functions without infrastructural level changes. However, multicast tree reconstruction procedure is required when a...

A Hybrid Packet/Circuit Optical Transport Architecture for DCN

by Josep Solé-Pareta

2024, 2019 21st International Conference on Transparent Optical Networks (ICTON)

The aim of this paper is to move away from today's multi-tier, manually operated, and performance limited Data Centre Network (DCN) towards more scalable, flexible, and optimized architecture of tomorrow. We propose a new hybrid optical...

Design methods for optimal resource allocation in wireless networks

by Mohammad Imam Uddin

2024

Wireless communications have seen remarkable progress over the past two decades and perceived tremendous success due to their agile nature and capability to provide fast and ubiquitous internet access. Maturation of 3G wireless network...
