High Availability

2,942 papers

127 followers

About this topic

High Availability (HA) refers to the design and implementation of systems and components that ensure a high level of operational performance and uptime, minimizing downtime and service interruptions. It involves redundancy, failover mechanisms, and fault tolerance to maintain continuous service availability, particularly in critical applications and environments.

Key research themes

1. How can systems maintain continuous high availability despite sensor or component faults in layered cloud and edge computing environments?

This research theme explores fault detection and fault tolerance mechanisms at the sensor and component level within multi-layered cloud and edge computing systems. It focuses on maintaining high availability (HA) despite sensor failures that may otherwise disable fault detection capabilities. The relevance lies in ensuring uninterrupted service delivery in complex infrastructures composed of multiple interdependent layers, addressing both hardware and software component failures without human intervention to avoid downtime.

High-Availability Computing Platform with Sensor Fault Resilience

by shinta arizky

2021, Sensors

Key finding: Proposes a novel high availability mechanism using dynamic fault model reconstruction to tolerate sensor faults by reconfiguring the fault detection logic with remaining healthy sensors without human intervention. The... Read more

Validation of High-Availability Model for Edge Devices and IIoT

by Alzbeta Kanalikova

2024, Sensors

Key finding: Emphasizes the importance of HA and fault tolerance at the edge in industrial IoT deployments and presents an architecture whereby edge devices act as middleware to integrate various systems, ensuring minimal latency and... Read more

High availability for parallel computers

by E. Fadón

2022, journal.info.unlp.edu.ar

Key finding: Investigates fault tolerance strategies suitable for large-scale parallel computing systems to maintain availability and data consistency. The paper shows that by deploying software-layer checkpointing and rollback recovery... Read more

2. What are the effective proactive and coordinated fault tolerance mechanisms to preserve reliability and minimize downtime for cloud virtual machines and parallel applications?

This theme focuses on proactive fault tolerance strategies that anticipate failures based on system health indicators within cloud infrastructures hosting parallel applications across virtual machines (VMs). It addresses coordinated fault tolerance involving VM migration and resource optimization to prevent failures and reduce system unavailability by minimizing checkpoint frequency and downtime, enhancing reliability in cloud data centers with large-scale parallel workloads.

Using Proactive Fault-Tolerance Approach to Enhance Cloud Service Reliability

by Sathish Kumar

2023, IEEE Transactions on Cloud Computing

Key finding: Introduces a two-step proactive fault tolerance approach for cloud virtual clusters: CPU temperature modeling predicts deteriorating physical machines, and VM coordinated migration via an improved particle swarm optimization... Read more

Optimizing Clustering Approaches in Cloud Environments

by Dr.Abdel-rahman Al-Ghuwairi

2024, International journal of interactive mobile technologies

Key finding: Compares local single cluster, local multiple clusters, and multiple cloud clusters, showing that multi-cluster approaches improve scalability, fault tolerance, resource allocation, and availability. Experimental results... Read more

A Kubernetes controller for managing the availability of elastic microservice based stateful applications

by Ferhat Khendek

2023, Journal of Systems and Software

Key finding: Identifies Kubernetes' built-in repair actions as insufficient for achieving carrier-grade high availability for stateful microservices. Proposes an HA State Controller that handles application state replication and automatic... Read more

3. How can distributed database systems and cloud orchestrations be architected to achieve strong consistency, fault tolerance, and global high availability?

This area investigates architectural design patterns, replication protocols, and orchestration mechanisms that support high availability in distributed data storage and cloud infrastructure. It includes hybrid replication protocols for ensuring data consistency and availability, strategies for global database replication with failover capabilities, and container orchestration tools for resilient service deployment and scaling. These insights aim to guide systems supporting geo-distributed workloads with minimal downtime and strong data guarantees.

Styx++: Reliable Data Access and Availability Using a Hybrid Paxos and Chain Replication Protocol

by Ather Sharif

2022, CHI Conference on Human Factors in Computing Systems Extended Abstracts

Key finding: Develops Styx++, a hybrid replication system integrating Paxos for configuration management and Chain Replication for database nodes, balancing strong consistency and fault tolerance with minimal performance degradation.... Read more

CockroachDB: The Resilient Geo-Distributed SQL Database

by Rebecca Taft

2025, Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

Key finding: Describes CockroachDB’s architecture for geo-distributed scalable SQL workloads with fault tolerance and high availability, using replication across diverse geographic zones and a novel transaction protocol for strong... Read more

Using Docker Swarm to Improve Performance in Distributed Web Systems

by Marian Ileana

2025, 2024 International Conference on Development and Application Systems (DAS)

Key finding: Examines how Docker Swarm, a container orchestration tool, facilitates efficient resource management, auto-scaling, and load balancing across multi-node clusters to enhance availability and performance of distributed web... Read more

High Availability in Computer Systems

by Mostafa Abd-El-Barr

2024, PUBLISHED BY IMPERIAL COLLEGE PRESS AND DISTRIBUTED BY WORLD SCIENTIFIC PUBLISHING CO. eBooks

Key finding: Provides foundational insights into availability classes and the challenge of building systems with 'five nines' (99.999%) availability. It emphasizes the necessity of fault tolerance mechanisms including redundancy,... Read more
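
As a rough illustration of what an availability class such as "five nines" implies, the sketch below converts a steady-state availability figure into expected downtime per year. The function name and the 365.25-day year are assumptions made for this example; they do not come from the book.

```python
# Convert an availability fraction into expected downtime per year.
# Assumption: a year of 365.25 days; the helper name is hypothetical.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def annual_downtime_minutes(availability: float) -> float:
    """Expected downtime per year, in minutes, for an availability in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Two nines (99%) through five nines (99.999%).
    for nines in range(2, 6):
        availability = 1 - 10 ** -nines
        minutes = annual_downtime_minutes(availability)
        print(f"{nines} nines: about {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, five nines leaves a budget of a little over five minutes of downtime per year, which is why such targets generally require automated failover rather than manual repair.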

High Availability Using Virtualization

by Amartya Dasgupta and

2012, ISCA 24th International Conference on Computer Applications in Industry and Engineering (CAINE 2011), Honolulu, Hawaii, United States of America, November 2011

Key finding: Proposes a cost-effective HA architecture using virtualization where virtual servers migrate and mirror system states between primary and secondary servers, employing delta encoding to minimize data transfer overhead. The... Read more

4. How can AI-driven techniques improve disaster recovery, fault tolerance, and high availability in dynamic cloud systems?

This research theme evaluates the application of artificial intelligence (AI) and machine learning methods to enhance cloud reliability. AI enables predictive failure detection, automated fault management, intelligent load balancing, and self-healing capabilities, exceeding limitations of static rule-based resilience methods. Insights cover how AI supports adaptive resource optimization, reduces downtime, and improves recovery speed, while acknowledging challenges such as model bias and data privacy.

AI-Driven Cloud Services for Guaranteed Disaster Recovery, Improved Fault Tolerance, and Transparent High Availability in Dynamic Cloud Systems

by Akshay Sharma and

2025, International Journal of Scientific Research in Science, Engineering and Technology

Key finding: Presents an integrative review and empirical analysis demonstrating that AI-based cloud resilience solutions leveraging predictive analytics and self-healing architectures notably reduce service downtime, enhance recovery... Read more

All papers in High Availability

Master Data Management Strategies for Improving Data Quality and Accuracy: A Comprehensive Framework for Enterprise Excellence

by Madhusudan Sharma Vadigicherla

2025, Journal of Computer Science and Technology Studies

This article presents a strategic framework for implementing master data management excellence within enterprise organizations, with particular emphasis on manufacturing and distribution environments. The article explores the critical... more

Proposta de Otimização do Tráfego da Rede da Universidade Federal de Lavras Utilizando a Técnica de Spanning Tree Protocol

by Anderson Bispo Dos Santos

2025, allnetcom.com.br

With the expansion of organizations and their computer networks, high network availability is a key requirement. Even short periods of network downtime can cause productivity losses. In view of this, the present work... more

The Need for Long-Term Disaster Recovery Systems

by Carlos Martin Nieto

2025

Disasters are an inevitable part of our lives. Much of the work and many of the tools currently in use address the first stage of a disaster, which is response. Work related to a later stage, long-term disaster recovery, is scarce and an... more

Enhancing TCP throughput of highly available virtual machines via speculative communication

by Yutaka Ishikawa

2025, Sigplan Notices

Checkpoint-recovery based virtual machine (VM) replication is an attractive technique for accommodating VM installations with high-availability. It provides seamless failover for the entire software stack executed in the VM regardless the... more

Intelligent Platform-Management Controller for Low-Level RF Control System ATCA Carrier Board

by António Batista

2025, IEEE Transactions on Nuclear Science

High availability and reliability are among the most desirable features of control systems in modern High-Energy Physics (HEP) and other large-scale scientific experiments. One recent development that has influenced this field is the emergence of the Advanced Telecommunications Computing Architecture (ATCA). Designed for the telecommunications industry, it has been successfully applied in other domains such as accelerator control systems. A good example is the application of the ATCA standard to the design of the Low Level RF (LLRF) control system for the X-Ray Free Electron Laser (XFEL) being developed at Deutsches Elektronen Synchrotron (DESY). Reliability and availability requirements play a crucial role for such a device. Thus, the ATCA standard, with five-nines availability, is considered one of the best candidates for this system. This article focuses on the central management unit of every ATCA board, namely the Intelligent Platform Management Controller (IPMC), developed for the LLRF ATCA Carrier Board (CB). It also argues that a fully functional IPMC can be created using the base specifications only, which is a much more economical solution than acquiring such products from vendors of ATCA-related products. The solution presented here complies with the most recent revisions of the specifications required for an ATCA board to operate properly in an ATCA shelf, communicate with the redundant Shelf Manager (ShM), and host Advanced Mezzanine Cards (AMCs). Full Electronic-Keying (EK) functionality is present on the LLRF CB, supporting protocols such as PCI Express (PCIe), Gigabit Ethernet (GbE), and proprietary Low Latency Links (LLL), making it possible to route connections between all the boards in the system. The IPMC solution presented here is largely hardware independent: proper code organization made it possible to separate low-level device drivers from the high-level application logic dealing with the ATCA standard, which makes it portable to new carrier board designs.

Enhancing Business Process Efficiency through SAP BW/4HANA in Financial Management

by QIT Press

2025, Quality Institute of Technology Press (QIT Press)

The evolving dynamics of global finance necessitate the adoption of advanced digital platforms to streamline operations, enhance data-driven decision-making, and ensure regulatory compliance. SAP BW/4HANA, as a next-generation data... more

Optimizing High Availability in Educational Systems Using Xen Paravirtualization

by Eda Tabaku

2025, Journal of Educational and Social Research

In today's world, where technology plays a crucial role in development, education is also benefiting from advancements in network infrastructure and virtualized services. Virtualization technology and high-availability (HA) infrastructures... more

Maximizing Business Value through Artificial Intelligence and Machine Learning in SAP Platforms

by Dr. Aryendra Dalal

2025, INTERNATIONAL JOURNAL OF RESEARCH IN ELECTRONICS AND COMPUTER ENGINEERING

The integration of Artificial Intelligence (AI) and Machine Learning (ML) into SAP platforms is revolutionizing enterprise resource planning by driving automation, predictive insights, and intelligent decision-making. This paper explores... more

on storage in Linux cluster

by Gianluca Argentini

2025

Harnessing the Power of SAP Applications to Optimize Enterprise Resource Planning and Business Analytics

by Dr. Aryendra Dalal

2025, INTERNATIONAL JOURNAL OF RESEARCH IN ELECTRONICS AND COMPUTER ENGINEERING

The rapid digital transformation in the financial sector has created an urgent demand for deploying machine learning and AI models at scale in a secure, efficient, and scalable manner. Traditional model deployment techniques often lack... more

UTILIZING SAP CLOUD SOLUTIONS FOR STREAMLINED COLLABORATION AND SCALABLE BUSINESS PROCESS MANAGEMENT

by Dr. Aryendra Dalal

2025, INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR)

The integration of cloud-based solutions into enterprise environments has transformed the way organizations manage business processes and foster collaboration. SAP Cloud Solutions, particularly through platforms like SAP Business... more

Chemolithoautotrophic production mediating the cycling of the greenhouse gases N<sub>2</sub>O and CH<sub>4</sub> in an upwelling ecosystem

by Juan Faundez

2025, Biogeosciences

The high availability of electron donors occurring in coastal upwelling ecosystems with marked oxyclines favours chemoautotrophy, in turn leading to high N<sub>2</sub>O and CH<sub>4</sub> cycling associated with aerobic NH<sub>4</sub><sup>+</sup> (AAO) and CH<sub>4</sub> oxidation... more

Business Continuity & Disaster Recovery

by Ijaems Journal

2025

The world around us is moving at a faster pace than ever before, and information technology plays a pivotal role in driving this momentum forward. With its growth and innovations, the biggest challenge information technology... more

AUTOMATED CROSS-PLATFORM DATABASE MIGRATION AND HIGH AVAILABILITY IMPLEMENTATION

by VEERAVENKATA MARUTHI LAKSHMI GANESH NERELLA

2025, Turkish Journal of Computer and Mathematics Education (TURCOMAT)

The increasing reliance on data-driven decision-making in businesses and organizations has made database migration a critical aspect of modern IT infrastructures. Cross-platform database migration, while vital for evolving technological... more

The NSCL cyclotron gas stopper – Entering commissioning

by Antonio C.C. Villari

2025, Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms

Linear gas stopping cells have been used successfully at NSCL to slow down ions produced by projectile fragmentation from the 100 MeV/u to the keV energy range. These 'stopped beams' have first been used for low-energy high precision... more

Trunked Radio Solutions for Special Applications

by Sławomir Gajewski

2025, International Journal of Electronics and Telecommunications

In the paper modern concepts of radio communication trunking-dispatch systems for special applications are presented. Basic standards of TETRA, DMR, and cdma2000 are mentioned. The aim of the paper is to present innovative trunking... more

Effect of age on the pharmacokinetics of sulphadimidine in West African Dwarf (WAD) goats following a single intramuscular administration

by Saganuwan Saganuwan

2025, GSC Biological and Pharmaceutical Sciences

Due to the low cost and effectiveness of sulphadimidine against a wide variety of animal diseases, it is still being widely used in veterinary medicine. The present study was carried out to determine the effect of age on the... more

Availability Evaluation and Maintenance Policy of Data Center Infrastructure

by Kádna Camboim

2025

The convergence of communication networks and the demand for storage and processing capacities for large amounts of information, especially in recent years, has driven requests for everything-as-a-service and has been generating, on an... more

Global Financial Operations with SAP S/4HANA Cloud Central Finance

by Surendra Annanki

2025, Global Journal of Engineering and Technology Advances

SAP S/4HANA Cloud Central Finance represents a strategic enabler for enterprises aiming to unify financial operations across fragmented SAP and non-SAP ERP environments. By centralizing financial data and processes within a single,... more

Brief announcement

by Ricardo Gonçalves

2025, Proceedings of the 2012 ACM symposium on Principles of distributed computing - PODC '12

Version vectors (VV) are used pervasively to track dependencies between replica versions in multi-version distributed storage systems. In these systems, VV tend to have a dual functionality: identify a version and encode causal... more
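
To make the idea of tracking dependencies between replica versions concrete, here is a minimal sketch of a textbook version vector: one counter per replica, a pointwise-maximum merge, and a comparison that distinguishes causally ordered versions from concurrent ones. It is not the scheme proposed in the paper; the function names and the replica identifiers "A" and "B" are assumptions for illustration.

```python
from typing import Dict

# A version vector maps a replica identifier to an update counter.
VersionVector = Dict[str, int]


def increment(vv: VersionVector, replica_id: str) -> VersionVector:
    """Return a copy of vv with replica_id's counter advanced by one."""
    updated = dict(vv)
    updated[replica_id] = updated.get(replica_id, 0) + 1
    return updated


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Pointwise maximum: the smallest vector that dominates both inputs."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}


def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify a relative to b: 'equal', 'before', 'after' or 'concurrent'."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


if __name__ == "__main__":
    v1 = increment({}, "A")   # first write, at replica A
    v2 = increment(v1, "A")   # second write at A, causally after v1
    v3 = increment(v1, "B")   # write at B that has seen v1 but not v2
    print(compare(v1, v2), compare(v2, v3), merge(v2, v3))
```

In this sketch, two versions classified as concurrent are the ones a storage system would need to keep as siblings or reconcile.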

Modeling Service Availability in Web Clusters Architectures

by Magnos Martinello

2025, Anais do VII Workshop de Testes e Tolerância a Falhas (WTF 2006)

The Internet is often used for transaction-based applications such as online banking and stock trading, among many others, where service outages are unacceptable. It is important for designers of such applications to analyze how hardware,... more

Wireless communications deployment in industry: a review of issues, options and technologies

by Javier Vales Alonso

2025, Computers in Industry

The present basis of knowledge management is the efficient sharing of information. The challenges that modern industrial processes have to face are multimedia information gathering and system integration, through large investments and adopting... more

Building High Availability and Disaster Recovery Strategies for SQL Server with Real-Time Protection for Critical Systems

by Padma Rama Divya Achanta

2025, IRE Journals

In today's data-intensive business environment, the integrity and availability of mission-critical systems are of utmost importance. Microsoft SQL Server is widely used by organizations to store and manage critical business information. Accordingly, maintaining uninterrupted data access through strong High Availability (HA) and Disaster Recovery (DR) solutions is crucial to reduce downtime, avoid data loss, and ensure business continuity. This paper discusses the architecture, implementation, and best practices for constructing effective HA and DR plans for SQL Server environments, with emphasis on real-time protection of mission-critical applications. It presents a comparative study of native SQL Server features such as Always On Availability Groups, Failover Cluster Instances (FCIs), Log Shipping, Database Mirroring, and Backup/Restore approaches. It also addresses how third-party offerings and real-time data replication technologies help optimize recovery point objectives (RPO) and recovery time objectives (RTO). It examines cloud DR solutions and the integration of hybrid models to determine how they can complement on-premises infrastructures with economical, scalable alternatives. In addition, the research discusses real-time HA/DR challenges, including network latency, storage replication consistency, application failover compatibility, and the administrative burden of configuring and monitoring HA/DR solutions. Automating failover processes, monitoring around the clock, and performing periodic DR drills to stay prepared are also important. Using case studies and actual implementations, the paper illustrates how businesses from various industries have achieved near-zero downtime and minimal data loss by implementing sophisticated HA and DR mechanisms. Business size, budget limitations, compliance demands, and system criticality are used as guidelines for recommending the appropriate HA/DR strategy. Finally, this paper provides value to IT practitioners and system architects by offering a well-defined method for designing, implementing, and managing high availability and disaster recovery plans for SQL Server. The emphasis on real-time data protection is intended to ensure that mission-critical business systems continue to function in the event of hardware failure, cyber-attack, or natural disaster.

Extending High Availability with Distributed Always on Using Patterns for Global SQL Server Continuity

by Padma Rama Divya Achanta

2025, International Journal of Computational Engineering Research

In the era of real-time data processing and global operations, ensuring the high availability (HA) of mission-critical databases is a cornerstone of IT infrastructure strategy.[1] SQL Server, a widely adopted relational database... more

CockroachDB: The Resilient Geo-Distributed SQL Database

by Rebecca Taft

2025, Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

We live in an increasingly interconnected world, with many organizations operating across countries or even continents. To serve their global user base, organizations are replacing their legacy DBMSs with cloud-based systems capable of... more

On the exploration of innovative concepts for fusion chamber technology

by Mohamed Sawan

2025, Fusion Engineering and Design

This study, called APEX, is exploring novel concepts for fusion chamber technology that can substantially improve the attractiveness of fusion energy systems. The emphasis of the study is on fundamental understanding and advancing the... more

The evolution of a distributed operating system

by Andrew S Tanenbaum

2025, Springer eBooks

AMOEBA is a research project to build a true distributed operating system using the object model. Under the COST11-ter MANDIS project this work was extended to cover wide-area networks. Besides describing the system, this paper discusses... more

Using Sparse Capabilities in a Distributed Operating System

by Andrew S Tanenbaum

2025, International Conference on Distributed Computing Systems

Most distributed operating systems constructed to date have lacked a unifying mechanism for naming and protection. In this paper we discuss a system, Amoeba, that uses capabilities for naming and protecting objects. In contrast to traditional, centralized operating systems, in which capabilities are managed by the operating system kernel, in Amoeba all the capabilities are managed directly by user code. To prevent tampering, the capabilities are protected cryptographically. The paper describes a variety of the issues involved, and gives four different ways of dealing with the access rights. This paper describes a scheme in which user processes manipulate capabilities directly in their own address spaces. Except for some very special parts of it, the kernel does not even know that capabilities are in use. To prevent users from forging new capabilities or tampering with existing ones, capabilities are protected cryptographically. This cryptographic protection scheme will first be described in some detail, followed by a discussion of how these capabilities are used in the Amoeba distributed operating system. Amoeba is an object-oriented distributed operating system. Its semantic model is based on having client processes perform operations on objects managed by server processes. Objects are specified by capabilities. Operations are carried out by having processes exchange messages, generally in the form of a request from a client followed later by a reply from a server. The standard message format provides a place for one capability in the header, typically for the object being operated on, but users are free to put other capabilities in the data field as required. The header also contains room for the operation code and some parameters. After making a request, a client blocks until the reply comes in, so the approach can be regarded as a simple remote procedure call mechanism. The system does not use "connections" or virtual circuits or any other long-lived communication structures.
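
The idea of capabilities that live in user space yet cannot be forged can be sketched as follows. This is only an illustration under stated assumptions, not Amoeba's actual scheme: the abstract does not give the check-field construction, so HMAC-SHA256 with a per-object server secret stands in for it, and the class names, field layout, and rights bits are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Dict

READ, WRITE = 0x01, 0x02  # hypothetical rights bits (must fit in one byte here)


@dataclass(frozen=True)
class Capability:
    """A capability held and passed around by user code, not by the kernel."""
    object_id: int
    rights: int
    check: bytes  # check field; cannot be recomputed without the server's secret


class ObjectServer:
    """Server that manages objects and validates capabilities sent by clients."""

    def __init__(self) -> None:
        self._secrets: Dict[int, bytes] = {}  # one random secret per object

    def _check_field(self, object_id: int, rights: int) -> bytes:
        # Assumption: HMAC-SHA256 over (object id, rights) as the check field.
        message = object_id.to_bytes(8, "big") + rights.to_bytes(1, "big")
        return hmac.new(self._secrets[object_id], message, hashlib.sha256).digest()

    def create(self, object_id: int, rights: int) -> Capability:
        self._secrets[object_id] = secrets.token_bytes(32)
        return Capability(object_id, rights, self._check_field(object_id, rights))

    def validate(self, cap: Capability, needed: int) -> bool:
        if cap.object_id not in self._secrets:
            return False
        expected = self._check_field(cap.object_id, cap.rights)
        genuine = hmac.compare_digest(expected, cap.check)
        return genuine and (cap.rights & needed) == needed


if __name__ == "__main__":
    server = ObjectServer()
    cap = server.create(object_id=7, rights=READ)
    forged = Capability(cap.object_id, READ | WRITE, cap.check)  # tampered rights
    print(server.validate(cap, READ), server.validate(forged, WRITE))
```

Because the check field depends on a secret held only by the server, a client that edits the rights bits in its own address space produces a capability that no longer validates.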

Fault tolerance using group communication

by Andrew S Tanenbaum

2025

We propose group communication as an efficient mechanism to support fault tolerance. Our approach is based on an efficient reliable broadcast protocol that requires on average only two messages per broadcast. To illustrate our approach we... more

A wide-area Distribution Network for free software

by Andrew S Tanenbaum

2025, ACM Transactions on Internet Technology

The Globe Distribution Network (GDN) is an application for the efficient, worldwide distribution of freely redistributable software packages. Distribution is made efficient by encapsulating the software into special distributed objects... more

Enhancing application robustness in cloud data centers

by Andrew Trossman

2025, … of CASCON 2011

We propose OX, a runtime system that uses application-level availability constraints and application topologies discovered on the fly to enhance resilience to infrastructure anomalies for cloud applications. OX allows application owners... more

Residual gas fluorescence monitor for relativistic heavy ions at RHIC

by Tony Tsang

2025, Physical Review Special Topics - Accelerators and Beams

A residual gas fluorescence beam profile monitor at the relativistic heavy ion collider (RHIC) has successfully recorded vertical beam sizes of Au-ion beams from 3.85 to 100 GeV/n during the 2010 beam runs. Although the fluorescence cross... more

OpenSESAME—the simple but extensive, structured availability modeling environment

by Arndt Bode Prof. Dr.

2025, Reliability Engineering & System Safety

This article describes the novel stochastic modeling tool OpenSESAME which allows for a quantitative evaluation of fault-tolerant High-Availability systems. The input models are traditional reliability block diagrams (RBD) which can be... more
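
For readers unfamiliar with reliability block diagrams, the sketch below shows the standard series and parallel availability formulas that such diagrams encode. It is not OpenSESAME's input format or API; the function names and component availabilities are assumptions for illustration, and component failures are assumed to be independent.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Blocks in series: the system is up only if every block is up."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Redundant blocks in parallel: the system is down only if all blocks are down."""
    return 1.0 - prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    # Hypothetical system: two redundant web servers (0.99 each)
    # in series with a single database (0.999).
    web_tier = parallel(0.99, 0.99)
    system = series(web_tier, 0.999)
    print(f"web tier: {web_tier:.6f}, system: {system:.7f}")
```

In this hypothetical example the redundant web tier is no longer the limiting factor; the single database dominates the result, which is the kind of insight a block-diagram evaluation is meant to surface.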

Free cupric ions in contaminated agricultural soils around a copper mine in eastern Nanjing City, China

by Xiao-San Luo

2025, Journal of Environmental Sciences

Determining environmental free metal ion activity has recently been a hot issue. A method to measure low-level free cupric ion activity in soil solution extracted with 0.01 mol/L KNO<sub>3</sub> was developed by using cupric ion-selective electrode... more

International Linear Collider Accelerator Physics R

by George Gollin

2025

Corrosive effects of Pb17Li/water interaction

by Pietro Agostini

2025, Fusion Engineering and Design

The interaction between Pb-17Li and water, as a consequence of a localized tube microcrack, has been studied. Two experiments were performed in which a low quantity of steam was injected into the lithium lead. The artificially machined... more

A Replication Strategy for Surviving Network Failures in a Web-Based Sales System

by 100k Sales System

2025

Web-based systems include comprehensive interaction between component-based system objects in various situations in a wide-area network-based environment. Therefore, Web-based applications are vulnerable to network partitioning failures. A network partitioning splits the network into two or more disjoint parts. Processes of a Web-based application within the same part can communicate with each other, but they cannot communicate with processes of the application located in other parts. In order for the application to be continuously operational, data and processes must be replicated in the network. However, application processes may perform some incompatible operations that can result in inconsistent data during the network partitioning. The challenge is to let the application continue its operations during a network partitioning, yet to reconcile the effects of incompatible operations when communication is restored. The application under our consideration is a Web-based sales system spread across three cities: Melbourne, Sydney and Geelong. Each city has a database that records the inventory of the regional warehouse and a group of salespersons that rely on the database for their sales activities. Each city also stores replicas of the databases of the other cities. Therefore, a salesperson can also sell items stored in other cities' inventory databases, although the majority of sales will be from the local inventory database. The aim of this paper is to develop a strategy that allows every part of the Web-based sales system to continue its operations during a network partitioning. When the network partitioning is recovered, a reconciliation process will bring the system to a consistent state. [...] operations. In the back end, we use Java Database Connectivity (JDBC) for the servers in the transaction management system to access the physical data sources in various locations. This paper is structured as follows. In Section 2, we describe the background and the architecture of our Web-based sales system. In Section 3 we present the replication strategy for normal operations. The replication strategy for dealing with network partitioning and recovery is described in Section 4. Section 5 discusses some performance issues. In Section 6 we conclude the paper.

Pembagian Beban Trafik pada Cluster Server

by Mila Kusumawardani

2025, Jurnal EECCIS (Electrics, Electronics, Communications, Controls, Informatics, Systems)

A common problem with servers is the large number of users accessing them at the same time. The server cluster concept can be used to address this. The method used in this article is to build 3 clustered web servers and 1...

Network-aware heuristics for inter-domain meta-scheduling in Grids

by María Caminero

2025, Journal of Computer and System Sciences

Grid computing generally involves the aggregation of geographically distributed resources in the context of a particular application. As such resources can exist within different administrative domains, requirements on the communication...

A Fault-Tolerant Mobile Computing Model Based On Scalable Replica

by manish raj

2025, International Journal of Interactive Multimedia and Artificial Intelligence

The most frequent challenge faced by mobile users is staying connected to online data; while disconnected or poorly connected, they rely on stored replicas of critical data. Nomadic users require replication to store copies of critical data on their...

Tutorial on Networking for Digital Substations

by Gustavo Silvano

2025

Tutorial on Networking for Digital Substations

by Gustavo Silvano

2025, 2019 72nd Conference for Protective Relay Engineers (CPRE)

Cluster Computing: High-Performance, High-Availability, and High-Throughput Processing on a Network of Computers

by Frank Somers

2025, Handbook of Nature-Inspired and Innovative Computing

However, cluster computing did not gain momentum until the convergence of three important trends in the 1980s: high-performance microprocessors, high-speed networks, and standard tools for high performance distributed computing. A...

An adaptable workflow system architecture on the Internet for electronic commerce applications

by Ibrahim Cingil

2025, Proceedings of the International Symposium on Distributed Objects and Applications

An electronic commerce (EC) process is a business process and defining it as a workflow provides all the advantages that come with this technology. Yet electronic commerce processes place certain demands on the workflow technology like...

Implementation of ICMP flood detection and mitigation system based on software-defined network and sFlow-RT

by TELKOMNIKA Team

2025, TELKOMNIKA Telecommunication Computing Electronics and Control

This study evaluates Internet Control Message Protocol (ICMP) flood detection and mitigation in software-defined networks (SDN) using an SDN architecture with sFlow-RT for real-time traffic monitoring. OpenFlow switches and sFlow agents...

Live Forensics for Distributed Storage Systems

by Zbigniew Kalbarczyk

2025, ArXiv

We present Kaleidoscope, an innovative system that supports live forensics for application performance problems caused by either individual component failures or resource contention issues in large-scale distributed storage systems. The...

Driving factors of the phytoplankton functional groups in a deep Mediterranean reservoir

by Luciano Caputo

2025, Water Research

The control of phytoplankton growth is mainly related to the availability of light and nutrients. Both may select phytoplankton species, but only if they occur in limiting amounts. During the last decade, the functional groups approach, based on the physiological, morphological and ecological attributes of the species, has proved to be a more efficient way to analyze seasonal changes in phytoplankton biomass. We analysed the dynamics of the phytoplankton functional groups sensu Reynolds, recognising the driving forces (light, mixing regime, and nutrients) in the Sau Reservoir, based on a one-year cycle (monthly surface-water sampling). The Sau Reservoir is a Mediterranean water-supply reservoir with a canyon-shaped basin and a clear and mixed epilimnion layer. The long stratification period and high light availability led to high phytoplankton biomass (110.8 fresh-weight mg L⁻¹) in the epilimnion during summer. The reservoir showed P-limitation for phytoplankton growth in this period. All functional groups included one or more species (X2-Rhodomonas spp.; Y-Cryptomonas spp.; F-Oocystis lacustris; K-Aphanocapsa spp.) selected by resources, especially phosphorus. Species of Cryptomonas (group Y) dominated during the mixing period (winter season) in conditions of low light and relatively high availability of dissolved nutrients. Increases in water-column stability during spring stratification led to phytoplankton biomass increases due to the dominance of small flagellate functional groups (X2 and X3, chrysophyceans). The colonial chlorophycean O. lacustris (group F) peaked during the mid-summer stratification, when the mixed epilimnion was clearly depleted in nutrients, especially SRP. High temperature and increases in nutrient concentration during the end-summer and mid-autumn resulted in a decrease of green algae (group F) and increase of Aphanocapsa spp. (cyanobacteria, group K) and dinoflagellates (group Lo).
The study also revealed the important role of physical processes in the seasonal gradient, in selecting phytoplankton functional groups, and consequently in the assessment of ecological status. The Q index (assemblage index) based on functional

AI-Driven Cloud Services for Guaranteed Disaster Recovery, Improved Fault Tolerance, and Transparent High Availability in Dynamic Cloud Systems

by Akshay Sharma and

2025, International Journal of Scientific Research in Science, Engineering and Technology

Cloud computing alters the way organizations manage and deploy their IT resources. It provides organizations with scalable, inexpensive, and flexible options. The complexity and dynamic nature of cloud environments make it challenging to maintain high availability at all times, especially when a system fails or a disaster strikes. Legacy techniques for disaster recovery, fault tolerance, and high availability leave much to be desired. They are mostly static, slow to respond, and poorly able to adapt to the continuously changing conditions of contemporary cloud systems. Such techniques largely depend on manual configuration and predefined policies, resulting in inefficiencies and an increased risk of service downtime. This research investigates how Artificial Intelligence (AI) changes the paradigm of cloud resilience, promoting the adoption of intelligent systems for guaranteed disaster recovery, better fault-tolerant behavior, and transparent high availability. Using machine learning algorithms, AI-based cloud services analyze large data volumes to reveal patterns in system logs, performance metrics, and user behavior data, thereby offering real-time anomaly detection and predictive failure analysis. For example, techniques like predictive analytics help cloud providers predict likely system outages, optimize resource usage, and automate failover processes (Xu et al., 2021; Lee & Kumar, 2022). AI-aided disaster recovery techniques employ complex algorithms to produce adaptive backup mechanisms, thereby minimizing loss and reducing restoration time. Fault tolerance in AI cloud systems comes from intelligent error correction, automatic fault isolation, and self-healing features, i.e. repair of faults without human supervision (Chen et al., 2020).
In addition, AI contributes to high availability through intelligent load balancing, which ensures that resources are optimally distributed throughout the network at any given time to sustain continuous service even during peak demand or unanticipated failures (Patel & Zhang, 2023). The approach combines a comprehensive review of the existing literature on the topic, an empirical analysis of the AI-driven cloud solutions currently available on the market, and case studies for comparative analysis of the different AI systems. The study finds that AI-driven solutions noticeably reduce downtime, improve recovery times, and contribute to overall system reliability compared to traditional methods. However, open concerns include model bias, data privacy, and the continuous training of AI models. This study extends work on AI in cloud computing by documenting the significance of intelligent systems in addressing the traditional weaknesses of resilience strategies. It further underscores the need to integrate AI into predictive maintenance, automated disaster response, and proactive fault management in rapidly changing cloud environments. Future studies will focus on integrating AI with edge computing and blockchain technologies for even more robust and secure cloud services.

AI-Driven Cloud Services for Guaranteed Disaster Recovery, Improved Fault Tolerance, and Transparent High Availability in Dynamic Cloud Systems

by bhushan chaudhari and

2025, International Journal of Scientific Research in Science, Engineering and Technology

The approach combines a comprehensive review of the existing literature on the topic, an empirical analysis of the AI-driven cloud solutions currently available on the market, and case studies for comparative analysis of the different AI...
