
High Availability

2,942 papers
127 followers
About this topic
High Availability (HA) refers to the design and implementation of systems and components that ensure a high level of operational performance and uptime, minimizing downtime and service interruptions. It involves redundancy, failover mechanisms, and fault tolerance to maintain continuous service availability, particularly in critical applications and environments.
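The uptime levels this definition refers to are commonly quantified as steady-state availability. Below is a small worked example of the standard formula, availability = MTBF / (MTBF + MTTR); the MTBF and MTTR figures are illustrative, not drawn from any paper on this page:

```python
# Steady-state availability from MTBF and MTTR; figures are illustrative.

def availability(mtbf_hours, mttr_hours):
    """Fraction of time the service is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

a = availability(1000, 1)                      # fails every 1000 h, 1 h repair
downtime_min_per_year = (1 - a) * 365.25 * 24 * 60
print(f"availability {a:.5f} (~three nines), "
      f"~{downtime_min_per_year:.0f} min downtime/yr")
# Redundancy raises this: two independent such servers in failover reach
# 1 - (1 - a)**2, i.e. roughly 'six nines' in this toy model.
```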

Key research themes

1. How can systems maintain continuous high availability despite sensor or component faults in layered cloud and edge computing environments?

This research theme explores fault detection and fault tolerance mechanisms at the sensor and component level within multi-layered cloud and edge computing systems. It focuses on maintaining high availability (HA) despite sensor failures that may otherwise disable fault detection capabilities. The relevance lies in ensuring uninterrupted service delivery in complex infrastructures composed of multiple interdependent layers, addressing both hardware and software component failures without human intervention to avoid downtime.

Key finding: Proposes a novel high availability mechanism using dynamic fault model reconstruction to tolerate sensor faults: the fault detection logic is reconfigured around the remaining healthy sensors, without human intervention (a minimal sketch follows the key findings below). The...
Key finding: Emphasizes the importance of HA and fault tolerance at the edge in industrial IoT deployments and presents an architecture whereby edge devices act as middleware to integrate various systems, ensuring minimal latency and...
Key finding: Investigates fault tolerance strategies suitable for large-scale parallel computing systems to maintain availability and data consistency. The paper shows that by deploying software-layer checkpointing and rollback recovery...
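To make the first key finding concrete, here is a minimal sketch of fault-detection logic that rebuilds itself over the remaining healthy sensors. The stuck-at diagnosis rule, the alarm threshold, and the majority vote are illustrative assumptions, not the paper's actual fault model:

```python
# Illustrative sketch: a detector that reconfigures around healthy sensors.

STUCK_LIMIT = 5        # identical consecutive readings before a sensor is
                       # diagnosed as stuck (assumed fault model)
ALARM_TEMP = 90.0      # assumed per-sensor alarm threshold

class SensorPool:
    def __init__(self, sensor_ids):
        self.healthy = set(sensor_ids)
        self.history = {s: [] for s in sensor_ids}

    def ingest(self, readings):
        """readings: dict sensor_id -> value. Returns the fault verdict of
        the reconstructed detector, or None if no healthy sensors remain."""
        for s, v in readings.items():
            if s not in self.healthy:
                continue                      # faulty sensors are ignored
            hist = self.history[s]
            hist.append(v)
            # Stuck-at diagnosis: exclude the sensor and keep detecting
            # with the rest -- no human intervention, no lost HA coverage.
            if len(hist) >= STUCK_LIMIT and len(set(hist[-STUCK_LIMIT:])) == 1:
                self.healthy.discard(s)
        if not self.healthy:
            return None                       # detection capability lost
        # Reconstructed detector: majority vote over healthy sensors only.
        alarms = [readings[s] > ALARM_TEMP for s in self.healthy if s in readings]
        return sum(alarms) > len(alarms) / 2

pool = SensorPool(["t1", "t2", "t3"])
for i in range(6):                            # t2 is stuck at 95.0
    verdict = pool.ingest({"t1": 70.0 + i, "t2": 95.0, "t3": 71.0 + i})
print(sorted(pool.healthy), verdict)          # ['t1', 't3'] False
```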

2. What are the effective proactive and coordinated fault tolerance mechanisms to preserve reliability and minimize downtime for cloud virtual machines and parallel applications?

This theme focuses on proactive fault tolerance strategies that anticipate failures based on system health indicators within cloud infrastructures hosting parallel applications across virtual machines (VMs). It addresses coordinated fault tolerance involving VM migration and resource optimization to prevent failures and reduce system unavailability by minimizing checkpoint frequency and downtime, enhancing reliability in cloud data centers with large-scale parallel workloads.

Key finding: Introduces a two-step proactive fault tolerance approach for cloud virtual clusters: CPU temperature modeling predicts deteriorating physical machines, and coordinated VM migration via an improved particle swarm optimization... (a simplified sketch follows the key findings below)
Key finding: Compares local single cluster, local multiple clusters, and multiple cloud clusters, showing that multi-cluster approaches improve scalability, fault tolerance, resource allocation, and availability. Experimental results...
Key finding: Identifies Kubernetes' built-in repair actions as insufficient for achieving carrier-grade high availability for stateful microservices. Proposes an HA State Controller that handles application state replication and automatic...
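A simplified sketch of the two-step proactive scheme from the first key finding above: a linear trend fitted to CPU temperature samples stands in for the paper's temperature model, and a greedy least-loaded target choice stands in for its improved particle swarm optimization; thresholds and data are illustrative:

```python
# Illustrative two-step sketch: (1) predict deteriorating hosts from a CPU
# temperature trend, (2) pick migration targets for their VMs.

from statistics import mean

CRITICAL_TEMP = 85.0   # assumed deterioration threshold (Celsius)
HORIZON = 10           # look-ahead, in sampling intervals

def predicted_temp(samples, horizon=HORIZON):
    """Least-squares linear extrapolation of recent temperature samples."""
    n = len(samples)
    if n < 2:
        return samples[-1]
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(samples)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar + slope * (n - 1 + horizon - x_bar)

def plan_migrations(hosts):
    """hosts: dict name -> {'temps': [...], 'load': float}.
    Returns (source, target) moves; greedy least-loaded placement stands in
    for the paper's PSO-based coordinated migration."""
    failing = [h for h, s in hosts.items()
               if predicted_temp(s["temps"]) > CRITICAL_TEMP]
    healthy = [h for h in hosts if h not in failing]
    return [(h, min(healthy, key=lambda t: hosts[t]["load"]))
            for h in failing if healthy]

hosts = {
    "pm1": {"temps": [70.0, 72.0, 74.0, 76.0], "load": 0.6},  # heating up
    "pm2": {"temps": [55.0, 54.0, 55.0, 55.0], "load": 0.4},
    "pm3": {"temps": [60.0, 60.0, 61.0, 60.0], "load": 0.7},
}
print(plan_migrations(hosts))   # [('pm1', 'pm2')]
```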

3. How can distributed database systems and cloud orchestrations be architected to achieve strong consistency, fault tolerance, and global high availability?

This area investigates architectural design patterns, replication protocols, and orchestration mechanisms that support high availability in distributed data storage and cloud infrastructure. It includes hybrid replication protocols for ensuring data consistency and availability, strategies for global database replication with failover capabilities, and container orchestration tools for resilient service deployment and scaling. These insights aim to guide systems supporting geo-distributed workloads with minimal downtime and strong data guarantees.

Key finding: Develops Styx++, a hybrid replication system integrating Paxos for configuration management and Chain Replication for database nodes, balancing strong consistency and fault tolerance with minimal performance degradation... (a minimal chain-replication sketch follows the key findings below)
Key finding: Describes CockroachDB's architecture for geo-distributed scalable SQL workloads with fault tolerance and high availability, using replication across diverse geographic zones and a novel transaction protocol for strong...
Key finding: Examines how Docker Swarm, a container orchestration tool, facilitates efficient resource management, auto-scaling, and load balancing across multi-node clusters to enhance availability and performance of distributed web...
Key finding: Provides foundational insights into availability classes and the challenge of building systems with 'five nines' (99.999%) availability. It emphasizes the necessity of fault tolerance mechanisms including redundancy,...
Key finding: Proposes a cost-effective HA architecture using virtualization where virtual servers migrate and mirror system states between primary and secondary servers, employing delta encoding to minimize data transfer overhead. The...
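For the Styx++ finding above, here is a minimal in-memory sketch of Chain Replication, the protocol it applies to database nodes: writes enter at the head and are acknowledged by the tail; reads are served by the tail, so any value returned has already reached every replica. Node names are illustrative, and the Paxos-based reconfiguration is only alluded to in a comment:

```python
# Minimal in-memory sketch of Chain Replication (van Renesse & Schneider);
# node names and the reconfiguration comment are illustrative.

class ChainNode:
    def __init__(self, name):
        self.name = name
        self.store = {}        # replicated key-value state
        self.successor = None  # next node toward the tail

    def write(self, key, value):
        # Updates enter at the head and propagate down the chain; the
        # write is acknowledged only once the tail has applied it.
        self.store[key] = value
        if self.successor is not None:
            return self.successor.write(key, value)
        return "ack"           # tail: every replica now holds the value

    def read(self, key):
        # Reads go to the tail, so any value returned has already reached
        # all replicas -- this is what gives strong consistency.
        return self.store.get(key)

head, mid, tail = ChainNode("head"), ChainNode("mid"), ChainNode("tail")
head.successor, mid.successor = mid, tail

head.write("user:42", "active")   # client writes enter at the head
print(tail.read("user:42"))       # client reads come from the tail: 'active'

# On failure, a separate configuration service (Paxos, in Styx++'s design)
# splices the chain, e.g. head.successor = tail if 'mid' goes down.
```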

4. How can AI-driven techniques improve disaster recovery, fault tolerance, and high availability in dynamic cloud systems?

This research theme evaluates the application of artificial intelligence (AI) and machine learning methods to enhance cloud reliability. AI enables predictive failure detection, automated fault management, intelligent load balancing, and self-healing capabilities, overcoming the limitations of static rule-based resilience methods. Insights cover how AI supports adaptive resource optimization, reduces downtime, and improves recovery speed, while acknowledging challenges such as model bias and data privacy.

Key finding: Presents an integrative review and empirical analysis demonstrating that AI-based cloud resilience solutions leveraging predictive analytics and self-healing architectures notably reduce service downtime, enhance recovery...
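As an illustration of the predictive failure detection these AI-driven approaches rely on, here is a hedged sketch: a logistic classifier trained on host health metrics flags machines likely to fail, so failover or migration can start before the outage. The features, synthetic data, and 0.5 action threshold are illustrative assumptions, not taken from the reviewed solutions:

```python
# Illustrative predictive failure detection on synthetic host telemetry.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Per-host features: [cpu_temp_C, disk_io_errors_per_h, ecc_corrections_per_h]
healthy = rng.normal([60, 0.5, 2], [5.0, 0.5, 1.0], size=(200, 3))
failing = rng.normal([82, 6.0, 15], [5.0, 2.0, 5.0], size=(200, 3))
X = np.vstack([healthy, failing])
y = np.array([0] * 200 + [1] * 200)     # 1 = failed within the next hour

model = LogisticRegression(max_iter=1000).fit(X, y)

# Score live telemetry; high-risk hosts are drained before they fail.
live = np.array([[61.0, 0.2, 1.0], [79.0, 4.8, 12.0]])
for host, p in zip(["node-a", "node-b"], model.predict_proba(live)[:, 1]):
    action = "drain and migrate VMs" if p > 0.5 else "no action"
    print(f"{host}: failure risk {p:.2f} -> {action}")
```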

All papers in High Availability

This article presents a strategic framework for implementing master data management excellence within enterprise organizations, with particular emphasis on manufacturing and distribution environments. The article explores the critical...
As organizations and their computer networks expand, high availability becomes a key requirement: even short periods of network downtime can cause productivity losses. In light of this, the present work...
Disasters are an inevitable part of our lives. Much of the current work and tooling addresses the first stage of a disaster, which is response. Work related to a later stage, long-term disaster recovery, is scarce and an...
Checkpoint-recovery based virtual machine (VM) replication is an attractive technique for providing VM installations with high availability. It offers seamless failover for the entire software stack executed in the VM regardless of the...
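A toy sketch of the checkpoint-recovery control loop behind the entry above, assuming in-memory "VM state" and an always-reachable backup; production systems additionally buffer external outputs and copy only dirty memory pages, which this simplification omits:

```python
# Toy checkpoint-recovery loop for VM replication; heavily simplified.

import copy

class ReplicatedVM:
    def __init__(self):
        self.state = {"counter": 0}  # stand-in for guest memory/device state
        self.backup = None           # last checkpoint held by the backup host

    def run_epoch(self):
        self.state["counter"] += 1   # the guest executes for one epoch

    def checkpoint(self):
        # Ship a consistent snapshot to the backup at the epoch boundary.
        self.backup = copy.deepcopy(self.state)

    def failover(self):
        # Primary failed: the backup resumes the whole software stack from
        # the last checkpoint; progress since then is lost, not corrupted.
        assert self.backup is not None, "no checkpoint shipped yet"
        self.state = copy.deepcopy(self.backup)

vm = ReplicatedVM()
for _ in range(3):
    vm.run_epoch()
    vm.checkpoint()
vm.run_epoch()      # un-checkpointed progress...
vm.failover()       # ...is rolled back on failover
print(vm.state)     # {'counter': 3}
```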
High availability and reliability are among the most desirable features of control systems in modern High-Energy Physics (HEP) and other big-scale scientific experiments. One of the recent developments that has influenced this field was...
The evolving dynamics of global finance necessitate the adoption of advanced digital platforms to streamline operations, enhance data-driven decision-making, and ensure regulatory compliance. SAP BW/4HANA, as a next-generation data...
In today's world, where technology plays a crucial role in development, education also benefits from advancements in network infrastructure and virtualized services. Virtualization technology and high-availability (HA) infrastructures...
The integration of Artificial Intelligence (AI) and Machine Learning (ML) into SAP platforms is revolutionizing enterprise resource planning by driving automation, predictive insights, and intelligent decision-making. This paper explores...
The rapid digital transformation in the financial sector has created an urgent demand for deploying machine learning and AI models at scale in a secure, efficient, and scalable manner. Traditional model deployment techniques often lack...
The integration of cloud-based solutions into enterprise environments has transformed the way organizations manage business processes and foster collaboration. SAP Cloud Solutions, particularly through platforms like SAP Business...
The high availability of electron donors occurring in coastal upwelling ecosystems with marked oxyclines favours chemoautotrophy, in turn leading to high N2O and CH4 cycling associated with aerobic NH4+ oxidation (AAO) and CH4 oxidation...
The world around us is moving at a faster pace than ever before, and information technology plays a pivotal role in driving this momentum forward. With its growth and innovations, the biggest challenge information technology...
The increasing reliance on data-driven decision-making in businesses and organizations has made database migration a critical aspect of modern IT infrastructures. Cross-platform database migration, while vital for evolving technological...
Linear gas stopping cells have been used successfully at NSCL to slow down ions produced by projectile fragmentation from the 100 MeV/u to the keV energy range. These 'stopped beams' have first been used for low-energy high precision...
In the paper modern concepts of radio communication trunking-dispatch systems for special applications are presented. Basic standards of TETRA, DMR, and cdma2000 are mentioned. The aim of the paper is to present innovative trunking...
Due to the low cost and effectiveness of sulphadimidine against a wide variety of animal diseases, it is still being widely used in veterinary medicine. The present study was carried out to determine the effect of age on the...
The convergence of communication networks and the demand for storage and processing capacities for large amounts of information, especially in recent years, has driven requests for everything-as-a-service and has been generating, on an...
SAP S/4HANA Cloud Central Finance represents a strategic enabler for enterprises aiming to unify financial operations across fragmented SAP and non-SAP ERP environments. By centralizing financial data and processes within a single,...
Version vectors (VV) are used pervasively to track dependencies between replica versions in multi-version distributed storage systems. In these systems, VV tend to have a dual functionality: identify a version and encode causal...
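A minimal sketch of the classic version-vector mechanism the entry above builds on, assuming replica identifiers are known in advance; the dual functionality the abstract mentions (identifying a version and encoding causality) shows up in the compare function:

```python
# Classic version vectors: per-replica counters tracking causal dependency.

def vv_increment(vv, replica):
    """A replica bumps its own counter on every local update."""
    vv = dict(vv)
    vv[replica] = vv.get(replica, 0) + 1
    return vv

def vv_compare(a, b):
    """Returns 'equal', 'before', 'after', or 'concurrent' (a vs. b)."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a causally precedes b
    if b_le_a:
        return "after"       # b causally precedes a
    return "concurrent"      # neither dominates: a genuine replica conflict

v1 = vv_increment({}, "r1")          # {'r1': 1}
v2 = vv_increment(v1, "r2")          # {'r1': 1, 'r2': 1}
fork = vv_increment(v1, "r1")        # {'r1': 2} -- diverged from v2
print(vv_compare(v1, v2))            # before
print(vv_compare(fork, v2))          # concurrent
```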
The Internet is often used for transaction-based applications such as online banking and stock trading, among many others, where service outages are unacceptable. It is important for designers of such applications to analyze how hardware,...
The present basis of knowledge management is the efficient sharing of information. The challenges that modern industrial processes have to face are multimedia information gathering and system integration, through large investments and adopting...
In today's data-intensive business scenario, the integrity and availability of mission-critical systems are of utmost importance. Microsoft SQL Server is a heavily used product by organizations to store and manage critical business...
In the era of real-time data processing and global operations, ensuring the high availability (HA) of mission-critical databases is a cornerstone of IT infrastructure strategy [1]. SQL Server, a widely adopted relational database...
We live in an increasingly interconnected world, with many organizations operating across countries or even continents. To serve their global user base, organizations are replacing their legacy DBMSs with cloud-based systems capable of...
This study, called APEX, is exploring novel concepts for fusion chamber technology that can substantially improve the attractiveness of fusion energy systems. The emphasis of the study is on fundamental understanding and advancing the...
AMOEBA is a research project to build a true distributed operating system using the object model. Under the COST11-ter MANDIS project this work was extended to cover wide-area networks. Besides describing the system, this paper discusses...
Most distributed operating systems constructed to date have lacked a unifying mechanism for naming and protection. In this paper we discuss a system, Amoeba, that uses capabilities for naming and protecting objects. In contrast to...
We propose group communication as an efficient mechanism to support fault tolerance. Our approach is based on an efficient reliable broadcast protocol that requires on average only two messages per broadcast. To illustrate our approach we...
The Globe Distribution Network (GDN) is an application for the efficient, worldwide distribution of freely redistributable software packages. Distribution is made efficient by encapsulating the software into special distributed objects...
We propose OX, a runtime system that uses application-level availability constraints and application topologies discovered on the fly to enhance resilience to infrastructure anomalies for cloud applications. OX allows application owners...
A residual gas fluorescence beam profile monitor at the relativistic heavy ion collider (RHIC) has successfully recorded vertical beam sizes of Au-ion beams from 3.85 to 100 GeV/n during the 2010 beam runs. Although the fluorescence cross...
This article describes the novel stochastic modeling tool OpenSESAME which allows for a quantitative evaluation of fault-tolerant High-Availability systems. The input models are traditional reliability block diagrams (RBD) which can be...
Determining environmental free metal ion activity has recently been a hot issue. A method to measure low-level free cupric ion activity in soil solution extracted with 0.01 mol/L KNO3 was developed using a cupric ion-selective electrode...
The interaction between Pb-17Li and water, as a consequence of a localized tube microcrack, has been studied. Two experiments were performed in which a low quantity of steam was injected into the lithium lead. The artificially machined...
Web-based systems involve comprehensive interaction between component-based system objects in various situations in a wide-area network environment. Therefore, Web-based applications are vulnerable to network partitioning...
A common problem with servers is many users accessing them at the same time. The server cluster concept can be used to address this. The method used in this article is to build 3 web server clusters and 1...
Grid computing generally involves the aggregation of geographically distributed resources in the context of a particular application. As such resources can exist within different administrative domains, requirements on the communication...
The most frequent challenge faced by mobile users is staying connected to online data; while disconnected or poorly connected, they keep replicas of critical data. Nomadic users require replication to store copies of critical data on their...
However, cluster computing did not gain momentum until the convergence of three important trends in the 1980s: high-performance microprocessors, high-speed networks, and standard tools for high performance distributed computing. A...
An electronic commerce (EC) process is a business process and defining it as a workflow provides all the advantages that come with this technology. Yet electronic commerce processes place certain demands on the workflow technology like...
This study evaluates internet control message protocol (ICMP) flood detection and mitigation in software-defined networks (SDN) using an SDN architecture with sFlow-RT for real-time traffic monitoring. OpenFlow switches and sFlow agents...
We present Kaleidoscope an innovative system that supports live forensics for application performance problems caused by either individual component failures or resource contention issues in large-scale distributed storage systems. The...
The control of phytoplankton growth is mainly related to the availability of light and nutrients. Both may select phytoplankton species, but only if they occur in limiting amounts. During the last decade, the functional groups approach,...
Cloud computing alters the way organizations manage and deploy their IT resources. It provides an organization with scalable, inexpensive, and flexible options. The complexity and dynamic nature of cloud environments pose a challenge to...
The approach is a comprehensive review of the existing literature on the topic, an empirical analysis of the current AI-driven cloud solutions available on the market, and case studies for comparative analysis of the different AI...