Recovery in Parallel State-Machine Replication

Abstract

State-machine replication is a popular approach to building fault-tolerant systems; it relies on the sequential execution of commands to guarantee strong consistency. Sequential execution, however, limits performance. Recently, several proposals have suggested parallelizing the execution model of the replicas to improve the performance of state-machine replication. Although these models achieve high performance, their implications for recovery remain largely unaddressed. In this paper, we focus on the recovery problem in the context of Parallel State-Machine Replication. We propose two novel algorithms and assess them through simulation and a real implementation.
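The following minimal Python sketch makes the recovery problem concrete by showing the conventional baseline on a toy key-value state machine. A replica executes an agreed-upon command log sequentially and takes a checkpoint, which is a state snapshot plus the log position it reflects. After a crash, the replica recovers by loading the checkpoint and replaying the log suffix. The names `Replica`, `checkpoint`, and `recover`, as well as the `(key, delta)` command format, are hypothetical. The sketch models neither the parallel execution discussed above nor the two algorithms the paper proposes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical command format: (key, delta) adds delta to the integer under key.
Command = Tuple[str, int]


@dataclass
class Replica:
    """Toy deterministic state machine (illustrative only, not the paper's design)."""
    state: Dict[str, int] = field(default_factory=dict)
    applied: int = 0  # number of log entries executed so far

    def apply(self, cmd: Command) -> None:
        # Deterministic: the same command on the same state gives the same result.
        key, delta = cmd
        self.state[key] = self.state.get(key, 0) + delta
        self.applied += 1

    def execute_log(self, log: List[Command]) -> None:
        # Sequential execution: apply every not-yet-applied entry in log order.
        for cmd in log[self.applied:]:
            self.apply(cmd)

    def checkpoint(self) -> Tuple[Dict[str, int], int]:
        # Snapshot of the state together with the log position it reflects.
        return dict(self.state), self.applied


def recover(snapshot: Tuple[Dict[str, int], int], log: List[Command]) -> Replica:
    """Hypothetical baseline recovery: load a checkpoint, then replay the log suffix."""
    state, applied = snapshot
    replica = Replica(state=dict(state), applied=applied)
    replica.execute_log(log)
    return replica


# Usage: the ordered log that all replicas agree on (e.g., via consensus).
log: List[Command] = [("x", 1), ("y", 5), ("x", 2), ("z", -3), ("y", 1)]

healthy = Replica()
healthy.execute_log(log)          # never fails

crashed = Replica()
crashed.execute_log(log[:2])      # executes two commands, then checkpoints
snap = crashed.checkpoint()       # ({'x': 1, 'y': 5}, 2)
# ... the replica crashes and loses its in-memory state ...
restored = recover(snap, log)

print(restored.state == healthy.state)  # True
```

Because every replica applies the same commands in the same order, the recovered replica ends in the same state as one that never failed. Under parallel execution, obtaining a snapshot that matches an exact log prefix can require additional coordination among the threads executing commands.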
