Computing the Number of Calls Dropped Due to Failures

2010, 2010 IEEE 21st International Symposium on Software Reliability Engineering

https://doi.org/10.1109/ISSRE.2010.18

Abstract

Defects per million (DPM), defined as the number of calls out of a million dropped due to failures, is an important service (un)reliability measure for telecommunication systems. Most previous research derives the DPM from a steady-state system availability model. In this paper, we develop a novel method for DPM computation that takes into account not only system availability but also the impact of the service application and the transient behavior of failure recovery. We illustrate this approach using a real system, the IBM SIP SLEE cluster. Our method accounts for software and hardware failures, different stages of recovery, different phases of call flow, retry attempts, and the interactions between call flow and failure/recovery behavior.
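
To make the definition of DPM concrete, the following is a minimal sketch and not the method developed in the paper. The first function applies the definition directly to observed call counts. The second illustrates the simpler steady-state view, under the assumption (ours, not stated in the abstract) that the fraction of dropped calls equals the steady-state unavailability MTTR / (MTTF + MTTR). The function names, parameters, and numbers are hypothetical.

```python
def dpm_from_counts(dropped_calls: int, total_calls: int) -> float:
    """DPM by definition: calls dropped due to failures, per million calls."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    if not 0 <= dropped_calls <= total_calls:
        raise ValueError("dropped_calls must lie between 0 and total_calls")
    return dropped_calls * 1_000_000 / total_calls


def dpm_from_steady_state(mttf_hours: float, mttr_hours: float) -> float:
    """Illustrative steady-state estimate of DPM.

    Assumption (hypothetical simplification): every call that arrives while
    the system is down is dropped and no other call is, so the dropped
    fraction equals the steady-state unavailability MTTR / (MTTF + MTTR).
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mttf_hours must be positive and mttr_hours non-negative")
    return mttr_hours * 1_000_000 / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    print(dpm_from_counts(dropped_calls=25, total_calls=1_000_000))
    print(dpm_from_steady_state(mttf_hours=9_999.0, mttr_hours=1.0))
```

The steady-state estimate ignores the call flow phase in which a failure occurs, the stage of recovery, and retry attempts, which are the effects the abstract says the proposed method takes into account.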
