Distributed operating systems
1995
Abstract
Distributed systems span a wide spectrum in the design space. In this paper we survey the various kinds of distributed systems and discuss some of the reliability issues they raise. The first half of the paper concentrates on the causes of unreliability, illustrated with general solutions and examples. Among the issues treated are interprocess communication, machine crashes, server redundancy, and data integrity. The second half examines one distributed operating system, Amoeba, to show how reliability issues have been handled in a real system and how the pieces fit together.