The Design of the Borealis Stream Processing Engine

2004

Abstract

Borealis is a second-generation distributed stream processing engine that is being developed at Brandeis University, Brown University, and MIT. Borealis inherits core stream processing functionality from Aurora and distribution functionality from Medusa. Borealis modifies and extends both systems in non-trivial and critical ways to provide advanced capabilities that are commonly required by newly-emerging stream processing applications.
