The Design of the Borealis Stream Processing Engine
2004
Abstract
Borealis is a second-generation distributed stream processing engine that is being developed at Brandeis University, Brown University, and MIT. Borealis inherits core stream processing functionality from Aurora and distribution functionality from Medusa. Borealis modifies and extends both systems in non-trivial and critical ways to provide advanced capabilities that are commonly required by newly emerging stream processing applications.