A Survey of State Management in Big Data Processing Systems

Volker Markl

doi:10.48550/ARXIV.1702.01596

2017, arXiv (Cornell University)

https://doi.org/10.48550/ARXIV.1702.01596

25 pages

Abstract

The concept of state and its applications vary widely across big data processing systems. This is evident in both the research literature and existing systems, such as Apache Flink, Apache Heron, Apache Samza, Apache Spark, and Apache Storm. Given the pivotal role that state management plays, particularly for iterative batch and stream processing, this survey presents examples of state as an enabler, discusses the alternative approaches used to handle and implement state, captures the many facets of state management, and highlights new research directions. Our aim is to provide insight into disparate state management techniques, motivate others to pursue research in this area, and draw attention to open problems.
