Enabling Real-Time Querying of Live and Historical Stream Data
2007, 19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)
https://doi.org/10.1109/SSDBM.2007.34
Abstract
Applications that query data streams to identify trends, patterns, or anomalies can often benefit from comparing the live stream data with archived historical stream data. However, searching this historical data in real time has so far been considered prohibitively expensive. One of the main bottlenecks is the cost of updating the indices over the archived data. In this paper, we address this problem by using our highly efficient bitmap indexing technology (called FastBit) and demonstrate that index update operations are efficient enough to remove this bottleneck. We describe our prototype system based on the TelegraphCQ streaming query processor and the FastBit bitmap index. We present a detailed performance evaluation of our system using a complex query workload for analyzing real network traffic data. The combined system uses TelegraphCQ to analyze streams of traffic information and FastBit to correlate current behaviors with historical trends. We demonstrate that our system can simultaneously analyze (1) live streams with high data rates and (2) a large repository of historical stream data.
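To illustrate why appending to a bitmap index can be cheap, the following is a minimal sketch of an equality-encoded bitmap index in Python. It is not FastBit's implementation or API (FastBit uses compressed bitmaps and its own interface), and it does not use TelegraphCQ. The class `BitmapIndex`, the function `is_unusual`, the sample port numbers, and the threshold are all hypothetical names and values chosen for illustration.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy equality-encoded bitmap index: one bitmap (a Python int) per distinct value."""

    def __init__(self):
        # value -> bitmap; bit i is set when archived row i holds that value
        self.bitmaps = defaultdict(int)
        self.num_rows = 0

    def append(self, value):
        # Appending a row sets exactly one new bit in one bitmap.
        # Bits for earlier rows are never rewritten.
        self.bitmaps[value] |= 1 << self.num_rows
        self.num_rows += 1

    def count(self, value):
        # Number of archived rows equal to `value` (population count of its bitmap).
        return bin(self.bitmaps.get(value, 0)).count("1")


def is_unusual(index, live_value, threshold=0.01):
    """Hypothetical check: flag a live value whose share of the archive is below `threshold`."""
    if index.num_rows == 0:
        return True
    return index.count(live_value) / index.num_rows < threshold


# Archive a few (made-up) destination ports, then compare live values against them.
archive = BitmapIndex()
for port in [80, 80, 443, 80, 22, 443, 80, 80]:
    archive.append(port)

common = is_unusual(archive, 80, threshold=0.1)    # port 80 is 5 of 8 archived rows
rare = is_unusual(archive, 6667, threshold=0.1)    # port 6667 never appears in the archive
```

In this sketch each append touches only the bitmap of the incoming value, which is the general property that makes bitmap indices attractive for append-only archives. A real system would also need compression, persistence, and range queries, none of which are shown here.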