NoSQL database systems: a survey and decision guidance

2016, Computer Science - Research and Development

https://doi.org/10.1007/S00450-016-0334-3

Abstract

Today, data is generated and consumed at unprecedented scale. This has led to novel approaches for scalable data management, subsumed under the term NoSQL database systems, to handle the ever-increasing data volume and request loads. However, the heterogeneity and diversity of the numerous existing systems impede the well-informed selection of a data store appropriate for a given application context. Therefore, this article gives a top-down overview of the field: instead of contrasting the implementation specifics of individual representatives, we propose a comparative classification model that relates functional and non-functional requirements to techniques and algorithms employed in NoSQL databases. This NoSQL Toolbox allows us to derive a simple decision tree to help practitioners and researchers filter potential system candidates based on central application requirements.

Key takeaways

  1. The article introduces a NoSQL Toolbox for selecting appropriate NoSQL systems based on application requirements.
  2. NoSQL systems are categorized by data models: key-value, document, and wide-column stores.
  3. The CAP theorem outlines trade-offs between consistency, availability, and partition tolerance in distributed systems.
  4. PACELC extends the CAP theorem by addressing trade-offs during normal operation and partitions.
  5. Sharding and replication techniques are crucial for achieving scalability and fault tolerance in NoSQL databases.
