Scalable Machine Learning and Related Technologies

Karthik Ramasubramanian

doi:10.1007/978-1-4842-2334-5_9

Outline

Title

Abstract

Generating Data and Storing It in a Local File

Loading the Data Into the Hive Table

Loading the Data

Creating the Table and Put Data

2016, Machine Learning Using R

https://doi.org/10.1007/978-1-4842-2334-5_9

37 pages

Abstract

Scalable machine learning is increasingly relevant due to advancements in infrastructure, data accessibility, and software development. This chapter discusses the importance of distributed computing in handling large datasets, emphasizes the transition from traditional algorithms to scalable solutions, and introduces key big data technologies like Apache Hadoop and Spark. It highlights how these technologies facilitate efficient data processing and computation, enabling organizations to harness the potential of big data in real-world applications.

Related papers

Vijay Agneeswaran

Big Data, 2013

The article explains the three generations of machine learning algorithms, all of which try to operate on big data. The first-generation tools are SAS, SPSS, etc., while second-generation realizations include

A Cloud-Based Framework for Machine Learning Workloads and Applications

Marica Antonacci

IEEE Access

In this paper we propose a distributed architecture to provide machine learning practitioners with a set of tools and cloud services that cover the whole machine learning development cycle, ranging from model creation, training, validation, and testing to model serving as a service, sharing, and publication. In this respect, the DEEP-Hybrid-DataCloud framework allows transparent access to existing e-Infrastructures, effectively exploiting distributed resources for the most compute-intensive tasks of the machine learning development cycle. Moreover, it provides scientists with a set of cloud-oriented services to make their models publicly available by adopting a serverless architecture and a DevOps approach, making it easy to share, publish, and deploy the developed models. Index terms: cloud computing, computers and information processing, deep learning, distributed computing, machine learning, serverless architectures.

Machine Learning in Production: From Experimented ML Model to System

Pritom Bhowmik

A production ML pipeline refers to a complete end-to-end workflow of a machine learning product ready for deployment. In recent years, companies have invested heavily in machine learning research, and developers are building new tools and technologies to make ML more flexible. We can now experience AI in most devices around us, from home appliances to cars. When developing an AI-powered product, it is vital to understand the crucial workflows of ML. Academic research to develop an ML model and a production ML pipeline are entirely different scenarios. Going from the business problem and data collection to deploying the model is a highly iterative process. Most of the time, data scientists and machine learning engineers need to deal with issues like data shift, concept shift, model decay, etc. Sometimes there is a need to change the complete ML architecture or how the features are engineered in the dataset. It will become tedious if someone is working in such an environment and lacks a...

Statistics and Machine Learning at Scale New Technologies Apply Machine Learning to Big Data

thomas mcclure

This paper is on Statistics and Machine Learning at Scale New Technologies Apply Machine Learning to Big Data.

Machine Learning at Work : Speeding up Discovery

Abdallah Bari

Machine Learning (ML), which is a subset of Artificial Intelligence (AI), enhances the ability of a computer to learn from data without being explicitly programmed end-to-end. As ML and AI learn, they acquire the ability to carry out cognitive functions, such as perceiving, learning, reasoning, and automatically digging deeper to identify important insights or lead to new discoveries. With the advance in machine learning, in particular its Deep Learning (DL) subset, ML is rapidly spreading across sectors and will continue to do so at an even higher rate with the ever-increasing growth of Big Data. Gartner predicts that companies will combine Big Data and Machine Learning to carry out some or most of their service processes by 40% in 2022, up from 5% in 2017. ML is used to accelerate data-driven discovery in research and development. Recently, it has enabled scientists to discover a largely unknown diversity of viruses, amounting to thousands of previously unknown viruses. The book refers to previous as well as recent research work, with colleagues, where ML was used to capture subtle variation and to discover rare items, such as rare genes that researchers have long sought in vain. Such processes to identify genes or medicines can be daunting: they may take years, can be expensive, and have uncertain outcomes. ML is used today to shorten the time and even help to identify medicine that can be more effective for people with a particular gene, which will in turn help in personalized medicine. ML is a critical ingredient for intelligent applications and provides the opportunity to further accelerate discovery processes as well as enhance decision-making processes. These trends promise that every sector will be data-driven and will be using machine learning in the cloud to incorporate artificial intelligence applications and to ultimately supplement existing analytical and decision-making tools.
The book introduces ML and its potential along with some ML applications using the Spark and R platforms combined. While Spark can scale and speed up analytics, it harnesses the R language's machine learning capabilities beyond what is available on Spark or any other Big Data system. R and Spark can share code and different types of data and carry out powerful large-scale machine learning. Machine learning with Spark and the R language combined can not only speed up but also light up Big Data discovery.

The Data Science Revolution How learning machines changed the way we work and do business

Wil van der Aalst

Data science technology is rapidly changing the role of information technology in society and all economic sectors. Artificial Intelligence (AI) and Machine Learning (ML) are at the forefront of attention. However, data science is much broader and also includes data extraction, data preparation, data exploration, data transformation, storage and retrieval, computing infrastructures, other types of mining and learning, presentation of explanations and predictions, and the exploitation of results taking into account ethical, social, legal, and business aspects. This paper provides an overview of the field of data science also showing the main developments, thereby focusing on (1) the growing importance of learning from data (rather than modeling or programming), (2) the transfer of tasks from humans to (software) robots, and (3) the risks associated with data science (e.g., privacy problems, unfair or nontransparent decision making, and the market dominance of a few platform providers).

The Evolution and Impact of Google Cloud Platform in Machine Learning and AI

Praveen Borra

International Journal of Advanced Research in Science, Communication and Technology (IJARSCT), 2024

Google Cloud Platform (GCP) has emerged as a leader in Machine Learning (ML) and Artificial Intelligence (AI), known for its cutting-edge technologies and inclusive accessibility. GCP not only drives innovation but also democratizes access to powerful ML and AI tools, empowering organizations of all sizes to harness data-driven insights for enhanced innovation, efficiency, and scalable growth. GCP's impact transcends technological advancements, representing a significant shift in digital transformation across diverse industries. This paper delves into GCP's transformative influence through real-world examples and practical applications across sectors such as healthcare, finance, retail, and entertainment. By showcasing GCP's scalable computing resources and robust data analytics capabilities, it illuminates how these technologies enable businesses to discover new opportunities and operational efficiencies. GCP's holistic approach to ML and AI fosters a culture of continuous innovation, empowering enterprises to excel in the era of intelligent computing and data-driven decision-making.

Machine Learning Application Development: Practitioners' Insights

Md Saidur Rahman

ArXiv, 2021

Nowadays, intelligent systems and services are getting increasingly popular as they provide data-driven solutions to diverse real-world problems, thanks to recent breakthroughs in Artificial Intelligence (AI) and Machine Learning (ML). However, machine learning meets software engineering not only with promising potential but also with some inherent challenges. Despite some recent research efforts, we still do not have a clear understanding of the challenges of developing ML-based applications and the current industry practices. Moreover, it is unclear where software engineering researchers should focus their efforts to better support ML application developers. In this paper, we report on a survey that aimed to understand the challenges and best practices of ML application development. We synthesize the results obtained from 80 practitioners (with diverse skills, experience, and application domains) into 17 findings, outlining challenges and best practices for ML ap...

Data by data, Big Data

Branimir Hackenberger

Croatian Medical Journal, 2019

MLSys: The New Frontier of Machine Learning Systems

Hanie Sedghi

2019

Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a new systems machine learning research community at the intersection of the traditional systems and ML communities, focused on topics such as hardware systems for ML, software systems for ML, and ML optimized for metrics beyond predictive accuracy. To do this, we describe a new conference, MLSys, that explicitly targets research at the intersection of systems and machine learning with a program committee split evenly between experts in systems and ML, and an explicit focus on topics at the intersection of the two.

References (2)

Download the Spark release pre-built for Hadoop 2.7 and later from http://spark.apache.org/downloads.html.
Extract the files into the C:-2.0.0-bin-hadoop2.7 folder (you can choose your own location).

IAEME Publication

IAEME PUBLICATION, 2024

This comprehensive article explores the critical role of scalable data architectures in machine learning, addressing the challenges posed by exponential data growth and increasing model complexity. It delves into the core components of such architectures, including data ingestion, storage, processing, and model deployment, while examining key architectural patterns like Lambda, Kappa, and Microservices. The article discusses various technologies and tools essential for implementing scalable ML infrastructures, and emphasizes the importance of integrating machine learning with data engineering processes. A case study on predictive maintenance in manufacturing illustrates the practical impact of these architectures, demonstrating significant improvements in equipment downtime reduction and cost savings.

Data Infrastructure for Machine Learning

Samridhi Jha

2019

Data quality is critical for effective machine learning, and this makes data a first-class citizen in the context of machine learning, on par with algorithms, software, and infrastructure. As a result, machine-learning platforms need to support data analysis and validation in a principled manner, throughout the lifecycle of the machine learning process. This paper reviews the data infrastructure we built at Google to address these challenges in the context of large-scale production machine learning pipelines.

Design and Development of Modern day Machine Learning Applications - A Survey

International Journal of Scientific Research in Science, Engineering and Technology IJSRSET

International Journal of Scientific Research in Science, Engineering and Technology, 2022

This paper is an overview of the Machine Learning Operations (MLOps) area. Our aim is to define the operation and the components of such systems by highlighting the current problems and trends. In this context we present the different tools and their usefulness in order to provide the corresponding guidelines. Machine learning operations (MLOps) is quickly becoming a critical component of successful data science project deployment in the enterprise. It’s a process that helps organisations and business leaders generate long-term value and reduce risk associated with data science, machine learning, and AI initiatives. Yet it’s a relatively new concept; so why has it seemingly skyrocketed into the data science lexicon overnight? This introductory chapter delves into what MLOps is at a high level, its challenges, why it has become essential to a successful data science strategy in the enterprise, and, critically, why it is coming to the forefront now.

Chapter: Distributed Platforms and Cloud Services Enabling Machine Learning for Big Data. An Overview

Gabriel Iuhasz

2016

Applying popular machine learning algorithms to large amounts of data has raised new challenges for machine learning practitioners. Traditional libraries do not properly support the processing of huge data sets, so new approaches are needed. Using modern distributed computing paradigms, such as MapReduce or in-memory processing, novel machine learning libraries have been devised. In parallel, the advance of cloud computing in the past ten years could not be ignored by the machine learning community, and a number of cloud-based platforms have been put in place as well. This chapter aims at presenting an overview of novel platforms, libraries, and cloud services that can be used by data scientists to extract knowledge from un-/semi-structured, large data sets. The overview covers several popular approaches, such as packages enabling distributed computing in popular machine learning environments, distributed platforms for machine learning, and cloud services for machine learning, known as ...

Mathematical and Algorithmic Aspects of Scalable Machine Learning

Gananath Bhuyan

Studies in Computational Intelligence

Although a number of machine learning models have been proposed and successfully deployed in organizations, there is a new emerging challenge that organizations are going to face in the upcoming years. As various organizations are relying on data for decision-making and optimization of processes, the volume of data is an important factor for developing precise models. The ever-increasing volume of data and its storage is driven by advances in communication technology and storage services like cloud computing. The large volume of data collected is usually stored in a distributed storage and computing environment to ensure fault tolerance and scalability. Developing machine learning models with traditional machine learning algorithms is quite inefficient in a distributed environment. The inefficiency is attributed to the distributed nature of the dataset and computing. The development of the models needs to be carried out in a distributed manner. Thus, additional challenges related to distributed computing need to be addressed by the machine learning algorithms. Scalable machine learning is an adaptation of traditional machine learning to a distributed environment. As the nature of computing changes, the mathematical formulas and equations need to be revisited along with the algorithms to make them suitable for a distributed environment. This chapter discusses the challenges faced by traditional machine learning algorithms in distributed environments, the various mathematical backgrounds of scalable machine learning models, and the state-of-the-art distributed algorithms for scalable machine learning models.

Machine Learning: A Conceptual Framework, Historical Evolution, and Sectoral Applications

ömer melik gökalp

2025

Machine Learning (ML) is recognized as a foundational component within the broader field of Artificial Intelligence (AI), representing a transformative technology that enables systems to autonomously learn from vast amounts of data and make adaptive decisions. This study aims to articulate the conceptual framework of machine learning, trace its historical development, and systematically analyze its contemporary application domains. The three principal learning paradigms (supervised, unsupervised, and semi-supervised learning) are comparatively examined from both theoretical and practical standpoints. Employing a descriptive methodological approach, widely adopted algorithms such as decision trees, support vector machines (SVM), and k-nearest neighbors (k-NN) are presented and critically assessed in terms of their strengths, limitations, and use-case scenarios. Particular attention is devoted to applications in automatic speech recognition, recommendation systems, and fraud detection. Furthermore, prominent ML platforms including TensorFlow, Scikit-learn, and IBM Watson are evaluated based on their performance, scalability, and user accessibility. Real-world project examples are incorporated to demonstrate the practical integration and sector-specific adaptation of machine learning technologies. The analysis underscores that machine learning is not only essential for the technology sector but is increasingly becoming indispensable in critical domains such as education, healthcare, finance, and transportation. A significant rise in the demand for qualified professionals in this interdisciplinary field is projected in the coming years.

Machine learning with big data to solve real-world problems

maryam rahmaty

Journal of Data Analytics

Machine learning algorithms use big data to learn future trends and predict them for businesses. Machine learning can be very efficient for deciphering data in industries where understanding consumer patterns can lead to big improvements. The use of machine learning can be a giant leap for businesses and cannot simply be integrated as the top layer. This requires redefining workflow, architecture, data collection and storage, analytics, and other modules. The magnitude of the system overhaul should be assessed and clearly communicated to the appropriate stakeholders. The main focus of machine learning is to develop computer programs that can access data and use it to learn. The learning process starts with observations or data, to find a pattern in the data and make better decisions. The main goal of data analysis using machine learning is that it allows the computer to learn automatically without human intervention and help and can adjust its actions accordingly. Considering the ma...

Special Issue "Advance in Machine Learning"

Kostantinos Demertzis

Machine learning has increasingly become the bridge between theoretical knowledge and practical applications, transforming countless aspects of modern life. With the development of advanced machine learning algorithms, we can now address complex real-world problems once reserved for human experts. Specifically, by leveraging vast amounts of data and powerful computing resources, machine learning algorithms can learn to recognize patterns and make predictions or decisions based on those patterns. In addition, in many cases, machine learning algorithms can outperform humans in these tasks by analyzing data more quickly and accurately than humans could. Machine learning enables new solutions to real-world problems and changes how we live, work, and interact with technology.

Machine Learning: When and Where the Horses Went Astray?

Emanuel Diamant

CoRR, 2009

Machine Learning is usually defined as a subfield of AI, which is busy with information extraction from raw data sets. Despite its common acceptance and widespread recognition, this definition is wrong and groundless. Meaningful information does not belong to the data that bear it. It belongs to the observers of the data and it is a shared agreement and a convention among them. Therefore, this private information cannot be extracted from the data by any means. Therefore, all further attempts of Machine Learning apologists to justify their funny business are inappropriate.
