Scalable Machine Learning and Related Technologies
2016, Machine Learning Using R
https://doi.org/10.1007/978-1-4842-2334-5_9…
37 pages
Abstract
Scalable machine learning is increasingly relevant due to advancements in infrastructure, data accessibility, and software development. This chapter discusses the importance of distributed computing in handling large datasets, emphasizes the transition from traditional algorithms to scalable solutions, and introduces key big data technologies like Apache Hadoop and Spark. It highlights how these technologies facilitate efficient data processing and computation, enabling organizations to harness the potential of big data in real-world applications.
Related papers
Big Data, 2013
The article explains the three generations of machine learning algorithms, all of which try to operate on big data. The first-generation tools are SAS, SPSS, etc., while second-generation realizations include
IEEE Access
In this paper we propose a distributed architecture to provide machine learning practitioners with a set of tools and cloud services that cover the whole machine learning development cycle, ranging from model creation, training, validation, and testing to serving models as a service, sharing, and publication. In this respect, the DEEP-Hybrid-DataCloud framework allows transparent access to existing e-Infrastructures, effectively exploiting distributed resources for the most compute-intensive tasks of the machine learning development cycle. Moreover, it provides scientists with a set of cloud-oriented services to make their models publicly available by adopting a serverless architecture and a DevOps approach, making it easy to share, publish, and deploy the developed models. Index terms: cloud computing, computers and information processing, deep learning, distributed computing, machine learning, serverless architectures.
A production ML pipeline is a complete end-to-end workflow of a machine learning product ready for deployment. In recent years, companies have invested heavily in machine learning research, and developers are building new tools and technologies to make ML more flexible. We can now experience AI in most devices around us, from home appliances to cars. When developing an AI-powered product, it is vital to understand the crucial ML workflows. Academic research to develop an ML model and a production ML pipeline are entirely different scenarios. Going from the business problem and data collection to deploying the model is a highly iterative process. Most of the time, data scientists and machine learning engineers need to deal with issues like data shift, concept shift, model decay, etc. Sometimes there is a need to change the complete ML architecture or how the features are engineered in the dataset. It will become tedious if someone is working in such an environment and lacks a...
This paper is on Statistics and Machine Learning at Scale: New Technologies Apply Machine Learning to Big Data.
Machine Learning (ML), which is a subset of Artificial Intelligence (AI), enhances the ability of a computer to learn from data without being explicitly programmed end-to-end. As ML and AI systems learn, they acquire the ability to carry out cognitive functions, such as perceiving, learning, reasoning, and automatically digging deeper to identify important insights or lead to new discoveries. With the advance in machine learning, in particular its Deep Learning (DL) subset, ML is rapidly spreading across sectors and will continue to do so at an even higher rate with the ever-increasing growth of Big Data. Gartner predicts that companies will combine Big Data and Machine Learning to carry out some or most of their service processes by 40% in 2022, up from 5% in 2017. ML is used to accelerate data-driven discovery in research and development. Recently, it has enabled scientists to discover a largely unknown diversity of viruses, amounting to thousands of previously unknown viruses. The book refers to previous as well as recent research work, with colleagues, where ML was used to capture subtle variation and to discover rare items, such as rare genes that researchers have long sought in vain. Such processes to identify genes or medicine can be daunting, as they may take years, can be expensive, and have uncertain outcomes. ML is used today to shorten the time and even help to identify medicine that can be more effective for people with a particular gene, which will in turn help in personalized medicine. ML is a critical ingredient for intelligent applications and provides the opportunity to further accelerate discovery processes as well as to enhance decision-making processes. These trends promise that every sector will be data-driven and will use machine learning in the cloud to incorporate artificial intelligence applications and ultimately to supplement existing analytical and decision-making tools.
The book introduces ML and its potential along with some ML applications using the Spark and R platforms combined. While Spark can scale and speed up analytics, combining it with R harnesses the R language's machine learning capabilities beyond what is available on Spark or any other Big Data system. R and Spark can share code and different types of data and carry out powerful large-scale machine learning. Machine learning with Spark and the R language combined can not only speed up but also light up Big Data discovery.
Data science technology is rapidly changing the role of information technology in society and all economic sectors. Artificial Intelligence (AI) and Machine Learning (ML) are at the forefront of attention. However, data science is much broader and also includes data extraction, data preparation, data exploration, data transformation, storage and retrieval, computing infrastructures, other types of mining and learning, presentation of explanations and predictions, and the exploitation of results taking into account ethical, social, legal, and business aspects. This paper provides an overview of the field of data science also showing the main developments, thereby focusing on (1) the growing importance of learning from data (rather than modeling or programming), (2) the transfer of tasks from humans to (software) robots, and (3) the risks associated with data science (e.g., privacy problems, unfair or nontransparent decision making, and the market dominance of a few platform providers).
International Journal of Advanced Research in Science, Communication and Technology (IJARSCT), 2024
Google Cloud Platform (GCP) has emerged as a leader in Machine Learning (ML) and Artificial Intelligence (AI), known for its cutting-edge technologies and inclusive accessibility. GCP not only drives innovation but also democratizes access to powerful ML and AI tools, empowering organizations of all sizes to harness data-driven insights for enhanced innovation, efficiency, and scalable growth. GCP's impact transcends technological advancements, representing a significant shift in digital transformation across diverse industries. This paper delves into GCP's transformative influence through real-world examples and practical applications across sectors such as healthcare, finance, retail, and entertainment. By showcasing GCP's scalable computing resources and robust data analytics capabilities, it illuminates how these technologies enable businesses to discover new opportunities and operational efficiencies. GCP's holistic approach to ML and AI fosters a culture of continuous innovation, empowering enterprises to excel in the era of intelligent computing and data-driven decision-making.
ArXiv, 2021
Nowadays, intelligent systems and services are getting increasingly popular as they provide data-driven solutions to diverse real-world problems, thanks to recent breakthroughs in Artificial Intelligence (AI) and Machine Learning (ML). However, machine learning meets software engineering not only with promising potentials but also with some inherent challenges. Despite some recent research efforts, we still do not have a clear understanding of the challenges of developing ML-based applications and the current industry practices. Moreover, it is unclear where software engineering researchers should focus their efforts to better support ML application developers. In this paper, we report about a survey that aimed to understand the challenges and best practices of ML application development. We synthesize the results obtained from 80 practitioners (with diverse skills, experience, and application domains) into 17 findings; outlining challenges and best practices for ML ap...
Croatian Medical Journal, 2019
2019
Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a new systems machine learning research community at the intersection of the traditional systems and ML communities, focused on topics such as hardware systems for ML, software systems for ML, and ML optimized for metrics beyond predictive accuracy. To do this, we describe a new conference, MLSys, that explicitly targets research at the intersection of systems and machine learning with a program committee split evenly between experts in systems and ML, and an explicit focus on topics at the intersection of the two.

References (2)
- Download a Spark release pre-built for Hadoop 2.7 and later from http://spark.apache.org/downloads.html.
- Extract the files into the C:-2.0.0-bin-hadoop2.7 folder (you can choose your own location).
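
The two steps above can be sketched in Python as a rough illustration, not as the chapter's own procedure. The sketch assumes the release has already been downloaded as a .tgz archive. The helper names extract_spark and set_spark_home are hypothetical, and setting the SPARK_HOME environment variable is an added assumption that the steps above do not mention.

```python
import os
import tarfile
from pathlib import Path


def extract_spark(archive: Path, dest: Path) -> Path:
    """Extract a downloaded Spark .tgz archive into dest.

    Returns the path of the top-level release folder inside dest.
    (Hypothetical helper; the archive location is chosen by the caller.)
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # The first path component of the first member names the release folder.
        top_level = tar.getnames()[0].split("/")[0]
        tar.extractall(dest)
    return dest / top_level


def set_spark_home(spark_dir: Path) -> None:
    """Point SPARK_HOME at the extracted folder for the current process only.

    Assumption: tools launched from this process locate Spark via SPARK_HOME.
    """
    os.environ["SPARK_HOME"] = str(spark_dir)
```

A call such as extract_spark(Path("downloads/spark.tgz"), Path("C:/spark")) followed by set_spark_home on the returned path would mirror the manual steps; both paths here are placeholders, and any location can be used.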