
Scientific Workflows

599 papers
12,442 followers

About this topic
Scientific workflows are structured sequences of computational and data processing tasks designed to automate and manage scientific research processes. They facilitate the integration of diverse tools, data sources, and methodologies, enabling reproducibility, efficiency, and collaboration in scientific investigations.

Key research themes

1. How can provenance capture and dataflow management enhance transparency and runtime analysis in scientific workflows?

This research area focuses on developing systems and methods that enable efficient capture, integration, and tracking of data provenance and dataflows during scientific workflow executions. By explicitly representing data transformations and execution dependencies, these approaches aim to enhance transparency, reproducibility, and runtime monitoring capabilities, which are critical for complex, distributed, and multi-workflow scientific analyses.

Key finding: Introduces ProvLake, a low-overhead system that efficiently captures and integrates provenance data across multiple workflows running in heterogeneous execution environments. ProvLake enables runtime multiworkflow data... Read more
Key finding: Demonstrates a FAIR-compliant data pipeline that annotates data consumption and traces scientific outputs through open software and primary data, increasing transparency in fast-evolving epidemiological models such as those... Read more
Key finding: Presents NiW, a tool that systematically converts Jupyter Notebooks into workflow descriptions within the WINGS system, explicitly capturing dataflow across components. This facilitates tracking provenance, comparing... Read more
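The provenance-capture idea behind these systems can be sketched in a few lines: each task records its inputs, outputs, and timing into a shared log, so the dataflow between tasks becomes queryable after (or during) execution. This is an illustrative sketch only; it is not the API of ProvLake, NiW, or any system cited above, and the task names are invented.

```python
import time
from functools import wraps

# Illustrative only: a minimal provenance recorder, not the API of
# ProvLake or any tool cited above.
PROVENANCE = []  # ordered log of task executions

def track(task):
    """Record each task's inputs, output, and duration."""
    @wraps(task)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = task(*args, **kwargs)
        PROVENANCE.append({
            "task": task.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "duration_s": time.time() - start,
        })
        return result
    return wrapper

@track
def clean(values):
    return [v for v in values if v is not None]

@track
def mean(values):
    return sum(values) / len(values)

result = mean(clean([1.0, None, 3.0]))
# PROVENANCE now links clean's output to mean's input, giving a
# queryable record of the dataflow across the two tasks.
```

A real system would persist these records and correlate them across concurrently running workflows, but the core data model is the same: one entry per task execution, keyed by the data that flowed through it.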

2. What methodologies and languages can improve specification, reuse, and interoperability of scientific workflow designs in data-intensive applications?

This theme addresses the development of domain-specific languages (DSLs) and structured frameworks that abstract workflow design from specific platforms, improving portability, modularity, and reuse across heterogeneous execution environments. It highlights innovations in workflow modeling that separate workflow intent from execution technology, allow structured composition of control and dataflows, and support collaborative development in complex scientific domains.

Key finding: Introduces SWEL, a platform-independent domain-specific modeling language (DSML) for abstractly specifying data-intensive workflows. SWEL covers high-level task definitions, data sources, platform requirements, and mappings... Read more
Key finding: Proposes a framework embedding control-flow intensive subtasks within dataflow process networks using workflow templates and frames, separating control-flow and dataflow concerns. This structured composition enables robust,... Read more
Key finding: Presents a cooperative methodology integrating workflow management systems (WfMS) into information system development that actively involves end-users through meta-CASE tools. The approach emphasizes mapping and adapting... Read more
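The separation of workflow intent from execution technology that this theme describes can be illustrated with a toy example: the workflow is specified as plain data (tasks plus dependencies), and any backend can interpret it. The specification format and task names below are invented for illustration and bear no relation to SWEL's actual syntax.

```python
# Illustrative sketch: a platform-independent workflow specification
# (tasks + dependencies as data), interpretable by any backend.
workflow = {
    "tasks": {
        "fetch":  {"run": lambda _: [3, 1, 2]},
        "sort":   {"run": sorted, "after": ["fetch"]},
        "report": {"run": lambda xs: f"{len(xs)} items", "after": ["sort"]},
    }
}

def execute_locally(spec):
    """One possible backend: dependency-ordered, in-process execution."""
    done, results = set(), {}
    while len(done) < len(spec["tasks"]):
        for name, task in spec["tasks"].items():
            deps = task.get("after", [])
            if name not in done and all(d in done for d in deps):
                arg = results[deps[0]] if deps else None
                results[name] = task["run"](arg)
                done.add(name)
    return results

print(execute_locally(workflow)["report"])
```

Because the specification carries no execution detail, a second backend (e.g., one submitting tasks to a cluster) could consume the same `workflow` object unchanged, which is the portability property the DSL work above targets.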

3. How can scheduling strategies and resource management be optimized for scientific workflows in cloud environments considering workflow structure and priorities?

Research under this theme develops scheduling algorithms and resource utilization strategies tailored to the structural characteristics of scientific workflows to minimize execution time and cost in cloud platforms. It investigates workload partitioning based on task dependencies, priority assignment, balancing computational requirements, and leveraging virtualization. These techniques aim to optimize performance in cloud-based workflow execution while managing the tradeoff between resource expenses and throughput.

Key finding: Proposes a scheduling approach that exploits workflow structural information by partitioning tasks into groups with minimized interdependencies and assigning virtual machines proportionally to computational load. This... Read more
Key finding: Introduces a multi-priority scheduling algorithm that orders and groups workflow tasks logically according to data dependencies and locality. By determining group priorities and leveraging available virtual machines, it... Read more
Key finding: Extends previous level-based scheduling approaches by incorporating structure-aware fair-share resource allocation that balances computational loads across workflow task partitions. By accounting for varying task dependencies... Read more
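The level-based, structure-aware strategies summarized above can be sketched as two steps: group tasks by their depth in the dependency graph (tasks in one level have no mutual dependencies and can run in parallel), then share virtual machines across levels in proportion to each level's computational load. The graph, costs, and sharing rule below are invented for illustration and do not reproduce any cited algorithm.

```python
from collections import defaultdict

# Hypothetical task graph and per-task costs, for illustration only.
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
cost = {"a": 4, "b": 2, "c": 6, "d": 3}

def levels(deps):
    """Depth of each task = 1 + max depth of its predecessors."""
    depth = {}
    def d(t):
        if t not in depth:
            depth[t] = 1 + max((d(p) for p in deps[t]), default=0)
        return depth[t]
    for t in deps:
        d(t)
    groups = defaultdict(list)
    for t, lv in depth.items():
        groups[lv].append(t)
    return dict(groups)

def share_vms(groups, total_vms):
    """Assign VMs to each level in proportion to its compute load."""
    load = {lv: sum(cost[t] for t in ts) for lv, ts in groups.items()}
    total = sum(load.values())
    return {lv: max(1, round(total_vms * l / total)) for lv, l in load.items()}

groups = levels(deps)   # {1: ['a'], 2: ['b', 'c'], 3: ['d']}
print(share_vms(groups, 8))
```

Tasks `b` and `c` land in the same level and share that level's VM allotment, which is the fair-share balancing idea the third key finding extends.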

All papers in Scientific Workflows

There was once a clear delineation between the banking processes a consumer could complete from the mobile versus the branch account opening experience. With 70% of likely checking account applicants saying they would prefer to submit a... more
Co-authoring Dofiles can be challenging as most Stata users have idiosyncratic preferences and methods for organizing and writing Dofiles. Which standards and practices can research teams adopt to improve the cohesion of this group work?... more
In the modern era, workflows are adopted as a powerful and attractive paradigm for expressing and solving a variety of applications, including scientific, data-intensive computing, and big data applications such as MapReduce and Hadoop. These... more
BioOne Complete (complete.BioOne.org) is a full-text database of 200 subscribed and open-access titles in the biological, ecological, and environmental sciences published by nonprofit societies, associations, museums, institutions, and... more
This is the penultimate typescript of Chapter 2 _of Scientific Communication: Practices, Theories, and Pedagogies_, edited by Han Yu and Kathy Northcut. The chapter attempts to contribute to our understanding of scientific communication... more
Recently, the emergence of Function-as-a-Service (FaaS) has gained increasing attention from researchers. FaaS, also known as serverless computing, is a new concept in cloud computing in which service computation is triggered by the... more
This Impact Report identifies and summarises the diverse impacts, resulting from the £500m of UK funding of Science and Technology in 2013, using numerous quantitative metrics and short case study extracts. It shows how the varied... more
Weka is a mature and widely used set of Java software tools for machine learning, data-driven modelling and data mining – and is regarded as a current gold standard for the practical application of these techniques. This paper describes... more
In the fast developing world of scholarly communication it is good to take a step back and look at the patterns and processes of innovation in this field. To this end, we have selected 101 innovations (in the form of tools & sites) and... more
Workflow languages are a form of high-level programming language designed for coordinating tasks implemented by different pieces of software, often executed across multiple computers using technologies such as web services. Advantages of... more
Building Information Modeling is receiving an ever-increasing acceptance in the building industry and in construction-related education or research. More and more, Architects, Engineers, but also Contractors and Building Owners are... more
This paper offers an account of two Documentary Linguistics Workshops held in Tokyo based on the author's personal experience. The workshops have been held for nine consecutive years at the Research Institute for Languages and Cultures of... more
Scientific Workflows are abstractions used to model in silico scientific experiments. Cloud environments are still incipient in collecting and recording prospective and retrospective provenance. This paper presents an approach to support... more
The paper shows how error statistical theory can be deployed to grasp the deeper epistemic logic of the peer-review process. The intent is to provide the readers with a novel lens through which to make sense of the practices of academic... more
It is time to escape the constraints of the Systematics Wars narrative and pursue new questions that are better positioned to establish the relevance of the field in this time period to broader issues in the history of biology and history... more
Despite their wide range of applications, workflow systems still suffer from lack of an agreed and standard modelling technique. It is a motivating research area and some researchers have proposed different modelling techniques. Petri nets,... more
The paper describes a new cloud-oriented workflow system called Flowbster. It was designed to create efficient data pipelines in clouds by which large compute-intensive data sets can efficiently be processed. The Flowbster workflow can be... more
The continuous quest for knowledge stimulates companies and research institutions not only to investigate new ways to improve the quality of scientific experiments, but also to reduce the time and costs needed for their implementation in... more
Modern computational experiments imply that the resources of the cloud computing environment are often used to solve a large number of tasks, which differ only in the values of a relatively small set of simulation parameters. Such sets of... more
Modern science often requires the execution of large-scale, multi-stage simulation and data analysis pipelines to enable the study of complex systems. The amount of computation and data involved in these pipelines requires scalable... more
Background: Workflow engine technology represents a new class of software with the ability to graphically model step-based knowledge. We present application of this novel technology to the domain of clinical decision support. Successful... more
In this work we focus on the analysis of process schemas in order to extract common substructures. In particular, we represent processes as graphs, and we apply a graph-based hierarchical clustering technique to group similar... more
A scientific workflow management system can be considered as a binding agent which brings together scientists and distributed resources. A workflow graph plays the central role in such a system as it is the component understood by both... more
Distributed computing has always been a challenge due to the NP-completeness of finding optimal underlying management routines. The advent of big data increases the dimensionality of the problem whereby data partitionability, processing... more
Laser scanners enable bridge inspectors to collect dense 3D point clouds, which capture detailed geometries of bridges. While these data sets contain rich geometric information, they bring unique challenges related to geometric... more
Understanding the core function of the brain is one the major challenges of our times. In the areas of neuroscience and education, several new studies try to correlate the learning difficulties faced by children and youth with behavioral... more
Many new websites and online tools have come into existence to support scholarly communication in all phases of the research workflow. To what extent researchers are using these and more traditional tools has been largely unknown. This... more
Software Product Line (SPL) engineering is a paradigm shift towards modeling and developing software system families rather than individual systems. It focuses on the means of efficiently producing and maintaining multiple similar... more
The article introduces the peer review process, its importance within the cycle of publication of scientific journals with particular attention to the role of the referee and the Workflow process.
The introduction of Next Generation Sequencing into the disciplines of plant systematics, ecology, and metagenomics, among others, has resulted in a phenomenal increase in the collecting and storing of tissue samples and their respective... more
When historical research questions have been tackled with digital tools, the programs developed by historical research projects have so far been highly specialized. The software was tailored precisely to the historical research question... more
• Premise of the study: Internationally, gardens hold diverse living collections that can be preserved for genomic research. Workflows have been developed for genomic tissue sampling in other taxa (e.g., vertebrates), but are inadequate... more
In many experimental domains, especially e-Science, workflow management systems are gaining increasing attention to design and execute in-silico experiments involving data analysis tools. As a by-product, a repository of workflows is... more
Workflows have been used to represent a variety of applications involving high processing and storage demands. As a solution to supply this necessity, the cloud computing paradigm has emerged as an on-demand resources provider. While... more
A significant amount of recent research in scientific workflows aims to develop new techniques, algorithms and systems that can overcome the challenges of efficient and robust execution of ever larger workflows on increasingly complex... more
Data-oriented workflows are often used in scientific applications for executing a set of dependent tasks across multiple computers. We discuss how these can be modeled using lambda calculus, and how ideas from functional programming are... more
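The functional view described in that abstract can be made concrete in a few lines: each task becomes a pure function, and the workflow is simply their composition. This sketch is in the spirit of the lambda-calculus modeling mentioned above; the task functions are invented for illustration and are not taken from the paper.

```python
from functools import reduce

# Illustrative: a data-oriented workflow as function composition.
def compose(*fs):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), fs)

parse     = lambda text: [int(s) for s in text.split(",")]
normalize = lambda xs: [x / max(xs) for x in xs]
summarize = lambda xs: sum(xs) / len(xs)

# The workflow: parse, then normalize, then summarize.
pipeline = compose(summarize, normalize, parse)
print(pipeline("2,4,8"))  # mean of [0.25, 0.5, 1.0]
```

Because each stage is a pure function, standard functional-programming ideas (composition, mapping over partitions, lazy evaluation) transfer directly, which is the connection the paper draws.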
Cloud Computing is a ubiquitous model that enables clients to access different services in a fast and easy manner. In this context, one of the most used models is Software as a Service (SaaS), which means that software is deployed and... more
A leading, international, engineering and construction company has carried out efforts to engage a new tool set and work process. Four-Dimensional Planning and Scheduling (4D-PS) is the new work process that aims toward better, more... more
We propose a new method for mining sets of patterns for classification, where patterns are represented as SPARQL queries over RDFS. The method contributes to so-called semantic data mining, a data mining approach where domain ontologies... more
The optimal workflow scheduling is one of the most important issues in heterogeneous distributed computational environment. Existing heuristic and evolutionary scheduling algorithms have their advantages and disadvantages. In this work we... more
Schedulers for cloud computing determine on which processing resource jobs of a workflow should be allocated. In hybrid clouds, jobs can be allocated either on a private cloud or on a public cloud on a pay per use basis. The capacity of... more