
Scientific Workflows

599 papers
12,442 followers

About this topic
Scientific workflows are structured sequences of computational and data processing tasks designed to automate and manage scientific research processes. They facilitate the integration of diverse tools, data sources, and methodologies, enabling reproducibility, efficiency, and collaboration in scientific investigations.

Key research themes

1. How can provenance capture and dataflow management enhance transparency and runtime analysis in scientific workflows?

This research area focuses on developing systems and methods that enable efficient capture, integration, and tracking of data provenance and dataflows during scientific workflow executions. By explicitly representing data transformations and execution dependencies, these approaches aim to enhance transparency, reproducibility, and runtime monitoring capabilities, which are critical for complex, distributed, and multi-workflow scientific analyses.

Key finding: Introduces ProvLake, a low-overhead system that efficiently captures and integrates provenance data across multiple workflows running in heterogeneous execution environments. ProvLake enables runtime multiworkflow data... Read more
Key finding: Demonstrates a FAIR-compliant data pipeline that annotates data consumption and traces scientific outputs through open software and primary data, increasing transparency in fast-evolving epidemiological models such as those... Read more
Key finding: Presents NiW, a tool that systematically converts Jupyter Notebooks into workflow descriptions within the WINGS system, explicitly capturing dataflow across components. This facilitates tracking provenance, comparing... Read more
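The provenance-capture idea behind these systems can be sketched in a few lines: each task records its inputs, outputs, and timing into a shared log, so the dataflow between tasks becomes queryable after (or during) execution. This is an illustrative sketch only; it is not the API of ProvLake, NiW, or any system cited above, and the task names are invented.

```python
import time
from functools import wraps

# Illustrative only: a minimal provenance recorder, not the API of
# ProvLake or any tool cited above.
PROVENANCE = []  # ordered log of task executions

def track(task):
    """Record each task's inputs, output, and duration."""
    @wraps(task)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = task(*args, **kwargs)
        PROVENANCE.append({
            "task": task.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "duration_s": time.time() - start,
        })
        return result
    return wrapper

@track
def clean(values):
    return [v for v in values if v is not None]

@track
def mean(values):
    return sum(values) / len(values)

result = mean(clean([1.0, None, 3.0]))
# PROVENANCE now links clean's output to mean's input, giving a
# queryable record of the dataflow across the two tasks.
```

A real system would persist these records and correlate them across concurrently running workflows, but the core data model is the same: one entry per task execution, keyed by the data that flowed through it.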

2. What methodologies and languages can improve specification, reuse, and interoperability of scientific workflow designs in data-intensive applications?

This theme addresses the development of domain-specific languages (DSLs) and structured frameworks that abstract workflow design from specific platforms, improving portability, modularity, and reuse across heterogeneous execution environments. It highlights innovations in workflow modeling that separate workflow intent from execution technology, allow structured composition of control and dataflows, and support collaborative development in complex scientific domains.

Key finding: Introduces SWEL, a platform-independent domain-specific modeling language (DSML) for abstractly specifying data-intensive workflows. SWEL covers high-level task definitions, data sources, platform requirements, and mappings... Read more
Key finding: Proposes a framework embedding control-flow intensive subtasks within dataflow process networks using workflow templates and frames, separating control-flow and dataflow concerns. This structured composition enables robust,... Read more
Key finding: Presents a cooperative methodology integrating workflow management systems (WfMS) into information system development that actively involves end-users through meta-CASE tools. The approach emphasizes mapping and adapting... Read more
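The separation of workflow intent from execution technology that this theme describes can be illustrated with a toy example: the workflow is specified as plain data (tasks plus dependencies), and any backend can interpret it. The specification format and task names below are invented for illustration and bear no relation to SWEL's actual syntax.

```python
# Illustrative sketch: a platform-independent workflow specification
# (tasks + dependencies as data), interpretable by any backend.
workflow = {
    "tasks": {
        "fetch":  {"run": lambda _: [3, 1, 2]},
        "sort":   {"run": sorted, "after": ["fetch"]},
        "report": {"run": lambda xs: f"{len(xs)} items", "after": ["sort"]},
    }
}

def execute_locally(spec):
    """One possible backend: dependency-ordered, in-process execution."""
    done, results = set(), {}
    while len(done) < len(spec["tasks"]):
        for name, task in spec["tasks"].items():
            deps = task.get("after", [])
            if name not in done and all(d in done for d in deps):
                arg = results[deps[0]] if deps else None
                results[name] = task["run"](arg)
                done.add(name)
    return results

print(execute_locally(workflow)["report"])
```

Because the specification carries no execution detail, a second backend (e.g., one submitting tasks to a cluster) could consume the same `workflow` object unchanged, which is the portability property the DSL work above targets.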

3. How can scheduling strategies and resource management be optimized for scientific workflows in cloud environments considering workflow structure and priorities?

Research under this theme develops scheduling algorithms and resource utilization strategies tailored to the structural characteristics of scientific workflows to minimize execution time and cost in cloud platforms. It investigates workload partitioning based on task dependencies, priority assignment, balancing computational requirements, and leveraging virtualization. These techniques aim to optimize performance in cloud-based workflow execution while managing the tradeoff between resource expenses and throughput.

Key finding: Proposes a scheduling approach that exploits workflow structural information by partitioning tasks into groups with minimized interdependencies and assigning virtual machines proportionally to computational load. This... Read more
Key finding: Introduces a multi-priority scheduling algorithm that orders and groups workflow tasks logically according to data dependencies and locality. By determining group priorities and leveraging available virtual machines, it... Read more
Key finding: Extends previous level-based scheduling approaches by incorporating structure-aware fair-share resource allocation that balances computational loads across workflow task partitions. By accounting for varying task dependencies... Read more
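The level-based, structure-aware strategies summarized above can be sketched as two steps: group tasks by their depth in the dependency graph (tasks in one level have no mutual dependencies and can run in parallel), then share virtual machines across levels in proportion to each level's computational load. The graph, costs, and sharing rule below are invented for illustration and do not reproduce any cited algorithm.

```python
from collections import defaultdict

# Hypothetical task graph and per-task costs, for illustration only.
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
cost = {"a": 4, "b": 2, "c": 6, "d": 3}

def levels(deps):
    """Depth of each task = 1 + max depth of its predecessors."""
    depth = {}
    def d(t):
        if t not in depth:
            depth[t] = 1 + max((d(p) for p in deps[t]), default=0)
        return depth[t]
    for t in deps:
        d(t)
    groups = defaultdict(list)
    for t, lv in depth.items():
        groups[lv].append(t)
    return dict(groups)

def share_vms(groups, total_vms):
    """Assign VMs to each level in proportion to its compute load."""
    load = {lv: sum(cost[t] for t in ts) for lv, ts in groups.items()}
    total = sum(load.values())
    return {lv: max(1, round(total_vms * l / total)) for lv, l in load.items()}

groups = levels(deps)   # {1: ['a'], 2: ['b', 'c'], 3: ['d']}
print(share_vms(groups, 8))
```

Tasks `b` and `c` land in the same level and share that level's VM allotment, which is the fair-share balancing idea the third key finding extends.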

All papers in Scientific Workflows

There was once a clear delineation between the banking processes a consumer could complete from the mobile versus the branch account opening experience. With 70% of likely checking account applicants saying they would prefer to submit a... more
Co-authoring Dofiles can be challenging as most Stata users have idiosyncratic preferences and methods for organizing and writing Dofiles. Which standards and practices can research teams adopt to improve the cohesion of this group work?... more
In the modern era, workflows are adopted as a powerful and attractive paradigm for expressing and solving a variety of applications, including scientific, data-intensive computing, and big data applications such as MapReduce and Hadoop. These... more
BioOne Complete (complete.BioOne.org) is a full-text database of 200 subscribed and open-access titles in the biological, ecological, and environmental sciences published by nonprofit societies, associations, museums, institutions, and... more
This is the penultimate typescript of Chapter 2 _of Scientific Communication: Practices, Theories, and Pedagogies_, edited by Han Yu and Kathy Northcut. The chapter attempts to contribute to our understanding of scientific communication... more
Recently, the emergence of Function-as-a-Service (FaaS) has gained increasing attention from researchers. FaaS, also known as serverless computing, is a new concept in cloud computing in which service computation is triggered by the... more
This Impact Report identifies and summarises the diverse impacts, resulting from the £500m of UK funding of Science and Technology in 2013, using numerous quantitative metrics and short case study extracts. It shows how the varied... more
Weka is a mature and widely used set of Java software tools for machine learning, data-driven modelling and data mining – and is regarded as a current gold standard for the practical application of these techniques. This paper describes... more
In the fast developing world of scholarly communication it is good to take a step back and look at the patterns and processes of innovation in this field. To this end, we have selected 101 innovations (in the form of tools & sites) and... more
Workflow languages are a form of high-level programming language designed for coordinating tasks implemented by different pieces of software, often executed across multiple computers using technologies such as web services. Advantages of... more
Building Information Modeling is receiving an ever-increasing acceptance in the building industry and in construction-related education or research. More and more, Architects, Engineers, but also Contractors and Building Owners are... more
This paper offers an account of two Documentary Linguistics Workshops held in Tokyo based on the author's personal experience. The workshops have been held for nine consecutive years at the Research Institute for Languages and Cultures of... more
Scientific Workflows are abstractions used to model in silico scientific experiments. Cloud environments are still incipient in collecting and recording prospective and retrospective provenance. This paper presents an approach to support... more
The paper shows how error statistical theory can be deployed to grasp the deeper epistemic logic of the peer-review process. The intent is to provide the readers with a novel lens through which to make sense of the practices of academic... more
It is time to escape the constraints of the Systematics Wars narrative and pursue new questions that are better positioned to establish the relevance of the field in this time period to broader issues in the history of biology and history... more
Despite their wide range of applications, workflow systems still suffer from lack of an agreed and standard modelling technique. It is a motivating research area and some researchers have proposed different modelling techniques. Petri nets,... more
The paper describes a new cloud-oriented workflow system called Flowbster. It was designed to create efficient data pipelines in clouds by which large compute-intensive data sets can efficiently be processed. The Flowbster workflow can be... more
The continuous quest for knowledge stimulates companies and research institutions not only to investigate new ways to improve the quality of scientific experiments, but also to reduce the time and costs needed for their implementation in... more
Modern computational experiments imply that the resources of the cloud computing environment are often used to solve a large number of tasks, which differ only in the values of a relatively small set of simulation parameters. Such sets of... more
Modern science often requires the execution of large-scale, multi-stage simulation and data analysis pipelines to enable the study of complex systems. The amount of computation and data involved in these pipelines requires scalable... more
Background: Workflow engine technology represents a new class of software with the ability to graphically model step-based knowledge. We present application of this novel technology to the domain of clinical decision support. Successful... more
In this work we focus on the analysis of process schemas in order to extract common substructures. In particular, we represent processes as graphs, and we apply a graph-based hierarchical clustering technique to group similar... more
A scientific workflow management system can be considered as a binding agent which brings together scientists and distributed resources. A workflow graph plays the central role in such a system as it is the component understood by both... more
Distributed computing has always been a challenge due to the NP-completeness of finding optimal underlying management routines. The advent of big data increases the dimensionality of the problem whereby data partitionability, processing... more
Laser scanners enable bridge inspectors to collect dense 3D point clouds, which capture detailed geometries of bridges. While these data sets contain rich geometric information, they bring unique challenges related to geometric... more
Understanding the core function of the brain is one the major challenges of our times. In the areas of neuroscience and education, several new studies try to correlate the learning difficulties faced by children and youth with behavioral... more
Many new websites and online tools have come into existence to support scholarly communication in all phases of the research workflow. To what extent researchers are using these and more traditional tools has been largely unknown. This... more
Software Product Line (SPL) engineering is a paradigm shift towards modeling and developing software system families rather than individual systems. It focuses on the means of efficiently producing and maintaining multiple similar... more
The article introduces the peer review process, its importance within the cycle of publication of scientific journals with particular attention to the role of the referee and the Workflow process.
The introduction of Next Generation Sequencing into the disciplines of plant systematics, ecology, and metagenomics, among others, has resulted in a phenomenal increase in the collecting and storing of tissue samples and their respective... more
When historical research questions have been tackled with digital tools, the programs developed by historical research projects have so far been highly specialized. The software was tailored precisely to the historical research question... more
• Premise of the study: Internationally, gardens hold diverse living collections that can be preserved for genomic research. Workflows have been developed for genomic tissue sampling in other taxa (e.g., vertebrates), but are inadequate... more
In many experimental domains, especially e-Science, workflow management systems are gaining increasing attention to design and execute in-silico experiments involving data analysis tools. As a by-product, a repository of workflows is... more
Workflows have been used to represent a variety of applications involving high processing and storage demands. As a solution to supply this necessity, the cloud computing paradigm has emerged as an on-demand resources provider. While... more
A significant amount of recent research in scientific workflows aims to develop new techniques, algorithms and systems that can overcome the challenges of efficient and robust execution of ever larger workflows on increasingly complex... more
Data-oriented workflows are often used in scientific applications for executing a set of dependent tasks across multiple computers. We discuss how these can be modeled using lambda calculus, and how ideas from functional programming are... more
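The functional view described in that abstract can be made concrete in a few lines: each task becomes a pure function, and the workflow is simply their composition. This sketch is in the spirit of the lambda-calculus modeling mentioned above; the task functions are invented for illustration and are not taken from the paper.

```python
from functools import reduce

# Illustrative: a data-oriented workflow as function composition.
def compose(*fs):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), fs)

parse     = lambda text: [int(s) for s in text.split(",")]
normalize = lambda xs: [x / max(xs) for x in xs]
summarize = lambda xs: sum(xs) / len(xs)

# The workflow: parse, then normalize, then summarize.
pipeline = compose(summarize, normalize, parse)
print(pipeline("2,4,8"))  # mean of [0.25, 0.5, 1.0]
```

Because each stage is a pure function, standard functional-programming ideas (composition, mapping over partitions, lazy evaluation) transfer directly, which is the connection the paper draws.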
Cloud Computing is a ubiquitous model that enables clients to access different services in a fast and easy manner. In this context, one of the most used models is Software as a Service (SaaS), which means that software is deployed and... more
A leading, international, engineering and construction company has carried out efforts to engage a new tool set and work process. Four-Dimensional Planning and Scheduling (4D-PS) is the new work process that aims toward better, more... more
We propose a new method for mining sets of patterns for classification, where patterns are represented as SPARQL queries over RDFS. The method contributes to so-called semantic data mining, a data mining approach where domain ontologies... more
The optimal workflow scheduling is one of the most important issues in heterogeneous distributed computational environment. Existing heuristic and evolutionary scheduling algorithms have their advantages and disadvantages. In this work we... more
Schedulers for cloud computing determine on which processing resource jobs of a workflow should be allocated. In hybrid clouds, jobs can be allocated either on a private cloud or on a public cloud on a pay per use basis. The capacity of... more