Academia.edu

Scientific Workflows

599 papers
12,442 followers
About this topic
Scientific workflows are structured sequences of computational and data processing tasks designed to automate and manage scientific research processes. They facilitate the integration of diverse tools, data sources, and methodologies, enabling reproducibility, efficiency, and collaboration in scientific investigations.

Key research themes

1. How can provenance capture and dataflow management enhance transparency and runtime analysis in scientific workflows?

This research area focuses on developing systems and methods that enable efficient capture, integration, and tracking of data provenance and dataflows during scientific workflow executions. By explicitly representing data transformations and execution dependencies, these approaches aim to enhance transparency, reproducibility, and runtime monitoring capabilities, which are critical for complex, distributed, and multi-workflow scientific analyses.

Key finding: Introduces ProvLake, a low-overhead system that efficiently captures and integrates provenance data across multiple workflows running in heterogeneous execution environments. ProvLake enables runtime multiworkflow data...
Key finding: Demonstrates a FAIR-compliant data pipeline that annotates data consumption and traces scientific outputs through open software and primary data, increasing transparency in fast-evolving epidemiological models such as those...
Key finding: Presents NiW, a tool that systematically converts Jupyter Notebooks into workflow descriptions within the WINGS system, explicitly capturing dataflow across components. This facilitates tracking provenance, comparing...
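Provenance capture of the kind described in this theme can be illustrated with a small sketch. It is a generic in-memory example, not ProvLake's, WINGS's, or any other system's API. The class `ProvenanceStore`, its `task` decorator, `lineage`, and `digest` are all hypothetical names. The sketch assumes that task arguments and results are JSON-serializable, so that a content hash can act as a data identifier shared across workflows. Each task call records its workflow, its input and output identifiers, and its duration. `lineage` then walks those records backwards from any output, even when the contributing tasks belong to different workflows.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from functools import wraps
from typing import Any, Callable, Dict, List, Set


def digest(value: Any) -> str:
    """Content hash used as a stable id for a data item (assumes JSON-serializable values)."""
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode()).hexdigest()[:12]


@dataclass
class ProvenanceStore:
    """Hypothetical in-memory provenance store shared by several workflows."""
    records: List[Dict[str, Any]] = field(default_factory=list)

    def task(self, workflow: str) -> Callable:
        """Decorator that records the inputs, output, and duration of every call."""
        def decorator(fn: Callable) -> Callable:
            @wraps(fn)
            def wrapper(*args: Any) -> Any:
                start = time.perf_counter()
                result = fn(*args)
                self.records.append({
                    "workflow": workflow,
                    "task": fn.__name__,
                    "inputs": [digest(a) for a in args],
                    "output": digest(result),
                    "seconds": time.perf_counter() - start,
                })
                return result
            return wrapper
        return decorator

    def lineage(self, output_id: str) -> List[str]:
        """Walk backwards from a data item to every task (in any workflow) that contributed to it."""
        chain: List[str] = []
        seen: Set[str] = set()  # guards against loops, e.g. a task that returns its input unchanged
        frontier = [output_id]
        while frontier:
            target = frontier.pop()
            if target in seen:
                continue
            seen.add(target)
            for rec in self.records:
                if rec["output"] == target:
                    chain.append(f'{rec["workflow"]}:{rec["task"]}')
                    frontier.extend(rec["inputs"])
        return chain


if __name__ == "__main__":
    store = ProvenanceStore()

    @store.task("ingest")
    def clean(xs):
        return [x for x in xs if x >= 0]

    @store.task("analysis")
    def mean(xs):
        return sum(xs) / len(xs)

    m = mean(clean([3, -1, 5]))
    print(store.lineage(digest(m)))  # ['analysis:mean', 'ingest:clean']
```

Hashing content rather than object identity lets records emitted by independent workflows link up whenever they exchange the same data. A production system would also have to handle large or non-serializable artifacts and persistent storage.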

2. What methodologies and languages can improve specification, reuse, and interoperability of scientific workflow designs in data-intensive applications?

This theme addresses the development of domain-specific languages (DSLs) and structured frameworks that abstract workflow design from specific platforms, improving portability, modularity, and reuse across heterogeneous execution environments. It highlights innovations in workflow modeling that separate workflow intent from execution technology, allow structured composition of control and dataflows, and support collaborative development in complex scientific domains.

Key finding: Introduces SWEL, a platform-independent domain-specific modeling language (DSML) for abstractly specifying data-intensive workflows. SWEL covers high-level task definitions, data sources, platform requirements, and mappings...
Key finding: Proposes a framework embedding control-flow intensive subtasks within dataflow process networks using workflow templates and frames, separating control-flow and dataflow concerns. This structured composition enables robust,...
Key finding: Presents a cooperative methodology integrating workflow management systems (WfMS) into information system development that actively involves end-users through meta-CASE tools. The approach emphasizes mapping and adapting...
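The idea of separating workflow intent from execution technology can be shown with a minimal sketch. It is not SWEL's syntax or semantics; the classes `Task`, `LocalBackend`, and `BatchBackend` and the function `compile_workflow` are hypothetical. The workflow is plain data: tasks, their upstream inputs, and an abstract core requirement. `compile_workflow` validates the specification, orders it by its dataflow using the standard-library `graphlib.TopologicalSorter` (Python 3.9+), and hands each task to an interchangeable backend. The command strings the backends emit are purely illustrative and do not follow the syntax of any real runner or batch scheduler.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List, Protocol


@dataclass
class Task:
    """Platform-independent task: states what runs and what it needs, not where."""
    name: str
    operation: str                                   # abstract operation, e.g. "align"
    inputs: List[str] = field(default_factory=list)  # names of upstream tasks
    cores: int = 1                                   # abstract resource requirement


class Backend(Protocol):
    """Anything that can map one abstract task to a concrete execution step."""
    def render(self, task: Task) -> str: ...


class LocalBackend:
    def render(self, task: Task) -> str:
        return f"run {task.operation} ({task.name})"


class BatchBackend:
    # Illustrative command syntax only; not the syntax of any real scheduler.
    def render(self, task: Task) -> str:
        after = ",".join(task.inputs) or "none"
        return f"submit --cores={task.cores} --after={after} {task.operation}"


def compile_workflow(tasks: List[Task], backend: Backend) -> List[str]:
    """Validate the abstract workflow, order it by its dataflow, and map each task to a backend step."""
    by_name: Dict[str, Task] = {t.name: t for t in tasks}
    for t in tasks:
        missing = [i for i in t.inputs if i not in by_name]
        if missing:
            raise ValueError(f"{t.name} depends on unknown task(s) {missing}")
    graph = {t.name: set(t.inputs) for t in tasks}  # node -> set of predecessors
    # static_order() raises graphlib.CycleError if the dataflow contains a cycle
    return [backend.render(by_name[n]) for n in TopologicalSorter(graph).static_order()]


if __name__ == "__main__":
    wf = [
        Task("report", "summarize", inputs=["align"]),
        Task("fetch", "download"),
        Task("align", "align", inputs=["fetch"], cores=8),
    ]
    for backend in (LocalBackend(), BatchBackend()):
        print(compile_workflow(wf, backend))
```

The same `wf` list is compiled unchanged for both targets. Only the backend object changes, which is the portability property the theme describes.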

3. How can scheduling strategies and resource management be optimized for scientific workflows in cloud environments considering workflow structure and priorities?

Research under this theme develops scheduling algorithms and resource utilization strategies tailored to the structural characteristics of scientific workflows to minimize execution time and cost in cloud platforms. It investigates workload partitioning based on task dependencies, priority assignment, balancing computational requirements, and leveraging virtualization. These techniques aim to optimize performance in cloud-based workflow execution while managing the tradeoff between resource expenses and throughput.

Key finding: Proposes a scheduling approach that exploits workflow structural information by partitioning tasks into groups with minimized interdependencies and assigning virtual machines proportionally to computational load. This...
Key finding: Introduces a multi-priority scheduling algorithm that orders and groups workflow tasks logically according to data dependencies and locality. By determining group priorities and leveraging available virtual machines, it...
Key finding: Extends previous level-based scheduling approaches by incorporating structure-aware fair-share resource allocation that balances computational loads across workflow task partitions. By accounting for varying task dependencies...
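Level-based partitioning with load-proportional VM allocation can be sketched as follows. This is a generic heuristic written for illustration, not the specific algorithm of any paper listed here. The function names are hypothetical, and per-task runtimes are assumed to be known, positive estimates. Each task's level is the length of the longest dependency chain leading to it. Tasks on the same level have no dependencies on one another, so they can run in parallel. Each level first receives one VM. The remaining pool is then split in proportion to each level's total runtime, and leftover VMs go to the largest fractional remainders.

```python
from collections import defaultdict
from math import floor
from typing import Dict, List, Tuple


def task_levels(deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Level = length of the longest dependency chain ending at the task (entry tasks are level 0)."""
    levels: Dict[str, int] = {}

    def level(task: str, path: frozenset) -> int:
        if task in levels:
            return levels[task]
        if task in path:
            raise ValueError(f"dependency cycle through {task!r}")
        parents = deps.get(task, [])
        lv = 1 + max(level(p, path | {task}) for p in parents) if parents else 0
        levels[task] = lv
        return lv

    for t in deps:
        level(t, frozenset())
    return levels


def allocate_vms(deps: Dict[str, List[str]], runtime: Dict[str, float],
                 total_vms: int) -> Dict[int, Tuple[List[str], int]]:
    """Group tasks by level and split the VM pool in proportion to each level's runtime.

    Every level gets one VM; the remaining VMs are shared proportionally to load,
    with leftovers assigned by largest remainder (ties go to the earlier level).
    Assumes all runtimes are positive estimates.
    """
    levels = task_levels(deps)
    groups: Dict[int, List[str]] = defaultdict(list)
    load: Dict[int, float] = defaultdict(float)
    for task, lv in levels.items():
        groups[lv].append(task)
        load[lv] += runtime[task]

    n = len(groups)
    if total_vms < n:
        raise ValueError(f"need at least {n} VMs, one per level")
    extra = total_vms - n
    total_load = sum(load.values())
    quota = {lv: extra * load[lv] / total_load for lv in groups}
    share = {lv: 1 + floor(q) for lv, q in quota.items()}
    leftover = total_vms - sum(share.values())
    by_remainder = sorted(groups, key=lambda k: (-(quota[k] - floor(quota[k])), k))
    for lv in by_remainder[:leftover]:
        share[lv] += 1
    return {lv: (sorted(groups[lv]), share[lv]) for lv in sorted(groups)}


if __name__ == "__main__":
    deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
    runtime = {"a": 10, "b": 30, "c": 30, "d": 10}
    print(allocate_vms(deps, runtime, total_vms=4))
    # {0: (['a'], 1), 1: (['b', 'c'], 2), 2: (['d'], 1)}
```

In this example the middle level carries 60 of the 80 runtime units, so it receives the single VM left over after the one-per-level baseline. A real scheduler would also weigh data transfer, billing granularity, and task priorities, which this sketch ignores.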

All papers in Scientific Workflows

The Episteme Construction is a field manual for making claims auditable across domains. It introduces a contract-first calculus for generative collapse built around a conservation-style budget for log-integrity κ: Δκ = R·τ_R − (D_ω +...
In this paper, we introduce a new class of continuous functions as an application of $\Lambda$-generalized closed sets (namely $\Lambda_g$-closed set, $\Lambda$-g-closed set and $g \Lambda$-closed set) namely $\Lambda$-generalized...
As herbaria move to digitize their collections, the question remains of how to efficiently digitize collections other than standard herbarium sheets, such as wood slide collections. Beginning in September 2018, the Harvard University...
Life and work: Vince Bedy was born on 31 March 1866 in Gyirmót, on the estate of the Győr cathedral chapter, the child of István Bedy and Katalin Galambos. [1] The first known member of his family, György Bedy, was, when the village was resettled in 1714, among the first...
Abstract. This paper presents a deployed semantic web application in the cultural domain: the semantic portal MuseumFinland. It is a demonstration of a community portal and a publication channel by which heterogeneous collection database...
This document summarizes the contributions of the Electromagnetic $\gamma_vNN^*$ Transition Form Factors workshop participants that provide theoretical support of the excited baryon program at the 12 GeV energy upgrade at JLab. The main...
This paper presents the first part of a comparative analysis between the Standard Model (SM) of fundamental interactions and the Tensor Model of Discrete Dynamics (TMDD), proposed as an ontologically grounded fundamental theory underlying...
SHARP is a Linked Data approach for harmonizing cross-workflow provenance. In this demo, we demonstrate SHARP through a real-world omic experiment involving workflow traces generated by Taverna and Galaxy systems. SHARP starts by...
Workflow systems have contributed greatly to improving the reproducibility of scientific experiments. However, relatively little work has focused on reusing the data produced during execution. In this...
HERBARIUM: CONCEPT AND DEFINITION The term herbarium, used in the strictest sense today, is simply a collection of dried specimens. Lawrence (1951) and others include in their definitions the arrangement of specimens in the sequence of an...
This paper examines how a shift in focus toward AI-based predictive analytics has the potential to change the outlook in personalized medicine. Chronic diseases have become widespread and pose great challenges to the world's...
This paper presents, for Frequent Query Discovery (FQD), an algorithm which employs a novel relation of equivalence in order to remove redundant queries in the output. An FQD algorithm returns a set of frequent queries from a data base of...
Large-scale applications expressed as scientific workflows are often grouped into ensembles of inter-related workflows. In this paper, we address a new and important problem concerning the efficient management of such ensembles under...
Large computational problems may often be modelled using multiple scientific workflows with similar structure. These workflows can be grouped into ensembles, which may be executed on distributed platforms such as the Cloud. In this paper,...
This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous virtual machine instances, with a limited number of...
This paper introduces a cost optimization model for scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous VM instances, with a limited number of instances per cloud and hourly...
Motivation: We describe our efforts to work with HUBzero in a significant collaboration-oriented project, as well as our impressions of HUBzero after nearly a year of interaction with the platform. Our assessment is a mixed one: while the HUBzero...
We address the problem of task planning on multiple clouds formulated as a mixed integer nonlinear programming problem (MINLP). Its specification in the AMPL modeling language allows us to apply solvers such as Bonmin and Cbc. Our model...
This article is the author's personal confession: he does not like feeling obliged to go away on holiday. The text is partly humorous but contains many true observations about ways of spending one's vacation. I encourage you to...
Abstract. Mapping XML documents into a relational database is a promising solution because relational databases are mature and scale very well, and they have the advantage that in a relational database XML data and structured data can...
Cloud computing systems provide on demand access to computational resources for dedicated use. Grid computing allows users to share heterogeneous resources from multiple administrative domains applied to common tasks. In this paper we...
In this paper we present DFScala, a library for constructing and executing dataflow graphs in the Scala language. Through the use of Scala this library allows the programmer to construct coarse grained dataflow graphs that take advantage...
Bioinformatics workflows require large amounts of resources and are commonly executed in clusters. Determining the adequate amount of resources for bioinformatics applications is a tricky matter, since the resource usage of a single...
Although simulators provide approximate, faster, and easier simulation of application execution in Clouds, many researchers argue that these results cannot always be generalized to complex application types, which consist of many...
Cloud computing provides a cheap and elastic platform for executing large scientific workflow applications, but it raises two challenges in predicting makespan (total execution time): performance instability of Cloud instances and...
The cloud is an ecosystem in which virtual machine instances start and terminate asynchronously, either on user demand or automatically when the load rapidly increases or decreases. Although this dynamic environment allows users to rent...
Realistic, relevant, and reproducible experiments often need input traces collected from real-world environments. We focus in this work on traces of workflows, which are common in datacenters, clouds, and HPC infrastructures. We show that the...
• A review of multiple QoS parameter workflow scheduling. • A new multiple QoS algorithm with quadratic complexity for workflow scheduling. • Similar performances of search-based algorithms in a small fraction of the time. • Results for...
Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014). Porto (Portugal), August 27-28, 2014.
We introduce a new Manycore Workflow Runtime Environment (MWRE) to efficiently enact traditional scientific workflows on modern manycore computing architectures. In contrast to existing engines that enact workflows acting as external...
Key factors causing irreproducibility of research include those related to inappropriate study design methodologies and statistical analysis [1]. In modern statistical practice irreproducibility could arise due to statistical (false...
All tools dedicated to system administration of local area network follow the same scheme. For each host they install an initial operating system and then update it using a package manager. The package managers run independently on the...
The dative alternation: "the pirate will send the necklace to the princess" (PP) vs. "the pirate will send the princess the necklace" (DO). • Dative verbs also display bias effects: • $p(\mathrm{PP} \mid \textit{send}) > p(\mathrm{DO} \mid \textit{send})$ • $p(\mathrm{PP} \mid \textit{show}) < p(\mathrm{DO} \mid \textit{show})$ • This...
This research looks into where and how Artificial Intelligence and Big Data can be usefully implemented in Affiliate Marketing. By consulting relevant literature and qualified experts, this work identifies six areas where...
A short guide with structural help, by Verena Glass. Introduction: The beginning is often the hardest part, especially when it comes to writing. Many students know that they have to write a bachelor's thesis, but not how to start...
Anyone who writes a term paper knows that the path from the first idea to the finished submission can be bumpy. Between outlining, literature research, and formatting requirements, small mistakes quickly creep in that can end up costing valuable points...
The healthcare industry is undergoing a paradigm shift driven by the convergence of design thinking, artificial intelligence (AI), and emerging technologies. This paper explores how integrating human-centered design with advanced...
This paper outlines a Java-based AI framework aimed at optimizing healthcare resource allocation under the principles of Responsible AI. Leveraging real-time data streams, the proposed approach employs constraint-satisfaction and...
This paper presents a unifying framework for applying data analytics in the finance and healthcare sectors, where large-scale datasets demand robust and domain-specific methods. Drawing on machine learning and statistical techniques, the...
This paper presents a no-code eXtended Reality (XR) workflow that allows non-technical people to create XR experiences as part of equipment design review (EDR) processes. The demo showcases how the workflow can enable seamless...
[1] Science (ilmu bukti) can refer to two things: accurate knowledge (most likely to be correct) about the world, as far as the available evidence goes, and the process of acquiring accurate knowledge about the world...
The integration of Machine Learning (ML) into healthcare systems is transforming the operational and clinical capabilities of hospitals, significantly impacting patient outcomes and the efficiency of hospital workflows. This article...
Workflow scheduling is a central concern in workflow management systems. Efficient workflow scheduling helps the system reduce cost, which can be evaluated by considering the various clustering methods that have been studied...
Formal guidelines to follow if you are writing your bachelor's or master's thesis with me.
Reference managers, which have become a common tool in academic writing, seem to lend themselves almost naturally to historical research and to the management (and citation) of oftentimes vast collections of sources. This paper...