Papers by Dag I K Sjoberg

Background: Construct validity concerns the use of indicators to measure a concept that is not directly measurable. Aim: This study intends to identify, categorize, assess and quantify discussions of threats to construct validity in empirical software engineering literature and to use the findings to suggest ways to improve the reporting of construct validity issues. Method: We analyzed 83 articles that report human-centric experiments published in five top-tier software engineering journals from 2015 to 2019. The articles' text concerning threats to construct validity was divided into segments (the unit of analysis) based on predefined categories. The segments were then evaluated regarding whether they clearly discussed a threat and a construct. Results: Three-fifths of the segments were associated with topics not related to construct validity. Two-thirds of the articles discussed construct validity without using the definition of construct validity given in the article itself. The threats were clearly described in more than four-fifths of the segments, but the construct in question was clearly described in only two-thirds of the segments. The construct was unclear when the discussion was related not to construct validity but to other types of validity. Conclusions: The results show potential for improving the understanding of construct validity in software engineering. Recommendations addressing the identified weaknesses are given to improve the awareness and reporting of construct validity.
On the Roles of Software Testers: An Exploratory Study
Social Science Research Network, 2022

Springer eBooks, Oct 4, 2006
A fundamental question in object-oriented design is how to design maintainable software. According to expert opinion, a delegated control style, typically a result of responsibility-driven design, represents object-oriented design at its best, whereas a centralized control style is reminiscent of a procedural solution, or a "bad" object-oriented design. This paper presents a controlled experiment that investigates these claims empirically. A total of 99 junior, intermediate, and senior professional consultants from several international consultancy companies were hired for one day to participate in the experiment. To compare differences between (categories of) professionals and students, 59 students also participated. The subjects used professional Java tools to perform several change tasks on two alternative Java designs that had a centralized and delegated control style, respectively. The results show that the most skilled developers, in particular, the senior consultants, require less time to maintain software with a delegated control style than with a centralized control style. However, more novice developers, in particular, the undergraduate students and junior consultants, have serious problems understanding a delegated control style, and perform far better with a centralized control style. Thus, the maintainability of object-oriented software depends, to a large extent, on the skill of the developers who are going to maintain it. These results may have serious implications for object-oriented development in an industrial context: Having senior consultants design object-oriented systems may eventually pose difficulties unless they make an effort to keep the designs simple, as the cognitive complexity of "expert" designs might be unmanageable for less skilled maintainers.
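To make the two control styles concrete, here is a minimal, hypothetical sketch in Python. It does not reproduce the Java designs used in the experiment; the classes `ItemData`, `OrderController`, `Item` and `Order` are invented for illustration. In the centralized version, one controller pulls data from passive objects and makes every decision; in the delegated version, each object computes its own part and the order delegates to its items.

```python
from dataclasses import dataclass
from typing import List


# Centralized control style (hypothetical): passive data holders...
@dataclass
class ItemData:
    price: float
    quantity: int


# ...and a single controller that contains all of the logic.
class OrderController:
    def total(self, items: List[ItemData], discount_rate: float) -> float:
        subtotal = 0.0
        for item in items:
            subtotal += item.price * item.quantity  # controller reaches into the data
        return subtotal * (1 - discount_rate)


# Delegated control style (hypothetical): behaviour lives with the data it uses.
@dataclass
class Item:
    price: float
    quantity: int

    def cost(self) -> float:
        return self.price * self.quantity


@dataclass
class Order:
    items: List[Item]
    discount_rate: float

    def total(self) -> float:
        # The order delegates the per-item cost calculation to each Item.
        subtotal = sum(item.cost() for item in self.items)
        return subtotal * (1 - self.discount_rate)


centralized = OrderController().total([ItemData(2.5, 4), ItemData(1.0, 5)], 0.1)
delegated = Order([Item(2.5, 4), Item(1.0, 5)], 0.1).total()
```

Both versions compute the same total; they differ in where a maintainer must look and what must change. In a toy example the delegated version looks simple, but in a larger design the flow of control is spread across many collaborating objects, which is the kind of structure the study found harder for less skilled developers to follow.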

Programming is an increasingly important skill in the 21st century. Therefore, many education systems internationally offer noncompulsory programming (NCP) courses during high school years. Aim. Our goal is to study the effect of NCP on first-semester student performance in CS1. Because interest in computing is more often associated with men than women, we also want to study gender differences. Method. A total of 232 students from a Norwegian university were involved in the study. High school grades from the public student registry were analysed together with questionnaire data and the CS1 grade. Results. The students with NCP performed significantly better in CS1 than those without (average grade 4.4 vs. 3.6, where A, B, ..., F is coded as 5, 4, ..., 0). For women, the average grade with and without NCP was 4.4 vs. 3.2; for men, it was 4.4 vs. 3.8. Conclusion. This study shows that for students with NCP, the notorious gender difference in CS1 performance was absent. The other results merit further consideration regarding mathematics and science backgrounds, grades, prior experience, and self-efficacy.
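The grade coding behind these averages can be written down directly. A minimal Python sketch, assuming the six-letter scale A to F maps to 5 to 0 as stated; the helper `average_grade` and the example grades are hypothetical and are not data from the study:

```python
# Letter grades A..F coded as 5..0, as described in the abstract.
GRADE_POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 0}


def average_grade(grades):
    """Mean numeric grade for a sequence of letter grades (hypothetical helper)."""
    if not grades:
        raise ValueError("at least one grade is required")
    return sum(GRADE_POINTS[g.upper()] for g in grades) / len(grades)


print(average_grade(["A", "B", "C"]))  # 4.0
```

On this scale, an average of 4.4 lies between B and A, and 3.6 lies between C and B.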
Benefits management in software development: A systematic review of empirical studies
IET Software, Feb 1, 2021
On the roles of software testers: An exploratory study
Journal of Systems and Software, Oct 1, 2023
Why theory matters
Elsevier eBooks, 2016
It is relatively easy to generate and acquire large amounts of data from software engineering activities. The challenge is to extract meaning from the data that represents something true rather than spurious. To increase knowledge and insight, more theories should be built and used.

arXiv (Cornell University), Jun 8, 2023
Background: Construct validity concerns the use of indicators to measure a concept that is not directly measurable. Aim: This study intends to identify, categorize, assess and quantify discussions of threats to construct validity in empirical software engineering literature and to use the findings to suggest ways to improve the reporting of construct validity issues. Method: We analyzed 83 articles that report human-centric experiments published in five top-tier software engineering journals from 2015 to 2019. The articles' text concerning threats to construct validity was divided into segments (the unit of analysis) based on predefined categories. The segments were then evaluated regarding whether they clearly discussed a threat and a construct. Results: Three-fifths of the segments were associated with topics not related to construct validity. Two-thirds of the articles discussed construct validity without using the definition of construct validity given in the article itself. The threats were clearly described in more than four-fifths of the segments, but the construct in question was clearly described in only two-thirds of the segments. The construct was unclear when the discussion was related not to construct validity but to other types of validity. Conclusions: The results show potential for improving the understanding of construct validity in software engineering. Recommendations addressing the identified weaknesses are given to improve the awareness and reporting of construct validity.
Challenges and Recommendations when Increasing the Realism of Controlled Software Engineering Experiments
Springer eBooks, Oct 4, 2006
An important goal of most empirical software engineering experiments is the transfer of the research results to industrial applications. To convince industry of the validity and applicability of the results of controlled software engineering experiments, the tasks, subjects and environments should be as realistic as practically possible. Such experiments are, however, more demanding and expensive than experiments involving students, small tasks and pen-and-paper environments. This chapter describes the challenges of increasing the realism of controlled experiments and lessons learned from the experiments that have been conducted at Simula Research Laboratory.
Java Programming Skill Task Instrument

An empirical study of WIP in kanban teams
Background: Limiting the amount of Work-In-Progress (WIP) is considered a fundamental principle in Kanban software development. However, no published studies from real cases exist that indicate what an optimal WIP limit should be. Aims: The primary aim is to study the effect of WIP on the performance of a Kanban team. The secondary aim is to illustrate methodological challenges when attempting to identify an optimal or appropriate WIP limit. Method: A quantitative case study was conducted in a software company that provided information about more than 8,000 work items developed over four years by five teams. Relationships between WIP, lead time and productivity were analyzed. Results: WIP correlates with lead time; that is, lower WIP indicates shorter lead times, which is consistent with claims in the literature. However, WIP also correlates with productivity, which is inconsistent with the claim in the literature that a low WIP (still above a certain threshold) will improve productivity. The collected data set did not include sufficient information to measure aspects of quality. There are several threats to the validity of the way productivity was measured. Conclusions: Indicating an optimal WIP limit is difficult in the studied company because changing the WIP gives contrasting results on different team performance variables. Because the effect of WIP has not been quantitatively examined before, this study clearly needs to be replicated in other contexts. In addition, studies that include other team performance variables, such as various aspects of quality, are needed. The methodological challenges illustrated in this paper need to be addressed.
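As an illustration of the two main measures involved, the following Python sketch derives daily WIP and per-item lead time from start and finish dates. It is an assumption about how such measures could be computed, not the study's actual procedure: the `WorkItem` fields, the half-open convention (an item counts as in progress from its start day up to, but not including, its finish day) and the sample dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List


@dataclass
class WorkItem:
    started: date   # hypothetical: day the item entered the workflow
    finished: date  # hypothetical: day the item was delivered

    def lead_time_days(self) -> int:
        return (self.finished - self.started).days


def wip_on(items: List[WorkItem], day: date) -> int:
    """Count items in progress on `day`: started on or before it, not yet finished."""
    return sum(1 for item in items if item.started <= day < item.finished)


def daily_wip(items: List[WorkItem], first: date, last: date) -> Dict[date, int]:
    """WIP for every calendar day from `first` to `last`, inclusive."""
    n_days = (last - first).days + 1
    return {
        first + timedelta(days=offset): wip_on(items, first + timedelta(days=offset))
        for offset in range(n_days)
    }


items = [
    WorkItem(date(2020, 1, 1), date(2020, 1, 4)),
    WorkItem(date(2020, 1, 2), date(2020, 1, 3)),
    WorkItem(date(2020, 1, 3), date(2020, 1, 6)),
]
print(list(daily_wip(items, date(2020, 1, 1), date(2020, 1, 4)).values()))  # [1, 2, 2, 1]
print([item.lead_time_days() for item in items])  # [3, 1, 3]
```

With series like these, WIP can be related to lead time or to a throughput-based productivity proxy, for example through a correlation coefficient. Which proxy is chosen for productivity is itself a design decision, and the abstract notes that the productivity measure carries validity threats.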

Objective: Our objective is to describe how software engineering might benefit from an evidence-based approach and to identify the potential difficulties associated with the approach. Method: We compared the organisation and technical infrastructure supporting evidence-based medicine (EBM) with the situation in software engineering. We considered the impact that factors peculiar to software engineering (i.e., the skill factor and the lifecycle factor) would have on our ability to practice evidence-based software engineering (EBSE). Results: EBSE promises a number of benefits by encouraging the integration of research results with a view to supporting the needs of many different stakeholder groups. However, we do not currently have the infrastructure needed for widespread adoption of EBSE. The skill factor means that software engineering experiments are vulnerable to subject and experimenter bias. The lifecycle factor means that it is difficult to determine how technologies will behave once deployed. Conclusions: Software engineering would benefit from adopting what it can of the evidence-based approach, provided that it deals with the specific problems that arise from the nature of software engineering.

Springer eBooks, 2006
This paper reports on a technology transfer experience that aims to support the introduction of software process improvement in small businesses, small organizations and/or small projects. The experience arose from a European interregional collaboration between two university research teams (France and Belgium) and a public technology center (Luxembourg). One of the contributions of this experience is the design of a Software Process Improvement approach adapted both to small units and to the regional context. The proposed approach is gradual. It is based on three nested evaluation models ranging from an extremely simplified model (the micro-evaluation model) to a complete standard model, which is a version of SPICE. The intermediate model, called the mini-evaluation model, can be viewed as a tailoring of SPICE and can be used on its own as a definitive model by small businesses and small organizations.
IEEE Software, May 1, 2020
Members of high-performing software teams collaborate, exchange information and coordinate their work frequently and regularly. Most teams use the daily stand-up meeting as a central venue for these activities. Although this kind of meeting is one of the most popular agile practices, it has received little attention from researchers. We observed 102 daily stand-ups and interviewed 60 members of 15 teams in five countries. We found that the practice is usually challenging to conduct in a way that benefits the whole team. Many team members have negative experiences of the meeting, which reduces job satisfaction, co-worker trust and well-being. However, the practice can be adjusted and improved to empower teams. In this article, we describe key factors that affect the meeting and propose four recommendations for improving the practice.
Springer eBooks, 2001
Use case models are used in object-oriented analysis for capturing and describing the functional requirements of a system. Several methods for estimating software development effort are based on attributes of a use case model. This paper reports the results of three industrial case studies on the application of a method for effort estimation based on use case points. The aim of this paper is to provide guidance for other organizations that want to improve their estimation process by applying use cases. Our results support existing claims that use cases can be used successfully in estimating software development effort. The results indicate that the guidance provided by the use case points method can support expert knowledge in the estimation process. Our experience is also that the design of the use case models has a strong impact on the estimates.
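For readers unfamiliar with the method, the following Python sketch shows the use case points calculation in its commonly cited form, as originally proposed by Karner. The weights, the technical and environmental adjustment constants and the default of 20 person-hours per use case point are the textbook values and are an assumption here; the case studies may have used an adapted variant. The function names and example inputs are hypothetical.

```python
# Commonly cited (Karner) weights; treat these as assumptions, not the paper's values.
ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}
USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}


def use_case_points(actors, use_cases, tfactor, efactor):
    """Adjusted use case points (hypothetical helper).

    actors, use_cases: dicts mapping "simple"/"average"/"complex" to a count.
    tfactor: weighted sum of the technical complexity factors.
    efactor: weighted sum of the environmental factors.
    """
    uaw = sum(ACTOR_WEIGHTS[c] * n for c, n in actors.items())        # unadjusted actor weight
    uucw = sum(USE_CASE_WEIGHTS[c] * n for c, n in use_cases.items())  # unadjusted use case weight
    tcf = 0.6 + 0.01 * tfactor  # technical complexity factor
    ef = 1.4 - 0.03 * efactor   # environmental factor
    return (uaw + uucw) * tcf * ef


def effort_hours(ucp, hours_per_ucp=20):
    """Convert use case points to person-hours (20 h per point is Karner's default)."""
    return ucp * hours_per_ucp


ucp = use_case_points(
    actors={"simple": 2, "average": 1, "complex": 1},
    use_cases={"average": 4, "complex": 1},
    tfactor=30,
    efactor=15,
)
print(round(ucp, 2), round(effort_hours(ucp), 1))  # 53.01 1060.2
```

Because the weights depend on how actors and use cases are classified, the way a use case model is written feeds directly into the estimate, which fits the observation that model design strongly affects the estimates.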

IET Software, May 23, 2022
Organisations spend a great deal of money on Information Technology (IT) development and maintenance activities with the intention that these activities will create results that enable benefits for the organisations. This paper seeks to understand potential associations between IT development and maintenance activities and the adoption of benefits management practices to realise value for the organisation. The aim is also to uncover potential differences between public and private organisations. We surveyed 86 Norwegian public and private organisations and include data collected in similar surveys every five years since 1993. For the period between 1998 and 2018, we observe a stable pattern of IT work distribution. We found that organisations that managed benefits put more effort into advancing functionality for the end-users than other organisations, and they realised more benefits. This advantage was particularly pronounced for organisations that managed benefits beyond the early stages of the development lifecycle. Private organisations both managed and realised benefits to a larger extent than public organisations. Our findings can enable organisations to be evidence-based when choosing management practices to achieve a higher return on investment in IT development and maintenance activities.

IEEE Transactions on Software Engineering, Mar 1, 2023
Empirical research aims to establish generalizable claims from data. Such claims may involve concepts that must be measured indirectly by using indicators. Construct validity is concerned with whether one can justifiably make claims at the conceptual level that are supported by results at the operational level. We report a quantitative analysis of the awareness of construct validity in the software engineering literature between 2000 and 2019 and a qualitative review of 83 articles about human-centric experiments published in five high-quality journals between 2015 and 2019. Over the two decades, the use of the term construct validity in the literature increased sevenfold. Some of the articles we reviewed employed various ways to ensure that the indicators span the concept in an unbiased manner. We also found articles that reuse previously validated constructs. However, the articles disagree about how to define construct validity. Several interpret construct validity too broadly by including threats to internal, external, or statistical conclusion validity. A few articles also include fundamental challenges of a study, such as cheating and misunderstanding of the experiment material. The diversity of topics included as threats to construct validity calls for a more minimalist approach. Based on the review, we propose seven guidelines to improve how construct validity is handled and reported in software engineering.
Springer eBooks, 2008
Revised Papers from the 9th International Workshop on Persistent Object Systems

arXiv (Cornell University), Oct 25, 2018
In a recent article, Falessi et al. (2017) call for a deeper understanding of the pros and cons of using students and professionals in experiments. The authors state: "we have observed too many times that our papers were rejected because we used students as subjects." Good experiments with students are certainly a valuable asset in the body of research in software engineering. Papers should thus not be rejected solely on the grounds that the subjects are students. However, the distribution of skill differs between students and professionals. Since previous studies have shown that skill may moderate the effect of a treatment, we are concerned that studies involving developers with only low to medium skill (i.e., students) may lead to wrong inferences about which technology, method or tool is better in the software industry. We therefore provide suggestions for how experiments with students can be improved and also comment on some of the alleged drawbacks of using professionals that Falessi et al. point out.