Social Software Engineering
Besides being a problem that is challenging from a technical point of view, software engineering processes also comprise numerous interesting social aspects that deserve to be studied scientifically. Representing one particularly interesting type of socio-technical system, we thus investigate the social structures and dynamics of collaborative software engineering processes. We are generally interested in the question how particular ways to structure collaborations, or to distribute tasks and responsibilities in software development teams will positively or negatively affect the performance of the project.
Our research methodology is data-driven, i.e. we analyse large-scale data sets capturing social dynamics in collaborative software engineering and make use of recent advances in the theory of complex systems, complex networks, machine learning and statistical analysis of big data. For this purpose, we commonly make use of available data on the community of developers and contributors to large-scale Open Source Software projects such as Eclipse, NetBeans, KDE, Linux or the Linux distribution Gentoo. Studying the evolution of the social organisation of these projects over periods of more than one decade, we can then try to link the performance of the projects to features of the collaboration or communication networks in their underlying communities.

Collaboration structures of several software development communities. Nodes represent contributors to the project, links represent collaborations between them. This picture illustrates the significant differences between highly centralized projects (left), projects with a modular or hierarchical social organisation (middle) and projects with less structured collaboration networks (right).
Our recent research in this area has uncovered interesting relations between the topology of collaborations and both individual and collective performance in software development projects. Through a case study of the software project Gentoo, we were able to find empirical evidence for that fact that the runaway centralisation of collaboration structures poses a severe risk to software projects, resulting in a severe and lasting drop in performance if one of the central collaborators decides to leave the project. Using data of evolving collaboration structures, we were further able to show that the presence of such central collaborators results in a decreasing commitment of the remaining community, thus further driving an increasing centralisation.
Apart from such empirical studies, we have also used our complex systems and complex networks perspective to suggest concrete improvements of online collaboration tools. Again using large-scale data sets on collaborative software engineering projects, we were able to show that the collaboration structures of software development communities can be used to automatically assess the quality of bug reports that contributors submit to the community. We not only show that there is a statistically significant relation between the information provided in bug reports and the collaboration networks into which the reporting user is embedded. We also use state-of-the-art machine learning techniques to develop a fully automatic classifier which can be directly used to mitigate the information overload which is a severe problem for many of the large-scale software engineering projects.
We would like to highlight that in the research summarised above, we particularly avoid to oversimplify complex engineering processes by integrating hands-on experience in software engineering with expertise in data mining and the modeling and analysis of complex systems. Through the unique combination of competencies in computer science, statistical data analysis and complex networks theory present in our team, we were able to publish our results not only in interdisciplinary journals like \emph{Advances in Complex Systems}, but also in premium software engineering venues such as the \emph{International Conference on Software Engineering}. Apart from this scientific recognition, our research results have further attracted significant attention from industry, as documented by talk invitations from major software companies such as Google, IBM and DATEV.
From Aristotle to Ringelmann: A large-scale analysis of team productivity and coordination in Open Source Software projects
|
[2016]
|
Scholtes, Ingo;
Mavrodiev, Pavlin;
Schweitzer, Frank
|
Empirical Software Engineering,
pages: 642-683,
volume: 21,
number: 2
|
more» «less
|
Abstract
Complex software development projects rely on the contribution of teams of developers, who are required to collaborate and coordinate their efforts. The productivity of such development teams, i.e., how their size is related to the produced output, is an important consideration for project and schedule management as well as for cost estimation. The majority of studies in empirical software engineering suggest that - due to coordination overhead - teams of collaborating developers become less productive as they grow in size. This phenomenon is commonly paraphrased as Brooks’ law of software project management, which states that “adding manpower to a software project makes it later”. Outside software engineering, the non-additive scaling of productivity in teams is often referred to as the Ringelmann effect, which is studied extensively in social psychology and organizational theory. Conversely, a recent study suggested that in Open Source Software (OSS) projects, the productivity of developers increases as the team grows in size. Attributing it to collective synergetic effects, this surprising finding was linked to the Aristotelian quote that “the whole is more than the sum of its parts”. Using a data set of 58 OSS projects with more than 580,000 commits contributed by more than 30,000 developers, in this article we provide a large-scale analysis of the relation between size and productivity of software development teams. Our findings confirm the negative relation between team size and productivity previously suggested by empirical software engineering research, thus providing quantitative evidence for the presence of a strong Ringelmann effect. Using fine-grained data on the association between developers and source code files, we investigate possible explanations for the observed relations between team size and productivity. In particular, we take a network perspective on developer-code associations in software development teams and show that the magnitude of the decrease in productivity is likely to be related to the growth dynamics of co-editing networks which can be interpreted as a first-order approximation of coordination requirements.
How do OSS projects change in number and size? A large-scale analysis to test a model of project growth
|
[2014]
|
Schweitzer, Frank;
Nanumyan, Vahan;
Tessone, Claudio Juan;
Xia, Xi
|
ACS - Advances in Complex Systems,
pages: 1550008,
volume: 17,
number: 07n08
|
more» «less
|
Abstract
Established open source software (OSS) projects can grow in size if new developers join, but also the number of OSS projects can grow if developers choose to found new projects. We discuss to what extent an established model for firm growth can be applied to the dynamics of OSS projects. Our analysis is based on a large-scale data set from SourceForge (SF) consisting of monthly data for 10 years, for up to 360,000 OSS projects and up to 340,000 developers. Over this time period, we find an exponential growth both in the number of projects and developers, with a remarkable increase of single-developer projects after 2009. We analyze the monthly entry and exit rates for both projects and developers, the growth rate of established projects and the monthly project size distribution. To derive a prediction for the latter, we use modeling assumptions of how newly entering developers choose to either found a new project or to join existing ones. Our model applies only to collaborative projects that are deemed to grow in size by attracting new developers. We verify, by a thorough statistical analysis, that the Yule–Simon distribution is a valid candidate for the size distribution of collaborative projects except for certain time periods where the modeling assumptions no longer hold. We detect and empirically test the reason for this limitation, i.e., the fact that an increasing number of established developers found additional new projects after 2009.
Communication In Innovation Communities: An Analysis Of 100 Open Source Software Projects
|
[2014]
|
Geipel, Markus Michael;
Press, Kerstin;
Schweitzer, Frank
|
ACS - Advances in Complex Systems,
pages: 1550006,
volume: 17,
number: 07n08
|
more» «less
|
Abstract
We develop a model of innovation communities which allows us to address in a systematic way the influence of users and developers as well as communication between and within these groups. Based on this model, we derive a formal approach to quantify communication flows, community activity and community turnover. These measures are calculated using the data of 100 open source software projects. Our empirical analysis shows that: (i) Users play indeed a predominant role in communication, which points towards the vivid role of an active user community; (ii) communication is highly concentrated, which points towards the importance of active individuals and (iii) community turnover exhibits only little correlation with community segregation, which may allow to benefit from high turnover rates while keeping negative effects small. We argue that insight from this extensive analysis not only complements existing case studies, it also provides a reference frame to put these singular results into perspective when aiming at generalizations.
The Role of Emotions in Contributors Activity: A Case Study of the Gentoo Community
|
[2013]
|
Garcia, David;
Zanetti, Marcelo Serrano;
Schweitzer, Frank
|
In Proceedings of the International Conference on Social Computing and Its Applications
|
more» «less
|
Abstract
We analyse the relation between the emotions and the activity of contributors in the Open Source Software project Gentoo. Our case study builds on extensive data sets from the project's bug tracking platform Bugzilla, to quantify the activity of contributors, and its mail archives, to quantify the emotions of contributors by means of sentiment analysis. The Gentoo project is known for a considerable drop in development performance after the sudden retirement of a central contributor. We analyse how this event correlates with the negative emotions, both in bilateral email discussions with the central contributor, and at the level of the whole community of contributors. We then extend our study to consider the activity patters on Gentoo contributors in general. We find that contributors are more likely to become inactive when they express strong positive or negative emotions in the bug tracker, or when they deviate from the expected value of emotions in the mailing list. We use these insights to develop a Bayesian classifier that detects the risk of contributors leaving the project. Our analysis opens new perspectives for measuring online contributor motivation by means of sentiment analysis and for real-time predictions of contributor turnover in Open Source Software projects.
The rise and fall of a central contributor: Dynamics of social organization and performance in the Gentoo community
|
[2013]
|
Zanetti, Marcelo Serrano;
Scholtes, Ingo;
Tessone, Claudio Juan;
Schweitzer, Frank
|
CHASE/ICSE '13 Proceedings of the 6th International Workshop on Cooperative and Human Aspects of Software Engineering
|
more» «less
|
Abstract
Social organization and division of labor crucially influence the performance of collaborative software engineering efforts. In this paper, we provide a quantitative analysis of the relation between social organization and performance in Gentoo, an Open Source community developing a Linux distribution. We study the structure and dynamics of collaborations as recorded in the project's bug tracking system over a period of ten years. We identify a period of increasing centralization after which most interactions in the community were mediated by a single central contributor. In this period of maximum centralization, the central contributor unexpectedly left the project, thus posing a significant challenge for the community. We quantify how the rise, the activity as well as the subsequent sudden dropout of this central contributor affected both the social organization and the bug handling performance of the Gentoo community. We analyze social organization from the perspective of network theory and augment our quantitative findings by interviews with prominent members of the Gentoo community which shared their personal insights.
Categorizing bugs with social networks: A case study on four open source software communities
|
[2013]
|
Zanetti, Marcelo Serrano;
Scholtes, Ingo;
Tessone, Claudio Juan;
Schweitzer, Frank
|
ICSE '13 Proceedings of the 35th International Conference on Software Engineering
|
more» «less
|
Abstract
Efficient bug triaging procedures are an important precondition for successful collaborative software engineering projects. Triaging bugs can become a laborious task particularly in open source software (OSS) projects with a large base of comparably inexperienced part-time contributors. In this paper, we propose an efficient and practical method to identify valid bug reports which a) refer to an actual software bug, b) are not duplicates and c) contain enough information to be processed right away. Our classification is based on nine measures to quantify the social embeddedness of bug reporters in the collaboration network. We demonstrate its applicability in a case study, using a comprehensive data set of more than 700, 000 bug reports obtained from the BUGZILLA installation of four major OSS communities, for a period of more than ten years. For those projects that exhibit the lowest fraction of valid bug reports, we find that the bug reporters’ position in the collaboration network is a strong indicator for the quality of bug reports. Based on this finding, we develop an automated classification scheme that can easily be integrated into bug tracking platforms and analyze its performance in the considered OSS communities. A support vector machine (SVM) to identify valid bug reports based on the nine measures yields a precision of up to 90.3% with an associated recall of 38.9%. With this, we significantly improve the results obtained in previous case studies for an automated early identification of bugs that are eventually fixed. Furthermore, our study highlights the potential of using quantitative measures of social organization in collaborative software engineering. It also opens a broad perspective for the integration of social network analysis in the design of support infrastructures.
The co-evolution of socio-technical structures in sustainable software development: Lessons from the open source software communities
|
[2012]
|
Zanetti, Marcelo Serrano
|
ICSE '12 Proceedings of the 34th International Conference on Software Engineering
|
more» «less
|
Abstract
Software development depends on many factors, including technical, human and social aspects. Due to the complexity of this dependence, a unifying framework must be defined and for this purpose we adopt the complex networks methodology. We use a data-driven approach based on a large collection of open source software projects extracted from online project development platforms. The preliminary results presented in this article reveal that the network perspective yields key insights into the sustainability of software development.
A quantitative study of social organisation in open source software communities
|
[2012]
|
Zanetti, Marcelo Serrano;
Sarigol, Emre;
Scholtes, Ingo;
Tessone, Claudio Juan;
Schweitzer, Frank
|
In Proceedings of the 2012 Imperial College Computing Student Workshop
|
more» «less
|
Abstract
The success of open source projects crucially depends on the voluntary contributions of a suf-
ficiently large community of users. Apart from the mere size of the community, interesting
questions arises when looking at the evolution of structural features of collaborations between
community members. In this article, we discuss several network analytic proxies that can be
used to quantify different aspects of the social organisation in social collaboration networks. We
particularly focus on measures that can be related to the cohesiveness of the communities, the
distribution of responsibilities and the resilience against turnover of community members. We
present a comparative analysis on a large-scale data set that covers the full history of collabor-
ations between users of 14 major open source software communities. Our analysis covers both
aggregate and time-evolving measures and highlights differences in the social organisation across
communities. We argue that our results are a promising first step towards the definition of
suitable, potentially multi-dimensional, resilience and risk indicators for open source software
communities.
A Complex Networks Perspective On Collaborative Software Engineering
|
[2015]
|
Cataldo, Marcelo;
Scholtes, Ingo;
Valetto, Giuseppe
|
ACS - Advances in Complex Systems,
pages: 1430001,
volume: 17,
number: 07n08
|
more» «less
|
Abstract
Large collaborative software engineering projects are interesting examples for evolving complex systems. The complexity of these systems unfolds both in evolving software structures, as well as in the social dynamics and organization of development teams. Due to the adoption of Open Source practices and the increasing use of online support infrastructures, large-scale data sets covering both the social and technical dimension of collaborative software engineering processes are increasingly becoming available. In the analysis of these data, a growing number of studies employ a network perspective, using methods and abstractions from network science to generate insights about software engineering processes. Featuring a collection of inspiring works in this area, with this topical issue, we intend to give an overview of state-of-the-art research. We hope that this collection of articles will stimulate downstream applications of network-based data mining techniques in empirical software engineering.
|