SNSF: Social Interactions and Architecture in OSS - ETH - Chair of Systems Design

On the Interplay between Social Interactions and Software Architecture in Open Source Software

This project is related to our research lines: Social Software Engineering, Design and analysis of socio-technical systems, and Applications in Software Engineering

Duration: 36 months (October 2009 - September 2012)

Funding source: Swiss National Science Foundation (Grant CR12I1_125298)

Open Source Software refers to software developed by voluntary contributors and distributed under specific licensing terms that enable users to study the code and alter it at will. The popularity of Open Source is reflected in the fact that it has produced several category killers: products that quickly took over significant market share. For example, the Apache web server holds around 50 % of the world-wide server market and Mozilla's Firefox holds around 30 % of the browser market.

Consequently, several scientific disciplines took up research on Open Source. Physicists use new developments in network science to study both the architecture of Open Source solutions and the social networks of the developer communities. In Computer Science, software engineering research studies the efficiency of the collaboration and coordination practices employed by Open Source Software communities. Management Science centers on the motivation of developers, the competitive dynamics between Open Source and proprietary software solutions, and the determinants of success.

The aim of this project is to bring together these scientific disciplines, to harvest the synergies between them, and to advance the understanding of the complex socio-technological dynamics underlying Open Source Software beyond the scope of one particular discipline. We focus on the statistical laws governing the evolution of the software architecture, its link to project organisation, and the resulting social dynamics.

The project contributes both to science and practice. With its explicit multi-disciplinary setup, it establishes a holistic picture of the phenomenon of Open Source and fosters cross-fertilisation between physics, computer science and management. We expect this insight to yield results that are also relevant to practitioners. Understanding the statistical laws of software evolution may help developers to steer development towards favorable architectures. Understanding the link between architecture and project organisation may enable new management principles or provide tools for smoothing the interface between software, developers, and users.

Selected Publications

Communication In Innovation Communities: An Analysis Of 100 Open Source Software Projects				[2014]
Geipel, Markus Michael; Press, Kerstin; Schweitzer, Frank
ACS - Advances in Complex Systems, pages: 1550006, volume: 17, number: 07n08

DOI: 10.1142/S021952591550006X

Abstract

We develop a model of innovation communities which allows us to address in a systematic way the influence of users and developers as well as communication between and within these groups. Based on this model, we derive a formal approach to quantify communication flows, community activity and community turnover. These measures are calculated using the data of 100 open source software projects. Our empirical analysis shows that: (i) users indeed play a predominant role in communication, which points towards the vivid role of an active user community; (ii) communication is highly concentrated, which points towards the importance of active individuals; and (iii) community turnover exhibits only little correlation with community segregation, which may make it possible to benefit from high turnover rates while keeping negative effects small. We argue that insight from this extensive analysis not only complements existing case studies but also provides a reference frame to put these singular results into perspective when aiming at generalizations.

Categorizing bugs with social networks: A case study on four open source software communities				[2013]
Zanetti, Marcelo Serrano; Scholtes, Ingo; Tessone, Claudio Juan; Schweitzer, Frank
ICSE '13 Proceedings of the 35th International Conference on Software Engineering

DOI: 10.1109/ICSE.2013.6606653

Abstract

Efficient bug triaging procedures are an important precondition for successful collaborative software engineering projects. Triaging bugs can become a laborious task particularly in open source software (OSS) projects with a large base of comparably inexperienced part-time contributors. In this paper, we propose an efficient and practical method to identify valid bug reports which a) refer to an actual software bug, b) are not duplicates and c) contain enough information to be processed right away. Our classification is based on nine measures to quantify the social embeddedness of bug reporters in the collaboration network. We demonstrate its applicability in a case study, using a comprehensive data set of more than 700,000 bug reports obtained from the BUGZILLA installation of four major OSS communities, for a period of more than ten years. For those projects that exhibit the lowest fraction of valid bug reports, we find that the bug reporters’ position in the collaboration network is a strong indicator for the quality of bug reports. Based on this finding, we develop an automated classification scheme that can easily be integrated into bug tracking platforms and analyze its performance in the considered OSS communities. A support vector machine (SVM) to identify valid bug reports based on the nine measures yields a precision of up to 90.3% with an associated recall of 38.9%. With this, we significantly improve the results obtained in previous case studies for an automated early identification of bugs that are eventually fixed. Furthermore, our study highlights the potential of using quantitative measures of social organization in collaborative software engineering. It also opens a broad perspective for the integration of social network analysis in the design of support infrastructures.
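
To make the precision and recall figures quoted in this abstract easier to read, here is a minimal sketch of how the two quantities are computed for a binary "valid bug report" classifier. The function name and the toy labels are hypothetical and only illustrate the definitions; they are not the data or the code of the study.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 means 'valid report'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # valid reports flagged as valid
    fp = sum(1 for t, p in pairs if not t and p)    # invalid reports flagged as valid
    fn = sum(1 for t, p in pairs if t and not p)    # valid reports that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy labels: four valid reports, two invalid ones.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0]
print(precision_recall(y_true, y_pred))
```

A classifier with high precision and moderate recall, as in this toy case, rarely flags an invalid report as valid but misses a share of the valid ones.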

@inproceedings{Scholtes2013,
    author = "Zanetti, Marcelo Serrano and Scholtes, Ingo and Tessone, Claudio Juan and Schweitzer, Frank",
    conference = "In Proceedings of the International Conference on Software Engineering",
    doi = "10.1109/ICSE.2013.6606653",
    isbn = "978-1-4673-3076-3",
    title = "Categorizing bugs with social networks: A case study on four open source software communities",
    url = "http://dl.acm.org/citation.cfm?id=2486788.2486930\&coll=DL\&dl=ACM\&CFID=220466194\&CFTOKEN=89604713",
    abstract = "Efficient bug triaging procedures are an important precondition for successful collaborative software engineering projects. Triaging bugs can become a laborious task particularly in open source software (OSS) projects with a large base of comparably inexperienced part-time contributors. In this paper, we propose an efficient and practical method to identify valid bug reports which a) refer to an actual software bug, b) are not duplicates and c) contain enough information to be processed right away. Our classification is based on nine measures to quantify the social embeddedness of bug reporters in the collaboration network. We demonstrate its applicability in a case study, using a comprehensive data set of more than 700,000 bug reports obtained from the BUGZILLA installation of four major OSS communities, for a period of more than ten years. For those projects that exhibit the lowest fraction of valid bug reports, we find that the bug reporters’ position in the collaboration network is a strong indicator for the quality of bug reports. Based on this finding, we develop an automated classification scheme that can easily be integrated into bug tracking platforms and analyze its performance in the considered OSS communities. A support vector machine (SVM) to identify valid bug reports based on the nine measures yields a precision of up to 90.3\% with an associated recall of 38.9\%. With this, we significantly improve the results obtained in previous case studies for an automated early identification of bugs that are eventually fixed. Furthermore, our study highlights the potential of using quantitative measures of social organization in collaborative software engineering. It also opens a broad perspective for the integration of social network analysis in the design of support infrastructures.",
    year = "2013",
    arxivid = "1302.6764",
    booktitle = "ICSE '13 Proceedings of the 35th International Conference on Software Engineering",
    pages = "1032-1041"
}

The Role of Emotions in Contributors Activity: A Case Study of the Gentoo Community				[2013]
Garcia, David; Zanetti, Marcelo Serrano; Schweitzer, Frank
In Proceedings of the International Conference on Social Computing and Its Applications

DOI: 10.1109/CGC.2013.71

Abstract

We analyse the relation between the emotions and the activity of contributors in the Open Source Software project Gentoo. Our case study builds on extensive data sets from the project's bug tracking platform Bugzilla, to quantify the activity of contributors, and its mail archives, to quantify the emotions of contributors by means of sentiment analysis. The Gentoo project is known for a considerable drop in development performance after the sudden retirement of a central contributor. We analyse how this event correlates with the negative emotions, both in bilateral email discussions with the central contributor, and at the level of the whole community of contributors. We then extend our study to consider the activity patterns of Gentoo contributors in general. We find that contributors are more likely to become inactive when they express strong positive or negative emotions in the bug tracker, or when they deviate from the expected value of emotions in the mailing list. We use these insights to develop a Bayesian classifier that detects the risk of contributors leaving the project. Our analysis opens new perspectives for measuring online contributor motivation by means of sentiment analysis and for real-time predictions of contributor turnover in Open Source Software projects.

The rise and fall of a central contributor: Dynamics of social organization and performance in the Gentoo community				[2013]
Zanetti, Marcelo Serrano; Scholtes, Ingo; Tessone, Claudio Juan; Schweitzer, Frank
CHASE/ICSE '13 Proceedings of the 6th International Workshop on Cooperative and Human Aspects of Software Engineering

DOI: 10.1109/CHASE.2013.6614731

Abstract

Social organization and division of labor crucially influence the performance of collaborative software engineering efforts. In this paper, we provide a quantitative analysis of the relation between social organization and performance in Gentoo, an Open Source community developing a Linux distribution. We study the structure and dynamics of collaborations as recorded in the project's bug tracking system over a period of ten years. We identify a period of increasing centralization after which most interactions in the community were mediated by a single central contributor. In this period of maximum centralization, the central contributor unexpectedly left the project, thus posing a significant challenge for the community. We quantify how the rise, the activity as well as the subsequent sudden dropout of this central contributor affected both the social organization and the bug handling performance of the Gentoo community. We analyze social organization from the perspective of network theory and augment our quantitative findings by interviews with prominent members of the Gentoo community who shared their personal insights.

The co-evolution of socio-technical structures in sustainable software development: Lessons from the open source software communities				[2012]
Zanetti, Marcelo Serrano
ICSE '12 Proceedings of the 34th International Conference on Software Engineering

DOI: 10.1109/ICSE.2012.6227030

Abstract

Software development depends on many factors, including technical, human and social aspects. Due to the complexity of this dependence, a unifying framework must be defined and for this purpose we adopt the complex networks methodology. We use a data-driven approach based on a large collection of open source software projects extracted from online project development platforms. The preliminary results presented in this article reveal that the network perspective yields key insights into the sustainability of software development.

The Link between Dependency and Cochange: Empirical Evidence				[2012]
Geipel, Markus Michael; Schweitzer, Frank
IEEE Transactions on Software Engineering, pages: 1432-1444, volume: 38, number: 6

DOI: 10.1109/TSE.2011.91

Abstract

We investigate the relationship between class dependency and change propagation (cochange) in software written in Java. On the one hand, we find a strong correlation between dependency and cochange. Furthermore, we provide empirical evidence for the propagation of change along paths of dependency. These findings support the often alleged role of dependencies as propagators of change. On the other hand, we find that approximately half of all dependencies are never involved in cochanges and that the vast majority of cochanges pertain to only a small percentage of dependencies. This means that inferring the cochange characteristics of a software architecture solely from its dependency structure results in a severely distorted approximation of cochange characteristics. Any metric which uses dependencies alone to pass judgment on the evolvability of a piece of Java software is thus unreliable. As a consequence, we suggest to always take both the change characteristics and the dependency structure into account when evaluating software architecture.
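
The comparison between dependency and cochange described in this abstract can be illustrated with a small sketch. It assumes, purely for illustration, that two classes "cochange" whenever they are modified in the same commit, and that dependencies are given as pairs of class names. The function names and the toy data are hypothetical and do not reproduce the measurement procedure of the paper.

```python
from collections import Counter
from itertools import combinations


def cochange_counts(commits):
    """Count, for every unordered pair of classes, the commits that changed both."""
    counts = Counter()
    for changed in commits:
        for a, b in combinations(sorted(set(changed)), 2):
            counts[(a, b)] += 1
    return counts


def dependency_cochange_summary(dependencies, commits):
    """Return (cochanges per dependency, fraction of dependencies that never cochange)."""
    counts = cochange_counts(commits)
    per_dep = {dep: counts.get(tuple(sorted(dep)), 0) for dep in dependencies}
    if not per_dep:
        return per_dep, 0.0
    never = sum(1 for c in per_dep.values() if c == 0)
    return per_dep, never / len(per_dep)


# Hypothetical toy input: four dependencies, three commits.
deps = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")}
commits = [["A", "B"], ["A", "B", "C"], ["D"]]
per_dep, never_fraction = dependency_cochange_summary(deps, commits)
print(per_dep, never_fraction)
```

In this toy input, half of the dependencies never appear in a cochange, while the pair A and C cochanges without any direct dependency. That is the kind of mismatch between the two structures that the abstract reports.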

Sustainable growth in complex networks				[2011]
Tessone, Claudio Juan; Geipel, Markus Michael; Schweitzer, Frank
Europhysics Letters, pages: 58005, volume: 96, number: 5

DOI: 10.1209/0295-5075/96/58005

Abstract

Based on the analysis of the dependency network in 18 Java projects, we develop a novel model of network growth which considers both preferential attachment and the addition of new nodes with a heterogeneous distribution of their initial degree, $k_0$. Empirically we find that the cumulative distributions of initial and final degrees in the network follow power law behaviours: $1-P(k_0) \propto k_0^{1-\alpha}$ and $1-P(k) \propto k^{1-\gamma}$, respectively. For the total number of links as a function of the network size, we find empirically $K(N) \propto N^{\beta}$, where $\beta \in [1.25, 2]$ (for small $N$), while converging to $\beta \sim 1$ for large $N$. This indicates a transition from a growth regime with increasing network density towards a sustainable regime, which prevents a collapse due to ever increasing dependencies. Our theoretical framework allows us to predict relations between the exponents $\alpha$, $\beta$, $\gamma$, which also link issues of software engineering and developer activity. These relations are verified by means of computer simulations and empirical investigations. They indicate that the growth of real Open Source Software networks occurs on the edge between two regimes, which are dominated either by the initial degree distribution of added nodes, or by the preferential attachment mechanism. Hence, the heterogeneous degree distribution of newly added nodes, found empirically, is essential to describe the laws of sustainable growth in networks.
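
As a rough illustration of the two ingredients named in this abstract (preferential attachment and a heterogeneous initial degree of new nodes), the sketch below grows an undirected network in which every new node draws a heavy-tailed initial degree k0 and links to existing nodes with probability proportional to their degree. The Pareto sampler, the seed network of two linked nodes and the merging of duplicate targets are assumptions made for this example. This is not the authors' model or code.

```python
import random


def grow_network(n_nodes, alpha=2.5, seed=0):
    """Grow a network; return (degrees, history of (N, total links K))."""
    rng = random.Random(seed)
    degrees = [1, 1]      # assumed seed network: two nodes joined by one link
    stubs = [0, 1]        # node i appears degrees[i] times (degree-proportional sampling)
    links = 1
    history = [(2, links)]
    for new in range(2, n_nodes):
        # Initial degree with tail exponent alpha - 1, i.e. 1 - P(k0) ~ k0^(1 - alpha);
        # capped by the number of existing nodes.
        k0 = min(int(rng.paretovariate(alpha - 1.0)), new)
        # Preferential choice of targets; duplicate picks are merged, so the
        # realised initial degree can be smaller than the sampled k0.
        targets = {rng.choice(stubs) for _ in range(k0)}
        degrees.append(len(targets))
        for t in targets:
            degrees[t] += 1
            stubs.extend((t, new))
        links += len(targets)
        history.append((new + 1, links))
    return degrees, history


degrees, history = grow_network(2000)
n, k = history[-1]
print(n, k, max(degrees))
```

Plotting K against N from `history` on logarithmic axes gives a simple way to look at the scaling of the number of links with network size in such a toy process. The exponents obtained this way should not be read as a reproduction of the published results.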

Software change dynamics: Evidence from 35 Java projects				[2009]
Geipel, Markus Michael; Schweitzer, Frank
Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering

DOI: 10.1145/1595696.1595739

Abstract

In this paper we investigate the relationship between class dependency and change propagation in Java software. By analyzing 35 large Open Source Java projects, we find that in the majority of the projects more than half of the dependencies are never involved in change propagation. Furthermore, our analysis shows that only a few dependencies are transmitting the majority of change propagation events. An additional analysis reveals that this concentration cannot be explained by the different ages of the dependencies. The conclusion is that the dependency structure alone is a poor measure for the change dynamics. This contrasts with current literature.

A complementary view on the growth of directory trees				[2009]
Geipel, Markus Michael; Tessone, Claudio Juan; Schweitzer, Frank
The European Physical Journal B, pages: 641-648, volume: 71, number: 4

DOI: 10.1140/epjb/e2009-00302-5

Abstract

Trees are a special sub-class of networks with unique properties, such as the level distribution which has often been overlooked. We analyse a general tree growth model proposed by Klemm et al. [Phys. Rev. Lett. 95, 128701 (2005)] to explain the growth of user-generated directory structures in computers. The model has a single parameter q which interpolates between preferential attachment and random growth. Our analysis results in three contributions: first, we propose a more efficient estimation method for q based on the degree distribution, which is one specific representation of the model. Next, we introduce the concept of a level distribution and analytically solve the model for this representation. This allows for an alternative and independent measure of q. We argue that, to capture real growth processes, the q estimations from the degree and the level distributions should coincide. Thus, we finally apply both representations to validate the model with synthetically generated tree structures, as well as with collected data of user directories. In the case of real directory structures, we show that the values of q measured from the level distribution are incompatible with those measured from the degree distribution. In contrast to this, we find perfect agreement in the case of simulated data. Thus, we conclude that the model is an incomplete description of the growth of real directory structures as it fails to reproduce the level distribution. This insight can be generalised to point out the importance of the level distribution for modeling tree growth.
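
The two representations discussed in this abstract, the degree distribution and the level distribution of a growing tree, can be sketched as follows. The attachment rule is an assumption made for illustration: with probability q the parent is chosen proportionally to its number of children plus one, otherwise uniformly at random. The exact rule of Klemm et al. and the estimation methods of the paper are not reproduced here, and all names are hypothetical.

```python
import random
from collections import Counter


def grow_tree(n_nodes, q, seed=0):
    """Grow a rooted tree; return (parent, level, children) lists indexed by node."""
    rng = random.Random(seed)
    parent = [None]   # node 0 is the root
    level = [0]       # depth of each node below the root
    children = [0]    # number of children of each node
    pool = [0]        # node i appears children[i] + 1 times
    for new in range(1, n_nodes):
        if rng.random() < q:
            p = rng.choice(pool)       # preferential choice (assumed rule)
        else:
            p = rng.randrange(new)     # uniform choice among existing nodes
        parent.append(p)
        level.append(level[p] + 1)
        children.append(0)
        children[p] += 1
        pool.extend((p, new))
    return parent, level, children


def distribution(values):
    """Empirical distribution of a list of integers as {value: fraction}."""
    counts = Counter(values)
    return {v: c / len(values) for v, c in sorted(counts.items())}


parent, level, children = grow_tree(5000, q=0.5)
print(distribution(level))      # level distribution
print(distribution(children))   # distribution of the number of children
```

Running the sketch for several values of q shows how both distributions shift with the parameter, which is the reason each of them can in principle serve as an independent handle on q.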

@article{Geipel2009,
    author = "Geipel, Markus Michael and Tessone, Claudio Juan and Schweitzer, Frank",
    doi = "10.1140/epjb/e2009-00302-5",
    title = "A complementary view on the growth of directory trees",
    journal = "The European Physical Journal B",
    abstract = "Trees are a special sub-class of networks with unique properties, such as the level distribution which has often been overlooked. We analyse a general tree growth model proposed by Klemm et al. [Phys. Rev. Lett. 95, 128701 (2005)] to explain the growth of user-generated directory structures in computers. The model has a single parameter q which interpolates between preferential attachment and random growth. Our analysis results in three contributions: first, we propose a more efficient estimation method for q based on the degree distribution, which is one specific representation of the model. Next, we introduce the concept of a level distribution and analytically solve the model for this representation. This allows for an alternative and independent measure of q. We argue that, to capture real growth processes, the q estimations from the degree and the level distributions should coincide. Thus, we finally apply both representations to validate the model with synthetically generated tree structures, as well as with collected data of user directories. In the case of real directory structures, we show that q measured from the level distribution are incompatible with q measured from the degree distribution. In contrast to this, we find perfect agreement in the case of simulated data. Thus, we conclude that the model is an incomplete description of the growth of real directory structures as it fails to reproduce the level distribution. This insight can be generalised to point out the importance of the level distribution for modeling tree growth.",
    issn = "1434-6028",
    arxivid = "0902.1114",
    number = "4",
    mendeleytags = "FS-Public2005-2011,PRJ\_OSS,SG-Publication,TOP\_OSS,model selection,network analysis,tree",
    month = "September",
    volume = "71",
    url = "https://link.springer.com/article/10.1140/epjb/e2009-00302-5",
    year = "2009",
    keywords = "FS-Public2005-2011,Networks,Networks and genealogical trees,PRJ\_OSS,SG-Publication,Structures and organisation in complex systems,TOP\_OSS,model selection,network analysis,tree",
    pages = "641--648"
}
