A computer science and statistics major explores the impact of corporate participation on open source software project communities.
CELEBRATE! Week offers an annual opportunity to highlight the academic and artistic achievements of Elon students and faculty. Each day this week, we’ll be putting the spotlight on a student scholar’s research: what they are seeking to find out, and how they became interested in their project.
Name: Jack Hartmann
Area of study: Computer Science
Majors: Computer Science and Statistics
Minor: Data Science
Faculty mentor: Megan Squire, professor of computer science
Title of research: Understanding the Impact of Corporate Participation on Open Source Software Project Communities
Abstract:
Free, Libre and Open Source Software (FLOSS) has become increasingly important in our technology-driven world. All FLOSS products carry software licenses that explicitly give users the freedom to use, share, study and modify the software, including its source code.
This user-oriented focus meant that initially many FLOSS projects were small “hobby” projects with one or two core members. Nowadays, blue chip software companies such as Samsung, Microsoft, Facebook, Google, IBM and HP are hiring software engineering teams specifically for the purpose of working on FLOSS projects.
The primary goal of this research project is to measure and evaluate the impact of corporations on FLOSS teams over time. To answer our research question, we first wrote software to collect, clean and store over 20 million emails used by Apache Software Foundation (ASF) project teams to organize their work. Next, we used data mining techniques to discover patterns in these emails that could be a result of increasing corporate involvement.
The ultimate goal of this multi-year research project is to publish research that improves our understanding of the impact of corporate involvement on FLOSS projects, with a secondary goal of creating high-quality datasets that other researchers can use for further analysis. For this SURF presentation, we will describe how FLOSS works and why we selected the ASF as a source for our data. We will show how we collected, cleaned and stored 20 million emails, and we will present some preliminary results of the data mining we have completed so far. Finally, we will outline the next phase of our work, which will involve more advanced techniques such as sentiment analysis and other natural language processing methods.
In other words:
FLOSS is a type of software that uses a license that explicitly gives the user the freedom to use, share, study and modify the software, including its source code. In the 1990s and 2000s, when the concept of free software was new, FLOSS teams were usually led by a single “true believer” in intellectual property rights.
These early leaders set about creating a project management structure that mirrored their beliefs about freedom and transparency. An initial focus on developer and user freedom helped FLOSS teams produce high-quality products and foster fun and diverse working environments. By 2016, FLOSS had become a $60 billion industry, and it is widely accepted that the FLOSS method of developing software has “won” over traditional, proprietary methods.
With this victory have come some changes. Any multi-billion-dollar industry will attract profit seekers, and FLOSS is no different. In this research, our primary goal is to learn more about the growing involvement of corporations in the development of FLOSS. We want to understand their influence on this important software segment.
Explanation of study:
We are one year into the two-year thesis project, and so far the focus has been on data collection and data cleaning. We have collected more than 19 million emails that the Apache Software Foundation has made public and broken them down into the components needed for further analysis. We are currently working with more than 350 GB of data collected using software we wrote.
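To give a sense of what breaking an email down into components can involve, here is a minimal Python sketch using only the standard library’s email package. It is not the software written for this project: the function name split_email and the fields it extracts are hypothetical, and recording the sender’s email domain as a rough hint of corporate affiliation is an illustrative assumption rather than a method described in this research.

```python
from email import message_from_string
from email.policy import default
from email.utils import parseaddr, parsedate_to_datetime


def split_email(raw: str) -> dict:
    """Hypothetical helper: split one raw email into fields for analysis."""
    # policy=default returns an EmailMessage, which provides get_body().
    msg = message_from_string(raw, policy=default)

    # Separate the display name from the address in the From header.
    name, addr = parseaddr(str(msg.get("From", "")))
    addr = addr.lower()
    # Assumption for illustration: the domain may hint at an employer.
    domain = addr.rpartition("@")[2] if "@" in addr else ""

    # Prefer the plain-text body; a simple text/plain message is its own body.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""

    date_hdr = msg.get("Date")
    return {
        "sender_name": name,
        "sender_address": addr,
        "domain": domain,
        "date": parsedate_to_datetime(str(date_hdr)) if date_hdr else None,
        "subject": str(msg.get("Subject", "")),
        # Message-ID and In-Reply-To allow discussion threads to be rebuilt.
        "message_id": str(msg.get("Message-ID", "")),
        "in_reply_to": str(msg.get("In-Reply-To", "")),
        "body": body,
    }
```

In a real pipeline, a function like this would run over every message in an archive and store the resulting fields in a database; which fields are worth keeping depends on the research questions being asked.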
We have made some preliminary findings about the general impact of, and story behind, corporate participation in Apache Software Foundation projects. In the months to come, we plan to examine the data with a variety of text analysis techniques to better understand the cultural shift that occurs when large, private corporations engulf a project.
What made this research interesting to you? How did you get started?
At first, the idea that private companies were profiting from “free” software was a bit confusing to me. I began to wonder why a company would have any interest in working on a project or piece of software that it did not own.
Additionally, the transition that has occurred in open source software over the past 10 to 20 years is interesting and still not completely understood. At one time, programmers developed FLOSS projects on the side, for the greater good and to share code with the community. Now, most of the larger projects are dominated by a handful of blue-chip technology companies, and some companies have even been founded solely to develop open source software.