Open Source Projects: Studying The Theoretical And Empirical Evolution

The Dynamic Evolution of Open-Source Projects


This project aims to examine the dynamic evolution of open source projects, both empirically and theoretically. In particular, we intend to address three questions:

  • What is the role of commercial firms behind popular open source projects?
  • How do commercial firms work with and compete against open source?
  • What are the incentives for unpaid programmers to continue working on open source projects?

The success of the open source movement has attracted scholars’ attention for a long time. However, because of limited data availability, only a few empirical tests have been carried out in the existing literature, and theoretical development has in turn been hindered by the lack of empirical evidence.

The proposed project is based on the well-documented application programming interface (API) of GitHub, the world’s largest source code hosting service provider. It tries to address the above questions and to construct a sample database for further exploration.

The results of the pilot study will be used in two ways:

First, three regular research papers will be written and submitted to internationally recognized journals in the field of industrial organisation:

  • RAND Journal of Economics (4* ABS),
  • International Journal of Industrial Organization (3* ABS),
  • Journal of Economics & Management Strategy (3* ABS).

Second, the results will form the basis of further grant applications. Potential external funding sources include:

  • Standard ESRC grant application,
  • Marie-Curie Fellowship,
  • ERC starting grant.


Open source software development involves developers at many different locations and organizations sharing code to develop programs. It has become quite popular in the programming community and is often referred to as a movement with an ideology and enthusiastic supporters. At the core of this process are three interesting phenomena:

  • Unpaid volunteers do a non-trivial portion of the development of open source programs.
  • Unlike commercial software, open source software is not sold or licensed for a fee.
  • Many commercial firms are actively involved in developing open source projects.

The open source movement has been successful so far. The impact of open source projects has been spilling over into other areas such as academia. In the scientific world, open source programming languages such as Python, R and Octave are becoming increasingly popular in machine learning and statistics.

In general, opening the black box of the open source development process and understanding its business cycle, competition pattern and lifespan is not merely of academic interest but also has important implications for firms’ strategies. This is the research question that this project seeks to answer.

Limitations of the old database (SourceForge)

Most of the empirical literature on open source (Athey et al., 2014; Fershtman et al., 2011; Lemley et al., 2011; Lerner et al., 2006) is based on data collected from SourceForge, which was a leading source code hosting service provider before the rise of GitHub. As Table 1 shows, compared with what can be obtained from GitHub, the SourceForge data are quite incomplete and lack many important aspects of project information.

Table 1: Project information available from SourceForge and GitHub

Characteristic                                                  SourceForge  GitHub
Project creation date                                           Yes          Yes
Project versions                                                Yes          Yes
Contributors’ names                                             Yes          Yes
Time of each code commit                                        No           Yes
Details of each contributor’s contribution                      No           Yes
Dynamics of contributors                                        No           Yes
Number of forks (indicator of popularity among developers)      No           Yes
Number of downloads (indicator of popularity among customers)   Yes          Yes
User-centric data                                               No           Yes
Project-centric data                                            Yes          Yes

Lerner, Pathak and Tirole (2006) use SourceForge data to examine the dynamics of open source contributors. They use the suffix of each contributor’s e-mail address to identify the developer’s organisation: a developer whose address ends in ‘.com’ (with certain exceptions) is classified as a firm-sponsored contributor, whereas a developer whose address ends in ‘.edu’ or ‘.org’ is classified as an unpaid contributor (a programming habitant). After sampling 100 projects from SourceForge, they conclude that open source projects with more firm-sponsored contributors are likely to be larger and more successful.
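The suffix rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure used in the cited paper: the function name, the label strings and the sample addresses are assumptions, and the exceptions to the ‘.com’ rule mentioned above are not reproduced because they are not specified here.

```python
def classify_by_email_suffix(email: str) -> str:
    """Label a contributor using only the top-level domain of the e-mail address."""
    domain = email.rsplit("@", 1)[-1].lower()
    top_level = domain.rsplit(".", 1)[-1]
    if top_level == "com":
        return "firm-sponsored"
    if top_level in {"edu", "org"}:
        return "unpaid"
    return "unknown"  # suffixes the rule does not cover


# Hypothetical addresses, used only to show how the rule behaves.
contributors = [
    "alice@bigcorp.com",
    "bob@university.edu",
    "carol@gmail.com",
    "dave@project.org",
]
labels = {email: classify_by_email_suffix(email) for email in contributors}
```

In this sketch a personal webmail address such as carol@gmail.com is labelled firm-sponsored, which illustrates why an identification based on the suffix alone can be imprecise.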

Although their conclusions are in line with theoretical predictions, the identification method is relatively imprecise, which might substantially undermine the validity of their results. Furthermore, their analysis is essentially static, so it cannot examine the dynamics of open source projects over their lifespan, which is essential for understanding open source business patterns. Nevertheless, this is the most that can be done with SourceForge data.

GitHub, as a hosting provider, has a significant feature: “every event is recorded and exportable.” An event here is any kind of activity that a developer carries out on a project, such as adding, modifying or deleting lines of code, or forking a repository. Through GitHub’s API, we can easily build a complete profile of any hosted project, which enables us to carry out further analysis and address our research questions appropriately.
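As a rough illustration of what profiling a project from its events might look like, the sketch below tallies a list of event records by type and by actor. The field names (type, actor.login, created_at) follow the event objects returned by GitHub’s REST API, and events_url points at the repository events endpoint. The helper names and the sample records are hypothetical, and no network request is made here.

```python
from collections import Counter


def events_url(owner: str, repo: str) -> str:
    """URL of the REST endpoint that lists recent events for one repository."""
    return f"https://api.github.com/repos/{owner}/{repo}/events"


def summarize_events(events):
    """Count events by type and by the login of the acting user."""
    by_type = Counter(event["type"] for event in events)
    by_actor = Counter(event["actor"]["login"] for event in events)
    return by_type, by_actor


# Hand-written sample in the shape of API event objects (not real data).
sample_events = [
    {"type": "PushEvent", "actor": {"login": "alice"}, "created_at": "2015-03-01T10:00:00Z"},
    {"type": "PushEvent", "actor": {"login": "bob"}, "created_at": "2015-03-01T11:30:00Z"},
    {"type": "ForkEvent", "actor": {"login": "carol"}, "created_at": "2015-03-02T09:15:00Z"},
    {"type": "PushEvent", "actor": {"login": "alice"}, "created_at": "2015-03-03T16:45:00Z"},
]
by_type, by_actor = summarize_events(sample_events)
```

Counts of this kind, tracked over time, are one possible starting point for the per-project and per-contributor measures discussed in this proposal.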

Theoretical Challenge

On the theoretical side, the enthusiasm with which individual programmers and commercial companies engage in open source development is startling to an economist at first glance. As an initial response, Lerner and Tirole (2002, 2004) identified several short- and long-term benefits that might explain it.

First, developing open source projects may help developers sharpen their programming skills. This benefit is particularly relevant for system administrators looking for specific solutions for their company. Second, programmers may find intrinsic pleasure in developing ‘cool stuff’. Third, in the long run, open source contributions may help developers build a reputation, which can lead to future job offers, shares in commercial open source-based companies, or future access to the venture capital market. However, they did not provide a formal theoretical model of open source development.

A recent paper by Athey (2014) builds a dynamic model to explain the evolution of open source software projects, but her model relies heavily on the assumption of reciprocal altruism among programmers. This assumption is questionable in that it closes the door to studying opportunistic behaviour by open source contributors, and it is not readily compatible with the standard framework of economic research.

To fill this research gap, one objective of this project is to rationalize and predict developers’ behaviour in various situations. Specifically, we are interested in modelling contributors’ decision process and in characterizing the situations in which contributors are likely to stop developing a particular project and move on to others.

Another challenge is to understand the competition between open source and commercial software. To address this question, we will build a stylized model to explore the underlying mechanism, evaluate the impact on social welfare and draw business implications for firms that are willing to sponsor open source projects. This research output can also guide commercial firms in choosing the best open source projects to support.

Planned Work

This project is planned in four stages:

Stage 1: Data Collection (3-4 months)

In the initial stage, we will look through GitHub repositories and sample open source projects. A web crawler built on the GitHub API will be written and installed on a mini server running 24/7 to perform the data collection.

A research assistant will be hired at the end of the automated collection process to clean the data.
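A minimal sketch of how such a crawler might walk through paginated API results is given below. It assumes that each response carries a Link header whose rel="next" entry gives the URL of the following page, which is how GitHub’s REST API signals pagination. The function names, the max_pages cap and the canned pages in the example are assumptions made for illustration. fetch_page is defined but not called here, and a real crawler would also need authentication and rate-limit handling.

```python
import json
import re
import urllib.request
from typing import Optional


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL marked rel="next" in a Link header, or None if absent."""
    if not link_header:
        return None
    for match in re.finditer(r'<([^>]+)>\s*;\s*rel="([^"]+)"', link_header):
        if match.group(2) == "next":
            return match.group(1)
    return None


def fetch_page(url: str):
    """Download one page of JSON results and return (items, Link header)."""
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response), response.headers.get("Link")


def crawl(start_url: str, fetch=fetch_page, max_pages: int = 10):
    """Follow rel="next" links from start_url, collecting items from each page."""
    items, url, pages = [], start_url, 0
    while url and pages < max_pages:
        page_items, link_header = fetch(url)
        items.extend(page_items)
        url = next_page_url(link_header)
        pages += 1
    return items


# Offline example: canned pages stand in for real API responses.
fake_pages = {
    "https://api.github.com/repositories": (
        [{"id": 1}, {"id": 2}],
        '<https://api.github.com/repositories?since=2>; rel="next"',
    ),
    "https://api.github.com/repositories?since=2": ([{"id": 3}], None),
}
collected = crawl("https://api.github.com/repositories", fetch=lambda url: fake_pages[url])
```

Passing the fetch function as a parameter keeps the pagination logic testable without network access.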

Stage 2: Contributor Classification (2 months)

Once we have the clean data, we will take a subset as a training set for a machine learning algorithm. The training set will contain contributors whose working status (paid or unpaid for their contributions) is known with certainty from other sources. For example, the paid contributors to Pyston, a project sponsored by Dropbox, are listed directly on Dropbox’s website.

A machine learning algorithm will be implemented and run on a high-spec laptop. The algorithm will learn from the events of known paid and unpaid contributors and use this information to classify the remaining contributors in the database.
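The proposal does not name a specific algorithm, so the sketch below uses a nearest-centroid classifier purely as a stand-in for the train-then-classify workflow. The two features (share of commits made during weekday office hours, and average commits per week) and all the numbers are hypothetical. They are not taken from any data set.

```python
from math import dist
from statistics import fmean


def fit_centroids(samples):
    """samples: list of (feature_vector, label). Returns {label: mean feature vector}."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training set: (office-hours commit share, commits per week), known status.
training_set = [
    ((0.9, 12.0), "paid"),
    ((0.8, 9.0), "paid"),
    ((0.2, 2.0), "unpaid"),
    ((0.3, 1.0), "unpaid"),
]
centroids = fit_centroids(training_set)

# Classify contributors whose status is not known.
first_guess = predict(centroids, (0.7, 8.0))
second_guess = predict(centroids, (0.1, 3.0))
```

In practice the features would need to be put on a common scale before distances are compared, and a held-out part of the labelled data would be needed to check how well any chosen classifier performs.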

Stage 3: Empirical and Theoretical Analysis (5 months)

The data obtained from GitHub will be used to form a two-mode network of projects and contributors. From this network we can construct two one-mode networks: the contributor network and the project network.
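The construction of the two one-mode networks can be sketched as follows. Two projects are linked when they share at least one contributor, and two contributors are linked when they work on at least one common project. The edge weight is the number of shared neighbours. The project and contributor names are hypothetical, and weighting by shared counts is one common convention rather than a choice stated in this proposal.

```python
from collections import defaultdict
from itertools import combinations


def project_one_mode(memberships):
    """memberships: {node: set of neighbours in the other mode}.

    Returns {(a, b): number of shared neighbours} for every linked pair a < b.
    """
    weights = {}
    for a, b in combinations(sorted(memberships), 2):
        shared = len(memberships[a] & memberships[b])
        if shared:
            weights[(a, b)] = shared
    return weights


def invert(memberships):
    """Swap the two modes: {project: contributors} becomes {contributor: projects}."""
    inverted = defaultdict(set)
    for key, values in memberships.items():
        for value in values:
            inverted[value].add(key)
    return dict(inverted)


# Hypothetical two-mode data: which contributors work on which projects.
project_contributors = {
    "proj_a": {"alice", "bob"},
    "proj_b": {"bob", "carol"},
    "proj_c": {"carol"},
}
project_network = project_one_mode(project_contributors)
contributor_network = project_one_mode(invert(project_contributors))
```

Here proj_a and proj_b are linked through bob, and proj_b and proj_c through carol, while alice and carol are not linked because they share no project.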

We can then conduct panel analysis on the project network to test our hypotheses on the dynamics of open source project evolution, and employ the method of Fershtman et al. (2011) to examine the spill-over effects among GitHub projects and contributors.

Theory development will proceed in parallel with the empirical studies. We will build models to explain contributors’ motivation and the pattern of competition, and present them in two separate papers.

Research Output

An empirical paper on the dynamics of open source development pattern

To benefit fully from these data, we need to revisit the open source business pattern described in Lerner et al. (2006). The analysis will use a machine learning algorithm, with a subset of GitHub repositories as a training set, to identify whether a particular contributor is paid by a commercial firm or contributes out of intrinsic motivation. This method should give much more precise information than identifying contributors by their email addresses. The central question that this paper addresses is: who plays the more important role in a growing open source project, the firm-sponsored contributor or the programming habitant?

From the project’s perspective, this paper provides a starting point for understanding the motivation of contributors to open source projects and the business pattern of firms that rely on open source projects.

The target journal for this paper is the RAND Journal of Economics.

A theoretical paper on developers’ motivation of open source development

After obtaining the empirical evidence from the first paper, we will explore how the share of corporate contributions changes in large and growing projects. To do this, we will build a theoretical model to explain the motivation of contributors to open source projects. Within a repeated-game framework, developers’ behaviour will be fully rationalized and the impact on social welfare evaluated, together with predictions of project lifespan.

The target journal for this paper is the International Journal of Industrial Organization.

A third paper on open source and proprietary software competition

Commercial companies may interact with an open source project in a number of ways. While improvements in the open source software are not appropriable, commercial companies can benefit if they also offer expertise in some proprietary segment of the market that is complementary to the open source program.

A common situation in the software industry is the co-existence of open source and proprietary products, such as Android vs. iOS. Google acquired Android for at least $50 million and then released its code under an open source license. People may ask: what is the driving force behind Google’s move? This paper will explore the underlying rationale of Google’s decision: what are the incentives for a commercial firm to share its code under an open source license? We already have a prototype model to explain this phenomenon, and it will benefit from the empirical support of this research project.

The target journal for this paper is the Journal of Economics & Management Strategy.

Other directions:

Knowledge spillovers and license choices also play a crucial role in software development. There are two kinds of knowledge spillover: project-based and developer-based. By exploring the GitHub data and building the two-mode network, we may be able to write further papers on technology spillovers and optimal licenses.

Potential Impact

Open source has become an indispensable part of information technology, and many prominent IT firms, such as Google, Facebook and Amazon, invest substantially in supporting open source projects. This research is therefore aimed not only at an academic audience but also at practitioners, especially firms that are active in sponsoring open source projects. It should also help individual developers and firms to choose the “right” open source projects to work on.

Future Grant Application

GitHub’s data set provides an excellent opportunity to explore the dynamics of open source projects, and it has strong potential for long-term research impact and external funding opportunities. The First Grant Venture is most likely to be used as seed funding, that is, as the first stage of applying for a larger grant. More than 10 million projects are hosted on GitHub, and more than 4500 new repositories are added every day. It is a treasure sleeping on the Internet, perfectly matching the current trend of ‘Big Data’. With more external funding support, GitHub data can be systematically collected, explored and analyzed. We could gain even deeper insight by making use of GitHub’s private API.

Benefit of First Venture Fund Support

The current application taps into a truly novel idea with huge potential, and it might well serve as the starting point for a series of research papers and funding applications. A lack of financial support would delay the implementation of the project and create the danger that the idea will be exploited by someone else.
