# SANER 2016
This document describes the steps needed to reproduce the data used in our SANER'16 study.
## GitHub Data
Here we describe how we downloaded and analyzed GitHub data.
### Finding the projects
Using [GitHub Archive](http://githubarchive.org/), we ran the following query to select our initial set of projects:

    SELECT
      a.repository_name as name,
      a.repository_owner as owner,
      a.repository_organization as organization,
      a.repository_watchers AS stars,
      a.repository_forks AS forks,
      a.repository_language as language
    FROM (
      SELECT *
      from [githubarchive:github.timeline] a
      where a.repository_language = "Haskell"
    ) as a
    JOIN EACH (
      SELECT MAX(created_at) as max_created, repository_name
      FROM [githubarchive:github.timeline]
      GROUP EACH BY repository_name
    ) as b
    ON
      b.max_created = a.created_at and
      b.repository_name = a.repository_name
    ORDER BY stars desc

For each programming language analyzed, we changed the ```where``` clause, replacing ```Haskell``` with the programming language we intended to analyze. We ran the query and stored the result in a CSV file. Create one CSV file per programming language. [Here](https://github.com/gustavopinto/casual-contributors/blob/gh-pages/projects_downloaded.csv) is the full list of projects downloaded.
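
The per-language substitution can be scripted. The following is only a minimal sketch: the helper names and the CSV naming scheme are hypothetical and not part of the original study, and only the ```where``` clause itself comes from the query above.

```python
# Hypothetical helpers: the function names and the file-naming scheme are
# illustrative assumptions, not taken from the study's scripts.
WHERE_TEMPLATE = 'where a.repository_language = "{language}"'


def where_clause(language: str) -> str:
    """Return the where clause of the query for one programming language."""
    return WHERE_TEMPLATE.format(language=language)


def csv_filename(language: str) -> str:
    """Return a file-system-friendly CSV name for one language's results."""
    safe = language.lower().replace("+", "p").replace("#", "sharp")
    return safe + ".csv"
```

For example, ```where_clause("Haskell")``` yields the clause used in the query above, and ```csv_filename("C++")``` yields ```cpp.csv```.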
### Downloading the projects
After saving all required CSVs, run the program [cloner.py](https://github.com/gustavopinto/casual-contributors/blob/gh-pages/cloner.py) in the same directory where you saved the CSV files.
This script tries to clone each project into the current directory.
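
To illustrate the cloning step, here is a minimal sketch of what such a script might do. It is not the code of ```cloner.py```. It assumes that each CSV has a header row with ```owner``` and ```name``` columns (matching the aliases in the query), and the function names and target directory layout are hypothetical.

```python
import csv
import glob
import subprocess


def clone_url(owner: str, name: str) -> str:
    """Build the HTTPS clone URL for a GitHub repository."""
    return f"https://github.com/{owner}/{name}.git"


def parse_projects(lines):
    """Yield (owner, name) pairs from CSV lines that start with a header row.

    Assumption: the header contains `owner` and `name` columns.
    """
    for row in csv.DictReader(lines):
        yield row["owner"], row["name"]


def clone_all(directory: str = ".") -> None:
    """Try to clone every project listed in the CSV files of `directory`."""
    for csv_path in sorted(glob.glob(f"{directory}/*.csv")):
        with open(csv_path, newline="", encoding="utf-8") as handle:
            for owner, name in parse_projects(handle):
                # check=False: a failed clone (for example, a deleted
                # repository) should not abort the remaining clones.
                subprocess.run(
                    ["git", "clone", clone_url(owner, name),
                     f"{directory}/{owner}_{name}"],
                    check=False,
                )
```

Calling ```clone_all()``` from the directory that holds the CSV files would attempt one ```git clone``` per listed project.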
After downloading all required projects, it is time to analyze them. Use the script [contributions.sh](https://github.com/gustavopinto/casual-contributors/blob/gh-pages/contributions.sh) to generate the individual casual contributions.
Use the script [casual_contributors.py](https://github.com/gustavopinto/casual-contributors/blob/gh-pages/casual_contributors.py) to convert this data to CSV format.
Use the script [loc.sh](https://github.com/gustavopinto/casual-contributors/blob/gh-pages/loc.sh) to generate lines-of-code data for each project. This script requires the program ```cloc``` to be installed.
Use the script [difference_contributors.py](https://github.com/gustavopinto/casual-contributors/blob/gh-pages/difference_contributors.py) to generate data on the number of projects and their number of contributors.
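
As a rough illustration of the analysis step, the sketch below counts commits per author email and lists the authors at or below a commit threshold. It is not the logic of the scripts linked above. The threshold of one commit, the use of author emails as identities, and all function names are assumptions made for this example.

```python
import subprocess
from collections import Counter


def author_emails(repo_path: str) -> list:
    """Return one author email per commit in the repository at `repo_path`."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%ae"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def commits_per_author(emails) -> Counter:
    """Count commits per author, ignoring case and blank entries."""
    return Counter(e.strip().lower() for e in emails if e.strip())


def casual_contributors(emails, max_commits: int = 1) -> list:
    """Return the sorted authors with at most `max_commits` commits.

    Assumption: `max_commits=1` is an illustrative threshold.
    """
    counts = commits_per_author(emails)
    return sorted(a for a, n in counts.items() if n <= max_commits)
```

For a cloned project, ```casual_contributors(author_emails("path/to/project"))``` would list the authors who committed only once, under the assumptions stated above.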
We conducted two surveys: one with casual contributors and the other with project maintainers.
### Casual contributors questionnaire
Link to [questionnaire](http://goo.gl/qjmUl3) ([responses](https://github.com/gustavopinto/casual-contributors/blob/gh-pages/survey/casual_contributors.csv))
### Project maintainers questionnaire
Link to [questionnaire](http://goo.gl/Isgrcf) ([responses](https://github.com/gustavopinto/casual-contributors/blob/gh-pages/survey/owners.csv))