msr17 in 5 tweets


The Mining Software Repositories (MSR) conference is the place to go if you want to learn about the latest advances in MSR techniques, datasets, and tools. As data science steadily grows in importance, so does the MSR conference.

This year, msr17 had a beautiful program. I tweeted a lot during the two conference days. This blog post is my personal and biased attempt to summarize the whole conference in only five tweets. (If you weren't there and want to know more about what happened, just search for the #msr17 hashtag on Twitter.)

To get started, there was a plenary session on what the MSR research pillars are. Among the topics, some attendees suggested that Open Access is something the MSR community must adopt. To lower the adoption barrier, one attendee mentioned that Docker should be used much more, in particular to increase the reproducibility of MSR studies. However...

That said, maybe one of the first takeaways of #msr17 is that we need to encourage MSR researchers to adopt tools that help other researchers replicate MSR studies. Docker might be an interesting option. Indeed, there was a #msr17 paper about Docker. According to this study, Docker images take only 2 minutes to build and can be as small as 4 MB, and Dockerfiles have on average 3 revisions per year. That is, Docker images are small, fast to build, and do not require much maintenance effort. It might not be that hard to convince your peers and students to use Docker, right?

Another hot topic at #msr17 was Continuous Integration (CI). Not only was there an entire session about CI, but the mining track this year was also all about CI. Interestingly, in 2016 there were no papers about CI at the MSR conference. In 2017, #msr17 had 17 papers about CI (14 in the mining track and 3 in the research track). One explanation is that high-quality, open-source CI servers became popular only in recent years. Other than that, most CI data was hidden in software companies' databases, which is a challenge in itself for a community that praises Open Access. Anyway, needless to say, if you like CI, #msr17 was great fun. Are you a Ph.D. student looking for a research topic? Maybe CI can be a good starting point. The best, and most unexpected, thing is that my co-authors and I won the Best Mining Challenge Paper Award! How cool is that?

The mining track was only possible thanks to the efforts of the TU Delft folks, who created and maintained a high-quality CI dataset. Similarly, for his seminal contributions to high-quality datasets (in particular to the PROMISE repository), Tim Menzies won the Foundational Contribution Award. During his keynote talk, Prof. Menzies mentioned that “There is not enough science in data science”; that is, there is a lot of knowledge waiting to be discovered in this myriad of data. If that is the case, maybe the question is: where to start? Prof. Menzies gave some insights on how to find important research problems:

That is, figure out what the strangest thing about your research is and go mining!

My last tweet was actually a farewell one. While #msr17 was a pleasant experience, I’m already looking forward to #msr18. #msr18 will take place in Gothenburg, Sweden, and Yasutaka Kamei is the General Chair. I briefly talked with him, and he is very excited about organizing #msr18. I believe we can expect another awesome MSR! See you there?