Megan Conklin is presenting her paper “Project Entity Matching across FLOSS Repositories” at the 3rd International Conference on Open Source Systems, to be held in Limerick, Ireland, June 11-14. This long paper was refereed and accepted for the conference proceedings, which will be published by Springer-Verlag in 2007.
The paper presents a data mining algorithm that matches open source projects listed in multiple code repositories based on their project characteristics, eliminating duplicates while minimizing false positives and false negatives.
This work is related to Conklin’s ongoing project called FLOSSmole, designed to collect, aggregate, and disseminate quantitative data about open source software development practices.
Megan is also co-chair of the 2nd Workshop on Public Data about Software Development, co-located with the IntOSS conference for the 2nd year. Last year’s workshop was very well attended, drawing papers and lively discussion from participants representing 12 countries on 5 continents.
While in Ireland, Megan will also participate in a debate panel about whether open source research data should be “free and open.”
(Ironically, she was assigned by a coin toss to argue the “con” side in this debate!)
Finally, Megan is presenting an interactive tutorial called “How to Collect FLOSS Metrics” with colleagues Jesus Gonzalez-Barahona and Gregorio Robles from Universidad Rey Juan Carlos in Spain. The tutorial shows researchers how to use the tools that Megan, Jesus, and Gregorio have already developed (such as FLOSSmole and CVSanalyze) to collect and study quantitative FLOSS data.