Wednesday, December 14, 2011

Blog #21: Human Model Evaluation in Interactive Supervised Learning


Paper Title: Human Model Evaluation in Interactive Supervised Learning


Authors: Rebecca Fiebrink, Perry Cook and Daniel Trueman


Author Bios: Rebecca Fiebrink: joined Princeton as an assistant professor in Computer Science and affiliated faculty in Music. She recently completed her PhD in Computer Science at Princeton, and she spent January through August 2011 as a postdoc at the University of Washington. She works at the intersection of human-computer interaction, applied machine learning, and music composition and performance.


Perry Cook: A researcher at Princeton University in the Department of Computer Science and the Department of Music (he researches there but does not teach).


Dan Trueman: Professor in the Department of Music at Princeton University.




Presentation Venue: CHI '11: Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems (ACM, New York).


Summary:
Hypothesis: If users are allowed to iteratively update the current state of a working machine learning model, then the quality of the resulting models, and of the actions taken with them, will improve.
How the hypothesis was tested: The authors conducted three studies of people applying supervised learning to their work in computer music. Study "A" was a user-centered design process, study "B" was an observational study in which students used the Wekinator in an assignment focused on supervised learning, and study "C" was a case study with a professional composer building a gesture-recognition system.
Results: From study "A", the authors observed that participants iteratively re-trained their models by editing the training dataset. In study "B", the students re-trained the algorithm an average of 4.1 times per task, and the professional composer in study "C" re-trained it an average of 3.7 times per task. Cross-validation was used an average of 1 and 1.8 times per task, respectively. Direct evaluation, running the trained model on live input and judging its output, was also supported, and it was used more often than cross-validation: participants in study "A" relied exclusively on this measure, while the students and the professional in studies "B" and "C" used direct evaluation an average of 4.8 and 5.4 times per task, respectively. Through cross-validation and direct evaluation, users received feedback on how their actions affected the outcomes. Overall, users were able to understand and use the system effectively, and the Wekinator allowed them to create more expressive, intelligent, and higher-quality models than they could with other techniques.
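
To make this workflow concrete, below is a minimal sketch of the iterative train/evaluate/re-train loop described above, written in Python with scikit-learn as a stand-in (the real Wekinator is a Java application built on Weka; the feature values, labels, and classifier choice here are purely illustrative assumptions):

    # A minimal sketch of the Wekinator-style workflow, using scikit-learn as a
    # stand-in for the paper's actual tool. Feature values and labels are made up.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    # Initial training set: each row is a feature vector from a gesture,
    # each label is the mapping/class the performer wants it to trigger.
    X = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]])
    y = np.array([0, 0, 1, 1])

    model = KNeighborsClassifier(n_neighbors=1)
    model.fit(X, y)

    # Cross-validation: an aggregate accuracy number.
    print("CV accuracy:", cross_val_score(model, X, y, cv=2).mean())

    # Direct evaluation: run the model on a new gesture and judge the result
    # by ear/eye. If it is wrong, edit the training data and re-train.
    new_gesture = np.array([[0.55, 0.45]])
    if model.predict(new_gesture)[0] != 1:      # user decides the output is wrong
        X = np.vstack([X, new_gesture])         # add a corrective example
        y = np.append(y, 1)
        model.fit(X, y)                         # re-train: the iterative loop

The point of the sketch is the loop itself: the user judges the model's behavior directly, edits the training data in response, and re-trains until the mapping behaves as intended.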


Discussion:
Effectiveness: This paper presents a really good idea: a system where you can tell the machine what is "good" or "bad" before it settles on a final result. Being able to re-train the algorithm is very effective, and the cost-benefit trade-off seems reasonable, so it's easy to see this idea becoming more widespread before long. The authors achieved their goals and supported their hypothesis.
Faults: I did not really find any faults with the system.
