Paper Title: Taking advice from intelligent systems: the double-edged sword of explanations
Authors: Kate Ehrlich, Susanna Kirk, John Patterson, Jamie Rasmussen, Steven Ross and Daniel Gruen
Author Bios:
Kate Ehrlich: is a Senior Technical Staff Member in the Collaborative User Experience group at IBM Research, where she uses Social Network Analysis (SNA) as a research and consulting tool to gain insights into patterns of collaboration. She has been active in several professional societies, including the ACM SIG on Human Computer Interaction, where she was founder of the Boston chapter and a conference co-chair.
Susanna Kirk: completed a master's degree in Human Factors in Information Design. Her coursework covered user-centered design, prototyping, user research and advanced statistics.
John Patterson: is a Distinguished Engineer (DE) in the Collaborative User Experience Research Group
Jamie Rasmussen: joined the Collaborative User Experience group in 2007 and is working with John Patterson as a part of a team that is exploring the notion of Collaborative Reasoning.
Steven Ross: is presently working in the area of Collaborative Reasoning, using semantic technology to help individuals within an organization think together more effectively and to enable them to discover and benefit from existing knowledge within the organization, avoiding duplication of effort.
Daniel Gruen: is currently working on the Unified Activity Management project.
Presentation Venue: IUI '11 Proceedings of the 16th international conference on Intelligent user interfaces. ACM New York, NY, USA
Summary:
Hypothesis: The authors examine the role of explanations in intelligent systems, in the context of both correct and incorrect recommendations, in a mission-critical setting. They describe the prototype they used in their study to give recommendations to users.
How the hypothesis was tested: The authors conducted a study with analysts who are engaged in real-time monitoring of cybersecurity incidents. These analysts work from alerts generated from sets of events identified by intrusion detection systems (IDS), and they systematically categorize and prioritize threats.
Each analyst was tested individually in a two-hour session. Sessions began with an introduction to the study and 30 minutes of detailed training on the NIMBLE test console. During the training, participants had an opportunity to ask questions and completed two hands-on examples. They were also explicitly informed that the system was a prototype and might not provide correct recommendations.
The research questions were tested using a balanced parametric design with two experimental variables: 1) Recommendation and 2) Recommendation Correctness.
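The two-variable design can be sketched as a small grid of conditions. This is only an illustration under stated assumptions: the level names for Recommendation (Baseline, Suggestions, Justifications) and for Recommendation Correctness are taken from the results reported in this summary, while `build_conditions`, `balanced_trials` and the `trials_per_cell` value are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import product

# Levels as named in this summary's results (assumed to be the full set).
RECOMMENDATION = ["Baseline", "Suggestions", "Justifications"]
CORRECTNESS = ["correct recommendation present", "no correct recommendation present"]


def build_conditions():
    """Cross the two experimental variables into every combination (cell)."""
    return list(product(RECOMMENDATION, CORRECTNESS))


def balanced_trials(trials_per_cell):
    """Repeat each cell the same number of times, so the design is balanced.

    trials_per_cell is a hypothetical parameter; the paper's real trial
    counts are not given in this summary.
    """
    return [cell for cell in build_conditions() for _ in range(trials_per_cell)]


if __name__ == "__main__":
    for recommendation, correctness in build_conditions():
        print(f"{recommendation:15s} x {correctness}")

    # Every cell appears equally often in a balanced design.
    counts = Counter(balanced_trials(trials_per_cell=4))
    print(counts)
```

Crossing the variables this way makes it easy to see which comparisons the results refer to, for example Baseline against Suggestions within the "no correct recommendation present" cells.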
Results: The authors found that users performed marginally better when a correct recommendation was present than when no correct recommendation was present. There was no overall statistical effect of Recommendation, nor was there an interaction between Recommendation and Recommendation Correctness.
In conditions where no correct choice was available, there was no difference between the Baseline condition and either Suggestions or Justifications.
According to the authors' analysis, there was a significant difference between Baseline and Suggestions and between Baseline and Justifications. They noted that these results suggest users were indeed being influenced, since their responses were consistent with one of the recommendations. Analysis of the data also revealed a significant interaction between Recommendation and Group, and a significant 3-way interaction. When no correct recommendation was present, the High group continued to respond with one of the recommendations, especially in the presence of Justifications, but the other groups did not.
Discussion:
After reading this paper, I am convinced that intelligent systems can be an important aid to users' decision-making. The study sought to examine the benefit of providing justifications as well as the possible negative consequences when explanations supported incorrect recommendations. The results suggested that users obtained some benefit from suggestions over the baseline condition, but there was no additional benefit for justifications over suggestions alone.
Although the prototype did sometimes produce compelling explanations for incorrect recommendations, this does not defeat its overall purpose. This paper would be helpful in the future to anyone developing a system similar to the one described here.