Thursday, December 15, 2011

Blog #32: Taking advice from intelligent systems: the double-edged sword of explanations (Changed)

Paper Title: Taking advice from intelligent systems: the double-edged sword of explanations

Authors: Kate Ehrlich, Susanna Kirk, John Patterson, Jamie Rasmussen, Steven Ross and Daniel Gruen

Author Bios:
Kate Ehrlich: is a Senior Technical Staff Member in the Collaborative User Experience group at IBM Research, where she uses Social Network Analysis (SNA) as a research and consulting tool to gain insights into patterns of collaboration. She has been active in several professional societies, including ACM SIGCHI, where she founded the Boston chapter and served as conference co-chair.

Susanna Kirk: holds a master's degree in Human Factors in Information Design. Her coursework covered user-centered design, prototyping, user research and advanced statistics.

John Patterson: is a Distinguished Engineer (DE) in the Collaborative User Experience Research Group.

Jamie Rasmussen: joined the Collaborative User Experience group in 2007 and is working with John Patterson as a part of a team that is exploring the notion of Collaborative Reasoning.

Steven Ross: is presently working in the area of Collaborative Reasoning, using semantic technology to help individuals within an organization think together more effectively and to enable them to discover and benefit from existing knowledge within the organization in order to avoid duplication of effort.

Daniel Gruen: is currently working on the Unified Activity Management project.

Presentation Venue: IUI '11 Proceedings of the 16th international conference on Intelligent user interfaces. ACM New York, NY, USA

Summary:
Hypothesis: The authors explore the role of explanations in intelligent systems by addressing how users respond to correct and incorrect recommendations in a mission-critical setting. They describe a prototype that gives recommendations to users and use it to study whether explanations help or mislead.
How the hypothesis was tested: The authors conducted a study with analysts who are engaged in real-time monitoring of cybersecurity incidents based on alerts that are generated from sets of events identified by intrusion detection systems (IDS), systematically categorizing and prioritizing threats.
Each analyst was tested individually in a two-hour session. Sessions began with an introduction to the study and 30 minutes of detailed training on the NIMBLE test console. During the training, participants had an opportunity to ask questions and completed two hands-on examples. They were also explicitly informed that the system was a prototype and might not provide correct recommendations.
The research questions were tested using a balanced parametric design with two experimental variables: 1) Recommendation 2) Recommendation Correctness.
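For intuition, the fully crossed design can be sketched by enumerating every combination of the two factors. This is purely an illustration (not the authors' materials), with Recommendation levels taken from the Baseline, Suggestions, and Justifications conditions described in the study:

```python
from itertools import product

# Two experimental variables, fully crossed (a balanced factorial design).
recommendation = ["Baseline", "Suggestions", "Justifications"]
correctness = ["correct recommendation available", "no correct recommendation"]

# Each participant sees trials drawn from every combination.
conditions = list(product(recommendation, correctness))
print(len(conditions))  # 3 levels x 2 levels = 6 distinct conditions
```

A balanced design means each of these six cells receives an equal number of trials, which is what lets the authors test for main effects and interactions separately.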
Results: The authors found that users performed marginally better when a correct recommendation was present than when no correct recommendation was available. There was no overall statistical effect of recommendation, nor was there an interaction between recommendation and recommendation correctness.
Under conditions when no correct choice was available, there was no difference between the Baseline condition and either Suggestions or Justifications.
Based on the authors' analysis, there was a significant difference between the Baseline and Suggestions conditions and between the Baseline and Justifications conditions. These results suggest that users were indeed being influenced, responding consistently with one of the recommendations. Analysis of the data also revealed a significant interaction between Recommendation and Group and a significant 3-way interaction. When no correct recommendation was present, the High group continued to respond with one of the recommendations, especially in the presence of Justifications, but the other groups did not.

Discussion:
After reading this paper, I am definitely convinced that intelligent systems can be an important aid to users' decision-making. The study sought to examine the benefit of providing justifications as well as the possible negative consequences when explanations supported incorrect recommendations. Their results suggested that users obtained some benefit from suggestions over the baseline condition, but there was no additional benefit from justifications over suggestions alone.
Although the prototype sometimes produced compelling explanations for incorrect recommendations, that flaw does not defeat its overall purpose. This paper would definitely be helpful in the future for developing a system similar to the one described here.


Wednesday, December 14, 2011

Blog #32: Taking advice from intelligent systems: the double-edged sword of explanations

Paper Title: Taking advice from intelligent systems: the double-edged sword of explanations

Authors: Kate Ehrlich, Susanna Kirk, John Patterson, Jamie Rasmussen, Steven Ross and Daniel Gruen

Author Bios:
Kate Ehrlich: is a Senior Technical Staff Member in the Collaborative User Experience group at IBM Research where she uses Social Network Analysis as a research and consulting tool to gain insights into patterns of collaboration in distributed teams.

Susanna Kirk: holds a master's degree in Human Factors in Information Design. Her coursework covered user-centered design, prototyping, user research and advanced statistics.

John Patterson: is a Distinguished Engineer (DE) in the Collaborative User Experience Research Group.

Jamie Rasmussen: joined the Collaborative User Experience group in 2007 and is working with John Patterson as a part of a team that is exploring the notion of Collaborative Reasoning.

Steven Ross: is presently working in the area of Collaborative Reasoning, using semantic technology to help individuals within an organization think together more effectively and to enable them to discover and benefit from existing knowledge within the organization in order to avoid duplication of effort.

Daniel Gruen: is currently working on the Unified Activity Management project.



Presentation Venue: IUI '11 Proceedings of the 16th international conference on Intelligent user interfaces. ACM New York, NY, USA

Summary:
Hypothesis: If the authors can investigate intelligent systems and their justifications, then perhaps the accuracy of these systems will increase and users will not be "led astray" as often as they are now.
How the hypothesis was tested: The authors conducted a study of how users respond to recommendations made by an intelligent system, and of how the correctness of those recommendations affects that response. In this case, the study was conducted on analysts engaged in network monitoring. The authors used software called NIMBLE to help collect data for this study.
Results: The users performed slightly better with a correct recommendation than without one. Results indicated that justifications grant benefits to users when a correct response is available. When there is no correct response available, neither suggestions nor justifications made a difference in performance. Most of the analysts seemed to discard the recommendations anyway.
In a separate analysis of users' reactions, it was found that users typically follow the recommendations given and that recommendations strongly influence users' actions.

Discussion:
Effectiveness: Although this paper has great applications, I am a little skeptical about the accuracy of their system. This technology could be applied either to extremely specific fields or to extremely general ones; that way the recommendations would either be extremely accurate (a narrow domain needs less training data) or so general that it would be okay for people to ignore them sometimes.

Blog #31: Identifying emotional states using keystroke dynamics

Paper Title: Identifying emotional states using keystroke dynamics

Authors: Clayton Epp, Michael Lippold and Regan Mandryk

Author Bios:
Clayton Epp: is currently a software engineer for a private consulting company and holds a master's degree in human-computer interaction from the University of Saskatchewan

Michael Lippold: is currently a masters student at the University of Saskatchewan

Regan Mandryk: is an Assistant Professor in the Interaction Lab in the Department of Computer Science at the University of Saskatchewan

Presentation Venue: CHI '11 Proceedings of the 2011 annual conference on Human factors in computing systems. ACM New York, NY, USA

Summary:
Hypothesis: To investigate the efficacy of keystroke dynamics for determining emotional state, the authors conducted a field study that gathered keystrokes as users performed their daily computer tasks. They propose detecting users' emotional states through their typing rhythms on a common computer keyboard.
How the hypothesis was tested: The authors' methodology consisted of two primary components: data collection process and data processing. The collection process consisted of gathering and labeling users' keystroke data. The data processing consisted of extracting relevant keystroke features to build classifiers. 
The authors chose an experience-sampling methodology because they were interested in emotional data gathered in the real world, rather than induced in a laboratory setting through emotion-elicitation methods; since their results are intended for use in real-world systems, gathering the modeling data from naturally occurring emotion increases ecological validity.
The data collection software was written in C# and used a low-level Windows hook to capture each keystroke as it was entered by the user. This program ran in the background, gathering keystrokes regardless of the application that was currently in focus.
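The classifiers are built from the timing of these captured events. As a rough sketch (in Python rather than the authors' C#, with an invented event format of (key, press_ms, release_ms) tuples standing in for what a keyboard hook would record), two classic keystroke-dynamics features look like this:

```python
# Illustrative sketch of timing features derivable from raw key events;
# the feature set in the actual paper is considerably richer.
def keystroke_features(events):
    # Dwell time: how long each key is held down.
    dwell = [release - press for _, press, release in events]
    # Flight time: gap between releasing one key and pressing the next.
    flight = [events[i + 1][1] - events[i][2]
              for i in range(len(events) - 1)]
    return {
        "mean_dwell_ms": sum(dwell) / len(dwell),
        "mean_flight_ms": sum(flight) / len(flight) if flight else 0.0,
    }

# "h" held 0-90 ms, "i" held 150-230 ms -> dwell times 90 and 80, flight 60.
features = keystroke_features([("h", 0, 90), ("i", 150, 230)])
```

Features like these, computed over windows of real typing, are what the emotion classifiers are trained on.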
Results: The researchers used undersampling on many of the models to balance the data and make the levels of emotion more detectable. They found that two of their "tired" models performed most accurately and most consistently, and that models using undersampling performed better overall.
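Undersampling here means discarding majority-class samples until the classes are balanced, so a classifier is not biased toward the dominant label. A minimal sketch, assuming a simple labeled dataset (the labels and sizes are invented for illustration):

```python
import random

# Random undersampling: trim every class down to the size of the
# smallest class.
def undersample(samples, labels, seed=0):
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    n = min(len(group) for group in by_label.values())
    balanced = []
    for label, group in by_label.items():
        for sample in rng.sample(group, n):  # keep n random samples per class
            balanced.append((sample, label))
    return balanced

# 8 "tired" samples vs 2 "not tired" -> balanced set keeps 2 of each.
data = list(range(10))
labels = ["tired"] * 8 + ["not tired"] * 2
balanced = undersample(data, labels)
```

The cost of this simplicity is throwing away data, which is presumably acceptable when, as here, the goal is detecting the rarer emotional states at all.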

Discussion:
Effectiveness: This paper was okay. The idea of classifying a user's emotion by keystrokes is an interesting one. However, I doubt the accuracy of such a system would be high, because so many factors go into that classification. For example, I can type fast because I am angry, because I am drinking a Monster energy drink, or maybe because I am in a hurry; the same rate of keystrokes could correspond to three different emotions. Looking at this, we can conclude that the system needs more features to accurately determine the user's emotions. The system is effective when looking at the overall picture, but for the same reason it may have limited implementations and applications. Overall, the authors did achieve their goals!

Blog #30: Life "modes" in social media

Paper Title: Life "modes" in social media

Authors: Fatih Ozenc and Shelly Farnham

Author Bios:
Fatih Ozenc: is at Carnegie Mellon University and holds a PhD in Interaction Design

Shelly Farnham: is currently a researcher at Microsoft Research and holds a PhD from the University of Washington

Presentation Venue: CHI '11 Proceedings of the 2011 annual conference on Human factors in computing systems. ACM New York, NY, USA

Summary:
Hypothesis: People organize their social worlds based on life "modes", and social sites have not sufficiently addressed how to help users improve their experiences in this area. Thus, if the authors can manage a user's social media interactions across the different "modes" of their life, the experience and usefulness of those interactions will be maximized.
How the hypothesis was tested: Based on author's framework, prior work, and their review of existing technologies, they focused on four explanatory design themes for their study:
1) Organize by facets
2) Rituals for transitions
3) Focused sharing and consuming
4) Natural representations of facets
16 participants were recruited through an online screening questionnaire that asked about age, gender, Internet usage, identity faceting, sociability and work status. Participants were included if they were between 21 and 55 years old, had intermediate or higher levels of Internet usage, and either worked full or part time or were students. The authors performed in-depth two-hour interviews to explore how people naturally and mentally model different areas of their lives. They also explored how people incorporate communication technologies to support those areas, and how their online experiences of managing social media streams might be improved.
The authors then conducted a study on separate individuals scoring highly on extroversion and having multi-faceted lives so that their feedback would be effective toward creating "division" mechanisms. They incorporated diverse visual representations of people and groups into the study to help assess what are more natural ways to organize and visualize people.
Results: The majority of participants drew their life maps as social meme maps, while a few others focused more on a timeline style. The researchers found that participants chose communication channels based on closeness and different areas of their lives. Specifically, the closer they were to someone, the more they used a mix of multiple communication channels. Additionally, the amount of segmentation that participants wished to maintain between certain facets of their lives varied greatly with age, personality, and cultural differences.

Discussion:
Effectiveness: I really enjoyed the ideas and concepts of this paper. I believe most people will actually enjoy using this sort of division mechanism. However, there would be many people who would consider it a threat to their privacy due to various reasons. I enjoyed the new approach but I am not sure how effective it would be.

Blog #29: Usable gestures for blind people: understanding preference and performance

Paper Title: Usable gestures for blind people: understanding preference and performance

Authors: Shaun Kane, Jacob Wobbrock and Richard Ladner

Author Bios:
Shaun Kane: is currently an Assistant Professor at the University of Maryland and holds a PhD from the University of Washington

Jacob Wobbrock: is currently a Professor at the University of Washington and holds a PhD in Human-Computer Interaction from Carnegie Mellon University

Richard Ladner: is currently a Professor at the University of Washington and holds a PhD in Mathematics from the University of California, Berkeley

Presentation Venue: CHI '11 Proceedings of the 2011 annual conference on Human factors in computing systems. ACM New York, NY, USA

Summary:
Hypothesis: Blind people have different needs and preferences for touch based gestures as compared to sighted people. This paper aims to explore exactly what these preferences may be.
How the hypothesis was tested: In the first study both blind and sighted people were asked to invent a few of their own gestures that might be used to interact and conduct standard tasks on a computing device. Because visual results of commands would not be visible to all participants, the experimenter read a description of the action and result of each command. Each participant invented two gestures for each command and then assessed them based on usability, appropriateness, etc.
The second study was more focused on determining whether blind people simply perform gestures differently or actually prefer to use different gestures. In this study all participants performed the same set of standardized gestures. The experimenter described the gesture and its intended purpose, and the participants tried to replicate it based on his instruction.
Results: In the first study the experimenters found that, on average, a blind person's gesture contains more strokes than a sighted person's. Additionally, blind people were slightly more likely to make use of the edge of the tablet when positioning their gestures, and more likely to use multi-touch gestures.
In the second study, there was no significant difference in ease ratings between blind and sighted people. It was noted that blind people tended to make significantly larger gestures than sighted people, although the aspect ratio appeared consistent between the two groups. Additionally, blind participants took about twice as long to perform the gestures, and their lines were often more "wavy" than those of sighted participants.
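The size and aspect-ratio comparisons above reduce to bounding-box arithmetic over the recorded touch points. A small sketch, assuming each gesture is a list of (x, y) points (an invented format, not the authors' logging schema):

```python
# Sketch: geometric measures of a gesture from its touch points.
def gesture_metrics(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width = max(xs) - min(xs)    # horizontal extent of the bounding box
    height = max(ys) - min(ys)   # vertical extent
    return {
        "width": width,
        "height": height,
        # Aspect ratio stays comparable even when one group draws larger.
        "aspect_ratio": width / height if height else float("inf"),
    }

# A gesture drawn twice as large has the same shape, so the same ratio.
small = gesture_metrics([(0, 0), (10, 0), (10, 5), (0, 5)])
large = gesture_metrics([(0, 0), (20, 0), (20, 10), (0, 10)])
```

This is why aspect ratio is the more robust measure here: it is invariant to the overall scale of the gesture, so blind participants' larger strokes still match sighted participants' shapes.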

Discussion:
Effectiveness: I found this paper very interesting, and the authors did a great job of achieving their goals. They performed their tasks in a very thoughtful and organized manner. I would like to see this technology put into effect!

Blog #28: Experimental Analysis of Touch-Screen Gesture Designs in Mobile Environments

Paper Title: Experimental Analysis of Touch-Screen Gesture Designs in Mobile Environments

Authors: Andrew Bragdon, Eugene Nelson, Yang Li and Ken Hinckley

Author Bios:
Andrew Bragdon: is currently a PhD student at Brown University

Eugene Nelson: is currently a PhD student at Brown University

Yang Li: is a researcher at Google and holds a PhD from the Chinese Academy of Sciences

Ken Hinckley: is a Principal Researcher at Microsoft Research and has a PhD from the University of Virginia

Presentation Venue: CHI '11 Proceedings of the 2011 annual conference on Human factors in computing systems. ACM New York, NY, USA

Summary:
Hypothesis: Bezel and mark-based gestures can offer faster, more accurate performance for mobile touch-screen interaction that is less demanding of user attention.
How the hypothesis was tested: 15 participants performed a series of tasks designed to model varying levels of distraction and measure their interaction with the mobile device. They studied two major motor activities, sitting and walking, and paired them with three levels of distraction, ranging from no distraction at all to attention-saturating distraction. The participants were given a pre-study questionnaire and instruction on how to complete the tasks in addition to a demonstration.
Results: Bezel marks had the lowest mean completion time. There was no significant performance difference between soft-button paths and bezel paths, but there was a noticeable increase in mean completion time from bezel paths to hard-button paths. Bezel marks and soft buttons performed similarly under direct attention, and across the various distraction types bezel marks significantly and consistently outperformed soft buttons.

Discussion:
Effectiveness: The authors of this paper accomplished their goal of understanding how distractions can play a role in how users prefer to interact with their devices. I think they did a good job of covering all of the bases and exploring a wide avenue of possibilities.

Blog #27: Sensing Cognitive Multitasking for a Brain-Based Adaptive User Interface

Paper Title: Sensing Cognitive Multitasking for a Brain-Based Adaptive User Interface

Authors: Erin Solovey, Francine Lalooses, Krysta Chauncey, Douglas Weaver, Margarita Parasi, Matthias Scheutz, Angelo Sassaroli, Sergio Fantini, Paul Schermerhorn, Audrey Girouard and Robert Jacob

Author Bios:
Erin Solovey: is a postdoctoral fellow in the Humans and Automation Lab (HAL) at MIT

Francine Lalooses: is a PhD candidate at Tufts University and has a Bachelor's and Master's degree from Boston University

Krysta Chauncey: is a post doctorate researcher at Tufts University

Douglas Weaver: has a doctorate degree from Tufts University

Margarita Parasi: is working on a Master's degree at Tufts University

Angelo Sassaroli: is a research assistant professor at Tufts University and has a PhD from the University of Electro-Communications

Sergio Fantini: is a professor at Tufts University in the Biomedical Engineering Department

Paul Schermerhorn: is a post doctorate researcher at Tufts University and has studied at Indiana University

Audrey Girouard: is an assistant professor at Queen's University and has a PhD from Tufts University

Robert Jacob: is a professor at Tufts University


Presentation Venue: CHI '11 Proceedings of the 2011 annual conference on Human factors in computing systems. ACM New York, NY, USA

Summary:
Hypothesis: Cognitive multitasking is a common element in daily life, and the researchers' human-robot system can be useful in recognizing these multitasking states and assisting with their execution. If the authors can create a system that detects a user's "to-do list" and allows them to multitask on different things at once, then those tasks will be completed faster, the user will become "understood" by the system, and a new kind of technology will be effectively used.
How the hypothesis was tested: The first experiment was designed to highlight three conditions: delay, dual-task and branching. The participants interacted with a simulation of a robot on Mars, sorting rocks. Based on the pattern and order of rock classification, the researchers measured data related to each of the three conditions listed above.
The second experiment was used to determine whether they could distinguish specific variations of the branching task. Branching was divided into two categories: random branching and predictive branching. The experiment followed the same basic procedure as the first, but with only two experimental conditions.
Results: The preliminary study returned a recognition accuracy of 68%, which the authors found promising. In the second study with the robot and the rocks, any result where the participant scored below 70% was discarded, since that indicated the task had been done incorrectly. In the last study, there was no statistically significant difference found between random and predictive branching. The authors were able to construct a proof-of-concept model because they could differentiate between the three types of tasking and apply machine learning to them.

Discussion:
Effectiveness: I found the authors very bright. The technology they use is quite advanced, and the methodologies for the proof-of-concept system were complex. I don't know how much effect this will have on the HCI field, because it didn't seem to me that there were any new inventions in this paper. Still, the authors definitely achieved their goals.