One of the core lessons from decades of HCI research and user experience design is that, even with the most experienced designers and meticulous processes, the rich complexity of real people means we do not get things right first time. This chapter looks at the evaluation of potential designs and deployable systems, including the techniques that can be applied and the different purposes for which evaluation is used. In some cases the purpose is ‘summative’, an acceptance test before deploying or delivering a system to a client. More often it is ‘formative’, finding potential problems in order to make things better. Testing with real users is usually the gold standard, but it is not always possible or sensible. We will discuss options for testing with real users, including more controlled experiments ‘in the lab’ and more realistic evaluation ‘in the wild’. We will also look at tools and techniques for expert evaluation, including heuristics and walkthrough methods, as well as semi-automated evaluation, especially for accessibility. Evaluation does not stop when a system is deployed. Logs of real usage can be analysed to tune systems or prompt major cycles of re-design, and minor variants may be deployed simultaneously and compared using A/B testing.
Contents
- Role of evaluation
- End-user testing
- Lab vs in the wild
- Control in the wild
- Ecological validity in the lab
- Novel technology, lash-ups and Wizard of Oz
- Online evaluation
- Friends and fun
- Measuring and recording — quantitative vs qualitative
- The metric is the goal
- Quantitative methods
- Qualitative methods
- Mixed methods — strength through diversity
- Evaluation without users
- Existing knowledge
- Expert evaluation
- Automated tools
- Long-term evaluation
- Post-deployment evaluation
- Chapter Keypoints
- Additional reading
Glossary items referenced in this chapter
actor-network theory, agile software development, AI-based systems, AI-based tools, alt text, attention, human, augmented reality, Awen Institute, coding in inductive methods, cognitive explanations, colour blindness tools, confabulation, constructive learning theory, control, experimental, convenience sample, cussed participants, dialectic (re)coding, direct observation, eating your own dogfood, end-to-end measures, episodic interactions, exploratory evaluation, formative evaluation, GPS, grounded theory, halo-effect, iterative development, lash-up technology, long-term benefits, long-term tests, mechanism, observable behaviour, online evaluation, online participant platforms, Perceptual Experience Laboratory, physiological measures, Post-it notes, post-task interview, post-task reflection, post-test questionnaire, prototype, questionnaires, reaction times, Sea Hero Quest, simulated user, summative evaluation, surveys, think aloud, triangulation, usability lab, user satisfaction, user testing, validation, video annotations, virtual reality, VR cave, W3C accessibility guidelines, within-subjects, Xerox Star