Chapter 4: Evaluation

One of the core lessons from decades of HCI research and user experience design is that, even with the most experienced designers and meticulous processes, the rich complexity of real people means we do not get things right first time. This chapter looks at the evaluation of potential designs and deployable systems, including the techniques that can be applied and the different purposes evaluation serves. In some cases this is ‘summative’, an acceptance test before deploying or delivering a system to a client. More often the purpose is ‘formative’, finding potential problems in order to make things better. Testing with real users is usually the gold standard, but it is not always possible or sensible. We will discuss options for testing with real users, including more controlled experiments ‘in the lab’ and more realistic evaluation ‘in the wild’. We will also look at tools and techniques for expert evaluation, including heuristics and walkthrough methods, as well as semi-automated evaluation, especially for accessibility. Evaluation does not stop when a system is deployed: logs of real usage can be analysed to tune systems or to prompt major cycles of re-design, and minor variants may be deployed simultaneously and compared using A/B testing.
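
As a small taste of the A/B testing mentioned above, the sketch below (in Python) shows one common way two simultaneously deployed variants might be compared: a two-proportion z-test on conversion counts. The function name and all the figures are illustrative assumptions, not examples from the chapter, and real A/B analyses involve further care (sample-size planning, stopping rules, multiple metrics) discussed later in the book.

    # Minimal sketch: compare two deployed variants with a
    # two-proportion z-test on conversion counts.
    # All numbers are made up for illustration.
    import math

    def two_proportion_z(successes_a, n_a, successes_b, n_b):
        """Return the z statistic for the difference between two proportions."""
        p_a = successes_a / n_a
        p_b = successes_b / n_b
        # Pooled proportion under the null hypothesis of no difference.
        p = (successes_a + successes_b) / (n_a + n_b)
        se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
        return (p_a - p_b) / se

    # Hypothetical data: variant A converts 120 of 2400 users,
    # variant B converts 154 of 2350 users.
    z = two_proportion_z(120, 2400, 154, 2350)
    print(f"z = {z:.2f}")   # |z| > 1.96 suggests a difference at the 5% level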

Contents

Role of evaluation
End-user testing
Lab vs in the wild
Control in the wild
Ecological validity in the lab
Novel technology, lash-ups and Wizard of Oz
Online evaluation
Friends and fun
Measuring and recording — quantitative vs qualitative
The metric is the goal
Quantitative methods
Qualitative methods
Mixed methods — strength through diversity
Evaluation without users
Existing knowledge
Expert evaluation
Automated tools
Long-term evaluation
Post-deployment evaluation
Chapter Keypoints
Additional reading
