Previously, Charline Poirier provided an excellent post about how to recruit representative participants for usability testing. To continue the story, we are going to talk about the next stage: developing effective task sets, which is a crucial part of a test protocol.
We conduct usability testing iteratively and throughout the product life cycle. The testing interface could range from being as simple as paper images, to clickable prototypes, to a fully working system.
In order to assess the usability of an interface, we ask users to carry out a number of tasks using the interface. We use tasks that resemble those that users would perform in a real life context, so that the data we collect is accurate. In other words, the user behaviour we observed is representative, and the problems we found are those that users would be likely to encounter.
Design testing tasks – ‘a piece of cake’?
When I first learnt about usability testing, I thought: ‘It’s simple: you just need to write some tasks and ask people to solve them, and done!’ But after conducting my first ever usability testing, I realised this was not the case. I had so many questions: I wasn’t sure where to start or what tasks should be used, and there were numerous details that needed to be thought through. You need to carefully craft the tasks.
Now, having conducted hundreds of usability testings, I would like to share my experience with you about how to design effective tasks. There are three main stages involved:
Stage 1: Decide on the tasks
Before you sit down to compose a set of tasks, you are likely to go through the following stages:
‘Walkthrough’ with the design team: If testing an early prototype that has not been fully implemented, it’s important to go through the prototype with the designers so that you are aware of how it works, what is working and what is broken.
Inspection : go through the test interface at least three times. The first time to get an idea of the general flow and interaction of the interface; the second time to ‘put on the user’s hat’, and examine the interface by thinking about what users would do, and pay attention to any possible difficulties they may experience. This is the stage where you could start to write down some of the potential tasks you could use, which cover the features you need to assess, and the predicted problematic areas; and the third time, you should focus on developing tasks when you are going through the interface again. This gives you the opportunity to evaluate the tasks you identified, and add or remove tasks. By the end, you will have a number of potential task banks to work on.
Dumas and Fox (2008, p1131) provide a very good summary of the kind of tasks that are likely to be involved in usability testing. It is in line with those that we used in our testing sessions in most contexts. These include:
tasks that are important, such as frequently performed tasks or tasks that relate to important functions;
tasks where evaluators predict users will have difficulties;
tasks that enable a more thorough examination of the system, such as those that can only be accomplished by navigating to the bottom of the system hierarchy, or tasks that have multi-links or shortcuts;
tasks that influence business goals;
tasks that examine the re-designed areas;
tasks that relate to newly-added features.
For this step, you don’t need to worry about how to phrase the task descriptions, but make sure all areas that you need to investigate are covered by your tasks.
Stage 2: Formulate the tasks
How well the tasks are formulated determines the reliability and the validity of the usability testing and the usefulness of the data. It’s crucial to get this right. You should consider:
- The formats of tasks to be used
- The articulation of the tasks
The formats of tasks
The tasks could be categorised into two main formats:
You need to decide what should be used, and when.
Scenario task or Direct task
A scenario task is presented as a mini user story: often it has the character, the context and the necessary details for achieving the goal. For example, to test the browser and bottom menu on the phone:
You are holding a dinner party this Saturday. You want to find a chicken curry recipe from the BBC food site.
A direct task is purely instructional. For instance, to use the above example:
Find a chicken curry recipe from the BBC food site.
Among these two types, we often use the scenario tasks in the testing. This is because it emulates real-world context that participants can easily relate to, and consequently they are more likely to behave in a natural way. This helps to mitigate the artificiality of user testing to a great extent. The closer they are related to the reality, the more reliable the test results can be (eg. Rubin, 1994; Dumas and Fox, 2008). In addition, some research (eg. Shi, 2010) shows that the scenario tasks work more effectively with Asian participants.
Interesting research: for Indian participants, Apala Lahiri Chavan’s research (Schaffer, 2002) shows that using a ‘Bollywood’ style task would elicit more useful feedback. For example:
Your innocent and young sister is going to get married this Saturday, and you just get a news the prospective groom is already married! So you want to book a flight ticket as soon as possible to find your sister and save her.
The researchers found that Indian participants feel reluctant to make criticisms to the unfamiliar facilitator, but once they phrased the task in a film-like story, the participants became more talkative and open.
Closed task or Open-ended task
A closed task is specific to what the participants need to do. This type of task has one correct answer, and therefore allows us to measure if participants solved or failed a task. It is the most commonly used format. For example, to test the telephony on the phone:
You want to text your landlord to say you will give her the rent tomorrow. Her number is: 7921233290.
An open-ended task contains minimum information and less specific direction as to what you want a participant to do. It gives users more freedom to explore the system. This is particularly useful if you want to find out about what areas users would spontaneously interact with, or which ones matter most to them.
For example, in our Ubuntu.com testing, designers wanted to understand what information was important for users to get to know about Ubuntu. In this case, an open-ended task would be appropriate. I used the task:
You heard your friends mention something called ‘Ubuntu’. You are interested in it and want to find out more about what Ubuntu is and what it can offer you?
There are three main limitations of using open-ended tasks:
Since participants have control over the task, features that require user feedback might be missed; or vise versa, they may spend too much time on something that is not the focus of the testing. The remedy would be to prepare for a number of closed-tasks, so if certain features are not covered by the participants, these could be used.
Some participants may experience uncertainty as to where to look and when they have accomplished the task. Others may be more interested in getting the test done, and therefore do not put in as much effort as what they would in reality.
You cannot assign task success rates to open-ended tasks, as there is no correct answer, so it is not suitable if a performance comparison is needed.
The articulation of the tasks
Avoid task cues that would lead users to the answers. Make sure the tasks do not contain task solving related actions or terms that are used on the system. For example, in the Juju testing we wanted to know if participants understood the ‘browse’ link for browsing all the charms. We asked participants to find out the types of charms that are available instead of saying ‘you want to browse the charms’.
Be realistic and avoid ambiguity. The tasks should be those that would be carried out in the real context, and the descriptions should be unambiguous.
Ensure an appropriate level of details. It should contain just enough information so that participants understand what they are supposed to do, but not too much that they are restricted from exploring naturally in their own way. The description of context should not be too lengthy, otherwise participants may lose their focus or forget about it. When closed tasks are used, make sure they are specific enough, so it is clear to the participants as to when they would accomplish their goals. For example, compare the description of ‘You want to show your friends a picture’ to ‘You want to show your friends a picture of a cow’ – which one is better? For the former, the goal is more vague and participants would be likely to click on the first image or a random picture, and assume the task is done. As a result, we might miss usability problems. For the latter, the task communicates the requirements more effectively: it would be accomplished once they found the picture of a cow. Furthermore, it also provides us with more opportunities to assess the navigation and interaction further, as participants need to navigate among the pictures to find the relevant one.
Stage 3: Be tactful in presenting the order of the tasks
In general, the tasks are designed to be independent from each other for two reasons: to grant flexibility in terms of changing the orders of the tasks for different participants; and to allow participants to continue to the next task, even if they failed the previous one.
However, in some contexts, we use dependent tasks (proceeding on to one task depends on whether or not participants solved another task successfully ) on purpose, for instance:
When there is a logistic flow involved and the stages of procedures must be followed. To use a very simple example, in order to test account ‘log in’ and ‘out’, we need a task for ‘log in’ first, and then a task for ‘log out’.
When testing ‘revisiting’/’back’ navigation (eg. if participants could navigate back to a specific location they visited before) and multitasking concepts (eg. if participants know to use the multitasking facility). For example, when testing the tablet, I had the tasks as follows:
You want to write down a shopping list for all the ingredients you need for this recipe using an app
Here, the participants will need to find the note app and enter ingredients.
Then I had several tasks that were not related to the task above, for example:
You remember that you will have an important meeting with John this coming Thursday at 10:00 in your office. You want to put it on your calendar before you forget.
Then I instructed participants:
You want to continue with your shopping list by adding kitchen roll on it.
This requests the participants to go back to the note app that they opened earlier, from which we could find out if they knew to use the right edge swipe to get to the running apps – in other words, whether or not they understood the multitasking feature.
Now you will have your first version of tasks, and on completion, you should always try the tasks out by using the interface to check that they all make sense.
We use tasks to discover the usability and user experience of an interface. The task quality determines how useful and accurate your testing results would be. It requires time to hone your skills in writing tasks. Let me sum up some of the main points:
Define the goal(s) of the testing;
Familiarise yourself with the test interface and go through this interface at least 3 times;
Use the appropriate task formats and avoid any inclusion of task-solving cues;
Ensure the description is realistic, is at the right level of detail, and avoids ambiguity;
Consider the ordering of the tasks, and whether or not you need to use dependent tasks;
Pilot the task set with yourself.
What happens next, after you have the list of tasks ready for the the usability testing? It doesn’t end here.
If time allows, we always pilot the tasks with someone to make sure they are understandable, and that the orders of the tasks work. There are always changes you could make to improve the task sets.
In addition, you will realise that once you are in the actual testing, no matter how perfect the task sets are, you will need to react instantly and make adjustments in response to the dynamics of the testing environment: we cannot predict what participants will do. It is therefore important to know how to manipulate the task sets in the real testing condition. We will discuss this in the next post.
Dumas, J.S. & Loring, B.A. (2008). Moderating Usability Tests: Principles and Practices for Interacting. San Francisco, CA: Morgan Kaufmann.
Rubin, J. (1994). Handbook of Usability Testing: How to Plan, Design and Conduct Effective Tests. New York: John Wiley & Sons.
Schaffer, E. (2002) Bollywood technique, http://www.humanfactors.com/downloads/jun02.asp#bollywood
Shi, Q. (2010). An Empirical Study of Thinking Aloud Usability Testing from a Cultural Perspective. PhD thesis. Denmark: University of Copenhagen.