User Experience for Background Processes


Done well, background processes are a fantastic way to save time, and more importantly, save focus. Handing off monotonous chores so you can do something more challenging can be a true relief. Unfortunately, background processing isn’t always implemented well. When it’s done poorly, it can undermine trust in the system.

A lack of trust can have negative effects beyond just making people miserable. It can discourage exploration and discovery. It can force painful and inefficient workarounds. It can also cause data to be duplicated, forgotten, corrupted, or lost entirely.

In the age of autonomous agents, this problem is bound to get worse, so let’s talk about how to do it better.

Quick Version

An easy way to test your background interactions is to imagine that you are the background process and that your significant other has asked you to do the task. Assuming you want to please your significant other, what would be the best service you could provide?

If my wife asked me to buy milk at the store, I would buy the milk, and put it in the fridge. I wouldn’t interrupt her in the middle of something important to tell her the milk is in the fridge. I’d just put it there, and she would expect it to be there later.

If the store was out of milk, however, I’d probably let her know. Even so, I wouldn’t call her at work to tell her about it on the spot. I’d save it for the next convenient opportunity to bring it up.

If there was some problem selecting the milk, like not knowing which kind she wanted, or the store being out of her usual kind, I might call her. I might just make a choice. Or I might skip it and wait until the next time I’m at the store. Which I choose depends on the urgency of the request and the cost of getting it wrong. Buying the wrong kind of milk is a bit wasteful, but if we were talking about ordering plane tickets, buying tickets to the wrong city would be a serious problem.

To bring this back to a software example, take a look at email clients. Email is asynchronous by nature, and a lot of effort has gone into refining the experience over the decades.

Aside: You may not think email applies to your situation, but according to Zawinski’s Law, you’ll be adding email to your application eventually.

If I push send on a new email, it’s nice to get some immediate acknowledgement that the email will be sent. If everything goes normally, this is enough.

If the email system can’t deliver the email, it will usually give some kind of feedback. It usually isn’t an in-your-face warning. A little red dot on the outbox is probably enough. Some indication of what it means if a user hovers or clicks on it helps with discoverability. If the client can eventually send the email, the dot can silently disappear. A very busy person might never notice, and that’s okay.

If something was wrong with the email such that it could never be delivered (e.g., a non-existent email address), something a little harder to ignore makes sense. An in-your-face error message is still not great. Many email services send an email back to you explaining the situation. This works well because you’re (likely) already checking your email, and you can choose when to deal with it.

There are numerous failure modes in email, and if you haven’t worked on email processing yourself, you probably aren’t aware of just how fraught a system it really is. That is a sign of a mature user experience: you haven’t hit many edges because you’ve been subtly steered away from them. When problems do occur, the feedback feels natural and proportionate.

Long Version

I now submit to you: my attempt at a list of suggestions to consider when designing background processes.

1) Be Transparent

Background processes can be awfully opaque, even in some well-established applications. When a user requests some work, it’s important that they understand what they are doing. If they mess something up because the system did something they didn’t anticipate, they will lose trust in the system.

1.1) It should be obvious when starting an asynchronous task

Users should have some indication that the thing they’re requesting will not be instantaneous. If it will be a long time before it gets done, this should be obvious too. There are plenty of good, subtle ways to hint at this, and a subtle hint is enough for most cases.

1.2) The user should know what they’re going to get, and/or it should be easy to undo

If a button in an application is a “heavy” button, meaning that it does a bunch of stuff or can’t be easily reversed, I naturally feel nervous. Messing things up for myself is annoying, but messing them up for my team can be a lot scarier. This is especially true when I’m not familiar with the system I’m using. For any changes that aren’t obvious, it’s helpful to get a breakdown of exactly what will happen.
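One way to provide that breakdown is to compute the plan separately from executing it, so the exact list of changes can be shown before anything runs. Here’s a minimal Python sketch, assuming a hypothetical “archive inactive projects” job; the `Action` type, `plan_archive`, and `describe` are illustrative names, not from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One concrete change the job would make (hypothetical model)."""
    verb: str
    target: str


def plan_archive(projects: dict[str, bool]) -> list[Action]:
    """List the changes an 'archive inactive projects' job would make.

    `projects` maps project name -> whether it is still active.
    Nothing is modified here, so the plan can be shown to the user first.
    """
    plan = []
    for name, active in sorted(projects.items()):
        if not active:
            plan.append(Action("archive", name))
            plan.append(Action("notify members of", name))
    return plan


def describe(plan: list[Action]) -> str:
    """Render the plan as a human-readable confirmation message."""
    if not plan:
        return "Nothing will change."
    return "This will:\n" + "\n".join(f"- {a.verb} {a.target}" for a in plan)


print(describe(plan_archive({"alpha": True, "beta": False})))
```

If the job then executes exactly this list of actions, the preview the user confirmed can’t drift from what actually happens.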

Alternatively, if the operation is quick and easily reversible, that might be good enough. It’s sometimes a lot easier to show the result than to describe it. However, the longer the process takes, the harder it will be for the user to notice what changed. Try-and-undo works best for changes that are easily discoverable.

2) Be Dependable

A big reason to hand work over to a computer is so that you no longer have to keep track of it. This kind of delegation can be very freeing, but it only works if the application delivers reliable results.

2.1) Don’t drop the ball

When a user requests something, make sure you do it, or if you can’t, make sure you tell them that you couldn’t do it. This could take a lot of different forms, and some tuning to get it right, but it’s an essential part of being useful.
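At the code level, “don’t drop the ball” mostly means that every task ends in a recorded outcome, never a silently swallowed exception. A minimal Python sketch, where `run_task` and the `outcomes` dictionary are hypothetical stand-ins for a real worker and durable storage:

```python
from typing import Callable


def run_task(task_id: str, work: Callable[[], None], outcomes: dict[str, str]) -> None:
    """Run one background task and always record how it ended.

    `outcomes` is a stand-in for durable storage (hypothetical): a real
    system would persist it so the result can still be reported to the
    user after a restart.
    """
    try:
        work()
    except Exception as exc:
        # Never swallow the error: record it so the user can be told.
        outcomes[task_id] = f"failed: {exc}"
    else:
        outcomes[task_id] = "succeeded"
```

This only covers errors the task raises. If the worker process itself dies, nothing runs the `except` branch, so a real system would also need to mark tasks as running before starting them and treat ones that stall as failed.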

2.2) Fail quickly when possible

If something is requested that will obviously never work, you should let the user know as soon as possible. Even better if you can tell them before accepting the task. If a user is notified right away, they’ll still have the context in their head to fix it. And if they are delegating a more important task, they’ll know they might need to change their plans. A familiar example is an email client that won’t let you send a message with no recipients.
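The key move is validating synchronously, before the work is enqueued, so obviously doomed requests are rejected while the user is still looking at the form. A minimal Python sketch, assuming a hypothetical `Email` type and an in-process `queue.Queue` standing in for the real outbox; the `@` check is deliberately cheap and doesn’t replace real deliverability checks.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Email:
    subject: str
    body: str
    recipients: list[str] = field(default_factory=list)


class ValidationError(Exception):
    """Raised synchronously, while the user still has the context to fix it."""


def validate(email: Email) -> list[str]:
    problems = []
    if not email.recipients:
        problems.append("Add at least one recipient.")
    # Cheap syntactic check only; actual deliverability is discovered later.
    problems += [f"'{r}' is not an email address." for r in email.recipients if "@" not in r]
    return problems


def submit(email: Email, outbox: "queue.Queue[Email]") -> None:
    """Reject obviously doomed work before it is accepted into the queue."""
    problems = validate(email)
    if problems:
        raise ValidationError(" ".join(problems))
    outbox.put(email)  # only well-formed messages become background work
```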

2.3) Let users check up on long-running tasks

Allowing users to check on the things they requested can also improve trust. This is especially important for tasks that can take a long time, or are of critical importance. Sometimes people want to check up on what’s going on, or check to see if something is broken.

2.3.1) Finding completed tasks

Similarly, being able to find work that has completed successfully can also be helpful. The sent folder in most email clients is a great example of this. If there is ever a question about something being done, looking it up is awfully convenient.
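Both checking up on running work and finding completed work fall out naturally if each submitted task keeps a record with an explicit status. A minimal Python sketch; `TaskLog` and its methods are hypothetical, and the in-memory dictionary stands in for whatever persistent store a real system would use.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class TaskRecord:
    task_id: str
    description: str
    status: Status = Status.QUEUED


class TaskLog:
    """In-memory stand-in for a persistent record of submitted tasks."""

    def __init__(self) -> None:
        self._tasks: dict[str, TaskRecord] = {}

    def submit(self, task_id: str, description: str) -> TaskRecord:
        record = TaskRecord(task_id, description)
        self._tasks[task_id] = record
        return record

    def update(self, task_id: str, status: Status) -> None:
        self._tasks[task_id].status = status

    def status(self, task_id: str) -> Status:
        """Answer 'what happened to the thing I asked for?'"""
        return self._tasks[task_id].status

    def find(self, *statuses: Status) -> list[TaskRecord]:
        """Tasks in the given states, in submission order (an outbox or 'Sent' folder)."""
        return [t for t in self._tasks.values() if t.status in statuses]
```

With this shape, an outbox view is `find(Status.QUEUED, Status.RUNNING)` and a “Sent” folder is `find(Status.SUCCEEDED)`.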

2.3.2) Allow cancelling work that hasn’t completed

If users can see queued or incomplete work, letting them cancel it is also nice. If they realize they forgot something or made a mistake, stopping the work early lets them fix it and do better work.

This assumes that cancelling a task is possible. If it’s not possible, then obviously you shouldn’t make that option available.
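A simple way to honour both points is to derive the availability of the Cancel option from the task’s state. This Python sketch assumes a hypothetical policy where only work that hasn’t started can be cancelled; a real system might also support cancelling running work if it can stop cleanly.

```python
from enum import Enum


class Status(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"


def can_cancel(status: Status) -> bool:
    """Whether to show 'Cancel' at all (assumed policy: only unstarted work)."""
    return status is Status.QUEUED


def cancel(statuses: dict[str, Status], task_id: str) -> bool:
    """Cancel a task if it is still cancellable; report whether it worked."""
    if not can_cancel(statuses[task_id]):
        return False
    statuses[task_id] = Status.CANCELLED
    return True
```

In a system with concurrent workers, the check and the update need to happen atomically (for example, as a single conditional update in the database); otherwise a worker could pick up the task between them.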

3) Handle Issues with Civility

Error handling is always tricky in software design. It’s especially tricky with processes that don’t have constant attention from the user. Getting this right is another essential part of maintaining trust.

3.1) The intrusiveness of a notification should be proportional to its urgency

If a user has had enough time to move on to another task, you should only interrupt them for feedback if it’s likely they’ll want to drop what they’re doing to deal with it. Email is an example where it can probably wait. If someone is hailing a ride, and something has changed the driver’s arrival time significantly, you should probably let them know urgently, and give them some choices about what happens next (such as cancelling or rescheduling).
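One way to keep this consistent across an application is a single function that maps urgency to the least intrusive channel that still fits. A rough Python sketch; the `Urgency` levels and the channel names (`"inline"`, `"badge"`, `"inbox"`, `"interrupt"`) are hypothetical and would map onto whatever UI surfaces your application actually has.

```python
from enum import Enum


class Urgency(Enum):
    LOW = 1       # e.g. an email that will be retried automatically
    HIGH = 2      # e.g. a permanent failure the user should fix eventually
    CRITICAL = 3  # e.g. a ride whose driver is now much later than promised


def channel_for(urgency: Urgency, user_still_on_task: bool) -> str:
    """Pick the least intrusive notification channel that fits the urgency."""
    if urgency is Urgency.CRITICAL:
        return "interrupt"  # push notification or dialog offering choices
    if user_still_on_task:
        return "inline"  # they're looking at it anyway, so show it in place
    if urgency is Urgency.HIGH:
        return "inbox"  # persistent, but handled when they choose
    return "badge"  # a little red dot
```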

3.2) Make the current state of things clear

When a user is notified of a problem, they’re going to have to decide what to do about it. If work was half-done, things were partially sent, or something needs to be manually cleaned up, they should know this. Expecting them to already know, or to hunt around and figure it out themselves, isn’t very generous. If some kind of retry or automatic correction is going to happen asynchronously, it’s essential that this be communicated as well.

Making atomic changes is nice when it’s possible, because not doing anything is usually better than doing something half-way. Of course, it’s not always feasible. If you do have this behaviour, make sure your users know about it. It will give them more confidence with the system.
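When the data lives in a transactional database, all-or-nothing behaviour is often cheap to get. This sketch uses Python’s built-in `sqlite3` module, where using the connection as a context manager commits on success and rolls back if the block raises; the `docs` table and the bulk-rename operation are hypothetical examples.

```python
import sqlite3


def rename_all(conn: sqlite3.Connection, renames: dict[str, str]) -> None:
    """Apply every rename or none of them.

    `with conn:` commits when the block finishes normally and rolls back
    if anything inside it raises, so a failure part-way through leaves
    no half-renamed documents behind.
    """
    with conn:
        for old, new in renames.items():
            conn.execute("UPDATE docs SET title = ? WHERE title = ?", (new, old))
```

If `docs.title` has a UNIQUE constraint and the second of two renames collides with an existing title, `sqlite3` raises `IntegrityError` and the first rename is rolled back too, so the user can be told plainly that nothing changed.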

Alternatively, it can be helpful to give the user an easy way to find the affected data or records. If they have to do manual cleanup, this will at least help them find it all.

3.3) Let the user manage failure notifications

Providing asynchronous failure notifications with some bit of visual feedback can be a great way to go for low-urgency issues. When you do, however, make sure that there is a way for the user to clear the notification. If you want a user to respond when they see the red dot, make sure you take it away when there is nothing to do. Otherwise, if they get accustomed to seeing the dot when there is nothing to fix, they may start to ignore it when there really is something to do.
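One way to keep the dot honest is to derive it from the set of unresolved failures rather than storing it as a separate flag that can go stale. A minimal Python sketch; `FailureInbox`, `report`, `resolve`, and `badge` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FailureInbox:
    """Failures the user still needs to act on (hypothetical model)."""

    open_items: dict[str, str] = field(default_factory=dict)

    def report(self, task_id: str, message: str) -> None:
        self.open_items[task_id] = message

    def resolve(self, task_id: str) -> None:
        """Call when the user dismisses the item or a later retry succeeds."""
        self.open_items.pop(task_id, None)

    @property
    def badge(self) -> Optional[int]:
        """Number for the red dot; None means draw no dot at all."""
        return len(self.open_items) or None
```

Because the badge is computed, clearing the last open item makes the dot disappear without any extra bookkeeping.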

Similarly, for tasks that are important but not necessarily urgent, users who have been notified of a failure may want to keep it in a list somewhere so they are reminded to deal with it later.

Email is a great example of this done well. When a message can’t be delivered, putting the bounce notice in the user’s inbox lets them keep it as long as they want. They can delete it if they respond to it right away, forward it to someone if they need to ask for help, or just leave it there forever.

4) Be Considerate of Multi-user Impacts

It’s always important to consider the impacts of actions in multi-user systems, but this is especially true with background processes.

If it’s normal for tasks to run for a very long time (hours, days, etc.), you might need a system that multiple users can use to monitor the work and respond to any failures. You don’t want urgent issues to get ignored when someone has taken a lunch break or a vacation. If immediate action isn’t necessary, you could also have a staged notification: only notify the broader team if the person who started the job hasn’t dealt with it within some amount of time. This can get pretty complicated though, so use it carefully, and test it a lot.
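The staged notification can be expressed as a pure function of the failure time, the current time, and whether anyone has acknowledged it, which also makes it easy to test. A minimal Python sketch; the one-hour `grace` window and the parameter names are assumptions for illustration.

```python
from datetime import datetime, timedelta


def recipients_for(
    failed_at: datetime,
    now: datetime,
    owner: str,
    team: list[str],
    acknowledged: bool,
    grace: timedelta = timedelta(hours=1),
) -> list[str]:
    """Who should be notified about an unresolved failure right now.

    The owner hears first; the wider team is added only if nobody has
    acknowledged the failure within `grace` (an assumed, tunable window).
    """
    if acknowledged:
        return []  # someone is on it; don't notify anyone further
    if now - failed_at < grace:
        return [owner]
    return [owner] + [member for member in team if member != owner]
```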

If you do allow multiple users to respond to a failure, you might also need some way for them to coordinate who is responding to the issue.
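A common building block for that coordination is an atomic “claim”: a conditional update that succeeds for only one person. A minimal sketch using Python’s built-in `sqlite3`; the `incidents` table and its `responder` column are hypothetical.

```python
import sqlite3


def claim(conn: sqlite3.Connection, incident_id: int, user: str) -> bool:
    """Atomically take ownership of an incident; False if someone else has it.

    The `responder IS NULL` condition makes the check and the write a
    single statement, so two people clicking "I'll handle it" at the
    same moment can't both win.
    """
    with conn:
        cur = conn.execute(
            "UPDATE incidents SET responder = ? WHERE id = ? AND responder IS NULL",
            (user, incident_id),
        )
    return cur.rowcount == 1
```

The losing user can then be shown who already claimed the issue instead of silently duplicating the work.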

When tasks are very expensive to run, or there are scarce resources to run them, you may also need some sort of prioritization or rate limiting system. However, these kinds of mechanisms can cause unpleasant side effects when systems are run beyond their capacity, so consider them carefully before adding them in.
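If you do need rate limiting, a token bucket is one widely used approach: it allows short bursts up to a fixed capacity and then limits throughput to a steady refill rate. A minimal Python sketch; the class and parameter names are illustrative, and the injectable `clock` exists only to make the behaviour testable.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(
        self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        now = self._clock()
        # Refill for the time elapsed since the last call, up to capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

When `try_acquire` returns False, the task still has to go somewhere (back in the queue, or reported as delayed), and that is where the side effects of running beyond capacity tend to show up, so it is worth deciding deliberately what the user sees in that case.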

5) Be Autonomous, with Guard Rails

A big benefit of running work asynchronously is that it frees up the user to do other things. The more you require interaction to get the task done, the less valuable this becomes. That being said, doing something undesirable in failure scenarios is often worse than doing nothing. So it’s important to only implement autonomous error handling when the desired outcome is certain.

5.1) Make the obvious correction when possible

Sometimes the way to resolve an issue is obvious. An easy example is when someone is trying to delete something that no longer exists. Maybe you treat that as a success, or a success with a warning. The user requested that the thing no longer exist, and now it doesn’t.
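In code, this is the familiar idea of an idempotent delete: “already gone” is reported as its own kind of success rather than as an error. A minimal Python sketch over a plain dictionary; `DeleteResult` and `delete_item` are hypothetical names.

```python
from enum import Enum

_MISSING = object()  # sentinel, so a stored value of None still counts as present


class DeleteResult(Enum):
    DELETED = "deleted"
    ALREADY_GONE = "already gone"  # still a success; the caller may show a soft warning


def delete_item(store: dict, key: str) -> DeleteResult:
    """Delete `key`, treating 'it no longer exists' as the desired outcome."""
    if store.pop(key, _MISSING) is _MISSING:
        return DeleteResult.ALREADY_GONE
    return DeleteResult.DELETED
```

Keeping the two outcomes distinct means a caller where “already gone” matters can still treat it differently.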

This assumes, of course, that the item having already been deleted by someone else has no other implications. In an inventory management system, for example, it might signal a more serious problem.

5.2) Allow the user to override bad judgements

If you do make corrections on behalf of the user that could be questionable, make sure there is a way for them to find those corrections and fix them if they’re wrong. Replication conflicts are a common example of this. It’s great when you can auto-merge, but sometimes the merge is wrong, and if there is no way for the user to find the conflicting changes, it may be hard for them to correct it.
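The important part is that an automatic merge keeps the value it overwrote, so the user can find it and reverse the choice. This Python sketch is deliberately simplified: it is a two-way merge with no common ancestor, so any differing value counts as a conflict, and deletions aren’t modelled. Real replication systems typically use a common ancestor or version vectors. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Conflict:
    key: str
    kept: str
    discarded: str


@dataclass
class MergeResult:
    merged: dict[str, str]
    conflicts: list[Conflict] = field(default_factory=list)


def merge(local: dict[str, str], remote: dict[str, str], remote_wins: bool = True) -> MergeResult:
    """Auto-merge two replicas, recording every value that was overwritten.

    Keys present on only one side, or equal on both, merge cleanly. When
    both sides hold different values, one is picked automatically and the
    other is saved as a Conflict the user can review later.
    """
    result = MergeResult(merged=dict(local))
    for key, theirs in remote.items():
        ours = local.get(key)
        if ours is None or ours == theirs:
            result.merged[key] = theirs
            continue
        kept, discarded = (theirs, ours) if remote_wins else (ours, theirs)
        result.merged[key] = kept
        result.conflicts.append(Conflict(key, kept, discarded))
    return result
```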

5.3) Guard against infinite loops

Automatic retry is a great solution to some kinds of problems, but you should always make sure there is some finite limit to how much you’ll retry. Just as queue-based systems need dead-letter queues, background tasks need some way to say that a task is unlikely ever to complete unless the user intervenes.
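A bounded retry loop with a dead-letter destination might look like the following Python sketch. The `dead_letters` list stands in for a real dead-letter queue, the default of five attempts is an arbitrary assumption, and a real worker would also wait between attempts.

```python
from typing import Callable


def process_with_retries(
    task_id: str,
    work: Callable[[], None],
    dead_letters: list[tuple[str, str]],
    max_attempts: int = 5,
) -> bool:
    """Try `work` at most `max_attempts` times, then park it for a human.

    Returns True on success. After the final failure, the task and its
    last error go to `dead_letters` instead of being retried forever.
    """
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            work()
            return True
        except Exception as exc:
            last_error = f"attempt {attempt}: {exc}"
    dead_letters.append((task_id, last_error))
    return False
```

Whatever lands in `dead_letters` is exactly what the user-facing failure notifications should be built from.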

Also make sure that you don’t retry immediately over and over again. If a downstream system is having a temporary hiccup, immediate retries just add stress to it when it may already be overwhelmed. Incremental back-off and circuit-breaker mechanisms can make components drastically better citizens of their systems.
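Here’s a sketch of both mechanisms in Python. For back-off it uses exponential growth with random “full jitter”, one common variant of increasing back-off; the circuit breaker is a minimal version that rejects calls for a cooldown period after repeated failures. The default thresholds, delays, and names are assumptions for illustration, and the injectable `clock` is only there to make the breaker testable.

```python
import random
import time
from typing import Callable, Optional


def backoff_delay(
    attempt: int, base: float = 0.5, cap: float = 60.0, rng: Optional[random.Random] = None
) -> float:
    """Seconds to wait before retry `attempt` (0-based).

    Exponential back-off with full jitter: a random wait between 0 and
    min(cap, base * 2**attempt), so many clients don't retry in lockstep.
    """
    ceiling = min(cap, base * 2 ** attempt)
    return (rng or random).uniform(0, ceiling)


class CircuitBreaker:
    """Stop calling a struggling dependency for a while after repeated failures.

    After `threshold` consecutive failures the breaker opens and `allow()`
    returns False for `cooldown` seconds. After the cooldown, calls are
    allowed again as a trial: one more failure reopens it, a success closes it.
    """

    def __init__(
        self, threshold: int = 5, cooldown: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self.cooldown

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.threshold:
            self._opened_at = self._clock()  # (re)open and restart the cooldown
```

A worker would check `allow()` before each attempt, record the outcome, and sleep for `backoff_delay(attempt)` between failed attempts; while the breaker is open, the task can be shown to the user as delayed rather than failing outright.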

Conclusion

Background processes may be mostly invisible, but they become quite obvious when they behave poorly. Design your background interactions mindfully, because delighting users is what we should all be striving to do.

See also