Touch-based Game Interaction Design Considerations

A few days ago, Bennett Foddy of QWOP fame asked me what my favorite iPad games were. I unsarcastically replied with Chicanery (a game he made with Auntie Pixelante), before stammering about non-game ‘toys’, ‘art games’ that barely qualify as games, and simple iPhone ports. There are good games on the iPad, yet very few of them are “iPad games”, that is, games that could only exist, or exist best, on the iPad. What defines the iPad is its large, multi-touch screen. The important characteristics are ‘touch’, ‘multi-’, and ‘large’, as each modifier expands the range of interactions considerably.

* From here forward, feel free to replace iPad with the large-screen multi-touch device of your choosing.

Note: I am going to ignore mice in this article. They’re kind of a weird hybrid abstracted-touch technology. Earlier drafts dealt with mice, and it made things unnecessarily complicated and, let’s face it, games that utilize the mouse usually have specific reasons for doing so. That is, people don’t haphazardly design mouse games the way many people haphazardly design touch games.

Let’s start with the basics. Games have traditionally been played with clearly abstracted interfaces utilizing joysticks, buttons, and d-pads. I call these interfaces abstracted because the joystick is not your avatar* and the act of pushing a button is, in most contexts outside of games, not associated with a person making a physical action. In some cases, every input feels the same on a physical, phenomenological level. Taking Mario as an example, running, jumping and shooting a fireball all feel the same: the press of a button. That press also feels the same as firing a gun (or hiding) in Metal Gear Solid, or rotating a piece in Tetris, or accelerating a car in a racing game. Other games adopt control schemes designed around complex multi-button simultaneous input. Fighting games are perhaps the most ubiquitous games with complex control schemes – the percussive button presses mixed with the fluid directional input cannot be found elsewhere. Other examples of expressive multi-button input include the rhythmic double-tap of a button for a double-jump in Metroidvania-type games, the physical sensation of holding Z before tapping jump in Super Mario 64, and experimental games like QWOP and GIRP. Buttons are by no means outmoded; they simply represent a form of input that is more abstracted than touch input.

* Starting a timer until someone makes a meta-game where the avatar is a joystick.

The introduction of the touch screen is dramatic. It is now a familiar technology, having first appeared in a mass-market consumer entertainment device in 2004 with the launch of the Nintendo DS (the Palm obviously pre-dates this, yet I found few examples of Palm applications utilizing the stylus for things other than pointing or writing). I would argue that the most important change that touch interaction brings is phenomenological. By this, I mean that the screen can be touched in a plethora of ways, which enables concepts like ‘direct input’ and creates immediate physical metaphors for interacting with the game world. Using The World Ends With You as an example, the act of casting a healing spell is performed by tapping, the act of slashing an enemy is performed by scribbling the stylus back and forth across the display, and a spell’s area of effect can be literally drawn with a circle. Each of these acts feels different from the others, and feels different from, say, creating ramps by drawing smooth calligraphic movements in Kirby: Canvas Curse, or investigating a dark room with a flashlight in Phoenix Wright by dragging. (Note the dual verbs per example: the game verb and the gesture verb.) These gestures are natural, and their meaning and effect are extended in the game world.

That there is a joy in the tactility of direct input is without question – look at the legions of Angry Birds fans enjoying the simple physicality of slingshotting objects and watching the havoc it creates. That’s all there is to the game. The level design is generally not excellent, but that’s irrelevant; the game succeeds best as a toy, almost like firing a virtual Nerf gun at building blocks. Yet it’s crucial that slingshotting the bird is performed by touching the screen and dragging the finger (and the bird) backwards, then releasing. Angry Birds on consoles, controlled with joysticks and buttons, has been, well, less than a hit, and I don’t believe it’s due to demographic differences or market saturation. The natural quality of touch interactions is also crucial to accessibility: Angry Birds took off with ‘non-gamers’ because it could be picked up and played without a heavy GUI or a tutorial explaining the nuances of dragging birds around.

It’s worth noting that the touch screen means that the user’s access time for interactive elements in the game world conforms to Fitts’s Law. This equation governs the time it takes a user to point to a target, and it is primarily a logarithmic function of the ratio of target distance to target size. The equation does not hockey stick with the addition of new targets; that is, the pointing time does not suddenly shoot up as targets are added (at least not until there are so many targets on screen that the targets shrink, in which case, potential ouch). Compare this to button-based navigation of targets. If the number of targets maps neatly to the number of buttons such that each button acts as an index to a target, targets can be accessed in a small constant time. Think about a large number of targets (letters) paired with sufficient buttons (your keyboard), or a small number of targets (4 weapons) paired with a d-pad (four directions) as in Gears of War. Fast.
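To make the logarithm concrete, here is a minimal sketch using the common Shannon formulation of Fitts’s Law, MT = a + b * log2(D / W + 1). The function name and the constants a and b are placeholders I made up for illustration; real values have to be fitted from measurements for a given device and user.

```python
import math


def fitts_time(distance, width, a=0.1, b=0.15):
    """Estimate seconds needed to point at a target (Shannon form of Fitts's Law).

    distance: distance from the finger to the target centre (any unit)
    width:    size of the target along the movement axis (same unit)
    a, b:     hypothetical constants; in practice they are fitted empirically
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return a + b * math.log2(distance / width + 1)


# Adding targets elsewhere on screen does not change this estimate;
# only the distance/width ratio of the target being touched matters.
near_big = fitts_time(distance=150, width=50)
far_big = fitts_time(distance=300, width=100)   # same ratio, same estimate
near_small = fitts_time(distance=150, width=25)  # shrunken target, slower
```

Note that the estimate only gets worse when the target shrinks or moves away, which is the ‘potential ouch’ case of a crowded screen.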

But when few keys are used to navigate large arrays – consider navigating inventory in a console JRPG – it’s a slog. Forget navigating the list of inventory items in real-time. In short, with key-driven interfaces to lists of objects, each additional object results in additional button-presses and extra time to access a given object. An action-puzzler like the iOS game Cut The Rope is practical only on a touch interface. It is a physics-driven game involving a piece of candy, a goal, obstacles, interactive elements such as whoopie cushions, plus ropes to connect all of these objects. In a typical challenge, the player will cut the rope that is holding the candy and quickly tap other objects to redirect the candy around obstacles into the goal. Touching these objects in time poses little problem, since the amount of time to move the finger across an iPhone screen can be measured in fractions of a second. With half a dozen or more manipulable objects on screen, keying through each one with a button would be time-consuming and imprecise. Other examples of games which benefit from touch-driven input include Lemmings* and World of Goo.
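A rough sketch of why key-driven lists are a slog: if a cursor has to be stepped through a list or grid with a d-pad, the number of presses grows with the number of items. The function names and the grid layout here are my own illustrative assumptions, not taken from any particular game.

```python
def presses_to_reach(start, target, columns=1):
    """Count d-pad presses needed to move a cursor from one item to another.

    Items are laid out row by row in a grid with the given number of
    columns (columns=1 is a plain vertical list). No wrap-around is assumed.
    """
    start_row, start_col = divmod(start, columns)
    target_row, target_col = divmod(target, columns)
    return abs(target_row - start_row) + abs(target_col - start_col)


def average_presses(n_items, columns=1):
    """Average presses to reach a uniformly random item from the first slot."""
    total = sum(presses_to_reach(0, t, columns) for t in range(n_items))
    return total / n_items


short_list = average_presses(10)   # 4.5 presses on average
long_list = average_presses(40)    # 19.5 presses on average
```

Quadrupling the list roughly quadruples the work with keys, whereas a finger simply travels to wherever the item happens to be.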

* Summary: Fitts’s Law governs the time that it takes a user to point to a target. It is a function of distance and target size.
* I acknowledge that Lemmings has had console ports utilizing a d-pad controllable cursor but I challenge you to look me in the eyes and tell me that they’re not a pain in the ass.

Multi-touch displays bring the ability to, yes, track multiple touches. This is not a sea change like the introduction of the touch screen, yet its ubiquity with the success of the iPhone and Android smartphones makes it important to consider. I have isolated two points: an increased capacity for modality and an expanded vocabulary of natural gestures. First, modality. In short: the number of active finger-presses determines the mode of the interface, which changes the meaning of a given gesture. An example would be using one-finger drags to move game pieces and two-finger drags to scroll the visible game world in the viewport. Second, natural gestures, though I’m not yet certain how important this expansion is. Single-touch devices can use the natural gestures of the tap, the press (and hold), and the drag. Multi-touch devices introduce two-finger gestures for expansion, contraction, and rotation. If I’m missing any crucial natural gestures, please let me know.
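Both points can be sketched in a few lines. This is a minimal illustration under my own assumptions: touches are plain (x, y) tuples, the mode names are invented, and no real touch framework’s API is being modelled.

```python
import math


def interface_mode(active_touches):
    """Pick an interface mode from the number of fingers currently down."""
    count = len(active_touches)
    if count == 1:
        return "move_piece"       # one-finger drag moves a game piece
    if count == 2:
        return "scroll_viewport"  # two-finger drag scrolls the game world
    return "ignore"               # zero or 3+ fingers: no mode in this sketch


def pinch_and_rotate(before, after):
    """Derive expansion/contraction and rotation from a two-finger gesture.

    before, after: pairs of (x, y) points for the same two fingers at two
    moments. Returns (scale, rotation_degrees), rotation in [-180, 180).
    """
    (ax0, ay0), (bx0, by0) = before
    (ax1, ay1), (bx1, by1) = after
    start_span = math.hypot(bx0 - ax0, by0 - ay0)
    end_span = math.hypot(bx1 - ax1, by1 - ay1)
    if start_span == 0:
        raise ValueError("fingers must start at different points")
    scale = end_span / start_span  # > 1 is expansion, < 1 is contraction
    start_angle = math.atan2(by0 - ay0, bx0 - ax0)
    end_angle = math.atan2(by1 - ay1, bx1 - ax1)
    rotation = math.degrees(end_angle - start_angle)
    rotation = (rotation + 180) % 360 - 180  # normalise to [-180, 180)
    return scale, rotation
```

For example, `pinch_and_rotate(((0, 0), (10, 0)), ((0, 0), (20, 0)))` describes two fingers spreading apart to twice their starting distance with no twist.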

Despite the brevity of the previous paragraph on multi-touch smartphone screens, I believe that simply enlarging the screen brings a host of new potential interactions. The first is that multi-tasking (using multiple touches to track multiple objects) via bimanual input becomes possible. Though this is technically possible on small multi-touch screens, I assert that bimanual input is inconvenient on a phone-sized device, since two hands used simultaneously can easily occlude the entire screen. I have not seen many games make explicit use of the multi-tasking capability, but I will note that it is possible to get furiously fast times in Cut The Rope by using both hands, which roughly halves the average distance from the nearest hand to a target. The second is that large screens are easy to share, which encourages emergent multiplayer activities. Emergent multiplayer is the event where a single-player game is seamlessly played by multiple players, by dividing up either screen segments or aspects of control. (Note that none of that is encouraged, enforced, or limited by the rules of the game; it is the players who are inventing these interaction patterns*.) Games like Flight Control are great for this, where multiple objects need to be sorted simultaneously. Players can easily agree to sort the objects on a given side of the screen. Creative toy-like applications like a pottery simulation are also good for sharing, as players can take turns shaping the clay before it is fired, and verbally contribute even while not physically interfacing with the game. The same goes for puzzle games, where the level of information input to the game is low in relation to the level of information output from the game – that is, two people can easily share a puzzle game with only one person in control, since most of the playing of the game – solving puzzles – is performed outside of the game, in the players’ brains, so inter-player speech is a perfect interface.

* DEATH OF THE AUTHOR

Explicit multiplayer on a shared large multi-touch device remains a largely unplumbed avenue. This play configuration results in much shared information between players, as well as shared physical space. It is the latter aspect that results in players both touching and being touched (by other players). Sharing a device with another person has a unique immediacy. Bodily presence, mood, posture, and more are all apparent to the other players, contributing to a shared emotional state. Because all the control occurs on the screen itself, players also have near-complete information regarding the current and upcoming input as well as player intent. That is, because hands perform the input, and the shared screen is the locus of input, players can see each other’s input. This should be a factor in multiplayer games: it can be exploited for bluffing via falsified input or for coordination in cooperative tasks. A basic example of the latter occurs in PongVaders, a Pong/Space Invaders/Arkanoid mashup, with dual paddles on opposing sides of the screen. There is a powerup that causes one paddle to shoot projectiles which must be blocked by the other paddle. The blocking player must follow the shooting player’s finger; it’s a smoother experience than if the players were wielding separate controllers. Sharing an input area also results in issues regarding personal/controlled space as well as the potential for physical contact.

It’s now appropriate, if not past due, to talk about what sorts of tasks and games are unlocked with the intersection of all the qualities described above. To start, it’s worth rehashing the research of Stacey Scott surrounding territoriality. The rundown is that when people share a surface, there is a notion of personal spaces – the areas nearest the users – and public space in a central location apart from the users. Users are uncomfortable reaching into another user’s personal space or having their personal space invaded. Public space is a grey area that is difficult to define and navigate. You should already be imagining ways to build games around those conflicts.

Entire games can be built around territoriality. The emergent multiplayer aspect of Flight Control is an excellent example. Planes flying in from all over the screen must be directed to airports. Who directs which planes? A simple answer is to divide the screen into halves, but there will always be edge cases that are awfully close to the center, where either verbal or physical signals will be required to declare intent and territory. And what happens when planes originate in one player’s territory and must be directed to another player’s territory?
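The ‘divide the screen into halves’ agreement, edge cases included, can be sketched as below. This is purely my own illustration of the players’ informal rule, not anything Flight Control implements; the function names and the 20% contested band are made-up assumptions.

```python
def territory_owner(x, screen_width, contested_fraction=0.2):
    """Say whose territory a horizontal screen position falls into.

    A band around the centre line (contested_fraction of the screen width,
    a hypothetical value) is treated as the awkward shared zone where
    players have to signal intent to each other.
    """
    center = screen_width / 2
    half_band = screen_width * contested_fraction / 2
    if x < center - half_band:
        return "left player"
    if x > center + half_band:
        return "right player"
    return "contested"


def needs_handoff(origin_x, airport_x, screen_width):
    """True when a plane starts and must land in different zones."""
    origin = territory_owner(origin_x, screen_width)
    destination = territory_owner(airport_x, screen_width)
    return origin != destination
```

On a 1024-unit-wide screen, a plane at x = 100 clearly belongs to the left player, one at x = 512 is up for grabs, and a plane entering at x = 100 bound for an airport at x = 900 forces exactly the kind of negotiation described above.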

Given a shared input device, ‘above the board’ physical competition is possible. Due to the inability to disambiguate touches between players (that is, the device has no idea to /whom/ each touch belongs), it is easy to create games that are readily ‘broken’ by players blocking input from each other or touching each other’s avatar/game pieces. Of course, there is a risk/reward scenario to exploit here: every finger or hand spent futzing with another player’s (real life) fingers or (in game) resources is one less hand spent defending one’s own (real life) hand or own (in game) resources. The best example of this physicality is Chicanery, a game by Auntie Pixelante, ported to the iPad by Bennett Foddy. In this game, each corner of the device has an image of a pad. Each player must hold a pad. The last person still holding a pad wins. In terms of the game, very little happens on the device/in the game world itself. The device acts as referee and that’s about it – the majority of the action occurs between the bodies of the players themselves as they strike and push each other to force players to let go of the device. The presence of the other people is also important, both for anticipating and dodging (real life) punches and kicks from other players and for discerning limits based on other players’ emotional states. Combined with questions of territoriality, one can imagine a resource-hoarding game where players strive to physically block other players from dragging resources from the shared middle of the screen to their personal stockpiles, or to steal resources from other players’ stockpiles for their own, at the cost of using those fingers to defend one’s own resources or collect ‘fresh’ resources from the middle of the screen.
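To show how little the device has to do as referee in a last-one-holding game, here is a minimal sketch. It is my own guess at the logic, not Chicanery’s actual code; the function name, the pad labels, and the per-frame set representation are all assumptions. Note that it only knows which pads are held, never whose finger is on them.

```python
def referee(frames, pads):
    """Return the winning pad, or None if there is no single winner.

    frames: sequence of sets, each holding the pads touched in that frame
    pads:   the pads in play, assumed to be held when the round starts
    """
    remaining = set(pads)
    for held in frames:
        # A pad that was released is out for good, even if touched again.
        remaining &= set(held)
        if len(remaining) == 1:
            return next(iter(remaining))
        if not remaining:
            return None  # the last holders let go in the same frame
    return None  # round still undecided when the input ran out


round_one = [
    {"NW", "NE", "SW", "SE"},
    {"NW", "NE", "SE"},   # SW is shoved off
    {"NW", "SE"},         # NE lets go
    {"SE"},               # NW lets go, SE is the last one holding
]
winner = referee(round_one, ["NW", "NE", "SW", "SE"])
```

Everything interesting (the shoving, the dodging, the reading of moods) happens outside this function, which is rather the point.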

One last task that holds potential is coordinated input. Let’s consider coordinated input of a single avatar. Using separate controllers, this would be immensely difficult. Halo allows one player to drive a car while another player controls the turret, yet it compensates for the rotation of the car in the aiming of the turret such that the driver never has to warn the gunner that a sharp right is upcoming – words are too slow to communicate frequent micro-adjustments, and the lag across 1) the driver’s input, 2) the driver’s input translated in the game, 3) the gunner’s visual perception of the driver’s input in the game, and 4) the gunner’s compensated input is too great. I believe that by cutting the chain down to 1) the driver making an input, 2) the gunner directly seeing the driver’s input, and 3) the gunner making compensated input, the control scenario should be feasible. In addition to solving some information problems, sharing a physical input device also brings new physical challenges. Imagine a game where two players share the control of a single squid, a sea creature with ten obvious controllable facets (8 arms + 2 tentacles). One player could scrub the arms to make the squid paddle in the water while the other player controlled the tentacles to gather food and fend off dangers. Or both players could control arms. If all goes as planned, the game should, at times, devolve into finger twister.

While this is clearly not a definitive guide to design patterns and considerations when making games for multi-touch phones and tablets, I do hope that it serves as a useful summary and starting point, so that we’ll see more of the new types of games that are possible with these new classes of devices.

SPECIAL THANKS: Jonathan Blow and Cole Krumbholz.

3 Comments

  1. Posted April 18, 2011 at 04:58

    Came here from VHX.… was not expecting to read an in-depth article on touch based interaction design. Gems like these make the internet awesome. Keep it up!

  2. Jackson Williams
    Posted July 22, 2012 at 04:16

    Interesting article… one thing; I’m not familiar with the use of “hockey stick” as a verb.  Does anyone know what the following quote means: “The equation does not hockey stick with the addition of new targets”
