CHI 2001, Day Two

Previously posted at http://joel.westside.com/default.view on April 4, 2001.

Overall, I'm enjoying the conference and think it's worth it. I keep bumping into the same people, some of whom I met at a dinner Monday for people on the CHI-WEB mailing list.

Assorted items

This may be the first computer-related event I've ever been to that had gender parity.

The IBM display didn't seem quite so amazing today. Hmm.

Tried an eye-tracking system - camera below the monitor watches the eye to figure out where the pupil is pointing. It had some trouble with reflections from my glasses, and accuracy was iffy (within 0.5 inches at best; during text scanning it was a whole line off for much of the text). However, the reading pattern was extremely apparent in the plot of the X-axis movement: an obvious, repetitive stairstep as I scanned a word at a time, then ratcheted back to the next sentence. This pattern was apparent for two other people. The booth lady said that many stroke victims completely lack this pattern - their eyes simply don't know what to do. Also, kids with reading problems have obvious regression patterns - you can see where they go back and reread a word one or more times. There are now systems that can detect this and read the word out loud to the child. Like a parent .... *shudder* They've sold 250 to handicapped people and 50 to labs. Accuracy goes up when you can't move your head. I tried to use the eye-tracking keyboard and found it frustrating but doable - with training, practice, and quadriplegia, I probably could get fairly good. $18,000.

First event: Ethics in HCI

Good panel discussion.

Existing ethics resources: ACM guidelines, American Psychological Society; medical guidelines. www.dialogdesign.dk/chi2001.htm has links.

Five situations were posed to the audience, with panel members arguing the two sides:

1) Studies show that, in order to develop, many types of communities need boundaries and exclusivity. Should you (the usability engineer) push for this functionality while building an online community system?
60% of the audience said no; 40% said yes. Audience comments: ethics are almost always situational, not absolute. In South Africa, there would be serious legal issues with an exclusion system.

2) Is it ok to overstate the severity of usability problems in order to get them fixed?
95% voted no.

3) Is it ok to condemn a site in a press interview based on opinion, not tested data?
The vote went 50/50. Comments: immaturity and unprofessionalism are not necessarily unethical. Argument: unprofessionalism is unethical. Professional opinion carries weight, and disclaimers are likely to be dropped by the press.

4) You are usability testing a medical product which will be used by nurses. The nurse supervisors insist on 'taking the test' themselves first, and then observing while their nurses participate in order to 'make sure they do it right.' Should you proceed or not?
Yes argument: 1) the situation is already bad, so your test can't be responsible for making it worse. 2) you can manage the testing to compensate
No argument: 1) you have an overriding obligation to the participants (to not set them up to be abused) 2) the nature of the evaluation is clearly being misunderstood, so the situation is ambiguous 3) you will get bad data
95% voted No

5) The day before shipping, usability testing uncovers a potentially very serious (non-usability) defect. There are no showstopper usability problems. The next day, there is a go/no-go decision where all managers must agree to ship the product. If the product ships on time, all managers get a substantial cash bonus. Should you (the usability manager) agree to ship the product?
Yes argument: 1) the defect isn't a usability problem 2) it's better to ship anyway; why delay a product that many customers are waiting for due to a defect that won't affect most of them
No argument: no good notes
90% voted no

Closing comments

An alternative view to "adding shareholder value is the sole goal of a corporation" is that contribution to the public good is as or more important.
Ethical context is not constant over time - what was ethical at one point in time may not be later because standards change
Usability professionals (and presumably everyone) face ethical issues every day. For example, whenever writing a report on usability testing, you must decide what to include and exclude, and what to emphasize. This will reflect personal biases.

Legal comment: in a court of law, on a scale of one to one hundred, consent forms are worth about 15 points.

Second event: Panel on Measuring Information Architecture

Five participants, all made positive contributions to the discussion. The presentations were all thought-provoking; everyone had a different opinion on what the panel was going to be about. I don't remember thinking that the discussion was a waste of time, but I don't have any real notes from it. Some time was wasted on 'what is information architecture?'

Jesse James Garrett, Adaptive Path

This presentation was off the mark: he was claiming to argue that the quality of Information Architecture isn't quantifiable, but was actually arguing that current machine measures of IA are unacceptably bad. And even that point is debatable in my opinion. He tried to make the point by creating a parody of website guidelines: what would a typical website guideline look like when applied to a textbook? (Paraphrasing from memory:) between 12 and 19 chapters, each between x and y pages, with a certain ratio of pictures to text, etc. His point was that this is ridiculous, but I actually disagree - I don't think that that kind of restriction for a textbook is silly at all. Or rather, it's a good starting point for anyone who doesn't know any better. Once you understand both the subject domain and the way your readers will use your textbook (comprehensive class, pick and choose, reference; undergrad, advanced researchers, practitioners) then you can break the rules. He also argued that the reason to quantify IA is to justify the work at all. I think this is a terrible point; you should quantify IA in order to find out if your current IA works or not, and to find out how to improve it, not to justify your profession.

Mr Garrett clarifies: "I didn't argue this point. A questioner from the audience suggested it; I agreed that justifying the value of doing information architecture was important, but I felt quantification was not the best way of making this case."

The only points in his presentation that I liked were "The user's key question is, will clicking this link get me closer to what I want?" and "Language, more than anything else, is the user's primary cue." I'm not sure I agree 100% with the second point, as location, format, page shape, and other factors can all overwhelm language, but it's worth arguing.
Fortunately, he was better in the panel discussion than in his misguided presentation.

Marti Hearst, Assistant Professor at School of Info Mgmt and Systems, Berkeley

"I believe most aspects of IA quality can be quantified, and that new tools are needed ...." She had a useful chart:

Complexity of Content \ Complexity of Application | Low | High
High | Catalog (IA most useful here) | Information system
Low | Brochure | Service Site

(Mecca et al., ACM WebDB'99, http://www-rocq.inria.fr/~cluet/WEBDB/procwebdb99.html)

She identified a number of components that have been labeled Information Architecture

  • Information Design: Categories and labels
  • Navigation: The paths through data
  • Graphic Design: The visual presentation of everything else
  • Interaction Architecture: The system (rules, metaphor) by which users interact with the computer (Not sure this was on her list, but I heard it elsewhere and it fits here)

(Newman & Landay, DIS 2000, http://guir.berkeley.edu/pubs/)

A major trend on the internet which massively impacts IA: pages of content served from a database. This has a very positive consequence for IA: access can be instrumented. It's getting much easier to collect reams of data about exactly how people look for information, and about how successful they are looking for your information in your system. "We need to measure to learn what works and what doesn't."

She riposted Garrett's parody of her team's webpage guidelines: "Measuring the surface properties can accurately predict how people rate web sites."

Ms. Hearst adds, "More information on our empirical usability assessment work can be found at http://webtango.berkeley.edu. Thanks! Marti "

Nick Ragouzis, consultant (Interfacility)

Lazily hyperkinetic and demanded complete attention to keep up, but clearly a member of that rare species, "non-bogus consultant." Disclaimer: Ragouzis used language very precisely and accurately, and I'm probably butchering not just his prose but his actual points, since my notes are fairly low-fidelity and don't catch when he was being sarcastic to make a point, etc.

Paraphrase from his written statement: Practitioners of HCI are utterly ignorant of research. But "over decades, without a credible basis for defining or measuring the whole or human experience, they have garnered an astounding quantity of success. ... requires only the ability to innovate ... and to deliver user-perceptible value. ... Abandon quantification, and may the fittest win."

"[IA] isn't a research domain, it's an applied domain. ... IA isn't intrinsically valuable. IA quality improvements are verifiable only via customers' perceived value. What is your strategy: differentiate yourself from your rivals, show affinities with your partners; enjoy the competitive power of a well-positioned follower; sustained rate of growth relative to rivals.

What does a plan to achieve mediocrity look like? Maintain parity; do what 90% of the competition does; follow the de facto standards; follow the "X will address 75% of users" recommendations; get improved conversion rates upon launch.

Shiraz Cupala, Microsoft (managed Office web site)

This guy was so caffeinated he made Ragouzis look sleepy. He advocated tying traditional IA metrics to standard business metrics.

IA measures
Quantitative | Qualitative
Task Success Rate | Satisfaction
Time on Task | Frustration
# of categories, labels | Confusion

for-profit goal: sell products. Metrics: Revenue, referrals, subscriptions, brand loyalty

non-profit goal: spread knowledge. Metrics: memberships, subscriptions, registrations, cross-links/references

Now we can do surgical tracking. Here are results for Front Page 98:

  • Millions of promotional items
  • ~100,000 views of information page on microsoft.com
  • ~10,000 views of registration page
  • ~2,500 downloads of evaluation copy

After removing the registration requirement, and shortening the info page, got +10% downloads. However, lost a lot of email addresses (which could be used to [spam] notify people when release version was ready); calculated that overall, would get 2-4x more revenue with fewer email addresses vs more anonymous downloads. Of course, the cost of experimenting, measuring, and figuring all this out could be greater than the delta in revenue.
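To make the "surgical tracking" arithmetic concrete, here is a rough Python sketch of the step-by-step conversion rates implied by the Front Page 98 numbers above. The counts are the approximate figures from my notes (the "millions of promotional items" stage is left out because I don't have a number for it), and the stage labels and the function name are my own, not anything Cupala showed.

```python
# Approximate funnel counts from the notes above (all marked "~" in the talk).
funnel = [
    ("info page views", 100_000),
    ("registration page views", 10_000),
    ("evaluation downloads", 2_500),
]


def step_conversions(stages):
    """Return (from_stage, to_stage, rate) for each consecutive pair of stages."""
    return [
        (name_a, name_b, count_b / count_a)
        for (name_a, count_a), (name_b, count_b) in zip(stages, stages[1:])
    ]


for src, dst, rate in step_conversions(funnel):
    print(f"{src} -> {dst}: {rate:.1%}")

# Fraction of info page viewers who ended up downloading the evaluation copy.
overall = funnel[-1][1] / funnel[0][1]
print(f"overall: {overall:.1%}")
```

With these numbers, roughly one info page viewer in ten reached the registration page, and a quarter of those went on to download, which is the sort of per-step drop-off that suggests where to experiment.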

The way to measure information architecture quality is to perform surgical tracking to determine if changes in IA improve business-goal-oriented metrics.

Gary Marchionini, Professor, School of Info. and Library Science, UNC

"I do believe we can measure the usability and effectiveness of a design for very fine-grained characteristics such as number of clicks to task, mouse-travel distance, ... I am skeptical that we can have a standardized overall measure of IA effectiveness.

We can take some fine-grained measurements, for some criteria, for some tasks, for some people, at some point in time. For example, path length, vocabulary, reading level; and post-hoc aggregate data such as hits and referring links.

Overall, IA quality is qualitative. We can measure a few discrete acts, such as retrieve/buy/print/verify, because they have clear progress indicators and stopping points. We can't measure things like Explore/Browse/Learn/Read, because they don't have clear progress indicators or stopping points; they are continuous acts.

Q & A

Why are we still talking about this, when IA is already a well-understood discipline? Because we stopped treating users as computers and started treating them as people.

How to test large site IA before building a site? Exploratory methods don't scale. Two first things when building a new web site? What do users want? What does the site owner want from users? Usability is cyclical; IA is ongoing; entire org should do both. Usability has artifacts and checkpoints. Can't test IA in abstract; need examples. Relationship between db design and IA? The db is an implementation issue which doesn't/shouldn't directly address user needs.

Third event: Short talks on Interaction Techniques

A French researcher made a fairly incomprehensible and somewhat desultory argument that Fitts' law is wrong. (Fitts' law essentially says that, when moving something (an object, a mouse pointer, a finger) from one point to another, the time it takes to do so accurately goes up with distance and goes down with target size; specifically, it grows with the logarithm of the ratio of distance to target width.) Something about relative vs absolute measures. Each slide made sense, but it didn't add up to a coherent point.
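For reference, the form of Fitts' law usually quoted in HCI (the Shannon formulation) is MT = a + b * log2(D/W + 1), where D is the distance to the target and W is its width. A minimal Python sketch follows; the function name and the default constants a and b are made up for illustration (real values are fitted per device and user), and none of this comes from the talk itself.

```python
import math


def fitts_movement_time(distance, width, a=0.0, b=0.1):
    """Predicted movement time in seconds (Shannon formulation of Fitts' law).

    a and b are empirically fitted constants; the defaults here are
    placeholders for illustration, not measured values.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    # Index of difficulty, in bits: grows with distance, shrinks with width.
    index_of_difficulty = math.log2(distance / width + 1)
    return a + b * index_of_difficulty


near = fitts_movement_time(distance=7, width=1)   # ID = 3 bits
far = fitts_movement_time(distance=15, width=1)   # ID = 4 bits
print(near, far)
```

The point of the logarithm is that roughly doubling the distance adds a fixed increment to the predicted time rather than doubling it, and making the target bigger brings the time back down.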

A common rule of thumb is that computers must react in <100 milliseconds in order for users to perceive response as instantaneous. A researcher at (somewhere in the Midwest) tested this for five common tasks (pulldown menu, buttons, typing text). They found that, to a good degree of confidence, half of all users (23 person sample size) consider <190 msec to be instantaneous for buttons/menus, and <150 msec for text entry. However, you probably need to aim for <80 msec (more research to pin this down) or some sizable fraction of users will still detect delay.

It's very hard for computers to measure error in text entry, because of extra characters, transposition errors, etc, that make direct string comparison useless. Also, users want to correct as they go. York University researchers applied the Levenshtein (think Gene Wilder) string distance, which counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. It was originally developed for error-correcting codes and is also widely used for comparing genetic sequences. They reprocessed some data that had taken someone a day to measure manually, and found that results seemed to match. This talk was interesting but seemed somewhat chintzy to me as they hadn't done any actual experimentation, just fiddled with other people's data.
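Here is a minimal Python sketch of the Levenshtein distance, to show why it copes with extra or missing characters where a position-by-position comparison falls apart. The error_rate helper (distance divided by the length of the longer string) is my assumption about a reasonable normalization, not necessarily the exact statistic the York researchers used.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match for free
            ))
        previous = current
    return previous[-1]


def error_rate(presented, transcribed):
    """Hypothetical normalization: distance over the longer string's length."""
    longest = max(len(presented), len(transcribed))
    if longest == 0:
        return 0.0
    return levenshtein(presented, transcribed) / longest


print(levenshtein("quick", "qucik"))         # transposition counts as 2 edits
print(levenshtein("quick brown", "quixck brown"))  # one stray character: 1 edit
```

Note that a single stray character costs one edit no matter where it falls, whereas a naive character-by-character comparison would mark everything after it as wrong.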

IBM researchers presented data on how to set up tap keyboards for efficiency. A blend of alphabetical and efficient design was almost as good for (theoretical - they didn't actually train anyone) trained users (41 WPM predicted vs 42 for purely efficient) and was somewhat easier to learn and faster for novices. In the course of a fifteen-minute test, users (sample size of 12) went from 8 to 9.5 WPM for the most efficient keyboard and 9 to 10.2 for the semi-alphabetical keyboard.

Problems with head-mounted HUD:
Binocular rivalry. If one eye sees a terminator-style display and the other normally, the brain will alternate between the two at random, unpredictable intervals, producing a patchwork image.
A transparent display (text in the foreground, reality in the background) produces visual interference.

Experiments showed a 37% difference in ability to read and work between best case (standing near a blank wall) and worst case (tv screen in background)

Last event: SIG on Socially Adept Technologies

Some researchers from Canada are trying to start up some sort of interest group within CHI for Socially Adept Technologies, an umbrella definition for systems that adapt to social context, situations, emotions, personality. Example applications: computers that make small talk to build trust with users. Computers that feign emotion to make users more comfortable.
Issues:

what are the ethical boundaries? Should a computer pretending to be human require a disclaimer? If the user figures it out, they are pissed (research shows). If the user doesn't figure it out, are they harmed?

Anecdote suggests it is possible to reliably tell if a user is introverted or extroverted based on a 200-word speech sample. (aside: other studies clearly show that users react better to computers that display the same behavior pattern as the user)

"Someone's going to do this. It should be us, since we'll do it ethically and openly."

It's generally accepted that people react to anything that displays intelligence (and even to plenty of objects that don't) as if it had human characteristics. Therefore, socially adept concepts won't go away if ignored.

What is socially adept technology? Is it a set of guidelines for usability design (such as, computer should not do things that would be rude if a human did them)? Is it active technology, such as profiling users on the fly and changing dialog text and behavior to compensate?

We should study this because it will illuminate human-human interaction issues from a new angle.

Computers can respect social expectations without behaving like a person.

Studies show: users behave as if a computer's time is worthless. A computer's apology is worthless. A computer's sympathy is worthless. A computer's empathy may not be worthless.