
The Einstein-Clavin Effect

Mar 29, 2005

Some readers might recall Cliff Clavin, a character on the old “Cheers” television program. Cliff was a postal worker with a massive blind spot: in spite of all evidence to the contrary, Cliff thought he was smart. Unfortunately, Cliff’s IQ was only slightly higher than first-class mail. Albert Einstein, on the other hand, was a theoretical physicist who expanded our understanding of the physical universe. He was so smart that researchers kept his brain in a jar for study (after he died, that is).

Now, let’s suppose both Cliff and Al decided to apply for a job. Let’s further suppose they both took a test that asked questions about their intelligence, problem-solving ability, school subjects, success attitudes, sales ability, customer service, and management style. Will their test scores accurately predict ability? Hopefully, you said, “No way! A management test, personality test, sales test, or any other kind of self-reported test generally predicts success only if someone is too dull to fake good. That is, we could probably trust a low score, but we would have to be very cautious of high ones.” Excellent response! We should also not be surprised to learn that controlled research studies confirm that people who “fake good” on self-reported tests can outscore folks who give honest responses. Burn that into memory: people who “fake good” on self-reported tests can outscore folks who give honest responses.

This is the problem with many tests marketed for hiring. At first blush, they may seem like the answer to all your prayers, but experience shows they give a false sense of security. Al’s high score in “problem solving,” for example, might be the same as Cliff’s, with one “small” difference: Al has an abundance of mental horsepower that Cliff lacks.

Validity?

Validity means someone conducted a formal study showing that test scores predicted performance for their job. The same validity study cannot be assumed to work for your job. Validity is local: local to the organization and local to the job. Local. The only time a test user should trust someone else’s validity data is when he or she knows (really, really knows) that both jobs are virtually the same. But since everyone insists that his or her company is different, relying on external validity studies becomes problematic, yes?

Well, let’s just make validity even more complicated. Validity scores are often assumed to fall along a straight line: a score of 10 equals 10% performance, 50 equals 50% performance, 100 equals 100% performance, and so forth. That’s what traditional statistics evaluate: straight-line, normally distributed relationships. The trouble with relationships, however, is that they are generally not linear. A 20% difference in test scores seldom translates into a 20% difference in job performance. Test scores and performance ratings are often error-filled, and test scores can be too low, just right, or too high. For example (a short simulation after the list below makes this concrete):

  • Performance is seldom linear. Unless we have something to count (e.g., units per hour, dollars per month, and so forth), the most we can say about performance is 1) people are at the top of their game, 2) they are doing okay, or 3) they are fish bait. In spite of the fact that HR asks us to rate employees from 1 to 10, most folks cannot accurately describe ten one-point differences between Billy Bob and Sally Mae. Nor can they put overall values on performance when, for example, Billy is better at closing but Sally is better at customer service.
  • Test scores are not like thermometers. While people can often sense a few degrees of temperature rise or fall, they cannot reliably identify a few points of performance difference. Like Billy and Sally in our last example, there are simply too many factors to consider and too many things that interfere.
  • Speaking of interference, there is no such thing as a “perfect” test. Test scores tend to float up and down. I heard of one applicant who was given the same test (the Wonderlic, a highly popular test of mental alertness) by several organizations. She started out average on her first trial, but after she took the test a few more times, she became a genius. Recall this story the next time a vendor brags about his or her test’s widespread popularity (nothing comes without a cost in this business).
  • High or low? Some managers tend to hire the best and brightest, put them into jobs that are dull and predictable, and act amazed that employees either turn over faster or demand fast-track promotions. For example, I once worked for a self-proclaimed world leader in testing whose consultants consistently designed assessment systems that hired the best and brightest for green-field startups. Guess what happened one year after the plants were up and running? Does the phrase “all chiefs and no Indians” mean anything? One size only fits all when you wear body paint, and not everybody looks pretty in paint (compare Demi Moore with Michael Moore, for example).
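Here is that simulation: a minimal Python sketch using entirely made-up numbers and an assumed saturating (logistic) performance curve. Nothing in it comes from a real validity study; the curve’s shape is the assumption doing all the work.

```python
import math
import random

random.seed(42)

def performance(score):
    """Hypothetical saturating curve: performance climbs fast at low
    scores and flattens near the top. Pure assumption for illustration,
    not a model from any real validity study."""
    return 100 / (1 + math.exp(-0.12 * (score - 40)))

# Simulate 200 applicants with noisy performance ratings.
scores = [random.uniform(10, 100) for _ in range(200)]
ratings = [performance(s) + random.gauss(0, 8) for s in scores]

# The same 20-point score gap means very different things depending
# on where it sits along the curve:
print(performance(80) - performance(60))  # ~7.5 points of performance
print(performance(40) - performance(20))  # ~41.7 points of performance

# A traditional Pearson correlation boils the whole relationship down
# to one straight-line number, hiding exactly that difference.
n = len(scores)
mx, my = sum(scores) / n, sum(ratings) / n
cov = sum((x - mx) * (y - my) for x, y in zip(scores, ratings))
sx = math.sqrt(sum((x - mx) ** 2 for x in scores))
sy = math.sqrt(sum((y - my) ** 2 for y in ratings))
print(cov / (sx * sy))  # a single r, silent about the curve's shape
```

In this toy setup, a 20-point gap buys about seven points of performance at the top of the range and more than forty at the bottom, yet the single correlation number averages both cases into one tidy line.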

Sorry about that Moore-versus-Moore mental image. It was cruel, but a few weeks of therapy should help.

Putting Your Gut First?

Psychologists tend to be a pretty liberal bunch. While I was in grad school, many of my classmates argued for the “job equality” of men and women. Nothing wrong there. So I tried a little experiment in cognitive psychology to see if inner feelings matched public words. I divided the class into four groups and gave each group private instructions: Group 1 was asked to brainstorm a list of desirable business and management adjectives; Group 2 was asked to brainstorm a list of undesirable business and management adjectives; Group 3 was asked to brainstorm a list of “male” adjectives; and Group 4 was asked to brainstorm a list of “female” adjectives. When everyone was done, I asked each group to report. Guess what? The “male” adjectives matched the desirable business and management list, while the “female” adjectives matched the undesirable one. Their inner feelings “short-circuited” their public statements! This exercise demonstrated how internal stereotypes can unconsciously affect external decisions, even among folks who argued they knew better. The same error-prone stereotyping applies to models such as social styles, MBTI, DISC profiles, sales styles, and leadership styles: fun, but often impractical, unrealistic, and downright pejorative to qualified applicants.

Take a Flyer on Poor Test Data?

A few articles ago, some readers mounted a micro-attack against the use of empirical data to make hiring decisions. The argument was, “Tests cannot tell us everything about a candidate. Sometimes you have to ‘go with your gut’ and ‘take a flyer’!” (I think that means to take a chance.) Okay. Of all the hiring opinions I have heard, that is certainly one of them. We can argue all day from the sidelines, but are most line managers willing to take a chance on an untried candidate? I floated the idea of “taking a flyer” with a few of them. Their reaction was not positive. In fact, the managers I spoke to were downright hostile at the suggestion that anyone would ask them either to interview an unqualified applicant or to hire someone who could not demonstrate skills before starting a job. Hmmm, I wonder why?

Conclusion

Testing is like quicksand. It looks harmless and easy, but it is very deep and has the potential to swallow users without a trace.

  • Always ask the test vendor to demonstrate that his or her test “works” for your job and your application.
  • Never accept a test vendor’s word that a test has been “validated” unless you have evidence that test scores predict performance in your jobs.
  • Know how to set cut-off scores, and what happens when they are too high, just right, or too low.
  • Understand whether high, medium, and low scores can be trusted (the sketch after this list shows one rough way to check these last two points).
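Those last two bullets lend themselves to a quick check. Below is a minimal Python sketch; the dozen hires, their scores, the cut points, and the three-band performance scale (fish bait, doing okay, top of their game) are all hypothetical, invented purely to show the shape of the exercise:

```python
from statistics import mean

# Hypothetical history: (test score, performance band), where
# 1 = fish bait, 2 = doing okay, 3 = top of their game.
# Every number here is invented for illustration.
history = [
    (22, 1), (35, 1), (41, 2), (48, 2), (52, 3), (55, 2),
    (61, 3), (63, 2), (70, 3), (74, 2), (82, 2), (90, 2),
]

def band(score):
    """Bucket scores into low/medium/high; the cut points are assumptions."""
    if score < 40:
        return "low"
    if score < 70:
        return "medium"
    return "high"

# Group past performance by score band.
groups = {}
for score, perf in history:
    groups.setdefault(band(score), []).append(perf)

for name in ("low", "medium", "high"):
    perfs = groups.get(name, [])
    if perfs:
        print(f"{name:>6}: n={len(perfs)}  avg performance = {mean(perfs):.2f}")

# If the high band performs no better than the medium band, a higher
# cut-off buys nothing; it may simply be selecting the best fakers.
```

In this toy data the high band does no better than the medium band, which is exactly the pattern to watch for when applicants fake good. With real data you would want far more than a dozen hires, but the habit is the same: check every score band against actual performance instead of assuming more points always means more performance.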

And finally, always be certain your tests can separate the Einsteins from the Clavins.
