
Much Ado about Testing

May 29, 2000

An applicant takes a battery of tests. Three trained professionals examine the scores. One professional exclaims, “We have found our candidate! Hire immediately!” The second says, “Hmmm, the scores are OK. But I’ve seen better.” The third professional warns, “No way! I see problems and a total disaster in the making!” This was the result of a controlled experiment that explored the value of using clinical test batteries (i.e., tests not based on job-related skills) to select new employees. As you can see, expert interpretations of the same data from the same person were completely contradictory. This is the same problem faced by recruiters and hiring managers who use generic personality tests. We all want to make good hiring decisions, but if experts cannot agree among themselves, how can a layperson decide?

Personality? What personality?

Let’s begin by thinking about personality tests in more depth. Study after study has identified only three factors that are consistently associated with performance: conscientiousness, emotional stability, and extraversion (for salespeople). Well, that’s not exactly rocket science, is it? Good performers care about the quality of their work, they’re not crazy, and salespeople should be outgoing! Who needs research to tell you that? Certainly there is more to personality and job fit than three simple-minded factors! Yes, there is; it’s just very hard to measure.

“Fibbing”

Let’s start with honesty. Think about the last personality test you took. Were you absolutely, perfectly, cross-your-heart-and-hope-to-die honest? Give yourself a break! You are just like everyone else. It is normal for people to “fib” when taking personality tests. We tend to answer personality items according to how we want others to see us. It’s in our nature. Only an applicant with the IQ of a turnip would openly admit to doing shoddy work, being neurotic, or hating people. One way to minimize (not eliminate) the number of turnips you interview is to check your test for a “lie scale.” If a lie scale is not built into your test, then you cannot trust the results. (If you are conducting training programs, lie scales are unnecessary because your objective is to improve communications, not make hiring decisions.) Without a lie scale to weed out fibbing, you might as well call the Psychic Hot Line for hiring advice.

Soft traits and hard skills

Next, get a firm grip on the idea that personality test scores and hard skills are only minimally connected. For example, a physicist’s personality test may indicate she has an analytical mind, but Cliff Clavin on the old “Cheers” TV program probably believed the same thing about himself. In short, there is a minimal (about 2% to 8%) correlation between personality test results and actual skills. Claiming you are an analytical thinker only means you want others to believe it.

Then there is the problem of defining “performance.” Is “performance” based on charisma? How about rating individual performance in organizational cultures where everyone is “excellent”? (A classic case of “groupthink” if I ever saw one.) How about measuring units produced? Team member opinions? Performance on mental alertness tests? Personality scores tend to correlate with performance ratings only when the ratings are “opinion based.” The correlation shrinks when performance is based on hard measures (like really being able to solve difficult problems). Personality factors also tend to shift from job to job.
In some jobs, I found the DISC predicted performance beautifully, and in others it was a bust! Was the test at fault? No. The DISC worked only when its four scales were closely associated with job success, and it failed utterly when they were not. I had similar results with other personality tests as well. The lesson? Don’t use personality tests that were designed for training, and beware of vendors who advertise that their training-oriented personality test can be used for selection. The results will be hit or miss, at best!

One size does not fit all

Most personality tests are pretty simple-minded. Answer ten questions about being worried, and your scores will “predict” that you are the type of person who worries a lot. (Amazing!) Another ten questions and another no-brainer prediction, and so on. Eventually, you get a list of separate scores that restate the obvious, but they do not predict job performance. Useful personality tests don’t report a string of one-dimensional factors. They use different factors in different combinations to predict different on-the-job behaviors. The same factors that combine to predict prospecting do not predict quality, and so on. Before they can be useful, individual scores must be “mined” using sophisticated artificial intelligence programs. Adding and subtracting individual personality items might be fun, but, like deep-fried pork rinds, it’s mostly a marketing gimmick to sell pigskin that’s not good enough to be sold for shoe leather. (I can just hear the inventor of pork rinds now: “Hey! I got an idea! Why don’t we just drop the skin that’s not good enough to use for shoe leather into a tub of boiling hot fat? People will eat anything that’s deep-fried!”)

Another vendor “trick” is building “desirable” scores based on averages. I hope you have not traveled this road. Why? Have you ever examined the average size of men and women? Did you notice that while the averages are interesting, no one actually has an “average” body? My point is that averages hide individual differences. If you work from personality averages, it is unlikely that even the people in the study will “fit.” Averages are a wrong-headed approach to developing target norms.

How about “canned” programs that offer customer service or sales tests? Would these help? Probably not. Consider a popular test of “sales” ability. Just what kind of sales position is it measuring? Auto sales, strategic sales, complicated sales, conceptual product sales, tangible product sales, recurring sales, cross-sales? Get my point? Canned test scores promise a lot, but so does canned meat. (Hey! I’ve got an idea…!)

Industrial strength

How would you like to have a doctor listen to your heart for a few minutes, then write up a ten-page report on your general health? That’s what some personality tests do. A good test does not use one or two items to measure major traits. You need about seven to ten items to produce one stable personality factor. Let me explain. Consider a test that uses three items to produce a factor score. Common sense tells you that the answer to any single item could swing the total score by 33%! Would you want to take a test where one response could sway the whole thing? This is WAY too sensitive. Take time to count the items on your test. If your test has 100 items, it should have no more than 10 factors plus a lie scale (i.e., 100 ÷ 11 ≈ 9 items per scale). If you find fewer than a seven-to-one ratio, put the test where you store the pork rinds and canned meat.
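To make the arithmetic concrete, here is a minimal sketch of that item-count sanity check. The numbers (the 100-item test, the 7-items-per-factor floor, the 33% swing) come straight from the paragraph above; the function names and the example are hypothetical.

```python
def items_per_scale(total_items: int, factors: int, lie_scale: bool = True) -> float:
    """Average number of items available per scale, counting the lie scale as one more scale."""
    scales = factors + (1 if lie_scale else 0)
    return total_items / scales

def single_item_sway(items_in_factor: int) -> float:
    """Fraction of a factor score that one answer can swing (1/n)."""
    return 1 / items_in_factor

# A 100-item test with 10 factors plus a lie scale: 100 / 11, about 9 items per scale.
print(round(items_per_scale(100, 10), 1))  # 9.1 -- clears the 7-item floor

# A 3-item factor: one answer moves the score by a third. WAY too sensitive.
print(f"{single_item_sway(3):.0%}")        # 33%

# The rule of thumb: fewer than 7 items per factor is a red flag.
assert items_per_scale(100, 10) >= 7
```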
What’s the score?

Does your test ask you to “compare” items? That is, “Would you describe yourself as more like ….., or more like …..?” These are called ipsative tests. They compare one item against another, then add up the choices. Nice for training, but not for selection. Why? Ipsative tests assume every item carries equal weight, they only compare items with each other, and they never indicate the “strength” of a preference. Comparative designs work when you want to learn something about yourself, but they don’t work when you want to compare your results to a target (such as when selecting people). You need “strength” responses in order to compare different people to job requirements. As we said before, keep the training tests in the training department!
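The difference is easy to see in code. Below is a minimal sketch contrasting the two response formats; the trait names, rating scale, and scoring functions are hypothetical illustrations, not any vendor’s actual instrument.

```python
# Ipsative (forced-choice): every answer is a pick between two items, so each
# choice is worth exactly one point and traits compete for a fixed point pool.
def score_ipsative(choices: list[str]) -> dict[str, int]:
    scores: dict[str, int] = {}
    for picked_trait in choices:
        scores[picked_trait] = scores.get(picked_trait, 0) + 1
    return scores

# Normative ("strength"): each item is rated on its own 1-to-5 scale, so the
# result says HOW strongly a trait is endorsed -- the only kind of score that
# can be lined up against a job target or compared across applicants.
def score_normative(ratings: dict[str, list[int]]) -> dict[str, float]:
    return {trait: round(sum(vals) / len(vals), 2) for trait, vals in ratings.items()}

# "More like outgoing, or more like precise?" asked three times:
print(score_ipsative(["outgoing", "precise", "outgoing"]))
# {'outgoing': 2, 'precise': 1} -- a 2-to-1 split that says nothing about intensity

# The same traits rated for strength:
print(score_normative({"outgoing": [5, 4, 5], "precise": [2, 3, 2]}))
# {'outgoing': 4.67, 'precise': 2.33} -- now applicants can be compared to a target
```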
In memoriam

Tah Dah! Here’s the summary. Score your present selection test using the following checklist. Give yourself one point for each item checked:
  1. _____ Uses a scoring system that compares one item with another (ipsative).
  2. _____ Target scores are based on the “average” of high producers (or no targets at all).
  3. _____ Has fewer than 7 items per factor.
  4. _____ Uses “canned” positions as the norm.
  5. _____ Also used for training programs.
  6. _____ Uses “add and subtract” scoring algorithms.
  7. _____ Does not contain a lie scale.
  8. _____ Presents a laundry list of factor scores.
  9. _____ Does not predict specific job behaviors.
  10. _____ Ratings are based on personal opinions.
  11. _____ Test factors are not based on a documented, detailed job analysis.
  12. _____ You enjoy eating pork rinds and buy canned meat.

Now, add up your scores. If you scored zero, give yourself a big clap on the back. If you scored from 1 to 6, think about when you were a child and wet the bed at night. You enjoyed a nice warm feeling for a few minutes—until it turned cold and uncomfortable and eventually gave you a rash. If you scored from 7 to 12, gather all your unused tests into a pile, roll them into tight little logs, set them afire and have a weenie roast. If you scored over 12, you can’t add!
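If you’d rather not add by hand, here is a throwaway sketch of the same tally; the one-point-per-item rule and the verdict cutoffs come from the checklist above, and everything else is hypothetical.

```python
# One boolean per checklist item, in order; True means the item applies to your test.
def red_flag_verdict(checked: list[bool]) -> str:
    score = sum(checked)  # one point per checked item, 12 possible
    if score == 0:
        return "Big clap on the back."
    if score <= 6:
        return "Nice warm feeling now; rash later."
    if score <= 12:
        return "Roll the tests into tight little logs and have a weenie roast."
    return "You can't add!"  # impossible with a 12-item list

# Example: a test that is ipsative (item 1) and has no lie scale (item 7).
flags = [False] * 12
flags[0] = flags[6] = True
print(red_flag_verdict(flags))  # Nice warm feeling now; rash later.
```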
