The last time I wrote about validation, you would have thought I was taking a public position on stem-cell research, partial-birth abortion, or teaching evolution in the classroom. The fact that some people would even challenge good validation procedures was, for me at least, a complete surprise. Validation means making sure that pre-hire test scores accurately predict performance. How can you argue with that?

Yes, Virginia, interviews are pre-hiring tests. People get screened out by interviews. This means interviews are like every other form of pre-screening test: they should be validated. Otherwise, how can an interviewer know with any degree of certainty whether folks who pass interviews can do the job?

Coming to grips with interviews as tests gives most people a headache because it forces them to come face to face with hard data rather than soft opinions. It makes no difference if an interviewer is 100% convinced of the accuracy of his or her interview technique if there is no feedback or hard evidence to support it. Why? Because human decision-making is flawed. Some people, for example, are convinced that flying is significantly more dangerous than driving, when exactly the opposite is true. Others believe they can win the lottery when there is a much greater chance of being hit by lightning. Just as surprisingly, although there is an embarrassing number of low producers on most payrolls, hiring managers still generally argue that “they know ’em when they see ’em.”

Basically, recruiters must always be aware of the flaws that affect human decision-making. People tend to readily recall information that is vivid (i.e., a major accident), recent (i.e., happened within the last few days, weeks, or months), confirming of existing opinions (i.e., we “stereotype” people), or readily available (i.e., we see it every night on CNN). Flawed decision-making in recruiting leads to flaky job standards, hiring the wrong people, and rejecting the right ones. It is a major reason why Congress passed the Civil Rights Act and the Department of Labor wrote the 1978 Uniform Guidelines on Employee Selection Procedures.

Anecdotes Are Examples of Flawed Decision-Making

I know a brother of an aunt who knew an actress who had a good experience with a shopkeeper who hired people using the “Seems Like It Might” (SLIM) performance test. In fact, the “Seems Like It Might” company proudly markets the fact that recruiters have a SLIM chance of making a good hiring decision based on their test results. Many people who avidly support certain tests do so because they think the test accurately predicts individual performance. When they are probed, we discover there is absolutely nothing except folklore and superstition to support their opinion. These opinions are not fact; they are homilies and anecdotes. They serve us well when there is nothing important at stake, but they cost our organizations millions in lost productivity when they affect hiring decisions.

We really need to rethink our profession and hammer home the point that this is not a “learn as you earn” job. Questions and unfounded recommendations about “best” interview questions and generic interview-workshop competency lists indicate an embarrassing lack of professional knowledge. Imagine a group of physicians asking questions about where to find sharp scalpels, engineers asking for recommendations about building materials, or policemen asking about the best bullets for shooting suspects.
Based on feedback and questions in the public forum, one would think that setting clear job standards before starting a job search was akin to discovering cold fusion. How does a professional become more professional? Read the right books. Go back to school for a semester. Read the research. Do anything backed by good sense. But stamp out the idea once and for all that this business is as easy as giving an applicant a silly test. Want a good doctor? Find one with medical training. Want a good architect? Find one who has experience in construction. Want a good recruiter? Find one who knows how to set job standards and who can fairly and accurately measure applicant skills.

Correlation Is Not Causation

Suppose the National Enquirer published a nice ten-question hiring test. Furthermore, suppose we gave that test to our high producers and averaged their scores. Is that validation? (Do alligators make good house pets?) Correlation means there is an association between two variables; high producers, for example, tend to be mentally sane. But correlation is not enough. Recruiters need to find causation. Does sanity cause high production (causation)? Or do high producers merely tend to be sane (correlation)? Toss all your training tests in the closet unless you know, for certain, that the content they test for causes productivity. Styles as measured by the MBTI, DISC, Enneagram, Social Styles, and so forth might occur more often among certain job holders, but life is too complex to assume style causes productivity. This is critical to remember because, while hiring managers might embrace an intuitively attractive test today, if it does not predict performance, it will fail over time. Recruiters need to know that “what is measured” equals “on-the-job performance.” “Getting to know the applicant” or playing amateur psychologist is pure bush league. Remember that blue eyes and blond hair might be correlated, but one does not cause the other.

Cutting Through the Dreck

Suppose we accept that 1) causation is the only way to accurately predict performance, and 2) our test content is a precursor to job performance. Our next question is how to prove (i.e., validate) the test and arrive at good cut-off points. This takes a thorough knowledge of statistics and experimental design. For example, we have to define what to predict. Is it supervisor ratings? Performance appraisals? Three-sixty survey results? No, these are probably filled with enough error and subjectivity to yield untrustworthy results. We have to find “hard” data that is hard to fake, something we know we can trust. Okay, let’s suppose we have the right kind of hard data. What’s next? We need to compare test scores with on-the-job performance. We can do this several ways:
- By giving the test to everyone who applies, hiring them all, waiting until we get performance data, and comparing test scores with job performance.
- By giving the test to people already in the job and comparing test scores with job performance.
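Whichever design we choose, the arithmetic at the end is the same: line up test scores against the hard criterion and compute a validity coefficient. Below is a minimal sketch in Python; the scores and the audited-sales criterion are made-up placeholders, and a real validation study would also need an adequate sample, significance testing, and documentation tying the test content to the job.

```python
# Minimal sketch: computing a validity coefficient from made-up data.
from statistics import correlation  # Pearson r; available in Python 3.10+

# Hypothetical numbers only. "test_scores" could come from applicants
# (the first design above) or from current job holders (the second).
test_scores = [62, 71, 55, 80, 90, 48, 77, 66, 84, 59]   # pre-hire test scores
performance = [51, 60, 40, 72, 88, 35, 70, 58, 79, 44]   # hard criterion, e.g., audited sales

validity = correlation(test_scores, performance)
print(f"Validity coefficient: {validity:.2f}")

# Caution: even a respectable coefficient only shows association.
# It supports a hiring decision only when job analysis shows the test
# content is a genuine precursor of the work being predicted.
```

With the second design in particular, the observed coefficient will usually understate the real relationship, which is where the next point comes in.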
Of course we’ll need to correct for “restriction of range”; that is, the people already in the job will be more similar to one another than the people who apply for the job. So what? Well, for one thing, we might not see the same kind of big differences between high and low producers that we would see between applicants. Bummer! For example, we might find there is very little difference in skill between the top 10 players on the pro golf tour and the bottom 10. On the other hand, we would probably find a very large difference between the top 10 players and the top 10 spectators.

Adverse Impact

Discriminating against qualified people based on non-job-related factors such as race, gender, or disability is a bad practice (it’s also against the law in the U.S.). But that does not mean we have to hire people just because of race, gender, or disability either. We need clear job standards that are based on job requirements and business necessity, plenty of documentation, validated tests, and monitored pass rates for each decision point. Can we discriminate? Yes, but only against people who do not have job skills.

This is a “quicksand” occupation. It looks calm on the surface, but it is dangerously deep for naive folks who wander too close. Recruiters should never rely on one tool, should always do job analyses, should never ignore the law, should validate every test, should acknowledge flawed decision-making, and should do everything they can to become more professional. Looking for quick tips and usable ideas? Sorry, like everything else, there is no such thing as fast, cheap, and effective.
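For readers who want to see the mechanics behind the last two points, here is a short, hedged sketch: the first function applies a standard textbook correction for restriction of range (the coefficient observed among look-alike incumbents understates the relationship in the full applicant pool), and the second applies the four-fifths (80%) rule of thumb from the Uniform Guidelines for monitoring pass rates at each decision point. The function names and all numbers are hypothetical illustrations, not legal or psychometric advice.

```python
import math

def correct_for_range_restriction(r_observed: float,
                                  sd_applicants: float,
                                  sd_incumbents: float) -> float:
    """Standard textbook correction for restriction of range: estimates the
    validity coefficient in the full applicant pool from the coefficient
    observed among incumbents, whose scores vary less."""
    u = sd_applicants / sd_incumbents
    return (r_observed * u) / math.sqrt(1 - r_observed**2 + (r_observed * u)**2)

def passes_four_fifths_rule(selection_rates: dict[str, float]) -> bool:
    """Four-fifths (80%) rule of thumb from the 1978 Uniform Guidelines:
    flags likely adverse impact when any group's selection rate falls below
    80% of the highest group's rate (selection rate = hires / applicants)."""
    highest = max(selection_rates.values())
    return all(rate / highest >= 0.8 for rate in selection_rates.values())

# Hypothetical numbers for illustration only.
print(correct_for_range_restriction(0.25, sd_applicants=12.0, sd_incumbents=6.0))  # ~0.46
print(passes_four_fifths_rule({"group_a": 0.50, "group_b": 0.35}))                 # False: 0.70 < 0.80
```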