The test vendor says, “This test was developed specifically for the banking industry.” You say, “Sounds good. I work for a bank. I’ll buy it!” Mr. Politically Incorrect says, “Bzztt! Nope.” Now, some vendors probably believe their own press and think their tests are validated for a given industry. But that does not make them correct. To understand why, we need to review the concepts of “validation” and “validity generalization.”
Validation
Suppose you took an employment test. Regardless of whether it was used widely in the industry, wouldn’t you really want to know whether its scores predicted job performance? Think about it. What do the following statements really mean about a hiring test?
- “Our test is validated for use in the XYZ industry.” What, for every job? Are all companies in this industry identical?
- “Our test contains industry norms for the XYZ industry.” What, everyone in the “industry norm” base is a high performer?
- “This is the ‘average’ score for people with this job title.” So, all people with the same title do identical tasks and are high performers?
Validation means someone, somewhere, did a formal study to see whether test scores predicted job performance in a specific job. And what does validation have to do with test choice? Well, once upon a time people believed tests should be re-validated every time they were implemented. Then someone asked, “Why do we have to re-validate a test every time it is implemented when someone else might have already done all the work?” This started a series of investigations and studies that concluded, “If two jobs are essentially the same, then the validity data can be ‘transported’ from one job to another. Sweet!” (assuming Ph.D.s would actually say something like that). From the user’s perspective, this is “proof positive” that if someone is in the XYZ business (IT, financial, banking, and so forth), then it’s okay to use the “industry validated” XYZ test without doing any further work, right? Nope! Sorry. Read on.
Test Choice
As discussed in a past article, few tests are designed for hiring; that is, their content is not based on job performance and their scores don’t predict it either. Why should this be a big deal? Because improperly used tests have a real financial impact on 1) qualified people, and 2) organizations that hire unqualified people. Job-qualified minorities, for example, have a history of being excluded based on unsupported job requirements and inappropriate tests. Yes, that includes interviews.
A hiring test is different. Valid hiring tests are based on a theory of job performance; scores are supported by studies that show they predict on-the-job performance; scores are stable over time; and test developers follow guidelines intended to make their tests rock solid. This is a good thing. Assuming we are only working with a library of true hiring tests, how do we choose which one to use? We look at job analysis data. Job analysis identifies the critical competencies required to perform the job. People cannot just “believe” an XYZ test will work for all jobs. The next challenge is to make sure scores predict job performance. This is something a reasonable person would want to do, right? After all, if the test content is critical to the job, logic states we should make sure it works.
Déjà Validation
But suppose someone else has already done the same work for a similar job. Can we save time? Yes, but only if we know the two jobs are essentially the same. This is done by comparing the parameters from the first validation study (what was measured and so forth) to the parameters from the second study. If the jobs are similar, if the performance criteria are similar, and if the first study followed professional practices, then and only then can we “transport” the data from one job to the next (a sketch of this decision logic appears at the end of this section). Aside from legalese and validity-generalization meta-analyses, the bottom line is:
- Responsible people need to know that a specific test score predicts performance.
- Validity generalization is not an excuse to use a specific test just because a vendor claims the test was “validated.” To point out a few best practices, here are some excerpts from the 1978 Uniform Guidelines:
Under no circumstances will the general reputation of a test or other selection procedures, its author or its publisher, or casual reports of its validity be accepted in lieu of evidence of validity. Specifically ruled out are: assumptions of validity based on a procedure’s name or descriptive labels; all forms of promotional literature; data bearing on the frequency of a procedure’s usage; testimonial statements and credentials of sellers, users, or consultants; and other nonempirical or anecdotal accounts of selection practices or selection outcomes.
…Enforcement agencies will take into account the fact that a thorough job analysis was conducted and that careful development and use of a selection procedure in accordance with professional standards enhance the probability that the selection procedure is valid for the job.
Are these guidelines just a bureaucrat’s dream? No. Are they the “law of the land”? No. But what reasonable person can argue against using a test that “fits” the job or against knowing that scores actually predict job performance?
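To make the “then and only then” rule explicit, here is a toy sketch in Python. The condition names are hypothetical labels, not legal tests; a real transportability decision rests on documented job-analysis evidence, not three booleans.

```python
# A toy sketch of the "then and only then" transport rule described
# above. Condition names are hypothetical; a real decision rests on
# documented job-analysis evidence.

def can_transport_validity(jobs_essentially_same: bool,
                           performance_criteria_similar: bool,
                           original_study_met_standards: bool) -> bool:
    """Validity evidence moves from Job A to Job B only if ALL hold."""
    return (jobs_essentially_same
            and performance_criteria_similar
            and original_study_met_standards)

# "Industry validated" marketing, by itself, establishes none of these:
print(can_transport_validity(False, False, True))  # False -> do your own study
print(can_transport_validity(True, True, True))    # True  -> transport is defensible
```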
Sticky Issues
Research study findings are reported in terms of trends and correlations. They are not “perfect proof”; they just represent a high probability that the results were not due to chance. Take, for example, the concept of meta-analysis. This technique statistically combines results from many similar studies while “mathematically” controlling for sample size, test error, and so forth. Meta-analysis is supposed to “minimize” the experimental error between one study and another. The results of a meta-analysis or other statistical report read something like this: “The data had a correlation of +.30 with a probability of chance less than or equal to 5 percent.” In other words (human ones): roughly 9 percent of the variation in the numbers from Source A lined up with the numbers from Source B; square the correlation and you get the shared variance (.30 × .30 = .09). (That’s not “technically” how a statistician would put it, but it will do for the purposes of discussion.)
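To make the arithmetic concrete, here is a minimal sketch, in Python with made-up study numbers, of what a “bare-bones” meta-analysis does: pool validity coefficients across studies, weighting each by its sample size, then square the pooled correlation to see how much of the performance variation the test scores actually explain.

```python
# A minimal, bare-bones meta-analysis sketch: pool validity
# coefficients with a sample-size-weighted average (Hunter-Schmidt
# style). The study data below are made up for illustration only.

studies = [
    # (sample size, observed validity coefficient r)
    (120, 0.25),
    (340, 0.33),
    (85,  0.22),
    (210, 0.31),
]

total_n = sum(n for n, _ in studies)

# Weight each study's r by its sample size so that larger, more
# reliable studies count for more than small ones.
pooled_r = sum(n * r for n, r in studies) / total_n

# Squaring the correlation gives the share of variance in job
# performance that test scores account for: about 0.30**2 = 0.09,
# i.e., roughly 9 percent explained, 91 percent left unexplained.
print(f"pooled r  = {pooled_r:.2f}")     # ~0.30
print(f"r squared = {pooled_r**2:.2f}")  # ~0.09
```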
Of course, a +.30 correlation coefficient still leaves us wondering about the 91 percent of the variation left unexplained. Is meta-analysis data as exact as reading a thermometer? No. Is it an indication that every study was tightly controlled? No. Can you “take results to the bank”? Only if you are prepared to argue that an average of averages is a precision measurement. It’s not. How about a uniform definition of job “performance”? For example, what happens when a mentally dull but politically skilled employee scores “low” on a test but is rated “high” by his or her manager? Is the test incorrect? Or is the rating incorrect? (I’d bet on the rating.) The bottom line is, it is okay to use tests:
- Designed for hiring
- As long as test content is supported by job analysis
- As long as test scores predict job performance
- As long as professional test protocol is followed
It is not okay to use employment tests:
- That are not designed for hiring
- That are not supported by job analysis
- If test scores do not predict job performance
Other considerations:
- An interview is a test.
- All tests should be examined and reviewed to reduce adverse impact.
- Validity generalization is only acceptable when the two jobs are essentially the same (and supporting data shows they are the same).
Think about it: Isn’t that just what a hiring manager wants to know about a pre-hire test?