If you did not do your homework before using a pencil and paper test to hire people (even if you only use scores to “guide” your hiring decision), you won’t like this article. It contains references to dry government documents, test standards, and other information that isn’t fun to read and is easy to reject. Unfortunately, it also describes what a professional recruiter should be doing to get fully qualified people in the right jobs. You have been fairly warned!

There aren’t many professions where bad advice is as prolific as it is in hiring. Almost everyone has a favorite test they claim is the best thing since the last best thing, especially if they sell or are emotionally committed to a certain test. But what buyers do not know is that many tests currently used to make hiring decisions were developed in the infancy of selection science: some to explain clinical illness, some to find a general personality theory, some to explain “why good people went bad” in Nazi Germany, and others so that training vendors could expand into new markets.

Whether these tests “work” (i.e., whether their scores are associated with job performance) is often a moot point. You could, for example, “prove” that shoe size is related to promotion by measuring the feet of people at different management levels. Since there are more male managers in higher positions, and because men tend to have bigger feet than women, your study might “prove” that people with big feet have more management potential. (Want to fly that kite in front of your executive committee?)

Only within the last 25 years or so have tests been developed specifically for hiring applications. Since people who use or sell tests are seldom test experts themselves, they are plagued with a mental condition where they don’t know what they don’t know. This means that some of their claims sound highly enthusiastic and plausible, but are completely wrongheaded.
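The shoe-size example above can be made concrete with a small sketch. The twelve records below are invented purely for illustration (they are not real data), and the `pearson` and `correlation_for` helpers are hypothetical names. The point is only that a score can correlate with management level across the whole workforce while having no relationship at all once men and women are looked at separately.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable has no spread."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Invented records: (sex, shoe_size, management_level). Not real data.
people = [
    ("M", 10, 3), ("M", 11, 2), ("M", 12, 3),
    ("M", 10, 2), ("M", 11, 3), ("M", 12, 2),
    ("F", 6, 2), ("F", 7, 1), ("F", 8, 2),
    ("F", 6, 1), ("F", 7, 2), ("F", 8, 1),
]


def correlation_for(records):
    """Correlate shoe size with management level for the given records."""
    sizes = [size for _, size, _ in records]
    levels = [level for _, _, level in records]
    return pearson(sizes, levels)


overall = correlation_for(people)
within_men = correlation_for([p for p in people if p[0] == "M"])
within_women = correlation_for([p for p in people if p[0] == "F"])

print(f"everyone: r = {overall:.2f}")
print(f"men only: r = {within_men:.2f}")
print(f"women only: r = {within_women:.2f}")
```

In this toy data set the pooled correlation is clearly positive, yet inside each group it is zero: the apparent "predictive power" of shoe size comes entirely from who holds the senior jobs.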
Let’s see if we can shine a little sunshine on the testing domain by drawing some statements from past forums. You really don’t have to read any further if you don’t care about applicant quality or don’t use tests. Just remember, though, a good system and a legal system are the same thing: they both provide a fair and level opportunity for all applicants, are directly job-related, and accurately predict performance.
Test Development Standards

Let’s start with the 1999 Standards for Educational and Psychological Testing. These “Standards” define best practices for test developers and are recommended reading for any test user. Here are a few excerpts from the section on using employment tests:
- “Validity is the most fundamental consideration of developing and evaluating a test…it consists of accumulating evidence to provide a sound scientific basis for score interpretations” (i.e., no home-based evidence = no test credibility).
- “A clear statement of the test objective should always be made prior to development of a test” (i.e., a hiring test should always be based on its ability to predict job performance; you should not take any old test, such as one designed for training, and use it for hiring).
- “Selecting the right test should always be based on a job analysis” (i.e., you should never test for something that is not firmly grounded on business need and job requirements).
There is a considerable amount of important information in the Standards; I encourage every test user to get, read, and follow its guidelines. It can help part the fog of misinformation. Now let’s look at some of those claims made in past forums.
Claim #1: “Test users don’t have to worry about lawsuits.” Reality: The number of cases that get settled by court order is relatively small, and most have to do with wrongful termination. But losing in court is not the real cost. Check out the following table of EEOC charges and monetary benefits.
| Fiscal Year | FY 1998 | FY 1999 | FY 2000 | FY 2001 |
| --- | --- | --- | --- | --- |
| Charges filed (all) | 79,591 | 77,444 | 79,896 | 80,840 |
| Monetary benefits from charges (millions) | $82.6 | $119.1 | $149.0 | $146.8 |
| Discrimination suits filed (all) | 405 | 465 | 329 | 431 |
| Monetary benefits from suits (millions) | $92.2 | $96.9 | $46 | $50.6 |
Source: EEOC statistics tables

Win, lose, or draw, each of these cases consumed someone’s time and money: legal expense, investigation, depositions, and reporting. You have the data, now you decide: 1) Should you worry about lawsuits? If not, 2) Has anyone ever calculated the hidden cost of hiring mistakes over and above the potential for legal challenges?
Claim #2: “Pencil and paper tests are highly accurate predictors of job performance.” Reality: A 1984 summary of the validity of common selection procedures shows the following:
| Selection Method | Predictability %* |
| --- | --- |
| Traditional Interviews | 4% | 
| Personality Tests | 4% | 
| Motivation Tests | 4% | 
| Mental Ability Tests | 25% | 
| Content Valid Simulations | 64% | 
*Adapted from a meta-analysis conducted by Hunter and Hunter, Psychological Bulletin, Vol. 96, 1984. Percentages have been rounded. “Predictability %” refers to the explained variance. The analysis was updated in 1998, but the update does not break out personality and motivation tests separately.

Pencil and paper tests can be useful tools, but hundreds of controlled studies have shown they are among the least accurate hiring tools. You have the data, now you decide: Should you rely on one person’s claims that his or her test scores are highly predictive, or should you believe impartial research?
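Vendors usually quote validity as a correlation coefficient rather than as explained variance, so it can help to convert between the two. The sketch below assumes that “Predictability %” in the table above is the squared validity coefficient (r² × 100), which is the usual meaning of explained variance. The function name is made up for illustration.

```python
import math

# Rounded "Predictability %" figures from the table above.
explained_variance_pct = {
    "Traditional Interviews": 4,
    "Personality Tests": 4,
    "Motivation Tests": 4,
    "Mental Ability Tests": 25,
    "Content Valid Simulations": 64,
}


def validity_coefficient(pct):
    """Convert explained variance (percent) to a correlation coefficient r.

    Assumes explained variance = r squared, so r = sqrt(pct / 100).
    """
    return math.sqrt(pct / 100)


for method, pct in explained_variance_pct.items():
    r = validity_coefficient(pct)
    print(f"{method}: {pct}% explained variance, r = {r:.2f}")
```

Read the other way, a test advertised with a validity of r = 0.20 explains only about 4% of the differences in job performance, which is why a coefficient that sounds respectable can still be a weak predictor.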
Claim #3: “You can use any test, so long as it is validated and has no adverse impact against a protected class.” Reality: The Department of Labor has published a document called the Uniform Guidelines on Employee Selection Procedures. Here is an excerpt from the Guidelines: “All test users [regardless of whether or not adverse impact occurs] are encouraged to use selection procedures which are valid.” You have the data, now you decide: Should you rely on one person’s interpretation of the Guidelines or have you done your own internal validity study?
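For readers who have never checked for adverse impact, the Guidelines describe a “four-fifths” rule of thumb: a selection rate for any group that is less than 80% of the rate for the group with the highest rate is generally regarded as evidence of adverse impact. A minimal sketch of that screen follows. The applicant counts and group labels are hypothetical, and the function names are made up for illustration. Passing this screen says nothing about whether the test is valid, which is the point of the excerpt above.

```python
def selection_rates(outcomes):
    """Map each group to hired / applied. outcomes: {group: (hired, applied)}."""
    return {group: hired / applied for group, (hired, applied) in outcomes.items()}


def impact_ratios(outcomes):
    """Each group's selection rate divided by the highest group's rate."""
    rates = selection_rates(outcomes)
    top_rate = max(rates.values())
    if top_rate == 0:
        # Nobody was hired from any group, so there is no rate to compare against.
        return {group: 1.0 for group in rates}
    return {group: rate / top_rate for group, rate in rates.items()}


def flagged_groups(outcomes, threshold=0.8):
    """Groups whose impact ratio falls below the four-fifths threshold."""
    ratios = impact_ratios(outcomes)
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical applicant flow: (hired, applied) per group. Not real data.
outcomes = {"group_a": (48, 80), "group_b": (12, 40)}

print(selection_rates(outcomes))
print(impact_ratios(outcomes))
print(flagged_groups(outcomes))
```

With these invented numbers, group_b is hired at half the rate of group_a, so it falls below the four-fifths threshold and would be flagged for a closer look.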
Claim #4: “Test users can use the same test for everyone regardless of the job.” Reality: There are three kinds of validity (i.e., ways to correlate test scores with job performance). Here is some advice from the Guidelines:
- “There should be a review of job information to determine measures of work behavior(s) or performance that are relevant to the job or group of jobs in question…to the extent that they represent critical or important job duties, work behaviors or work outcomes as developed from the review of job information.”
- “A criterion-related validity study should consist of empirical data demonstrating that the selection procedure is predictive of or significantly correlated with important elements of job performance…”
- “A content validity study should consist of data showing that the content of the selection procedure is representative of important aspects of performance on the job for which the candidates are to be evaluated…”
- “A construct validity study should consist of data showing that the procedure measures the degree to which candidates have identifiable characteristics which have been determined to be important in successful performance in the job for which the candidates are to be evaluated.”
Sorry about all the “governmentese”-sounding language, but this is important, even if it was written by bureaucrats. You have the data, now you decide: Should you use the same test(s) for every position just because someone tells you it’s okay? Have you conducted your own criterion, content, or construct validity study? Or, if not, what level of hiring risk are you willing to accept?
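At its simplest, the criterion-related study quoted above comes down to correlating test scores with a measure of job performance for the same people and checking that the relationship is unlikely to be chance. The sketch below is only an illustration under stated assumptions: the ten score and rating pairs are invented, the helper names are made up, and a real study needs a job analysis, a defensible performance measure, and a much larger sample. The significance check uses the standard t statistic for a correlation and the two-tailed 5% critical value for 8 degrees of freedom (2.306).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing whether a correlation differs from zero."""
    return r * sqrt((n - 2) / (1 - r ** 2))


# Invented data for ten current employees in ONE job. Not real data.
test_scores = [52, 60, 65, 70, 72, 78, 80, 85, 90, 95]
performance_ratings = [2.5, 3.0, 2.8, 3.2, 3.6, 3.1, 3.9, 3.7, 4.2, 4.0]

n = len(test_scores)
r = pearson(test_scores, performance_ratings)
t = t_statistic(r, n)
CRITICAL_T_DF8 = 2.306  # two-tailed, alpha = 0.05, df = n - 2 = 8

print(f"validity coefficient r = {r:.2f}")
print(f"explained variance = {r ** 2:.0%}")
print(f"significant at the 5% level: {abs(t) > CRITICAL_T_DF8}")
```

The result applies only to the job the data came from. Running the same test on a different position means collecting performance data for that position and repeating the study.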
Claim #5: “The candidate cannot cheat this test.” Reality: This is just too silly to believe. Do you really think applicants are totally honest on a test that cannot be verified? Any self-respecting test developer designing a hiring test will include a truthfulness factor to minimize (not eliminate) faking. But even that is not perfect. You have the data, now you decide: Do you really believe any test is so good that it is impossible to fake? And does your hiring test include a truthfulness scale to at least minimize lying?
Claim #6: “The EEOC has ‘approved’ such-and-such a test.” Reality: The EEOC is a government agency. It is not a certifying body. The EEOC investigates specific discrimination claims to see if they have merit. The only “test” that would be “approved” by the EEOC would be one that was challenged and found non-discriminatory in a specific application for a specific job in a specific company. You have the data, now you decide: Should you write to the EEOC and ask them which tests they have officially approved? Don’t be surprised if you don’t get a specific answer.
Claim #7: “The EEOC ‘likes’ instruments that measure mental constructs.” Reality: Constructs are defined as deep-seated, unobservable, underlying mental traits that a test developer will use to describe job behavior. Mental constructs include things like personality, values, attitudes, satisfaction, emotional intelligence, depression, intelligence, etc. (source: Psychological Testing: an Introduction to Tests and Measurement, 1988). Here is what the Guidelines say about using construct-oriented tests: “There is at present a lack of substantial literature extending the [construct validity] concept to employment practices…users should be aware that the effort to obtain sufficient empirical support for construct validity is both an extensive and arduous effort involving a series of research studies, which include criterion related validity studies and which may include content validity studies.” You have the data, now you decide: Should you write to the EEOC and ask them how they feel about construct tests? Don’t be surprised if you don’t get any specific answers. Of course, you could believe what the vendor says about his or her test.
Claim #8: “This test reduces turnover by XX%, increases productivity by YY% (it also whitens teeth and eliminates morning breath).” Reality: The Guidelines state, “Under no circumstances will the general reputation of a test or other selection procedures, its author, its publisher, or casual reports of its validity be accepted in lieu of evidence of validity…this includes validity based on a procedure’s name or descriptive labels; all forms of promotional literature; data bearing on the frequency of a procedure’s usage; testimonial statements and credentials of sellers, users, or consultants; and other non-empirical or anecdotal accounts of selection practices or selection outcomes.”
Claim #9: “You don’t need professional help to build a valid selection system.” Reality: The Guidelines state, “Professional supervision of selection activities is encouraged but is not a substitute for documented evidence of validity. The enforcement agencies will take into account the fact that a thorough job analysis was conducted and that careful development and use of a selection procedure in accordance with professional standards enhance the probability that the selection procedure is valid for the job.”
Claim #10: “Here is an attorney’s site recommending my test.” Reality: See Claim #8 above.
Claim #11: “This data proves my test is effective.” Reality: See Claim #8 above (also, it is silly to claim that any hiring test is foolproof).
Claim #12: “The test vendor is responsible for the validity of the test.” Reality: See Claim #8 above.
Claim #13: “Employment agencies are exempt.” Reality: The Guidelines state, “The use of an employment agency does not relieve an employer or labor organization or other user of its responsibilities under Federal law to provide equal employment opportunity or its obligations as a user under these Guidelines.”
Conclusion

Congratulations! You survived the onslaught of dull text and technical standards (you might just qualify for law school). This may sound simple: general interview techniques deliver very weak results (often no better than chance), but they are a tool that almost everyone can relate to. If you cannot use a good test and validate it for your job, you would be much further ahead to stay with interviews; at least that technique is easy to understand. You have the data, now you decide. Do you believe the Standards and the Guidelines actually define how to effectively use decent, job-related, validated tests? Or do you prefer to believe the enthusiastic comments of people who don’t know what they don’t know? To me, one seems much more frightening than the other.