All questions appearing for the first time in an examination are pretested,
which means they are not counted in candidate scores. Pretesting is a standard
practice in the testing industry that allows testing of new, unproven questions
without risk to the candidate. Pretest questions, which are not identifiable to
the candidate, are distributed across several forms (versions of the
examination), each of which contains pretest questions. After assessing how each
pretest question performs, the Board determines which of these to include in the
live (scored) portion of the next examination.
The development process for the pretest portion of an examination begins with
a test committee that writes new questions. Each committee is composed to
reflect the breadth and variety of its specialty area. Academicians and
practitioners are included to ensure adequate content coverage and to include
the perspectives of both training and practice environments.
The content of each examination adheres to the examination blueprint, a
pre-established table of content specifications developed and reviewed annually.
It is based on analyses of practice in the specialty area, modified to reflect
the breadth and relative importance of the clinical problems encountered. After
reviewing the blueprint distribution of questions available in the pool for each
content area, ABIM test development staff determines how many new questions are
needed for each content area and makes assignments according to the committee
members' expertise and interests.
Each examination is composed of single-best-answer multiple-choice questions,
which are the most widely applicable question type used in the testing industry
and are particularly suited for simulating clinical decision making. Although
the single-best-answer format can effectively address a specific knowledge point
without the use of a clinical stem (patient-based case scenario), the
overwhelming majority of ABIM examination questions use patient-based formats
assessing the higher-order cognitive abilities required for clinical decision
making.
Question Content Criteria
In choosing content areas for new questions, question authors are guided by
topic lists of available pool questions in order to target any underrepresented
areas. Authors survey the general content domain for their assignment and
highlight the most important areas for new questions, particularly those in
which practice has changed recently, and compile a list of specific testing
points, each of which will be addressed with a new question.
In choosing specific testing points for a proctored, secure examination, the
test committee is mindful of assessing only what the certified internist is
expected to know without access to medical resources or references, as opposed
to knowledge that is appropriate or even mandatory to "look up." The
level of difficulty for each testing point is targeted to the measurement goal
of the examination, which is to discriminate between candidates who possess the
cognitive expertise required for Certification and candidates who do not
possess this expertise.
ABIM question authors follow a stepwise procedure when writing new questions.
For each testing point, the author selects a cognitive task (such as diagnosis
or treatment) and a cognitive ability (such as clinical judgment) to be tested.
Then the author writes the lead line (the actual question to be posed), the
correct answer, and the distractors (incorrect response options). The correct
answer must be clearly correct and uncontroversial, evidence-based, and a better
choice than any of the distractors. The distractors are designed to reflect
plausible options likely to be selected by less able candidates. Next, the
author constructs a clinical stem designed to set up the specific testing point
addressed by the lead line and response options. Finally, the author writes a
question rationale, which relates the testing point to the specific information
in the question, and cites any applicable references from the medical
literature.
Question Review/Editing Process
Newly written questions are reviewed by the entire test committee at a
meeting in which questions are read aloud, one by one. The committee decides by
consensus opinion to accept them for further consideration, revise them at the
meeting, assign them to individual members for more extensive revision after the
meeting or reject them.
After the first review meeting, accepted questions are edited by the test
development editorial staff, who standardize question style, format and
terminology, correct grammar, and identify problems with ambiguity and technical
flaws such as cues to the answer. Editing occurs after each meeting; editorial
changes are checked for medical accuracy, and editor's queries are resolved by
the committee at a subsequent meeting. All pretest questions are reviewed a
minimum of two times by the exam-writing committee.
Illustrations are prepared by the test development production staff from
pre-processed glossy prints, transparencies or, in some cases, original clinical
material (such as electrocardiographic tracings) supplied by the question
authors. The committee reviews all illustration proofs before they are produced
for the examination.
At the second review meeting, the test committee reviews the edited new
questions and revisions and selects the final set of pretest questions. The
selected questions are then proofread and prepared for examination production
along with the selected live questions (see below).
After the examination is administered, the pretest questions are sent to a
large group of diplomate-volunteers, who review the questions and rate them for
clinical relevance on a scale of one (not relevant to practice) to five (very
relevant to practice). Clinical relevance is self-defined by the reviewers, all
of whom are full-time practitioners (more than 70 percent of time spent in direct patient
care based on self-reported survey data). On average, each reviewer sees about
15 questions; correct answers are not provided. Each question is rated by at
least 12 reviewers practicing in various areas of the country.
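The rating collection described above can be illustrated with a minimal sketch. It assumes the ratings are summarized as a simple mean per question, which the text does not state; the function name `summarize_relevance` and the input layout are hypothetical and used only for illustration.

```python
from statistics import mean

# From the text: each question is rated by at least 12 reviewers.
MIN_REVIEWERS = 12


def summarize_relevance(ratings_by_question):
    """Summarize relevance ratings (hypothetical helper).

    ratings_by_question maps a question id to a list of integer ratings
    on the one-to-five scale. Returns a dict of mean ratings for questions
    with enough reviewers, plus a list of question ids that fall short.
    Using the mean is an assumption, not the Board's documented method.
    """
    summary = {}
    under_reviewed = []
    for question_id, ratings in ratings_by_question.items():
        # Reject anything outside the integer scale 1..5.
        if any(rating not in range(1, 6) for rating in ratings):
            raise ValueError(f"rating out of range for {question_id}")
        if len(ratings) < MIN_REVIEWERS:
            under_reviewed.append(question_id)
            continue
        summary[question_id] = mean(ratings)
    return summary, under_reviewed


if __name__ == "__main__":
    example = {"Q1": [4] * 6 + [5] * 6, "Q2": [3] * 5}
    print(summarize_relevance(example))
```

In this sketch a question with fewer than 12 ratings is reported separately rather than averaged, so that a thinly reviewed question is not mistaken for a fully reviewed one.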
Answer Key Validation
Concurrent with the relevance review, staff psychometricians complete the
performance analyses of the pretest and live questions. Prior to final scoring,
a key validation process is conducted to determine if any answers have been
mis-keyed or need to be modified. This is accomplished by a Board review of
questions that were overly difficult or nondiscriminating, or that performed
differently than in their previous examination use; questions that received
critical comments from candidates; and questions addressing topics for which new
information has emerged that may affect their correct answers. If the Board
determines that any live questions are unfair, multiple answers (up to and
including all possible answers) are scored as correct during the final scoring
process, and these questions are removed from the live question pool.
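As a rough illustration of how the scoring rules described in this document fit together, the sketch below counts only live questions and lets a question carry more than one accepted answer. The function name `score_candidate`, the data layout, and the choice to give no credit for an unanswered question are assumptions made for the example, not the Board's actual scoring procedure.

```python
def score_candidate(responses, accepted_answers, pretest_ids):
    """Count correct answers on live questions (hypothetical helper).

    responses:        dict mapping question id to the option the candidate chose
    accepted_answers: dict mapping question id to the set of options scored
                      as correct (several options, or all of them, when a
                      live question has been judged unfair)
    pretest_ids:      set of question ids that are pretested and unscored
    """
    score = 0
    for question_id, accepted in accepted_answers.items():
        if question_id in pretest_ids:
            continue  # pretest questions never count toward the score
        # Assumption: an unanswered question earns no credit.
        if responses.get(question_id) in accepted:
            score += 1
    return score


if __name__ == "__main__":
    accepted = {
        "Q1": {"A"},
        "Q2": {"A", "B", "C", "D", "E"},  # judged unfair: all options accepted
        "Q3": {"C"},
        "P1": {"B"},                      # pretest question
    }
    answers = {"Q1": "A", "Q2": "D", "Q3": "B", "P1": "B"}
    print(score_candidate(answers, accepted, pretest_ids={"P1"}))
```

Representing the key as a set per question keeps the ordinary single-answer case and the multiple-answer case on the same code path.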
Certification Examination Selection Process
The test committee regularly reviews the live question pool to ensure that it
remains current. After the pool has been supplemented with the pretest questions
that performed well from the previous examination, the committee selects the
questions for the next live portion of the Certification examination, following
the content blueprint and considering the performance statistics and relevance
ratings.
An absolute (content-based) standard is set during the live examination
selection meeting using a modification of the Angoff method, which is a
multi-step process. First, the standard setters define the characteristics of
minimally certifiable or "borderline" candidates. As the committee reviews
each question on the examination, each committee member identifies his or her
estimate of the percentage of borderline candidates who will answer the
question correctly. The initial estimates from these content experts are
recorded, and those whose estimates differ significantly from the group's
average are asked to offer reasons underlying their decision. Discussion follows
with all members free to change their estimates; final judgments are recorded,
and the average percentage for each question is calculated. The average
percentages for all questions are translated to proportions, and the sum of
these proportions (rounded up to the next whole number) becomes the number of
questions candidates must answer correctly to pass the examination.
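The arithmetic of this standard-setting procedure can be sketched as follows. The sketch assumes whole-number percentage estimates and reads "rounded up to the next whole number" as a ceiling; the function names and the 15-point discussion threshold are hypothetical, since the text does not say what counts as differing "significantly."

```python
import math
from fractions import Fraction


def flag_outliers(estimates, threshold=15):
    """Return the indices of judges whose estimate differs from the group
    average by more than `threshold` percentage points.

    The threshold is an illustrative assumption; the source gives no value.
    """
    average = sum(estimates) / len(estimates)
    return [i for i, estimate in enumerate(estimates)
            if abs(estimate - average) > threshold]


def angoff_passing_score(final_estimates):
    """Compute the passing score from the final judgments.

    final_estimates holds one list per question; each list contains the
    judges' whole-number estimates (0 to 100) of the percentage of
    borderline candidates who will answer that question correctly.
    """
    total = Fraction(0)
    for estimates in final_estimates:
        # Average percentage for the question, expressed as a proportion.
        # Fraction keeps the sum exact, so the ceiling is not thrown off
        # by floating-point rounding.
        total += Fraction(sum(estimates), 100 * len(estimates))
    return math.ceil(total)


if __name__ == "__main__":
    judgments = [[60, 70, 80], [50, 55, 60], [80, 90, 100]]
    # Proportions are 0.70, 0.55 and 0.90; their sum is 2.15.
    print(angoff_passing_score(judgments))
    print(flag_outliers([60, 70, 95]))
```

For the three example questions the proportions sum to 2.15, so a candidate would need three correct answers under this reading of the rounding rule.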
Unlike Certification examinations, Maintenance of Certification secure
examinations are composed entirely of previously tested live questions.
Maintenance of Certification examinations are selected separately from
Certification examinations. The same general content criteria and focus on
patient-based problems and clinical relevance apply to the Maintenance of
Certification examination as to the Certification examination. However, the
Board recognizes that there are training and practice differences that must be
reflected in the selection of Maintenance of Certification examination
questions; in general, questions are excluded from the Maintenance of
Certification examination if they fall into categories such as the following:
- Basic science/pathophysiology topics not directly related to a clinical
problem;
- Clinical problems normally referred in practice but widely incorporated
in the training experience;
- Recent clinical advances incorporated in training before they are widely
used in practice.
The Board continuously monitors candidate performance in both the
Certification and Maintenance of Certification examinations. Initial
Certification and Maintenance of Certification candidates perform the same on
the vast majority of questions appearing in both examinations. However, their
performance differs on a small proportion of shared questions: Certification
candidates perform better than Maintenance of Certification candidates on some
questions, while Maintenance of Certification candidates perform better than
Certification candidates on other questions. The Board continues to analyze
these differences to further refine the Maintenance of Certification