
For 75 years psychologists have developed precise measures of human intelligence
and somewhat less precise but nonetheless useful instruments for describing
human personality factors. Unfortunately, we have been less successful in
assessing human aptitudes for operating nuclear reactors, controlling air
and surface traffic, directing civil disaster responses, and providing emergency
medical services, to name but a few of the many complex operations humans
perform daily. Despite the investment of huge sums by the military in the
development and validation of selection batteries, their tests account for
no more than about 25 percent of the variance in training success and have
no evident correlation with operational performance.
The need for valid tests of complex operational aptitude is increasing as
the explosion in information technology and associated automation makes
more complex operations possible and the cost of placing the wrong person
in charge greater than ever. Increasing the information available gives
the operator more to attend to, and automation makes it all the more important
and difficult to keep track of everything that is going on and decide when
some intervention is critical. This ability is now called situational awareness,
and in the case of crew operations, team resource management.
The costs of haphazard personnel selection are not limited to those resulting
from bad judgment and mismanagement of critical operations. It is also costly
to invest in the training of individuals who fail to reach criterion performance
levels after training or, worse yet, pass all training tests but then are
unable to stand up under operational stress. As so often happens with air
traffic control (ATC) trainees, the individual may have all of the skills and
knowledge normally required but be unable to put them together in the confusion
of a complex incident.
The failure to develop tests of high predictive validity for complex
operational aptitude has several causes, the first of which is the usual
vagueness of the operational performance criteria against which any such
test must be validated. If measures of complex job performance are unreliable,
as they typically are, a test's predictive validity cannot be demonstrated
statistically, however high it may be. The pass-fail criterion would be of
value if approximately equal numbers of trainees passed and failed, but
when the ratio is four or five to one, as in ATC training programs, for
example, it is almost worthless. Rating scales are no better when almost
all trainees are given the same grade.
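The statistical side of the criterion problem can be illustrated with the
classical attenuation relation from test theory, in which the observed validity
coefficient equals the true validity multiplied by the square root of the
product of the test and criterion reliabilities. The following is a minimal
sketch only; the function name and all of the figures are illustrative
assumptions, not values from any actual selection program.

```python
import math


def observed_validity(true_validity, test_reliability, criterion_reliability):
    """Classical attenuation relation: r_obs = r_true * sqrt(r_xx * r_yy)."""
    for value in (test_reliability, criterion_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return true_validity * math.sqrt(test_reliability * criterion_reliability)


# Hypothetical figures: a test with a true validity of 0.80 and a
# reliability of 0.90, validated against criteria of differing reliability.
for criterion_reliability in (0.9, 0.4, 0.1):
    r_obs = observed_validity(0.80, 0.90, criterion_reliability)
    print(f"criterion reliability {criterion_reliability:.1f}: "
          f"observed r = {r_obs:.2f}, variance explained = {r_obs ** 2:.0%}")
```

Under these assumed figures, a test whose true validity corresponds to 64
percent of the variance in performance would appear to account for only about
23 percent when the criterion reliability is 0.4.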
Aside from the criterion problem, development of effective aptitude tests
has been crippled by the notion that performance of complex operations depends
on a collection of individually simple abilities. Consistent with this idea,
batteries have been developed to test reaction time, manual dexterity, short-
and long-term memory, spatial orientation, and the like. The fact that such
batteries account for only about 25 percent of the variance in training
success is also caused in part by the correlations among the so-called factors
measured by the individual tests. Any one or two of the tests provide almost
as much predictive power as the entire battery, so administering the rest
of the battery is a waste.
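The diminishing return from adding correlated tests can be sketched with the
standard expression for the squared multiple correlation when every test has
the same validity r and every pair of tests has the same intercorrelation rho:
R^2 = k r^2 / (1 + (k - 1) rho). The equal-validity and equal-intercorrelation
simplification, the function name, and the figures used (r = 0.40, rho = 0.60)
are hypothetical and chosen only for illustration.

```python
def battery_r_squared(n_tests, validity, intercorrelation):
    """Squared multiple correlation for n equally valid, equally
    intercorrelated tests: R^2 = k * r^2 / (1 + (k - 1) * rho)."""
    if n_tests < 1:
        raise ValueError("the battery needs at least one test")
    return n_tests * validity ** 2 / (1 + (n_tests - 1) * intercorrelation)


# Hypothetical battery: each test correlates 0.40 with training success
# and 0.60 with every other test in the battery.
for n_tests in (1, 2, 5, 10):
    r2 = battery_r_squared(n_tests, validity=0.40, intercorrelation=0.60)
    print(f"{n_tests:2d} tests: variance accounted for = {r2:.1%}")
```

With these assumed values, one test accounts for 16 percent of the variance,
two tests for 20 percent, and ten tests for 25 percent; no battery of any
length could exceed r^2 / rho, or about 27 percent.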

The secret of operational aptitude testing is to recognize the complexity
of what we are trying to predict and construct a measuring instrument of
similar complexity. The fact that expanding a test battery adds little
predictive validity does not mean that a selection test should be short
to be cost effective. It is wishful thinking to expect situational awareness and
stress tolerance to be revealed reliably in a short test. Even if most
candidates needed a day, or part of two days, to approach a terminal performance
level on an aptitude test, its application would still be cost effective
provided that only candidates of high aptitude were selected and the potential
failures were rejected before large sums had been invested in their training.
While situational complexity is necessary to test situational awareness,
it is not sufficient. To avoid confounding basic aptitude with the effect
of prior training in specific tasks, the elements that make up the test
must be unlike any real-world activities such as operating computers or
controlling specific vehicles. Furthermore, the individual subtasks must
be sufficiently simple to allow their mastery in a short practice period
before combining them in the test situation. Sufficient situational complexity
can be achieved by the manner in which the individually simple subtasks
are combined in an adaptive scenario involving multiple sources of information
and multiple response alternatives.
A complex system operator must search for, evaluate, and integrate information
about all relevant events, conditions, and resources, quickly assess changes
in situational priorities, and allocate attention accordingly. To determine
an individual's aptitude for meeting these demands requires a complex test
in which high scores depend on: