What is the NIH Toolbox?

The NIH Toolbox is a comprehensive set of assessment tools that clinicians and researchers can use to assess emotional, cognitive, motor and sensory function in a variety of settings. The full set of neurobehavioral measures can be administered quickly (two hours or less) and conveniently on an iPad or other mobile device. Most tests take five minutes or less, and self- and proxy-reported measures can be completed in about one minute.

In 2004, the NIH Blueprint for Neuroscience Research was formed with the mission of developing new resources, tools and training opportunities to accelerate research in neuroscience. The coalition brought together the 15 NIH offices and institutes that support neuroscience research. In 2006, the NIH Blueprint awarded a contract to develop an innovative standard for assessing emotional and cognitive health. The NIH Toolbox was introduced in 2012 and adapted for the iPad in 2015. It gives researchers royalty-free measures and tools for collecting data that can be compared with existing or future studies across diverse study designs and settings.

The NIH Toolbox provides common standards for neurological research that enhance efficiency and enable economies of scale. It can measure key constructs across developmental stages and monitor behavioral and neurological function over time. Because the same concepts can be measured from early childhood to old age, the tools provide continuity of assessment and facilitate research on changes across an individual’s lifespan.

Who can use the NIH Toolbox?

The tools provide data to clinicians, researchers and patients (both children and adults). The NIH Toolbox is designed to assist scientists who are interested in neurological and behavioral research. It is most relevant for measuring outcomes in large-scale longitudinal and epidemiological studies, as well as in clinical, intervention and prevention trials. Clinicians and students also stand to benefit, since it enables cross-study comparison across a broad spectrum of health research. The tools can be administered to subjects from 3 to 85 years of age.

How are the tools administered?

The measures in the NIH Toolbox make use of several advanced approaches to item development, test construction and scoring. Two examples are Item Response Theory (IRT) and Computer Adaptive Testing (CAT).

IRT allows tests to be brief yet still precise and valid. Items are calibrated as a set along a continuum that covers the full range of the construct being measured. It is this calibration that enables CAT, a specialized form of computer-based testing in which each participant receives a short, individually tailored test whose score is comparable to that of a longer, fixed-length assessment. CAT therefore supports immediate feedback and frequent, reliable assessment at the individual level with minimal burden on the subject.
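As a rough illustration of how item calibration makes adaptive testing possible, the sketch below assumes a two-parameter logistic (2PL) IRT model, a standard-normal prior on ability, and maximum-information item selection. These modelling choices, the function names and the item parameters are assumptions made for illustration only; the text above does not say which IRT models or selection rules the NIH Toolbox itself uses.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct answer
    for ability theta, discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information that a 2PL item provides at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def estimate_theta(responses, grid):
    """Grid-search posterior mode, using a standard-normal prior (assumed)."""
    def log_posterior(theta):
        lp = -0.5 * theta * theta  # log N(0, 1) prior, up to a constant
        for (a, b), correct in responses:
            p = p_correct(theta, a, b)
            lp += math.log(p if correct else 1.0 - p)
        return lp
    return max(grid, key=log_posterior)

def run_cat(item_bank, answer_fn, max_items=5):
    """Administer up to max_items, each time choosing the unused item
    that is most informative at the current ability estimate."""
    grid = [x / 10.0 for x in range(-40, 41)]  # theta from -4.0 to 4.0
    theta = 0.0                                # start at the prior mean
    remaining = list(item_bank)
    responses = []
    while remaining and len(responses) < max_items:
        item = max(remaining, key=lambda ab: item_information(theta, *ab))
        remaining.remove(item)
        responses.append((item, answer_fn(item)))
        theta = estimate_theta(responses, grid)
    return theta, responses

# Hypothetical calibrated bank: (discrimination, difficulty) pairs.
bank = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]

# Simulated participant who answers every item correctly.
theta_hat, administered = run_cat(bank, lambda item: True, max_items=3)
```

Because each item is chosen to be most informative near the current estimate, a few well-targeted items can stand in for a longer fixed-length form, which is the property the paragraph above attributes to CAT.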

Domains Covered by NIH Toolbox

The NIH project team defined four domain batteries for the NIH Toolbox: cognition, sensation, motor, and emotion. Some domains have supplementary measures that can also be administered. Preference was given to instruments that had already been normed and validated for use in individuals between the ages of 3 and 85.


Cognition

Cognition refers to the mental processes involved in comprehension and in gaining knowledge. These processes include remembering, thinking, judging, knowing and problem-solving. Higher-level processes include perception, imagination, planning, language and the execution of complex behaviors. Cognition is essential to any neurological research or study of health and well-being, so it is included in experimental and large-scale epidemiologic studies even when it is not the primary target of the study.

Tests in this battery are recommended for subjects aged 7 and older and assess attention, executive function, processing speed, episodic memory, language and working memory. In addition to the individual measure scores, administering the battery yields the following summary scores:

  • Crystallized Cognition Composite Score (comprises the Picture Vocabulary and Oral Reading Recognition measures).
  • Fluid Cognition Composite Score (includes Flanker, DCCS, List Sorting, Picture Sequence Memory and Pattern Comparison measures).
  • Cognitive Function Composite Score.

For children between the ages of 3 and 6, a separate NIH Toolbox cognition battery is available. It includes the Picture Vocabulary, Flanker, DCCS and Picture Sequence Memory measures. Administering this battery yields an Early Childhood Composite Score in addition to the individual measure scores.

Working Memory

Working memory refers to the ability to hold information in a short-term buffer and to process and manipulate it across a series of tasks and modalities. It is limited by capacity: an individual can hold information only until their capacity to store it is exceeded. The NIH Toolbox List Sorting Working Memory Test can be administered to subjects from age 7 to 85.


Attention

Attention refers to an individual’s ability to allocate their capacities to handle environmental stimulation. There are different types of attention, including selective, sustained and divided attention. Attention is the foundation of all other mental processes.

The NIH Toolbox Flanker Inhibitory Control and Attention Test can be administered to subjects of ages 3-85. Sustained attention is associated with alertness or the level of wakefulness, while selective attention directs thought and sensory processing toward a particular stimulus, which may evoke a certain action. Divided attention is the ability to manage more than one modality or stimulus simultaneously and may overlap with executive function.

Episodic Memory

Episodic memory refers to the cognitive processes involved in the acquisition, storage and retrieval of new information. It is associated with the recollection of information learned in a given context. Episodic memory can be verbal, such as remembering a conversation or the items on a list, or nonverbal, such as picturing an image seen or a place visited a week ago. The NIH Toolbox Picture Sequence Memory Test can be administered to subjects from ages 3 to 85.

Executive Function

Executive function refers to the ability to plan, organize and monitor the execution of behaviors directed at achieving a desired goal. Two tests can be administered under executive function, both applicable to subjects from age 3 to 85: the NIH Toolbox Dimensional Change Card Sort (DCCS) Test and the NIH Toolbox Flanker Inhibitory Control and Attention Test. The executive function measures assess the ability to switch or shift among multiple aspects of a given task.


Language

Language refers to the ability to translate thought into words or gestures that can be shared in communication. The NIH Toolbox tests two aspects of language: receptive vocabulary and reading. Two tests can be administered: the NIH Toolbox Picture Vocabulary Test, for subjects from ages 3 to 85, and the NIH Toolbox Oral Reading Recognition Test, for subjects from ages 7 to 85. Oral reading offers a convenient way to assess verbal intelligence, which is relatively undisturbed by conditions that alter normal brain activity.

Processing Speed

Processing speed reflects mental efficiency. It refers to the amount of information a person can process in a given time, or the time a person takes to process a given amount of information. It is crucial to many cognitive functions and domains and is sensitive to change and disease. The NIH Toolbox Pattern Comparison Processing Speed Test can be administered to subjects from ages 7 to 85.
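The age ranges quoted above for the individual cognition measures can be collected into a simple lookup. The sketch below is only a convenience illustration: the dictionary and function names are hypothetical, and the ranges are copied from this article rather than from an official NIH Toolbox specification.

```python
# Age ranges (inclusive, in years) as stated in this article.
COGNITION_AGE_RANGES = {
    "Picture Vocabulary": (3, 85),
    "Flanker Inhibitory Control and Attention": (3, 85),
    "Dimensional Change Card Sort": (3, 85),
    "Picture Sequence Memory": (3, 85),
    "List Sorting Working Memory": (7, 85),
    "Oral Reading Recognition": (7, 85),
    "Pattern Comparison Processing Speed": (7, 85),
}

def eligible_measures(age):
    """Return the cognition measures whose stated age range includes `age`,
    sorted alphabetically."""
    return sorted(
        name
        for name, (low, high) in COGNITION_AGE_RANGES.items()
        if low <= age <= high
    )

# A 5-year-old is eligible for the four early-childhood measures only.
early_childhood = eligible_measures(5)
```

For a 5-year-old the lookup returns the same four measures listed for the early childhood cognition battery, while any age from 7 to 85 returns all seven.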


Sensation

Sensation refers to the neurologic and biochemical process of detecting incoming stimuli as nervous system activity. Tests in this domain help determine whether an individual has intact sensory functioning. Sensation overlaps with other functions such as cognitive and motor functioning, which makes it an important measurement in any epidemiologic or longitudinal study, even when sensation is not the primary focus of the study.

Sensory functioning changes substantially across an individual’s lifespan, which makes it important to characterize age-related sensory improvement or decline. Assessments in the sensation domain cover Audition, Visual Acuity, Vestibular Balance, Olfaction, Taste, and Pain.

NIH Toolbox Early Childhood Sensation Battery

  • This battery measures Olfaction, Vestibular Balance, and Visual Acuity.
  • It is recommended for ages 3-6.

NIH Toolbox Sensation and Pain Battery

  • This battery measures Visual Acuity, Audition, Vestibular Balance, Taste, Olfaction, and Pain.
  • It is recommended for subjects aged 7 and older; Taste is for ages 12 and older and Pain for ages 18 and older.


Motor

Motor function comprises complex physiological processes that involve several systems, including the musculoskeletal, neuromuscular, sensory-perceptual, neural motor and cardiopulmonary systems. Determining motor functional status is important because it is directly connected to day-to-day functioning and quality of life. It can serve as an indicator of current physical health status, burden of disease and long-term health outcomes. Because of its significance for overall neurological functioning and health, motor function is a key domain in the NIH Toolbox. It includes measures of the following:


Balance

The Standing Balance Test is used to assess an individual’s balance.


Strength

The Grip Strength Test measures upper-extremity muscle strength using a hand dynamometer.


Dexterity

Dexterity is a person’s ability to coordinate the fingers and manipulate objects in a timely manner. The 9-Hole Pegboard Dexterity Test is used to assess it.


Locomotion

The 4-Meter Walk Gait Speed Test is used to assess an individual’s locomotion ability.


Endurance

In the context of overall fitness, endurance refers to an individual’s ability to sustain an effort that draws on the combined work capacities of biomechanical, neuromuscular and cardiopulmonary functions. The 2-Minute Walk Endurance Test is administered to measure it.


Emotion

This domain measures strong feelings such as sorrow, joy, and fear. Emotion refers to an affective state of consciousness in which such a feeling is experienced, as distinguished from cognitive and volitional states of consciousness.

Measures in the emotion domain are administered as computer adaptive tests or fixed-length forms, depending on the subject’s age bracket. They include self-report versions and, for some ages, parent-report versions.

Emotions can be positive, negative or distressing. Positive emotions are an indicator of well-being and positive social relationships. They can serve as a buffer against stress and enhance a person’s health.

NIH Toolbox Parent Proxy Emotion Battery

It includes measures of Positive Peer Interaction, Positive Affect, Social Withdrawal, General Life Satisfaction, Empathic Behaviors, Peer Rejection, Fear, Self-Efficacy, Anger, Perceived Stress, and Sadness.

It is recommended for parents of children between 3 and 12 years old.

For parents of children ages 3-7, surveys for Fear (Separation Anxiety and Over Anxious) are included.

NIH Toolbox Emotion Battery

This battery can be administered to subjects aged 8 and older.

It consists of Emotional Support, Positive Affect, Loneliness, General Life Satisfaction, Perceived Rejection, Friendship, Self-Efficacy, Anger, Fear, Perceived Hostility and Perceived Stress.

For subjects 18 years and above, the battery includes surveys of purpose and meaning.

Selection of the NIH Toolbox instruments

More than 1,400 instruments were evaluated by the NIH project team for possible inclusion in the batteries. To be included in the NIH Toolbox, an instrument had to meet the following specifications: applicability across the lifespan, royalty-free availability, psychometric soundness, usability by a wide range of people, brevity and ease of use, and applicability in a variety of settings and populations (including children, Spanish speakers and people with disabilities). Most of the proposed instruments did not meet these specifications and had to be expanded or modified to do so. In some cases, novel instruments were created.

Validation and Norming

Validation studies were then conducted to ensure that the tools met rigorous scientific standards. These studies were carried out on 400-450 subjects across the entire age range, and the results were compared against “gold standard” measures wherever applicable. Calibration samples for tests scored with Item Response Theory involved robust models based on several thousand participants. In total, the field-test validation and calibration activities collected data from more than 16,000 subjects.

The NIH project team carried out a large national standardization study in both English and Spanish to enable normative comparisons on each assessment. A sample of 4,859 subjects aged 3-85 was administered all the NIH Toolbox measures. The participants were drawn from sites across the country and represented different races/ethnicities, genders and socioeconomic statuses. Normative scores for ages 3-85 are now available, enabling accurate comparison of any targeted subject group against the U.S. population.
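To make the idea of a normative comparison concrete, the sketch below places a raw score on a standardized scale relative to a reference sample. It assumes a scale with mean 100 and standard deviation 15, a common convention for standard scores; the function name and the sample values are invented for illustration, and the actual NIH Toolbox scoring procedure is not described in this article.

```python
from statistics import mean, stdev

def standard_score(raw, norm_sample, scale_mean=100.0, scale_sd=15.0):
    """Express a raw score relative to a normative sample.

    The mean-100 / SD-15 scale is an assumption for illustration,
    not a documented NIH Toolbox scoring rule.
    """
    z = (raw - mean(norm_sample)) / stdev(norm_sample)
    return scale_mean + scale_sd * z

# Hypothetical raw scores from a normative sample (made-up numbers).
norm_sample = [40, 45, 50, 55, 60]

average_participant = standard_score(50, norm_sample)  # 100.0
above_average = standard_score(60, norm_sample)        # greater than 100
```

A score equal to the normative mean maps to 100, and scores above or below it move away from 100 in proportion to the spread of the normative sample.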

The NIH project team engaged an expert team on early childhood when developing the NIH Toolbox instruments used to assess young children (ages 3-6), ensuring that the tests were developmentally appropriate for this age group. The expert team gave input on measuring development, provided guidelines for testing young children and reviewed the NIH Toolbox measures in light of the needs of children in this age group.

Advantages of the NIH Toolbox

  • Royalty-free.
  • Developed for measuring outcomes in longitudinal studies.
  • The entire battery can be administered in a short period (two hours or less).
  • Created to be adaptable to advances in technology and changes in measurement.
  • It employs CAT, which tailors each test to the participant’s ability.
  • It is a lower-cost option than alternative research instruments.
  • It employs state-of-the-art psychometric approaches.
  • Developed to enable cross-measure comparisons.
  • Available in English and Spanish.


For a long time, researchers had difficulty comparing data within a longitudinal study or across different studies. This was mainly due to the lack of a common standard for data collection: researchers used many different assessments and measures. They also had difficulty following an individual over a long period, because different tests were applied to different age groups. The NIH Toolbox was designed to address these problems.

Before the NIH Toolbox was developed, research on neural function was very expensive in terms of time and subject burden. Collecting data on the four domains of sensation, emotion, cognition and motor function was not as easy as it is now with the NIH Toolbox. In addition, studies that collected such information did not apply uniform measures of the underlying constructs. The NIH Toolbox has brought a common language to research, enabling the sharing of large data sets and the assessment of function on a common metric. This supports scientific discovery, as the domains provide streamlined measures with minimal cost and subject burden.

