How are Measures Evaluated?
Measurement validity exists when a measure (scale) measures what we think it measures and performs the functions that it purports to perform.
There are three major criteria utilized to evaluate how well a scale measures what it is intended to measure:
(1) Construct validity, often considered most important, is established by demonstrating a scale is related to a variety of other measures as specified in a theory.
(2) Criterion validity is established when results obtained on one scale can be reliably compared to those obtained with a more direct or already validated measure of the same criterion (phenomenon).
(3) Content validity establishes that a measure covers the full range of a concept’s meaning.
Using the standards established by the scientific community, the AdSAM® Emotional Response measurement system passes tests for validity on all three of the major criteria. The sections below take a thorough look at how AdSAM® fulfills each criterion:
AdSAM® Construct Validity
First and considered most important is construct validity. Based on the 1957 classic work by Osgood, Suci and Tannenbaum, “Measurement of Meaning”, and later work by Mehrabian, “Framework for a comprehensive description and measurement of emotional states”, AdSAM® is the theoretically related, non-verbal representation of the three PAD dimensions.
Several studies in both the United States and abroad have validated the SAM scales’ effectiveness in measuring emotional response as proposed by the PAD theory of emotions. In validating SAM’s ability to measure the PAD dimensions, subjects were asked to rate verbal stimuli with both the PAD semantic differential scales and the non-verbal SAM scales (Lang, 1980). The correlations between the scales were very strong for Pleasure and Arousal and somewhat lower for Dominance (Pleasure = +.937, Arousal = +.938, Dominance = +.660). Near-perfect agreement has been obtained in ratings of Pleasure and Arousal for a set of pictures using the semantic differential scale and SAM (Bradley & Lang, 1994).
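The comparison described above, in which the same stimuli are rated on both the verbal and the non-verbal scale and the two sets of ratings are then correlated dimension by dimension, can be sketched as follows. This is a minimal illustration, not the procedure used in the cited studies: the function names and the ratings are hypothetical, and the numbers are invented solely to show the calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)


def scale_agreement(verbal, nonverbal):
    """Correlate the two scales separately for each PAD dimension."""
    return {dim: pearson_r(verbal[dim], nonverbal[dim]) for dim in verbal}


# Hypothetical mean ratings of five stimuli (illustrative values only).
verbal_ratings = {
    "pleasure": [7.1, 3.2, 5.5, 8.0, 2.4],
    "arousal": [6.0, 4.1, 3.3, 7.2, 5.0],
    "dominance": [5.5, 4.0, 5.1, 6.2, 3.9],
}
sam_ratings = {
    "pleasure": [7.4, 3.0, 5.1, 8.2, 2.9],
    "arousal": [6.3, 4.4, 3.0, 7.0, 5.4],
    "dominance": [5.0, 4.6, 5.3, 5.8, 4.4],
}

for dimension, r in scale_agreement(verbal_ratings, sam_ratings).items():
    print(f"{dimension}: r = {r:+.3f}")
```

A coefficient near +1 for a dimension indicates that the two scales order the stimuli in nearly the same way on that dimension.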
Morris et al. (1992) assessed the effectiveness of SAM in evaluating emotional responses to advertising messages. This was accomplished by using a set of advertisements that had established PAD scores assessed by the verbal semantic differential scales. Scores obtained in the 1988 Holbrook and Batra study were significantly correlated with SAM ratings (Pleasure r = .36, Arousal r = .65, Dominance r = .37).
AdSAM® Criterion (Predictive) Validity
The second standard used to examine a scale, criterion or predictive validity, has been investigated in repeated scientific studies using SAM. Agreement has been obtained in ratings of Pleasure and Arousal for a set of pictures using the semantic differential scale and SAM (Bradley & Lang, 1994). High correlations have also been obtained in tests of emotional response and attitude to the brand and attitude to the ad.
SAM has been compared to previously validated semantic differential scales to measure emotional responses in a variety of situations, including reactions to pictures (Greenwald, Cook, & Lang, 1989; Lang, Greenwald, Bradley, & Hamm, 1993), images (Miller, Levin, Kozak, Cook, McLean, & Lang, 1987), sounds (Bradley, 1994), advertisements (Morris, Bradley, Waine, & Lang, 1995), painful stimuli (McNeil & Brunetti, 1992) and more. In addition, SAM has been used with children (Greenbaum, Turner, Cook, & Melamed, 1990), anxiety patients (Cook, Melamed, Cuthbert, McNeil, & Lang, 1988), analogue phobics (Hamm, Globisch, Cuthbert, & Vaitl, 1991), psychopaths (Patrick, Bradley, & Lang, 1993), and other clinical populations.
AdSAM® Content Validity
Finally, as a non-verbal measure of emotion, AdSAM® does not suffer from the content validity issues of semantic differential scales. In AdSAM® tests, respondents are not judging samples from a pool of “all items”; instead, because of the visual-self-report nature of the device, subjects are judging the complete width and breadth of the dimension.
Morris and McMullen (1993) demonstrated that multiple emotional responses would be found within a single advertisement. In this study, SAM was shown to be useful not only in measuring the full range of emotions to an entire ad, but also to parts of advertisements.
Morris and Karrah (1994) illustrate the stability of SAM in tapping into the PAD emotions elicited by both advertisements and PSAs across product categories. An important finding in this study was the lack of any systematic difference in the responses of males and females, demonstrating no gender differences in using SAM to measure the full spectrum of emotional response generated by advertisements.
Differences obtained in judgments of Dominance suggest that SAM might be more accurate in tracking the subject’s feelings of control, rather than the control attributed to the stimulus. SAM allows rapid assessment of what appear to be fundamental dimensions in the organization of human emotional experience, as well as an assessment of the shape of the resulting affective space. Reports of pleasure and arousal made with SAM have already shown strong concordance with responses in physiological and behavioral systems. SAM has been shown to be an effective method for measuring existing feeling states, relating them to other indices of emotional response, and assessing changes due to time, therapeutic intervention, or other processes affecting affective reactions to contextual stimuli.
AdSAM® Emotion Adjectives®
Research with subjects aged 18 to 64 has validated the existence of over 165 emotion-denoting adjectives within the three dimensions of PAD emotion space. Subjects are given decks of emotion-denoting adjectives and asked to use SAM to indicate the feeling each one denotes. Unstable adjectives (i.e., those with little consistency in response) are deleted from the database and stable adjectives are kept. The adjectives are continually retested to assess their stability, and they are regularly used as a frame of reference for analyzing results and guiding strategic decisions.
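One way to picture the screening step described above is sketched below. The source does not say how "little consistency in response" is quantified, so this example assumes, purely for illustration, that an adjective is stable when the standard deviation of its SAM ratings stays under a cutoff on every PAD dimension. The function name, the `max_sd` cutoff, and the ratings are all hypothetical.

```python
from statistics import mean, stdev


def stable_adjectives(ratings, max_sd=1.5):
    """Keep adjectives whose ratings are consistent on all three dimensions.

    ratings maps an adjective to a list of (pleasure, arousal, dominance)
    tuples, one per respondent. The standard-deviation criterion and the
    default cutoff are assumptions, not the published AdSAM rule.
    Returns a dict of adjective -> mean (pleasure, arousal, dominance).
    """
    kept = {}
    for adjective, scores in ratings.items():
        if len(scores) < 2:
            continue  # spread cannot be estimated from a single response
        by_dimension = list(zip(*scores))  # three tuples: P, A, D
        if all(stdev(values) <= max_sd for values in by_dimension):
            kept[adjective] = tuple(mean(values) for values in by_dimension)
    return kept


# Hypothetical responses on a 9-point scale (illustrative values only).
responses = {
    "cheerful": [(8, 6, 6), (9, 6, 7), (8, 7, 6)],
    "ambivalent": [(1, 9, 1), (9, 1, 9), (5, 5, 5)],
}

for adjective, position in stable_adjectives(responses).items():
    print(adjective, [round(value, 2) for value in position])
```

In this sketch the retained adjectives carry their mean position in PAD space, which is the form in which they could serve as a frame of reference for interpreting results.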
Cross Cultural Studies
Osgood, May and Miron (1975) collected data from twenty-one communities of diverse languages and cultures and demonstrated that the three PAD affective factors are cross-culturally universal. SAM has been used empirically to assess emotional responses in different European countries (Bradley et al., 1993). Results of these studies demonstrate that SAM is reliable and able to detect cross-cultural differences in emotional responses.
Morris et al. (1996) tested the applicability of AdSAM® in a non-English-speaking culture to assess SAM’s ability to capture affective responses for global products. The data show that Hispanic and general market respondents followed similar patterns in their responses. Statistical analysis demonstrated that there was no significant difference in scores between groups and that SAM provided comparable measurements of responses to television commercials across cultures.
Morris and Pai (1997) investigated the different emotional responses to global standardized TV commercials with SAM using Taiwanese and Americans 18 to 81 years old. The results showed that there were no significant overall differences in emotional response to ads. These results further support AdSAM® as a method for measuring emotional response globally, across cultures.
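A group comparison of the kind reported in these cross-cultural studies can be illustrated with a simple two-sided permutation test on the difference in mean ratings. The cited studies do not state which statistical test they used, so the permutation test is a stand-in chosen because it needs only the standard library. The function name and the scores are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the estimated probability of seeing a mean difference at least
    as large as the observed one if group labels were assigned at random.
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        difference = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if difference >= observed - 1e-12:  # tolerance for float rounding
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical Pleasure ratings for one commercial in two markets
# (illustrative values only).
market_one = [7, 6, 8, 7, 6, 7, 8, 6]
market_two = [7, 7, 6, 8, 6, 7, 7, 6]

p_value = permutation_p_value(market_one, market_two)
print(f"p = {p_value:.3f}")
```

A large p-value means the data give no evidence of a difference between the groups. It does not by itself prove that the groups respond identically, which is worth keeping in mind when reading a "no significant difference" result.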
The aforementioned research demonstrates that AdSAM® is a viable alternative to well-established verbal measures of emotional response. These studies strongly establish the construct validity (the highest level of validity scientists attempt to achieve in a scale) of the Self-Assessment Manikin. Not only does AdSAM® measure the core human emotions, it also provides a scale that is much easier to administer and less cumbersome than verbal-based measures, allowing many more marketing stimuli to be examined in a single research setting. In conjunction with the Attitude Self Assessment Manikin, the AdSAM® diagnostic system provides the necessary tools to gather and interpret the full range of emotional response data for any marketing situation or stimulus.
The visual nature of the AdSAM® scales and the universality of human emotions enable the AdSAM® system to be used to gather and interpret emotional responses to global advertisements across cultures with little or no additional research or adjustment to the system.
Strictly speaking, one validates the use of the instrument, rather than the instrument itself. Because of this, the developers of AdSAM® work with each client on each questionnaire to ensure questions are phrased properly and constructed to uncover the maximum amount of emotional response information.