Scale Reliability and Validity
The previous chapter examined some of the difficulties with measuring constructs in
social science research. For instance, how do we know whether we are measuring
“compassion” and not “empathy”, since the two constructs are similar in meaning?
Or is compassion the same thing as empathy? What makes it more complex is that sometimes
these constructs are imaginary concepts (i.e., they don’t exist in reality), and multi-dimensional
(in which case, we have the added problem of identifying their constituent dimensions). Hence,
it is not adequate just to measure social science constructs using any scale that we prefer. We
also must test these scales to ensure that: (1) these scales indeed measure the unobservable
construct that we wanted to measure (i.e., the scales are “valid”), and (2) they measure the
intended construct consistently and precisely (i.e., the scales are “reliable”). Reliability and
validity, jointly called the “psychometric properties” of measurement scales, are the yardsticks
against which the adequacy and accuracy of our measurement procedures are evaluated in
scientific research.
A measure can be reliable but not valid, if it is measuring something very consistently
but is consistently measuring the wrong construct. Likewise, a measure can be valid but not
reliable if it is measuring the right construct, but not doing so in a consistent manner. Using the
analogy of a shooting target, as shown in Figure 7.1, a multiple-item measure of a construct that
is both reliable and valid consists of shots clustered within a narrow range near the center of
the target. A measure that is valid but not reliable consists of shots that are centered on the
target on average but scattered widely around it, rather than clustered within a narrow range.
Finally, a measure that is reliable but not valid consists of shots clustered within a narrow
range but away from the center of the target. Hence, reliability and validity are both needed to
ensure adequate measurement of the constructs of interest.
Figure 7.1. Comparison of reliability and validity
Reliability
Reliability is the degree to which the measure of a construct is consistent or
dependable. In other words, if we use this scale to measure the same construct multiple times,
do we get pretty much the same result every time, assuming the underlying phenomenon is not
changing? An example of an unreliable measurement is people guessing your weight. Quite
likely, people will guess differently, the different measures will be inconsistent, and therefore,
the “guessing” technique of measurement is unreliable. A more reliable measurement may be
to use a weight scale, where you are likely to get the same value every time you step on the
scale, unless your weight has actually changed between measurements.
Note that reliability implies consistency but not accuracy. In the previous example of
the weight scale, if the weight scale is calibrated incorrectly (say, to shave off ten pounds from
your true weight, just to make you feel better!), it will not measure your true weight and is
therefore not a valid measure. Nevertheless, the miscalibrated weight scale will still give you
the same weight every time (which is ten pounds less than your true weight), and hence the
scale is reliable.
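To make this distinction concrete, the short sketch below (in Python, with hypothetical numbers) simulates the two weight examples above: guesses that scatter randomly around the true weight, and a miscalibrated scale that is consistently ten pounds off. The small spread of the miscalibrated readings reflects high reliability, while their distance from the true weight reflects the lack of validity.

```python
import random
import statistics

# Hypothetical illustration: random error (guessing) versus systematic error
# (a miscalibrated scale). The true weight and error sizes are made up.
random.seed(0)
true_weight = 160

# Guessing: large random error around the truth -> inconsistent, hence unreliable.
guesses = [true_weight + random.gauss(0, 15) for _ in range(10)]

# Miscalibrated scale: a constant 10-pound bias with tiny random noise
# -> consistent (reliable) but centered on the wrong value (not valid).
miscalibrated = [true_weight - 10 + random.gauss(0, 0.5) for _ in range(10)]

print("Guessing:      mean %.1f, spread (std dev) %.1f"
      % (statistics.mean(guesses), statistics.stdev(guesses)))
print("Miscalibrated: mean %.1f, spread (std dev) %.1f"
      % (statistics.mean(miscalibrated), statistics.stdev(miscalibrated)))
```

The guesses scatter widely around the true weight (low reliability), whereas the miscalibrated readings cluster tightly (high reliability) around a value ten pounds too low (low validity).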
What are the sources of unreliable observations in social science measurements? One of
the primary sources is the observer’s (or researcher’s) subjectivity. If employee morale in a
firm is measured by watching whether the employees smile at each other, whether they make
jokes, and so forth, then different observers may infer different measures of morale if they are
watching the employees on a very busy day (when they have no time to joke or chat) or a light
day (when they are more jovial or chatty). Two observers may also infer different levels of
morale on the same day, depending on what they view as a joke and what is not. “Observation”
is a qualitative measurement technique. Sometimes, reliability may be improved by using
quantitative measures, for instance, by counting the number of grievances filed over one month
as a measure of (the inverse of) morale. Of course, a grievance count may or may not be a valid
measure of morale, but it is less subject to observer subjectivity, and is therefore more reliable. A
second source of unreliable observation is asking imprecise or ambiguous questions. For
instance, if you ask people what their salary is, different respondents may interpret this
question differently as monthly salary, annual salary, or hourly wage, and hence, the resulting
observations will likely be highly divergent and unreliable. A third source of unreliability is
asking questions about issues that respondents are not very familiar with or do not care about, such
as asking an American college graduate whether he/she is satisfied with Canada’s relationship
with Slovenia, or asking a Chief Executive Officer to rate the effectiveness of his company’s
technology strategy – something that he has likely delegated to a technology executive.
So how can you create reliable measures? If your measurement involves soliciting
information from others, as is the case with much of social science research, then you can start
by replacing data collection techniques that depend more on researcher subjectivity (such as
observations) with those that are less dependent on it (such as questionnaires), by
asking only those questions that respondents are likely to know the answers to or care
about, by avoiding ambiguous items in your measures (e.g., by clearly stating whether you are
looking for annual salary), and by simplifying the wording in your indicators so that they are not
misinterpreted by some respondents (e.g., by avoiding difficult words whose meanings they
may not know). These strategies can improve the reliability of our measures, even though they
will not necessarily make the measurements completely reliable. Measurement instruments
must still be tested for reliability. There are many ways of estimating reliability, which are
discussed next.
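As a preview of those estimates, the following minimal sketch (in Python, using hypothetical ratings) computes Cronbach’s alpha, one widely used estimate of internal consistency for multiple-item scales. It compares the variance of the individual items to the variance of respondents’ total scores; values closer to 1 indicate a more internally consistent (reliable) scale.

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a multiple-item scale.

    item_scores: one row per respondent, each row a list of k item ratings.
    alpha = (k / (k - 1)) * (1 - sum(item variances) / variance of total scores)
    """
    k = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_vars = [statistics.variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of each respondent's total score across all items.
    total_var = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings from five respondents on a three-item morale scale (1-5).
ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 3, 4],
    [1, 2, 2],
]
print(round(cronbach_alpha(ratings), 2))  # close to 1 here, suggesting high internal consistency
```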