Sunday, 13 March 2016

We can estimate parameters of this line, such as its slope and intercept, from the GLM. From high school algebra, recall that straight lines can be represented using the mathematical equation y = mx + c, where m is the slope of the straight line (how much y changes for a unit change in x) and c is the intercept term (the value of y when x is zero). In GLM, this equation is represented formally as:

y = β0 + β1 x + ε

where β0 is the intercept term, β1 is the slope, and ε is the error term. ε represents the deviation of actual observations from their estimated values, since most observations are close to the line but do not fall exactly on the line (i.e., the GLM is not perfect). Note that a linear model can have more than one predictor. To visualize a linear model with two predictors, imagine a three-dimensional cube, with the outcome (y) along the vertical axis, and the two predictors (say, x1 and x2) along the two horizontal axes along the base of the cube. A line that describes the relationship between two or more variables is called a regression line; β0 and β1 (and other beta values) are called regression coefficients; and the process of estimating regression coefficients is called regression analysis. The GLM for regression analysis with n predictor variables is:

y = β0 + β1 x1 + β2 x2 + β3 x3 + … + βn xn + ε

In the above equation, predictor variables xi may represent independent variables or covariates (control variables). Covariates are variables that are not of theoretical interest but may have some impact on the dependent variable y and should be controlled, so that the residual effects of the independent variables of interest are detected more precisely. Covariates capture systematic errors in a regression equation while the error term (ε) captures random errors. Though most variables in the GLM tend to be interval or ratio-scaled, this does not have to be the case.
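As a rough illustration of how β0 and β1 might be estimated for the two-variable model, the sketch below uses ordinary least squares. The text does not name an estimation method, so least squares is an assumption here, and the age and self-esteem values are invented for the example.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


# Hypothetical data, not taken from the chapter
age = [20, 30, 40, 50, 60]
self_esteem = [3.0, 3.5, 4.5, 5.0, 6.0]

b0, b1 = fit_line(age, self_esteem)
# The error term: deviation of each observation from the fitted line
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(age, self_esteem)]
print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")
```

The residuals play the role of ε: they are small but not zero, because the observations lie near the fitted line rather than exactly on it.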
Some predictor variables may even be nominal variables (e.g., gender: male or female), which are coded as dummy variables. These are variables that can assume one of only two possible values: 0 or 1 (in the gender example, “male” may be designated as 0 and “female” as 1 or vice versa). A nominal variable with n levels is represented using n–1 dummy variables. For instance, industry sector, consisting of the agriculture, manufacturing, and service sectors, may be represented using a combination of two dummy variables (x1, x2), with (0, 0) for agriculture, (0, 1) for manufacturing, and (1, 0) for service. It does not matter which level of a nominal variable is coded as 0 and which level as 1, because 0 and 1 values are treated as two distinct groups (such as treatment and control groups in an experimental design), rather than as numeric quantities, and the statistical parameters of each group are estimated separately. The GLM is a very powerful statistical tool because it is not one single statistical method, but rather a family of methods that can be used to conduct sophisticated analysis with different types and quantities of predictor and outcome variables. If we have a dummy predictor variable, and we are comparing the effects of the two levels (0 and 1) of this dummy variable on the outcome variable, we are doing an analysis of variance (ANOVA). If we are doing ANOVA while controlling for the effects of one or more covariates, we have an analysis of covariance (ANCOVA). We can also have multiple outcome variables (e.g., y1, y2, … yn), which are represented using a “system of equations” consisting of a different equation for each outcome variable (each with its own unique set of regression coefficients). If multiple outcome variables are modeled as being predicted by the same set of predictor variables, the resulting analysis is called multivariate regression.
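The dummy coding of a nominal variable described above can be sketched as follows. This assumes the common convention in which one level serves as the reference (all zeros) and each remaining level gets its own 0/1 indicator; the function name and the sample values are hypothetical.

```python
def dummy_code(values, reference, indicator_levels):
    """Code a nominal variable as 0/1 dummies, one per non-reference level.

    `reference` is the level coded as all zeros; `indicator_levels` lists the
    remaining levels in the column order (x1, x2, ...).
    """
    allowed = {reference, *indicator_levels}
    coded = []
    for v in values:
        if v not in allowed:
            raise ValueError(f"unknown level: {v!r}")
        coded.append(tuple(1 if v == level else 0 for level in indicator_levels))
    return coded


# Three levels need only two dummies (x1 = service, x2 = manufacturing)
sectors = ["agriculture", "manufacturing", "service", "manufacturing"]
coded = dummy_code(sectors, reference="agriculture",
                   indicator_levels=["service", "manufacturing"])
print(coded)  # [(0, 0), (0, 1), (1, 0), (0, 1)]
```

Which level is chosen as the reference is arbitrary; it changes how the coefficients are read (each is a contrast against the reference group) but not the fit of the model.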
If we are doing ANOVA or ANCOVA analysis with multiple outcome variables, the resulting analysis is a multivariate ANOVA (MANOVA) or multivariate ANCOVA (MANCOVA) respectively. If we model the outcome in one regression equation as a predictor in another equation in an interrelated system of regression equations, then we have a very sophisticated type of analysis called structural equation modeling. The most important problem in GLM is model specification, i.e., how to specify a regression equation (or a system of equations) to best represent the phenomenon of interest. Model specification should be based on theoretical considerations about the phenomenon being studied, rather than what fits the observed data best. The role of data is in validating the model, and not in its specification.

Two-Group Comparison

One of the simplest inferential analyses is comparing the post-test outcomes of treatment and control group subjects in a randomized post-test only control group design, such as whether students enrolled in a special program in mathematics perform better than those in a traditional math curriculum. In this case, the predictor variable is a dummy variable (1=treatment group, 0=control group), and the outcome variable, performance, is ratio scaled (e.g., score of a math test following the special program). The analytic technique for this simple design is a one-way ANOVA (one-way because it involves only one predictor variable), and the statistical test used is called a Student’s t-test (or t-test, in short). The t-test was introduced in 1908 by William Sealy Gosset, a chemist working for the Guinness Brewery in Dublin, Ireland to monitor the quality of stout, a dark beer popular with 19th century porters in London.
Because his employer did not want to reveal the fact that it was using statistics for quality control, Gosset published the test in Biometrika using his pen name “Student”, and the test involved calculating the value of t, which was a letter used frequently by Fisher to denote the difference between two groups. Hence, the name Student’s t-test, although Student’s identity was known to fellow statisticians. The t-test examines whether the means of two groups are statistically different from each other (non-directional or two-tailed test), or whether one group has a statistically larger (or smaller) mean than the other (directional or one-tailed test). In our example, if we wish to examine whether students in the special math curriculum perform better than those in the traditional curriculum, we have a one-tailed test. This hypothesis can be stated as:

H0: μ1 ≤ μ2 (null hypothesis)
H1: μ1 > μ2 (alternative hypothesis)

where μ1 represents the mean population performance of students exposed to the special curriculum (treatment group) and μ2 is the mean population performance of students with the traditional curriculum (control group). Note that the null hypothesis is always the one with the “equal” sign, and the goal of all statistical significance tests is to reject the null hypothesis. How can we infer about the difference in population means using data from samples drawn from each population? From the hypothetical frequency distributions of the treatment and control group scores in Figure 15.2, the control group appears to have a bell-shaped (normal) distribution with a mean score of 45 (on a 0-100 scale), while the treatment group appears to have a mean score of 65. These means look different, but they are really sample means (x̄), which may differ from their corresponding population means (μ) due to sampling error.
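To make the two-group comparison concrete, here is a minimal sketch of a pooled-variance (Student's) t statistic for H1: μ1 > μ2. The scores are invented so that the group means equal the 65 and 45 mentioned for Figure 15.2; they are not the data behind that figure. The critical value 1.860 is the standard one-tailed table value for α = 0.05 with 8 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(treatment, control):
    """Student's t for two independent groups, assuming equal variances."""
    n1, n2 = len(treatment), len(control)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (sample variances use n - 1)
    pooled_var = ((n1 - 1) * variance(treatment)
                  + (n2 - 1) * variance(control)) / df
    # Standard error of the difference between the two sample means
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(treatment) - mean(control)) / se_diff, df


# Hypothetical math scores on a 0-100 scale
treatment = [60, 65, 70, 62, 68]   # special curriculum, mean 65
control = [45, 50, 40, 48, 42]     # traditional curriculum, mean 45

t_stat, df = pooled_t(treatment, control)
T_CRIT_ONE_TAILED = 1.860          # table value for alpha = 0.05, df = 8
reject_null = t_stat > T_CRIT_ONE_TAILED
print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_null}")
```

The statistic grows when the means are far apart and shrinks when the within-group spread is large, which is the overlap argument made for the two bell-shaped curves.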
Sample means are probabilistic estimates of population means within a certain confidence interval (95% CI is sample mean ± two standard errors, where standard error is the standard deviation of the distribution of sample means as taken from infinite samples of the population). Hence, statistical significance of population means depends not only on sample mean scores, but also on the standard error or the degree of spread in the frequency distribution of the sample means. If the spread is large (i.e., the two bell-shaped curves have a lot of overlap), then the 95% CI of the two means may also be overlapping, and we cannot conclude with high probability (p<0.05) that their corresponding population means are significantly different. However, if the curves have narrower spreads (i.e., they are less overlapping), then the CI of each mean may not overlap, and we reject the null hypothesis and say that the population means of the two groups are significantly different at p<0
Quantitative Analysis: Inferential Statistics

Inferential statistics are the statistical procedures that are used to reach conclusions about associations between variables. They differ from descriptive statistics in that they are explicitly designed to test hypotheses. Numerous statistical procedures fall in this category, most of which are supported by modern statistical software such as SPSS and SAS. This chapter provides a short primer on only the most basic and frequent procedures; readers are advised to consult a formal text on statistics or take a course on statistics for more advanced procedures.

Basic Concepts

British philosopher Karl Popper said that theories can never be proven, only disproven. As an example, how can we prove that the sun will rise tomorrow? Popper said that just because the sun has risen every single day that we can remember does not necessarily mean that it will rise tomorrow, because inductively derived theories are only conjectures that may or may not be predictive of future phenomena. Instead, he suggested that we may assume a theory that the sun will rise every day without necessarily proving it, and if the sun does not rise on a certain day, the theory is falsified and rejected. Likewise, we can only reject hypotheses based on contrary evidence but can never truly accept them because presence of evidence does not mean that we may not observe contrary evidence later. Because we cannot truly accept a hypothesis of interest (alternative hypothesis), we formulate a null hypothesis as the opposite of the alternative hypothesis, and then use empirical evidence to reject the null hypothesis to demonstrate indirect, probabilistic support for our alternative hypothesis. A second problem with testing hypothesized relationships in social science research is that the dependent variable may be influenced by an infinite number of extraneous variables and it is not plausible to measure and control for all of these extraneous effects.
Hence, even if two variables may seem to be related in an observed sample, they may not be truly related in the population, and therefore inferential statistics are never certain or deterministic, but always probabilistic. How do we know whether a relationship between two variables in an observed sample is significant, and not a matter of chance? Sir Ronald A. Fisher, one of the most prominent statisticians in history, established the basic guidelines for significance testing. He said that a statistical result may be considered significant if it can be shown that the probability of it arising by chance is 5% or less. In inferential statistics, this probability is called the p-value, 5% is called the significance level (α), and the desired relationship between the p-value and α is denoted as: p≤0.05. The significance level is the maximum level of risk that we are willing to accept as the price of our inference from the sample to the population. If the p-value is less than 0.05 or 5%, it means that we have at most a 5% chance of being incorrect in rejecting the null hypothesis, that is, of committing a Type I error. If p>0.05, we do not have enough evidence to reject the null hypothesis or accept the alternative hypothesis. We must also understand three related statistical concepts: sampling distribution, standard error, and confidence interval. A sampling distribution is the theoretical distribution of an infinite number of samples from the population of interest in your study. However, because a sample is never identical to the population, every sample always has some inherent level of error, called the standard error. If this standard error is small, then statistical estimates derived from the sample (such as sample mean) are reasonably good estimates of the population. The precision of our sample estimates is defined in terms of a confidence interval (CI).
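As a small worked sketch of the standard error and confidence interval just introduced: the code below estimates the standard error of a sample mean as s/√n (the usual estimate from a single sample, assumed here since we cannot draw infinite samples) and builds an approximate 95% CI as the mean plus or minus two standard errors. The sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_ci(sample, multiplier=2.0):
    """Return (mean, standard error, lower bound, upper bound).

    The standard error is estimated as s / sqrt(n); a multiplier of 2
    gives the approximate 95% interval (1.96 is the exact normal value).
    """
    m = mean(sample)
    se = stdev(sample) / sqrt(len(sample))
    return m, se, m - multiplier * se, m + multiplier * se


# Hypothetical sample of eight observations
sample = [2, 4, 4, 4, 5, 5, 7, 9]
m, se, lower, upper = mean_with_ci(sample)
print(f"mean = {m}, SE = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

A larger sample shrinks the standard error (through the √n in the denominator) and therefore narrows the interval, which is why larger samples give more precise estimates.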
A 95% CI is defined as a range of plus or minus two standard deviations of the mean estimate, as derived from different samples in a sampling distribution (that is, plus or minus two standard errors). Hence, when we say that our observed sample estimate has a CI of 95%, what we mean is that we are confident that 95% of the time, the population parameter is within two standard errors of our observed sample estimate. Jointly, the p-value and the CI give us a good idea of the probability of our result and how close it is to the corresponding population parameter.

General Linear Model

Most inferential statistical procedures in social science research are derived from a general family of statistical models called the general linear model (GLM). A model is an estimated mathematical equation that can be used to represent a set of data, and linear refers to a straight line. Hence, a GLM is a system of equations that can be used to represent linear patterns of relationships in observed data.

Figure 15.1. Two-variable linear model

The simplest type of GLM is a two-variable linear model that examines the relationship between one independent variable (the cause or predictor) and one dependent variable (the effect or outcome). Let us assume that these two variables are age and self-esteem respectively. The bivariate scatterplot for this relationship is shown in Figure 15.1, with age (predictor) along the horizontal or x-axis and self-esteem (outcome) along the vertical or y-axis. From the scatterplot, it appears that individual observations representing combinations of age and self-esteem generally seem to be scattered around an imaginary upward sloping straight line.
The easiest way to test the above hypothesis is to look up critical values of r from statistical tables available in any standard textbook on statistics or on the Internet (most software programs also perform significance testing). The critical value of r depends on our desired significance level (α = 0.05), the degrees of freedom (df), and whether the desired test is a one-tailed or two-tailed test. The degrees of freedom is the number of values that can vary freely in any calculation of a statistic. In the case of correlation, the df simply equals n – 2, or for the data in Table 14.1, df is 20 – 2 = 18. There are two different statistical tables for one-tailed and two-tailed tests. In the two-tailed table, the critical value of r for α = 0.05 and df = 18 is 0.44. For our computed correlation of 0.79 to be significant, it must be larger than the critical value of 0.44 or less than -0.44. Since our computed value of 0.79 is greater than 0.44, we conclude that there is a significant correlation between age and self-esteem in our data set, or in other words, the odds are less than 5% that this correlation is a chance occurrence. Therefore, we can reject the null hypothesis that r ≤ 0, which is an indirect way of saying that the alternative hypothesis r > 0 is probably correct. Most research studies involve more than two variables. If there are n variables, then we will have a total of n*(n-1)/2 possible correlations between these n variables. Such correlations are easily computed using a software program like SPSS, rather than manually using the formula for correlation (as we did in Table 14.1), and represented using a correlation matrix, as shown in Table 14.2. A correlation matrix is a matrix that lists the variable names along the first row and the first column, and depicts bivariate correlations between pairs of variables in the appropriate cell in the matrix.
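The table lookup described above can be reproduced with a short calculation. The sketch below relies on the standard relationship between a correlation and the t distribution, t = r·√(df / (1 − r²)), which the text does not state and is brought in here as an assumption. The value 2.101 is the usual two-tailed table value of t for α = 0.05 and df = 18.

```python
from math import sqrt


def r_to_t(r, n):
    """Convert a correlation from n pairs into a t statistic with n - 2 df."""
    df = n - 2
    return r * sqrt(df / (1 - r ** 2)), df


def critical_r(t_crit, df):
    """Smallest |r| that is significant, given the critical t for that df."""
    return t_crit / sqrt(t_crit ** 2 + df)


def pair_count(n_vars):
    """Number of distinct bivariate correlations among n_vars variables."""
    return n_vars * (n_vars - 1) // 2


T_CRIT = 2.101                      # two-tailed, alpha = 0.05, df = 18
t_stat, df = r_to_t(0.79, 20)       # values from the age/self-esteem example
r_crit = critical_r(T_CRIT, df)

print(f"df = {df}, critical r = {r_crit:.2f}, t = {t_stat:.2f}")
print("significant:", abs(0.79) > r_crit)
print("correlations among 8 variables:", pair_count(8))
```

The computed critical r rounds to the 0.44 quoted from the two-tailed table, and eight variables yield 28 distinct correlations, the number of cells below the diagonal of an 8 x 8 correlation matrix.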
The values along the principal diagonal (from the top left to the bottom right corner) of this matrix are always 1, because any variable is always perfectly correlated with itself. Further, since correlations are non-directional, the correlation between variables V1 and V2 is the same as that between V2 and V1. Hence, the lower triangular matrix (values below the principal diagonal) is a mirror reflection of the upper triangular matrix (values above the principal diagonal), and therefore, we often list only the lower triangular matrix for simplicity. If the correlations involve variables measured using interval scales, then this specific type of correlation is called a Pearson product moment correlation. Another useful way of presenting bivariate data is cross-tabulation (often abbreviated to cross-tab, and sometimes more formally called a contingency table). A cross-tab is a table that describes the frequency (or percentage) of all combinations of two or more nominal or categorical variables. As an example, let us assume that we have the following observations of gender and grade for a sample of 20 students, as shown in Figure 14.3. Gender is a nominal variable (male/female or M/F), and grade is a categorical variable with three levels (A, B, and C). A simple cross-tabulation of the data may display the joint distribution of gender and grades (i.e., how many students of each gender are in each grade category, as a raw frequency count or as a percentage) in a 2 x 3 matrix. This matrix will help us see if A, B, and C grades are equally distributed across male and female students. The cross-tab data in Table 14.3 shows that the distribution of A grades is biased heavily toward female students: in a sample of 10 male and 10 female students, five female students received the A grade compared to only one male student.
In contrast, the distribution of C grades is biased toward male students: three male students received a C grade, compared to only one female student. However, the distribution of B grades was somewhat uniform, with six male students and five female students. The last row and the last column of this table are called marginal totals because they indicate the totals across each category and are displayed along the margins of the table.

Table 14.2. A hypothetical correlation matrix for eight variables

Table 14.3. Example of cross-tab analysis

Although we can see a distinct pattern of grade distribution between male and female students in Table 14.3, is this pattern real or “statistically significant”? In other words, do the above frequency counts differ from those that may be expected from pure chance? To answer this question, we should compute the expected count of observations in each cell of the 2 x 3 cross-tab matrix. This is done by multiplying the marginal column total and the marginal row total for each cell and dividing the product by the total number of observations. For example, for the male/A grade cell, expected count = 5 * 10 / 20 = 2.5. In other words, we were expecting 2.5 male students to receive an A grade, but in reality, only one student received the A grade. Whether this difference between expected and actual count is significant can be tested using a chi-square test. The chi-square statistic can be computed as the average difference between 
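The expected-count rule above (row total times column total, divided by the number of observations) can be sketched as follows. The observed counts are illustrative only: they were chosen so that the male/A cell reproduces the 2.5 from the worked example, and they are not the contents of Table 14.3. The statistic computed is Pearson's chi-square, the sum of (observed − expected)² / expected over all cells, which is the standard form and is assumed here.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Pearson's chi-square statistic and its degrees of freedom."""
    expected = expected_counts(observed)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# Illustrative 2 x 3 cross-tab: rows = male, female; columns = grades A, B, C
observed = [[1, 6, 3],
            [4, 5, 1]]

expected = expected_counts(observed)
stat, df = chi_square(observed)
print("expected male/A count:", expected[0][0])
print(f"chi-square = {stat:.3f}, df = {df}")
```

The resulting statistic would then be compared with a critical chi-square value for the given degrees of freedom, in the same way the correlation was compared with a critical r.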
Bivariate Analysis

Bivariate analysis examines how two variables are related to each other. The most common bivariate statistic is the bivariate correlation (often, simply called “correlation”), which is a number between -1 and +1 denoting the strength of the relationship between two variables. Let’s say that we wish to study how age is related to self-esteem in a sample of 20 respondents, i.e., as age increases, does self-esteem increase, decrease, or remain unchanged. If self-esteem increases, then we have a positive correlation between the two variables; if self-esteem decreases, we have a negative correlation; and if it remains the same, we have a zero correlation. To calculate the value of this correlation, consider the hypothetical dataset shown in Table 14.1.

Figure 14.2. Normal distribution

Table 14.1. Hypothetical data on age and self-esteem

The two variables in this dataset are age (x) and self-esteem (y). Age is a ratio-scale variable, while self-esteem is an average score computed from a multi-item self-esteem scale measured using a 7-point Likert scale, ranging from “strongly disagree” to “strongly agree.” The histogram of each variable is shown on the left side of Figure 14.3. The formula for calculating bivariate correlation is:

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}$$

where $r_{xy}$ is the correlation, $\bar{x}$ and $\bar{y}$ are the sample means of x and y, and $s_x$ and $s_y$ are the standard deviations of x and y. The manually computed value of correlation between age and self-esteem, using the above formula as shown in Table 14.1, is 0.79. This figure indicates that age has a strong positive correlation with self-esteem, i.e., self-esteem tends to increase with increasing age, and decrease with decreasing age.
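A minimal sketch of the correlation calculation follows, written in terms of the sample means and sample standard deviations (with n − 1 in the denominator). Table 14.1 is not reproduced in this text, so the five age and self-esteem pairs below are invented, and the result will not equal the 0.79 reported for the table.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Bivariate correlation from means and sample standard deviations."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    # Sum of products of deviations from the two means
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return cross / ((n - 1) * stdev(x) * stdev(y))


# Hypothetical data, not Table 14.1
age = [20, 30, 40, 50, 60]
self_esteem = [3.0, 3.5, 4.5, 5.0, 6.0]

r = pearson_r(age, self_esteem)
print(f"r = {r:.3f}")
```

Reversing the sign of one variable flips the sign of r without changing its size, which matches the reading of positive and negative correlations as upward and downward sloping scatterplots.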
Such a pattern can also be seen from visually comparing the age and self-esteem histograms shown in Figure 14.3, where it appears that the tops of the two histograms generally follow each other. Note here that the vertical axes in Figure 14.3 represent actual observation values, and not the frequency of observations (as was the case in Figure 14.1), and hence, these are not frequency distributions but rather histograms. The bivariate scatter plot in the right panel of Figure 14.3 is essentially a plot of self-esteem on the vertical axis against age on the horizontal axis. This plot roughly resembles an upward sloping line (i.e., positive slope), which is also indicative of a positive correlation. If the two variables were negatively correlated, the scatter plot would slope down (negative slope), implying that an increase in age would be related to a decrease in self-esteem and vice versa. If the two variables were uncorrelated, the scatter plot would approximate a horizontal line (zero slope), implying that an increase in age would have no systematic bearing on self-esteem.

Figure 14.3. Histogram and correlation plot of age and self-esteem

After computing bivariate correlation, researchers are often interested in knowing whether the correlation is significant (i.e., a real one) or caused by mere chance. Answering such a question would require testing the following hypothesis:

H0: r = 0
H1: r ≠ 0

H0 is called the null hypothesis, and H1 is called the alternative hypothesis (sometimes, also represented as Ha). Although they may seem like two hypotheses, H0 and H1 actually represent a single hypothesis since they are direct opposites of each other. We are interested in testing H1 rather than H0. Also note that H1 is a non-directional hypothesis since it does not specify whether r is greater than or less than zero. Directional hypotheses will be specified as H0: r ≤ 0; H1: r > 0 (if we are testing for a positive correlation).
Significance testing of a directional hypothesis is done using a one-tailed t-test, while that for a non-directional hypothesis is done using a two-tailed t-test. In statistical testing, the alternative hypothesis cannot be tested directly. Rather, it is tested indirectly by rejecting the null hypothesis with a certain level of probability. Statistical testing is always probabilistic, because we are never sure if our inferences, based on sample data, apply to the population, since our sample never equals the population. The probability that a statistical inference is caused by pure chance is called the p-value. The p-value is compared with the significance level (α), which represents the maximum level of risk that we are willing to take that our inference is incorrect. For most statistical analysis, α is set to 0.05. A p-value less than α=0.05 indicates that we have enough statistical evidence to reject the null hypothesis, and thereby, indirectly accept the alternative hypothesis. If p>0.05, then we do not have adequate statistical evidence to reject the null hypothesis or accept the alternative hypothesis.
Univariate Analysis

Univariate analysis, or analysis of a single variable, refers to a set of statistical techniques that can describe the general properties of one variable. Univariate statistics include: (1) frequency distribution, (2) central tendency, and (3) dispersion. The frequency distribution of a variable is a summary of the frequency (or percentages) of individual values or ranges of values for that variable. For instance, we can measure how many times a sample of respondents attend religious services (as a measure of their “religiosity”) using a categorical scale: never, once per year, several times per year, about once a month, several times per month, several times per week, and an optional category for “did not answer.” If we count the number (or percentage) of observations within each category (except “did not answer” which is really a missing value rather than a category), and display it in the form of a table as shown in Figure 14.1, what we have is a frequency distribution. This distribution can also be depicted in the form of a bar chart, as shown on the right panel of Figure 14.1, with the horizontal axis representing each category of that variable and the vertical axis representing the frequency or percentage of observations within each category.

Figure 14.1. Frequency distribution of religiosity

With very large samples where observations are independent and random, the frequency distribution tends to follow a plot that looks like a bell-shaped curve (a smoothed bar chart of the frequency distribution) similar to that shown in Figure 14.2, where most observations are clustered toward the center of the range of values, with fewer and fewer observations toward the extreme ends of the range. Such a curve is called a normal distribution. Central tendency is an estimate of the center of a distribution of values. There are three major estimates of central tendency: mean, median, and mode.
The arithmetic mean (often simply called the “mean”) is the simple average of all values in a given distribution. Consider a set of eight test scores: 15, 22, 21, 18, 36, 15, 25, 15. The arithmetic mean of these values is (15 + 22 + 21 + 18 + 36 + 15 + 25 + 15)/8 = 20.875. Other types of means include geometric mean (nth root of the product of n numbers in a distribution) and harmonic mean (the reciprocal of the arithmetic mean of the reciprocals of each value in a distribution), but these means are not very popular for statistical analysis of social research data. The second measure of central tendency, the median, is the middle value within a range of values in a distribution. This is computed by sorting all values in a distribution in increasing order and selecting the middle value. In case there are two middle values (if there is an even number of values in a distribution), the average of the two middle values represents the median. In the above example, the sorted values are: 15, 15, 15, 18, 21, 22, 25, 36. The two middle values are 18 and 21, and hence the median is (18 + 21)/2 = 19.5. Lastly, the mode is the most frequently occurring value in a distribution of values. In the previous example, the most frequently occurring value is 15, which is the mode of the above set of test scores. Note that any value that is estimated from a sample, such as mean, median, mode, or any of the later estimates, is called a statistic. Dispersion refers to the way values are spread around the central tendency, for example, how tightly or how widely the values are clustered around the mean. Two common measures of dispersion are the range and standard deviation. The range is the difference between the highest and lowest values in a distribution. The range in our previous example is 36-15 = 21. The range is particularly sensitive to the presence of outliers.
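The central tendency and range calculations for the eight test scores can be checked with Python's standard `statistics` module, as in the sketch below. Sorting the scores gives 15, 15, 15, 18, 21, 22, 25, 36, so the two middle values are 18 and 21.

```python
from statistics import mean, median, mode

scores = [15, 22, 21, 18, 36, 15, 25, 15]

score_mean = mean(scores)                 # sum of 167 divided by 8
score_median = median(scores)             # average of the two middle values
score_mode = mode(scores)                 # most frequently occurring value
score_range = max(scores) - min(scores)   # highest minus lowest value

print(sorted(scores))
print("mean:", score_mean)      # 20.875
print("median:", score_median)  # 19.5
print("mode:", score_mode)      # 15
print("range:", score_range)    # 21

# Replacing the top score with an outlier stretches the range sharply
with_outlier = [85 if s == 36 else s for s in scores]
print("range with outlier:", max(with_outlier) - min(with_outlier))  # 70
```

The mean uses every value and so moves when any score changes, while the median and mode depend only on the ordering and frequency of values.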
For instance, if the highest value in the above distribution was 85 and the other values remained the same, the range would be 85-15 = 70. Standard deviation, the second measure of dispersion, corrects for such outliers by using a formula that takes into account how close or how far each value is from the distribution mean:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n - 1}}$$

where σ is the standard deviation, xi is the ith observation (or value), µ is the arithmetic mean, n is the total number of observations, and Σ means summation across all observations. The square of the standard deviation is called the variance of a distribution. In a normally distributed frequency distribution, it is seen that 68% of the observations lie within one standard deviation of 
Quantitative Analysis: Descriptive Statistics

Numeric data collected in a research project can be analyzed quantitatively using statistical tools in two different ways. Descriptive analysis refers to statistically describing, aggregating, and presenting the constructs of interest or associations between these constructs. Inferential analysis refers to the statistical testing of hypotheses (theory testing). In this chapter, we will examine statistical techniques used for descriptive analysis, and the next chapter will examine statistical techniques for inferential analysis. Much of today’s quantitative data analysis is conducted using software programs such as SPSS or SAS. Readers are advised to familiarize themselves with one of these programs for understanding the concepts described in this chapter.

Data Preparation

In research projects, data may be collected from a variety of sources: mail-in surveys, interviews, pretest or posttest experimental data, observational data, and so forth. These data must be converted into a machine-readable, numeric format, such as in a spreadsheet or a text file, so that they can be analyzed by computer programs like SPSS or SAS. Data preparation usually involves the following steps.

Data coding. Coding is the process of converting data into numeric format. A codebook should be created to guide the coding process. A codebook is a comprehensive document containing a detailed description of each variable in a research study, items or measures for that variable, the format of each item (numeric, text, etc.), the response scale for each item (i.e., whether it is measured on a nominal, ordinal, interval, or ratio scale; whether such scale is a five-point, seven-point, or some other type of scale), and how to code each value into a numeric format.
For instance, if we have a measurement item on a seven-point Likert scale with anchors ranging from “strongly disagree” to “strongly agree”, we may code that item as 1 for strongly disagree, 4 for neutral, and 7 for strongly agree, with the intermediate anchors in between. Nominal data such as industry type can be coded in numeric form using a coding scheme such as: 1 for manufacturing, 2 for retailing, 3 for financial, 4 for healthcare, and so forth (of course, such numeric codes are only labels, so arithmetic on them, such as averaging, is not meaningful). Ratio scale data such as age, income, or test scores can be coded as entered by the respondent. Sometimes, data may need to be aggregated into a different form than the format used for data collection. For instance, for measuring a construct such as “benefits of computers,” if a survey provided respondents with a checklist of benefits that they could select from (i.e., they could choose as many of those benefits as they wanted), then the total number of checked items can be used as an aggregate measure of benefits. Note that many other forms of data, such as interview transcripts, cannot be converted into a numeric format for statistical analysis. Coding is especially important for large complex studies involving many variables and measurement items, where the coding process is conducted by different people, to help the coding team code data in a consistent manner, and also to help others understand and interpret the coded data.

Data entry. Coded data can be entered into a spreadsheet, database, text file, or directly into a statistical program like SPSS. Most statistical programs provide a data editor for entering data. However, these programs store data in their own native format (e.g., SPSS stores data as .sav files), which makes it difficult to share that data with other statistical programs.
Hence, it is often better to enter data into a spreadsheet or database, where they can be reorganized as needed, shared across programs, and subsets of data can be extracted for analysis. Smaller data sets with fewer than 65,000 observations and 256 items can be stored in a spreadsheet such as Microsoft Excel, while larger data sets with millions of observations will require a database. Each observation can be entered as one row in the spreadsheet and each measurement item can be represented as one column. The entered data should be frequently checked for accuracy, via occasional spot checks on a set of items or observations, during and after entry. Furthermore, while entering data, the coder should watch out for obvious evidence of bad data, such as the respondent selecting the “strongly agree” response to all items irrespective of content, including reverse-coded items. Such data can be entered but should be excluded from subsequent analysis.

Missing values. Missing data is an inevitable part of any empirical data set. Respondents may not answer certain questions if they are ambiguously worded or too sensitive. Such problems should be detected early, during pretests, and corrected before the main data collection process begins. During data entry, some statistical programs automatically treat blank entries as missing values, while others require a specific numeric value such as -1 or 999 to be entered to denote a missing value. During data analysis, the default mode of handling missing values in most software programs is to simply drop the entire observation containing even a single missing value, in a technique called listwise deletion. Such deletion can significantly shrink the sample size and make it extremely difficult to detect small effects. Hence, some software programs allow the option of replacing missing values with an estimated value via a process called imputation.
For instance, if the missing value is one item in a multi-item scale, the imputed value may be the average of the respondent’s responses to the remaining items on that scale. If the missing value belongs to a single-item scale, many researchers use the average of other respondents’ responses to that item as the imputed value. Such imputation may be biased if the missing value is of a systematic nature rather than a random nature. Two methods that can produce relatively unbiased estimates for imputation are maximum likelihood procedures and multiple imputation methods, both of which are supported in popular software programs such as SPSS and SAS.

Data transformation. Sometimes, it is necessary to transform data values before they can be meaningfully interpreted. For instance, reverse coded items, where items convey the opposite meaning of that of their underlying construct, should be reversed (e.g., in a 1-7 interval scale, 8 minus the observed value will reverse the value) before they can be compared or combined with items that are not reverse coded. Other kinds of transformations may include creating scale measures by adding individual scale items, creating a weighted index from a set of observed measures, and collapsing multiple values into fewer categories (e.g., collapsing incomes into income ranges).
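The missing-value and transformation steps described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the three-item scale, the sample responses, the choice of which item is reverse coded, and all function names are hypothetical, and real projects would normally use the routines built into a statistical package. The sketch reverses the reverse-coded item first (8 minus the value on a 1-7 scale), then imputes a missing item with the mean of that respondent's remaining items, and finally adds the items into a scale score. Listwise deletion is shown alongside for comparison.

```python
from statistics import mean

MISSING = None  # assumption: a blank entry is stored as None


def reverse_code(value, scale_min=1, scale_max=7):
    """Reverse a response; on a 1-7 scale this is 8 minus the value."""
    return (scale_min + scale_max) - value


def listwise_delete(rows):
    """Drop every observation that contains at least one missing value."""
    return [row for row in rows if MISSING not in row]


def impute_row_mean(row):
    """Fill missing items with the mean of the respondent's other items."""
    observed = [v for v in row if v is not MISSING]
    if not observed:  # nothing to average; leave the row unchanged
        return list(row)
    fill = mean(observed)
    return [fill if v is MISSING else v for v in row]


# Hypothetical data: one row per respondent, one column per item.
responses = [
    [6, 2, 7],
    [5, None, 6],
    [2, 6, 1],
]
REVERSED_ITEMS = {1}  # assumption: the second item (index 1) is reverse coded


def prepare(rows):
    """Reverse-code first so all items point the same way, then impute."""
    prepared = []
    for row in rows:
        aligned = [
            reverse_code(v) if (i in REVERSED_ITEMS and v is not MISSING) else v
            for i, v in enumerate(row)
        ]
        prepared.append(impute_row_mean(aligned))
    return prepared


complete_cases = listwise_delete(responses)   # keeps 2 of the 3 respondents
prepared = prepare(responses)                 # keeps all 3 respondents
scale_scores = [sum(row) for row in prepared]  # summed scale measure
```

Reverse coding is applied before imputation here because averaging a respondent's items only makes sense once all items point in the same direction. As the text notes, this kind of mean imputation may be biased when values are missing systematically rather than at random.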
Hermeneutic Analysis

Hermeneutic analysis is a special type of content analysis where the researcher tries to “interpret” the subjective meaning of a given text within its socio-historic context. Unlike grounded theory or content analysis, which ignore the context and meaning of text documents during the coding process, hermeneutic analysis is a truly interpretive technique for analyzing qualitative data. This method assumes that written texts narrate an author’s experience within a socio-historic context, and should be interpreted as such within that context. Therefore, the researcher continually iterates between singular interpretation of the text (the part) and a holistic understanding of the context (the whole) to develop a fuller understanding of the phenomenon in its situated context, which German philosopher Martin Heidegger called the hermeneutic circle. The word hermeneutic (singular) refers to one particular method or strand of interpretation. More generally, hermeneutics is the study of interpretation and the theory and practice of interpretation. Derived from religious studies and linguistics, traditional hermeneutics, such as biblical hermeneutics, refers to the interpretation of written texts, especially in the areas of literature, religion and law (such as the Bible). In the 20th century, Heidegger suggested that a more direct, non-mediated, and authentic way of understanding social reality is to experience it, rather than simply observe it, and proposed philosophical hermeneutics, where the focus shifted from interpretation to existential understanding. Heidegger argued that texts are the means by which readers can not only read about an author’s experience, but also relive the author’s experiences.

20 Schilling, J. (2006). “On the Pragmatics of Qualitative Assessment: Designing the Process for Content Analysis,” European Journal of Psychological Assessment (22:1), 28-37.
Contemporary or modern hermeneutics, developed by Heidegger’s students such as Hans-Georg Gadamer, further examined the limits of written texts for communicating social experiences, and went on to propose a framework of the interpretive process, encompassing all forms of communication, including written, verbal, and non-verbal, and exploring issues that restrict the communicative ability of written texts, such as presuppositions, language structures (e.g., grammar, syntax, etc.), and semiotics (the study of written signs such as symbolism, metaphor, analogy, and sarcasm). The term hermeneutics is sometimes used interchangeably and inaccurately with exegesis, which refers to the interpretation or critical explanation of written text only and especially religious texts.

Conclusions

Finally, standard software programs, such as ATLAS.ti.5, NVivo, and QDA Miner, can be used to automate coding processes in qualitative research methods. These programs can quickly and efficiently organize, search, sort, and process large volumes of text data using user-defined rules. To guide such automated analysis, a coding schema should be created, specifying the keywords or codes to search for in the text, based on an initial manual examination of sample text data. The schema can be organized in a hierarchical manner to group codes into higher-order codes or constructs. The coding schema should be validated using a different sample of texts for accuracy and adequacy. If the coding schema is biased or incorrect, the resulting analysis of the entire population of text may be flawed and non-interpretable. Moreover, software programs cannot decipher the meaning behind certain words or phrases or the context within which these words or phrases are used (such as those in sarcasms or metaphors), 
selectively sampled to validate the central category and its relationships to other categories (i.e., the tentative theory). Selective coding limits the range of analysis, and makes it move fast. At the same time, the coder must watch out for other categories that may emerge from the new data that may be related to the phenomenon of interest (open coding), which may lead to further refinement of the initial theory. Hence, open, axial, and selective coding may proceed simultaneously. Coding of new data and theory refinement continues until theoretical saturation is reached, i.e., when additional data does not yield any marginal change in the core categories or the relationships. The “constant comparison” process implies continuous rearrangement, aggregation, and refinement of categories, relationships, and interpretations based on increasing depth of understanding, and an iterative interplay of four stages of activities: (1) comparing incidents/texts assigned to each category (to validate the category), (2) integrating categories and their properties, (3) delimiting the theory (focusing on the core concepts and ignoring less relevant concepts), and (4) writing theory (using techniques like memoing, storylining, and diagramming that are discussed in the next chapter). Having a central category does not necessarily mean that all other categories can be integrated nicely around it. In order to identify key categories that are conditions, action/interactions, and consequences of the core category, Strauss and Corbin (1990) recommend several integration techniques, such as storylining, memoing, or concept mapping. In storylining, categories and relationships are used to explicate and/or refine a story of the observed phenomenon. 
Memos are theorized write-ups of ideas about substantive concepts and their theoretically coded relationships as they evolve during grounded theory analysis, and are important tools to keep track of and refine ideas that develop during the analysis. Memoing is the process of using these memos to discover patterns and relationships between categories using two-by-two tables, diagrams, figures, or other illustrative displays. Concept mapping is a graphical representation of concepts and relationships between those concepts (e.g., using boxes and arrows). The major concepts are typically laid out on one or more sheets of paper, blackboards, or using graphical software programs, linked to each other using arrows, and readjusted to best fit the observed data. After a grounded theory is generated, it must be refined for internal consistency and logic. Researchers must ensure that the central construct has the stated characteristics and dimensions, and if not, the data analysis may be repeated. Researchers must then ensure that the characteristics and dimensions of all categories show variation. For example, if behavior frequency is one such category, then the data must provide evidence of both frequent performers and infrequent performers of the focal behavior. Finally, the theory must be validated by comparing it with raw data. If the theory contradicts observed evidence, the coding process may be repeated to reconcile such contradictions or unexplained variations.

Content Analysis

Content analysis is the systematic analysis of the content of a text (e.g., who says what, to whom, why, and to what extent and with what effect) in a quantitative or qualitative manner. Content analysis is typically conducted as follows. First, when there are many texts to analyze (e.g., newspaper stories, financial reports, blog postings, online reviews, etc.), the researcher begins by sampling a selected set of texts from the population of texts for analysis.
This process is not random, but instead, texts that have more pertinent content should be chosen selectively. Second, the researcher identifies and applies rules to divide each text into segments or “chunks” that can be treated as separate units of analysis. This process is called unitizing. For example, assumptions, effects, enablers, and barriers in texts may constitute such units. Third, the researcher constructs and applies one or more concepts to each unitized text segment in a process called coding. For coding purposes, a coding scheme is used based on the themes the researcher is searching for or uncovers as she classifies the text. Finally, the coded data is analyzed, often both quantitatively and qualitatively, to determine which themes occur most frequently, in what contexts, and how they are related to each other. A simple type of content analysis is sentiment analysis – a technique used to capture people’s opinion or attitude toward an object, person, or phenomenon. Reading online messages about a political candidate posted on an online forum and classifying each message as positive, negative, or neutral is an example of such an analysis. In this case, each message represents one unit of analysis. This analysis will help identify whether the sample as a whole is positively or negatively disposed or neutral towards that candidate. Examining the content of online reviews in a similar manner is another example. Though this analysis can be done manually, for very large data sets (millions of text records), natural language processing and text analytics based software programs are available to automate the coding process, and maintain a record of how people’s sentiments fluctuate with time. A frequent criticism of content analysis is that it lacks a set of systematic procedures that would allow the analysis to be replicated by other researchers.
Schilling (2006)20 addressed this criticism by organizing different content analytic procedures into a spiral model. This model consists of five levels or phases in interpreting text: (1) convert recorded tapes into raw text data or transcripts for content analysis, (2) convert raw data into condensed protocols, (3) convert condensed protocols into a preliminary category system, (4) use the preliminary category system to generate coded protocols, and (5) analyze coded protocols to generate interpretations about the phenomenon of interest. Content analysis has several limitations. First, the coding process is restricted to the information available in text form. For instance, if a researcher is interested in studying people’s views on capital punishment, but no such archive of text documents is available, then the analysis cannot be done. Second, sampling must be done carefully to avoid sampling bias. For instance, if your population is the published research literature on a given topic, then you have systematically omitted unpublished research or the most recent work that is yet to be published. 
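The sentiment analysis example mentioned earlier, in which each forum message is one unit of analysis coded as positive, negative, or neutral, can be sketched in a few lines of Python. This is a deliberately naive illustration, not how text analytics software works in practice: the keyword lists, the sample messages, and the function name `classify` are all hypothetical, and simple keyword matching cannot detect sarcasm or metaphor.

```python
import re
from collections import Counter

# Hypothetical coding scheme: keywords that signal each sentiment.
POSITIVE = {"good", "great", "honest", "support"}
NEGATIVE = {"bad", "corrupt", "dishonest", "oppose"}


def classify(message):
    """Code one message (one unit of analysis) by counting keyword hits."""
    words = re.findall(r"[a-z']+", message.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical sample of forum messages about a candidate.
messages = [
    "I support her, she is honest",
    "Corrupt and dishonest",
    "The debate is on Tuesday",
    "Great speech, bad timing",
]

labels = [classify(m) for m in messages]  # coding step
tally = Counter(labels)                   # how often each code occurs
```

The tally gives a rough picture of whether the sample as a whole leans positive, negative, or neutral. A coding scheme like this should still be checked by hand against a separate sample of messages before it is trusted.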
Qualitative Analysis

Qualitative analysis is the analysis of qualitative data such as text data from interview transcripts. Unlike quantitative analysis, which is statistics driven and largely independent of the researcher, qualitative analysis is heavily dependent on the researcher’s analytic and integrative skills and personal knowledge of the social context where the data is collected. The emphasis in qualitative analysis is “sense making” or understanding a phenomenon, rather than predicting or explaining. A creative and investigative mindset is needed for qualitative analysis, based on an ethically enlightened and participant-in-context attitude, and a set of analytic strategies. This chapter provides a brief overview of some of these qualitative analysis strategies. Interested readers are referred to more authoritative and detailed references such as Miles and Huberman’s (1984)17 seminal book on this topic.

Grounded Theory

How can you analyze a vast set of qualitative data acquired through participant observation, in-depth interviews, focus groups, narratives of audio/video recordings, or secondary documents? One technique for analyzing such text data is grounded theory – an inductive technique of interpreting recorded data about a social phenomenon to build theories about that phenomenon. The technique was developed by Glaser and Strauss (1967)18 in their method of constant comparative analysis of grounded theory research, and further refined by Strauss and Corbin (1990)19 to illustrate specific coding techniques – a process of classifying and categorizing text data segments into a set of codes (concepts), categories (constructs), and relationships. The interpretations are “grounded in” (or based on) observed empirical data, hence the name.
To ensure that the theory is based solely on observed evidence, the grounded theory approach requires that researchers suspend any preexisting theoretical expectations or biases before data analysis, and let the data dictate the formulation of the theory. Strauss and Corbin (1998) describe three coding techniques for analyzing text data: open, axial, and selective. Open coding is a process aimed at identifying concepts or key ideas that are hidden within textual data, which are potentially related to the phenomenon of interest. The researcher examines the raw textual data line by line to identify discrete events, incidents, ideas, actions, perceptions, and interactions of relevance that are coded as concepts (hence called in vivo codes). Each concept is linked to specific portions of the text (coding unit) for later validation. Some concepts may be simple, clear, and unambiguous while others may be complex, ambiguous, and viewed differently by different participants. The coding unit may vary with the concepts being extracted. Simple concepts such as “organizational size” may include just a few words of text, while complex ones such as “organizational mission” may span several pages. Concepts can be named using the researcher’s own naming convention or standardized labels taken from the research literature. Once a basic set of concepts is identified, these concepts can then be used to code the remainder of the data, while simultaneously looking for new concepts and refining old concepts.

17 Miles, M. B. and Huberman, A. M. (1984). Qualitative Data Analysis: A Sourcebook of New Methods. Newbury Park, CA: Sage Publications.
18 Glaser, B. and Strauss, A. (1967). The Discovery of Grounded Theory: Strategies for Qualitative Research. Chicago: Aldine.
19 Strauss, A. and Corbin, J. (1990). Basics of Qualitative Research: Grounded Theory Procedures and Techniques. Beverly Hills, CA: Sage Publications.
While coding, it is important to identify the recognizable characteristics of each concept, such as its size, color, or level (e.g., high or low), so that similar concepts can be grouped together later. This coding technique is called “open” because the researcher is open to and actively seeking new concepts relevant to the phenomenon of interest. Next, similar concepts are grouped into higher order categories. While concepts may be context-specific, categories tend to be broad and generalizable, and ultimately evolve into constructs in a grounded theory. Categories are needed to reduce the number of concepts the researcher must work with and to build a “big picture” of the issues salient to understanding a social phenomenon. Categorization can be done in phases, by combining concepts into subcategories, and then subcategories into higher order categories. Constructs from the existing literature can be used to name these categories, particularly if the goal of the research is to extend current theories. However, caution must be taken while using existing constructs, as such constructs may bring with them commonly held beliefs and biases. For each category, its characteristics (or properties) and the dimensions of each characteristic should be identified. A dimension represents a value of a characteristic along a continuum. For example, a “communication media” category may have a characteristic called “speed”, which can be dimensionalized as fast, medium, or slow. Such categorization helps differentiate between different kinds of communication media and enables researchers to identify patterns in the data, such as which communication medium is used for which types of tasks. The second phase of grounded theory is axial coding, where the categories and subcategories are assembled into causal relationships or hypotheses that can tentatively explain the phenomenon of interest. Although distinct from open coding, axial coding can be performed simultaneously with open coding.
The relationships between categories may be clearly evident in the data or may be more subtle and implicit. In the latter instance, researchers may use a coding scheme (often called a “coding paradigm”, but different from the paradigms discussed in Chapter 3) to understand which categories represent conditions (the circumstances in which the phenomenon is embedded), actions/interactions (the responses of individuals to events under these conditions), and consequences (the outcomes of actions/ interactions). As conditions, actions/interactions, and consequences are identified, theoretical propositions start to emerge, and researchers can start explaining why a phenomenon occurs, under what conditions, and with what consequences. The third and final phase of grounded theory is selective coding, which involves identifying a central category or a core variable and systematically and logically relating this central category to other categories. The central category can evolve from existing categories or can be a higher order category that subsumes previously coded categories. 
Rigor in Interpretive Research

While positivist research employs a “reductionist” approach by simplifying social reality into parsimonious theories and laws, interpretive research attempts to interpret social reality through the subjective viewpoints of the embedded participants within the context where the reality is situated. These interpretations are heavily contextualized, and are naturally less generalizable to other contexts. However, because interpretive analysis is subjective and sensitive to the experiences and insight of the embedded researcher, it is often considered less rigorous by many positivist (functionalist) researchers. Because interpretive research is based on a different set of ontological and epistemological assumptions about social phenomena than positivist research, the positivist notions of rigor, such as reliability, internal validity, and generalizability, do not apply in a similar manner. However, Lincoln and Guba (1985)16 provide an alternative set of criteria that can be used to judge the rigor of interpretive research.

Dependability. Interpretive research can be viewed as dependable or authentic if two researchers assessing the same phenomenon using the same set of evidence independently arrive at the same conclusions or the same researcher observing the same or a similar phenomenon at different times arrives at similar conclusions. This concept is similar to that of reliability in positivist research, with agreement between two independent researchers being similar to the notion of inter-rater reliability, and agreement between two observations of the same phenomenon by the same researcher akin to test-retest reliability. To ensure dependability, interpretive researchers must provide adequate details about their phenomenon of interest and the social context in which it is embedded so as to allow readers to independently authenticate their interpretive inferences.

Credibility.
Interpretive research can be considered credible if readers find its inferences to be believable. This concept is akin to that of internal validity in functionalistic research. The credibility of interpretive research can be improved by providing evidence of the researcher’s extended engagement in the field, by demonstrating data triangulation across subjects or data collection techniques, and by maintaining meticulous data management and analytic procedures, such as verbatim transcription of interviews, accurate records of contacts and interviews, and clear notes on theoretical and methodological decisions, that can allow an independent audit of data collection and analysis if needed.

Confirmability. Confirmability refers to the extent to which the findings reported in interpretive research can be independently confirmed by others (typically, participants). This is similar to the notion of objectivity in functionalistic research. Since interpretive research rejects the notion of an objective reality, confirmability is demonstrated in terms of “inter-subjectivity”, i.e., if the study’s participants agree with the inferences derived by the researcher. For instance, if a study’s participants generally agree with the inferences drawn by a researcher about a phenomenon of interest (based on a review of the research paper or report), then the findings can be viewed as confirmable.

Transferability. Transferability in interpretive research refers to the extent to which the findings can be generalized to other settings. This idea is similar to that of external validity in functionalistic research.

16 Lincoln, Y. S., and Guba, E. G. (1985). Naturalistic Inquiry. Beverly Hills, CA: Sage Publications.
The researcher must provide rich, detailed descriptions of the research context (“thick description”) and thoroughly describe the structures, assumptions, and processes revealed from the data so that readers can independently assess whether and to what extent the reported findings are transferable to other settings.
Interpretive Data Collection

Data is collected in interpretive research using a variety of techniques. The most frequently used technique is interviews (face-to-face, telephone, or focus groups). Interview types and strategies are discussed in detail in a previous chapter on survey research. A second technique is observation. Observational techniques include direct observation, where the researcher is a neutral and passive external observer and is not involved in the phenomenon of interest (as in case research), and participant observation, where the researcher is an active participant in the phenomenon and her inputs or mere presence influence the phenomenon being studied (as in action research). A third technique is documentation, where external and internal documents, such as memos, emails, annual reports, financial statements, newspaper articles, and websites, may be used to gain further insight into the phenomenon of interest or to corroborate other forms of evidence.

Interpretive Research Designs

Case research. As discussed in the previous chapter, case research is an intensive longitudinal study of a phenomenon at one or more research sites for the purpose of deriving detailed, contextualized inferences and understanding the dynamic process underlying a phenomenon of interest. Case research is a unique research design in that it can be used in an interpretive manner to build theories or in a positivist manner to test theories. The previous chapter on case research discusses both techniques in depth and provides illustrative exemplars. Furthermore, the case researcher is a neutral observer (direct observation) in the social setting rather than an active participant (participant observation). As with any other interpretive approach, drawing meaningful inferences from case research depends heavily on the observational skills and integrative abilities of the researcher.

Action research.
Action research is a qualitative but positivist research design aimed at theory testing rather than theory building (it is discussed in this chapter for lack of a more suitable place). This is an interactive design that assumes that complex social phenomena are best understood by introducing changes, interventions, or “actions” into those phenomena and observing the outcomes of such actions on the phenomena of interest. In this method, the researcher is usually a consultant or an organizational member embedded into a social context (such as an organization), who initiates an action in response to a social problem, and examines how her action influences the phenomenon while also learning and generating insights about the relationship between the action and the phenomenon. Examples of actions may include organizational change programs, such as the introduction of new organizational processes, procedures, people, or technology or replacement of old ones, initiated with the goal of improving an organization’s performance or profitability in its business environment. The researcher’s choice of actions must be based on theory, which should explain why and how such actions may bring forth the desired social change. The theory is validated by the extent to which the chosen action is successful in remedying the targeted problem. Simultaneous problem solving and insight generation is the central feature that distinguishes action research from other research methods (which may not involve problem solving) and from consulting (which may not involve insight generation). Hence, action research is an excellent method for bridging research and practice. There are several variations of the action research method. The most popular of these methods is participatory action research, designed by Susman and Evered (1978)13.
This method follows an action research cycle consisting of five phases: (1) diagnosing, (2) action planning, (3) action taking, (4) evaluating, and (5) learning (see Figure 10.1). Diagnosing involves identifying and defining a problem in its social context. Action planning involves identifying and evaluating alternative solutions to the problem, and deciding on a future course of action (based on theoretical rationale). Action taking is the implementation of the planned course of action. The evaluation stage examines the extent to which the initiated action is successful in resolving the original problem, i.e., whether theorized effects are indeed realized in practice. In the learning phase, the experiences and feedback from action evaluation are used to generate insights about the problem and suggest future modifications or improvements to the action. Based on action evaluation and learning, the action may be modified or adjusted to address the problem better, and the action research cycle is repeated with the modified action sequence. It is suggested that the entire action research cycle be traversed at least twice so that learning from the first cycle can be implemented in the second cycle. The primary mode of data collection is participant observation, although other techniques such as interviews and documentary evidence may be used to corroborate the researcher’s observations.

Figure 10.1. Action research cycle

13 Susman, G. I. and Evered, R. D. (1978). “An Assessment of the Scientific Merits of Action Research,” Administrative Science Quarterly, (23), 582-603.

Ethnography. The ethnographic research method, derived largely from the field of anthropology, emphasizes studying a phenomenon within the context of its culture.
The researcher must be deeply immersed in the social culture over an extended period of time (usually 8 months to 2 years) and should engage, observe, and record the daily life of the studied culture and its social participants within their natural setting. The primary mode of data collection is participant observation, and data analysis involves a “sense-making” approach. In addition, the researcher must take extensive field notes, and narrate her experience in descriptive detail so that readers may experience the same culture as the researcher. In this method, the researcher has two roles: rely on her unique knowledge and engagement to generate insights (theory), and convince the scientific community of the trans-situational nature of the studied phenomenon. The classic example of ethnographic research is Jane Goodall’s study of primate behaviors, where she lived with chimpanzees in their natural habitat at Gombe National Park in Tanzania, observed their behaviors, interacted with them, and shared their lives. During that process, she learnt and chronicled how chimpanzees seek food and shelter, how they socialize with each other, their communication patterns, their mating behaviors, and so forth. A more contemporary example of ethnographic research is Myra Bluebond-Langner’s (1996)14 study of decision making in families with children suffering from life-threatening illnesses, and the physical, psychological, environmental, ethical, legal, and cultural issues that influence such decision-making. The researcher followed the experiences of approximately 80 children with incurable illnesses and their families for a period of over two years.

14 Bluebond-Langner, M. (1996). In the Shadow of Illness: Parents and Siblings of the Chronically Ill Child. Princeton, NJ: Princeton University Press.
Data collection involved participant observation and formal/informal conversations with children, their parents and relatives, and health care providers to document their lived experience.

Phenomenology. Phenomenology is a research method that emphasizes the study of conscious experiences as a way of understanding the reality around us. It is based on the ideas of the German philosopher Edmund Husserl, who in the early 20th century argued that human experience is the source of all knowledge. Phenomenology is concerned with the systematic reflection and analysis of phenomena associated with conscious experiences, such as human judgment, perceptions, and actions, with the goal of (1) appreciating and describing social reality from the diverse subjective perspectives of the participants involved, and (2) understanding the symbolic meanings (“deep structure”) underlying these subjective experiences. Phenomenological inquiry requires that researchers eliminate any prior assumptions and personal biases, empathize with the participant’s situation, and tune into existential dimensions of that situation, so that they can fully understand the deep structures that drive the conscious thinking, feeling, and behavior of the studied participants.
Benefits and Challenges of Interpretive Research

Interpretive research has several unique advantages. First, it is well-suited for exploring hidden reasons behind complex, interrelated, or multifaceted social processes, such as inter-firm relationships or inter-office politics, where quantitative evidence may be biased, inaccurate, or otherwise difficult to obtain. Second, it is often helpful for theory construction in areas with no or insufficient a priori theory. Third, it is also appropriate for studying context-specific, unique, or idiosyncratic events or processes. Fourth, interpretive research can also help uncover interesting and relevant research questions and issues for follow-up research.

At the same time, interpretive research also has its own set of challenges. First, this type of research tends to be more time and resource intensive than positivist research in data collection and analytic efforts. Too little data can lead to false or premature assumptions, while too much data may not be effectively processed by the researcher. Second, interpretive research requires well-trained researchers who are capable of seeing and interpreting complex social phenomena from the perspectives of the embedded participants and reconciling the diverse perspectives of these participants, without injecting their personal biases or preconceptions into their inferences. Third, not all participants or data sources may be equally credible, unbiased, or knowledgeable about the phenomenon of interest, and some may have undisclosed political agendas, which may lead to misleading or false impressions. Inadequate trust between participants and researcher may hinder full and honest self-representation by participants, and such trust building takes time. It is the job of the interpretive researcher to “see through the smoke” (hidden or biased agendas) and understand the true nature of the problem.
Fourth, given the heavily contextualized nature of inferences drawn from interpretive research, such inferences do not lend themselves well to replicability or generalizability. Finally, interpretive research may sometimes fail to answer the research questions of interest or predict future behaviors.

Characteristics of Interpretive Research

All interpretive research must adhere to a common set of principles, as described below.

Naturalistic inquiry: Social phenomena must be studied within their natural setting. Because interpretive research assumes that social phenomena are situated within and cannot be isolated from their social context, interpretations of such phenomena must be grounded within their socio-historical context. This implies that contextual variables should be observed and considered in seeking explanations of a phenomenon of interest, even though context sensitivity may limit the generalizability of inferences.

Researcher as instrument: Researchers are often embedded within the social context that they are studying, and are considered part of the data collection instrument in that they must use their observational skills, their trust with the participants, and their ability to extract the correct information. Further, their personal insights, knowledge, and experiences of the social context are critical to accurately interpreting the phenomenon of interest. At the same time, researchers must be fully aware of their personal biases and preconceptions, and not let such biases interfere with their ability to present a fair and accurate portrayal of the phenomenon.

Interpretive analysis: Observations must be interpreted through the eyes of the participants embedded in the social context. Interpretation must occur at two levels. The first level involves viewing or experiencing the phenomenon from the subjective perspectives of the social participants.
The second level is to understand the meaning of the participants’ experiences in order to provide a “thick description” or a rich narrative story of the phenomenon of interest that can communicate why participants acted the way they did.

Use of expressive language: Documenting the verbal and non-verbal language of participants and the analysis of such language are integral components of interpretive analysis. The study must ensure that the story is viewed through the eyes of a person, and not a machine, and must depict the emotions and experiences of that person, so that readers can understand and relate to that person. Use of imagery, metaphors, sarcasm, and other figures of speech is very common in interpretive analysis.

Temporal nature: Interpretive research is often not concerned with searching for specific answers, but with understanding or “making sense of” a dynamic social process as it unfolds over time. Hence, such research requires an immersive involvement of the researcher at the study site for an extended period of time in order to capture the entire evolution of the phenomenon of interest.

Hermeneutic circle: Interpretation is an iterative process of moving back and forth from pieces of observations (text) to the entirety of the social phenomenon (context) to reconcile their apparent discord and to construct a theory that is consistent with the diverse subjective viewpoints and experiences of the embedded participants. Such iterations between the understanding/meaning of a phenomenon and observations must continue until “theoretical saturation” is reached, whereby any additional iteration does not yield any more insight into the phenomenon of interest.
The last chapter introduced interpretive research, or more specifically, interpretive case research. This chapter will explore other kinds of interpretive research. Recall that positivist or deductive methods, such as laboratory experiments and survey research, are those that are specifically intended for theory (or hypotheses) testing, while interpretive or inductive methods, such as action research and ethnography, are intended for theory building. Unlike a positivist method, where the researcher starts with a theory and tests theoretical postulates using empirical data, in interpretive methods, the researcher starts with data and tries to derive a theory about the phenomenon of interest from the observed data.

The term “interpretive research” is often used loosely and synonymously with “qualitative research”, although the two concepts are quite different. Interpretive research is a research paradigm (see Chapter 3) that is based on the assumption that social reality is not singular or objective, but is rather shaped by human experiences and social contexts (ontology), and is therefore best studied within its socio-historic context by reconciling the subjective interpretations of its various participants (epistemology). Because interpretive researchers view social reality as being embedded within and impossible to abstract from its social setting, they “interpret” the reality through a “sense-making” process rather than a hypothesis testing process. This is in contrast to the positivist or functionalist paradigm, which assumes that reality is relatively independent of the context, can be abstracted from its context, and can be studied in a decomposable functional manner using objective techniques such as standardized measures. Whether a researcher should pursue interpretive or positivist research depends on paradigmatic considerations about the nature of the phenomenon under consideration and the best way to study it.
However, qualitative versus quantitative research refers to empirical or data-oriented considerations about the type of data to collect and how to analyze them. Qualitative research relies mostly on non-numeric data, such as interviews and observations, in contrast to quantitative research which employs numeric data such as scores and metrics. Hence, qualitative research is not amenable to statistical procedures such as regression analysis, but is coded using techniques like content analysis. Sometimes, coded qualitative data is tabulated quantitatively as frequencies of codes, but this data is not statistically analyzed. Many purist interpretive researchers reject this coding approach as a futile effort to seek consensus or objectivity in a social phenomenon which is essentially subjective.

Although interpretive research tends to rely heavily on qualitative data, quantitative data may add more precision and clearer understanding of the phenomenon of interest than qualitative data. For example, Eisenhardt (1989), in her interpretive study of decision making in high-velocity firms (discussed in the previous chapter on case research), collected numeric data on how long it took each firm to make certain strategic decisions (which ranged from 1.5 months to 18 months), how many decision alternatives were considered for each decision, and surveyed her respondents to capture their perceptions of organizational conflict. Such numeric data helped her clearly distinguish the high-speed decision making firms from the low-speed decision makers, without relying on respondents’ subjective perceptions, which then allowed her to examine the number of decision alternatives considered by and the extent of conflict in high-speed versus low-speed firms. Interpretive research should attempt to collect both qualitative and quantitative data pertaining to its phenomenon of interest, and so should positivist research.
Joint use of qualitative and quantitative data, often called “mixed-mode designs”, may lead to unique insights and is highly prized in the scientific community.

Interpretive research has its roots in anthropology, sociology, psychology, linguistics, and semiotics, and has been available since the early 19th century, long before positivist techniques were developed. Many positivist researchers view interpretive research as erroneous and biased, given the subjective nature of the qualitative data collection and interpretation process employed in such research. However, the failure of many positivist techniques to generate interesting insights or new knowledge has resulted in a resurgence of interest in interpretive research since the 1970s, albeit with exacting methods and stringent criteria to ensure the reliability and validity of interpretive inferences.

Distinctions from Positivist Research

In addition to the fundamental paradigmatic differences in ontological and epistemological assumptions discussed above, interpretive and positivist research differ in several other ways. First, interpretive research employs a theoretical sampling strategy, where study sites, respondents, or cases are selected based on theoretical considerations such as whether they fit the phenomenon being studied (e.g., sustainable practices can only be studied in organizations that have implemented sustainable practices), whether they possess certain characteristics that make them uniquely suited for the study (e.g., a study of the drivers of firm innovations should include some firms that are high innovators and some that are low innovators, in order to draw contrast between these firms), and so forth. In contrast, positivist research employs random sampling (or a variation of this technique), where cases are chosen randomly from a population, for purposes of generalizability.
Hence, convenience samples and small samples are considered acceptable in interpretive research as long as they fit the nature and purpose of the study, but not in positivist research. Second, the role of the researcher receives critical attention in interpretive research. In some methods such as ethnography, action research, and participant observation, the researcher is considered part of the social phenomenon, and her specific role and involvement in the research process must be made clear during data analysis. In other methods, such as case research, the researcher must take a “neutral” or unbiased stance during the data collection and analysis processes, and ensure that her personal biases or preconceptions do not taint the nature of subjective inferences derived from interpretive research. In positivist research, however, the researcher is considered to be external to and independent of the research context and is not presumed to bias the data collection and analytic procedures. Third, interpretive analysis is holistic and contextual, rather than being reductionist and isolationist. Interpretive interpretations tend to focus on language, signs, and meanings from
Positivist Case Research Exemplar

Case research can also be used in a positivist manner to test theories or hypotheses. Such studies are rare, but Markus (1983)12 provides an exemplary illustration in her study of technology implementation at the Golden Triangle Company (a pseudonym). The goal of this study was to understand why a newly implemented financial information system (FIS), intended to improve the productivity and performance of accountants at GTC, was supported by accountants at GTC’s corporate headquarters but resisted by divisional accountants at GTC branches. Given the uniqueness of the phenomenon of interest, this was a single-case research study. To explore the reasons behind user resistance of FIS, Markus posited three alternative explanations: (1) system-determined theory: resistance was caused by factors related to an inadequate system, such as its technical deficiencies, poor ergonomic design, or lack of user friendliness, (2) people-determined theory: resistance was caused by factors internal to users, such as the accountants’ cognitive styles or personality traits that were incompatible with using the system, and (3) interaction theory: resistance was caused not by factors intrinsic to the system or the people, but by the interaction between the two sets of factors. Specifically, interaction theory suggested that the FIS engendered a redistribution of intra-organizational power, and accountants who lost organizational status, relevance, or power as a result of FIS implementation resisted the system while those gaining power favored it. In order to test the three theories, Markus predicted alternative outcomes expected from each theoretical explanation and analyzed the extent to which those predictions matched with her observations at GTC.

12 Markus, M. L. (1983). “Power, Politics, and MIS Implementation,” Communications of the ACM (26:6), 430-444.
For instance, the system-determined theory suggested that since user resistance was caused by an inadequate system, fixing the technical problems of the system would eliminate resistance. The computer running the FIS was subsequently upgraded with a more powerful operating system, online processing (from initial batch processing, which delayed immediate processing of accounting information), and simplified software for new account creation by managers. One year after these changes were made, the resistant users were still resisting the system and felt that it should be replaced. Hence, the system-determined theory was rejected. The people-determined theory predicted that replacing individual resistors or co-opting them with less resistant users would reduce their resistance toward the FIS. Subsequently, GTC started a job rotation and mobility policy, moving accountants in and out of the resistant divisions, but resistance not only persisted, but in some cases increased! In one specific instance, one accountant, who was one of the system’s designers and advocates when he worked for corporate accounting, started resisting the system after he was moved to the divisional controller’s office. Failure to realize the predictions of the people-determined theory led to the rejection of this theory. Finally, the interaction theory predicted that neither changing the system nor changing the people (i.e., user education or job rotation policies) would reduce resistance as long as the power imbalance and redistribution from the pre-implementation phase were not addressed. Before FIS implementation, divisional accountants at GTC felt that they owned all accounting data related to their divisional operations. They maintained this data in thick, manual ledger books, controlled others’ access to the data, and could reconcile unusual accounting events before releasing those reports.
Corporate accountants relied heavily on divisional accountants for access to the divisional data for corporate reporting and consolidation. Because the FIS automatically collected all data at source and consolidated them into a single corporate database, it obviated the need for divisional accountants, loosened their control and autonomy over their division’s accounting data, and made their job somewhat irrelevant. Corporate accountants could now query the database and access divisional data directly without going through the divisional accountants, analyze and compare the performance of individual divisions, and report unusual patterns and activities to the executive committee, resulting in further erosion of the divisions’ power. Though Markus did not empirically test this theory, her observations about the redistribution of organizational power, coupled with the rejection of the two alternative theories, led to the justification of interaction theory.

Comparisons with Traditional Research

Positivist case research, aimed at hypotheses testing, is often criticized by natural science researchers as lacking in controlled observations, controlled deductions, replicability, and generalizability of findings, which are the traditional principles of positivist research. However, these criticisms can be overcome through appropriate case research designs. For instance, the problem of controlled observations refers to the difficulty of obtaining experimental or statistical control in case research. However, case researchers can compensate for such lack of controls by employing “natural controls.” In Markus’ (1983) study, the natural control was the corporate accountant who was initially one of the system’s advocates, but started resisting it once he moved to the divisional controller’s office. In this instance, the change in his behavior may be attributed to his new divisional position.
However, such natural controls cannot be anticipated in advance, and case researchers may overlook them unless they are proactively looking for such controls. Incidentally, natural controls are also used in natural science disciplines such as astronomy, geology, and human biology, for example by waiting for comets to pass close enough to the earth in order to make inferences about comets and their composition.

The problem of controlled deduction refers to the lack of adequate quantitative evidence to support inferences, given the mostly qualitative nature of case research data. Despite the lack of quantitative data for hypotheses testing (e.g., t-tests), controlled deductions can still be obtained in case research by generating behavioral predictions based on theoretical considerations and testing those predictions over time. Markus employed this strategy in her study by generating three alternative theoretical hypotheses for user resistance, and rejecting two of those predictions when they did not match with actual observed behavior. In this case, the hypotheses were tested using logical propositions rather than mathematical tests, and such propositions are just as valid as statistical inferences since mathematics is a subset of logic.

Third, the problem of replicability refers to the difficulty of observing the same phenomenon given the uniqueness and idiosyncrasy of a given case site. However, using Markus’ three theories as an illustration, a different researcher can test the same theories at a different case site, where three different predictions may emerge based on the idiosyncratic nature of the new case site, and the three resulting predictions may be tested accordingly. In other words, it is possible to replicate the inferences of case research, even if the case research site or context may not be replicable.

Fourth, case research tends to examine unique and non-replicable phenomena that may not be generalized to other settings.
Generalizability in natural sciences is established through additional studies. Likewise, additional case studies conducted in different contexts with different predictions can establish generalizability of findings if such findings are observed to be consistent across studies.

Lastly, British philosopher Karl Popper described four requirements of scientific theories: (1) theories should be falsifiable, (2) they should be logically consistent, (3) they should have adequate predictive ability, and (4) they should provide better explanation than rival theories. In case research, the first three requirements can be strengthened by increasing the degrees of freedom of observed findings, such as by increasing the number of case sites, the number of alternative predictions, and the number of levels of analysis examined. This was accomplished in Markus’ study by examining the behavior of multiple groups (divisional accountants and corporate accountants) and providing multiple (three) rival explanations. Popper’s fourth condition was accomplished in this study when one hypothesis was found to match observed evidence better than the two rival hypotheses.
Reviewing the prior literature on executive decision-making, Eisenhardt found several patterns, although none of these patterns were specific to high-velocity environments. The literature suggested that in the interest of expediency, firms that make faster decisions obtain input from fewer sources, consider fewer alternatives, conduct limited analysis, restrict user participation in decision-making, centralize decision-making authority, and have limited internal conflicts. However, Eisenhardt contended that these views may not necessarily explain how decision makers make decisions in high-velocity environments, where decisions must be made quickly and with incomplete information, while maintaining high decision quality.

To examine this phenomenon, Eisenhardt conducted an inductive study of eight firms in the personal computing industry. The personal computing industry was undergoing dramatic changes in technology with the introduction of the UNIX operating system, RISC architecture, and 64KB random access memory in the 1980s, increased competition with the entry of IBM into the personal computing business, and growing customer demand with double-digit demand growth, and therefore fit the profile of the high-velocity environment. This was a multiple case design with replication logic, where each case was expected to confirm or disconfirm inferences from other cases. Case sites were selected based on their access and proximity to the researcher; however, all of these firms operated in the high-velocity personal computing industry in California’s Silicon Valley area. The collocation of firms in the same industry and the same area ruled out any “noise” or variance in dependent variables (decision speed or performance) attributable to industry or geographic differences.
The study employed an embedded design with multiple levels of analysis: decision (comparing multiple strategic decisions within each firm), executive teams (comparing different teams responsible for strategic decisions), and the firm (overall firm performance). Data was collected from five sources:

• Initial interviews with Chief Executive Officers: CEOs were asked questions about their firm’s competitive strategy, distinctive competencies, major competitors, performance, and recent/ongoing major strategic decisions. Based on these interviews, several strategic decisions were selected in each firm for further investigation. Four criteria were used to select decisions: (1) the decisions involved the firm’s strategic positioning, (2) the decisions had high stakes, (3) the decisions involved multiple functions, and (4) the decisions were representative of the strategic decision-making process in that firm.

• Interviews with divisional heads: Each divisional head was asked sixteen open-ended questions, covering their firm’s competitive strategy, functional strategy, top management team members, frequency and nature of interaction with the team, typical decision making processes, how each of the previously identified decisions was made, and how long it took them to make those decisions. Interviews lasted between 1.5 and 2 hours, and sometimes extended to 4 hours. To focus on facts and actual events rather than respondents’ perceptions or interpretations, a “courtroom” style of questioning was employed, such as when did this happen, what did you do, etc. Interviews were conducted by two people, and the data was validated by cross-checking facts and impressions made by the interviewer and note-taker. All interview data was recorded; however, notes were also taken during each interview, which ended with the interviewer’s overall impressions. Using a “24-hour rule”, detailed field notes were completed within 24 hours of the interview, so that data or impressions were not lost to recall.

• Questionnaires: Executive team members at each firm completed a survey questionnaire that captured quantitative data on the extent of conflict and power distribution in their firm.

• Secondary data: Industry reports and internal documents such as demographics of the executive teams (responsible for strategic decisions), financial performance of firms, and so forth, were examined.

• Personal observation: Lastly, the researcher attended a 1-day strategy session and a weekly executive meeting at two firms in her sample.

Data analysis involved a combination of quantitative and qualitative techniques. Quantitative data on conflict and power were analyzed for patterns across firms/decisions. Qualitative interview data was combined into decision climate profiles, using profile traits (e.g., impatience) mentioned by more than one executive. For within-case analysis, decision stories were created for each strategic decision by combining executive accounts of the key decision events into a timeline. For cross-case analysis, pairs of firms were compared for similarities and differences, categorized along variables of interest such as decision speed and firm performance. Based on these analyses, tentative constructs and propositions were derived inductively from each decision story within firm categories. Each decision case was revisited to confirm the proposed relationships. The inferred propositions were compared with findings from the existing literature to examine and reconcile differences with the extant literature and to generate new insights from the case findings. Finally, the validated propositions were synthesized into an inductive theory of strategic decision-making by firms in high-velocity environments.
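The profile-building step described above (keeping only traits mentioned by more than one executive) can be illustrated with a short Python sketch. This is not Eisenhardt’s actual procedure; the function name `climate_profile`, the `min_mentions` threshold parameter, and the sample trait codes are all hypothetical and serve only to make the counting rule concrete.

```python
from collections import Counter

def climate_profile(executive_traits, min_mentions=2):
    """Return traits mentioned by at least `min_mentions` distinct executives."""
    counts = Counter()
    for traits in executive_traits.values():
        # Use a set so a trait repeated by the same executive counts once.
        counts.update(set(traits))
    return sorted(t for t, n in counts.items() if n >= min_mentions)

# Hypothetical coded interview traits for one firm (illustrative data only).
firm_a = {
    "exec_1": ["impatience", "consensus-seeking"],
    "exec_2": ["impatience", "risk-taking"],
    "exec_3": ["risk-taking", "impatience", "impatience"],
}

profile = climate_profile(firm_a)
print(profile)  # ['impatience', 'risk-taking']
```

Counting distinct executives, rather than raw mentions, keeps one talkative respondent from dominating the profile, which matches the “more than one executive” rule in the text.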
Inferences derived from this multiple case research contradicted several decision-making patterns expected from the existing literature. First, fast decision makers in high-velocity environments used more information, and not less information as suggested by the previous literature. However, these decision makers used more real-time information (an insight not available from prior research), which helped them identify and respond to problems, opportunities, and changing circumstances faster. Second, fast decision makers examined more (not fewer) alternatives. However, they considered these multiple alternatives in a simultaneous manner, while slower decision makers examined fewer alternatives in a sequential manner. Third, fast decision makers did not centralize decision making or restrict inputs from others, as the literature suggested. Rather, these firms used a two-tiered decision process in which experienced counselors were asked for inputs in the first stage, followed by a rapid comparison and decision selection in the second stage. Fourth, fast decision makers did not have less conflict, as expected from the literature, but employed better conflict resolution techniques to reduce conflict and improve decision-making speed. Finally, fast decision makers exhibited superior firm performance by virtue of their built-in cognitive, emotional, and political processes that led to rapid closure of major decisions.
Conducting Case Research

Most case research studies tend to be interpretive in nature. Interpretive case research is an inductive technique where evidence collected from one or more case sites is systematically analyzed and synthesized to allow concepts and patterns to emerge for the purpose of building new theories or expanding existing ones. Eisenhardt (1989)10 proposes a “roadmap” for building theories from case research, a slightly modified version of which is described below. For positivist case research, some of the following stages may need to be rearranged or modified; however, sampling, data collection, and data analytic techniques should generally remain the same.

Define research questions. Like any other scientific research, case research must also start with defining research questions that are theoretically and practically interesting, and identifying some intuitive expectations about possible answers to those research questions or preliminary constructs to guide initial case design. In positivist case research, the preliminary constructs are based on theory, while no such theory or hypotheses should be considered ex ante in interpretive research. These research questions and constructs may be changed in interpretive case research later on, if needed, but not in positivist case research.

Select case sites. The researcher should use a process of “theoretical sampling” (not random sampling) to identify case sites. In this approach, case sites are chosen based on theoretical, rather than statistical, considerations, for instance, to replicate previous cases, to extend preliminary theories, or to fill theoretical categories or polar types. Care should be taken to ensure that the selected sites fit the nature of research questions, minimize extraneous variance or noise due to firm size, industry effects, and so forth, and maximize variance in the dependent variables of interest.
For instance, if the goal of the research is to examine how some firms innovate better than others, the researcher should select firms of similar size within the same industry to reduce industry or size effects, and select some more innovative and some less innovative firms to increase variation in firm innovation. Instead of cold-calling or writing to a potential site, it is better to contact someone at executive level inside each firm who has the authority to approve the project or someone who can identify a person of authority. During initial conversations, the researcher should describe the nature and purpose of the project, any potential benefits to the case site, how the collected data will be used, the people involved in data collection (other researchers, research assistants, etc.), desired interviewees, and the amount of time, effort, and expense required of the sponsoring organization. The researcher must also assure confidentiality, privacy, and anonymity of both the firm and the individual respondents.

10 Eisenhardt, K. M. (1989). “Building Theories from Case Research,” Academy of Management Review (14:4), 532-550.

Create instruments and protocols. Since the primary mode of data collection in case research is interviews, an interview protocol should be designed to guide the interview process. This is essentially a list of questions to be asked. Questions may be open-ended (unstructured) or closed-ended (structured) or a combination of both. The interview protocol must be strictly followed, and the interviewer must not change the order of questions or skip any question during the interview process, although some deviations are allowed to probe further into a respondent’s comments that are ambiguous or interesting. The interviewer must maintain a neutral tone and not lead respondents in any specific direction, say by agreeing or disagreeing with any response.
More detailed interviewing techniques are discussed in the chapter on surveys. In addition, other sources of data, such as internal documents and memorandums, annual reports, financial statements, newspaper articles, and direct observations, should be sought to supplement and validate interview data.

Select respondents. Select interview respondents at different organizational levels, departments, and positions to obtain divergent perspectives on the phenomenon of interest. Random sampling of interviewees is preferable; however, a snowball sample is acceptable, as long as a diversity of perspectives is represented in the sample. Interviewees must be selected based on their personal involvement with the phenomenon under investigation and their ability and willingness to answer the researcher’s questions accurately and adequately, and not based on convenience or access.

Start data collection. It is usually a good idea to electronically record interviews for future reference. However, such recording must only be done with the interviewee’s consent. Even when interviews are being recorded, the interviewer should take notes to capture important comments or critical observations, behavioral responses (e.g., the respondent’s body language), and the researcher’s personal impressions about the respondent and his/her comments. After each interview is completed, the entire interview should be transcribed verbatim into a text document for analysis.

Conduct within-case data analysis. Data analysis may follow or overlap with data collection. Overlapping data collection and analysis has the advantage that the data collection process can be adjusted based on themes emerging from data analysis, or used to probe further into these themes. Data analysis is done in two stages. In the first stage (within-case analysis), the researcher should examine emergent concepts separately at each case site, and patterns between these concepts, to generate an initial theory of the problem of interest.
The researcher can interpret data subjectively to “make sense” of the research problem in conjunction with using her personal observations or experience at the case site. Alternatively, a coding strategy such as Glaser and Strauss’ (1967) grounded theory approach, using techniques such as open coding, axial coding, and selective coding, may be used to derive a chain of evidence and inferences. These techniques are discussed in detail in a later chapter. Homegrown techniques, such as graphical representation of data (e.g., network diagram) or sequence analysis (for longitudinal data), may also be used. Note that there is no predefined way of analyzing the various types of case data, and the data analytic techniques can be modified to fit the nature of the research project.

Conduct cross-case analysis. Multi-site case research requires cross-case analysis as the second stage of data analysis. In such analysis, the researcher should look for similar concepts and patterns between different case sites, ignoring contextual differences that may lead to idiosyncratic conclusions. Such patterns may be used for validating the initial theory, or for refining it (by adding or dropping concepts and relationships) to develop a more inclusive and generalizable theory. This analysis may take several forms. For instance, the researcher may select categories (e.g., firm size, industry, etc.) and look for within-group similarities and between-group differences (e.g., high versus low performers, innovators versus laggards). Alternatively, she can compare firms in a pair-wise manner, listing similarities and differences across pairs of firms.

Build and test hypotheses. Based on emergent concepts and themes that are generalizable across case sites, tentative hypotheses are constructed. These hypotheses should be compared iteratively with observed evidence to see if they fit the observed data, and if not, the constructs or relationships should be refined.
The researcher should also compare the emergent constructs and hypotheses with those reported in the prior literature to make a case for their internal validity and generalizability. Conflicting findings must not be rejected, but rather reconciled using creative thinking to generate greater insight into the emergent theory. When further iterations between theory and data yield no new insights or changes in the existing theory, “theoretical saturation” is reached and the theory building process is complete.

Write case research report. In writing the report, the researcher should describe very clearly the detailed process used for sampling, data collection, data analysis, and hypotheses development, so that readers can independently assess the reasonableness, strength, and consistency of the reported inferences. A high level of clarity in research methods is needed to ensure that the findings are not biased by the researcher’s preconceptions.

Interpretive Case Research Exemplar

Perhaps the best way to learn about interpretive case research is to examine an illustrative example. One such example is Eisenhardt’s (1989) [11] study of how executives make decisions in high-velocity environments (HVE). Readers are advised to read the original paper published in Academy of Management Journal before reading the synopsis in this chapter. In this study, Eisenhardt examined how executive teams in some HVE firms make fast decisions, while those in other firms cannot, and whether faster decisions improve or worsen firm performance in such environments. HVE was defined as one where demand, competition, and technology change so rapidly and discontinuously that the information available is often inaccurate, unavailable, or obsolete. The implicit assumptions were that (1) it is hard to make fast decisions with inadequate information in HVE, and (2) fast decisions may not be efficient and may result in poor firm performance.