Quantitative Analysis: Descriptive Statistics
Numeric data collected in a research project can be analyzed quantitatively using
statistical tools in two different ways. Descriptive analysis refers to statistically describing,
aggregating, and presenting the constructs of interest or associations between these constructs.
Inferential analysis refers to the statistical testing of hypotheses (theory testing). In this
chapter, we will examine statistical techniques used for descriptive analysis, and the next
chapter will examine statistical techniques for inferential analysis. Much of today’s quantitative
data analysis is conducted using software programs such as SPSS or SAS. Readers are advised to familiarize themselves with one of these programs in order to understand the concepts described in this chapter.
Data Preparation
In research projects, data may be collected from a variety of sources: mail-in surveys, interviews, pretest or posttest experimental data, observational data, and so forth. These data must be converted into a machine-readable, numeric format, such as a spreadsheet or a text file, so that they can be analyzed by computer programs like SPSS or SAS. Data preparation usually involves the following steps.
Data coding. Coding is the process of converting data into numeric format. A codebook
should be created to guide the coding process. A codebook is a comprehensive document containing a detailed description of each variable in a research study, the items or measures for that
variable, the format of each item (numeric, text, etc.), the response scale for each item (i.e.,
whether it is measured on a nominal, ordinal, interval, or ratio scale; whether such scale is a
five-point, seven-point, or some other type of scale), and how to code each value into a numeric
format. For instance, if we have a measurement item on a seven-point Likert scale with anchors
ranging from “strongly disagree” to “strongly agree”, we may code that item as 1 for strongly
disagree, 4 for neutral, and 7 for strongly agree, with the intermediate anchors in between.
Nominal data such as industry type can be coded in numeric form using a coding scheme such
as: 1 for manufacturing, 2 for retailing, 3 for financial, 4 for healthcare, and so forth (of course, these numeric codes for nominal data are arbitrary labels and cannot be treated as quantities in statistical computations). Ratio scale data such as age, income, or test
scores can be coded as entered by the respondent. Sometimes, data may need to be aggregated
into a different form than the format used for data collection. For instance, for measuring a
construct such as “benefits of computers,” if a survey provided respondents with a checklist of
benefits that they could select from (i.e., they could choose as many of those benefits as they
wanted), then the total number of checked items can be used as an aggregate measure of
benefits. Note that many other forms of data, such as interview transcripts, cannot be
converted into a numeric format for statistical analysis. Coding is especially important for large, complex studies involving many variables and measurement items, where the coding process may be conducted by different people; a detailed codebook helps the coding team code the data in a consistent manner and helps others understand and interpret the coded data.
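To make the coding step more concrete, the short Python sketch below shows one way a codebook's coding scheme might be applied in practice: a seven-point Likert item, a nominal industry variable coded with arbitrary numeric labels, and a benefits checklist aggregated into a simple count. The variable names, anchors, and codes here are hypothetical illustrations, not part of any particular study's codebook.

```python
# Illustrative codebook fragments (hypothetical variable names and codes).
likert_codes = {
    "strongly disagree": 1, "disagree": 2, "somewhat disagree": 3,
    "neutral": 4, "somewhat agree": 5, "agree": 6, "strongly agree": 7,
}
industry_codes = {"manufacturing": 1, "retailing": 2, "financial": 3, "healthcare": 4}

# One hypothetical survey response as collected.
response = {
    "usefulness_1": "strongly agree",   # seven-point Likert item
    "industry": "retailing",            # nominal item
    "benefits": ["saves time", "reduces errors", "improves access"],  # checklist
}

# Coded record: Likert and nominal items become numbers; the checklist
# is aggregated into a count of checked benefits.
coded = {
    "usefulness_1": likert_codes[response["usefulness_1"]],
    "industry": industry_codes[response["industry"]],
    "num_benefits": len(response["benefits"]),
}
print(coded)  # {'usefulness_1': 7, 'industry': 2, 'num_benefits': 3}
```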
Data entry. Coded data can be entered into a spreadsheet, database, text file, or
directly into a statistical program like SPSS. Most statistical programs provide a data editor for
entering data. However, these programs store data in their own native format (e.g., SPSS stores
data as .sav files), which makes it difficult to share that data with other statistical programs.
Hence, it is often better to enter data into a spreadsheet or database, where the data can be reorganized as needed, shared across programs, and where subsets of the data can be extracted for analysis. Smaller data sets with fewer than 65,000 observations and 256 items can be stored in a spreadsheet such as Microsoft Excel, while larger data sets with millions of observations will require a database. Each observation can be entered as one row in the spreadsheet and each measurement item can be represented as one column. The entered data should be checked for accuracy during and after entry, via occasional spot checks on a set of items or observations. Furthermore, while entering data, the coder should watch out for obvious evidence
of bad data, such as the respondent selecting the “strongly agree” response to all items
irrespective of content, including reverse-coded items. If so, such data can be entered but
should be excluded from subsequent analysis.
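As a brief illustration of this layout, the pandas sketch below stores each observation as a row and each measurement item as a column, writes the data to a portable CSV file, and flags respondents who give the identical answer to every item, including a reverse-coded item, as one obvious sign of bad data. The item names (q1, q2, q3_rev), respondent labels, and file name are made up for the example.

```python
import pandas as pd

# Each row is one observation (respondent); each column is one measurement item.
# Column names and the file name are hypothetical.
data = pd.DataFrame(
    {"q1": [7, 5, 7], "q2": [7, 4, 6], "q3_rev": [7, 3, 2]},
    index=["resp_1", "resp_2", "resp_3"],
)

# CSV is a plain, program-independent format that SPSS, SAS, R, etc. can all read.
data.to_csv("survey_data.csv", index_label="respondent")

# Simple screen for straight-lining: identical answers on every item,
# including the reverse-coded one, suggest the respondent ignored item content.
straight_liners = data[data.nunique(axis=1) == 1]
print(straight_liners)  # resp_1 answered 7 to all items, including q3_rev
```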
Missing values. Missing data is an inevitable part of any empirical data set.
Respondents may not answer certain questions if they are ambiguously worded or too sensitive. Such problems should be detected early, during pretests, and corrected before the
main data collection process begins. During data entry, some statistical programs automatically
treat blank entries as missing values, while others require a specific numeric value such as -1 or
999 to be entered to denote a missing value. During data analysis, the default mode of handling
missing values in most software programs is to simply drop the entire observation containing
even a single missing value, in a technique called listwise deletion. Such deletion can
significantly shrink the sample size and make it extremely difficult to detect small effects.
Hence, some software programs allow the option of replacing missing values with an estimated
value via a process called imputation. For instance, if the missing value is one item in a multi-item scale, the imputed value may be the average of the respondent’s responses to the remaining items on that scale. If the missing value belongs to a single-item scale, many researchers use the average of other respondents’ responses to that item as the imputed value. Such imputation
may be biased if the missing value is of a systematic nature rather than a random nature. Two
methods that can produce relatively unbiased estimates for imputation are the maximum
likelihood procedures and multiple imputation methods, both of which are supported in
popular software programs such as SPSS and SAS.
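The pandas sketch below contrasts listwise deletion with the simple within-person imputation described above for a multi-item scale. The three-item satisfaction scale (sat_1 to sat_3) and its values are assumed purely for illustration; maximum likelihood and multiple imputation procedures are considerably more involved and are left to programs such as SPSS or SAS.

```python
import pandas as pd

# Hypothetical three-item satisfaction scale; respondent B skipped sat_2.
scale = pd.DataFrame(
    {"sat_1": [6, 5, 2], "sat_2": [7, None, 3], "sat_3": [6, 6, 2]},
    index=["A", "B", "C"],
)

# Listwise deletion (the default in many programs): respondent B is dropped entirely.
listwise = scale.dropna()

# Within-person imputation: replace the missing item with the mean of the
# respondent's remaining items on the same scale (B gets (5 + 6) / 2 = 5.5).
row_means = scale.mean(axis=1)  # row means computed over non-missing items
imputed = scale.apply(lambda col: col.fillna(row_means))

print(listwise)
print(imputed)
```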
Data transformation. Sometimes, it is necessary to transform data values before they
can be meaningfully interpreted. For instance, reverse-coded items, where an item conveys the opposite meaning of its underlying construct, should be reversed (e.g., in a 1-7 interval
scale, 8 minus the observed value will reverse the value) before they can be compared or
combined with items that are not reverse coded. Other kinds of transformations may include
creating scale measures by adding individual scale items, creating a weighted index from a set
of observed measures, and collapsing multiple values into fewer categories (e.g., collapsing
incomes into income ranges).
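The pandas sketch below illustrates these three transformations with hypothetical column names, data values, and category boundaries: reversing a reverse-coded 1-7 item (8 minus the observed value), creating a scale measure by adding individual items, and collapsing raw incomes into income ranges.

```python
import pandas as pd

# Hypothetical data: one regular item, one reverse-coded item, and raw income.
df = pd.DataFrame({
    "sat_1": [6, 2, 5],
    "sat_2_rev": [2, 6, 3],            # reverse-coded 1-7 item
    "income": [28000, 61000, 135000],  # annual income in dollars
})

# Reverse the reverse-coded item so that higher values mean more satisfaction.
df["sat_2"] = 8 - df["sat_2_rev"]

# Create a scale measure by adding the individual scale items.
df["satisfaction"] = df["sat_1"] + df["sat_2"]

# Collapse raw incomes into a few ranges (the boundaries here are arbitrary).
df["income_range"] = pd.cut(
    df["income"],
    bins=[0, 30000, 70000, float("inf")],
    labels=["low", "middle", "high"],
)
print(df)
```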