Quantitative Analysis: Inferential Statistics
Inferential statistics are the statistical procedures that are used to reach conclusions
about associations between variables. They differ from descriptive statistics in that they are
explicitly designed to test hypotheses. Numerous statistical procedures fall in this category,
most of which are supported by modern statistical software such as SPSS and SAS. This chapter
provides a short primer on only the most basic and frequent procedures; readers are advised to
consult a formal text on statistics or take a course on statistics for more advanced procedures.
Basic Concepts
British philosopher Karl Popper said that theories can never be proven, only disproven.
As an example, how can we prove that the sun will rise tomorrow? Popper said that just
because the sun has risen every single day that we can remember does not necessarily mean
that it will rise tomorrow, because inductively derived theories are only conjectures that may or
may not be predictive of future phenomena. Instead, he suggested that we may assume a
theory that the sun will rise every day without necessarily proving it, and if the sun does not
rise on a certain day, the theory is falsified and rejected. Likewise, we can only reject
hypotheses based on contrary evidence but can never truly accept them, because the presence of
supporting evidence does not rule out observing contrary evidence later. Because we cannot
truly accept a hypothesis of interest (alternative hypothesis), we formulate a null hypothesis as
the opposite of the alternative hypothesis, and then use empirical evidence to reject the null
hypothesis to demonstrate indirect, probabilistic support for our alternative hypothesis.
A second problem with testing hypothesized relationships in social science research is
that the dependent variable may be influenced by an infinite number of extraneous variables
and it is not plausible to measure and control for all of these extraneous effects. Hence, even if
two variables may seem to be related in an observed sample, they may not be truly related in
the population, and therefore inferential statistics are never certain or deterministic, but always
probabilistic.
How do we know whether a relationship between two variables in an observed sample
is significant, and not a matter of chance? Sir Ronald A. Fisher, one of the most prominent
statisticians in history, established the basic guidelines for significance testing. He said that a
statistical result may be considered significant if it can be shown that the probability of
obtaining a result at least as extreme by chance alone, when the null hypothesis is true, is 5% or
less. In inferential statistics, this probability is called the p-value, 5% is called the
significance level (α), and the desired relationship between the p-value and α is denoted as:
p≤0.05. The significance level is the maximum level of risk that we are willing to accept as the
price of our inference from the sample to the population. If the p-value is 0.05 (5%) or less, we
reject the null hypothesis, accepting at most a 5% risk of a Type I error, that is, of rejecting a
null hypothesis that is actually true. If p>0.05, we do not have enough evidence to reject
the null hypothesis or accept the alternative hypothesis.
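The decision rule above can be illustrated with a minimal sketch in Python. It estimates a p-value with a permutation test on a correlation, which is only one of several ways to obtain a p-value and is not a procedure described in this chapter. The function names (pearson_r, permutation_p_value) and the data are hypothetical and made up purely for illustration.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def permutation_p_value(xs, ys, n_permutations=5000, seed=0):
    """Two-sided p-value: the share of shuffled datasets whose correlation
    is at least as extreme as the observed one. Shuffling ys breaks any
    real link to xs, which mimics the null hypothesis of no relationship."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            extreme += 1
    # Adding 1 to both counts keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)

ALPHA = 0.05  # significance level

# Made-up illustrative data, not taken from any study.
predictor = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
outcome = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9, 7.4, 7.7, 9.3, 9.8]

p_value = permutation_p_value(predictor, outcome)
if p_value <= ALPHA:
    decision = "reject the null hypothesis"
else:
    decision = "fail to reject the null hypothesis"
print(f"p = {p_value:.4f}: {decision}")
```

Because the result is compared against α only after the p-value is computed, the same code can be reused with a stricter threshold such as 0.01 by changing ALPHA.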
We must also understand three related statistical concepts: sampling distribution,
standard error, and confidence interval. A sampling distribution is the theoretical
distribution of a statistic (such as the sample mean) computed from an infinite number of
samples of the same size drawn from the population of interest in your study. Because a sample
is never identical to the population, every sample estimate carries some inherent level of error.
The standard error is the standard deviation of the sampling distribution, and it measures how
much an estimate varies from sample to sample. If this standard error is small, then statistical
estimates derived from the sample (such as sample mean) are reasonably good estimates of the
population. The precision of our sample estimates is defined in terms of a confidence interval
(CI). A 95% CI is approximately the range of plus or minus two standard errors (more precisely,
1.96 standard errors when the sampling distribution is normal) around the sample estimate.
Hence, when we report a 95% CI, what we mean is that if we drew many samples and built an
interval from each one in the same way, about 95% of those intervals would contain the
population parameter. Jointly, the p-value and the CI give us a good idea of the probability of
our result and how close it is to the corresponding population parameter.
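As a rough illustration of how these three concepts fit together, the sketch below computes a standard error and a 95% CI for a sample mean. It then simulates repeated sampling to check how often such intervals contain the population mean. The population values (mean 50, standard deviation 10), the sample size, and the function name mean_confidence_interval are assumptions chosen only for this example, and the multiplier 1.96 assumes an approximately normal sampling distribution.

```python
import math
import random
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Return (mean, standard error, (low, high)) for a sample mean.
    z = 1.96 gives an approximate 95% CI under a normal sampling distribution."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean, se, (mean - z * se, mean + z * se)

# Hypothetical population, assumed only for this simulation.
POP_MEAN, POP_SD = 50.0, 10.0
N_SAMPLES, SAMPLE_SIZE = 2000, 100

rng = random.Random(42)
covered = 0
for _ in range(N_SAMPLES):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    _, _, (low, high) = mean_confidence_interval(sample)
    if low <= POP_MEAN <= high:
        covered += 1

coverage = covered / N_SAMPLES
print(f"Share of 95% CIs containing the population mean: {coverage:.3f}")
```

The printed share should be close to 0.95, which is the sense in which a 95% CI is "95% confident": the percentage describes the long-run behavior of the interval-building procedure across samples.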
General Linear Model
Most inferential statistical procedures in social science research are derived from a
general family of statistical models called the general linear model (GLM). A model is an
estimated mathematical equation that can be used to represent a set of data, and linear refers to
a straight line. Hence, a GLM is a system of equations that can be used to represent linear
patterns of relationships in observed data.
Figure 15.1. Two-variable linear model
The simplest type of GLM is a two-variable linear model that examines the relationship
between one independent variable (the cause or predictor) and one dependent variable (the
effect or outcome). Let us assume that these two variables are age and self-esteem respectively.
The bivariate scatterplot for this relationship is shown in Figure 15.1, with age (predictor)
along the horizontal or x-axis and self-esteem (outcome) along the vertical or y-axis. From the
scatterplot, it appears that individual observations representing combinations of age and
self-esteem generally seem to be scattered around an imaginary upward sloping straight line.
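The imaginary straight line through such a scatterplot can be estimated from data. The sketch below uses ordinary least squares, a common way to fit a line that this passage does not itself name, so treat the method as an assumption. The age and self-esteem values are made up for illustration and do not come from Figure 15.1, and fit_line is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up illustrative data: age in years, self-esteem on an arbitrary scale.
age = [22, 27, 31, 36, 40, 45, 49, 54, 58, 63]
self_esteem = [3.0, 3.3, 3.1, 3.8, 3.9, 4.2, 4.1, 4.6, 4.5, 4.9]

intercept, slope = fit_line(age, self_esteem)
print(f"self_esteem = {intercept:.2f} + {slope:.3f} * age")

# Predicted self-esteem for a hypothetical 50-year-old.
predicted = intercept + slope * 50
print(f"Predicted self-esteem at age 50: {predicted:.2f}")
```

A positive slope corresponds to the upward sloping line described above. Individual points still scatter around the fitted line, which is why the fit is an estimate rather than an exact description of the data.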
Sunday 13 March 2016