Recently, my wife and I met my boss
for dinner to chat about career prospects. The conversation was mild (like the curry,
and no complaints on either front!) and thought-provoking. As we walked home dazed
and enlightened in equal measure, something lingered in my mind. While on the
subject of research interests, Pammi had been asked to elaborate a bit about
her analytical skills. What had me thinking was not the subject of the question
- it is commonplace for PIs to try and gauge the skill-sets that budding
researchers already possess, or wish to acquire. Rather, I tried to wrap my
mind around how the question had been phrased. In the context of discussing
potentially interesting research projects, it had been presented as though statistical
knowledge, rather than being merely a set of skills and tools for unraveling
biologically interesting phenomena, ought to be a stand-alone research
interest in itself.
The very word “Statistics” frequently,
and I think somewhat prematurely, sounds intimidating to an unconditioned ear.
Even as I finish writing this sentence, I smother a smile – it is somewhat illogical
to blindly categorize something completely novel as “intimidating”. Yet people
frequently foster preconceived notions; it is one of the things that make us
all too human. Keeping this in mind, the agenda here is neither to elaborate on
the power of specific statistical tests like linear modeling or chi-squared,
nor to riddle this piece with scatterplots, coefficients, and levels of
significance. Rather, it is to assess conceptually the scientific uniqueness of statistics
and thereby dispel some of the trepidation or misconceptions that may set in before
an amateur researcher embarks on the statistical voyage…
In America, Asian students are
often stereotyped as being both obsessed with numbers and sensational at them. “You’ll
get an A in Stats because you’re from India!” This is one of the biggest myths to come out of attending graduate school in the US (aside from the idea that it is a necessary
path to take to pay off student loans!), for at least three reasons. First,
such statements are almost never backed up by verifiable accounts or,
ironically enough, by statistical data. Second, there are plenty of kids in
India who, left to themselves, would give up a good chunk of their math tuition
time in exchange for more hours on Facebook or Twitter. The structure of
India’s education system is such that kids are over-worked, and “learning” math
functions as a survival effort that fizzles out once a secure job position is
procured. Third, although it is fundamentally based on numbers, applying
statistical tests on a day-to-day basis is somewhat different from what is
surgically implanted into our brains in school as ‘Math’.
Picking up on this last point provides the
perfect basis to ask whether statistics functions more as a tool than as a
stand-alone science. To better assess this, it is beneficial to first recap the
scientific approach. Scientific endeavor, irrespective of discipline, adheres
for the most part to a single, stepwise paradigm: (1) pick an interesting
phenomenon, (2) generate a set of hypotheses and make specific predictions
under each, (3) gather data (whether experimental or natural) to test these,
(4) analyze this data, and (5) draw conclusions and speculations from the
results. So the ‘tool’ rather than ‘stand-alone science’ argument stems from
the idea that the field (Statistics) constitutes but one step (4) of this
paradigm, rather than the entire paradigm being applicable to the field! In
other words, before choosing the right kind of test or model (4), it is
imperative to know what is to be tested (1-3)! This may largely depend on
the skill with which facts are handled to generate questions, or testable
hypotheses.
Yet scientific phenomena are seldom
set in stone. For instance, new postulates are constantly being added even to
well-established theories like natural selection, based on exploratory studies
of hitherto poorly understood systems. To better understand such novel systems,
researchers have to abandon the stepwise paradigm, and ‘boldly go where no one
has gone before’. So they resort to throwing the kitchen sink at the problem,
that is, assessing the impact of several possible factors that might affect an
outcome, rather than designing and testing specific statistical equations (or models) constructed from
informed hypotheses. For instance, say that a completely novel infectious
pathogen has recently been isolated from some animals within a wildlife population.
Without any prior information on the pathogen, it is impossible to construct
specific hypotheses. Instead, investigators would have to ‘explore’ a wide
variety of possibilities – environmental infection, types of contact with other
animals, potential insect carriers or transmitters, to name a few – that may
have influenced why some individuals were infected and others were not. In such cases, statistical analysis, rather than being informed by
hypothesis construction, actually facilitates subsequent hypothesis
construction that may be more generally applicable to similar types of
pathogens and wildlife populations in the future. But does that swing
the pendulum back in the direction of statistics holding its ground as a unique
science? Only if the definition “Statistics: the science of throwing the kitchen sink” has a
serious ring to it! Here statistics ventures into the realm of art. Still
functioning as a tool, statistical knowledge in such contexts additionally
involves the skill with which a researcher
decides how best to represent the effects of an entire suite of characteristics
that may affect an outcome of interest. Diagnostic plots, graphs, and charts that help visualize what is going on will precede mathematical equations and models, which are constructed much later, after some basic knowledge about the system has been gleaned.