Students

Multiple-choice test on basic material

Here is a multiple-choice test for you to attempt. There are 30 items and I would expect that most departments would give you an hour to sit the test, so try to time yourself and answer within the hour. Answers are given only after you have submitted an answer to every question; you will then receive a score out of 30 and your percentage score.

The material tested is basic and only two-condition statistical tests are involved. Hence, in many universities, it would be appropriate for the first year, semester or term of methods teaching.


Study for use with questions 5–19

Participants were allocated at random to two conditions of an experiment. In one condition, participants were asked to learn a list of 15 words while reciting a string of digits. In the other condition, the participants were asked to learn the words but with no digit recitation. Both groups were tested for recall a short while after learning the words.

Digit recitation condition                    Control condition
Participant   Correct words recalled          Participant   Correct words recalled
A             8                               K             11
B             11                              L             12
C             8                               M             8
D             12                              N             14
E             6                               O             13
F             4                               P             12
G             8                               Q             15
H             13                              R             12
I             5                               S             13
J             7                               T             13

The researcher predicted that digit recitation would interfere with word recall. The difference between recall scores in the digit recitation and control conditions was found to be significant with p < .05.
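
The description above does not state which statistical test was used. Purely as an illustration (assuming Python with a reasonably recent SciPy installed), the recall scores from the table could be analysed with an unrelated (independent samples) t test as follows:

```python
# Illustrative sketch only; SciPy is assumed and the choice of an
# unrelated, one-tailed t test is not stated in the study description.
from scipy import stats

# Correct words recalled, taken from the table above
digit_recitation = [8, 11, 8, 12, 6, 4, 8, 13, 5, 7]    # participants A-J
control = [11, 12, 8, 14, 13, 12, 15, 12, 13, 13]       # participants K-T

# 'less' gives a one-tailed test because the researcher predicted poorer
# recall in the digit recitation condition (requires SciPy >= 1.6)
t, p = stats.ttest_ind(digit_recitation, control, alternative='less')
print(f"t = {t:.2f}, one-tailed p = {p:.4f}")
```

If the parametric assumptions were in doubt, stats.mannwhitneyu would be the equivalent distribution-free (non-parametric) choice for the same design.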

Study design description for questions 23 and 24

A psychology tutor used his own class in an experiment to investigate the effects of caffeine on reaction time. In a research methods workshop he told students they would receive a liquid to drink and that they would then perform a reaction time task. Students were given either non-diet Red Bull, non-diet cola or decaffeinated diet cola. The data were never published.

Glossary

A

(α) Alpha
Percentage of the probability area under H0 that forms the ‘rejection region’; level set for acceptable probability of Type I error under H0.

(β) Beta
If the null hypothesis is not true, this is the probability that a Type II error will be made.

A priori comparisons/planned comparisons
Tests of differences between selected means, or sets of means, which, from prior theory, were predicted to differ.

Action research
Practical intervention in everyday situations, often organisations, using applied psychology to produce change and monitor results.

Adjacent value
On a box-plot, the first value of the data set inside either of the outer fences, nearer to the median.

Alternative hypothesis (H1)
Assumption that an effect exists (e.g., that populations differ or population correlation is not zero).

Analysis of co-variance (ANCOVA)
Statistical procedure that performs an ANOVA while partialling out the effect of a variable that correlates with the dependent variable (the ‘co-variate’).

Analysis of variance (ANOVA)
Statistical technique that compares variances within and between samples in order to estimate the significance of differences between a set of means.

Analysis
Investigation of data for patterns or evidence of an effect.

Analytic induction
Method of moving from particular to general via instances; theory is modified in the light of features of new instances.

Analytic procedure
The methodological procedure used to analyse data and its epistemological justification; usually located in methods sections of qualitative reports.

Anonymity
Keeping participant’s or client’s identity away from publication or any possible inadvertent disclosure.

Asymmetrical order effect
Order effect that has greater strength in one particular order and where, therefore, counterbalancing would be ineffective.

Attitude scales

Likert
Scale using a response format where respondents select from an ordered range, e.g., ‘strongly disagree’ (1), ‘disagree’ (2) etc., and a ranked score is given to the response as shown in the brackets.

Semantic differential
Scale measuring meaning of an object for the respondent by having them place it between the extremes of several bi-polar adjectives.

Thurstone
Scale in which raters assess the relative strength of each item and respondents agreeing with that item receive the average ‘scale value’ for it.

Visual analogue
Scale where respondents mark their position on a line between two polar opposites and the distance of their mark from one extreme is measured and becomes a score.

Attrition
Loss of participants from a research study.

Axial coding
Procedure following open coding in some versions of grounded theory (promoted by Strauss and Corbin) but seen as distracting by others, e.g., Glaser (1998) (see text).

B

b weight
The amount by which a criterion variable will increase for a one-unit increase in a predictor variable; a predictor’s coefficient in the multiple regression equation.
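
In standard notation (not tied to any particular text or software), the multiple regression equation referred to here can be written as:

$$
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k
$$

where ŷ is the predicted value of the criterion variable, each b is the b weight (coefficient) of its predictor x, and b0 is the constant (intercept).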

Back translation
System of translating a psychological scale from language A into language B and then back to language A again to ensure equivalence.

Backward hierarchical downwards log-linear analysis
Removing interactions from a saturated log-linear model moving towards one-way effects.

Bar chart
Chart in which (usually) the x-axis represents a categorical variable and the y-axis can represent frequency, average, percentage, etc.

Baseline measure
Measure of what would occur if no experimental level of the independent variable were applied; how ‘untreated’ participants perform.

Beta value
Standardised b weights (i.e., as expressed in standard deviations).

Between conditions variation
Variation, calculated in a repeated measures design, which comes from how scores vary between the conditions with the between subjects variance accounted for.

Between groups ANOVA
ANOVA analysis where only unrelated factors are involved.

Between groups sum of squares
Sum of squares of deviations of sample means from the grand mean.

Between groups variance
Variance of sample means around grand mean.

Between subjects variance
The variance among data attributable to variation among the participants’ overall performances.

Bibliography
A list of sources used, but not cited, in the preparation of an essay or report. Not required in psychology reports.

Bi-modal distribution
Data set with two modes.

Binomial sign test (S)
Nominal-level test for difference between two sets of paired/related data using direction of each difference only.

Biserial (correlation coefficient)
Correlation used where one variable is artificially dichotomous; formed by categorising from an underlying continuous and normal distribution.

Bonferroni t tests
Procedure for testing means pairwise, which involves raising the critical values of t.

Box-plot
Exploratory data chart showing median, central spread of data and position of relative extremes.

C

Capitalising on chance
Making too many tests with α set at .05 on the same data, hence increasing the likelihood of a Type I error.

Categorical variable
Variable where cases are merely placed into independent, separate categories.

Ceiling effect
Occurs where measure produces most values near the top end of a scale.

Census
Survey of whole population.

Central limit theorem
Used in the theoretical estimation of the standard error of a sampling distribution from the standard deviation of a sample.

Central tendency
Formal term for any measure of the typical or middle value in a group.

Chi-square (χ2)
Statistic used in tests of association between two unrelated categorical variables. Also used in goodness-of-fit test, log-linear analysis and several other tests.
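
As a brief, purely illustrative sketch (SciPy is assumed and the observed frequencies below are invented for demonstration), a test of association on a 2 × 2 cross-tabs table might be run like this:

```python
# Illustrative sketch; the observed frequencies are hypothetical.
from scipy.stats import chi2_contingency

observed = [[20, 30],   # row 1: frequencies in the two categories (hypothetical)
            [45, 15]]   # row 2: frequencies in the two categories (hypothetical)

chi2, p, df, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.4f}")
print("expected frequencies:")
print(expected)
```

Note that, by default, chi2_contingency applies a continuity (Yates) correction when the table has one degree of freedom (i.e., a 2 × 2 table).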

Chi-square change
Change in chi-square as interactions are removed from the saturated model in log-linear analysis.

Class intervals
Categories into which a continuous data scale can be divided in order to summarise frequencies.

Clinical method
Interview method using structured questions that may be tailored in response to the interviewee’s answers; seeks to test a specific hypothesis.

Closed questions
Question with only a specified set of responses that the respondent can choose from, e.g., ‘yes/no’.

Code (coding)
Quantifying by giving similar observed instances of behaviour a symbol.

Coding
Giving ‘dummy’ numbers to discrete levels of an independent variable.

Coding unit
Item categories identified in qualitative data using content analysis.

Cohort
Large sample of people, often children of the same age, identified for longitudinal or cross-sectional study.

Cohort effect
Confounding in cross-sectional study when two different age groups have had quite different experiences.

Collaborative research
Research in which participants are fully involved to the extent of organising their own processes of research and change. Researcher as consultant.

Collectivist
System of social norms and beliefs in which the individual’s needs and aspirations are subsidiary to those of the group, often the family. Duty and responsibility to others rule over independence and self-seeking goals.

Collinearity
Extent of correlations between predictor variables in multiple regression.

Confidence limits/intervals
Estimated limits (e.g., ‘with 95% confidence’) to the likely range (interval) within which a population mean lies, based on an estimate from a sample mean and standard error.
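
A common textbook form of the 95% confidence interval for a mean estimated from a sample is:

$$
\bar{x} \pm t_{crit} \times SE, \qquad SE = \frac{s}{\sqrt{N}}
$$

where t_crit is the two-tailed critical value of t for N − 1 degrees of freedom (approximately 1.96 for large samples, where the normal distribution can be used).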

Confidentiality
Keeping data from participants or clients away from publication.

Confounding variable
Variable that is uncontrolled and obscures any effect sought, varying with the independent variable in a systematic manner.

Constant comparative analysis
Regular checking of the emergent category system (in GT) with raw data and sub-categories in order to rearrange and produce the tightest fit.

Constructivism
Theory holding knowledge to be relative and ‘facts’ to be social constructions, not permanent realities.

Content analysis
Search of qualitative materials (especially text) to find ‘coding units’ (usually words, phrases or themes); analysis often concentrates on quantitative treatment of frequencies but can be a purely qualitative approach.

Contextualist constructionist
Theory of knowledge (epistemological position), which sees knowledge and truth as relative; different versions are possible depending on the context in which knowledge claims are made.

Continuous scale/variable
Scale where there are no discrete steps; theoretically, all points along the scale are meaningful.

Control group
Group used as baseline measure against which the performance of the experimental group is assessed.

Co-operative enquiry
Investigation involving researcher and participants working together.

Correlation
A (standardised) measure of relationship of co-variance between two variables.

Coefficient
Number signifying strength of correlation between two variables.

Curvilinear
Correlation between two variables that produces a low r value because the relationship does not fit a straight line but does fit a curve well.

Negative
Correlation where, as values of one variable increase, related values of another variable tend to decrease.

Positive
Correlation where, as values of one variable increase, related values of another variable also tend to increase.

Correlational study
Study of the extent to which one variable is related to another, often referring to non-manipulated variables measured outside the laboratory.

Counterbalancing
Half the participants take the conditions in one order and the other half take them in the opposite order; this is done to balance possible order effects.

Co-variate
A variable that correlates with a dependent variable on which two groups differ and which can be partialled out using ANCOVA.

Cramer’s phi or V
General statistic used to estimate effect size in chi-square analyses.

Criterion/target/dependent variable
Variable on which values are being predicted in regression.

Critical value
Value that the result of the test statistic (e.g., z) must reach in order for the null hypothesis to be rejected.

Cross-cultural study
Comparative study of two or more different societies, or social/ethnic sub-groups.

Cross-generational
Confounding occurring when one longitudinally studied group is compared with another that has generally had quite different social experiences.

Cross-lagged
Correlation of variable A measured at time 1 with variable B measured at time 2 (and vice versa) in a longitudinal or panel study.

Cross-sectional, short-term longitudinal study
Comparative study of several cross-sectional groups taken at intervals over a relatively short period (say, two or three years).

Cross-tabs table
Term for table of frequencies on levels of a variable by levels of a second variable.

Cultural relativity
View that a person’s behaviour and characteristics can only be understood through that person’s own cultural environment.

Cumulative frequency
Distribution (table or chart) that shows the number of cases that have occurred up to and including the current category.

D

d, Cohen’s
Measure of effect size; used here in calculating power.
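
In one common form, for the difference between two means:

$$
d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}
$$

where s_pooled is the pooled standard deviation of the two groups; values of around 0.2, 0.5 and 0.8 are conventionally treated as small, medium and large effects respectively.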

Data set
Group of data points or values that can be summarised or analysed.

Data
Relatively uninterpreted information; gathered facts.

Debriefing
Informing participants about the full nature and rationale of the study they’ve experienced, and attempting to reverse any negative influence.

Deception
Leading participants to believe that something other than the true independent variable is involved, or withholding information such that the reality of the investigative situation is distorted.

Deciles
Points on a measured scale that mark off each 10% of the data set or population.

Deduction
Logical argument using rules to derive a conclusion from premises.

Degrees of freedom
Common term in statistical analysis having to do with the number of individual data points that are free to vary given that overall summary values are known.

Delta (δ)
Statistic used to estimate power using effect size.

Demand characteristics
Cues in a study that help the participant to work out what is expected.

Dependent variable (DV)
Variable that is assumed to be directly affected by changes in the independent variable in an experiment.

Derived etic
General/universal psychological construct modified from its origin in one culture after researcher’s immersion in one or more new cultures.

Design
Structure and strategy of a piece of research.

Deviation score/value
Amount by which a particular score differs from the mean of its set.

Diagnostic item
Item not obviously or directly connected to the attitude object, yet which correlates well with overall scores and therefore has discriminatory power and predictive power.

Diary method
Data-gathering method where participant makes regular (often daily) record of relevant events.

Dichotomous variable
Variable with just two exhaustive values (e.g., male/female).

Difference mean
Mean of differences between pairs of scores in a related design.

Directional hypothesis
Hypothesis that states which way a difference or correlation exists – e.g., population mean A > population mean B, or correlation is negative.

Disclosure
Letting people know that they are the object of observation.

Discourse analysis (DA)
Qualitative analysis of interactive speech, which assumes people use language to construct the world; talk is organised according to context and personal stake; it is not evidence of internal psychological processes.

Discrete scale/variable
Scale on which not all subdivisions are meaningful; often one where the underlying construct to be measured can only come in whole units (e.g., number of children).

Discriminatory power
Extent to which an item, or the test as a whole, separates people along the scoring dimension.

Disguise
Feature of questioning approach that keeps respondents ignorant of the aims of the questioning.

Dispersion
Technical and general term for any measure of the spread of values in a sample of data or population.

Distribution dependent test
Significance test using estimations of population parameters.

Distribution free test
Significance test that does not depend on estimated parameters of an underlying distribution.

Distribution
Shape and spread of data sets/populations.

Double blind
Experimental procedure where neither participants nor data gatherers/assessors know which treatment participants have received.

E

Effect
A difference or correlation between samples leading to an assumed relationship between variables in the population.

Effect size
The size of the effect being investigated (difference or correlation) as it exists in the population.

Emergent theory
Theory that emerges from data as they are analysed; not based on prior research literature.

Emic
Psychological construct applicable within one or only a few cultures.

Empirical method
Scientific method of gathering information and summarising it in the hope of identifying general patterns.

Enlightenment
Tendency for people to be familiar with psychological research findings.

Epistemology
Theory of knowledge and of how knowledge is constructed.

Epsilon
Statistic calculated for use when the sphericity assumption is violated; df are multiplied by this statistic in order to reduce them and avoid Type I errors.

Equal probability selection method (epsem)
Procedure for producing a sample into which every case in the target population has an equal probability of being selected.

Error between subjects
Error term associated with the between groups portion of the sum of squares division in a mixed ANOVA design.

Error rate per comparison
Given the significance level set, the likelihood of a Type I error in each test made on the data if H0 is true.

Error sum of squares
Sum of squares of deviations of each score from its own group mean (also: within groups SS).

Error variance
Total variance of all scores from their group means, caused by the operation of randomly acting variables (also: within groups variance).

Error within subjects
Error term associated with the within subjects portion of the sum of squares division in a mixed design.

Eta-squared (η2)
Measure of effect size.

Ethnocentrism
Bias of viewing and valuing another’s culture from one’s own cultural perspective.

Etic
Universal psychological construct, applicable to all cultures.

Evaluation apprehension
Participants’ concern about being tested, which may affect results.

Event coding
Recording pre-specified behavioural events as they occur.

Expected frequencies
Frequencies expected in table if no association exists between variables – i.e., if the null hypothesis is true.

Experiment
Study in which an independent variable is manipulated under strictly controlled conditions.

Experimental designs

Factorial design
Experiment in which more than one independent variable is manipulated.

Independent samples (between groups/subjects; independent/unrelated groups)
Each condition of the independent variable is experienced by only one group of participants.

Matched pairs
Each participant in one group/condition is paired on specific variable(s) with a participant in another group/condition.

Repeated measures (within subjects/groups)
Each participant experiences all levels of the independent variable.

Related
Design in which individual scores in one condition can be paired with individual scores in other conditions.

Single participant
Design in which only one participant is tested in several trials at all independent variable levels.

Small N design
Design in which there is only a small number of participants, typically in clinical or counselling work but also where participants need substantial training for a highly skilled task.

Unrelated
Design in which individual scores in one condition cannot be paired (or linked) in any way with individual scores in any other condition.

Experimental realism
Effect of attention-grabbing, interesting experiment in compensating for artificiality or demand characteristics.

Experimenter expectancy
Tendency for experimenter’s knowledge of what is being tested to influence the outcome of research.

Exploratory data analysis
Close examination of data by a variety of means, including visual display, before submitting them to significance testing; recommended by Tukey.

Extraneous variable
Anything other than the independent variable that could affect the dependent variable; it may or may not have been allowed for and/or controlled.

F

F test/ratio
Statistic giving ratio of between groups to within groups variance.
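
Using the terms defined elsewhere in this glossary (between groups variance; error or within groups variance), the F ratio can be written as:

$$
F = \frac{MS_{between}}{MS_{error}}
$$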

Face-to-face
Interview in which researcher and interviewee talk together in each other’s presence.

Factor
Independent variable in a multi-factorial design.

Factor analysis
Statistical technique, using patterns of test or sub-test correlations, that provides support for theoretical constructs by locating correlational ‘clusters’ and identifying explanatory factors.

Factorial ANOVA
Design involving the analysis of the effects of two or more factors (independent variables) on differences between group means.

Falsifiability
Principle that theories must be defined in a way that makes it possible to show how they could be wrong.

Family-wise error rate
The probability of making at least one Type I error in all the tests made on a set of data, assuming H0 is true.

Field experiment
Experimentally designed field study.

Field study
Study carried out outside the laboratory and usually in the participants’ normal environment.

Fishing
Term used to describe situation where a student/researcher uses a lot of measures and investigates results to see if there are any significant differences or correlations. Frowned upon because it is likely to generate many Type I errors.

Floor effect
Phenomenon where measure produces very many low scores.

Focus group
Group, often with common interest, who meet to discuss an issue in a collective interview.

Frequency data/ Frequencies
Numbers of cases in specific categories.

Frequency distribution
Distribution showing how often certain values occur.

Frequency polygon
Histogram showing only the peaks of class intervals.

Frequency
How often a certain event (e.g., score) occurs.

Friedman’s (χ2) test
Non-parametric rank test for significant differences between two or more related samples.

G

Goodness of fit
Test of whether a distribution of frequencies differs significantly from a theoretical pattern.

Grand mean
Mean of all scores in a data set, irrespective of conditions or groups.

Grounded theory (GT)
Theory driving the analysis of qualitative data in which patterns emerge from the data and are not imposed on them before they are gathered.

Group difference study
A study that compares the measurement of an existing variable in two contrasting groups categorised by long-term or inherent characteristics such as sex, gender, ethnicity, personality, social class and so on.

H

Halo effect and reverse halo effect, devil effect, horns effect
Tendency for people to judge a person’s characteristics as positive if they have already observed one central trait to be positive or have gained an overall positive first impression. Reverse effect occurs if an initial negative impression causes traits to be assessed negatively.

Hawthorne effect
Effect on human performance caused solely by the knowledge that one is being observed.

Heteroscedasticity
Degree to which the variance of residuals is not similar across different values of predicted levels of the criterion in multiple regression.

Hinge position
For constructing a box-plot, the position from the bottom of the data set where the first quartile falls and the position from the top of the data set where the third quartile falls.

Hinge spread
On a box-plot, the distance between the lower and upper hinges.

Histogram
Chart containing whole of a continuous data set divided into proportional class intervals.

Homogeneity of variance
Situation where sample variances are the same or similar.

Hypothesis
Precise statement of assumed relationship between variables.

Hypothesis-testing
Research that analyses data for a predicted effect.

Hypothetical construct
Phenomenon or construct assumed to exist, and used to explain observed effects, but as yet unconfirmed; stays as an explanation of effects while evidence supports it.

Hypothetico-deductive
Method of recording observations, developing explanatory theories and testing predictions from those theories.

I

Idiographic
Approach that emphasises unique characteristics and experiences of the individual, not common traits.

Imposed etic
Psychological construct from researcher’s own culture, applied to a new culture without modification.

Independent variable (IV)
Variable which experimenter manipulates in an experiment and which is assumed to have a direct effect on the dependent variable.

Individualistic
System of social norms and beliefs where individual needs and goals dominate over responsibility to others. The self is paramount and independence from others is a primary value.

Induction
Process of moving from particular instances to a generalised pattern.

Inductive analysis
Work with qualitative data, which permits theory and hypotheses to evolve from the data rather than hypothetico-deductive testing of hypotheses set before data are obtained.

Inferential test/statistics
Procedures for making inferences about whole populations from which samples are drawn, e.g., significance tests.

Informed consent
Agreement to participate in research in the full knowledge of the research context and participant rights.

Interaction effect
Significant effect where effect of one factor is different across levels of another factor.

Inter-observer reliability
Extent to which observers agree in their rating or coding.

Interpretive phenomenological analysis (IPA)
Approach that attempts to describe an individual’s experiences from their own perspective as closely as possible, but recognises the interpretive influence of the researcher on the research product.

Interquartile range
Distance between first and third quartile in a distribution.

Interval coding
Recording what behaviour is occurring, or the typical behaviour, in specified time intervals.

Intervention
Research that makes some alteration to people’s lives beyond the specific research setting, in some cases because there is an intention to ameliorate specific human conditions.

Involuntary participation
Taking part in research without agreement or knowledge of the study.

J

Jonckheere trend test
Non-parametric statistical test for the significance of a trend in the dependent variable across unrelated conditions.

K

Kruskal–Wallis test
Non-parametric between-groups test of difference between several groups (Mann-Whitney is the two-condition equivalent).

Kurtosis
Overall shape of a distribution in terms of height and width compared with normal distribution.

L

Leptokurtic distribution
Non-normal distribution that is closely bunched in the centre and tall.

Levels (of the IV)
The different values taken by the independent variable; often, the conditions of an experiment, e.g., levels of caffeine at 50mg, 100mg and 200mg in the investigation of memory recall.

Levels of measurement
Levels at which data are categorised or measured.

Interval
Level of measurement at which each unit on a scale represents an equal change in the variable measured.

Nominal
Level of measurement at which numbers are only labels for categories.

Ordinal
Level of measurement at which cases are arranged in rank positions.

Quasi-interval
Scale that appears to be interval but where equal intervals do not necessarily measure equal amounts of the construct.

Ratio
Interval-type scale where proportions on the scale are meaningful; usually an absolute zero exists.

Likelihood ratio chi-square
Type of chi-square statistic used in log-linear analysis.

Line chart
Chart joining continuous data points in a single line.

Linear coefficients
Values to be entered into an equation for calculating linear contrasts.

Linear contrasts
Procedure for testing between individual pairs of means or combinations of means, a priori (i.e., predicted).

Linear regression
Procedure of predicting values on a criterion variable from a predictor or predictors using correlation.

Linearity
Extent to which a relationship between two variables can be represented by a straight line rather than, say, a curved line.

Literature review
A review of relevant literature on the topic of the report. This must be used in the argument towards the hypotheses, predictions or aims.

Log-linear analysis
Analysis similar to chi-square but which will deal with three-way tables or greater.

Log-linear model
A theoretical and statistical structure proposed to explain cell frequency variation in a multi-way frequency table.

Longitudinal study
Comparative study of one individual or group over a relatively long period (possibly including a control group).

Lower hinge
On a box-plot, the first quartile.

M

Main effect
In a multi-factorial ANOVA analysis, the effect of one factor across all its levels, irrespective of any other factors.

Mann-Whitney U test
Ordinal-level significance test for differences between two sets of unrelated data.

MANOVA
Statistical procedure using ANOVA on more than one dependent variable.

Marginals
The total of each column and row, and the overall total of frequencies, in a cross-tabs table.

Mauchly’s test
Test of sphericity calculated in SPSS.

Mean (arithmetic)
Average of values found by adding them all and dividing by the number of values in the set.

Mean deviation
Measure of dispersion – mean of all absolute deviations.

Mean sum of squares
Sum of squares divided by df.

Measured variable
Variable where cases measured on it are placed on some sort of scale that has direction.

Median
Measure of central tendency; middle value of data set.

Median position/location
Position where median is to be found in an ordered data set.

Median split method
Dividing a set of measured values into two groups by dividing them into high and low at their median.

Meta-analysis
Statistical analysis of results of multiple equivalent studies of the same, or very similar, effects in order to assess validity more thoroughly.

Mixed design ANOVA
ANOVA analysis where both related and unrelated factors are involved.

Mixed methods
An emerging school of thought that promotes research in which quantitative and qualitative methods are used together to answer different aspects of the research question.

Mode/modal value
Measure of central tendency – most frequent value in a data set.

Multiple correlation coefficient
Value of the correlation between several combined predictor variables and a criterion variable.

Multiple regression
Technique in which the value of one ‘criterion’ variable is estimated using its known correlation with several other ‘predictor’ variables.

Mundane realism
Feature of design where experiment resembles everyday life but is not necessarily engaging.

N

Narrative psychology
Research approach that sees human activity as ‘storied’; that is, humans tend to recall and talk about their lives in stories rather than in a logical and factual manner.

Natural experiment
Events beyond researcher’s direct control but where an IV and DV can be identified.

Naturalistic design
Design in which experimenters investigate participants in their everyday environment.

Negative case analysis
Process of seeking contradictions of emergent categories or theory in order to adjust category system to incorporate and explain more of the data.

Negatively skewed
Description of distribution that has a longer tail of lower values.

Newman–Keuls post hoc analysis
Post hoc test of means pairwise; safe so long as number of means is relatively low.

Nomothetic
Approach that looks for common and usually measurable factors on which all individuals differ.

Non-directional hypothesis
Hypothesis that does not state in which direction a difference or correlation exists.

Non-directive interview
Interview in which the interviewer does not direct discussion and remains non-judgmental.

Non-equivalent groups
A possible confounding variable where two or more groups in an independent samples design experiment differ on a skill or characteristic relevant to the dependent variable.

Non-parametric test
Significance test that does not make estimations of parameters of an underlying distribution; also known as a distribution free test.

Normal distribution
Continuous distribution, bell-shaped, symmetrical about its mid-point.

Null hypothesis
Assumption of no effect in the population from which samples are drawn (e.g., no mean population difference or no correlation).

O

Observation types

Controlled
Observation in controlled setting, often a laboratory or observation room.

Indirect/archival
Observations not made on people directly but using available records.

Naturalistic
Observation without intervention in observed people’s own environment.

Participant
Observation in which observer takes part or plays a role in the group observed.

Structured/systematic
Observation that uses an explicitly defined coding framework for data recording.

Observational design
Study that is solely observational and does not include any experimentation.

Observational study
Research which gathers data by watching and recording behaviour.

Observational technique
Procedure using observation in some way and that may or may not be part of an experiment.

Observed frequencies
Frequencies obtained in a research study using categorical variables.

Observer bias
Threat to validity of observational recordings caused solely by characteristics of the observer.

One-tailed test
Test referring to only one tail of the distribution under H0; may be used if the alternative hypothesis is directional (but controversial).

Open-ended questions
Type of interview/questionnaire item to which interviewees respond at length.

Operational definition
Definition of phenomenon in terms of the precise procedures taken to measure it.

Order effect
A confounding effect caused by experiencing one condition, then another, such as practice or fatigue.

Outer fence
Extreme position on a box-plot being, for the lower fence, lower hinge - 1.5 x the hinge spread, and for the upper fence, upper hinge + 1.5 x the hinge spread.

Outliers
Values that fall more than 1.5 times the IQR above the upper quartile or below the lower quartile (i.e., outside the outer fences). Often removed from analysis of a data set because they unnecessarily distort statistics, but this procedure must be openly reported.
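
A minimal sketch of how the hinge, fence and outlier definitions above fit together (NumPy is assumed, the data values are invented and quartile conventions differ slightly between texts and statistical packages):

```python
# Illustrative sketch; the data values are hypothetical.
import numpy as np

scores = np.array([4, 5, 6, 7, 8, 8, 9, 10, 11, 25])

q1, q3 = np.percentile(scores, [25, 75])   # lower and upper hinges (approximately)
iqr = q3 - q1                              # hinge spread / interquartile range
lower_fence = q1 - 1.5 * iqr               # lower outer fence
upper_fence = q3 + 1.5 * iqr               # upper outer fence

outliers = scores[(scores < lower_fence) | (scores > upper_fence)]
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"fences: {lower_fence} to {upper_fence}; outliers: {outliers}")
```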

P

p ≤ .01
Significance level preferred for greater confidence than that given by the conventional one and that should be set where research is controversial or a one-shot-only trial.

p ≤ .05
Conventional significance level.

p ≤ .1
Significance level generally considered too high for rejection of the null hypothesis but where, if p under H0 is this low, further investigation might be merited.

Page trend test
Test for repeated measures design with three or more levels where a specific order of magnitude for all the levels has been predicted, i.e., which one will be highest, next highest and so on.

Pairwise comparison
Comparison of a single pair of means from a larger set of means (e.g., in post hoc testing).

Panel design
Design in which the same group of participants is tested at the beginning and end of one interval or more.

Panel
Stratified group who are consulted in order for opinion to be assessed.

Paradigm
A prevailing agreed system of scientific thinking and behaviour within which research is conducted.

Paralinguistics
Body movements and vocal sounds that accompany speech and modify its meaning.

Parametric test
Relatively powerful significance test that uses estimations of population parameters; the data tested must usually therefore satisfy certain assumptions; also known as a distribution dependent test.

Partial correlation
Method of finding the correlation of A with B after the common variance of a third correlated variable, C, has been removed.

Participant expectancy
Effect of participants’ expectancy about what they think is supposed to happen in a study.

Participant variables
Person variables (e.g., memory ability) differing in proportion across different experimental groups, and possibly confounding results.

Participant
Person who takes part in a psychological investigation as a member of a sample or individual case.

Participative research
Research in which participants are substantially involved in the investigative process as active enquirers.

Pearson’s product moment correlation coefficient
Parametric measure of correlation.
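
A minimal sketch (SciPy assumed; the paired values are invented for illustration) of computing Pearson’s r, with Spearman’s rho (see under S below) shown for comparison:

```python
# Illustrative sketch; the paired scores are hypothetical.
from scipy.stats import pearsonr, spearmanr

anxiety_scores = [10, 12, 15, 11, 18, 20, 14, 16]
errors_made = [3, 4, 6, 3, 8, 9, 5, 7]

r, p = pearsonr(anxiety_scores, errors_made)         # parametric correlation
rho, p_rho = spearmanr(anxiety_scores, errors_made)  # rank-based equivalent
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
print(f"Spearman rho = {rho:.2f} (p = {p_rho:.3f})")
```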

Percentile
Point on a measured scale that marks off certain percentage of cases in an ordered data set.

Phenomenology
A philosophical approach that concentrates on the study of consciousness and the objects of direct experience. Mental awareness is primary.

Φ (Phi)
Phi statistic for estimating power in ANOVA analyses.

Phi coefficient (Φ)
Statistic used for effect size estimation in a 2 × 2 table after χ2 analysis; also a measure of correlation between two truly dichotomous variables.

Pilot study/trials; piloting
Preliminary study or trials often carried out to predict snags and assess features of a main study to follow.

Placebo group
Group of participants who don’t receive the critical ‘treatment’ but do receive everything else the experimental group receives; used in order to eliminate placebo effects – participants may perform differently simply because they think they have received an effective treatment.

Plagiarism
Claiming that other authors’ work is your own, e.g., by not providing quotation marks or appropriate references. Plagiarism occurs whether or not the writer knew they were using another author’s exact words or structure.

Platykurtic distribution
Non-normal distribution that is widely spaced out and low in the centre.

Pleasing the experimenter
Tendency of participants to act in accordance with what they think the experimenter would like to happen.

Point biserial correlation
Measure of correlation where one variable is truly dichotomous and the other is at interval level.

Pooled variance
Combination of two sample variances into an average in order to estimate population variance.

Population
All possible members of a category from which a sample is drawn.

Population parameter
Statistical measure of a population (e.g., mean, standard deviation).

Positively skewed
Description of distribution that contains a longer tail of higher values.

Positivism
Methodological belief that the world’s phenomena, including human experience and social behaviour, are reducible to observable facts and the mathematical relationships between them. Includes the belief that the only phenomena relevant to science are those that can be measured.

Post facto research
Research where pre-existing and non-manipulated variables among people are measured for difference or correlation.

Post hoc comparisons/tests
Tests between means, or groups of means, conducted after inspection of data from initial analysis.

Power
1 − β. The probability of not making a Type II error if a real effect exists; the probability of obtaining a case or sample above the level cut off by β in the population defined by the alternative hypothesis.

Power efficiency
Comparison of the power of two different tests of significance.

Predictor
Variable used in combination with others to predict values of a criterion variable in multiple regression.

Pre-test
Measure of participants before an experiment in order to balance or compare groups, or to assess change by comparison with scores after the experiment.

Primary reference
An original source that the writer has not read but about which they have obtained information in a secondary source.

Probability
A numerical measure of pure ‘chance’ (randomly based) occurrence of events.

Empirical
A measure of probability based on existing frequencies of occurrence of target events.

Logical
A measure of probability calculated from logical first principles.

Probability distribution
A histogram of the probabilities associated with the complete range of possible events.

Probe
General request for further information used in semi-structured interview.

Prompt
Pre-set request for further information used in semi-structured interview if the information is not offered spontaneously by interviewee on a particular item.

Psychometric test
Test that attempts to quantify through measurement psychological constructs such as skills, abilities, character, etc.

Psychometrist/psychometrician
Person who creates and is a specialist with psychometric tests.

Psychometry
The technology of test creation for the quantification of psychological constructs.

Q

Qualitative approach
Methodological stance gathering qualitative data which usually holds that information about human events and experiences, if reduced to numerical form, loses most of its important meaning for research.

Qualitative data
Data left in their original forms of meaning (e.g., speech, text) and not quantified in numerical form.

Quantitative approach
Methodological stance gathering quantitative data following a belief that science requires accurate measurement and quantitative data.

Quantitative data
Data in numerical form, i.e., counts or measurements, the results of measurement or counting.

Quartiles
Points on a measured scale that mark the 25th, 50th and 75th percentiles of a distribution.

Quasi-experiment
Experiment in which experimenter does not have complete control over all central variables.

R

Radical constructionist
Theory of knowledge (epistemological position) that sees knowledge and truth as semantic construction.

Random error
Any error possible in measuring a variable, excluding error that is systematic.

Random number
Number not predictable from those preceding it.

Randomisation
Putting stimulus items or trial types into random order for the purpose of elimination of order effects.

Randomise
To put the trials of, or stimuli used in, an experiment into an unbiased sequence, where prediction of the next item is impossible.

Randomly allocate
To put people into different conditions of an experiment on a random basis.

Range
Measure of dispersion – top to bottom value (plus one).

Range restriction
A selection of cases from a larger potential data set, which has the effect of distorting the true population correlation.

Raw data/scores
Untreated, unconverted values obtained directly from measuring process used in a study.

Reactive study/design
Design in which participants know they are being studied and respond to the researcher’s measures or presence, possibly altering their natural behaviour.

Realism
Theory of knowledge holding that there is a unitary reality in the world that can be discovered using the appropriate investigative methods.

Reflexivity
Researchers’ recognition that their personal perspective influences and/or constructs the research interpretation.

Regression coefficient
Amount by which predictor variable values are multiplied in a regression equation in order to estimate criterion variable values.

Regression line
Line of best fit on a scatterplot, which minimises residuals in regression.

Reification
Tendency to treat abstract concepts as real entities.

Rejection region
Area of (sampling) distribution where, if a result falls within it, H0 is rejected; the area cut off by the critical value.

Related t test
Parametric difference test for related data at interval level or above.

Relativism
Theory of knowledge holding that objective facts are an illusion and that knowledge is constructed by each individual through a unique personal framework.

Reliability
Extent to which findings or measures can be repeated with similar results; consistency of measures and consistency of a psychological scale.

Internal
Consistency between the items of a scale or test.

Cronbach’s alpha
A measure of scale reliability using the variance of respondents’ scores on each item in relation to overall variance on the scale (a common form of the formula is given at the end of these reliability entries).

External (or stability or test-retest method)
Stability of a test: its tendency to produce the same results when tested on the same people at two different times.

Internal
Consistency of a test within itself. Tendency for people to score at the same strength on similar items.

Item analysis
Checking each item in a scale by comparing its relationship with total scores on the scale.

Kuder–Richardson
Measure of internal consistency (split-half type) reliability used for scales whose items have dichotomous (e.g., yes/no) response formats.

Split-half
Correlation between scores on two equal parts of a test.

External
Consistency of a test with itself when administered more than once.

Test-retest
Testing of the same group of respondents twice on separate occasions in order to estimate external reliability.
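
As mentioned in the Cronbach’s alpha entry above, one common form of the alpha formula is:

$$
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^{2}}{s_T^{2}}\right)
$$

where k is the number of items, the numerator sums the variances of the individual items and the denominator is the variance of respondents’ total scale scores.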

Replication
Repeating a completed study.

Representative design
Extent to which the conditions of an experiment represent those outside the laboratory to which the experimental effect is to be generalised.

Research prediction
Prediction in precise terms about how variables should be related in the analysis of data if a hypothesis is to be supported.

Research question
The question a researcher is trying to answer in an investigation.

Residual (y − ŷ)
Difference between an actual score and what it would be as predicted by a predictor variable or by a set of predictor variables.

Respondent
Person who is questioned in an interview or survey.

Respondent validation/ Member checking
Attempt to validate findings and interpretations by presenting these to original participants for comments and verification.

Response (acquiescence) set
Tendency for people to agree with test items as a habitual response.

Right to privacy
Right that upholds people’s expectation that their personal lives will not be intruded upon by voluntary or involuntary research participation.

Robustness
Tendency of test to give satisfactory probability estimates even when data assumptions are violated.

Role-play
Study in which participants act out given parts.

S

S
See binomial sign test.

Sample
Group selected from population for an investigation.

Biased
Sample in which members of a sub-group of the target population are over- or under-represented.

Cluster
Groups in the population selected at random from among other similar groups and assumed to be representative of a population.

Convenience / Opportunity
Sample selected because they are easily available for testing.

Expert choice
See purposive sample below.

Haphazard
Sample selected from population with no conscious bias (but likely not to be truly random).

Purposive
Non-random sampling of individuals likely to be able to make a significant contribution to the data collection for a qualitative project either because of their specific experiences or because of their expertise on a topic.

Quota
Sample selected, not randomly, but so that specified groups will appear in numbers proportional to their size in the target population.

Representative
Sample that reflects, in proportion, the relevant characteristics and sub-groups of the target population, so that results can reasonably be generalised to that population.

Self-selecting
Sample selected for study on the basis of members’ own action in arriving at the sampling point.

Simple random
Sample selected in which every member of the target population has an equal chance of being selected and all possible combinations can be drawn.

Stratified
Sample selected so that specified sub-groups will appear in numbers proportional to their size in the target population; within each sub-group cases are randomly selected.

Systematic (random)
Sample selected by taking every nth case from a list of the target population; ‘random’ if starting point for n is selected at random.

Sample statistic
Statistical measure of a sample (e.g., mean, standard deviation).

Sampling bias (or selection bias)
Systematic tendency towards over- or under-representation of some categories in a sample.

Sampling distribution (of means)
Theoretical distribution that would be obtained by taking the same statistic from many same size randomly selected samples (e.g., the mean).

Sampling error
Difference between a sample statistic and the true population statistic, usually assumed to be random in origin.

Sampling frame
The specified range of people from whom a sample will be drawn. Those within a population who can be sampled.

Saturated model
Model in log-linear analysis that explains all variation in a multi-way frequency table so that chi-square is zero and expected frequencies are the same as observed frequencies.

Saturation
Point in GT work where additional data make only trivial contributions and cannot alter the emerged framework of categories and themes.

Scale value
On a Thurstone scale, the average of judges’ ratings of an item; respondent is given this score if they agree with it.

Scatterplot
Diagram showing placement of paired values on a two-dimensional chart.

Scheffé post hoc analysis
Post hoc test that takes into account all possible comparisons of combinations of means (most conservative post hoc test).

Scientific method
General method of investigation using induction and deduction.

Secondary reference
Source in which the writer obtained information about an original or primary source.

Selective coding
Higher order treatment of initial themes and categories where superordinate themes may emerge that bind lower categories together.

Self-report method
A general term for methods in which people knowingly provide information about themselves.

Semi-interquartile range
Half the distance between first and third quartile in a distribution.

Semi-partial correlation
Correlation of a criterion variable B with the residuals of A after A has been regressed on C; this removes the common variance of A and C from the correlation of A with B.

Semi-structured interview
Interview with pre-set list of topics but in which an informal conversational tone is attempted and the interviewer ‘plays it by ear’ as to whether sufficient information has been provided by the interviewee.

Sign test
See binomial sign test.

Significance levels
Levels of probability at which it is agreed to reject H0. If the probability of obtained results under H0 is less than the set level, H0 is rejected.

Significance test/decision
Test performed in order to decide whether the null hypothesis should be retained or rejected.

Simple effect
Occurs where one level of one factor has a significant effect across levels of another factor.

Simulation
Study in which participants re-create and play through, to some extent, a social interaction.

Single blind
Procedure in an experiment where either participants or data assessors do not know which treatment each participant received.

Skew/skewed distributions
Non-normal distributions that have a lot more scores on one side of the mode than on the other.

Social desirability
Tendency of research participants to want to ‘look good’ and provide socially acceptable answers.

Spearman-Brown correction
In split-half reliability testing, provides an estimate of the true split-half reliability value from the correlation between two test halves, recognising that the raw split-half correlation is based on a set of items only half the length of the actual scale.
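
In its usual split-half form, the correction estimates the reliability of the full-length test from the correlation r between the two test halves:

$$
r_{full} = \frac{2r}{1 + r}
$$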

Spearman’s rho
Non-parametric, ordinal level measure of correlation; Pearson correlation on ranks of the paired raw scores.

Sphericity
Condition where there is homogeneity of variance among treatment variables and the variances of their differences are also similar.

Standard deviation
Measure of dispersion – the square root of: the sum of all squared deviations divided by N or N – 1.

Standard error
Standard deviation of a sampling distribution.

Standard score
Number of standard deviations a particular score is from its sample mean.
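
For reference, the standard deviation (unbiased, N − 1 version), standard error and standard score entries above correspond to the usual formulas:

$$
s = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}, \qquad SE = \frac{s}{\sqrt{N}}, \qquad z = \frac{x - \bar{x}}{s}
$$

Dividing by N rather than N − 1 gives the uncorrected version of the standard deviation (see the entries under U).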

Standardisation
Setting up of measurement norms for the populations for whom a psychometric test is intended.

Standardised procedure
Tightly controlled steps taken by an experimenter with each participant and used to avoid experimenter bias or expectancy effects.

Standardised regression coefficient
Full name for beta values in multiple regression.

Stem and leaf chart
Exploratory data analysis tool showing every value in a data set but organised into class intervals to give a histogram shape.

Structure
Dimension of design which is the extent to which questions and procedure are identical for everyone.

Sum of squares
Addition of the squares of deviations around a mean.

Survey
Relatively structured questioning of large sample.

T

T
See Wilcoxon test.

t
See related and unrelated t test.

Target population
Similar to sampling frame but more theoretical. The assumed population of people from which a sample is to be drawn. Very often the aim is to be able to generalise sample results to this population.

Test norms
Test statistics for known and identifiable groups who have taken the test. These can be used to make a fair comparison for individual test takers.

Thematic analysis (TA)
General analysis of qualitative data into super-ordinate and subordinate themes which are extracted from the data. Not allied to any epistemological position.

Theoretical sampling
Use of purposive sampling (see above) to find data that might support or contradict an emergent explanatory framework.

Ties (tied ranks)
Feature of data when scores are given identical rank values.

Time-lag study
Comparative study where measures are repeated at long intervals on an equivalent sample each time (say, new sample of 5 year olds each year).

Time sampling
Observational method in which behaviour is recorded only during specified time intervals (e.g., the first 30 seconds of every five minutes).

Time series
Line chart showing measures of a variable at progressive time intervals.

Time-series design
Design in which behaviour is recorded for a certain period before and after a treatment point in order to look for relatively sudden change.

Total variance
Variance of all scores in a set around their grand mean.

Transcription
Written recording of directly recorded speech as exactly as possible; often includes pauses, intonation, etc.

Transformation of data
Performed in order to remove skew from a data set so that it conforms to a normal distribution thus enabling the use of parametric tests.

Triangulation
Comparison of at least two views/explanations of the same thing(s) – events, behaviour, actions, etc.

Trimmed mean
The mean of a data set with its most extreme 5% of values removed.

True experiment
Experiment in which the experimenter has complete control over the manipulation of the independent variable and control of all other relevant variables, including random allocation to conditions.

Tukey (a) (HSD) post hoc
Post hoc test of all possible pairwise comparisons; appropriate analysis choice with a large number of means; considered conservative.

Tukey (b) post hoc analysis
Less conservative post hoc test than Tukey (a).

Two-tailed test
Test referring to both tails of the probability distribution under H0; must be used if alternative hypothesis is non-directional.

Type I error
Mistake made in rejecting the null hypothesis when it is true.

Type II error
Mistake made in retaining the null hypothesis when it is false.

U

U
See Mann-Whitney test.

Unbiased estimate (of SD)
Version of standard deviation or variance that is used for population estimates (uses N – 1 as denominator).

Uncorrected (SD)
Version of standard deviation or variance that is used if only wanting summary statistics for the group and not making population estimates (uses N as denominator).

Unrelated t test
Parametric difference test for unrelated data at interval level or above.

Upper hinge
On a box-plot, the third quartile.

V

Validity
The extent to which an effect demonstrated in research is genuine, can be trusted as real (not ‘contaminated’ or confounded by spurious variables) and is not limited to a specific context; also, the extent to which a test or instrument measures the construct it was intended to measure.

Concurrent
Extent to which test results conform with those on another test assumed to measure the same construct and taken at the same time.

Construct
Extent to which conceptions and operational measures of variables encompass the intended theoretical constructs; the constructs can be of persons (samples), treatments (IVs), observations (DV measures) and settings. Also, the extent to which the existence of a psychological construct is established through an interlinked set of diverse, concerted and logically related research findings.

Content
Extent to which test covers the whole of the relevant topic area, as assessed by experts.

Criterion
Extent to which test scores can predict phenomena such as difference between groups.

Internal
Extent to which an effect found in a study can be taken to be genuinely caused by manipulation of the independent variable.

Ecological
Widely overused term which can generally be replaced with ‘representative design’. Also used to refer to the extent to which a research effect generalises across situations. The original meaning comes from cognitive psychology and refers to the degree to which a proximal stimulus predicts the distal stimulus for the observer. Should not be automatically applied to the laboratory/field distinction.

External
Extent to which results of research can be generalised across people, places and times.

Face
Extent to which the validity of a test is self-evident.

Known groups
Test of criterion validity involving groups between whom scores on the test should differ.

Population
Extent to which research effect can be generalised across people.

Predictive
Extent to which test scores can be used to make a specific prediction on some other variable.

Threat to
Any aspect of the design or method of a study that weakens the likelihood that a real effect has been demonstrated or that might obscure the existence of a real effect.

Variable
Quantity that can change; usually used to refer to a measure of phenomena.

Variance estimate
Estimate of variance in a variable accounted for by the correlation of another variable (or other variables) with it.

Variance ratio test
Full name for the test producing the F statistic – see above.

Variance
Measure of dispersion – square of standard deviation.

Variation ratio
Measure of dispersion – proportion of non-modal values to all values.

Verbal protocol
Recording of participant’s speech when they have been asked to talk or think aloud during a task.

Vignette
A story, scenario or other description given to all participants but with certain details altered and this difference constitutes the independent variable.

W

Wilcoxon’s T – matched pairs signed ranks
Ordinal-level significance test for differences between two related sets of data.

Within groups ANOVA
ANOVA analysis where only related factors are involved.

Within groups sum of squares
Sum of squares of deviations of scores around their sample means. Also: error SS.

Within groups variance
Total variance of scores around their sample means. Also: error variance.

X

Y

Z

z score/value
Alternative term for standard score.

Weblinks

Psychology, science and research

http://www.bps.org.uk/the-society/code-of-conduct/code-of-conduct_home.cfm
This is the British Psychological Society’s code of ethics and conduct.

http://www.bps.org.uk
British Psychological Society main site.

http://www.apa.org
American Psychological Association main site.

Experiments and experimental designs in psychology

http://kilgarriff.co.uk/bnc-readme.html
This is a link to word frequency lists.

Observational methods – watching and being with people

http://psychclassics.yorku.ca/Watson/emotion.htm  
This is a link to the original paper by Watson and Rayner (1920) describing the conditioning of ‘Little Albert’. (‘Conditioned emotional reactions’, Journal of Experimental Psychology, 3(1), 1–14).

http://archive.is/5OwBm
http://www.thefullwiki.org/Chris_Costner-Sizemore
Both of these sites have useful information on Chris Costner-Sizemore (‘Little Eve’).

Psychological tests and measurement scales

As this is a chapter about psychological scales you might find it useful to look at the scales at the links below.

http://www.apa.org/science/programs/testing/find-tests.aspx
This is the American Psychological Association’s advice on finding tests. It is very comprehensive, linking to several databases such as Psyctest, but is obviously US-oriented.

http://www.psychtest.com
Here is a link to Psyctest (mentioned above). Most of the tests here need to be paid for.

http://www.yorku.ca/rokada/psyctest
This is a link to a small set of tests that you can use without worry about copyright (see the text for web page detail). The link is from York University, Ontario, Canada and was set up by Ron Okada, an Emeritus Professor. He says on the site that he will not be able to monitor the site after May 2013 so sadly you might find it becomes unavailable.

An example of the freely available tests found in Okada’s list is Rosenberg’s (1965) self-esteem scale, which is also in the public domain. Note that this is a good example of a Likert scale with reversed items to avoid response set. Also note that the Likert response format has no central ‘neutral’ or ‘undecided’ point. Here is the Rosenberg link:
http://www.wwnorton.com/college/psych/psychsci/media/rosenberg.htm

Rotter’s (1966) Locus of Control scale is easily found on the internet (it is public domain), as are several others, but make sure you get the scoring scale too otherwise the test will be useless for your own research. The scale uses a format that is not described in the Coolican text. The idea is that you are forced to choose between two extreme views on each item, one indicating internal LoC (you are in control; you make your own decisions in life) and the other indicating external LoC (things happen to you; fate is in command in your life). It is similar to the method used in Hammond’s error choice technique, which is described in Chapter 8. Here is the link to Rotter’s scale:
http://www.mccc.edu/~jenningh/Courses/documents/Rotter-locusofcontrolhandout.pdf

Comparison studies – cross-sectional, longitudinal and cross-cultural studies

http://allrelated.syr.edu/
This is a link to an exhibition mounted by Syracuse University, “All of us are related; each of us is unique”, which explores the concept of ‘race’ and shows that from DNA evidence we now know that all of us are descended from a small group of Africans who left their continent around 120,000 years ago. Since then we have diversified enormously but there is no scientific evidence whatsoever for the Victorian concept of separate ‘races’. We are all one in that sense.

http://scholarworks.gvsu.edu/orpc/  
Readings in cross-cultural psychology posted by the International Association for Cross-cultural Psychology.

http://www.wwu.edu/culture/readings.htm
Online readings in psychology and culture posted by the Center for Cross-Cultural Research at Western Washington University.

Ethical issues in psychological research

http://www.bps.org.uk/content/psychological-debriefing
This is the British Psychological Society’s paper on debriefing.

http://www.bps.org.uk/what-we-do/ethics-standards/ethics-standards
The British Psychological Society’s Code of Conduct.

http://www.bps.org.uk/system/files/Public%20files/inf206-guidelines-for-internet-mediated-research.pdf
The British Psychological Society’s Guidelines for Ethical Practice in Psychological Research Online.

http://www.bps.org.uk/system/files/images/guideline_for_psychologists_working_with_animals_2012_rep55_2012_web.pdf
The British Psychological Society’s animal research guidelines. 

http://www.apa.org/ethics/code/index.aspx#intro
The American Psychological Association’s ethics code.

Analysing qualitative data

These are links to sites that offer software for qualitative analysis:
http://www.qsrinternational.com/
NUD*IST (a ‘sexed up’ title — sadly, nothing to do with nudism!)

http://www.qsrinternational.com/products_nvivo.aspx
NVivo 8 – a close relative of NUD*IST.

http://www.scolari.co.uk/related/winmax.htm?
WINMAX

http://www.qualisresearch.com/
The Ethnograph

http://micabrera.co.uk/code-a-text/default.aspx/index.htm
Code-A-text     

http://www.atlasti.com/index.html
ATLAS.ti

http://www.surrey.ac.uk/sociology/research/researchcentres/caqdas/
CAQDAS: This is a Surrey University project from which information can be gained on software and training courses for qualitative data analysis.

http://onlineqda.hud.ac.uk/methodologies.php
Online QDA – Online information about many different qualitative analysis approaches.

http://academic.csuohio.edu/kneuendorf/content
This is a useful site if you are interested in content analysis (The Content Analysis Guidebook).

Here is a link to the Higher Education Academy; the lower one is to the psychology section itself:
http://www.heacademy.ac.uk/
http://www.heacademy.ac.uk/disciplines/psychology

Correlation and regression

http://www.shodor.org/interactivate/activities/Regression/
This is the link to the correlation application that lets you plot points and watch how these affect the scatter plot and regression line.

http://academic.udayton.edu/gregelvers/psy216/tables/area.htm
A specialist site just giving normal distribution and z tables with the value of the y ordinate. Only needed for the biserial correlation coefficient calculation.

Choosing a significance test for your data (and Internet resources)

These are the links to useful sites listed at the end of the chapter in the book; most are statistics or methods oriented and are described below:

http://www.bps.org.uk/
The website of the British Psychological Society.

http://www.bps.org.uk/publications/policy-and-guidelines/research-guidelines-policy-documents/research-guidelines-poli
BPS code of conduct, ethical principles for human research, etc.

http://www.heacademy.ac.uk/disciplines/psychology
The website for the Psychology Discipline pages at the Higher Education Academy – see text above.

http://onlinestatbook.com/rvls.html
Rice Virtual Lab in Statistics – an interactive statistics site.

http://www.uvm.edu/~dhowell/StatPages/StatHomePage.html
David Howell’s home page (US statistics-for-psychology author).

http://www4.uwsp.edu/psych/mp/APA/apa4b.htm
APA Writing guide by M. Plonsky at the University of Wisconsin. Thanks!

http://www.intute.ac.uk/cgi-bin/browse.pl?id=121139
INTUTE social science – very powerful internet link database taking you to hundreds of related sites on psychology in general, but this page gives methods resources in particular.

http://www.psychology.org/links/Resources/Statistics/
Part of the Encyclopaedia of Psychology; contains links to a couple of dozen helpful statistics sites.

http://www.onlinepsychresearch.co.uk/
A guide to resources for conducting research online and for creating websites, etc. Importantly, you can participate in many psychology experiments online and have the experience of being in an experiment run by a working psychological researcher.

http://psych.hanover.edu/Research/exponnet.html
Continuing list of research on the net in which you can participate. 

http://www.quantpsy.org/chisq/chisq
A neat chi-square calculator, plus tutorial, from Kristopher Preacher at Vanderbilt University in Tennessee. Thanks!

http://bcs.whfreeman.com/ips4e/cat_010/applets/CorrelationRegression.html   
You can click on the scattergram here to make the regression line change position. It teaches you how to create negative and positive correlations and so on.

http://vassarstats.net/index.html
A brilliant site where you can get almost any kind of statistical work done including ANOVA and even logistic regression. You can transform data, conduct all simple tests and get information on what each procedure is all about.

http://www.gpower.hhu.de/
The site where you can register for and download G*Power. Thanks to E. Erdfelder, F. Faul, A. Buchner and Heinrich Heine Universität Düsseldorf.

Online statistical textbooks

 

http://www.statsoft.com/Textbook
Statsoft.

http://www.animatedsoftware.com/statglos/statglos.htm
Internet glossary of statistical terms.

http://bmj.com/collections/statsbk/index.shtml
Statistics at Square One. Part of the British Medical Journal pages but open to all at present.

http://www.jerrydallal.com/LHSP/LHSP.htm
Little Handbook of Statistical Practice. The pages are frozen because there is now an e-book to be bought. However, they still work!

http://davidmlane.com/hyperstat/index.html
Hyperstat – a very comprehensive statistical teaching resource.

http://www.socialresearchmethods.net/kb/index.htm
Research methods knowledge base – used to be free but sadly you now pay for online access. Good though.

http://library.thinkquest.org/20991/alg/word.html
Maths for Morons site – friendly maths.

Planning your practical and writing up your report

http://www.bps.org.uk/publications/journals
Where you can view the abstracts of all articles in British Psychological Society journals.

http://www.apa.org/pubs/journals/
Where you can do the same for APA journals.

http://www.psypress.com/journals/
Where you can view the abstracts of all Psychology Press journal articles.

http://www.apa.org/science/programs/testing/find-tests.aspx
APA’s site for help in finding appropriate psychological tests.

Exercises

Chapter 1 Psychology, science and research

Exercise 1.1

Reproduced from www.easyweb.easynet.co.uk/~philipdnoble/snow, courtesy of Philip Noble.

Some people say instantly ‘Oh it’s a picture of a man – so what?’. Many others (including me when I first encountered it) take a very long time to see a specific ‘thing’ in it. If you concentrate on the centre of the picture you should eventually see the top half of a man. If you imagine a beret right on top of the picture in the centre this would be correctly positioned on the man’s forehead and he would look a lot like Che Guevara. Many people have seen the picture as one of Christ with a long flowing beard. It could also be a cavalier. His face is lit as if from the right hand side and so there is a lot of shadow. If you have problems with it try looking at it with friends. Someone will spot it and help you to see the whole figure.

I don’t have precise detail on where it originated. It was published in a UK newspaper as a ‘sighting of Christ’ and was reported to be snow on a mountainside. However a student I was teaching once told me it was taken by her grandfather in Japan and was snow on a hedge. I have no independent evidence to support this. The most certain thing is that it is indeed snow.

The main point of the demonstration though is this. When the man finally pops out at you, you will never again be able to see the picture as just a load of black and white blobs. You will have constructed and maintained a ‘template’ – a best bet as to what the picture is of – and this will remain as an automatic reaction in your perceptual system. Most of the time, in science and in everyday life, when we approach visual (and other sensory) material, we have a ‘best bet’ all ready and we are not aware of the perceptual system’s operation of ‘calculating’ what sensory data represent in the world.


Exercise 1.2

Give a meaning for, and an example of, the following words. Click 'Show Answer’ to see some model answers. Hopefully your answers will be similar in meaning to these.

To come to a conclusion through logical reasoning from premises. For example, if all dogs have worms and my pet is a dog then it has worms; if the number 42 bus goes either from George Circus or from Green Square but the bus stop is not at Green Square then it must go from George Circus.

A research programme that seeks to provide support for a hypothesis using observable data gathered fairly with a replicable and publicly described procedure. The study by Gabrenya, Wang and Latané (1985), described in Chapter 1, tested and supported the hypothesis that children in collectivist societies tend to work harder when in a group than when alone, contrary to the US finding that people tend to 'loaf' in groups compared with when alone.

The principle of falsifiability is that any proposed theory must be set in terms that render it possible to disconfirm it (or crudely ‘disprove’ it). This doesn’t mean the theory must or will be proved wrong. After all, it might be true. The proposer of the theory just has to give others the means to show it to be false – in case it is false. For example, I might claim that I am holding an invisible cat. When you ask to stroke it I say it is also unfeelable (and of course, unsmellable, unhearable, etc.). This is a pretty useless and uninteresting theory!

To reason from particular instances to a general conclusion. For example, all sheep I have so far seen have four legs, therefore I’m assuming that all sheep have four legs (but I could be wrong); most people with Asperger Syndrome (AS) I have so far come across have trouble holding eye contact, hence I’m assuming this is a central feature of AS.

A group of people selected as representative of a larger population. If something works on them we assume that it will also work on a larger range of people. This is called generalising our results – in the same way as a medical trial seeks to establish that drug A is effective in reducing the symptoms of illness B and then is used to support the administration of drug A to the wider patient population. We assume that if caffeine increases memory ability in our sample then it will do so for people in general.


Exercise 1.3

Disconfirming theories – a ‘lateral thinking’ problem

Pages 17–19 of the book discuss the attempt to disconfirm theories as a powerful aspect of scientific reasoning. One of the best ‘awkward’ problems I have come across is shown below. Read the problem and have a think before revealing the answer.

Three philosophy professors (A, B and C) are applying for a prestigious chair of philosophy post. There is little to choose between them so the interview panel sets a logical reasoning task. The questioner gives the following instruction: ‘I am going to draw either a blue or a white spot on each of your foreheads. I will then reveal the spots to you all simultaneously. If you see a blue spot on another person’s head put your hand up. As soon as you think you can say what colour spot you have on your own forehead please speak up with your answer’. He proceeds to draw a blue spot on each forehead. When the spots are all revealed to the candidates each one, of course, puts up a hand. After a brief moment’s hesitation professor A lowers her arm and says ‘I must have a blue spot’. How did she work this out?

Problems like this one are sometimes included in the general group of ‘lateral thinking’ problems. However, you do not have to think ‘laterally’ or particularly creatively to get the answer. You do, however, have to kind of think upside down. Before rushing on to get the answer do try to think about how the professor knew what wasn’t true rather than how she knew what was true.

The answer is that she conducted a theory disconfirmation task. She thought: ‘What if I had a white spot? Then B would see a white spot on me and a blue spot on C, and would realise that C’s raised hand could only mean that C sees a blue spot on B. B would therefore quickly announce a blue spot, and C could reason in exactly the same way. But neither of them responded quickly (remember, all three are excellent at logical thinking), therefore I must have a blue spot.’ Professor A got the job!


Exercise 1.4a and 1.4b

Trusting intuition (the rationale for these exercises appears at the end)
1.4a

Imagine I have a piece of ordinary paper and that I fold it once, then again and then once more. You’ll agree I hope that the paper is now a little bit thicker. After this it will become difficult to fold but just imagine, if it were physically possible, that I folded it 50 times more. About how thick do you think it would become? Would it be as thick as a shoe-box? Would the paper reach as high as the wall you’re in now? Would it reach to the top of a house? Have a very rough guess as to how far upwards the paper would reach.

The folded paper would reach around 150 million miles, which is to the Sun and half way back again! Blue Peter (on BBC children’s TV) once partly demonstrated this by simply adding double the number of sheets each time to a pile of A4 paper: first one, then two, then four, then eight and so on.
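If you would like to check the arithmetic for yourself, here is a minimal sketch of the doubling calculation in Python (my own illustration, not part of the book’s materials). The 0.1 mm sheet thickness is an assumption; the exact distance you get depends entirely on the thickness you plug in (and on whether you count 50 or 53 folds), so it will not necessarily match the figure quoted above, but whatever you assume the answer is of an astronomical order of magnitude.

```python
# Each fold doubles the thickness: thickness = t0 * 2 ** folds.
# The 0.1 mm starting thickness is an assumed, typical value for office paper.
t0_mm = 0.1        # assumed thickness of one sheet, in millimetres
folds = 3 + 50     # three real folds plus the fifty imaginary ones

thickness_mm = t0_mm * 2 ** folds
thickness_km = thickness_mm / 1_000_000     # mm -> km

print(f"After {folds} folds the pile is roughly {thickness_km:,.0f} km high")
```

The point is not the precise number but the explosive growth of repeated doubling, which intuition almost always underestimates.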

1.4b

Imagine that the surface of the Earth is perfectly smooth (no mountains or valleys etc.) and that I have put a rope around it at the equator. Now imagine I want to raise the rope so that it is just 1 metre above the surface all round. About how much more rope do you think I’d need?

I would need just about 6.3 extra metres. For those not put off by maths, here is the proof. The circumference of any globe’s equator is the same as that of a circle, which is π × D (pi times the diameter, and pi = 3.14...). When the rope is raised 1 metre off the surface the diameter becomes D + 2 metres. The new circumference will therefore be π × (D + 2) = πD + 2π. New circumference minus old circumference = (πD + 2π) − πD = 2π, which is about 6.28 metres.

So for any globe at all (golf ball, football, planet) you would only ever need about 6.3 extra metres of rope (string, perhaps, for golf balls at least!) to raise it one metre off the surface.
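A quick way to convince yourself that the answer really does not depend on the size of the globe is to compute the extra rope for several very different radii. Here is a minimal sketch in Python (my own illustration; the radii are approximate values I have assumed for a golf ball, a football and the Earth).

```python
# Extra rope needed to raise a circle of radius r by 1 metre:
# 2*pi*(r + 1) - 2*pi*r = 2*pi, which does not involve r at all.
from math import pi

for label, r_metres in [("golf ball", 0.02), ("football", 0.11), ("Earth", 6_371_000)]:
    extra = 2 * pi * (r_metres + 1) - 2 * pi * r_metres
    print(f"{label:>9}: extra rope = {extra:.2f} m")
```

Every line of output is the same 6.28 metres, as the algebra above predicts.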


Rationale for exercise 1.4

I’m hoping that for 1.4a you very much underestimated the height of the paper and that for 1.4b you grossly overestimated how much rope would be needed. If you didn’t, then good on you! The point of the exercise is to emphasise the fact that we can never rely solely on ‘intuition’. Often when people say they got an answer ‘through intuition’, what they actually mean is that they got it without any conscious deliberation. Usually they still reached their answer through ordinary logic, but the process was so quick and immediate that they weren’t aware of any significant mental processing. If they mean that the answer just came to them through no process at all, then they were simply guessing. When we just guess we are influenced by many factors and certainly cannot claim to have ‘the truth’ from nowhere. If that were possible then scientists, mathematicians and engineers could just pack up, go home and leave intuitionists to solve all problems. The world just isn’t like that. The point of the exercise was to make you wary of intuition and to recognise that such unfamiliar problems always need to be approached using well-worn first principles, not mystical guesswork.

Chapter 2 Measuring people – variables, samples and the qualitative critique

Exercise 2.1

Creating variables to measure psychological constructs

In this exercise try to give at least one operationally defined measure to assess the psychological construct in the list below. Examples are provided if you click ‘Show Answer’ but these are not the ‘correct’ answers, just some possibilities to demonstrate strict measurement.

  1. Total score on an anxiety scale which includes such items as: ‘I often lie awake thinking about tomorrow’s issues.’ The response scale might be ‘Strongly agree, Agree, Disagree, Strongly disagree’.
  2. Person’s self-rating on a scale of 1 to 10 of their current level of anxiety (e.g., as they approach or think about a feared object).

Difference between number of beans participant estimates are in a jar and the number they were told was agreed by a previous group. (The lower the difference the more they ‘conform’.)

  1. Participant completes story which requires assertiveness from main character to bring about a successful conclusion. Endings are coded according to scheme on which raters are intensively trained.
  2. Number of people going back to cashier in a store after they have been deliberately short changed.
  1. Number of single days taken off sick in one year.
  2. Total score on ‘hassles’ scale.
  3. Increase in errors made as task demands are increased.

Difference in number of points scored on self-assessment ‘as I am’ and ‘how I would like to be’.


Exercise 2.2

Identifying sample types

Match the appropriate term with the sampling method described.

Chapter 3 Experiments and experimental designs in psychology

Exercise 3.1

The nature of experiments

This is a True/False quiz to test your knowledge of the advantages of the experiment as a research method.


Exercise 3.2

Identifying experimental designs

In this short quiz you will need to read each research description and identify the specific experimental design.

Chapter 4 Validity in psychological research

Exercise 4.1

Tabatha and her validity threats

In this chapter of the book there is a description of a rather naff research project carried out by Tabatha. Here it is again. As you read this passage try to identify, and even name if possible, every threat to validity that she has either introduced or failed to control in her design. A list is provided in the answers below.

Tabatha feels she can train people to draw better. To do this, she asks student friends to be participants in her study, which involves training one group and having the other as a control. She tells friends that the training will take quite some time so those who are rather busy are placed in the control group and need only turn up for the test sessions. Both groups of participants are tested for artistic ability at the beginning and end of the training period, and improvement is measured as the difference between these two scores. The test is to copy a drawing of Mickey Mouse. A slight problem occurs in that Tabatha lost the original pre-test cartoon, but she was fairly confident that her post-test one was much the same. She also found the training was too much for her to conduct on her own so she had to get an artist acquaintance to help, after giving him a rough idea of how her training method worked.

Those in the trained group have had ten sessions of one hour and, at the end of this period, Tabatha feels she has got on very well with her own group, even though rather a lot have dropped out because of the time needed. One of the control group participants even remarks on how matey they all seem to be and that some members of the control group had noted that the training group seemed to have a good time in the bar each week after the sessions. Some of her trainees sign up for a class in drawing because they want to do well in the final test. Quite a few others are on an HND Health Studies course and started a module on creative art during the training, which they thought was quite fortunate.

The final difference between groups was quite small but the trained group did better. Tabatha loathes statistics so she decides to present the raw data just as they were recorded. She hasn’t yet reached the recommended reading on significance tests in her RUC self-study pack.

Answers: Possible threats to validity in the study:

Name of threat

Issue in text

Non-equivalent groups

Busy students go into the control group.

Non-equivalent measures

Different Mickey Mouse pre- and post-test; a form of construct validity threat.

Non-equivalent procedures

Training method not clearly and operationally defined for her artist acquaintance.

Mortality

More participants dropped out of the training group than from the control group.

Rivalry

Control group participants notice the training group’s camaraderie; some trainee group participants go for extra training in order to do well in the final test.

History effect

Some participants in the training group receive creative art training on their new HND module.

Statistical conclusion validity

Not a misapplication of statistical analysis but no analysis at all!


Exercise 4.2

Spotting the confounding variables

A confounding variable is one that varies with the independent (or assumed causal) variable and is partly responsible for changes in the dependent variable, thus camouflaging the real effect. Try to spot the possible confounding variables in the following research designs. That is, look for a factor that might well have been responsible for the difference or correlation found, other than the one that the researchers assume is responsible. If possible, think of an alteration to the design that might eliminate the confounding factor. Possible factors will be revealed under each example.

A. Participants are given either a set of 20 eight-word sentences or a set of 20 sixteen-word sentences. They are asked to paraphrase each sentence. At the end of this task they are unexpectedly asked to recall key words that appeared in the sentences. The sixteen-word sentence group performed significantly worse. It is assumed that the greater processing capacity used in paraphrasing sixteen words left less capacity to store individual words.

It could be that the extra time taken by the sixteen-word task caused greater fatigue or confusion.


B. Male and female dreams were recorded for a week and then analysed by the researcher who was testing the hypothesis that male dream content is more aggressive than female dream content.

The researcher knew the expected result, hence researcher expectancy is a possible cause of difference. Solution is to introduce a single blind.


C. People who were fearful of motorway driving were given several sessions of anxiety reduction therapy involving simulated motorway driving. Compared with control participants who received no therapy, the therapy participants were significantly less fearful of motorway driving after a three-month period.

There was no placebo group. It could be that the therapy participants improved only because they were receiving attention. Need an ‘attention placebo’ group.


D. After a two-year period depressed adolescents were found to be more obese than non-depressed adolescents and it was assumed that depression was the major cause of the obesity increase.

Depression will probably correlate with lowered physical activity and this factor may be responsible. Needs depressed adolescents to be compared with similarly inactive non-depressed adolescents.


E. People regularly logging onto Chat ’n Share, an internet site permitting the sharing of personal information with others on a protected, one-to-one basis, were found to be more lonely after one year’s use than non-users. It was assumed that using the site was a cause of loneliness.

Those using the site had less time to spend interacting with other people off-line; need to compare with people spending equal time on other online activities.


F. Participants are asked to sort cards into piles under two conditions. First they sort cards with attractive people on them, then they sort ordinary playing cards. The first task takes much longer. The researchers argue that the pictures of people formed an inevitable distraction, which delayed decision time.

Order effect! The researcher has not counter-balanced conditions. The participants may simply have learned to perform the task faster in the second condition through practice on the first.


G. It is found that young people who are under the age limit for the violent electronic games they have been allowed to play are more aggressive than children who have only played games intended for their age group. It is assumed that the violent game playing is a factor in their increased aggression.

This is only a correlation and there may be a third causal variable that is linked to both variables. Perhaps the socio-economic areas in which children are permitted to play under age are also those areas where aggression is more likely to be a positive social norm.

Chapter 5 Quasi-experiments and non-experiments

Exercise 5.1

Some outlines of research studies are given below and your task is to decide which one of the following research designs each study used (some of which are taken from previous exercises). In the absence of specific information assume studies are conducted in a laboratory.

Research design

Full description

Lab experiment (true)

True experiment conducted in a laboratory.

Lab quasi

Quasi experiment conducted in a laboratory.

Lab non-experiment

Non-experiment conducted in a laboratory.

Field experiment (true)

True experiment conducted in the field.

Field quasi

Quasi experiment conducted in the field.

Field non-experiment

Field research study, which is not an experiment.

1. A researcher’s confederate sang identical songs on two separate days, one day dressed scruffily and the other day smartly dressed. Passers-by were asked to rate the busker’s performance on the two separate days.

Field quasi


2. Participants were allocated at random to one of two conditions of an experiment. In one condition participants were asked to learn a list of 20 words with accompanying pictures. In the other condition, the participants were asked to learn the words without the pictures.

Lab experiment (true)


3. Children in a nursery were randomly allocated either to a condition where they were shown a film in which several adults behaved quite aggressively or to a condition in which they were shown a nature film. Both groups were then observed for aggressive behaviour.

Field experiment (true)


4. Male and female dreams were recorded at home by participants for a week and then analysed by a researcher who was testing the hypothesis that male dream content is more aggressive than female dream content.

Field non-experiment


5. People attending a health clinic and who were fearful of motorway driving were given several sessions of anxiety reduction therapy involving simulated motorway driving. A control group was formed by people on a waiting list who had only recently applied for therapeutic help with the same problem. The therapy participants were significantly less fearful of motorway driving after a three-month period than the control group.

Field quasi


6. People who had recently experienced a post-traumatic stress disorder were asked by a psychological researcher to undergo a battery of psycho-motor test trials. Compared with non-stressed participants they performed significantly worse.

Lab non-experiment


7. Psychology students were invited to volunteer for a research study. Because the researcher did not want participants from one condition to discuss the procedure with participants in the other, he asked students from one course to detect stimuli under stressful conditions and students from the other course to do the same task under non-stressful conditions.

Lab quasi

Chapter 6 Observational methods – watching and being with people

Exercise 6.1

Defining some key terms used in the chapter

Can you give a meaning for the following terms? Click to see a model answer. Hopefully your answers will be similar in meaning to these.

Data that exist as records and which can be used to test hypotheses about human behaviour, e.g., crime or traffic accident data.

Study of one individual, group or organisation in depth.

Categorising behaviour according to pre-arranged criteria, often to make quantitative analysis possible.

Data gathering by recording experiences or observations on a regular (often daily) basis.

Level of agreement between two or more trained observers of the same events.

Observation carried out on behaviour as it occurs naturally in the person’s or animal’s own environment.

Observing as a participant in the observed group.

Tendency for people to behave differently because they know they are being observed.

Observation that is organised, where behaviour is strictly coded and where extraneous variables are controlled.

Chapter 7 Interview methods – asking people direct questions

Exercise 7.1

Preparing an interview schedule

Prepare a set of questions for an interview investigating the issue of assisted suicide. Imagine that this is for a piece of qualitative research where you wish to explore the concept fully in terms of people’s attitudes to the issue. You particularly want to know how people rationalise their positions. Make sure that your questions cover a wide area of possible perspectives – look at the issue from different people’s points of view. After you have prepared your interview schedule as fully as you think you can, have a look at the points below (click the ‘reveal hints’ buttons) and see if you have covered all these areas; perhaps you have also produced some that I didn’t think of.

How many of your questions will produce only short answers (e.g., ‘Do you believe in assisted suicide?’ or ‘Would you ever assist someone to commit suicide?’)? These are closed questions and may only produce single answers of ‘yes’ or ‘no’.

Have you used prompts and probes to facilitate elaboration of shorter answers? (e.g., ‘If no, could you tell me why?’)

Have you investigated:

  • The conditions under which your interviewee would agree to assist or agrees with assisted suicide – e.g., how many months to live, how much certainty of pain, etc., for degenerative diseases or just when life is unbearable?
  • Their reasons for wanting or not wanting to assist.
  • Their reasons for accepting or rejecting the idea.
  • The perspective of the potential suicide, the assister, the immediate family of the dying person?
  • The effect that publicised suicides might have on other people.
  • Whether the interviewee would follow the law of the land (and therefore think it permissible to assist suicide in countries where this is legal).
  • To what extent is the assister legally culpable?
  • The feelings of people assisting a loved one to suicide.
  • The implications of the Hippocratic oath and doctors’ commitment to sustaining life.
  • … or does the commitment to care include helping people escape pain and indignity?
  • The position of carers and whether sufficient support is available for them.
  • The issue of whether we should wait for technological advances, e.g., in the area of pain relief.
  • How much medical opinion should be sought – at least two doctors (e.g.)?
  • The issue of maintaining dignity.
  • Whether suicide is ‘cowardly’, ‘selfish’ or an act of ‘bravery’ or ‘courage’.
  • The sanitisation of death – so it is kept away from us most of the time.
  • Whether removing life sustaining support is assisting suicide or ‘murder’.
  • The issue of control – who should decide to maintain life if a person wants to end theirs.
  • Religious reasons or arguments for views.
  • Have you teased out arguments for and/or against rather than just isolated views?

Exercise 7.2

Defining some key terms used in the chapter

Can you give a meaning for the following terms? Click to see a model answer. Hopefully your answers will be similar in meaning to these.

Questions to which the appropriate answer is one of a finite set, e.g., ‘do you believe in ghosts?’ (yes/no) or ‘what is your lucky number?’

Group selected for discussion of an issue because they share a common interest.

Questions allowing the respondent to answer at length, e.g., ‘tell me what that was like’.

Group of people selected to represent a range of views and who are often consulted on a regular basis.

Interview type in which the researcher has prepared a schedule of questions to ask but the order in which these will be presented is not fixed. The interviewer tries to keep the session as close to normal conversation as possible. If a respondent naturally produces a full enough answer, questioning on that item will end, otherwise prompts and probes may be employed.

A large-scale data gathering exercise where accurate sampling is very important since results are usually taken as indicative of the general population’s attitudes, behaviour, etc.

Chapter 8 Psychological tests and measurement scales

Exercise 8.1

Problematic items in psychological scales

Some proposed items for different kinds of psychological scales are listed below. In each case select the kind of error (from the list in the box below) that is being made with the item (see p. 207 of the book for explanations).

Leading question

Ambiguous

Technical terms

(too) Complex

(too) Emotive

(too) Personal

Double-barrelled

Double negative

Inappropriate scale


1. Violent video games can have a negative effect on children’s socio-psychological development.

Technical terms.
Explanation: Would respondents understand socio-psychological?


2. Boxers earn a lot of money (in an attitude to boxing scale).

Ambiguous
Explanation: That boxers earn a lot of money is a fact but it gives no indication of a person’s views on boxing. Both sides will agree so the item has no discriminatory power.


3. Boxing is barbaric and should be banned.

Double-barrelled
Explanation: May agree it is barbaric but argue against a ban.


4. Hunters should not terrify poor defenceless little animals.

Emotive
Explanation: Could be an item in some scales but is a bit OTT on emotional tugging here.


5. I thought the advertisement was:

  • Good
  • Average
  • Poor
  • Very Poor

Inappropriate scale
Explanation: There are two negative ‘poor’ choices but only one positive choice. Also, what is meant by ‘average’?


6. Have you ever suffered from a mental disorder?

Too personal
Explanation: Should not need to ask this and may not get an honest reply. Might be relevant in specialised research but would probably not be approached so bluntly.


7. People have a natural tendency to learn through encountering problems, seeking information and testing hypotheses about the world and therefore education should be about providing resources for discovery rather than about top-down delivery and testing of knowledge.

Too complex
Explanation: Though the statement makes sense it may well need reading a few times and may tax some respondents with its vocabulary and length.


8. Don’t you think the government will miss its child poverty targets?

Leading question
Explanation: Probably wouldn’t be as blatantly leading as this but even ‘Do you think…’ invites agreement.

9. There are no grounds upon which a child should not be given a right to education.

Double negative
Explanation: Should be understood by most but double negatives do make respondents have to think twice.


Exercise 8.2

Defining some key terms used in the chapter

Can you give a meaning for the following terms? Click to see a model answer. Hopefully your answers will be similar in meaning to these.

The extent to which a psychological measure produces the same results when administered to the same people on different occasions.

Process by which clusters of correlations are identified among many measured variables such that they can be taken as statistical evidence for the existence of psychological constructs.

The internal consistency of a test, assessed by measuring whether people tend to score in the same direction, and to the same strength, as they did on all other items.

A measuring ‘instrument’ seen as a scientifically devised measure of a human characteristic or aspect of behaviour.

Tendency for people to agree with items that are positively worded.

Process of ‘fitting’ a psychological scale to a normal distribution and establishing statistical norms for specific groups of people.

Extent to which a psychological scale measures the construct that it was intended to measure.

Chapter 9 Comparison studies – cross sectional, longitudinal and cross-cultural studies

Exercise 9.1

Defining some key terms used in the chapter

Can you give a meaning for the following terms? Click to see a model answer.

A cross-sectional study comparing a group of 15 year olds with a group of 9 year olds may be invalid and confounded because one group might have had a significantly different experience – for instance, the younger group may have experienced a significantly changed national curriculum that has introduced special emphasis on mathematics and language skills.

A problem in longitudinal studies where one group, studied longitudinally in terms of social or cognitive development, may have experienced quite a different social environment from a previously studied group with which this group is being compared.

Study in which a ‘snapshot’ is taken of different groups of people at the same time. For instance, very common would be a study of reading abilities in 7, 9 and 11 year olds at the same time (e.g., in February 2010). This could be extended so that the same ages are again studied two or three years later – a design known as a time-lag study.

A study that attempts to compare effects in one culture with those in another, either in order to extend our knowledge of a psychological construct (has it the same strength and direction in culture B as it does in culture A), or to use the second culture as a level of an independent variable in order to test a hypothesis. For instance, culture B might use a language that has number terms only up to four or five and we can see whether language is important for memorising numbers of objects.

Correlation of a variable at time 1 with another variable at time 2, or vice versa. For instance, researchers might compare measured levels of parental reading at time 1 with their child’s verbal abilities at time 2 and vice versa, along with the time 1 and time 2 correlations, in order to obtain a clearer picture of the causal effect of parental reading (i.e., do children do better if parents read, or do parents read more if children are more verbal to start with?).

The belief that it is almost impossible to transfer psychological constructs and measures from one culture to another. Cultures can only be properly or sensibly understood by outsiders (i.e., researchers) through long-term study immersed in that culture (i.e., by living within it for several months if not years).

A study that follows the same group of people through a long period and periodically re-assesses psychological characteristics or skills. The aim is to observe changes in development along the way.

A kind of longitudinal design where the same group is measured after a certain interval and perhaps over more intervals.

A study of a specified group of people and repeated at long intervals, say every three years. E.g., a group of nine year olds might be assessed for attitudes to authority in 2010 and then another group of nine year olds will be similarly assessed in 2013 and comparisons made.

Chapter 10 Qualitative approaches in psychology

Exercise 10.1

Matching approaches to principles

Chapter 10 introduces several well-defined approaches to the collection and analysis of qualitative data that have developed over the last few decades. Below are brief descriptions of the principles of each approach. For each one, try to select the appropriate approach from the list below.

Grounded theory

Interpretive phenomenological analysis

Discourse analysis

Thematic analysis

Ethnography

Action research

Narrative analysis


1. An approach that encourages the development of theory from patterns emerging during the analysis of qualitative data; theory is not imposed on the data before they are gathered. Data are analysed until the emerging categories are saturated.

Grounded theory


2. This approach holds that what people say is not a source of evidence for what they have in their heads or minds. It analyses speech as people’s ways of constructing their perceptions and memories of the world as they see it. Speech is used to construct one’s ‘stake’.

Discourse analysis


3. In this approach an organisation or a culture is studied intensively from within.

Ethnography


4. An approach that analyses text for themes and which can be bottom-up (data driven – themes emerge from the data) or top-down (theory driven – testing hypotheses or seeking to confirm previous findings and theories). Highly versatile and not allied to any specific philosophical position.

Thematic analysis


5. This approach sees the role of psychological research as one of intervention to produce change for human benefit. An important aspect of the approach is the emphasis by the researcher on a collaborative project.

Action research


6. In this approach researchers try to access the perceptions and thoughts of the researched persons and to reflect and understand these in a way that is as close as possible to the way that the persons themselves interpret the world. The data analysis is usually a search for themes among interview data.

Interpretive phenomenological analysis


7. This approach studies the ways in which people construct memories of their lives through stories. A central principle is that people generally establish their identity by constructing and re-telling their stories, even if only to themselves.

Narrative analysis


Exercise 10.2

Defining some key terms used in the chapter

Can you give a meaning for the following terms? Click to see a model answer.

Theory that ‘facts’ in the world are social constructions, created through human interaction, e.g., reconstruction of memory.

Research where a group collaboratively researches its own customs, history, social norms, etc.

A movement within psychology that emphasised women’s perspectives and the general absence of women from the mainstream psychological picture of human behaviour, except as contrasts; in particular feminist research drew on methods opposed to ‘masculinist’ quantification and emphasis on scientific instruments, emphasising qualitative approaches and questioning in human interaction.

The currently accepted model within any science, one that is likely to come under threat eventually from a new emergent paradigm as did Newtonian physics from Einstein’s models. ‘New Paradigm’ research claimed that ‘hard’, scientific psychology with quantitative methods was in need of radical change towards a more human and holistically oriented approach.

Belief that the world consists of facts that can be unambiguously discovered or demonstrated through empirical research. Only one ‘real world’ exists.

Recognition that the researcher’s own position, and the research design or constructed research question, can influence the construction of knowledge claimed in a research project.

A version of purposive sampling. People are selected for the research project on the basis of the research question and/or the ongoing data analysis and its implications in a qualitative project.

Chapter 11 Ethical issues in psychological research

Exercise 11.1

Ethical issues in research designs

What are the main ethical issues involved in the following possible research designs? Please have a good think before you reveal the answers.


1. A researcher arranges for shoppers to be given either too much or too little change when making a purchase in a department store. A record is taken of how many return to the cash desk and those that do are asked to complete a short questionnaire and are then debriefed as to the purpose of the study.

The ‘participants’ in this study were not able to give their informed consent before participating. They have been mildly delayed by having to return to the cash desk but are also under undue pressure to then complete the questionnaire.


2. Participants are given a general intelligence test and are then given false feedback about their performance. They are told either that they did very well and significantly above average or that they did rather poorly and significantly below average.

There is a possible issue of some psychological harm in that some participants are told they have produced poor intelligence scores. OK – the effect is short-lived, but psychologists have to consider whether the knowledge gained from the experiment will be worth not only the perhaps mild distress caused to the participants but also the effect this has on the credibility and trustworthiness of psychologists in general in the public view.


3. In-depth semi-structured interviews are conducted with seven middle managers in an organisation where there has been some discord between middle and senior management. The researcher has been contracted to highlight possible causes of resentment and reasons for frustration that have been expressed quite widely. The researcher publishes a full report including demographics of the participants, three of whom are women, one of whom is Asian.

There is a problem here with anonymity. It will be easy for the senior managers to identify the sole Asian woman. In cases like these full information has to be compromised in order to preserve privacy and to protect individuals whose lives could be seriously affected by disclosure.


4. Participants volunteer for an experiment and are first shown slides that have a theme of sweets, nuts or beans. They are then asked to put their hands into bags which contain either jelly beans, peanuts or kidney beans. The researcher is interested in whether the slides influence the participant’s identification of the items in the bag.

The description does not make clear whether the participants were asked before participating whether they might suffer from any allergies. Most importantly there is a risk of an anaphylactic reaction from the peanuts. This then is in contravention of the principle of not putting participants at any physical risk.


5. A researcher conducting an experiment is quite attracted to one of the participants. At the end of the session, when the experiment is over, he asks her for a date.

The researcher is in a position of special power over the participant and should not exploit this by mixing professional activity with personal life. The two might of course meet up somehow outside the professional context but making this approach in the context of the experiment puts the psychologist at risk of contravening professional ethics.

Chapter 12 Analysing qualitative data

Exercise 12.1

Defining some key terms used in the chapter

Can you give a meaning for the following terms? Click to see a model answer.

In Grounded Theory the emergent theory is modified by the addition of new cases and the movement is from particulars to a general overview.

The method used in a specific qualitative project, usually described under this heading in the method section of the report.

The type of category that is to be used in a content analysis approach to qualitative data analysis. It might be decided to work at the level of individual words, or with certain phrases or, higher still, with certain sets of meanings (e.g., each time the prime minister refers to greed among highly paid executives, no matter what terms are used in the description).

Coding of qualitative data often into units so that quantitative analysis can be performed.

View of the individual as unique, with unique, unmeasurable characteristics.

Quantitative approach that looks to measure human characteristics but still claims that the individual can be unique in having an unrepeated combination of characteristics at different levels.

Working with qualitative data so that theory emerges from instances within it. This approach is contrasted with the hypothetico-deductive method of looking for pre-reasoned patterns in the data.

Sharing an early analysis and interpretation of data with the people who were participants in the study so that they can verify, disagree with or shed new light on the assumptions and conclusions that are being made.

Checking one’s analysis and conclusions from qualitative data with another perspective – e.g., a different researcher’s analysis or using member checking.

Chapter 13 Statistics – organising the data

The early part of Chapter 13 introduces levels of measurement and first talks of categorical and measured variables. We then look at the traditional division of scales into four types, nominal, ordinal, interval and ratio. In fact, when attempting the analysis of data and trying to decide which statistical treatment is appropriate you will never need to decide whether data are at a ratio level and you will rarely come across ordinal level data. There are two major decisions most of the time: first, whether your variable for analysis is categorical or measured; and second, if measured, whether it can safely be treated as interval level data or whether you should employ tests that are appropriate for ordinal level data. We deal with these distinctions here but they will be put into practice when deciding whether data are suitable for parametric testing as described in Chapter 19.

Exercise 13.1

Identifying categorical and measured variables

Decide in each case below the type of variable for which data have been recorded: Categorical or Measured. Questions 1, 5, 9, 10 and 11 have further explanations that can be found by clicking on the reveal button.


1. Numbers of people who are extroverted or introverted

Categorical
Further explanation: The numbers of people counted are on an interval scale but for each participant we have only a category; never confuse the frequencies with the measurement method used for each person/case.


2. Scores on an extroversion scale

Measured


3. Number of words recalled from a learned 20 item list

Measured


4. Whether people stopped at a red traffic light or not

Categorical


5. Grams of caffeine administered to participants

Measured
Further explanation: Grams are units on a clearly measured scale. However, we could be conducting an experiment where we give 0 grams, 50 grams or 200 grams to participants, in which case we would be using three categories; it is always possible to take an interval scale and create categories like these. If, for instance, we recorded in each case only whether participants solved a problem or not, we would have a 3 × 2 crosstabs table for a χ² analysis – see Chapter 18 and the short sketch at the end of this exercise.


6. Number of errors made in completing a maze

Measured


7. Whether people were recorded as employed, self-employed, unemployed or retired

Categorical


8. Number of cigarettes smoked per day

Measured


9. Whether people smoked none, 1–15, 15–30, or more than 30 cigarettes per day

Categorical


10. Number of aggressive responses recorded by an observer of one child

Measured
Further explanation: Here again a measured variable has been reduced to a set of categories.


11. Whether a child was recorded as strong aggressive or moderate aggressive or non-aggressive

Categorical
Further explanation: The same thing could have happened here too.
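Relating to the note under item 5 above (three dose conditions crossed with whether or not a problem was solved), here is a minimal sketch in Python of how such a 3 × 2 crosstabs table could be analysed with a chi-square test. The counts are invented purely for illustration and the scipy library is assumed to be available; the analysis itself is covered in Chapter 18.

```python
# Hypothetical 3 x 2 frequency table: rows are the three dose conditions
# from item 5, columns are solved / not solved. Counts are invented.
from scipy.stats import chi2_contingency

observed = [
    [12,  8],   # no dose:     solved, not solved
    [15,  5],   # medium dose
    [ 9, 11],   # high dose
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
```

The frequencies are categorical data (counts per cell), even though the dose itself was originally measured on an interval scale.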


Exercise 13.2

Finding descriptive statistics

For those able to use IBM SPSS, or any other software that will find descriptive statistics, the file psychology test scores.sav (SPSS) or psychology test scores.xls (Excel) contains data on 132 cases that you can work with. For those working by hand this is rather a lot of data, so I have provided a smaller data set below for you.

Download Exercise 13.2 Data Sets (ZIP 2.1KB)

Working on SPSS or equivalent with the psychology test score data, find the mean, median, mode, range, semi-interquartile range and standard deviation.

Mean: 37.02

Median: 38

Mode: 39

Range: 38
Note: 1 has been added to the SPSS answer for the range (37) for the reasons given on p. 352 of the book. We assume the range runs from the lower end of the lowest interval to the upper end of the highest interval.

Semi-interquartile range: 3.5
Note: Find the semi-interquartile range in SPSS by selecting Analyze/Descriptive Statistics/Frequencies and clicking the Statistics button to select quartiles. The output will call these ‘percentiles’ but the 25th and 75th will be provided, so take the difference between these two and halve it.

Standard deviation: 6.83

For those working by hand here is a simpler data set:

14, 15, 16, 18, 18, 19, 21, 22, 22, 22, 23, 23, 23, 24, 24, 25, 25, 26, 27, 27, 35, 38, 39

Try to calculate the same statistics (and read the notes above about calculations):

Mean: 23.65

Median: 23

Mode: 22

Range: 27
Note: As before, 1 has been added to the range for the reasons given on p. 352 of the book; we assume the range runs from the lower end of the lowest interval to the upper end of the highest interval.

Semi-interquartile range: 3.5
Note: Excel gives slightly different answers for quartiles and percentiles and hence the semi-interquartile range value will be different – 2.75

Standard deviation: 6.58
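For those who would rather check their hand calculations with a few lines of Python instead of SPSS or Excel, here is a minimal sketch using only the standard library. The answers above were produced from the downloadable files, and quartile conventions differ between packages, so expect small discrepancies.

import statistics as st

# the by-hand data set printed above
scores = [14, 15, 16, 18, 18, 19, 21, 22, 22, 22, 23, 23, 23,
          24, 24, 25, 25, 26, 27, 27, 35, 38, 39]

mean = st.mean(scores)
median = st.median(scores)
mode = st.mode(scores)                       # most frequent value (first met if tied)
data_range = max(scores) - min(scores) + 1   # +1 convention described in the note above
q1, q2, q3 = st.quantiles(scores, n=4)       # quartiles ('exclusive' method)
siqr = (q3 - q1) / 2                         # semi-interquartile range
sd = st.stdev(scores)                        # sample standard deviation (n - 1)

print(mean, median, mode, data_range, siqr, sd)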

Chapter 14 Graphical representation of data

Here is the datafile to be used for a box-plot as referred to in the exercises for this chapter:

Box Plot Data.sav

Exercise 14.1

A bar chart

Students on an organisational psychology course have taken part in an experiment in which they have first conducted an interview while being observed by a visiting lecturer. Half the students are told the visitor was an expert in human relations and half of this group are given positive feedback by the visitor while the other half are given negative feedback. The other half of the students are told their visiting observer is simply ‘an academic’ and the same two types of feedback are given by the visitor to this group. Students were then asked to rate their own interview performance on a scale of 1 (very poor) to 10 (excellent). The results are displayed in the combined bar chart below.

Please describe the results as accurately as you can (no specific numerical values are required) and offer some possible explanation of the findings.

In general positive feedback has greater effect than negative feedback. In addition there appears to be a greater effect from the expert than from the academic. However, there also seems to be an interaction in that the academic’s negative feedback appears to have had a greater lowering effect than that of the expert. Perhaps the students receiving expert negative feedback would rationalise that the expert would be particularly harsh and have therefore discounted some of the feedback.
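The bar chart itself is not reproduced here, so purely as an illustration here is a minimal matplotlib sketch of how a clustered bar chart of this kind can be drawn; all the mean ratings in it are hypothetical.

import numpy as np
import matplotlib.pyplot as plt

feedback = ['Positive feedback', 'Negative feedback']
expert_means = [8.2, 6.0]       # hypothetical mean self-ratings, 'expert' observer
academic_means = [7.0, 4.5]     # hypothetical mean self-ratings, 'academic' observer

x = np.arange(len(feedback))
width = 0.35

plt.bar(x - width / 2, expert_means, width, label='Expert observer')
plt.bar(x + width / 2, academic_means, width, label="'Academic' observer")
plt.xticks(x, feedback)
plt.ylabel('Mean self-rating of interview performance (1-10)')
plt.legend()
plt.show()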


Exercise 14.2

A histogram

A psychology lecturer has given her students a class test where the maximum mark possible is 50. The histogram above shows the distribution of the test score data. The median of this distribution is 38:

1. Was the test easy or hard?

Easy. The distribution is negatively skewed and shows a ‘ceiling effect’ with many scores near the top end of the scale.


2. Why is the mean lower than the median?

Because the distribution is negatively skewed and therefore there are more extreme low scores in the tail pulling the mean (37.02) lower than the median (38).


3. What is the modal category of scores?

38–40

Note: The data for this histogram are contained in the file Psychology test scores.sav used in the Chapter 13 exercises.

Chapter 15 Frequencies and distributions

Exercise 15.1

z scores

A reading ability scale has a mean of 40 and a standard deviation of 10 and scores on it are normally distributed.

1. What reading score does a person get who has a z score of 1.5?

55

2. If a person has a raw score of 35 what is their z score?

-0.5

3. How many standard deviations from the mean is a person achieving a z score of 2.5?

2.5 above the mean

4. What percentage of people score above 50 on the test?

15.87%

5. What percentage of people score below 27?

9.68%

6. What is the z score and raw score of someone on the 68th percentile?

For the 68th percentile, 18% (.18) of the distribution lies between the mean and z (68% minus the 50% below the mean). From the table, z = .47, and .47 of a standard deviation is 4.7 raw-score points, so the raw score is 40 + 4.7 = 44.7.

7. At what percentile is a person who has a raw score of 33?

24th (24.2%)
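If you want to check these answers yourself, here is a minimal Python sketch (assuming the scale above with mean 40 and SD 10) that converts between raw scores and z scores and finds normal-curve proportions.

from math import erf, sqrt

MEAN, SD = 40, 10    # the reading ability scale above

def z_from_raw(raw):
    return (raw - MEAN) / SD

def raw_from_z(z):
    return MEAN + z * SD

def proportion_below(z):
    # proportion of a normal distribution falling below z
    return 0.5 * (1 + erf(z / sqrt(2)))

print(raw_from_z(1.5))                               # question 1: 55
print(z_from_raw(35))                                # question 2: -0.5
print(100 * (1 - proportion_below(z_from_raw(50))))  # question 4: 15.87% above 50
print(100 * proportion_below(z_from_raw(27)))        # question 5: 9.68% below 27
print(100 * proportion_below(z_from_raw(33)))        # question 7: roughly the 24th percentile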


Exercise 15.2

Standard error

1. If a sample of 30 people produces a mean target detection score of 17 with a standard deviation of 4.5, what is our best estimate of the standard error of the sampling distribution of similar means?

0.82 (the estimated standard error is SD/√N = 4.5/√30)


2. Using the result of question 1, find the 95% confidence interval for the population mean.

15.39 to 18.61

Explanation: For 95% limits z must be -1.96 to +1.96; 1.96 x the SE = 1.96 x 0.82 = 1.61. Hence we have 95% confidence that the true mean lies within 17 ± 1.61, i.e. between 15.39 and 18.61.
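The same calculation can be checked with a few lines of Python; this is just the SE and z-based interval used in the explanation above, not a t-based interval.

from math import sqrt

n, mean, sd = 30, 17, 4.5

se = sd / sqrt(n)            # estimated standard error of the mean, about 0.82
lower = mean - 1.96 * se     # 95% confidence limits using z = -1.96 / +1.96
upper = mean + 1.96 * se
print(round(se, 2), round(lower, 2), round(upper, 2))   # 0.82, 15.39, 18.61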

Chapter 16 Significance testing – was it a real effect?

Exercise 16.1

One- or two-tailed tests

In each case below decide whether the research prediction permits a one-tailed test or whether a two-tailed test is obligatory.

1. There will be a difference between imagery and rehearsal recall scores.

two-tailed

2. Self-confidence will correlate with self-esteem

two-tailed

3. Extroverts will have higher comfort scores than introverts

one-tailed

4. Children on the anti-bullying programme will improve their attitude to bullying compared with the control group

one-tailed

5. Children on the anti-bullying programme will differ from the control group children on empathy

two-tailed

6. Anxiety will correlate negatively with self-esteem

one-tailed

7. Participants before an audience will make more errors than participants alone

one-tailed

8. Increased caffeine will produce a difference in reaction times

two-tailed


Exercise 16.2

Type I and Type II errors

Please answer true or false for each item.


Exercise 16.3

z values and significance

In the chapter we looked at a value of z and found the probability that a z that high or higher would be produced at random under the null hypothesis. We do that by taking the probability remaining to the right of the z value on the normal distribution in Appendix table 2 (if the z is negative we look at the other tail as in a mirror). Following this process, in the table below enter the exact value of p that you find from Appendix table 2. Don’t forget that with a two-tailed test we use the probabilities at both ends of the distribution. Enter your value with a decimal point and four decimal places exactly as in the table. Decide whether a z of this value would be declared significant with p ≤ .05

 

     z value     One- or two-tailed     p =          Significant?
a    0.78        One
b    1.97        Two
c    2.56        Two
d    -2.24       Two
e    1.56        One
f    -1.82       Two

 

Answers:

     z value     One- or two-tailed     p =          Significant?
a    0.78        One                    .2177        No
b    1.97        Two                    .0488        Yes
c    2.56        Two                    .0104        Yes
d    -2.24       Two                    .0250        Yes
e    1.56        One                    .0594        No
f    -1.82       Two                    .0688        No
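If you would like to generate these p values without the appendix table, here is a minimal Python sketch using the normal distribution; it simply doubles the one-tailed probability for the two-tailed entries, as in the answers above.

from math import erfc, sqrt

def p_from_z(z, tails):
    # probability of a z at least this extreme under H0
    one_tailed = 0.5 * erfc(abs(z) / sqrt(2))
    return tails * one_tailed

for z, tails in [(0.78, 1), (1.97, 2), (2.56, 2), (-2.24, 2), (1.56, 1), (-1.82, 2)]:
    p = p_from_z(z, tails)
    print(z, tails, round(p, 4), 'significant' if p <= .05 else 'not significant')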

Chapter 17 Testing for differences between two samples

Here are the data sets, in SPSS and in MS Excel, for the results that are calculated by hand in this chapter of the book. The related t, unrelated t and single sample t data sets are all contained in the Excel file t test data sheets.xls. The files in SPSS are unrelated t sleep data.sav, related t imagery data.sav and single sample t test data.sav.

Exercise 17.1 Parametric test data sets
Download All Parametric test data sets (ZIP)

The data files for the non-parametric tests are linked below. The Excel file nonparametric test data.xls contains the data for the Mann-Whitney, Wilcoxon and Sign test calculations. The SPSS files are, respectively, mannwhitney stereotype data.sav, wilcoxon module ratings data.sav and sign test therapy data.sav.

Exercise 17.1 Non-parametric data sets
Download All Non-parametric data sets (ZIP)

Exercise 17.1

t tests on further data sets
Exercise 17.1 Scenario data sets
Download All Scenario data sets (ZIP)

Data sets are provided here that correspond with the three research designs described below. Your first task is to identify which type of t test should be performed on the data for each design: unrelated t test, related t test, or single sample t test.

Scenario 1: Participants are asked to solve one set of anagrams in a noisy room and then solve an equivalent set in a quiet room. The prediction is that participants will perform worse in the noisy room. Data are given in seconds.

related t test


Scenario 2: A sample of children is selected from a ‘free’ school where the educational policy is radically different from the norm and where students are allowed to attend classes when they like and are also involved in deciding what lessons will be provided by staff. It is suspected their IQ scores may be lower than the average.

single sample t test


Scenario 3: One group of participants is asked to complete a scale concerning attitudes to people with disabilities. A second group of children is shown a film about the experiences of people with disabilities and then asked to complete the attitude scale a week later. The research is trying to show that changes in attitude last beyond the limits of the typical short-term laboratory experiment.

Unrelated t test

Now conduct the appropriate test on each data set and give a full report of the result including: t value, df, p value (either exact or in the ‘p less than …’ format), 95% confidence limits for the mean difference and effect size.

Scenario 1 (related t)
The mean time to solve anagrams in the noisy room (M = 193.45 secs, SD = 43.16) was higher than the mean time for the quiet room (M = 178.25, SD = 24.52) resulting in a mean difference of 15.2 seconds. This difference was not significant, t (19) = 1.558, p = .136. The mean difference (95% CI: -5.22 to 35.62) was small (Cohen’s d = 0.35).

Scenario 2 (single sample t )
The ‘free’ school children had a lower mean IQ (M = 97.7, SD = 9.5) than the standard population average of 100. This difference, however, was not significant, t (24) = 1.22, p = .236. The difference between the sample mean and the population mean was small (-2.32, 95% CI: -6.26 to 1.62, Cohen’s d = 0.15).

Note that here the known population standard deviation of 15 points has been used, so d is 2.32/15 = 0.15

Scenario 3 (unrelated t)
The film group produced a higher mean attitude score (M = 25.65, SD = 5.25) than the control group (M = 22.2, SD = 4.76). The difference between means was significant, t (38) = 2.18, p = .036. The difference between means (difference = 3.45, 95% CI: 0.24 to 6.66) was moderate (Cohen’s d = 0.69).

Note: Effect size is calculated using d = (M1 − M2)/s, where s is the mean of the two groups’ standard deviations (sample sizes are equal).
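If you have Python with SciPy available, the three kinds of t test can be run as sketched below. The arrays are purely illustrative stand-ins for the downloadable scenario data, and the Cohen’s d line follows the ‘mean of the two standard deviations’ approach mentioned in the note above.

import numpy as np
from scipy import stats

# hypothetical related-design data (same participants in both rooms)
noisy = np.array([200, 185, 230, 160, 210])
quiet = np.array([180, 190, 205, 150, 195])
t, p = stats.ttest_rel(noisy, quiet)            # related (paired) t test
print(t, p)

# hypothetical unrelated-design data (two separate groups)
film = np.array([25, 27, 30, 22, 26])
control = np.array([21, 24, 23, 20, 22])
t, p = stats.ttest_ind(film, control)           # unrelated (independent) t test

# Cohen's d using the mean of the two standard deviations (equal n)
d = (film.mean() - control.mean()) / ((film.std(ddof=1) + control.std(ddof=1)) / 2)
print(t, p, d)

# single sample t test against a known population mean of 100
iq = np.array([98, 95, 102, 97, 93])
t, p = stats.ttest_1samp(iq, 100)
print(t, p)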


Exercise 17.2

Non-parametric tests on the scenario data sets

Select below the appropriate non-parametric tests that can be used on the Scenario 1 and 3 data from the t test exercises. In one scenario more than one appropriate test can be selected.

Scenario 1

(Anagrams in noisy and quiet rooms)

Wilcoxon

Mann-Whitney

Sign test

Wilcoxon and Sign test


Scenario 3

(Control and film groups’ attitudes towards disabled people)

Wilcoxon

Mann-Whitney

Sign test

Mann-Whitney

Now conduct the appropriate test on each data set and give a full report of the result including: T or U, appropriate N values, p value (either exact or in the ‘p less than …’ format) and effect size.

Scenario 1: (Wilcoxon)
The differences between time taken to solve anagrams in the noisy room and time taken in the quiet room were ranked according to size for each participant. A Wilcoxon T analysis on the difference ranks showed a rank total of 139 where noisy room times were higher than quiet room times and a rank total of 71 where quiet room times were higher. Hence, quiet room times were generally lower than noisy room times but this difference was not significant, T (N = 20) = 71, p = .204. The estimated effect size was small to moderate, r = 0.28.

Scenario 1: (Sign test)
For each participant the difference between noisy room and quiet room time was found and the sign of this difference recorded. The 13 cases where quiet room score was less than noisy room score were contrasted with the 7 cases where the difference was in the opposite direction using a sign test analysis. The difference was found not to be significant with S (N = 20) = 7, p = .263.

Scenario 3: Mann-Whitney
The children’s disability attitude scores were ranked as one group. The rank total for the control group was 339.5 whereas the total for the film group was 480.5. Using a Mann-Whitney analysis, significance was very nearly achieved with U (N = 40) = 129.5, p = .056. The effect size was moderate, r = 0.3.
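The equivalent non-parametric tests are also available in SciPy, as sketched below; again the arrays are illustrative only, not the scenario data files.

from scipy import stats

noisy = [200, 185, 230, 160, 210]      # illustrative related data
quiet = [180, 190, 205, 150, 195]
print(stats.wilcoxon(noisy, quiet))    # Wilcoxon signed-ranks test (Scenario 1)

control = [21, 24, 23, 20, 22]         # illustrative unrelated data
film = [25, 27, 30, 22, 26]
print(stats.mannwhitneyu(control, film, alternative='two-sided'))   # Mann-Whitney U (Scenario 3)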

Chapter 18 Tests for categorical variables and frequency tables

Exercise 18.1

A 2 x 2 chi-square analysis

Individual passers-by, approaching a pedestrian crossing, are targeted by observers who record whether the person crosses against the red man under two conditions, when no one at the crossing disobeys the red man and when at least two people disobey. The results are recorded in the table below.

 

                          No jaywalker     At least two jaywalkers     Total
Target disobeys light     16               27                          43
Target obeys light        43               33                          76
Total                     59               60                          119

1. Calculate the expected frequencies for a chi-square analysis. Copy the table below and enter your results.

 

                          No jaywalker     At least two jaywalkers     Total
Target disobeys light                                                  43
Target obeys light                                                     76
Total                     59               60                          119

2. Now conduct the chi-square analysis. The data set is available here. However if you are learning SPSS it is a good idea to set this up for yourself. Don’t forget to weight cases as described on p. 501 of the book. To weight cases here you need a variable called jaywalkers with two values, ‘none’ and ‘twoplus’. You need a second variable, obeys, with two values ‘no’ and ‘yes’. Make your datasheet show one case for each possible combination and enter the data from the observed data table above into the appropriate rows in a third column variable called count. Then select Data/Weight cases and drop the variable count into the weight cases box to the right.

Now enter your result into the spaces below. In each case use three places of decimals and don’t worry if you’re a fraction out. This could be because of rounding decimals in your calculations.

χ2 (1, N = 119) =

p value =

 

Expected frequencies:

                          No jaywalker     At least two jaywalkers     Total
Target disobeys light     21.3             21.7                        43
Target obeys light        37.7             38.3                        76
Total                     59               60                          119

χ2 (1, N = 119) = 4.122

p value = .042
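As a check on the hand calculation, here is a minimal SciPy sketch for the same 2 x 2 table; correction=False is used so that the uncorrected chi-square reported above is reproduced rather than the Yates-corrected value.

import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[16, 27],     # target disobeys light
                     [43, 33]])    # target obeys light

chi2, p, df, expected = chi2_contingency(observed, correction=False)
print(expected)        # expected frequencies: about 21.3, 21.7 / 37.7, 38.3
print(chi2, df, p)     # about 4.12, 1, .042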


Exercise 18.2

A loglinear analysis

Suppose that the research in Exercise 18.1 is extended to include an extra condition of five or more jaywalkers and to include a new variable of gender. The table below gives fictitious data for such an observational study. Conduct a loglinear analysis on the data outlining all significant results in your results report.

Males

                          No jaywalker     At least two jaywalkers     Five or more jaywalkers     Total
Target disobeys light     21               25                          38                          84
Target obeys light        38               35                          22                          95
Total                     59               60                          60                          179

Females

                          No jaywalker     At least two jaywalkers     Five or more jaywalkers     Total
Target disobeys light     22               23                          29                          74
Target obeys light        39               37                          31                          107
Total                     61               60                          60                          181

A three-way backward elimination loglinear analysis was performed on the frequency data in the table above, produced by combining frequencies for jaywalkers, obedience and gender. One-way effects were not significant, likelihood ratio χ2 (4) = 5.40, p = .248; two-way effects were significant, likelihood ratio χ2 (5) = 11.968, p = .035; the three-way effect was not significant, χ2 (2) = 1.789, p = .409. Only the jaywalkers x obedience interaction was significant, χ2 (2) = 10.845, p = .004. More people crossed against the light when there were more jaywalkers present.

Chapter 19 Correlation and regression

Exercise 19.1

Scatter plots

Have a look at the scatter plots below and select a description in terms of strength (weak, moderate, strong) and direction (positive, negative or curvilinear).

Figure 1
Figure 2
Figure 3

Figure 1: strong, positive
Figure 2: moderate, negative
Figure 3: strong, curvilinear


Exercise 19.2

Calculating Pearson’s and Spearman’s correlations

You’ll need one of these data sets for this exercise

Download Correlation Data Sets (ZIP)

The data set in the file correlation.sav (SPSS) or correlation.xls (Excel) is for you to use to calculate Pearson’s r and Spearman’s rho (two-tailed) either by hand or using SPSS or a spreadsheet programme. Copy the table below and enter your values, using either p = or p ≤. Don’t worry if your answer is out by a small amount as this might be due to rounding errors.

Pearson’s r =                  p =                p ≤

Spearman’s rho =               p =                p ≤

Answers:

Pearson’s r = -.48             p = .005           p ≤ .01

Spearman’s rho = -.492         p = .004           p ≤ .01
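Outside SPSS, both coefficients can be found with SciPy as sketched below; the x and y arrays are illustrative only, not the correlation data file.

from scipy.stats import pearsonr, spearmanr

x = [12, 15, 9, 20, 17, 11, 14, 18]    # illustrative scores only
y = [34, 30, 38, 22, 25, 36, 31, 24]

r, p = pearsonr(x, y)          # Pearson's r and its two-tailed p
print(round(r, 3), round(p, 3))

rho, p = spearmanr(x, y)       # Spearman's rho and its two-tailed p
print(round(rho, 3), round(p, 3))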


Exercise 19.3

A few questions on correlation

1. Jarrod wants to correlate scores on a general health questionnaire with the subject that students have chosen for their first degree. Why can’t he?

First degree choice is a categorical variable.

2. Amy wants to correlate people’s scores on an anxiety questionnaire with their status – married or not married. Can she?

Yes, she can use the point biserial correlation coefficient (though better to conduct a difference test e.g., unrelated t).

3. As the number in a sample increases, does the critical value required for a significant correlation with p ≤ .05 increase or decrease?

Decreases.

In the first set of exercises for this chapter, question 2 asks you to draw the scatterplot for the maths and music data in Table 19.5. Here is a possible answer:

Figure 4

The maths and music score scatterplot – answer to exercise in Chapter 19.

Chapter 20 Multiple regression practice

Exercise 20.1

Multiple regression practice

The data set for the multiple regression analysis conducted in the book is called multiple regression data (book).sav and a link to this file is provided below.

Download Multiple Regression Data Sets (ZIP)

A further exercise in multiple regression can be performed using the file multiple regression ex.sav, which is also provided below. Imagine here that an occupational psychologist has measured ambition, work attitude and absences over the last year and used these to predict productivity over the last three months. If there is good predictive power the set of tests might be used in the selection process for new employees.

Download Multiple Exercise Regression Data Sets (ZIP)

In this exercise please perform the multiple regression analysis in SPSS, if you have the programme, and then answer the accompanying multiple-choice questions.
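For those without SPSS, here is a minimal sketch of an ordinary least squares multiple regression in NumPy. The predictor values and variable names are hypothetical stand-ins for the file described above, and the output is limited to the raw regression coefficients.

import numpy as np

# hypothetical scores on the predictors and the criterion described above
ambition = np.array([5, 7, 3, 8, 6, 4, 7, 5])
attitude = np.array([6, 8, 4, 9, 5, 3, 7, 6])
absences = np.array([2, 0, 6, 1, 3, 7, 1, 4])
productivity = np.array([55, 68, 40, 75, 58, 38, 66, 52])

# design matrix with a leading column of 1s for the intercept
X = np.column_stack([np.ones(len(ambition)), ambition, attitude, absences])
coefs, *_ = np.linalg.lstsq(X, productivity, rcond=None)

print(coefs)       # intercept followed by the unstandardised b weights
print(X @ coefs)   # predicted productivity scores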

Chapter 21 Factor analysis

Download FA Data Sets (ZIP)

Exercise 21.1

In this chapter we carried out an Exploratory Factor Analysis on the data in the file FAdata.sav. Now load up the file FAdata.sav and have a go at carrying out a Principal Components Analysis instead. The exercise, apart from giving factor analysis practice, is to inspect the SPSS output files and see what differences there are between the EFA and PCA results.

To carry out the PCA do the following:

In SPSS proceed exactly as for stage 1 of the EFA analysis except:

When you select the Extraction button select Principal Components Analysis from the top drop down box after Method.

Select Rotation and make sure None is selected. Since we do in fact know the number of factors we will select, you could rotate at this point, but usually you would not yet know the number of factors required.

Stage 2 would involve exactly the same decision process as for EFA and so we will extract four factors.

For stage 3 again, the steps are exactly the same but ensure that Principal Components Analysis is still selected in the Extraction box.

Now examine the two sets of results for differences before moving on to the answers below.
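If you would like to try the same comparison outside SPSS, here is a minimal scikit-learn sketch contrasting PCA with factor analysis. The data array is a random placeholder, and the extraction and rotation options are not identical to SPSS’s, so the loadings will not match the output discussed below; it is only meant to show the two procedures side by side.

import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))         # placeholder data: 200 cases x 10 items

pca = PCA(n_components=4).fit(X)
fa = FactorAnalysis(n_components=4).fit(X)

print(pca.components_.T)               # component loadings (items x components)
print(fa.components_.T)                # factor loadings (items x factors)
print(pca.explained_variance_ratio_)   # proportion of variance per component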

Answers

The first thing to note is that many tables stay the same in PCA as they were in EFA, including the first two – the Correlation Matrix and the KMO and Bartlett’s test results. This is expected since they deal with exactly the same data in exactly the same way. However, in the Communalities table the values for the initial solution in PCA are all 1. Remember that PCA explains all the variance across items and not just shared variance. Hence, since a communality is the amount of variation in an item explained by all the identified factors then this amount as a proportion will be 100% or simply 1 when other amounts of variance are expressed as decimal parts of 1.

Next note that below the scree plot, instead of a Factor Matrix there is now a Component Matrix. In this latter table note that there are more loadings that have exceeded the criterion value we set of .3 and that in almost all cases the loadings here are larger than those in the EFA Factor Matrix. Again this is because PCA incorporates all sources of variance into the analysis.

Moving to the all-important Pattern Matrix tables, we find that the two tables generally agree with one another, but with some exceptions (the EFA table is in the book, p.xxx, and the PCA table appears below). The arts-oriented creativity Factor 1 loads very highly on the creative item, more than the equivalent PCA component does, and relatively highly on visual and verbal creativity. The trivial loading for problem solving can be ignored, and in the PCA table this item does not reach our criterion. However, in the PCA table science creativity cross-loads on components 1 and 2, and for component 1 the loading is not trivial, being close to .4.
Hence this item looks decidedly weak and ambiguous. The items included in Factor 2 and Component 2 are identical apart from emotionally aware, which creeps into the PCA table with another trivial loading of .214 whereas, as would be expected, this item loads much higher on Component 3. Factor 3 (possibly related to emotional intelligence) and Component 3 also agree on included items except that creative is also included in Component 3, again with a tiny loading just above the set criterion. The anger- and impulsiveness-related Factor 4 and Component 4 are again comparable apart from the inclusion of a small loading on creative for the latter.

Whilst inspecting and interpreting these tables it is worth remembering again that while the EFA table theoretically points to latent variables that are causal in creating the factors produced through EFA, the components in the PCA table are really constituents in an overall parsimonious description of the original items.

Figure 1

Pattern Matrix table for the PCA analysis.

Chapter 22 Multi-level analysis – ANOVA for differences between more than two conditions

Here is the data file oneway ANOVA caffeine.sav used in the main one-way ANOVA calculations in the chapter.

Download Oneway ANOVA Caffeine (SAV)

Exercise 22.1

Calculating one-way unrelated ANOVA

You will need one of these data sets for this exercise

Download 1-way Unrelated ANOVA (ZIP)

These are fictitious data supposedly collected from an experiment in which participants are given (with their permission) either Red Bull (a high caffeine drink), Diet Coke (moderate caffeine) or decaffeinated Coke (no caffeine, i.e., control group). They are then asked to complete a maze task where they have to trace round a maze to find the exit as quickly as possible.

Carry out a one-way ANOVA analysis on the data either in SPSS, by using a spreadsheet programme or even by hand, and make a full report of results. Include the use of a Tukey-b post hoc test if possible. You should also report effect sizes if you can. If you are calculating by hand you could conduct simple effect t tests between two samples at a time and adjust alpha accordingly.

F (2, 34) = 6.661, p = .004 (or < .01), partial eta2 = .282, a very large effect size.

Scores in the Red Bull group are significantly higher than scores in the caffeine-free group. This is shown by the Tukey-b test, which places the Red Bull and caffeine-free samples in different (non-homogeneous) subsets, or by t tests. The simple effect test between Red Bull and caffeine-free gives t (22) = 3.76 or, calculated by the Bonferroni method, t (22) = 3.65. Either way this is highly significant (p < .01).
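A one-way unrelated ANOVA can also be checked in SciPy as sketched below; the maze scores are illustrative only, and eta-squared is worked out directly from the sums of squares rather than taken from SPSS.

import numpy as np
from scipy.stats import f_oneway

red_bull = np.array([35, 40, 38, 44, 41, 39])        # illustrative maze scores only
diet_coke = np.array([33, 36, 34, 38, 35, 37])
caffeine_free = np.array([30, 31, 29, 33, 32, 28])

F, p = f_oneway(red_bull, diet_coke, caffeine_free)

groups = [red_bull, diet_coke, caffeine_free]
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_total = ((all_scores - grand_mean) ** 2).sum()
eta_squared = ss_between / ss_total

print(F, p, eta_squared)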

Exercise 22.2

Interpreting SPSS results for a one-way ANOVA

Shown below is the SPSS output after a one-way ANOVA has been performed on data where patients leaving hospital have been treated in three different ways, 1. traditionally (the control group), 2. with extra information (leaflet and video) given as they leave hospital and 3. with this information and a home visit from a health professional. The scores represent an assessment of their quality of recovery after three months. Have a go at answering the multiple-choice questions that appear below.

Test of homogeneity of variances (Score)

Levene Statistic     df1     df2     Sig.
5.191                2       36      .010

ANOVA (Score)

                  Sum of Squares     df     Mean Square     F         Sig.
Between Groups    30.974             2      15.487          5.771     .007
Within Groups     96.615             36     2.684
Total             127.590            38

Score – Tukey B

Type of post-op care               N      Subset for alpha = 0.05
                                          1           2
Trad. care                         13     5.3846
Trad. care + inform.               13     6.7692      6.7692
Trad. care + inform. + visit       13                 7.5385

Means for groups in homogeneous subsets are displayed.


Exercise 22.3

The features of post hoc tests

Try the ‘matching’ quiz – match the test with the appropriate description.


Exercise 22.4

The Jonckheere trend test

p. 590 of the book describes the Jonckheere trend test and directs the reader here for the means of calculation. Below is a table of fictitious (and very minimal) data upon which we will conduct the test. This will tell us whether there is a significant trend for scores to increase across the three conditions from left to right. Assume that participants have been given information about a fictitious person including one criterion piece – in condition 1 that the person doesn’t care about global warming, in condition 2 no information about the person’s attitude is given and in condition 3 – the person cares a lot about global warming. The scores in the columns (in bold) are the participant’s rating of how likely they are to like the person.

 

Participant     1. Doesn’t care     Values to right     2. No information     Values to right     3. Cares
A               3                   7                   2                     4                   10
B               5                   7                   7                     3                   8
C               6                   7                   9                     2                   7
D               3                   7                   8                     2                   11
Totals:                             28                                        11

Procedure and calculation

1. For each score count the number of scores that exceed it to the right. Start at the left-hand column.
See the table above. Example: the score of 5 for participant B in the ‘Doesn’t care’ column is exceeded by 7, 9 and 8 in the ‘No information’ column and by 10, 8, 7 and 11 in the ‘Cares’ column, making 7 scores in all in the ‘values to right’ column. The value 9 in the ‘No information’ column is exceeded by 10 and 11 in the ‘Cares’ column, so 2 goes into its ‘values to right’ column.

2. Add the two ‘values to right’ columns.
See the table above (‘Totals’).

3. Add the two totals and call this value X.
X = 28 + 11 = 39

OK, that was the easy part. Things get rather tricky when we want to check whether our value of X is significant. There are tables for this test but they only go up to n = 10 in each condition and you must have the same number in each condition – a rare circumstance. Instead we enter our value of X into the following equation (take a deep breath):

z = [2X − Σ(nᵢnⱼ) − 1] / √{[N²(2N + 3) − 3Σ(n²) − 2Σ(n³)] / 18}

Σ(nᵢnⱼ) means multiply all possible pairs of sample sizes and add the results. If we had sample sizes of 4, 6 and 7 this would mean (4 x 6) + (4 x 7) + (6 x 7). In our case here, though, this is just (4 x 4) + (4 x 4) + (4 x 4) = 48.

Σ(n²) is 4² + 4² + 4² = 3 x 16 = 48

Σ(n³) is 4³ + 4³ + 4³ = 3 x 64 = 192

The top of the equation then comes to: 2 x 39 − 48 − 1 = 78 − 48 − 1 = 29

N is the total sample size, i.e. 3 x 4 = 12, so N²(2N + 3) = 144 x 27 = 3888

Our z value then is 29 / √{[3888 − 144 − 384] / 18} = 29 / √(3360/18) = 29 / √186.67 = 29 / 13.66 = 2.12

A z value of 2.12 cuts off .017 of the area of the normal distribution at either end (check in Appendix table 2 of the book), and this means that our overall p for a two-tailed test is 2 x .017 = .034, so we have a significant trend!
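The whole procedure above can be automated in a few lines of Python; this sketch simply reproduces the counting and the equation just described, so it should return X = 39 and a z of about 2.12 for the table data.

from math import sqrt

conditions = [
    [3, 5, 6, 3],      # 1. doesn't care
    [2, 7, 9, 8],      # 2. no information
    [10, 8, 7, 11],    # 3. cares
]

# X: for every score, count the scores in conditions further to the right that exceed it
X = sum(1
        for i, left in enumerate(conditions)
        for a in left
        for right in conditions[i + 1:]
        for b in right
        if b > a)

ns = [len(c) for c in conditions]
N = sum(ns)
sum_ninj = sum(ns[i] * ns[j] for i in range(len(ns)) for j in range(i + 1, len(ns)))

numerator = 2 * X - sum_ninj - 1
denominator = sqrt((N ** 2 * (2 * N + 3) - 3 * sum(n ** 2 for n in ns)
                    - 2 * sum(n ** 3 for n in ns)) / 18)
print(X, numerator / denominator)    # 39 and about 2.12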

Chapter 23 Multi-factorial ANOVA designs

Exercise 23.1

Calculating two way unrelated ANOVA on a new data set

The data set used to calculate the example of a two-way unrelated ANOVA in this chapter is provided below and is named two way unrelated (book).sav. An Excel file with the same name is also provided.

Download Two way unrelated (book) Data Sets (ZIP)

The data set provided below (two-way unrelated ex) is one of fictitious data from a research project on leadership styles. Each participant has an LPC score, which stands for ‘least preferred co-worker’. People with high scores on this variable are able to get along with and accept relatively uncritically even those workers whom they least prefer to interact with. Such people make good leaders when situations at work are difficult (they are ‘people oriented’). By contrast low LPC people make good task leaders and are particularly effective when working conditions are good but tend to do poorly as leaders when conditions are a little difficult.

The variables in the file are sitfav with levels of highly favourable and moderately favourable (work situation) and lpclead with levels of high and low being the categories of high and low LPC scorers. Hence in these results we would expect to find an interaction between situation favourability and LPC leadership category. High LPC people should do well in moderately favourable conditions whereas low LPC people should do well in highly favourable conditions. Let’s see what the spoof data say. Conduct a two-way unrelated ANOVA analysis, including relevant means and standard deviations, and checking for homogeneity of variance and for effect sizes and power for each test.

Download 2 way unrelated EX Data Sets (ZIP)

The answers I got are revealed when you select the button below.

The main effect for LPC leadership was not significant (overall, one type of leader did no better than the other), F(1,20) = 0.220, p = .644. The main effect for situation was also not significant (leadership performances overall were similar for highly and moderately favourable conditions), F(1,20) = 0.220, p = .644. However, there was a significant interaction between situation and leadership type. In highly favourable conditions, high LPC leaders (M = 5.33, SD = 1.03) scored lower than low LPC leaders (M = 6.5, SD = 1.64), whereas in moderately favourable conditions they scored higher (M = 7.0, SD = 1.41) than low LPC leaders (M = 5.33, SD = 1.03), F(1,20) = 7.045, p = .022. Levene’s test for homogeneity of variance was not significant so homogeneity was assumed. Partial eta-squared for the interaction was .261 with power estimated at .714.
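If you prefer to work outside SPSS, a two-way unrelated ANOVA can be run with statsmodels as sketched below. The data frame values and the score column name (leaderperf) are assumptions for illustration, since only sitfav and lpclead are named above; with a balanced design the Type II table requested here agrees with SPSS’s Type III results.

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# hypothetical balanced 2 x 2 data; replace with the values from the downloaded file
df = pd.DataFrame({
    'sitfav': ['high'] * 6 + ['moderate'] * 6,
    'lpclead': (['high'] * 3 + ['low'] * 3) * 2,
    'leaderperf': [5, 6, 4, 7, 6, 8, 7, 8, 6, 5, 6, 4],
})

model = smf.ols('leaderperf ~ C(sitfav) * C(lpclead)', data=df).fit()
print(anova_lm(model, typ=2))    # both main effects and the interaction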


Exercise 23.2

Interpreting an SPSS output for a two-way unrelated analysis.

Here is part of the SPSS output data for a quasi-experiment in which participants were grouped according to their attitude towards students. This is the ‘attitude group’ variable in the display below. Each group was exposed to some information about a fictitious person including their position on reintroducing government grants to students. Participants were later asked to rate the person on several characteristics including ‘liking’. It can be assumed for instance that participants who were pro students would show a higher liking for someone who wanted to introduce grants than someone who didn’t. Study the print out and try to answer the questions below.

 

Levene's test of equality of error variances (Dependent Variable: liking)

F         df1     df2     Sig.
2.757     5       41      .031

Tests of between-subjects effects (Dependent Variable: liking)

Source                         Type III sum of squares     df     Mean square     F          Sig.
Corrected Model                114.601a                    5      22.920          7.947      .000
Intercept                      1880.558                    1      1880.558        652.033    .000
information                    3.670                       2      1.835           .636       .534
attitudegroup                  15.953                      1      15.953          5.531      .024
information * attitudegroup    93.557                      2      46.778          16.219     .000
Error                          118.250                     41     2.884
Total                          2135.000                    47
Corrected total                232.851                     46

a. R Squared = .492 (Adjusted R Squared = .430)


Chapter 24 ANOVA for repeated measures designs

The data sets used to calculate the repeated measures examples in this chapter are provided below.

Data Sets (ZIP)

Exercise 24.1

Calculating a one-way repeated measures ANOVA example

You will need the following data set to complete this exercise:

Download Repeated Measures 1-way ex Data Sets (ZIP)

The file repeated measures 1-way.sav (SPSS) or repeated measures 1-way.xls (Excel) contains data for a fictitious study in which new employees were assessed for efficiency in their similar jobs after one month, six months and twelve months. Calculate the one-way repeated measures results and compare with the answer given below. The repeated measures variable is contained in the columns entitled efficency1, efficiency6 and efficiency12. In SPSS use the General linear model menu item. Don’t forget to employ Mauchly’s test for sphericity (check the Options button).

The means (and standard deviations) of the efficiency scores after 1 month, 6 months and 12 months respectively were M = 38.3 (3.91), M = 41.5 (6.51) and M = 46.1 (6.92). The means differed significantly with F (2,30) = 8.247, p = .001, effect size (eta squared ) = .355. Mauchly’s test was not significant, p = .349.
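statsmodels can also run a one-way repeated measures ANOVA, as sketched below, though it expects the data in ‘long’ format (one row per participant per occasion) rather than the wide layout used in SPSS, and it does not provide Mauchly’s test. The values and column names here are assumptions for illustration only.

import pandas as pd
from statsmodels.stats.anova import AnovaRM

# hypothetical long-format data: one row per employee per occasion
long_df = pd.DataFrame({
    'subject': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'time': ['1m', '6m', '12m'] * 4,
    'efficiency': [35, 39, 44, 40, 42, 47, 37, 40, 45, 41, 44, 49],
})

result = AnovaRM(long_df, depvar='efficiency', subject='subject', within=['time']).fit()
print(result)    # F, df and p for the within-subjects (time) factor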

Exercise 24.2

Calculating a two-way mixed design ANOVA example

You will need this data set to carry out this exercise:

Download Repeated Measures Mixed Data Sets (ZIP)

In this exercise you can tackle a two-way mixed design where there is one repeated measures factor (efficiency, from the last exercise) and one between groups factor. This new factor is one of training.

Imagine the new employees in the last exercise were randomly divided into a group that received no training, one that received training and one that received training and some team building exercises early on in their employment at the company. You need the file repeated measures mixed.sav (SPSS) or repeated measures mixed.xls (Excel).

Conduct the two-way analysis and see if you get the same findings as the report below. Ignore the column headed ‘graduate’ for now. Make sure you inspect the table of means (by asking for Descriptive statistics under the Options button in SPSS). You need the efficiencyaverage variable to calculate overall efficiency means for each training group.

There was a main effect for efficiency with the means rising from M = 44.8, SD = 5.91 at one month, through M = 45.8, SD = 7.26 at six months, to M = 47.8, SD = 6.80 at twelve months. F(2,90) = 3.824, p = .025, effect size (partial η2) = .078.

There was a main effect for training with means of M = 42.3, SD = 3.41 for the untrained group, M = 47.1, SD = 4.31 for the trained group and M = 49.0, SD = 4.41 for the trained and team-building group. F(2,45) = 11.564, p < .001, effect size (partial η2) = .34.

The interaction was not significant. Sphericity was at an acceptable level (p = .299). Levene’s test for homogeneity of variance was significant for efficiency1 so equality of variances was not assumed for this variable.


If you’re really feeling adventurous you could try the three-way mixed ANOVA that is produced by including the factor of graduate. This tells us whether the participant was a graduate or not. I have only provided brief details of results below but enough to let you see you’ve performed the analysis correctly.

Main effect efficiency: F(2,84) = 4.018, p = .022, η2 = .087
Main effect training: F(2,42) = 15.433, p < .001, η2 = .424
Interaction efficiency x training: not significant
Interaction efficiency x graduate: not significant
Interaction training x graduate: significant, F(2,42) = 7.708, p = .001, η2 = .268
(Overall, graduates did better than non-graduates when untrained or when trained only, but worse when training included team building!)
Three-way interaction efficiency x training x graduate: just significant, F(4,84) = 2.492, p = .049, η2 = .106 (it seems that, for team building and training, graduates improved more across the three times than non-graduates and, for training only, non-graduates improved more than graduates).


Exercise 24.3

Calculation of a two-way repeated measures ANOVA
Download 2 Way Repeated Ex (ZIP)

The data set these files are based on is an experiment where participants undergo the Stroop experience. Stroop was the psychologist responsible for demonstrating the dramatic effect that occurs when people are asked to name the colour of the ink in which words are written – there is a big problem if the word whose colour you are naming is a different colour word (e.g., red written in green – an ‘incongruent’ colour word)! People take much longer to name the ink colour of a set of such words than they do to name the colours of ‘congruent’ words (colour words written in the ink colour of the word they spell).

A further refinement of the experiment, based on a theory of sub-vocal speech when reading, is the prediction that words that sound like colour words (such as ‘shack’ or ‘crown’) should also produce some interference, if incongruent, thus lengthening times to name ink colours. The Stroop factor of this experiment then involves three conditions: naming the ink colour of congruent words; naming the ink colour of incongruent words sounding like colour words; and naming the ink colour of incongruent colour words.

In the imaginary experiment here we have introduced a second factor, which is that people perform the three Stroop tasks both alone and then in front of an audience. The data are presented as a 2 x 3 repeated measures design so there are six columns of raw data, the numbers being number of seconds to read the list of words. Control is naming the ink colour of congruent words, rhyme uses words sounding like incongruent colour words and colour uses incongruent colour words. The end part of each variable refers to the audience conditions, alone if no audience and aud with an audience observing.

Remember that in SPSS you have to name the two repeated measures factors then carefully select columns when asked to define the levels of each variable. If you enter the repeated measures variable names as first ‘stroop’, then ‘audience’ you will be asked to identify variables in the order stroop 1, audience 1, stroop 1, audience 2 and so on, so that’s controlalone, controlaud, rhymealone … and so on. You will need the three extra mean columns when looking at the differences related to the Stroop main effect.

Carry out the two-way analysis, remembering to check Mauchly’s sphericity statistic and to ask for descriptive statistics so you can see the mean of each level of each variable.

The main effect for Stroop is basically massive (as it nearly always is). The overall means for the three conditions were: control M = 44.5 (SD = 14.54), rhyme M = 58.9 (SD = 14.82) and colour M = 97.7 (SD = 20.47). F(2,18) = 34.873, p < .001, partial η2 = .795.
There was no effect for audience and the stroop x audience interaction was not significant. Sphericity was not a problem.


Exercise 24.4

Questions on SPSS results for a two-way ANOVA

The table below shows part of the SPSS output for a two-way ANOVA calculation. Extroverts and introverts (factor extint) have been asked to perform a task more than once during the day to see whether extroverts improve through the day and introverts worsen.

 

                          df     F          p        Effect size η2
Performance               2      3.795      .026     .073
Performance x extint      2      23.225     .000     .326
Error (performance)       96
Extint                    1      .026       .872     .001
Error                     48



Exercise 24.5

The Page trend test

On p. 631 of the book there is a short description of the Page trend test, which is used when you have three or more related sets of data and you want to see whether they follow a trend across conditions. We’ll use the (very minimal) data below as an example. Imagine children have been tested for reading improvement on three successive occasions. We want to see if there is significant improvement.

Reading scores for four children tested three times

             Occasion 1              Occasion 2              Occasion 3
Child        Score       Rank        Score       Rank        Score       Rank
1            3           2           2           1           10          3
2            5           1           7           2           8           3
3            6           1           9           3           7           2
4            3           1           8           2           11          3
Totals:                  Ra = 5                  Rb = 8                  Rc = 11

Step 1. First we calculate the statistic L = Σ(Rk x k), where Rk is the total of each rank column and k is the predicted order of that column. For instance, when k is 1 the rank total is 5 and the predicted order of that column was 1 (we expect children to be lowest here).

Hence L = (5 x 1) + (8 x 2) + (11 x 3) = 54

Step 2. Again there are tables for Page but they only go up to N = 10. For any value of N we can use the formula:

z = [12L − 3nk(k + 1)²] / √[nk²(k + 1)(k² − 1)]

where n is the sample size and k is the number of conditions, so here we get: [(12 x 54) − (36 x 16)] / √(36 x 8 x 4) = 72 / 33.94 = 2.12

1.96 is the critical value for z at .05, two-tailed, so this is a significant trend.
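Here is a minimal Python sketch of the same Page trend calculation, using the ranks from the table above; it should return L = 54 and a z of about 2.12.

from math import sqrt

# rank of each condition within each child, from the table above
ranks = [
    [2, 1, 3],
    [1, 2, 3],
    [1, 3, 2],
    [1, 2, 3],
]

n = len(ranks)         # number of participants (4)
k = len(ranks[0])      # number of conditions (3)

rank_totals = [sum(row[j] for row in ranks) for j in range(k)]    # 5, 8, 11
L = sum((j + 1) * total for j, total in enumerate(rank_totals))   # 54

z = (12 * L - 3 * n * k * (k + 1) ** 2) / sqrt(n * k ** 2 * (k + 1) * (k ** 2 - 1))
print(L, z)    # 54 and about 2.12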

Chapter 25 Choosing a significance test for your data (and internet resources)

Exercise 25.1

Identifying simple two-condition designs

Some two-condition research designs are outlined below. Your job is to read the information (all of it!) and decide which test it is most appropriate to use. You should read the criteria for selecting tests contained in the first part of Chapter 25 before attempting the exercise. The tests that are possible are listed in the table below. Select parametric tests unless there is information contrary to their use.

Related t

Unrelated t

Single sample t

Mann-Whitney U

Wilcoxon T

Pearson correlation

Spearman correlation

Chi-square

Sign test

1. Children are classified as high or low text-message senders and researchers investigate whether their scores on a reading test differ significantly.

Unrelated t


2. The same children as in (1) are recorded as extroverts or introverts with the enquiry being: do extroverts send more texts than introverts?

chi-square


3. Students are tested for self-esteem before and after the exam period to see whether there is a significant change in self-esteem.

Related t


4. A researcher believes that stress has an effect on physical health and so measures people’s stress levels with a questionnaire and records the number of times they have visited the doctor with minor ailments over the past two years. She believes stress levels will predict number of visits.

Pearson correlation


5. A class of school pupils is asked to solve a set of simple maths problems, each working on their own and with a prize for the best performance. They are then asked to solve similar problems, but this time they are told they are working as a group and will receive a prize if they beat other groups. The dependent variable is the difference between their two performances and it is found that these scores are very different from a normal distribution in terms of kurtosis.

Wilcoxon


6. The same researcher as in (4) tests the hypothesis that stress affects self-esteem and expects higher stress levels to be related to lower self-esteem scores and vice versa. In this project she finds the self-esteem scores are heavily skewed and cannot remove this with a transformation.

Spearman’s correlation


7. A sample of people is found who have just completed their second degree. A researcher is interested in whether their second degree category is better than their first degree category. Since degree grades are categorical the only record is whether the second degree was better or worse than the first.

Sign test


8. Participants are divided into two groups. One group is asked to doodle (by filling in letters) while listening to a guest list of names invited to a party. A control group does the same task without doodling. It is predicted that the doodle group will perform better when asked to recall as many names as possible. The variances of the two groups are very different and there are quite different numbers of participants in each group.

Mann-Whitney U


Exercise 25.2

Identifying ANOVA designs

From the following brief descriptions of research designs try to identify the ANOVA type with factors and levels. For instance, an answer might be one-way, unrelated or 2 x 3 x 2 mixed.

One-way

2 x 3

3 x 3

2 x 3 x 2

3 x 4

Unrelated

Repeated measures

Mixed

1. Participants are asked to rate a fictitious person having been told they are either pro-hanging, anti-hanging or neutral.

One-way unrelated


2. Participants are presented with both positive and negative traits for later recall and are induced into either a depressed, neutral or elated mood.

2 x 3 mixed


3. All participants are asked to name colours of colour patches, non-colour words and colour words.

One-way repeated measures


4. Male and female clients experience either psychoanalysis, behaviour modification or humanistic therapy and effects are assessed.

2 x 3 unrelated


5. Participants are given either coffee, alcohol or a placebo and are all asked to perform a visual monitoring task under conditions of loud, moderate, intermittent and no noise.

3 x 4 mixed


6. Older or younger participants are asked to use one of three different memorising methods.

2 x 3 unrelated


7. Extroverts and introverts are asked to perform an energetic and later a dull task after being given a stimulant. On a subsequent occasion they are given a tranquiliser and repeat the two tasks. Later still they are given a placebo and asked to repeat the tasks.

2 (personality) x 2 (task) x 3 (drug) mixed


8. Participants perform tasks in front of an audience and when alone. They are first asked to sort cards into three piles, then four piles and finally five piles.

2 x 3 repeated measures

Chapter 26 Planning your practical and writing up your report

Exercise 26.1

Identifying problematic report statements

In the extracts from students’ psychology practical reports below try to describe what is dubious about the statement before checking the answers.

1. Title: An experiment to see whether giving people coffee, decaffeinated coffee or water will have an effect on their memory of 20 items in a list.

Far too long-winded and could be stripped nicely down to: ‘The experimental effect of caffeine on recall memory’.


2. The design was an experiment using different types of drink …

What kind of experiment (independent samples, repeated measures, quasi- etc.)? It’s true that we might be told later what kinds of drink were used but why not just explicitly state the levels of the independent variable straight away?


3. 20 students were selected at random and asked …

Hardly likely that they were selected truly at random. Probably ‘haphazardly’.


4. Materials used were a distraction task, a questionnaire, mirrors …

Never list materials, use normal prose description.


5. The results were tested with a t test …

Which results? In the simplest studies there are always several ways in which the data could be tested. We could, for instance, test the difference between standard deviations rather than means. Usually, though, the reader needs to know explicitly which means were tested – there are usually more than just two, and anyway ‘results’ is just vague.


6. Miller (2008) stated that “There is no such thing as a loving smack. The term is an oxymoron. No child feels love as they are being beaten or slapped.”

No page number for the quotation.


7. The result proved that …

We never use ‘prove’ in psychology, or in most practical sciences for that matter. Findings usually support a hypothesis or theory, or they challenge it.


8. The experimental group scored higher.

Higher than what, whom? Might mean ‘higher than in the first condition’ or ‘higher than the control group’ etc. ALWAYS complete a comparative phrase in anything you write. E.g.: ‘Extroverts are more outgoing than introverts’; ‘Individuals who were intrinsically motivated showed deeper engagement and greater persistence than those who were extrinsically motivated’.


9. The experiment was not valid as it was conducted in a laboratory.

a. It is findings or conclusions from results that have validity not whole studies. b. Faults in the design, materials, procedure or statistical processes are threats to validity. Findings usually have a certain degree of validity; it is not an all-or-none concept. More threats tend to lower validity. c. Why should the use of a laboratory lower validity? Don’t assume your reader will agree with you automatically. You have to justify all criticisms that you make.


10. More research is needed

What kind of research exactly? That more is needed is always true. Try to specify the research most immediately needed by following up on your critical points and answer them by suggesting appropriate relevant research.


11. More participants should have been tested

Why? There may well have been plenty of participants to support statistical significance. This has to do with power so if you must make this point try to show how much more power would have been involved with a greater number of participants. In a well designed and controlled experiment though, 20 or 30 participants per condition is usually ample so be careful with this one.


12. More males/females should have been tested.

Why? You must justify. Is there any reason to believe that males and females perform differently on this task? If not don’t use this kneejerk criticism.


13. More people from other cultures should have been tested.

Another kneejerk criticism but this one has deeper problems the first being that the writer is assuming that all participants came from one culture – if that is possible. Unless you are living in a highly isolated part of the world, ‘one culture’ of origin is quite unlikely (e.g. British is not one culture but many). Besides this, and assuming there is a dominant culture involved, why should culture make any difference? This must be explained and the claim therefore justified.

Further Information

Precognition studies and the curse of the failed replications

The following is an article from The Guardian in which Chris French discusses the system of peer review that allowed a prestigious journal to refuse to publish a failed replication of some otherwise astonishing pre-cognition (predicting the future) studies. 
www.theguardian.com/science/2012/mar/15/precognition-studies-curse-failed-replications

Chris French is a professor of psychology at Goldsmiths, University of London, and heads the Anomalistic Psychology Research Unit. He edits The Skeptic magazine.

Some notes on peer review

Chapter 1 mentions the fact that, in the interests of scientific integrity, psychological research articles (as with all other sciences) are usually submitted to a peer review process.

Over recent years several problems with this process have been raised, including:

  • a prolonged time between completion of research and eventual publication. In today’s academic research world there is mounting pressure on academics to achieve publication of their research articles in prestigious journals in order to increase their ‘impact factor’. As Stephen Curry writes in The Guardian (7th September, 2015) the central point of scientific publication in journals is “the rapid dissemination of new results so they can be read, critiqued and built upon. We have lost sight of that because scientific publication through journals has become more about earning prestige points to advance your career than communicating new findings. This has perverted both the motivations of authors and the job of reviewers.” Curry suggests that an encouraging recent development has been the increased use of ‘preprints’ which are research reports yet to receive peer review but which researchers publish on sites such as PsyArXiv. Preprints can be commented on by site users and this can help authors prepare articles for formal submission.
  • conflict of interest if the findings and conclusions of the submitted research article run counter to the theoretical position of one or more of the reviewers – see Info Box 1.4 in Chapter 1. A partial answer to this has been to use anonymous author names so that, at least, reviewers cannot simply talk down the potential publications of their known rivals but this still does not stop strong critical comment if the findings conflict with the reviewer’s position. Even with anonymity, where the research area is highly specific and only a few researchers are working in it, there will be a good chance that reviewers will recognise the origin of the research article. 
  • claims of ‘sloppy science’ if authors of article will not share their data with reviewers. The APA does not require that data be shared with peer reviewers or online.  For some detail on this controversy see: www.nature.com/news/peer-review-activists-push-psychology-journals-towards-open-data-1.21549 Obviously, without shared data reviewers cannot authenticate the statistical processes used and the accuracy of consequent results and the appropriateness of conclusions.
  • a wide variety of forms of peer review. There is no one agreed method. See www.ncbi.nlm.nih.gov/pmc/articles/PMC1420798/ Approaches range from little more than saying ‘The paper is fine’ to highly detailed, intensive and critical commentary. Reviews take time and are usually completed for no fee. Hence an author might be at the mercy of an overstressed reviewer who only glosses over the paper.
  • because academic career success very much depends on developing a CV littered with ‘high impact’ publications in top level journals, some have been driven to the use of fake reviews. This is not always the direct fault of the authors concerned. In some cases fake reviews were organised by agencies offering editing and submission services to highly productive academics. In 2017 the publishing company Springer had to retract (withdraw from publication) 64 papers because of faked reviews.

An extended discussion of the concept of ecological validity 

In Chapter 4 there is a discussion of the much misused and poorly understood concept of ecological validity. This is the original discussion which I trimmed down for the book.

Ecological validity

I attempt here to fully discuss the meaning of this enigmatic and catch-all term ‘ecological validity’ because its widespread and over-generalised use has become somewhat pointless. Hammond (1998) refers to its use as ‘casual’ and ‘corrupted’ and refers to the robbing of its meaning (away from those who continue to use its original sense) as ‘bad science, bad scholarship and bad manners’.

There are three relatively distinct and well used uses of the term, which I shall call ‘the original technical’, ‘the external validity version’ and ‘the pop version’, the latter term to signify that this use I would consider to be unsustainable since it has little to do with validity and its indiscriminate use will not survive close scrutiny.

1. The original technical meaning

Brunswik (e.g., 1947) introduced the term ecological validity to psychology as an aspect of his work in perception ‘to indicate the degree of correlation between a proximal (e.g., retinal) cue and the distal (e.g., object) variable to which it is related’ (Hammond, 1998). This is a very technical use. The proximal stimulus is the information received directly by the senses – for instance two lines of differing lengths on our retinas. The distal stimulus is the nature of that actual object in the environment that we are receiving information from. If we know that the two lines are from two telegraph poles at different distances from us we might interpret the two poles as the same size but one further away than the other. The two lines have ecological validity in so far as we know how to usefully interpret them in an environment that we have learned to interpret in terms of perspective cues. The two lines do not appear to us as having different lengths because we interpret them in the context of other cues that tell us how far away the two poles are. In that context their ecological validity is high in predicting that we are seeing telegraph poles. More crudely, brown patches on an apple are ecologically valuable predictors of rottenness; a blue trade label on the apple tells us very little about rot.  

2. The external validity meaning

Many textbooks, including this one, have taken the position that ecological validity is an aspect of external validity and refers to the degree of generalisation that is possible from results in one specific study setting to other different settings. This has usually had an undertone of comparing the paucity of the experimental environment with the greater complexity of a ‘real’ setting outside the laboratory. In other words researchers asked ‘how far will the results of this laboratory experiment generalise to life outside it?’ The general definition, however, has concerned the extent of generalisation of findings from one setting to another and has allowed for the possibility that a study in a ‘real life’ setting may produce low ecological validity because its results do not generalise to any other setting – see the Hofling study below. Most texts refer to Bracht and Glass (1968) as the originators of this sense of the term and the seminal work by Cook and Campbell (1979) also supported this interpretation.
On this view effects can be said to have demonstrated ecological validity the more they generalise to different settings and this can be established empirically by replicating studies in different research contexts.

3. The ‘pop’ version

The pop version is the definition very often taught on basic psychology courses. It takes the view that a study has (high) ecological validity so long as the setting in which it is conducted is ‘realistic’, or the materials used are ‘realistic’, or indeed if the study itself is naturalistic or in a ‘natural’ setting (e.g., Howitt, 2013). The idea is that we are likely to find out more about ‘real life’ if the study is in some way close to ‘real life’, begging the question of whether the laboratory is not ‘real life’.

The problem with the pop version is that it has become a knee-jerk mantra – the more realistic the more ecological validity. There is, however, no way to gauge the extent of this validity. It is just assumed, so much so that even A-level students are asked to judge the degree of ecological validity of fictitious studies with no information given about successful replications or otherwise.

Teaching students that ecological validity refers to the realism of studies or their materials simply adds a new ‘floating term’ to the psychological glossary that is completely unnecessary since we already have the terminology. The word to use is ‘realism’. As it is, students taught the pop version simply have to learn to substitute ‘realism’ when they see ‘ecological validity’ in an examination question.

For those concerned about the realism of experimental designs Hammond (1998) points out that Brunswik (1947) introduced another perfectly suitable term. He used representative design to refer to the need to design experiments so that they sample materials from among those to which the experimenter wants to generalise effects. He asked that experimenters specify in their design the circumstances to which they wished to generalise. For instance, in a study on person perception, in the same way as we try to use a representative sample of participants, we should select a representative sample of stimulus persons (those whom participants will be asked to make a judgment about) in order to be able to generalise effects to a wider set of perceived people. Hammond is not the only psychologist worried about the misuse of Brunswik’s term. Araújo, Davids and Passos (2007) argue that the popular ‘realism’ definition of ecological validity is a confusion of the term with representative design:
‘… ecological validity, as Brunswik (1956) conceived it, refers to the validity of a cue (i.e., perceptual variable) in predicting a criterion state of the environment. Like other psychologists in the past, Rogers et al. (2005) confused this term with another of Brunswik’s terms: representative design.’ (p.69)

This article by Araújo et al. is a good place to start understanding what Brunswik actually meant by ecological validity and it demonstrates that arguments to haul its meaning back to the original are contemporary, not old-fashioned. The term is in regular use in its original meaning by many cognitive psychologists. They are not clinging to a ‘dinosaur’ interpretation in the face of unstoppable changes in the evolution of human language.

Milgram v. Hofling – which is more ‘ecologically valid’?

Another problem with the pop version is that it doesn’t teach students anything at all about validity as a general concept. It simply teaches them to spot when material or settings are not realistic and encourages them to claim that this is a ‘bad thing’. It leads to confusion with the laboratory–field distinction and a clichéd positive evaluation of the latter over the former. For example, let’s compare Milgram’s famous laboratory studies of obedience with another obedience study by Hofling et al. (1966), where nurses working in a hospital, unaware of any experimental procedure, were telephoned by an unknown doctor and broke several hospital regulations by starting to administer, at the doctor’s request, a potentially lethal dose of an unknown medicine. The pop version would describe Hofling’s study as more ‘ecologically valid’ because it was carried out in a naturalistic hospital setting on real nurses at work. In fact, this would be quite wrong in terms of external validity since the effect has never been replicated. The finding seems to have been limited to that hospital at that time with those staff members. A partial replication of Hofling’s procedures failed to produce the original obedience effect (Rank and Jacobson, 1977)¹, whereas Milgram’s study has been successfully replicated in several different countries using a variety of settings and materials. In one of Milgram’s variations, validity was demonstrated when it was shown that shifting the entire experiment away from the university laboratory and into a ‘seedy’ downtown office, apparently run by independent commercial researchers, did not significantly reduce obedience levels. Here, following the pop version, we seem to be in the ludicrous situation of saying that Hofling’s effect is valid even though there is absolutely no replication of it, while Milgram’s is not, simply because he used a laboratory! In fact Milgram’s study does demonstrate ecological validity on the generalisation criterion. The real problem is that there is no sense of ‘validity’ in the pop notion of ecological validity.

In a thorough discussion of ecological validity Kvavilashvili and Ellis (2004) bring the original and external validity usages together by arguing that both representativeness and generalisation are involved, with generalisation appearing to be the more dominant concept. Generalisation improves the more that representativeness is dealt with. However, they argue that a highly artificial and unrealistic experiment can still demonstrate an ecologically valid effect. They cite as an example Ebbinghaus’s memory tasks with nonsense syllables. His materials and task were quite unlike everyday memory tasks but the effects Ebbinghaus demonstrated could be shown to operate in everyday life, though they were confounded by many other factors. The same is true of research in medicine or biology; we observe a phenomenon, make highly artificial experiments in the laboratory (e.g., by growing cultures on a dish) then re-interpret results in the everyday world by extending our overall knowledge of the operation of diseases and producing new treatments. In psychology, though, it is felt that by making tasks and settings more realistic we have a good chance of increasing ecological validity. Nevertheless, ecological validity must always be assessed using research outcomes and not guessed at because a study is ‘natural’.

I think the conclusions that emerge from this review of the uses of ecological validity are that:

  • Examiners (public or institutional) should certainly not assess the term unless they are prepared to state and justify explicitly the specific use they have in mind prior to any examinations.
  • The pop version tells us nothing about formal validity and is a conceptual dead end; ‘realism’ can be used instead and ‘ecological validity’ takes us no further.
  • Rather than ‘ecological validity’ it might be more accurate to use the term ‘external validity concerning settings’. Although Cook and Campbell (1979) identified ecological validity with generalisation to other settings (i.e., external validity), in the update of that classic, Shadish, Cook and Campbell (2002) talk of external validity with regard to settings. They seem to pass the original term back to Brunswik, saying that external validity is often ‘confused with’ ecological validity. By contrast, Kvavilashvili and Ellis (2004) argue that ‘the difference between the two concepts is really small’. Obviously we cannot hope for pure agreement among academics!
  • It is ridiculous to assume that on the sole basis that a study is carried out in a natural setting or with realistic materials it must be in some way more valid than a laboratory study using more ‘artificial’ materials. Validity is about whether the effect demonstrated is genuinely causal and universal. An effect apparently demonstrated in the field can easily be non-genuine and/or extremely limited, as was Hofling’s.
  • The pop version cannot be sustained scientifically and is not of much use beyond being a technical sounding substitute for the term ‘realism’. The original version is still used correctly by those working in perception and related fields. The external validity (generalising) version is favoured by Kvavilashvili (over representativeness), and directs attention to a useful aspect of validity in the design of research. However, the external validity version is challenged by authors such as Hammond (1998) and Araújo et al (2007), who claim that this is not at all what Brunswik meant nor is it the way cognitive psychologists use the term. Perhaps it’s better to lie low, use alternative terminology, and see how the term evolves. I rather sense that the pop version will hang around, as will complete misunderstanding of the terms ‘null hypothesis’ and ‘negative reinforcement’.

¹ Unlike in Hofling’s study, nurses were familiar with the drug and were able to communicate freely with peers.

Festinger’s end-of-the-world study: ‘UNDER COVER’

In a famous participant observation case study, Leon Festinger, Henry W. Reiken and Stanley Schachter (1956) studied a woman they called Mrs Keech who had predicted the end of the world by a mighty flood. After reading about Mrs Keech in a newspaper, Festinger’s group joined her followers to see what would happen when the world did not, in fact, come to an end.

Mrs Keech claimed to have been receiving messages from a group called the Omnipotent Guardians from the planet Clarion. They had sent her messages through a combination of telepathy, automatic writing and crystal ball gazing to indicate that at midnight on a particular December evening the world would be destroyed, killing all humanity except for Mrs Keech’s group. They were to be rescued by flying saucers sent from Clarion to Mrs Keech’s home. During the weeks before the ‘end of the world’, several of the group members quit their jobs and spent their savings in preparation for the end. Messages continued to arrive daily to Mrs Keech. At meetings Festinger and his associates frequently would excuse themselves and write down their notes while in the bathroom. At one meeting, the members were asked to look into a crystal ball and report any new pieces of information. One member of Festinger’s group was forced to participate, even though he was hesitant to take a vocal role in meetings. After he remained silent for a time, Mrs Keech demanded that he report what he saw. Choosing a single word response, he truthfully announced, ‘Nothing’. Mrs Keech reacted theatrically, ‘That’s not nothing. That’s the void.’

On the final evening, members of the group waited for midnight. During the evening other instructions arrived. Cultists were told to remove their shoelaces and belt buckles since these items were unsafe aboard flying saucers. When midnight passed without any end of the world in sight – and without any flying saucers visible – members of the group began questioning whether they had misunderstood the instructions. Mrs Keech began to cry and whimpered that none of the group believed in her. A few of the group comforted her and reasserted their belief in her. Some members re-read past messages, and many others sat silently with stony expressions on their faces. Finally, in the wee hours of the morning, Mrs Keech returned to the group with a new ‘automatically written’ message from the Omnipotent Guardians. Because of the faith of the group, the Earth had been spared. The cult members were exuberant and during the weeks that followed actually attempted to secure additional converts.

One can find both insight and some entertainment in such a participant observation study. To Festinger’s group of scholars, the experience permitted them to provide a field study examination (in the form of a case study) of the theory of cognitive dissonance that they were trying to develop at the time – a theory that became a major force in communication and social psychology.


Excerpted from John C. Reinard, Introduction to Communication Research. Boston, McGraw-Hill, 2001, pp 186–8.

The Jefferson transcription system

Chapter 12 of the book promises to provide some details of the Jefferson transcription system, used to transcribe recorded conversations into text. Here they are. First of all, though, a few points about the system.

1. The system is used in full in conversation analysis (see Hutchby and Wooffitt, 1998 – the reference is in the book). In this approach there are two major concerns – turn-taking and the characteristics of speech delivery. As you can see from the symbols below, many of them concern how one speech partner overlaps or takes over from another (even the simple fact that transcript lines are numbered serves this) and how speech is delivered (emphasis, rising tone, pauses and so on). As an exercise, listen to a conversation and observe how people use pauses in order to ‘keep the floor’, i.e., indicate that they want to go on speaking and not be interrupted. In some Australian and Californian accents there is a marked upward turn at the end of an utterance where the speaker indicates the end of their speech and invites a reply.

2. The most important point is that the system is intricate and time hogging so check first that you really do need to use it! Most grounded theory approaches would not transcribe in this kind of detail. The system is used mainly in conversation analysis where researchers are interested in the analysis of para-linguistics, in a nutshell, not what is said but how it is said, for instance, emphasis, rising and falling tones, hesitations, etc. Such hesitation might indicate, for instance, that the speaker is embarrassed or in some way has a problem with what they are saying. You can of course indicate that there was such hesitation without resorting to the details of Jefferson.

3. Any two transcribers will come up with slightly different transcriptions. There is no clear criterion as to what counts as a marked or ‘less marked’ fall in pitch, for instance.

4. Each new utterance (according to the researcher’s hearing) is given a new line number in the transcript.

5. You would need a recording system that permits you to constantly rewind very short sections of speech. Researchers have traditionally used transcription machines although computer programmes can now do the job quite well.

John:

Used on the left hand side of the transcript indicates the speaker.

?:

Indicates the speaker is unknown to the transcriber; ?John: indicates a guess that the speaker is John.

(0.5)

Indicates a time interval in seconds.

(.)

Indicates a pause of less than 0.2 seconds.

=

‘Latching’ – the point where one person’s speech ends and continues on a new line without pause, usually after a fragment from their conversation partner, e.g.,
A: I got the train tickets for Sunday  and I -  =
B:                                                     [Sun::↑day!]
A: = yeah (.) you said Sunday didn’t you

[   ]

Brackets used where there is overlapping talk as in the example above where speaker B overlaps speaker A.

 hh

Speaker breathes out – more hs means a longer breath.

.hh

Speaker breathes in – more hs, longer intake of breath.

((   ))

Double brackets can be used to describe a non-verbal sound (such as a kettle boiling) or something else in the context which the transcriber would like to convey (e.g., that a person is frowning whilst speaking). 

-

Indicates the last word was suddenly cut off as in the example above, first line.

:

The previous sound has been stretched; more colons, greater stretching, as in the example above after ‘Sun’.

!

Indicates emphasis – energised speech. 

(  )

The passage of speech between the brackets is unclear. The distance between the brackets indicates the estimated length of the unclear piece. If there is speech between the brackets it is the transcriber’s best ‘guess’ at what was said.

 .

Fall in tone indicating a stop.

,

‘Continuing’ intonation.

?

Rising inflection, as in (but not exclusively) a question.

↑ ↓

Indicate marked rising or falling of tone – placed before the rise/fall.

a:

Less marked fall in pitch, as in – B: ‘whaddya gonna do: there’.

a:

Less marked rise in pitch (underlined colon).

Underline

Emphasis on the underlined section.

SHOUT

Indicates the word in capitals was louder than those around it.

°        °

Speech between is noticeably softer than surrounding speech.

>     <

Speech between is quicker than the rest.

<     >

Speech between is slowed down.

Largely based on Hutchby and Wooffitt (1998), p. vi

A few of the symbols can be seen in this conversation sequence.

1  A: gmornin:. j’sleep OK
2  B: huh
3  A: was that a yes
4  B: hhh° yer °
5  A: HALL↓O:: (0.5) are we awake.
6  B: <just about>
7  A: >look we really need to talk [about the holiday<
8  B:                                              [I know (.) the holiday

If you’d like to get practice specifically with conversation analysis or at least with this kind of transcription you could visit Charles Antaki’s Conversation Analysis website at:

www.homepages.lboro.ac.uk/~ssca1/sitemenu.htm

where you can go through a script and look at various stages of transcription ending with a version using the notation system above.

Chapter 12: Appendix

Download Chapter 12 Appendix (PDF 132KB)

Sod’s law – or Murphy’s law as the Americans more delicately put it

A discussion of Sod’s law – a BBC spoof documentary about testing the notion that toast always falls butter side down and other issues.

Do you ever get the feeling that fate has it in for you? At the supermarket, for instance, do you always pick the wrong queue, the one looking shorter but which contains someone with 5 un-priced items and several redemption coupons or with the checkout clerk about to take a tea break? Do you take the outside lane only to find there’s a hidden right-turner? Sod’s law (known as Murphy’s law in the US), in its simplest form, states that whatever can go wrong, will. Have you ever returned an item to a shop, or taken a car to the garage with a problem, only to find it working perfectly for the assistant? This is Sod’s law working in reverse but still against you. A colleague of mine holds the extension of Sod’s law that things will go wrong even if they can’t. An amusing QED (BBC) TV programme (Murphy’s Law, 1991)¹ tested this aspect of subjective probability. The particular hypothesis, following from the law, concerned that celebrated kitchen occurrence whereby toast always falls butter side down – doesn’t it? First attempts engaged a university physics professor in developing machines for tossing the toast without bias. These included modified toasters and an electric typewriter. Results from this were not encouraging. The null hypothesis doggedly survived, buttered sides not making significantly more contact with the floor than unbuttered sides. It was decided that the human element was missing. Sod’s law might only work for human toast droppers.

The attempt at more naturalistic simulation was made using students and a stately home now belonging to the University of Newcastle. Benches and tables were laid out in the grounds and dozens of students were asked to butter one side of a slice of bread and then throw it in a specially trained fashion to avoid toss bias. In a cunning variation of the experiment, a new independent variable was introduced. Students were asked to pull out their slice of bread and, just before they were about to butter a side, to change their decision and butter the other side instead. This should produce a bias away from butter on grass if the side that will hit the floor is decided by fate early on in the buttering process. Sadly, neither this nor the first experiment produced verification of Sod’s law. In both cases 148 slices fell one way and 152 the other – first in favour of Murphy’s law then against it. Now the scientists had one of those flashes of creative insight. A corollary of Sod’s law is that when things go wrong (as they surely will – general rule) they will go wrong in the worst possible manner. The researchers now placed expensive carpet over the lawn. Surely this would tempt fate into a reaction? Do things fall butter side down more often on the living room carpet (I’m sure they do!)? I’m afraid this was the extent of the research. Frequencies were yet again at chance level, 146 buttered side down, 154 up.
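For anyone who wants to see what ‘at chance level’ means in test terms, here is a quick sketch (not part of the original programme, and assuming a reasonably recent version of SciPy) of a two-tailed binomial test on the toast counts quoted above.

```python
# A rough check of the QED toast counts against chance (p = 0.5).
# Counts are those quoted above: 148 of 300 butter side down on grass, 146 of 300 on carpet.
from scipy.stats import binomtest

for butter_down, total in [(148, 300), (146, 300)]:
    result = binomtest(butter_down, n=total, p=0.5, alternative='two-sided')
    print(f"{butter_down}/{total} butter side down: p = {result.pvalue:.2f}")
```

Both p values come out far above .05, which is just another way of saying that the null hypothesis survived.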

Murphy, it turned out, was a United States services officer involved in testing for space flight, sending servicemen on a horizontally jet-propelled chair across a mid-Western desert to produce many Gs of force. I’m still not convinced about his law. Psychologists suggest the explanation might lie in selective memory – we tend to remember the annoying incidents and ignore all the unremarkable dry-side-down landings or whizzes through the supermarket tills. But I still see the looks on customers’ faces as they wait patiently – they seem to know something about my queue …


¹ Sadly no longer available except via the British Film Institute.

The sociologist’s chip shop

An attempt to exemplify the concepts of the null hypothesis and significance in an everyday homely tale of chips.

Imagine one lunchtime you visit the local fish and chip emporium near the college and get into conversation with the chippy. At one point she asks you: ‘You’re from the college then? What do you study?’. Upon your reply she makes a rasping sound in her throat and snaps back. ‘Psychology?!!! Yeughhh!!!  All that individualist, positivistic crap, unethical manipulation of human beings, nonsensical reductionism rendering continuous human action into pseudo-scientific behavioural elements. What a load of old cobblers! Give me sociology any day. Post-Marxist-Leninist socialism, symbolic interactionism, real life qualitative participative research and a good dollop of post-modern deconstructionism’. You begin to suspect she may not be entirely fond of psychology as an academic subject. You meekly take your bag of chips and proceed outside only to find that your bag contains far too many short chips, whilst your sociology friends all have healthy long ones.

We must at this point stretch fantasy a little further by assuming that this story is set in an age where, post-salmonella, BSE and genetically modified food, short chips are the latest health scare; long chips are seen as far healthier since they contain less fat overall (thanks to my students for this idea).

Being a well-trained, empirically based psychology student, you decide to design a test of the general theory that the chippy is biased in serving chips to psychology and sociology students. You engage the help of a pair of identical twins and send them simultaneously, identically clothed, into the chip shop to purchase a single bag of chips. One twin wears a large badge saying ‘I like psychology’ whilst the other twin wears an identical badge, apart from the replacement of ‘psychology’ with ‘sociology’. (OK! OK! I spotted the problem too! Which twin should go first? Those bothered about this can devise some sort of counterbalanced design – see Chapter 3 – but for now I really don’t want to distract from the point of this example). Just as you had suspected, without a word being spoken by the twins beyond their simple request, the sociology twin has far longer chips in her bag than does the psychology twin!

Now, we only have the two samples of chips to work with. We cannot see what goes on behind the chippy’s stainless steel counter. We have to entertain two possibilities. Either the chippy drew the two samples (fairly) from one big chip bin (H0) or the bags were filled from two separate chip bins, one with smaller chips overall and therefore with a smaller mean chip length than the other bin (H1). You now need to do some calculations to estimate the probability of getting such a large difference between samples if the bags were filled from the same bin (i.e., if the null hypothesis is true). If the probability is very low you might march back into the shop and demand redress (hence you have rejected H0!). If the probability is quite high – two bags from the same bin are often this different – you do not have a case. You must retain the null hypothesis.

In this example, our research prediction would be that the sociology student will receive longer chips than the psychology student. Our alternative hypothesis is that the psychology and sociology chip population means are different; the null hypothesis that the population means are the same (i.e., the samples were drawn from the same population).
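If you would like to see the one-bin logic in action, here is a small simulation sketch. Everything in it – the chip lengths, the bag sizes, the 12 mm observed difference – is invented purely for illustration; it simply estimates how often two bags drawn from the same bin would differ by as much as the twins’ bags did.

```python
# Hypothetical illustration of the chip-shop logic: if H0 is true (one bin),
# how often would two bags differ by as much as the bags we actually observed?
# All of the numbers below are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
one_bin = rng.normal(loc=60, scale=15, size=10_000)  # one big bin: mean chip length 60 mm, SD 15 mm

observed_difference = 12      # suppose the sociology bag's chips averaged 12 mm longer
bag_size = 30                 # chips per bag
n_simulations = 10_000

count_as_extreme = 0
for _ in range(n_simulations):
    bag_psych = rng.choice(one_bin, size=bag_size)   # psychology twin's bag
    bag_soc = rng.choice(one_bin, size=bag_size)     # sociology twin's bag
    if abs(bag_soc.mean() - bag_psych.mean()) >= observed_difference:
        count_as_extreme += 1

print(f"p(difference this large | same bin) = {count_as_extreme / n_simulations:.4f}")
```

With these made-up figures the probability comes out very low, which is the situation in which you would march back in and reject H0; with a smaller observed difference it would come out high and you would have to retain the null hypothesis.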

Please, sir, may we use a one-tailed test, sir?

A discussion of the arguments for and against the use of one-tailed tests in statistical analysis in psychology. 

It is hard to imagine statisticians having a heated and passionate debate about their subject matter. However, they’re scientists and of course they do. Odd, though, are the sorts of things they fall out over. Whether it is legitimate to do one-tailed tests in psychology on directional hypotheses is, believe it or not, one of these issues. Here are some views against the use of one-tailed tests on two-group psychological data.

A directional test requires that no rationale at all should exist for any systematic difference in the opposite direction, so there are very few situations indeed where a directional test is appropriate with psychological data consisting of two sets of scores.
MacRae, 1995

I recommend using a non-directional test to compare any two groups of scores … Questions about directional tests should never be asked in A level examinations.
MacRae, 1995

I say always do two-tailed tests and if you are worried about β, jack the sample size up a bit to offset the loss in power.
Bradley, 1983 (Cited in Howell, 1992)

And some arguments for the use of one-tailed tests are as follows:

To generate a theory about how the world works that implies an expected direction of an effect, but then to hedge one’s bet by putting some (up to 1⁄2) of the rejection region in the tail other than that predicted by the theory,  strikes me as both scientifically dumb and slightly unethical … Theory generation and theory testing are much closer to the proper goal of science than truth searching, and running one-tailed tests is quite consistent with those goals.
Rodgers, 1986 (cited in Howell, 1992)

… it has been argued that there are few, if any, instances where the direction [of differences] is not of interest. At any rate, it is the opinion of this writer that directional tests should be used more frequently.
Ferguson and Takane, 1989

MacRae is saying that when we conduct a one-tailed test, any result in the non-predicted direction would have to be seen as a chance outcome since the null hypothesis for directional tests covers all that the alternative hypothesis does not. If the alternative hypothesis says the population mean is larger than 40 (say) then the null hypothesis is that the population mean is 40 or less. To justify use of a one-tailed test, you must, in a sense, be honestly and entirely uninterested in an effect in the opposite direction. A textbook example (one taken from a pure statistics book, not a statistics-for-social-science textbook) would be where a government agency is checking on a company to see that it meets its claim to include a minimum amount of (costly) vitamin X in its product. It predicts and tests for variations below the minimum. Variations above are not of interest and almost certainly are relatively small and rare, given the industry’s economic interests. A possibly equivalent psychological example could be where a therapist deals with severely depressed patients who score very much up the top end of a depression scale. As a result of therapy a decline in depression is predicted. Variations towards greater depression are almost meaningless since, after a measurement of serious depression, the idea of becoming even more depressed is unmeasurable and perhaps unobservable.
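To see what the choice amounts to in practice, here is a minimal sketch with invented two-group data (nothing to do with any real study), using SciPy’s independent-samples t test. When the difference falls in the predicted direction, the one-tailed p is simply half the two-tailed p.

```python
# Invented two-group scores, purely to show how the one-/two-tailed choice changes p.
from scipy.stats import ttest_ind

control = [14, 11, 13, 12, 15, 10, 13, 12]
treatment = [10, 9, 12, 8, 11, 10, 9, 7]

two_tailed = ttest_ind(treatment, control, alternative='two-sided')
one_tailed = ttest_ind(treatment, control, alternative='less')  # prediction: treatment scores lower

print(f"two-tailed p = {two_tailed.pvalue:.4f}")
print(f"one-tailed p = {one_tailed.pvalue:.4f}")  # half the two-tailed value when the effect is as predicted
```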

Rodgers, however, says what most people feel when they conduct psychological projects. Why on earth should I check the other way when the theory and past research so clearly point in this one direction? In a sense, all MacRae and Bradley are asking is that we operate with greater surety and always use the 2.5% level rather than the 5% level. If we’ve predicted a result, from closely argued theory, that goes in one direction, then use two-tailed values and find we are significant in the opposite direction, we’re hardly likely to jump about saying ‘Eureka! It’s not what I wanted but it’s significant!’ Probably we will still walk away glumly, as for a failure to reach significance, saying ‘What went wrong then?’ It will still feel like ‘failure’. If we had a point to make we haven’t made it, so we’re hardly likely to rush off to publish now. Our theoretical argument, producing our hypothesis, would look silly (though it may be possible to attempt an explanation of the unexpected result).

During this argument it always strikes me as bizarre that textbooks talk as if researchers really do stick rigidly to a hypothesis testing order: think through theory, make a specific prediction, set alpha, decide on one- or two-tailed test, find out what the probability is, make significance decision. The real order of events is a whole lot more disjointed than that. During research, many results are inspected and jiggled with. Participants are added to increase N. Some results are simply discarded.

Researchers usually know what all the probability values are, however, before they come to tackle the niggling problem of whether it would be advisable to offer a one- or two-tailed analysis in their proposed research article. When the one-tailed test decision is made is a rather arbitrary matter. In some circles and at some times it depends on the received view of what is correct. In others it depends on the actual theory (as it should) and in others it will depend on who, specifically, is on the panel reviewing submitted articles.

So what would happen, realistically speaking, if a researcher or research team obtained an opposite but highly ‘significant’ result, having made a directional prediction? In reality I’m sure that if such a reversal did in fact occur, the research team would sit back and say ‘Hmm! That’s interesting!’ They’re not likely to walk away from such an apparently strong effect, even though it initially contradicts their theorising. The early research on social facilitation was littered with results that went first one way (audiences make you perform better) then the other (no, they don’t; they make performance worse). Theories and research findings rarely follow the pure and simple ideal. It is rare in psychology for a researcher to find one contrary result and say ‘Oh well. That blows my whole theory apart. Back to the drawing board. What shall I turn my hand to today then?’ The result would slot into a whole range of findings and a research team with this dilemma might start to re-assess their method, look at possible confounding variables in their design and even consider some re-organisation of their theory in order to incorporate the effect.

It is important to recognise the usefulness of this kind of result. Far from leaving the opposite direction result as a ‘chance event’, the greater likelihood is that this finding will be investigated further. A replication of the effect, using a large enough sample to get p ≤ .01, would be of enormous interest if it clearly contradicts theoretical predictions – see what the book says about the 1% level.

So should you do one-tailed tests? This is clearly not a question I’m going to answer, since it really does depend upon so many things and is clearly an issue over which the experts can lose friends. I can only ever recall one research article that used a one-tailed test and the reality is that you would be unlikely to get published if you used them, or at least you would be asked to make corrections. Personally though, in project work, I can see no great tragedy lying in wait for those who do use one-tailed tests so long as they are conscientious, honest and professional in their overall approach to research, science and publishing. As a student, however, you should just pay attention to the following things:

  • follow the universally accepted ‘rules’ given in the main text;
  • be aware that this is a debate, and be prepared for varying opinions around it;
  • try to test enough participants (as Bradley advises), pilot your design and tighten it, so that you are likely to obtain significance at p ≤ .01, let alone .05!
  • the issue of one- versus two-tailed tests mostly disappears once we leave simple two-condition tests behind; in ANOVA designs there is no such choice to make.

For references, please see the textbook.

Calculating effect sizes and power in a 2-way ANOVA

In Chapter 21 of the book we calculated a two-way ANOVA on the data that are provided here in Exercise 1. The book tells you that you can obtain effect size and power using SPSS or G*Power, but that the by-hand calculations are rather complex. For the sake of completeness, though, I will give the details here.

Remember we are dealing with an experiment where participants consume either strong coffee, decaffeinated coffee or nothing. These conditions are referred to as caffeine, decaff and none. Two groups of participants are tested, those who have just had five hours’ sleep and those who have been awake for a full 24 hours. We therefore have a 2 (sleep) x 3 (caffeine) design. The results appear in Table 21.2 of the book:

Table 21.2: Driving skill scores by caffeine and sleep conditions. (The full data table of skill scores is in the book; the cell means, marginal means and grand mean from it are quoted in the calculations below.)

* Overlong decimal figures are used here in order that our figures come close to those given by SPSS. With sensible rounding, our ANOVA results would differ more from the SPSS result than they do.

Main effects

The general rules for calculating effect size and power for main effects in a two-way design are as follows, where we will refer to one factor with the general term A and the other with the term B.
For factor A, the effect size is f = √(Σαi² / (a × MSE)), where αi is the difference between the grand mean and each mean for factor A, ignoring any difference across factor B. Ideally these means would be the true population means, if known, but in calculating power after an experiment we use the sample means and assume these are good estimates of the population means. For our caffeine/sleep study, then, the grand mean is 4.48 and the αi values are the differences between 4.48 and each of the caffeine condition means, which are 5.13 (caffeine), 4.25 (decaff) and 4.06 (none). Each of these differences is squared, the results are added, and this total (the top of the equation) is divided by a, the number of levels of A (three in the caffeine case), multiplied by the mean square for error (MSE) from the ANOVA calculation.

Calculating for the caffeine effect (using the fuller decimals 5.125, 4.25 and 4.0625, and MSE = 2.074) we get:
f = √[((5.125 – 4.48)² + (4.25 – 4.48)² + (4.0625 – 4.48)²) / (3 × 2.074)] = √(0.643 / 6.222) = 0.321

Our effect size, also referred to (by Cohen, 1988) as f, is 0.321. If we want to consult tables then we now need Φ, which is f√n in general terms, but with a factorial ANOVA we substitute n' for n, where n' = (dfeffect + dferror + 1) / (dfeffect + 1). Our df are broken down like this:
Total df = N – 1 = 47
Main effect (caffeine) df = 2
Main effect (sleep) df = 1
Interaction df = 2 x 1 = 2
Error df = 47 – 2 – 1 – 2 = 42

Hence n' = (2 + 42 + 1) / (2 + 1) = 15 and Φ = 0.321 × √15 = 1.24

We go to appendix Table 13 with Φ = 1.24, df1 = 2, dfe = 42 and α = 0.05. With a bit of extrapolation we find a β of around 0.52 and hence power of around 0.48 (don't forget that power = 1 – β). This is close enough to SPSS and G*Power, which both agree on 0.471.

As explained in the book, SPSS provides effect size and power if you select these options before your analysis. In G*Power select F tests, then ANOVA: Fixed effects, special, main effects and interactions, then Post hoc: Compute achieved power – given α, sample size and effect size. The values to enter are 0.321 for effect size, .05 for α err probability, 48 for total sample size, 2 for numerator df and 3 for number of groups.
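If you have Python rather than SPSS to hand, the sketch below simply reruns the arithmetic above and then estimates power from the noncentral F distribution. The noncentrality convention used (λ = f² × N, with the error df of the full design) is, as far as I can tell, the one SPSS and G*Power work with, so the result should land close to the 0.471 quoted above – but treat it as a check under those assumptions rather than a definitive recipe.

```python
# Sketch of the caffeine main-effect calculations above, plus a power estimate.
# All figures (means, MSE, df) are those quoted in the text.
import math
from scipy.stats import f as f_dist, ncf

grand_mean = 4.48
caffeine_means = [5.125, 4.25, 4.0625]   # caffeine, decaff, none
mse = 2.074
a, b, N = 3, 2, 48                       # 3 caffeine levels, 2 sleep levels, 48 participants

# Cohen's f for the caffeine main effect
sum_alpha_sq = sum((m - grand_mean) ** 2 for m in caffeine_means)
f_caffeine = math.sqrt(sum_alpha_sq / (a * mse))              # approx. 0.321

# n' and phi, for use with the Pearson-Hartley power charts
df_effect = a - 1                                             # 2
df_error = N - (a - 1) - (b - 1) - (a - 1) * (b - 1) - 1      # 42
n_prime = (df_effect + df_error + 1) / (df_effect + 1)        # 15
phi = f_caffeine * math.sqrt(n_prime)                         # approx. 1.24

# Power via the noncentral F distribution (assumed convention: lambda = f^2 * N)
lam = f_caffeine ** 2 * N
f_crit = f_dist.ppf(0.95, df_effect, df_error)                # critical F at alpha = .05
power = 1 - ncf.cdf(f_crit, df_effect, df_error, lam)

print(f"f = {f_caffeine:.3f}, phi = {phi:.2f}, power = {power:.3f}")
```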

Interaction effects
For the interaction things are a bit tricky. The formula is:
f = √[Σ(Mcell – Mcaffeine – Msleep + GM)² / (a × b × MSE)]
The quantity inside the brackets is calculated for each individual cell of the data table: Mcell is the cell mean (5 hours/caffeine, 6.25, if working from the top left of Table 21.2), Mcaffeine is the mean of the caffeine condition which that cell is in (caffeine, 5.125, in this case), Msleep is the mean for the sleep condition of that cell (5 hours, 4.7917) and GM is the grand mean (4.48). Each of these six values (one for each of the cells in the table) is squared and the results are added together. This sum is divided by a × b × MSE, that is, the number of levels of each factor multiplied together, times the mean square error (2 × 3 × MSE here), before taking the square root.

Put perhaps more simply: for each cell you subtract from the cell mean the mean of the row it is in and the mean of the column it is in, and then add the grand mean. Square each of these results, add them up, and divide the total by a × b × MSE. Let's do this now. The calculation may look horrific but if you stick to the rule just stated you should be able to follow each step:

The top line of the fraction inside the square root sign is:
(6.25 – 5.125 – 4.7917 + 4.48)2 + (4 – 5.125 – 4.1667 + 4.48)2 + (4.125 – 4.25 – 4.7917 + 4.48)2 +
(4.375 – 4.25 – 4.1667 + 4.48)2 + (4 – 4.0625 – 4.7917 + 4.48)2 + (4.125 – 4.0625 – 4.1667 + 4.48)2

Which comes to 1.983!

Dividing this by a × b × MSE we get: 1.983/(2 × 3 × 2.074) = 0.159

The square root of 0.159 is 0.399. This is our value for Φ' (or f).

We need to calculate n' and we use the same formula as above, n' = (dfeffect + dferror + 1) / (dfeffect + 1), but here the effect df is that for the interaction, dfcaffeine × dfsleep, which is 2 × 1 = 2. Hence n' = (2 + 42 + 1) / (2 + 1) = 15 again.

Φ then is Φ' x √n' = 0.399 x √15 = 1.54

Going to appendix Table 13 with Φ = 1.54, df for the interaction being 2, dferror = 42 and α = 0.05, we need (again) to do some extrapolation between the table values, but I get a table value for β of around 0.35 and, remembering that power = 1 – β, this gives power of 0.65, which is pretty close to the SPSS and G*Power values of 0.662. To use G*Power proceed exactly as given above but enter 0.399 for the effect size and 6 for the number of groups.
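And here is the corresponding sketch for the interaction, again using only the means, MSE and df quoted above, and the same assumed λ = f² × N convention; it should come out near the 0.662 figure.

```python
# Sketch of the interaction calculations above, using the cell and marginal means quoted in the text.
import math
from scipy.stats import f as f_dist, ncf

grand_mean, mse = 4.48, 2.074
sleep_means = {'5 hours': 4.7917, '24 hours': 4.1667}
caffeine_means = {'caffeine': 5.125, 'decaff': 4.25, 'none': 4.0625}
cell_means = {('5 hours', 'caffeine'): 6.25,  ('24 hours', 'caffeine'): 4.0,
              ('5 hours', 'decaff'):   4.125, ('24 hours', 'decaff'):   4.375,
              ('5 hours', 'none'):     4.0,   ('24 hours', 'none'):     4.125}
a, b, N = 3, 2, 48

# Interaction residuals: cell mean minus row mean minus column mean plus grand mean
top = sum((m - sleep_means[s] - caffeine_means[c] + grand_mean) ** 2
          for (s, c), m in cell_means.items())                # approx. 1.983

f_interaction = math.sqrt(top / (a * b * mse))                # approx. 0.399

df_effect = (a - 1) * (b - 1)                                 # 2
df_error = N - (a - 1) - (b - 1) - df_effect - 1              # 42
n_prime = (df_effect + df_error + 1) / (df_effect + 1)        # 15
phi = f_interaction * math.sqrt(n_prime)                      # approx. 1.54

# Power via the noncentral F (same assumed lambda = f^2 * N convention)
lam = f_interaction ** 2 * N
power = 1 - ncf.cdf(f_dist.ppf(0.95, df_effect, df_error), df_effect, df_error, lam)
print(f"f = {f_interaction:.3f}, phi = {phi:.2f}, power = {power:.3f}")
```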