SPSS Explained

A

ANOVA An acronym for the ANalysis Of VAriance. By analysing the variance in the data due to different sources (e.g. an independent variable or error) we can decide if our experimental manipulation is influencing the scores in the data.
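
As an illustration (the notation is added here and is not part of the original entry), a one-way ANOVA compares the variance due to the independent variable with the variance due to error by forming an F ratio:

F = \frac{MS_{\text{between}}}{MS_{\text{error}}}

where each mean square (MS) is a sum of squares divided by its degrees of freedom; a large F suggests that the experimental manipulation is influencing the scores.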

Asymp. Sig. (asymptotic significance) An estimate of the probability of a test statistic, reported by statistical analysis programs for many nonparametric tests. It is used when the exact probability cannot be worked out quickly.

B

beta weight The predicted change in the dependent variable, in standard deviation units, when the independent variable increases by one standard deviation (all other independent variables are held constant).
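
As a sketch of the arithmetic (symbols added here for illustration), a beta weight can be obtained from the unstandardised regression coefficient b by rescaling with the standard deviations of the predictor (s_x) and the dependent variable (s_y):

\beta = b \times \frac{s_x}{s_y}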

between subjects Also known as independent measures. In this design, the samples we select for each condition of the independent variable are independent, as a member of one sample is not a member of another sample.

bootstrapping A sample is used to estimate a population. New bootstrap samples are randomly selected from the original sample with replacement (so an item can be selected more than once). The bootstrap samples, often 1,000 or more, are then used to estimate the population sampling distribution.
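
A minimal sketch of the resampling idea in Python (the function name and example scores are invented for illustration; this is not SPSS's own procedure):

```python
import random
import statistics

def bootstrap_means(sample, n_boot=1000, seed=1):
    """Draw n_boot bootstrap samples (same size as `sample`, selected
    with replacement) and return the mean of each one. The collection
    of means approximates the sampling distribution of the mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        resample = [rng.choice(sample) for _ in sample]  # with replacement
        means.append(statistics.mean(resample))
    return means

# Example: a rough 95 per cent bootstrap confidence interval for the mean
scores = [12, 15, 9, 20, 17, 14, 11, 18]
boot = sorted(bootstrap_means(scores))
lower = boot[int(0.025 * len(boot))]
upper = boot[int(0.975 * len(boot)) - 1]
print(f"95% bootstrap CI for the mean: {lower:.2f} to {upper:.2f}")
```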

C

case A row in the Data Editor file; the data collected from a single participant.

Chart Editor The feature in SPSS that allows the editing of charts and graphs.

comparisons The results of a statistical test with more than two conditions will often show a significant result but not where the difference lies. We need to undertake a comparison of conditions to see which ones are causing the effect. If we compare them two at a time this is known as pairwise comparison and if we perform unplanned comparisons after discovering the significant finding these are referred to as post hoc comparisons.

component The term used in the principal components method of factor analysis for a potential underlying factor.

condition A researcher chooses levels or categories of the independent variable(s) to observe the effect on the dependent variable(s). These are referred to as conditions, levels, treatments or groups. For example, ‘morning’ and ‘afternoon’ might be chosen as the conditions for the independent variable of time of day.

confidence interval In statistics we use samples to estimate population values, such as the mean or the difference between means. The confidence interval provides a range of values within which we predict the population value lies (to a certain level of confidence). A 95 per cent confidence interval of the mean worked out from a sample indicates that, if we repeatedly drew samples of the same size and calculated the interval each time, 95 per cent of these intervals would contain the population mean.
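
For example (notation added here, not in the original entry), a 95 per cent confidence interval for a population mean, estimated from a sample mean \bar{x} and standard deviation s based on n scores, is:

\bar{x} \pm t_{(n-1)} \times \frac{s}{\sqrt{n}}

where t_{(n-1)} is the two-tailed critical t value for the chosen level of confidence (roughly 1.96 for 95 per cent with large samples).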

contrasts With a number of conditions in a study we may plan a set of comparisons, such as contrasting each condition with a control condition. These planned comparisons are referred to as contrasts. We can plan complex contrasts – for example, the effects of conditions 1 and 2 against condition 3.

correlation The degree to which the scores on two (or more) variables co-relate. That is, the extent to which a variation in the scores on one variable results in a corresponding variation in the scores on a second variable. Usually the relationship we are looking for is linear. A multiple correlation examines the relationship between a combination of predictor variables and a dependent variable.
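
As a reminder of the arithmetic (symbols added here for illustration), the Pearson correlation coefficient for two variables x and y measured on n cases is:

r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}

giving a value between -1 (a perfect negative linear relationship) and +1 (a perfect positive linear relationship).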

critical value We reject the null hypothesis after a statistical test if the probability of the calculated value of the test statistic (under the null hypothesis) is lower than the significance level (e.g. .05). Computer programs print out the probability of the calculated value (e.g. .023765) and we can examine this to see if it is higher or lower than the significance level. Textbooks print tables of the critical values of the test statistic, which are the values of the statistic at a particular probability. For example, if the calculated value of a statistic (e.g. a t value) is 4.20 and the critical value is 2.31 (at the .05 level of significance), then clearly the probability of the test statistic is less than .05.

crosstabulation  Frequency data can be represented in a table with the rows as the conditions of one variable and the columns as the conditions of a second variable. This is a crosstabulation. We can include more variables by adding ‘layers’ to the crosstabulation in SPSS.

D

Data Editor  The feature in SPSS where data is entered. Saving the information from the Data Editor will produce an SPSS .sav file. There are two windows within the Data Editor: Data View and Variable View.

Data View  The Data View window within the Data Editor presents a spreadsheet style format for entering all the data points.

degrees of freedom  When calculating a statistic we use information from the data (such as the mean or total) in the calculation. The degrees of freedom is the number of scores we need to know before we can work out the rest using the information we already have. It is the number of scores that are free to vary in the analysis.

dependent variable  The variable measured by the researcher and predicted to be influenced by (that is, depend on) the independent variable.

descriptive statistics  Usually we wish to describe our data before conducting further analysis or comparisons. Descriptive statistics such as the mean and standard deviation enable us to summarise a dataset.

discriminant function A function derived from a set of independent (or predictor) variables that can be used to discriminate between the conditions of a dependent variable.

distribution The range of possible scores on a variable and their frequency of occurrence. In statistical terms we refer to a distribution as a ‘probability density function’. We use the mathematical formulae for known distributions to work out the probability of finding a score as high as or as low as a particular score.

E

effect size The size of the difference between the means of two populations, in terms of standard deviation units.
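
One common measure (not named in the original entry) is Cohen's d:

d = \frac{\mu_1 - \mu_2}{\sigma}

the difference between the two population means divided by the (common) population standard deviation.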

eigenvalue In a factor analysis an eigenvalue provides a measure of the amount of variance that can be explained by a proposed factor. If a factor has an eigenvalue of 1, it can explain as much variance as one of the original independent variables.

equality of variance See homogeneity of variance.

F

factor Another name for ‘variable’, used commonly in the analysis of variance to refer to an independent variable. In factor analysis we analyse the variation in the data to see if it can be explained by fewer factors (i.e. ‘new’ variables) than the original number of independent variables.

G

general linear model The underlying mathematical model employed in parametric statistics. When there are only two variables, X and Y, the relationship between them is linear when they satisfy the formula Y = a + bX (where a and b are constants). The general linear model is a general form of this equation allowing as many X and Y variables as we wish in our analysis.
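
In its general form (symbols added here for illustration), with one dependent variable Y predicted from k variables X_1 to X_k:

Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e

where the b values are constants (b_0 corresponding to a in the simple formula above) and e is the error term.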

grouping variable In analysing data in SPSS we can employ an independent measures independent variable as a grouping variable. This separates our participants into groups (such as introverts versus extroverts). It is important when inputting data into a statistical analysis program that we include the grouping variable as a column, with each group defined (i.e. introvert as ‘1’ and extrovert as ‘2’). We can then analyse the scores on other variables in terms of these groups, such as comparing the introverts with the extroverts on, say, a monitoring task.

H

homogeneity of variance Underlying parametric tests is the assumption that the populations from which the samples are drawn have the same variance. We can examine the variances of the samples in our data to see whether this assumption is appropriate with our data or not.

homoscedasticity The scores in a scatterplot are evenly distributed along and about a regression line. This is an assumption made in linear correlation. (This is the correlation and regression equivalent of the homogeneity of variance assumption.)

hypothesis A predicted relationship between variables. For example: ‘As sleep loss increases so the number of errors on a specific monitoring task will increase.’

I

illustrative statistics Statistics that illustrate rather than analyse a set of data, such as the total number of errors made on a reading task. Often we illustrate a dataset by means of a graph or a table.

independent or independent measures A term used to indicate that there are different subjects (participants) in each condition of an independent variable; also known as ‘between subjects’.

independent variable A variable chosen by the researcher for testing, predicted to influence the dependent variable.

inferential statistics Statistics that allow us to make inferences about the data – for example, whether samples are drawn from different populations or whether two variables correlate.

interaction When there are two or more factors in an analysis of variance, we can examine the interactions between the factors. An interaction indicates that the effect of one factor is not the same at each condition of another factor. For example, if we find that more cold drinks are sold in summer and more hot drinks sold in winter, we have an interaction of ‘drink temperature’ and ‘time of year’.

intercept A linear regression finds the best fit linear relationship between two variables. This is a straight line based on the formula Y = a + bX, where b is the slope of the line and a is the intercept, or point where the line crosses the Y-axis. (In the SPSS output for an ANOVA the term ‘intercept’ is used to refer to the overall mean value and its difference from zero.)

item When we employ a test with a number of variables (such as questions in a questionnaire) we refer to these variables as ‘items’, particularly in reliability analysis where we are interested in the correlation between items in the test.

J

none

K

kurtosis The degree to which a distribution differs from the bell-shaped normal distribution in terms of its peakedness. A sharper peak with narrow ‘shoulders’ is called leptokurtic and a flatter peak with wider ‘shoulders’ is called platykurtic.

L

levels of data Not all data are produced by using numbers in the same way. Sometimes we use numbers to name or allocate participants to categories (e.g. labelling a person as a liberal, and allocating them the number 1, or a conservative, and allocating them the number 2). In this case the data is termed ‘nominal’. Sometimes we employ numbers to rank order participants, in which case the data is termed ‘ordinal’. Finally, when the data is produced on a measuring scale with equal intervals, the data is termed ‘interval’ (or ‘ratio’ if the scale includes an absolute zero value). Parametric statistics require interval data for their analyses.

Likert scale A measuring scale where participants are asked to indicate their level of agreement or disagreement to a particular statement on, typically, a 5- or 7-point scale (from strongly agree to strongly disagree).

linear correlation The extent to which variables correlate in a linear manner. For two variables this is how close their scatterplot is to a straight line.

linear regression A regression that is assumed to follow a linear model. For two variables this is a straight line of best fit, which minimises the ‘error’.

M

main effect The effect of a factor (independent variable) on the dependent variable in an analysis of variance measured without regard to the other factors in the analysis. In an ANOVA with more than one independent variable we can examine the effects of each factor individually (termed the main effect) and the factors in combination (the interactions).

MANOVA A Multivariate Analysis of Variance. An analysis of variance technique where there can be more than one dependent variable in the analysis.

mean A measure of the ‘average’ score in a set of data. The mean is found by adding up all the scores and dividing by the number of scores.
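
In symbols (notation added here for illustration), for n scores:

\bar{x} = \frac{\sum x_i}{n}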

mean square A term used in the analysis of variance to refer to the variance in the data due to a particular source of variation.
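
In symbols (added here for illustration), a mean square is a sum of squares divided by its degrees of freedom:

MS = \frac{SS}{df}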

median If we order a set of data from lowest to highest, the median is the point that divides the scores into two, with half the scores below and half above the median.

mixed design A mixed design is one that includes both independent measures factors and repeated measures factors. For example, a group of men and a group of women are tested in the morning and the afternoon. In this test ‘gender’ is an independent measures variable (also known as ‘between subjects’) and time of day is a repeated measures factor (also known as ‘within subjects’), so we have a mixed design.

mode The score that has occurred the highest number of times in a set of data.

multiple correlation The correlation of one variable with a combination of other variables.

multivariate Literally, this means ‘many variables’ but is most commonly used to refer to a test with more than one dependent variable (as in the MANOVA).

N

nonparametric test Statistical tests that do not use, or make assumptions about, the characteristics (parameters) of populations.

normal distribution A bell-shaped frequency distribution that appears to underlie many human variables. The normal distribution can be worked out mathematically using the population mean and standard deviation.
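
For reference (formula added here, not in the original entry), the normal distribution with mean \mu and standard deviation \sigma has the probability density function:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}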

null hypothesis A prediction that there is no relationship between the independent and dependent variables.

O

one-tailed test A prediction that two samples come from different populations, specifying the direction of the difference – that is, which of the two populations will have the larger mean value.

outlier An extreme value in a scatterplot in that it lies outside the main cluster of scores. When calculating a linear correlation or regression, an outlier will have a disproportionate influence on the statistical calculations.

Output Navigator An SPSS navigation and editing system in an outline view in the left-hand column of the output window. This enables the user to hide or show output or to move items within the output screen.

P

p value The probability of a test statistic (assuming the null hypothesis to be true). If this value is very small (e.g. .02763), we reject the null hypothesis. We claim a significant effect if the p value is smaller than a conventional significance level (such as .05).

parameter A characteristic of a population, such as the population mean.

parametric tests Statistical tests that use the characteristics (parameters) of populations, or estimates of them, and that make assumptions about the populations under study.

partial correlation The correlation of two variables after having removed the effects of a third variable from both.

participant A person taking part as a ‘subject’ in a study. The term ‘participant’ is preferred to ‘subject’ as it acknowledges the person’s agency – i.e. that they have consented to take part in the study.

population A complete set of items or events. In statistics, this usually refers to the complete set of subjects or scores we are interested in, from which we have drawn a sample.

post hoc tests When we have more than two conditions of an independent variable, a statistical test (such as an ANOVA) may show a significant result but not the source of the effect. We can perform post hoc tests (literally, post hoc means ‘after this’) to see which conditions are showing significant differences. Post hoc tests should correct for the additional risk of Type I errors when performing multiple tests on the same data.

power of a test The probability that, when there is a genuine effect to be found, the test will find it (that is, correctly reject a false null hypothesis). As an illustration, one test might be like a stopwatch that gives the same time for two runners in a race but a more powerful test is like a sensitive electronic timer that more accurately shows the times to differ by a fiftieth of a second.

probability The chance of a specific event occurring from a set of possible events, expressed as a proportion. For example, if there were 4 women and 6 men in a room, the probability of meeting a woman first on entering the room is 4/10 or .4 as there are 4 women out of 10 people in the room. A probability of 0 indicates an event will never occur and a probability of 1 that it will always occur. In a room of only 10 men there is a probability of 0 (0/10) of meeting a woman first and a probability of 1 (10/10) of meeting a man.

Q

none

R

range The difference between the lowest score and the highest score.

rank When a set of data is ordered from lowest to highest, the rank of a score is its position in this order.

regression The prediction of scores on one variable by their scores on a second variable. The larger the correlation between the variables, the more accurate the prediction. We can undertake a multiple regression where the scores on one variable are predicted from the scores on a number of predictor variables.

reliability A reliable test is one that will produce the same result when repeated (in the same circumstances). We can investigate the reliability of the items in a test (such as the questions in a questionnaire) by examining the relationship between each item and the overall score on the test.

repeated measures A term used to indicate that the same subjects (participants) are providing data for all the conditions of an independent variable; also known as ‘within subjects’.

residual A residual is the difference between an actual score and a predicted score. If scores are predicted by a model (such as the normal distribution curve) then the residual will give a measure of how well the data fit the model.
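
In symbols (added here for illustration), for an observed score y_i and a predicted score \hat{y}_i:

e_i = y_i - \hat{y}_i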

S

Sig. (2-tailed) The exact probability of the test statistic for a two-tailed prediction. Sometimes an estimate (see Asymp. Sig. – asymptotic significance) is also included.

significance level The risk (probability) of erroneously claiming a relationship between an independent and a dependent variable when there is not one. Statistical tests are undertaken so that this probability is chosen to be small, usually set at .05 indicating that this will occur no more than 5 times in 100.

simple main effects A significant interaction in a two factor analysis of variance indicates that the effect of one variable is different at the various conditions of the other variable. Calculating simple main effects tells us what these different effects are. A simple main effect is the effect of one variable at a single condition of the other variable.

skew The degree of symmetry of a distribution. A symmetrical distribution, like the normal distribution, has a skew of zero. The skew is negative if the scores ‘pile’ to the right of the mean and positive if they pile to the left.

sphericity An assumption we make about the data in a repeated measures design. Not only must we assume homogeneity of variance but also homogeneity of covariance – that is, homogeneity of variance of the differences between samples. Essentially, we must assume the effect of an independent variable to be consistent across both conditions and subjects in these designs for the analysis to be appropriate.

standard deviation A measure of the standard (‘average’) difference (deviation) of a score from the mean in a set of scores. It is the square root of the variance. (There is a different calculation for standard deviation when the set of scores are a population as opposed to a sample.)
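
For a sample of n scores (symbols added here for illustration):

s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}

For a population, the sum of squared deviations is divided by N rather than n - 1.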

standard error of the estimate A measure of the ‘average’ distance (standard error) of a score from the regression line.

standard error of the mean The standard deviation of the distribution of sample means. It is a measure of the standard (‘average’) difference of a sample mean from the mean of all sample means of samples of the same size from the same population.
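
Estimated from a single sample (symbols added here for illustration), with sample standard deviation s and sample size n:

SE_{\bar{x}} = \frac{s}{\sqrt{n}}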

standard score The position of a score within a distribution of scores. It provides a measure of how many standard deviation units a specific score falls above or below the mean. It is also referred to as a z score.
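
In symbols (added here for illustration), for a score x in a distribution with mean \bar{x} and standard deviation s:

z = \frac{x - \bar{x}}{s}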

statistic Specifically, a characteristic of a sample, such as the sample mean. More generally, statistic and statistics are used to describe techniques for summarising and analysing numerical data.

Statistics Viewer The SPSS Statistics Viewer is the window that displays all of the output from the SPSS procedures (its contents can be saved as an SPSS output file). Often referred to (as in this book) as the Output Window.

subject The term used for the source of data in a sample. If people are the subjects of the study it is viewed as more respectful to refer to them as participants, which acknowledges their role as helpful contributors to the investigation.

sums of squares The sum of the squared deviations of scores from their mean value.
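
In symbols (added here for illustration):

SS = \sum (x_i - \bar{x})^2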

T

test statistic The calculated value of the statistical test that has been undertaken.

two-tailed test A prediction that two samples come from different populations, but not stating which population has the higher mean value.

Type I error The error of rejecting the null hypothesis when it is true. The risk of this occurring is set by the significance level.

Type II error The error of not rejecting the null hypothesis when it is false.

U

univariate A term used to refer to a statistical test where there is only one dependent variable. ANOVA is a univariate analysis as there can be more than one independent variable but only one dependent variable.

V

value labels Assigning value labels within the Variable View screen in SPSS ensures that the output is labelled appropriately when grouping variables are used – for example, 1 = males, 2 = females.

Variable View The screen within the SPSS Data Editor where the characteristics of variables are assigned.

variance A measure of how much a set of scores vary from their mean value. Variance is the square of the standard deviation.
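
For a sample of n scores (symbols added here for illustration):

s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}

For a population, the sum of squared deviations is divided by N rather than n - 1.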

W

within subjects Also known as repeated measures. We select the same subjects (participants) for each condition of an independent variable for a within subjects design.

X

none

Y

none

Z

z score See standard score.