Discovering Statistics Using SPSS (an introduction to statistical methods): an outstanding foreign-language book on statistical analysis

Published on: **Mar 4, 2016**

Published in: Education

Source: www.slideshare.net

- 1. ANDY FIELD, DISCOVERING STATISTICS USING SPSS, THIRD EDITION (and sex and drugs and rock ’n’ roll)
- 2. © Andy Field 2009 First edition published 2000 Second edition published 2005 Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers. SAGE Publications Ltd 1 Oliver’s Yard 55 City Road London EC1Y 1SP SAGE Publications Inc. 2455 Teller Road Thousand Oaks, California 91320 SAGE Publications India Pvt Ltd B 1/I 1 Mohan Cooperative Industrial Area Mathura Road New Delhi 110 044 SAGE Publications Asia-Pacific Pte Ltd 33 Pekin Street #02-01 Far East Square Singapore 048763 Library of Congress Control Number: 2008930166 British Library Cataloguing in Publication data A catalogue record for this book is available from the British Library ISBN 978-1-84787-906-6 ISBN 978-1-84787-907-3 Typeset by C&M Digitals (P) Ltd, Chennai, India Printed by Oriental Press, Dubai Printed on paper from sustainable resources
- 3. CONTENTS
  - Preface
  - How to use this book
  - Acknowledgements
  - Dedication
  - Symbols used in this book
  - Some maths revision
  - 1 Why is my evil lecturer forcing me to learn statistics?
  - 1.1. What will this chapter tell me? (level 1)
  - 1.2. What the hell am I doing here? I don’t belong here (level 1)
  - 1.2.1. The research process (level 1)
  - 1.3. Initial observation: finding something that needs explaining (level 1)
  - 1.4. Generating theories and testing them (level 1)
  - 1.5. Data collection 1: what to measure (level 1)
  - 1.5.1. Variables (level 1)
  - 1.5.2. Measurement error (level 1)
  - 1.5.3. Validity and reliability (level 1)
  - 1.6. Data collection 2: how to measure (level 1)
  - 1.6.1. Correlational research methods (level 1)
  - 1.6.2. Experimental research methods (level 1)
  - 1.6.3. Randomization (level 1)
  - 1.7. Analysing data (level 1)
  - 1.7.1. Frequency distributions (level 1)
  - 1.7.2. The centre of a distribution (level 1)
  - 1.7.3. The dispersion in a distribution (level 1)
  - 1.7.4. Using a frequency distribution to go beyond the data (level 1)
  - 1.7.5. Fitting statistical models to the data (level 1)
  - What have I discovered about statistics? (level 1)
  - Key terms that I’ve discovered
  - Smart Alex’s stats quiz
  - Further reading
  - Interesting real research
- 4. CONTENTS (continued)
  - 2 Everything you ever wanted to know about statistics (well, sort of)
  - 2.1. What will this chapter tell me? (level 1)
  - 2.2. Building statistical models (level 1)
  - 2.3. Populations and samples (level 1)
  - 2.4. Simple statistical models (level 1)
  - 2.4.1. The mean: a very simple statistical model (level 1)
  - 2.4.2. Assessing the fit of the mean: sums of squares, variance and standard deviations (level 1)
  - 2.4.3. Expressing the mean as a model (level 2)
  - 2.5. Going beyond the data (level 1)
  - 2.5.1. The standard error (level 1)
  - 2.5.2. Confidence intervals (level 2)
  - 2.6. Using statistical models to test research questions (level 1)
  - 2.6.1. Test statistics (level 1)
  - 2.6.2. One- and two-tailed tests (level 1)
  - 2.6.3. Type I and Type II errors (level 1)
  - 2.6.4. Effect sizes (level 2)
  - 2.6.5. Statistical power (level 2)
  - What have I discovered about statistics? (level 1)
  - Key terms that I’ve discovered
  - Smart Alex’s stats quiz
  - Further reading
  - Interesting real research
  - 3 The SPSS environment
  - 3.1. What will this chapter tell me? (level 1)
  - 3.2. Versions of SPSS (level 1)
  - 3.3. Getting started (level 1)
  - 3.4. The data editor (level 1)
  - 3.4.1. Entering data into the data editor (level 1)
  - 3.4.2. The ‘Variable View’ (level 1)
  - 3.4.3. Missing values (level 1)
  - 3.5. The SPSS viewer (level 1)
  - 3.6. The SPSS SmartViewer (level 1)
  - 3.7. The syntax window (level 3)
  - 3.8. Saving files (level 1)
  - 3.9. Retrieving a file (level 1)
  - What have I discovered about statistics? (level 1)
  - Key terms that I’ve discovered
  - Smart Alex’s tasks
  - Further reading
  - Online tutorials
  - 4 Exploring data with graphs
  - 4.1. What will this chapter tell me? (level 1)
  - 4.2. The art of presenting data (level 1)
  - 4.2.1. What makes a good graph? (level 1)
  - 4.2.2. Lies, damned lies, and … erm … graphs (level 1)
- 5. CONTENTS (continued)
  - 4.3. The SPSS Chart Builder (level 1)
  - 4.4. Histograms: a good way to spot obvious problems (level 1)
  - 4.5. Boxplots (box–whisker diagrams) (level 1)
  - 4.6. Graphing means: bar charts and error bars (level 1)
  - 4.6.1. Simple bar charts for independent means (level 1)
  - 4.6.2. Clustered bar charts for independent means (level 1)
  - 4.6.3. Simple bar charts for related means (level 1)
  - 4.6.4. Clustered bar charts for related means (level 1)
  - 4.6.5. Clustered bar charts for ‘mixed’ designs (level 1)
  - 4.7. Line charts (level 1)
  - 4.8. Graphing relationships: the scatterplot (level 1)
  - 4.8.1. Simple scatterplot (level 1)
  - 4.8.2. Grouped scatterplot (level 1)
  - 4.8.3. Simple and grouped 3-D scatterplots (level 1)
  - 4.8.4. Matrix scatterplot (level 1)
  - 4.8.5. Simple dot plot or density plot (level 1)
  - 4.8.6. Drop-line graph (level 1)
  - 4.9. Editing graphs (level 1)
  - What have I discovered about statistics? (level 1)
  - Key terms that I’ve discovered
  - Smart Alex’s tasks
  - Further reading
  - Online tutorial
  - Interesting real research
  - 5 Exploring assumptions
  - 5.1. What will this chapter tell me? (level 1)
  - 5.2. What are assumptions? (level 1)
  - 5.3. Assumptions of parametric data (level 1)
  - 5.4. The assumption of normality (level 1)
  - 5.4.1. Oh no, it’s that pesky frequency distribution again: checking normality visually (level 1)
  - 5.4.2. Quantifying normality with numbers (level 1)
  - 5.4.3. Exploring groups of data (level 1)
  - 5.5. Testing whether a distribution is normal (level 1)
  - 5.5.1. Doing the Kolmogorov–Smirnov test on SPSS (level 1)
  - 5.5.2. Output from the explore procedure (level 1)
  - 5.5.3. Reporting the K–S test (level 1)
  - 5.6. Testing for homogeneity of variance (level 1)
  - 5.6.1. Levene’s test (level 1)
  - 5.6.2. Reporting Levene’s test (level 1)
  - 5.7. Correcting problems in the data (level 2)
  - 5.7.1. Dealing with outliers (level 2)
  - 5.7.2. Dealing with non-normality and unequal variances (level 2)
  - 5.7.3. Transforming the data using SPSS (level 2)
  - 5.7.4. When it all goes horribly wrong (level 3)
  - What have I discovered about statistics? (level 1)
  - Key terms that I’ve discovered
  - Smart Alex’s tasks
  - Online tutorial
  - Further reading
- 6. CONTENTS (continued)
  - 6 Correlation
  - 6.1. What will this chapter tell me? (level 1)
  - 6.2. Looking at relationships (level 1)
  - 6.3. How do we measure relationships? (level 1)
  - 6.3.1. A detour into the murky world of covariance (level 1)
  - 6.3.2. Standardization and the correlation coefficient (level 1)
  - 6.3.3. The significance of the correlation coefficient (level 3)
  - 6.3.4. Confidence intervals for r (level 3)
  - 6.3.5. A word of warning about interpretation: causality (level 1)
  - 6.4. Data entry for correlation analysis using SPSS (level 1)
  - 6.5. Bivariate correlation (level 1)
  - 6.5.1. General procedure for running correlations on SPSS (level 1)
  - 6.5.2. Pearson’s correlation coefficient (level 1)
  - 6.5.3. Spearman’s correlation coefficient (level 1)
  - 6.5.4. Kendall’s tau (non-parametric) (level 1)
  - 6.5.5. Biserial and point–biserial correlations (level 3)
  - 6.6. Partial correlation (level 2)
  - 6.6.1. The theory behind part and partial correlation (level 2)
  - 6.6.2. Partial correlation using SPSS (level 2)
  - 6.6.3. Semi-partial (or part) correlations (level 2)
  - 6.7. Comparing correlations (level 3)
  - 6.7.1. Comparing independent rs (level 3)
  - 6.7.2. Comparing dependent rs (level 3)
  - 6.8. Calculating the effect size (level 1)
  - 6.9. How to report correlation coefficients (level 1)
  - What have I discovered about statistics? (level 1)
  - Key terms that I’ve discovered
  - Smart Alex’s tasks
  - Further reading
  - Online tutorial
  - Interesting real research
  - 7 Regression
  - 7.1. What will this chapter tell me? (level 1)
  - 7.2. An introduction to regression (level 1)
  - 7.2.1. Some important information about straight lines (level 1)
  - 7.2.2. The method of least squares (level 1)
  - 7.2.3. Assessing the goodness of fit: sums of squares, R and R² (level 1)
  - 7.2.4. Assessing individual predictors (level 1)
  - 7.3. Doing simple regression on SPSS (level 1)
  - 7.4. Interpreting a simple regression (level 1)
  - 7.4.1. Overall fit of the model (level 1)
  - 7.4.2. Model parameters (level 1)
  - 7.4.3. Using the model (level 1)
  - 7.5. Multiple regression: the basics (level 2)
  - 7.5.1. An example of a multiple regression model (level 2)
  - 7.5.2. Sums of squares, R and R² (level 2)
  - 7.5.3. Methods of regression (level 2)
  - 7.6. How accurate is my regression model? (level 2)
- 7. CONTENTS (continued)
  - 7.6.1. Assessing the regression model I: diagnostics (level 2)
  - 7.6.2. Assessing the regression model II: generalization (level 2)
  - 7.7. How to do multiple regression using SPSS (level 2)
  - 7.7.1. Some things to think about before the analysis (level 2)
  - 7.7.2. Main options (level 2)
  - 7.7.3. Statistics (level 2)
  - 7.7.4. Regression plots (level 2)
  - 7.7.5. Saving regression diagnostics (level 2)
  - 7.7.6. Further options (level 2)
  - 7.8. Interpreting multiple regression (level 2)
  - 7.8.1. Descriptives (level 2)
  - 7.8.2. Summary of model (level 2)
  - 7.8.3. Model parameters (level 2)
  - 7.8.4. Excluded variables (level 2)
  - 7.8.5. Assessing the assumption of no multicollinearity (level 2)
  - 7.8.6. Casewise diagnostics (level 2)
  - 7.8.7. Checking assumptions (level 2)
  - 7.9. What if I violate an assumption? (level 2)
  - 7.10. How to report multiple regression (level 2)
  - 7.11. Categorical predictors and multiple regression (level 3)
  - 7.11.1. Dummy coding (level 3)
  - 7.11.2. SPSS output for dummy variables (level 3)
  - What have I discovered about statistics? (level 1)
  - Key terms that I’ve discovered
  - Smart Alex’s tasks
  - Further reading
  - Online tutorial
  - Interesting real research
  - 8 Logistic regression
  - 8.1. What will this chapter tell me? (level 1)
  - 8.2. Background to logistic regression (level 1)
  - 8.3. What are the principles behind logistic regression? (level 3)
  - 8.3.1. Assessing the model: the log-likelihood statistic (level 3)
  - 8.3.2. Assessing the model: R and R² (level 3)
  - 8.3.3. Assessing the contribution of predictors: the Wald statistic (level 2)
  - 8.3.4. The odds ratio: Exp(B) (level 3)
  - 8.3.5. Methods of logistic regression (level 2)
  - 8.4. Assumptions and things that can go wrong (level 4)
  - 8.4.1. Assumptions (level 2)
  - 8.4.2. Incomplete information from the predictors (level 4)
  - 8.4.3. Complete separation (level 4)
  - 8.4.4. Overdispersion (level 4)
  - 8.5. Binary logistic regression: an example that will make you feel eel (level 2)
  - 8.5.1. The main analysis (level 2)
  - 8.5.2. Method of regression (level 2)
  - 8.5.3. Categorical predictors (level 2)
  - 8.5.4. Obtaining residuals (level 2)
  - 8.5.5. Further options (level 2)
  - 8.6. Interpreting logistic regression (level 2)
- 8. CONTENTS (continued)
  - 8.6.1. The initial model (level 2)
  - 8.6.2. Step 1: intervention (level 3)
  - 8.6.3. Listing predicted probabilities (level 2)
  - 8.6.4. Interpreting residuals (level 2)
  - 8.6.5. Calculating the effect size (level 2)
  - 8.7. How to report logistic regression (level 2)
  - 8.8. Testing assumptions: another example (level 2)
  - 8.8.1. Testing for linearity of the logit (level 3)
  - 8.8.2. Testing for multicollinearity (level 3)
  - 8.9. Predicting several categories: multinomial logistic regression (level 3)
  - 8.9.1. Running multinomial logistic regression in SPSS (level 3)
  - 8.9.2. Statistics (level 3)
  - 8.9.3. Other options (level 3)
  - 8.9.4. Interpreting the multinomial logistic regression output (level 3)
  - 8.9.5. Reporting the results
  - What have I discovered about statistics? (level 1)
  - Key terms that I’ve discovered
  - Smart Alex’s tasks
  - Further reading
  - Online tutorial
  - Interesting real research
  - 9 Comparing two means
  - 9.1. What will this chapter tell me? (level 1)
  - 9.2. Looking at differences (level 1)
  - 9.2.1. A problem with error bar graphs of repeated-measures designs (level 1)
  - 9.2.2. Step 1: calculate the mean for each participant (level 2)
  - 9.2.3. Step 2: calculate the grand mean (level 2)
  - 9.2.4. Step 3: calculate the adjustment factor (level 2)
  - 9.2.5. Step 4: create adjusted values for each variable (level 2)
  - 9.3. The t-test (level 1)
  - 9.3.1. Rationale for the t-test (level 1)
  - 9.3.2. Assumptions of the t-test (level 1)
  - 9.4. The dependent t-test (level 1)
  - 9.4.1. Sampling distributions and the standard error (level 1)
  - 9.4.2. The dependent t-test equation explained (level 1)
  - 9.4.3. The dependent t-test and the assumption of normality (level 1)
  - 9.4.4. Dependent t-tests using SPSS (level 1)
  - 9.4.5. Output from the dependent t-test (level 1)
  - 9.4.6. Calculating the effect size (level 2)
  - 9.4.7. Reporting the dependent t-test (level 1)
  - 9.5. The independent t-test (level 1)
  - 9.5.1. The independent t-test equation explained (level 1)
  - 9.5.2. The independent t-test using SPSS (level 1)
  - 9.5.3. Output from the independent t-test (level 1)
  - 9.5.4. Calculating the effect size (level 2)
  - 9.5.5. Reporting the independent t-test (level 1)
  - 9.6. Between groups or repeated measures? (level 1)
  - 9.7. The t-test as a general linear model (level 2)
  - 9.8. What if my data are not normally distributed? (level 2)
- 9. CONTENTS (continued)
  - What have I discovered about statistics? (level 1)
  - Key terms that I’ve discovered
  - Smart Alex’s task
  - Further reading
  - Online tutorial
  - Interesting real research
  - 10 Comparing several means: ANOVA (GLM 1)
  - 10.1. What will this chapter tell me? (level 1)
  - 10.2. The theory behind ANOVA (level 2)
  - 10.2.1. Inflated error rates (level 2)
  - 10.2.2. Interpreting F (level 2)
  - 10.2.3. ANOVA as regression (level 2)
  - 10.2.4. Logic of the F-ratio (level 2)
  - 10.2.5. Total sum of squares (SS_T) (level 2)
  - 10.2.6. Model sum of squares (SS_M) (level 2)
  - 10.2.7. Residual sum of squares (SS_R) (level 2)
  - 10.2.8. Mean squares (level 2)
  - 10.2.9. The F-ratio (level 2)
  - 10.2.10. Assumptions of ANOVA (level 3)
  - 10.2.11. Planned contrasts (level 2)
  - 10.2.12. Post hoc procedures (level 2)
  - 10.3. Running one-way ANOVA on SPSS (level 2)
  - 10.3.1. Planned comparisons using SPSS (level 2)
  - 10.3.2. Post hoc tests in SPSS (level 2)
  - 10.3.3. Options (level 2)
  - 10.4. Output from one-way ANOVA (level 2)
  - 10.4.1. Output for the main analysis (level 2)
  - 10.4.2. Output for planned comparisons (level 2)
  - 10.4.3. Output for post hoc tests (level 2)
  - 10.5. Calculating the effect size (level 2)
  - 10.6. Reporting results from one-way independent ANOVA (level 2)
  - 10.7. Violations of assumptions in one-way independent ANOVA (level 2)
  - What have I discovered about statistics? (level 1)
  - Key terms that I’ve discovered
  - Smart Alex’s tasks
  - Further reading
  - Online tutorials
  - Interesting real research
  - 11 Analysis of covariance, ANCOVA (GLM 2)
  - 11.1. What will this chapter tell me? (level 2)
  - 11.2. What is ANCOVA? (level 2)
  - 11.3. Assumptions and issues in ANCOVA (level 3)
  - 11.3.1. Independence of the covariate and treatment effect (level 3)
  - 11.3.2. Homogeneity of regression slopes (level 3)
  - 11.4. Conducting ANCOVA on SPSS (level 2)
  - 11.4.1. Inputting data (level 1)
  - 11.4.2. Initial considerations: testing the independence of the independent variable and covariate (level 2)
- 10. CONTENTS (continued)
  - 11.4.3. The main analysis (level 2)
  - 11.4.4. Contrasts and other options (level 2)
  - 11.5. Interpreting the output from ANCOVA (level 2)
  - 11.5.1. What happens when the covariate is excluded? (level 2)
  - 11.5.2. The main analysis (level 2)
  - 11.5.3. Contrasts (level 2)
  - 11.5.4. Interpreting the covariate (level 2)
  - 11.6. ANCOVA run as a multiple regression (level 2)
  - 11.7. Testing the assumption of homogeneity of regression slopes (level 3)
  - 11.8. Calculating the effect size (level 2)
  - 11.9. Reporting results (level 2)
  - 11.10. What to do when assumptions are violated in ANCOVA (level 3)
  - What have I discovered about statistics? (level 2)
  - Key terms that I’ve discovered
  - Smart Alex’s tasks
  - Further reading
  - Online tutorials
  - Interesting real research
  - 12 Factorial ANOVA (GLM 3)
  - 12.1. What will this chapter tell me? (level 2)
  - 12.2. Theory of factorial ANOVA (between-groups) (level 2)
  - 12.2.1. Factorial designs (level 2)
  - 12.2.2. An example with two independent variables (level 2)
  - 12.2.3. Total sums of squares (SS_T) (level 2)
  - 12.2.4. The model sum of squares (SS_M) (level 2)
  - 12.2.5. The residual sum of squares (SS_R) (level 2)
  - 12.2.6. The F-ratios (level 2)
  - 12.3. Factorial ANOVA using SPSS (level 2)
  - 12.3.1. Entering the data and accessing the main dialog box (level 2)
  - 12.3.2. Graphing interactions (level 2)
  - 12.3.3. Contrasts (level 2)
  - 12.3.4. Post hoc tests (level 2)
  - 12.3.5. Options (level 2)
  - 12.4. Output from factorial ANOVA (level 2)
  - 12.4.1. Output for the preliminary analysis (level 2)
  - 12.4.2. Levene’s test (level 2)
  - 12.4.3. The main ANOVA table (level 2)
  - 12.4.4. Contrasts (level 2)
  - 12.4.5. Simple effects analysis (level 3)
  - 12.4.6. Post hoc analysis (level 2)
  - 12.5. Interpreting interaction graphs (level 2)
  - 12.6. Calculating effect sizes (level 3)
  - 12.7. Reporting the results of two-way ANOVA (level 2)
  - 12.8. Factorial ANOVA as regression (level 3)
  - 12.9. What to do when assumptions are violated in factorial ANOVA (level 3)
  - What have I discovered about statistics? (level 2)
  - Key terms that I’ve discovered
  - Smart Alex’s tasks
- 11. CONTENTS (continued)
  - Further reading
  - Online tutorials
  - Interesting real research
  - 13 Repeated-measures designs (GLM 4)
  - 13.1. What will this chapter tell me? (level 2)
  - 13.2. Introduction to repeated-measures designs (level 2)
  - 13.2.1. The assumption of sphericity (level 2)
  - 13.2.2. How is sphericity measured? (level 2)
  - 13.2.3. Assessing the severity of departures from sphericity (level 2)
  - 13.2.4. What is the effect of violating the assumption of sphericity? (level 3)
  - 13.2.5. What do you do if you violate sphericity? (level 2)
  - 13.3. Theory of one-way repeated-measures ANOVA (level 2)
  - 13.3.1. The total sum of squares (SS_T) (level 2)
  - 13.3.2. The within-participant (SS_W) (level 2)
  - 13.3.3. The model sum of squares (SS_M) (level 2)
  - 13.3.4. The residual sum of squares (SS_R) (level 2)
  - 13.3.5. The mean squares (level 2)
  - 13.3.6. The F-ratio (level 2)
  - 13.3.7. The between-participant sum of squares (level 2)
  - 13.4. One-way repeated-measures ANOVA using SPSS (level 2)
  - 13.4.1. The main analysis (level 2)
  - 13.4.2. Defining contrasts for repeated-measures (level 2)
  - 13.4.3. Post hoc tests and additional options (level 3)
  - 13.5. Output for one-way repeated-measures ANOVA (level 2)
  - 13.5.1. Descriptives and other diagnostics (level 1)
  - 13.5.2. Assessing and correcting for sphericity: Mauchly’s test (level 2)
  - 13.5.3. The main ANOVA (level 2)
  - 13.5.4. Contrasts (level 2)
  - 13.5.5. Post hoc tests (level 2)
  - 13.6. Effect sizes for repeated-measures ANOVA (level 3)
  - 13.7. Reporting one-way repeated-measures ANOVA (level 2)
  - 13.8. Repeated-measures with several independent variables (level 2)
  - 13.8.1. The main analysis (level 2)
  - 13.8.2. Contrasts (level 2)
  - 13.8.3. Simple effects analysis (level 3)
  - 13.8.4. Graphing interactions (level 2)
  - 13.8.5. Other options (level 2)
  - 13.9. Output for factorial repeated-measures ANOVA (level 2)
  - 13.9.1. Descriptives and main analysis (level 2)
  - 13.9.2. The effect of drink (level 2)
  - 13.9.3. The effect of imagery (level 2)
  - 13.9.4. The interaction effect (drink × imagery) (level 2)
  - 13.9.5. Contrasts for repeated-measures variables (level 2)
  - 13.10. Effect sizes for factorial repeated-measures ANOVA (level 3)
  - 13.11. Reporting the results from factorial repeated-measures ANOVA (level 2)
  - 13.12. What to do when assumptions are violated in repeated-measures ANOVA (level 3)
  - What have I discovered about statistics? (level 2)
  - Key terms that I’ve discovered
- 12. CONTENTS (continued)
  - Smart Alex’s tasks
  - Further reading
  - Online tutorials
  - Interesting real research
  - 14 Mixed design ANOVA (GLM 5)
  - 14.1. What will this chapter tell me? (level 1)
  - 14.2. Mixed designs (level 2)
  - 14.3. What do men and women look for in a partner? (level 2)
  - 14.4. Mixed ANOVA on SPSS (level 2)
  - 14.4.1. The main analysis (level 2)
  - 14.4.2. Other options (level 2)
  - 14.5. Output for mixed factorial ANOVA: main analysis (level 3)
  - 14.5.1. The main effect of gender (level 2)
  - 14.5.2. The main effect of looks (level 2)
  - 14.5.3. The main effect of charisma (level 2)
  - 14.5.4. The interaction between gender and looks (level 2)
  - 14.5.5. The interaction between gender and charisma (level 2)
  - 14.5.6. The interaction between attractiveness and charisma (level 2)
  - 14.5.7. The interaction between looks, charisma and gender (level 3)
  - 14.5.8. Conclusions (level 3)
  - 14.6. Calculating effect sizes (level 3)
  - 14.7. Reporting the results of mixed ANOVA (level 2)
  - 14.8. What to do when assumptions are violated in mixed ANOVA (level 3)
  - What have I discovered about statistics? (level 2)
  - Key terms that I’ve discovered
  - Smart Alex’s tasks
  - Further reading
  - Online tutorials
  - Interesting real research
  - 15 Non-parametric tests
  - 15.1. What will this chapter tell me? (level 1)
  - 15.2. When to use non-parametric tests (level 1)
  - 15.3. Comparing two independent conditions: the Wilcoxon rank-sum test and Mann–Whitney test (level 1)
  - 15.3.1. Theory (level 2)
  - 15.3.2. Inputting data and provisional analysis (level 1)
  - 15.3.3. Running the analysis (level 1)
  - 15.3.4. Output from the Mann–Whitney test (level 1)
  - 15.3.5. Calculating an effect size (level 2)
  - 15.3.6. Writing the results (level 1)
  - 15.4. Comparing two related conditions: the Wilcoxon signed-rank test (level 1)
  - 15.4.1. Theory of the Wilcoxon signed-rank test (level 2)
  - 15.4.2. Running the analysis (level 1)
  - 15.4.3. Output for the ecstasy group (level 1)
  - 15.4.4. Output for the alcohol group (level 1)
  - 15.4.5. Calculating an effect size (level 2)
  - 15.4.6. Writing the results (level 1)
- 13. CONTENTS (continued)
  - 15.5. Differences between several independent groups: the Kruskal–Wallis test (level 1)
  - 15.5.1. Theory of the Kruskal–Wallis test (level 2)
  - 15.5.2. Inputting data and provisional analysis (level 1)
  - 15.5.3. Doing the Kruskal–Wallis test on SPSS (level 1)
  - 15.5.4. Output from the Kruskal–Wallis test (level 1)
  - 15.5.5. Post hoc tests for the Kruskal–Wallis test (level 2)
  - 15.5.6. Testing for trends: the Jonckheere–Terpstra test (level 2)
  - 15.5.7. Calculating an effect size (level 2)
  - 15.5.8. Writing and interpreting the results (level 1)
  - 15.6. Differences between several related groups: Friedman’s ANOVA (level 1)
  - 15.6.1. Theory of Friedman’s ANOVA (level 2)
  - 15.6.2. Inputting data and provisional analysis (level 1)
  - 15.6.3. Doing Friedman’s ANOVA on SPSS (level 1)
  - 15.6.4. Output from Friedman’s ANOVA (level 1)
  - 15.6.5. Post hoc tests for Friedman’s ANOVA (level 2)
  - 15.6.6. Calculating an effect size (level 2)
  - 15.6.7. Writing and interpreting the results (level 1)
  - What have I discovered about statistics? (level 1)
  - Key terms that I’ve discovered
  - Smart Alex’s tasks
  - Further reading
  - Online tutorial
  - Interesting real research
  - 16 Multivariate analysis of variance (MANOVA)
  - 16.1. What will this chapter tell me? (level 2)
  - 16.2. When to use MANOVA (level 2)
  - 16.3. Introduction: similarities and differences to ANOVA (level 2)
  - 16.3.1. Words of warning (level 2)
  - 16.3.2. The example for this chapter (level 2)
  - 16.4. Theory of MANOVA (level 3)
  - 16.4.1. Introduction to matrices (level 3)
  - 16.4.2. Some important matrices and their functions (level 3)
  - 16.4.3. Calculating MANOVA by hand: a worked example (level 3)
  - 16.4.4. Principle of the MANOVA test statistic (level 4)
  - 16.5. Practical issues when conducting MANOVA (level 3)
  - 16.5.1. Assumptions and how to check them (level 3)
  - 16.5.2. Choosing a test statistic (level 3)
  - 16.5.3. Follow-up analysis (level 3)
  - 16.6. MANOVA on SPSS (level 2)
  - 16.6.1. The main analysis (level 2)
  - 16.6.2. Multiple comparisons in MANOVA (level 2)
  - 16.6.3. Additional options (level 3)
  - 16.7. Output from MANOVA (level 3)
  - 16.7.1. Preliminary analysis and testing assumptions (level 3)
  - 16.7.2. MANOVA test statistics (level 3)
  - 16.7.3. Univariate test statistics (level 2)
  - 16.7.4. SSCP matrices (level 3)
  - 16.7.5. Contrasts (level 3)
- 14. CONTENTS (continued)
  - 16.8. Reporting results from MANOVA (level 2)
  - 16.9. Following up MANOVA with discriminant analysis (level 3)
  - 16.10. Output from the discriminant analysis (level 4)
  - 16.11. Reporting results from discriminant analysis (level 2)
  - 16.12. Some final remarks (level 4)
  - 16.12.1. The final interpretation (level 4)
  - 16.12.2. Univariate ANOVA or discriminant analysis?
  - 16.13. What to do when assumptions are violated in MANOVA (level 3)
  - What have I discovered about statistics? (level 2)
  - Key terms that I’ve discovered
  - Smart Alex’s tasks
  - Further reading
  - Online tutorials
  - Interesting real research
  - 17 Exploratory factor analysis
  - 17.1. What will this chapter tell me? (level 1)
  - 17.2. When to use factor analysis (level 2)
  - 17.3. Factors (level 2)
  - 17.3.1. Graphical representation of factors (level 2)
  - 17.3.2. Mathematical representation of factors (level 2)
  - 17.3.3. Factor scores (level 2)
  - 17.4. Discovering factors (level 2)
  - 17.4.1. Choosing a method (level 2)
  - 17.4.2. Communality (level 2)
  - 17.4.3. Factor analysis vs. principal component analysis (level 2)
  - 17.4.4. Theory behind principal component analysis (level 3)
  - 17.4.5. Factor extraction: eigenvalues and the scree plot (level 2)
  - 17.4.6. Improving interpretation: factor rotation (level 3)
  - 17.5. Research example (level 2)
  - 17.5.1. Before you begin (level 2)
  - 17.6. Running the analysis (level 2)
  - 17.6.1. Factor extraction on SPSS (level 2)
  - 17.6.2. Rotation (level 2)
  - 17.6.3. Scores (level 2)
  - 17.6.4. Options (level 2)
  - 17.7. Interpreting output from SPSS (level 2)
  - 17.7.1. Preliminary analysis (level 2)
  - 17.7.2. Factor extraction (level 2)
  - 17.7.3. Factor rotation (level 2)
  - 17.7.4. Factor scores (level 2)
  - 17.7.5. Summary (level 2)
  - 17.8. How to report factor analysis (level 1)
  - 17.9. Reliability analysis (level 2)
  - 17.9.1. Measures of reliability (level 3)
  - 17.9.2. Interpreting Cronbach’s α (some cautionary tales …) (level 2)
  - 17.9.3. Reliability analysis on SPSS (level 2)
  - 17.9.4. Interpreting the output (level 2)
  - 17.10. How to report reliability analysis (level 2)
- 15. CONTENTS (continued)
  - What have I discovered about statistics? (level 2)
  - Key terms that I’ve discovered
  - Smart Alex’s tasks
  - Further reading
  - Online tutorial
  - Interesting real research
  - 18 Categorical data
  - 18.1. What will this chapter tell me? (level 1)
  - 18.2. Analysing categorical data (level 1)
  - 18.3. Theory of analysing categorical data (level 1)
  - 18.3.1. Pearson’s chi-square test (level 1)
  - 18.3.2. Fisher’s exact test (level 1)
  - 18.3.3. The likelihood ratio (level 2)
  - 18.3.4. Yates’ correction (level 2)
  - 18.4. Assumptions of the chi-square test (level 1)
  - 18.5. Doing chi-square on SPSS (level 1)
  - 18.5.1. Entering data: raw scores (level 1)
  - 18.5.2. Entering data: weight cases (level 1)
  - 18.5.3. Running the analysis (level 1)
  - 18.5.4. Output for the chi-square test (level 1)
  - 18.5.5. Breaking down a significant chi-square test with standardized residuals (level 2)
  - 18.5.6. Calculating an effect size (level 2)
  - 18.5.7. Reporting the results of chi-square (level 1)
  - 18.6. Several categorical variables: loglinear analysis (level 3)
  - 18.6.1. Chi-square as regression (level 4)
  - 18.6.2. Loglinear analysis (level 3)
  - 18.7. Assumptions in loglinear analysis (level 2)
  - 18.8. Loglinear analysis using SPSS (level 2)
  - 18.8.1. Initial considerations (level 2)
  - 18.8.2. The loglinear analysis (level 2)
  - 18.9. Output from loglinear analysis (level 3)
  - 18.10. Following up loglinear analysis (level 2)
  - 18.11. Effect sizes in loglinear analysis (level 2)
  - 18.12. Reporting the results of loglinear analysis (level 2)
  - What have I discovered about statistics? (level 1)
  - Key terms that I’ve discovered
  - Smart Alex’s tasks
  - Further reading
  - Online tutorial
  - Interesting real research
  - 19 Multilevel linear models
  - 19.1. What will this chapter tell me? (level 1)
  - 19.2. Hierarchical data (level 2)
  - 19.2.1. The intraclass correlation (level 2)
  - 19.2.2. Benefits of multilevel models (level 2)
  - 19.3. Theory of multilevel linear models (level 3)
- 16. CONTENTS (continued)
  - 19.3.1. An example (level 2)
  - 19.3.2. Fixed and random coefficients (level 3)
  - 19.4. The multilevel model (level 4)
  - 19.4.1. Assessing the fit and comparing multilevel models (level 4)
  - 19.4.2. Types of covariance structures (level 4)
  - 19.5. Some practical issues (level 3)
  - 19.5.1. Assumptions (level 3)
  - 19.5.2. Sample size and power (level 3)
  - 19.5.3. Centring variables (level 4)
  - 19.6. Multilevel modelling on SPSS (level 4)
  - 19.6.1. Entering the data (level 2)
  - 19.6.2. Ignoring the data structure: ANOVA (level 2)
  - 19.6.3. Ignoring the data structure: ANCOVA (level 2)
  - 19.6.4. Factoring in the data structure: random intercepts (level 3)
  - 19.6.5. Factoring in the data structure: random intercepts and slopes (level 4)
  - 19.6.6. Adding an interaction to the model (level 4)
  - 19.7. Growth models (level 4)
  - 19.7.1. Growth curves (polynomials) (level 4)
  - 19.7.2. An example: the honeymoon period (level 2)
  - 19.7.3. Restructuring the data (level 3)
  - 19.7.4. Running a growth model on SPSS (level 4)
  - 19.7.5. Further analysis (level 4)
  - 19.8. How to report a multilevel model (level 3)
  - What have I discovered about statistics? (level 2)
  - Key terms that I’ve discovered
  - Smart Alex’s tasks
  - Further reading
  - Online tutorial
  - Interesting real research
  - Epilogue
  - Glossary
  - Appendix
  - A.1. Table of the standard normal distribution
  - A.2. Critical values of the t-distribution
  - A.3. Critical values of the F-distribution
  - A.4. Critical values of the chi-square distribution
  - References
  - Index
- 17. PREFACE

  Karma Police, arrest this man, he talks in maths, he buzzes like a fridge, he’s like a detuned radio. Radiohead (1997)

  Introduction

  Social science students despise statistics. For one thing, most have a non-mathematical background, which makes understanding complex statistical equations very difficult. The major advantage in being taught statistics in the early 1990s (as I was) compared to the 1960s was the development of computer software to do all of the hard work. The advantage of learning statistics now rather than 15 years ago is that Windows™/MacOS™ enable us to just click on stuff rather than typing in horribly confusing commands (although, as you will see, we can still type in horribly confusing commands if we want to). One of the most commonly used of these packages is SPSS; what on earth possessed me to write a book on it?

  You know that you’re a geek when you have favourite statistics textbooks; my favourites are Howell (2006), Stevens (2002) and Tabachnick and Fidell (2007). These three books are peerless as far as I am concerned and have taught me (and continue to teach me) more about statistics than you could possibly imagine. (I have an ambition to be cited in one of these books but I don’t think that will ever happen.) So, why would I try to compete with these sacred tomes? Well, I wouldn’t and I couldn’t (intellectually these people are several leagues above me). However, these wonderful and clear books use computer examples as addenda to the theory. The advent of programs like SPSS provides the unique opportunity to teach statistics at a conceptual level without getting too bogged down in equations. However, many SPSS books concentrate on ‘doing the test’ at the expense of theory. Using SPSS without any statistical knowledge at all can be a dangerous thing (unfortunately, at the moment SPSS is a rather stupid tool, and it relies heavily on the users knowing what they are doing). As such, this book is an attempt to strike a good balance between theory and practice: I want to use SPSS as a tool for teaching statistical concepts in the hope that you will gain a better understanding of both theory and practice.

  Primarily, I want to answer the kinds of questions that I found myself asking while learning statistics and using SPSS as an undergraduate (things like ‘How can I understand how this statistical test works without knowing too much about the maths behind it?’, ‘What does that button do?’, ‘What the hell does this output mean?’). Like most academics I’m slightly high on the autistic spectrum, and I used to get fed up with people telling me to ‘ignore’ options or ‘ignore that bit of the output’. I would lie awake for hours in my bed every night wondering ‘Why is that bit of SPSS output there if we just ignore it?’ So that no student has to suffer the mental anguish that I did, I aim to explain what different options do, what bits of the output mean, and if we ignore something, why we ignore it.

  Furthermore, I want to be non-prescriptive. Too many books tell the reader what to do (‘click on this button’, ‘do this’, ‘do that’, etc.) and this can create the impression that statistics and SPSS are inflexible. SPSS has many options designed to allow you to tailor a given test to your particular needs. Therefore, although I make recommendations,
- 18. within the limits imposed by the senseless destruction of rainforests, I hope to give you enough background in theory to enable you to make your own decisions about which options are appropriate for the analysis you want to do.

  A second, not in any way ridiculously ambitious, aim was to make this the only statistics textbook that anyone ever needs to buy. As such, it’s a book that I hope will become your friend from first year right through to your professorship. I’ve tried, therefore, to write a book that can be read at several levels (see the next section for more guidance). There are chapters for first-year undergraduates (1, 2, 3, 4, 5, 6, 9 and 15), chapters for second-year undergraduates (5, 7, 10, 11, 12, 13 and 14) and chapters on more advanced topics that postgraduates might use (8, 16, 17, 18 and 19). All of these chapters should be accessible to everyone, and I hope to achieve this by flagging the level of each section (see the next section).

  My third, final and most important aim is to make the learning process fun. I have a sticky history with maths because I used to be terrible at it:

  Above is an extract of my school report at the age of 11. The ‘27’ in the report is to say that I came equal 27th with another student out of a class of 29. That’s almost bottom of the class. The 43 is my exam mark as a percentage! Oh dear. Four years later (at 15) this was my school report:

  What led to this remarkable change? It was having a good teacher: my brother, Paul. In fact I owe my life as an academic to Paul’s ability to do what my maths teachers couldn’t: teach me stuff in an engaging way. To this day he still pops up in times of need to teach me things (a crash course in computer programming some Christmases ago springs to mind). Anyway, the reason he’s a great teacher is because he’s able to make things interesting and relevant to me. Sadly he seems to have got the ‘good teaching’ genes in the family (and he doesn’t even work as a bloody teacher, so they’re wasted!), but his approach inspires my lectures and books.

  One thing that I have learnt is that people appreciate the human touch, and so in previous editions I tried to inject a lot of my own personality and sense of humour (or lack of …). Many of the examples in this book, although inspired by some of the craziness that you find in the real world, are designed to reflect topics that play on the minds of the average student (i.e. sex, drugs, rock and roll, celebrity, people doing crazy stuff). There are also some examples that are there just because they made me laugh. So, the examples are light-hearted (some have said ‘smutty’ but I prefer ‘light-hearted’) and by the end, for better or worse, I think you will have some idea of what goes on in my head on a daily basis!
- 19. **What’s new?** Seeing as some people appreciated the style of the previous editions I’ve taken this as a green light to include even more stupid examples, more smut and more bad taste. I apologise to those who think it’s crass, hate it, or think that I’m undermining the seriousness of science, but, come on, what’s not funny about a man putting an eel up his anus? Aside from adding more smut, I was forced reluctantly to expand the academic content! Most of the expansions have resulted from someone (often several people) emailing me to ask how to do something. So, in theory, this edition should answer any question anyone has asked me over the past four years! Mind you, I said that last time and still the questions come (will I never be free?). The general changes in the book are:
  - More introductory material: The first chapter in the last edition was like sticking your brain into a food blender. I rushed chaotically through the entire theory of statistics in a single chapter at the pace of a cheetah on speed. I didn’t really bother explaining any basic research methods, except when, out of the blue, I’d stick a section in some random chapter, alone and looking for friends. This time, I have written a brand-new Chapter 1, which eases you gently through the research process – why and how we do it. I also bring in some basic descriptive statistics at this point too.
  - More graphs: Graphs are very important. In the previous edition information about plotting graphs was scattered about in different chapters making it hard to find. What on earth was I thinking? I’ve now written a self-contained chapter on how to use SPSS’s Chart Builder. As such, everything you need to know about graphs (and I added a lot of material that wasn’t in the previous edition) is now in Chapter 4.
  - More assumptions: All chapters now have a section towards the end about what to do when assumptions are violated (although these usually tell you that SPSS can’t do what needs to be done!).
  - More data sets: You can never have too many examples, so I’ve added a lot of new data sets. There are 30 new data sets in the book at the last count (although I’m not very good at maths so it could be a few more or less).
  - More stupid faces: I have added some more characters with stupid faces because I find stupid faces comforting, probably because I have one. You can find out more in the next section. Miraculously, the publishers stumped up some cash to get them designed by someone who can actually draw.
  - More reporting your analysis: OK, I had these sections in the previous edition too, but then in some chapters I just seemed to forget about them for no good reason. This time every single chapter has one.
  - More glossary: Writing the glossary last time nearly made me stick a vacuum cleaner into my ear to suck out my own brain. I thought I probably ought to expand it a bit. You can find my brain in the bottom of the vacuum cleaner in my house.
  - New! It’s colour: The publishers went full colour. This means that (1) I had to redo all of the diagrams to take advantage of the colour format, and (2) if you lick the orange bits they taste of orange (it amuses me that someone might try this to see whether I’m telling the truth).
  - New! Real-world data: Lots of people said that they wanted more ‘real data’ to play with. The trouble is that real research can be quite boring. However, just for you, I trawled the world for examples of research on really fascinating topics (in my opinion). I then stalked the authors of the research until they gave me their data. Every chapter now has a real research example.
  - New! Self-test questions: Everyone loves an exam, don’t they? Well, everyone that is apart from people who breathe. Given how much everyone hates tests, I thought the
- 20. best way to commit commercial suicide was to liberally scatter tests throughout each chapter. These range from simple questions to test out what you have just learned to going back to a technique that you read about several chapters before and applying it in a new context. All of these questions have answers to them on the companion website. They are there so that you can check on your progress.
  - New! SPSS tips: SPSS does weird things sometimes. In each chapter, I’ve included boxes containing tips, hints and pitfalls related to SPSS.
  - New! SPSS 17 compliant: SPSS 17 looks different to earlier versions but in other respects is much the same. I updated the material to reflect the latest editions of SPSS.
  - New! Flash movies: I’ve recorded some flash movies of using SPSS to accompany each chapter. They’re on the companion website. They might help you if you get stuck.
  - New! Additional material: Enough trees have died in the name of this book, but still it gets longer and still people want to know more. Therefore, I’ve written nearly 300 pages, yes, three hundred, of additional material for the book. So for some more technical topics and help with tasks in the book the material has been provided electronically so that (1) the planet suffers a little less, and (2) you can actually lift the book.
  - New! Multilevel modelling: It’s all the rage these days so I thought I should write a chapter on it. I didn’t know anything about it, but I do now (sort of).
  - New! Multinomial logistic regression: It doesn’t get much more exciting than this; people wanted to know about logistic regression with several categorical outcomes and I always give people what they want (but only if they want smutty examples).

  All of the chapters now have SPSS tips, self-test questions, additional material (Oliver Twisted boxes), real research examples (Labcoat Leni boxes), boxes on difficult topics (Jane Superbrain boxes) and flash movies.
The specific changes in each chapter are:
  - Chapter 1 (Research methods): This is a completely new chapter. It basically talks about why and how to do research.
  - Chapter 2 (Statistics): I spent a lot of time rewriting this chapter but it was such a long time ago that I can’t really remember what I changed. Trust me, though; it’s much better than before.
  - Chapter 3 (SPSS): The old Chapter 2 is now SPSS 17 compliant. I restructured a lot of the material, and added some sections on other forms of variables (strings and dates).
  - Chapter 4 (Graphs): This chapter is completely new.
  - Chapter 5 (Assumptions): This retains some of the material from the old Chapter 4, but I’ve expanded the content to include P–P and Q–Q plots, a lot of new content on homogeneity of variance (including the variance ratio) and a new section on robust methods.
  - Chapter 6 (Correlation): The old Chapter 4; I redid one of the examples, added some material on confidence intervals for r, the biserial correlation, testing differences between dependent and independent rs and how certain eminent statisticians hate each other.
  - Chapter 7 (Regression): This chapter was already so long that the publishers banned me from extending it! Nevertheless I rewrote a few bits to make them clearer, but otherwise it’s the same but with nicer diagrams and the bells and whistles that have been added to every chapter.
  - Chapter 8 (Logistic regression): I changed the main example from one about theory of mind (which is now an end of chapter task) to one about putting eels up your anus to cure constipation (based on a true story). Does this help you understand logistic regression? Probably not, but it really kept me entertained for days. I’ve extended the
- 21. chapter to include multinomial logistic regression, which was a pain because I didn’t know how to do it.
  - Chapter 9 (t-tests): I stripped a lot of the methods content to go in Chapter 1, so this chapter is more purely about the t-test now. I added some discussion on median splits, and doing t-tests from only the means and standard deviations.
  - Chapter 10 (GLM 1): Is basically the same as the old Chapter 8.
  - Chapter 11 (GLM 2): Similar to the old Chapter 9, but I added a section on assumptions that now discusses the need for the covariate and treatment effect to be independent. I also added some discussion of eta-squared and partial eta-squared (SPSS produces partial eta-squared but I ignored it completely in the last edition). Consequently I restructured much of the material in this example (and I had to create a new data set when I realized that the old one violated the assumption that I had just spent several pages telling people not to violate).
  - Chapter 12 (GLM 3): This chapter is ostensibly the same as the old Chapter 10, but with nicer diagrams.
  - Chapter 13 (GLM 4): This chapter is more or less the same as the old Chapter 11. I edited it down quite a bit and restructured material so there was less repetition. I added an explanation of the between-participant sum of squares also. The first example (tutors marking essays) is now an end of chapter task, and the new example is one about celebrities eating kangaroo testicles on television. It needed to be done.
  - Chapter 14 (GLM 5): This chapter is very similar to the old Chapter 12 on mixed ANOVA.
  - Chapter 15 (Non-parametric statistics): This chapter is more or less the same as the old Chapter 13.
  - Chapter 16 (MANOVA): I rewrote a lot of the material on the interpretation of discriminant function analysis because I thought it pretty awful. It’s better now.
  - Chapter 17 (Factor analysis): This chapter is very similar to the old Chapter 15.
I wrote some material on interpretation of the determinant. I’m not sure why, but I did.
  - Chapter 18 (Categorical data): This is similar to Chapter 16 in the previous edition. I added some material on interpreting standardized residuals.
  - Chapter 19 (Multilevel linear models): This is a new chapter.

  **Goodbye** The first edition of this book was the result of two years (give or take a few weeks to write up my Ph.D.) of trying to write a statistics book that I would enjoy reading. The second edition was another two years of work and I was terrified that all of the changes would be the death of it. You’d think by now I’d have some faith in myself. Really, though, having spent an extremely intense six months in writing hell, I am still hugely anxious that I’ve just ruined the only useful thing that I’ve ever done with my life. I can hear the cries of lecturers around the world refusing to use the book because of cruelty to eels. This book has been part of my life now for over 10 years; it began and continues to be a labour of love. Despite this it isn’t perfect, and I still love to have feedback (good or bad) from the people who matter most: you. Andy (My contact details are at www.statisticshell.com.)
- 22. **How To Use This Book** When the publishers asked me to write a section on ‘How to use this book’ it was obviously tempting to write ‘Buy a large bottle of Olay anti-wrinkle cream (which you’ll need to fend off the effects of ageing while you read), find a comfy chair, sit down, fold back the front cover, begin reading and stop when you reach the back cover.’ However, I think they wanted something more useful. **What background knowledge do I need?** In essence, I assume you know nothing about statistics, but I do assume you have some very basic grasp of computers (I won’t be telling you how to switch them on, for example) and maths (although I have included a quick revision of some very basic concepts so I really don’t assume anything). **Do the chapters get more difficult as I go through the book?** In a sense they do (Chapter 16 on MANOVA is more difficult than Chapter 1), but in other ways they don’t (Chapter 15 on non-parametric statistics is arguably less complex than Chapter 14, and Chapter 9 on the t-test is definitely less complex than Chapter 8 on logistic regression). Why have I done this? Well, I’ve ordered the chapters to make statistical sense (to me, at least). Many books teach different tests in isolation and never really give you a grip of the similarities between them; this, I think, creates an unnecessary mystery. Most of the tests in this book are the same thing expressed in slightly different ways. So, I wanted the book to tell this story. To do this I have to do certain things such as explain regression fairly early on because it’s the foundation on which nearly everything else is built! However, to help you through I’ve coded each section with an icon. These icons are designed to give you an idea of the difficulty of the section. It doesn’t necessarily mean you can skip the sections (but see Smart Alex in the next section), but it will let you know whether a section is at about your level, or whether it’s going to push you.
I’ve based the icons on my own teaching so they may not be entirely accurate for everyone (especially as systems vary in different countries!):
  - 1: This means ‘level 1’ and I equate this to first-year undergraduate in the UK. These are sections that everyone should be able to understand.
  - 2: This is the next level and I equate this to second-year undergraduates in the UK. These are topics that I teach my second years and so anyone with a bit of background in statistics should be able to get to grips with them. However, some of these sections will be quite challenging even for second years. These are intermediate sections.
- 23.
  - 3: This is ‘level 3’ and represents difficult topics. I’d expect third-year (final-year) UK undergraduates and recent postgraduate students to be able to tackle these sections.
  - 4: This is the highest level and represents very difficult topics. I would expect these sections to be very challenging to undergraduates and recent postgraduates, but postgraduates with a reasonable background in research methods shouldn’t find them too much of a problem.

  **Why do I keep seeing stupid faces everywhere?**
  - Brian Haemorrhage: Brian’s job is to pop up to ask questions and look permanently confused. It’s no surprise to note, therefore, that he doesn’t look entirely different from the author. As the book progresses he becomes increasingly despondent. Read into that what you will.
  - Curious Cat: He also pops up and asks questions (because he’s curious). Actually the only reason he’s here is because I wanted a cat in the book … and preferably one that looks like mine. Of course the educational specialists think he needs a specific role, and so his role is to look cute and make bad cat-related jokes.
  - Cramming Sam: Samantha hates statistics. In fact, she thinks it’s all a boring waste of time and she just wants to pass her exam and forget that she ever had to know anything about normal distributions. So, she appears and gives you a summary of the key points that you need to know. If, like Samantha, you’re cramming for an exam, she will tell you the essential information to save you having to trawl through hundreds of pages of my drivel.
  - Jane Superbrain: Jane is the cleverest person in the whole universe (she makes Smart Alex look like a bit of an imbecile). The reason she is so clever is that she steals the brains of statisticians and eats them. Apparently they taste of sweaty tank tops, but nevertheless she likes them. As it happens, she is also able to absorb the contents of brains while she eats them.
Having devoured some top statistics brains she knows all the really hard stuff and appears in boxes to tell you really advanced things that are a bit tangential to the main text. (Readers should note that Jane wasn’t interested in eating my brain. That tells you all that you need to know about my statistics ability.)
  - Labcoat Leni: Leni is a budding young scientist and he’s fascinated by real research. He says, ‘Andy, man, I like an example about using an eel as a cure for constipation as much as the next man, but all of your examples are made up. Real data aren’t like that, we need some real examples, dude!’ So off Leni went; he walked the globe, a lone data warrior in a thankless quest for real data. He turned up at universities, cornered academics, kidnapped their families and threatened to put them in a bath of crayfish unless he was given real data. The generous ones relented, but others? Well, let’s just say their families are sore. So, when you see Leni you know that you will get some real data, from a real research study to analyse. Keep it real.
  - Oliver Twisted: With apologies to Charles Dickens, Oliver, like his more famous fictional London urchin, is always asking, ‘Please sir, can I have some more?’ Unlike Master Twist, though, our young Master Twisted always wants more statistics information. Of course he does, who wouldn’t? Let us not be the ones to disappoint a young, dirty, slightly smelly boy who dines on gruel, so when Oliver appears you can be certain of one thing: there is additional information to be found on the companion website. (Don’t be shy; download it and bathe in the warm asp’s milk of knowledge.)
- 24.
  - Satan’s Personal Statistics Slave: Satan is a busy boy – he has all of the lost souls to torture in hell; then there are the fires to keep fuelled, not to mention organizing enough carnage on the planet’s surface to keep Norwegian black metal bands inspired. Like many of us, this leaves little time for him to analyse data, and this makes him very sad. So, he has his own personal slave, who, also like some of us, spends all day dressed in a gimp mask and tight leather pants in front of SPSS analysing Satan’s data. Consequently, he knows a thing or two about SPSS, and when Satan’s busy spanking a goat, he pops up in a box with SPSS tips.
  - Smart Alex: Alex is a very important character because he appears when things get particularly difficult. He’s basically a bit of a smart alec and so whenever you see his face you know that something scary is about to be explained. When the hard stuff is over he reappears to let you know that it’s safe to continue. Now, this is not to say that all of the rest of the material in the book is easy, he just lets you know the bits of the book that you can skip if you’ve got better things to do with your life than read all 800 pages! So, if you see Smart Alex then you can skip the section entirely and still understand what’s going on. You’ll also find that Alex pops up at the end of each chapter to give you some tasks to do to see whether you’re as smart as he is.

  **What is on the companion website?** In this age of downloading, CD-ROMs are for losers (at least that’s what the ‘kids’ tell me) so this time around I’ve put my cornucopia of additional funk on that worldwide interweb thing. This has two benefits: (1) the book is slightly lighter than it would have been, and (2) rather than being restricted to the size of a CD-ROM, there is no limit to the amount of fascinating extra material that I can give you (although Sage have had to purchase a new server to fit it all on).
To enter my world of delights, go to www.sagepub.co.uk/field3e (see the image on the next page). How will you know when there are extra goodies on this website? Easy-peasy, Oliver Twisted appears in the book to indicate that there’s something you need (or something extra) on the website. The website contains resources for students and lecturers alike:
  - Data files: You need data files to work through the examples in the book and they are all on the companion website. We did this so that you’re forced to go there and once you’re there you will never want to leave. There are data files here for a range of students, including those studying psychology, business and health sciences.
  - Flash movies: Reading is a bit boring; it’s much more amusing to listen to me explaining things in my camp English accent. Therefore, so that you can all have ‘laugh at Andy’ parties, I have created flash movies for each chapter that show you how to do the SPSS examples. I’ve also done extra ones that show you useful things that would otherwise have taken me pages of drivel to explain. Some of these movies are open access, but because the publishers want to sell some books, others are available only to lecturers. The idea is that they can put them on their virtual learning environments. If they don’t, put insects under their office doors.
  - Podcast: My publishers think that watching a film of me explaining what this book is all about is going to get people flocking to the bookshop. I think it will have people flocking to the medicine cabinet. Either way, if you want to see how truly uncharismatic I am, watch and cringe.
- 25.
  - Self-assessment multiple-choice questions: Organized by chapter, these will allow you to test whether wasting your life reading this book has paid off so that you can walk confidently into an examination much to the annoyance of your friends. If you fail said exam, you can employ a good lawyer and sue me.
  - Flashcard glossary: As if a printed glossary wasn’t enough, my publishers insisted that you’d like one in electronic format too. Have fun here flipping about between terms and definitions that are covered in the textbook; it’s better than actually learning something.
  - Additional material: Enough trees have died in the name of this book, but still it gets longer and still people want to know more. Therefore, I’ve written nearly 300 pages, yes, three hundred, of additional material for the book. So for some more technical topics and help with tasks in the book the material has been provided electronically so that (1) the planet suffers a little less, and (2) you can actually lift the book.
  - Answers: Each chapter ends with a set of tasks for you to test your newly acquired expertise. The chapters are also littered with self-test questions. How will you know if you get these correct? Well, the companion website contains around 300 pages (that’s a different three hundred pages to the three hundred above) of detailed answers. Will I ever stop writing?
  - Cyberworms of knowledge: I have used nanotechnology to create cyberworms that crawl down your broadband connection, pop out of the USB port of your computer then fly through space into your brain. They re-arrange your neurons so that you understand statistics. You don’t believe me? Well, you’ll never know for sure unless you visit the companion website …

  Happy reading, and don’t get sidetracked by Facebook.
- 26. **Acknowledgements** The first edition of this book wouldn’t have happened if it hadn’t been for Dan Wright, who not only had an unwarranted faith in a then-postgraduate to write the book, but also read and commented on draft chapters in all three editions. I’m really sad that he is leaving England to go back to the United States. The last two editions have benefited from the following people emailing me with comments, and I really appreciate their contributions: John Alcock, Aliza Berger-Cooper, Sanne Bongers, Thomas Brügger, Woody Carter, Brittany Cornell, Peter de Heus, Edith de Leeuw, Sanne de Vrie, Jaap Dronkers, Anthony Fee, Andy Fugard, Massimo Garbuio, Ruben van Genderen, Daniel Hoppe, Tilly Houtmans, Joop Hox, Suh-Ing (Amy) Hsieh, Don Hunt, Laura Hutchins-Korte, Mike Kenfield, Ned Palmer, Jim Parkinson, Nick Perham, Thusha Rajendran, Paul Rogers, Alf Schabmann, Mischa Schirris, Mizanur Rashid Shuvra, Nick Smith, Craig Thorley, Paul Tinsley, Keith Tolfrey, Frederico Torracchi, Djuke Veldhuis, Jane Webster and Enrique Woll. In this edition I have incorporated data sets from real research papers. All of these research papers are studies that I find fascinating and it’s an honour for me to have these researchers’ data in my book: Hakan Çetinkaya, Tomas Chamorro-Premuzic, Graham Davey, Mike Domjan, Gordon Gallup, Eric Lacourse, Sarah Marzillier, Geoffrey Miller, Peter Muris, Laura Nichols and Achim Schüetzwohl. Jeremy Miles stopped me making a complete and utter fool of myself (in the book – sadly his powers don’t extend to everyday life) by pointing out some glaring errors; he’s also been a very nice person to know over the past few years (apart from when he’s saying that draft sections of my books are, and I quote, ‘bollocks’!). David Hitchin, Laura Murray, Gareth Williams and Lynne Slocombe made an enormous contribution to the last edition and all of their good work remains in this edition.
In this edition, Zoë Nightingale’s unwavering positivity and suggestions for many of the new chapters were invaluable. My biggest thanks go to Kate Lester who not only read every single chapter, but also kept my research laboratory ticking over while my mind was on this book. I literally could not have done it without her support and constant offers to take on extra work that she did not have to do so that I could be a bit less stressed. I am very lucky to have her in my research team. All of these people have taken time out of their busy lives to help me out. I’m not sure what that says about their mental states, but they are all responsible for a great many improvements. May they live long and their data sets be normal. Not all contributions are as tangible as those above. With the possible exception of them not understanding why sometimes I don’t answer my phone, I could not have asked for more loving and proud parents – a fact that I often take for granted. Also, very early in my career Graham Hole made me realize that teaching research methods didn’t have to be dull. My whole approach to teaching has been to steal all of his good ideas and I’m pleased that he has had the good grace not to ask for them back! He is also a rarity in being
- 27. brilliant, funny and nice. I also thank my Ph.D. students Carina Ugland, Khanya Price-Evans and Saeid Rohani for their patience for the three months that I was physically away in Rotterdam, and for the three months that I was mentally away upon my return. I appreciate everyone who has taken time to write nice reviews of this book on the various Amazon sites around the world (or any other website for that matter!). The success of this book has been in no small part due to these people being so positive and constructive in their reviews. I continue to be amazed and bowled over by the nice things that people write and if any of you are ever in Brighton, I owe you a pint! The people at Sage are less hardened drinkers than they used to be, but I have been very fortunate to work with Michael Carmichael and Emily Jenner. Mike, despite his failings on the football field(!), has provided me with some truly memorable nights out and he also read some of my chapters this time around which, as an editor, made a pleasant change. Both Emily and Mike took a lot of crap from me (especially when I was tired and stressed) and I’m grateful for their understanding. Emily I’m sure thinks I’m a grumpy sod, but she did a better job of managing me than she realizes. Also, Alex Lee did a fantastic job of turning the characters in my head into characters on the page. Thanks to Jill Rietema at SPSS Inc. who has been incredibly helpful over the past few years; it has been a pleasure working with her. The book (obviously) would not exist without SPSS Inc.’s kind permission to use screenshots of their software. Check out their web pages (http://www.spss.com) for support, contact information and training opportunities. I wrote much of this edition while on sabbatical at the Department of Psychology at the Erasmus University, Rotterdam, The Netherlands. I’m grateful to the clinical research group (especially the white ape posse!)
who so unreservedly made me part of the team. Part of me definitely stayed with you when I left – I hope it isn’t annoying you too much. Mostly, though, I thank Peter (Muris), Birgit (Mayer), Jip and Kiki who made me part of their family while in Rotterdam. They are all inspirational. I’m grateful for their kindness, hospitality, and for not getting annoyed when I was still in their kitchen having drunk all of their wine after the last tram home had gone. Mostly, I thank them for the wealth of happy memories that they gave me. I always write listening to music. For the previous editions, I owed my sanity to: Abba, AC/DC, Arvo Pärt, Beck, The Beyond, Blondie, Busta Rhymes, Cardiacs, Cradle of Filth, DJ Shadow, Elliott Smith, Emperor, Frank Black and the Catholics, Fugazi, Genesis (Peter Gabriel era), Hefner, Iron Maiden, Janes Addiction, Love, Metallica, Massive Attack, Mercury Rev, Morrissey, Muse, Nevermore, Nick Cave, Nusrat Fateh Ali Khan, Peter Gabriel, Placebo, Quasi, Radiohead, Sevara Nazarkhan, Slipknot, Supergrass and The White Stripes. For this edition, I listened to the following, which I think tells you all that you need to know about my stress levels: 1349, Air, Angantyr, Audrey Horne, Cobalt, Cradle of Filth, Danzig, Dark Angel, Darkthrone, Death Angel, Deathspell Omega, Exodus, Fugazi, Genesis, High on Fire, Iron Maiden, The Mars Volta, Manowar, Mastodon, Megadeth, Meshuggah, Opeth, Porcupine Tree, Radiohead, Rush, Serj Tankian, She Said!, Slayer, Soundgarden, Taake, Tool and the Wedding Present. Finally, all this book-writing nonsense requires many lonely hours (mainly late at night) of typing. Without some wonderful friends to drag me out of my dimly lit room from time to time I’d be even more of a gibbering cabbage than I already am. 
My eternal gratitude goes to Graham Davey, Benie MacDonald, Ben Dyson, Martin Watts, Paul Spreckley, Darren Hayman, Helen Liddle, Sam Cartwright-Hatton, Karina Knowles and Mark Franklin for reminding me that there is more to life than work. Also, my eternal gratitude to Gini Harrison, Sam Pehrson and Luke Anthony and especially my brothers of metal Doug Martin and Rob Mepham for letting me deafen them with my drumming on a regular basis. Finally, thanks to Leonora for her support while I was writing the last two editions of this book.
- 28. Dedication Like the previous editions, this book is dedicated to my brother Paul and my cat Fuzzy, because one of them is a constant source of intellectual inspiration and the other wakes me up in the morning by sitting on me and purring in my face until I give him cat food: mornings will be considerably more pleasant when my brother gets over his love of cat food for breakfast.
- 29. **Symbols used in this book**

  **Mathematical operators**
  - $\sum$: This symbol (called sigma) means ‘add everything up’. So, if you see something like $\sum x_i$ it just means ‘add up all of the scores you’ve collected’.
  - $\prod$: This symbol means ‘multiply everything’. So, if you see something like $\prod x_i$ it just means ‘multiply all of the scores you’ve collected’.
  - $\sqrt{x}$: This means ‘take the square root of x’.

  **Greek symbols**
  - $\alpha$: The probability of making a Type I error
  - $\beta$: The probability of making a Type II error
  - $\beta_i$: Standardized regression coefficient
  - $\chi^2$: Chi-square test statistic
  - $\chi^2_F$: Friedman’s ANOVA test statistic
  - $\varepsilon$: Usually stands for ‘error’
  - $\eta^2$: Eta-squared
  - $\mu$: The mean of a population of scores
  - $\rho$: The correlation in the population
  - $\sigma^2$: The variance in a population of data
  - $\sigma$: The standard deviation in a population of data
  - $\sigma_{\bar{x}}$: The standard error of the mean
  - $\tau$: Kendall’s tau (non-parametric correlation coefficient)
  - $\omega^2$: Omega squared (an effect size measure). This symbol also means ‘expel the contents of your intestine immediately into your trousers’; you will understand why in due course
- 30. English symbols

  - b_i: The regression coefficient (unstandardized)
  - df: Degrees of freedom
  - e_i: The error associated with the ith person
  - F: F-ratio (test statistic used in Anova)
  - H: Kruskal–Wallis test statistic
  - k: The number of levels of a variable (i.e. the number of treatment conditions), or the number of predictors in a regression model
  - ln: Natural logarithm
  - MS: The mean squared error (Mean Square). The average variability in the data
  - N, n, n_i: The sample size. N usually denotes the total sample size, whereas n usually denotes the size of a particular group
  - P: Probability (the probability value, p-value or significance of a test are usually denoted by p)
  - r: Pearson’s correlation coefficient
  - r_s: Spearman’s rank correlation coefficient
  - r_b, r_pb: Biserial correlation coefficient and point–biserial correlation coefficient respectively
  - R: The multiple correlation coefficient
  - R²: The coefficient of determination (i.e. the proportion of data explained by the model)
  - s²: The variance of a sample of data
  - s: The standard deviation of a sample of data
  - SS: The sum of squares, or sum of squared errors to give it its full title
  - SS_A: The sum of squares for variable A
  - SS_M: The model sum of squares (i.e. the variability explained by the model fitted to the data)
  - SS_R: The residual sum of squares (i.e. the variability that the model can’t explain – the error in the model)
  - SS_T: The total sum of squares (i.e. the total variability within the data)
  - t: Test statistic for Student’s t-test
  - T: Test statistic for Wilcoxon’s matched-pairs signed-rank test
  - U: Test statistic for the Mann–Whitney test
  - W_s: Test statistic for Wilcoxon’s rank-sum test
  - X̄ or x̄: The mean of a sample of scores
  - z: A data point expressed in standard deviation units
- 31. Some maths revision

  1. Two negatives make a positive: Although in life two wrongs don’t make a right, in mathematics they do! When we multiply a negative number by another negative number, the result is a positive number. For example, −2 × −4 = 8.
  2. A negative number multiplied by a positive one makes a negative number: If you multiply a positive number by a negative number then the result is another negative number. For example, 2 × −4 = −8, or −2 × 6 = −12.
  3. BODMAS: This is an acronym for the order in which mathematical operations are performed. It stands for Brackets, Order, Division, Multiplication, Addition, Subtraction and this is the order in which you should carry out operations within an equation. Mostly these operations are self-explanatory (e.g. always calculate things within brackets first) except for order, which actually refers to power terms such as squares. Four squared, or 4², used to be called four raised to the order of 2, hence the reason why these terms are called ‘order’ in BODMAS (also, if we called it power, we’d end up with BPDMAS, which doesn’t roll off the tongue quite so nicely). Let’s look at an example of BODMAS: what would be the result of 1 + 3 × 5²? The answer is 76 (not 100 as some of you might have thought). There are no brackets so the first thing is to deal with the order term: 5² is 25, so the equation becomes 1 + 3 × 25. There is no division, so we can move on to multiplication: 3 × 25, which gives us 75. BODMAS tells us to deal with addition next: 1 + 75, which gives us 76 and the equation is solved. If I’d written the original equation as (1 + 3) × 5², then the answer would have been 100 because we deal with the brackets first: (1 + 3) = 4, so the equation becomes 4 × 5². We then deal with the order term, so the equation becomes 4 × 25 = 100!
  4. http://www.easymaths.com is a good site for revising basic maths.
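The sign rules and the BODMAS example above can be checked in a few lines. This is only a sketch in Python (not something the book uses); it works because Python applies brackets, then powers, then multiplication, then addition, which matches the order described for this example.

```python
# Sign rules from points 1 and 2
neg_times_neg = -2 * -4   # two negatives make a positive
pos_times_neg = 2 * -4    # positive times negative is negative

# BODMAS example from point 3
without_brackets = 1 + 3 * 5 ** 2   # power first (25), then 3 * 25 = 75, then 1 + 75
with_brackets = (1 + 3) * 5 ** 2    # brackets first (4), then power (25), then 4 * 25

print(neg_times_neg, pos_times_neg)     # 8 -8
print(without_brackets, with_brackets)  # 76 100
```

Adding the brackets is the only difference between the two BODMAS expressions, and it is what moves the answer from 76 to 100.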
- 32. 1 Why is my evil lecturer forcing me to learn statistics?

  Figure 1.1 When I grow up, please don’t let me be a statistics lecturer

  1.1. What will this chapter tell me?

  I was born on 21 June 1973. Like most people, I don’t remember anything about the first few years of life and like most children I did go through a phase of driving my parents mad by asking ‘Why?’ every five seconds. ‘Dad, why is the sky blue?’, ‘Dad, why doesn’t mummy have a willy?’ etc. Children are naturally curious about the world. I remember at the age of 3 being at a party of my friend Obe (this was just before he left England to return to Nigeria, much to my distress). It was a hot day, and there was an electric fan blowing cold air around the room. As I said, children are natural scientists and my little scientific brain was working through what seemed like a particularly pressing question: ‘What happens when you stick your finger into a fan?’ The answer, as it turned out, was that it hurts – a lot.¹ My point is this: my curiosity to explain the world never went away, and that’s why I’m a scientist, and that’s also why your evil lecturer is forcing you to learn statistics.

  ¹ In the 1970s fans didn’t have helpful protective cages around them to prevent idiotic 3 year olds sticking their fingers into the blades.
- 33. It’s because you have a curious mind too and you want to answer new and exciting questions. To answer these questions we need statistics. Statistics is a bit like sticking your finger into a revolving fan blade: sometimes it’s very painful, but it does give you the power to answer interesting questions. This chapter is going to attempt to explain why statistics are an important part of doing research. We will overview the whole research process, from why we conduct research in the first place, through how theories are generated, to why we need data to test these theories. If that doesn’t convince you to read on then maybe the fact that we discover whether Coca-Cola kills sperm will. Or perhaps not.

  1.2. What the hell am I doing here? I don’t belong here

  You’re probably wondering why you have bought this book. Maybe you liked the pictures, maybe you fancied doing some weight training (it is heavy), or perhaps you need to reach something in a high place (it is thick). The chances are, though, that given the choice of spending your hard-earned cash on a statistics book or something more entertaining (a nice novel, a trip to the cinema, etc.) you’d choose the latter. So, why have you bought the book (or downloaded an illegal pdf of it from someone who has way too much time on their hands if they can scan an 800-page textbook)? It’s likely that you obtained it because you’re doing a course on statistics, or you’re doing some research, and you need to know how to analyse data. It’s possible that you didn’t realize when you started your course or research that you’d have to know this much about statistics but now find yourself inexplicably wading, neck high, through the Victorian sewer that is data analysis. The reason that you’re in the mess that you find yourself in is because you have a curious mind.
  You might have asked yourself questions like why people behave the way they do (psychology) or why behaviours differ across cultures (anthropology), how businesses maximize their profit (business), how did the dinosaurs die (palaeontology), does eating tomatoes protect you against cancer (medicine, biology), is it possible to build a quantum computer (physics, chemistry), is the planet hotter than it used to be and in what regions (geography, environmental studies)? Whatever it is you’re studying or researching, the reason you’re studying it is probably because you’re interested in answering questions. Scientists are curious people, and you probably are too. However, you might not have bargained on the fact that to answer interesting questions, you need two things: data and an explanation of those data. The answer to ‘what the hell are you doing here?’ is, therefore, simple: to answer interesting questions you need data. Therefore, one of the reasons why your evil statistics lecturer is forcing you to learn about numbers is because they are a form of data and are vital to the research process. Of course there are forms of data other than numbers that can be used to test and generate theories. When numbers are involved the research involves quantitative methods, but you can also generate and test theories by analysing language (such as conversations, magazine articles, media broadcasts and so on). This involves qualitative methods and it is a topic for another book not written by me. People can get quite passionate about which of these methods is best, which is a bit silly because they are complementary, not competing, approaches and there are much more important issues in the world to get upset about. Having said that, all qualitative research is rubbish.²

  ² This is a joke. I thought long and hard about whether to include it because, like many of my jokes, there are people who won’t find it remotely funny. Its inclusion is also making me fear being hunted down and forced to eat my own entrails by a horde of rabid qualitative researchers. However, it made me laugh, a lot, and despite being vegetarian I’m sure my entrails will taste lovely.
- 34. 1.2.1. The research process

  How do I do research?

  How do you go about answering an interesting question? The research process is broadly summarized in Figure 1.2. You begin with an observation that you want to understand, and this observation could be anecdotal (you’ve noticed that your cat watches birds when they’re on TV but not when jellyfish are on³) or could be based on some data (you’ve got several cat owners to keep diaries of their cat’s TV habits and have noticed that lots of them watch birds on TV). From your initial observation you generate explanations, or theories, of those observations, from which you can make predictions (hypotheses). Here’s where the data come into the process because to test your predictions you need data. First you collect some relevant data (and to do that you need to identify things that can be measured) and then you analyse those data. The analysis of the data may support your theory or give you cause to modify the theory. As such, the processes of data collection and analysis and generating theories are intrinsically linked: theories lead to data collection/analysis and data collection/analysis informs theories! This chapter explains this research process in more detail.

  [Figure 1.2 The research process. Diagram labels: Data; Initial Observation (Research Question); Generate Theory; Generate Hypotheses; Collect Data to Test Theory (Identify Variables, Measure Variables); Analyse Data (Graph Data, Fit a Model)]

  1.3. Initial observation: finding something that needs explaining

  The first step in Figure 1.2 was to come up with a question that needs an answer. I spend rather more time than I should watching reality TV. Every year I swear that I won’t get hooked on Big Brother, and yet every year I find myself glued to the TV screen waiting for the next contestant’s meltdown (I am a psychologist, so really this is just research – honestly).

  ³ My cat does actually climb up and stare at the TV when it’s showing birds flying about.
- 35. One question I am constantly perplexed by is why every year there are so many contestants with really unpleasant personalities (my money is on narcissistic personality disorder⁴) on the show. A lot of scientific endeavour starts this way: not by watching Big Brother, but by observing something in the world and wondering why it happens. Having made a casual observation about the world (Big Brother contestants on the whole have profound personality defects), I need to collect some data to see whether this observation is true (and not just a biased observation). To do this, I need to define one or more variables that I would like to measure. There’s one variable in this example: the personality of the contestant. I could measure this variable by giving them one of the many well-established questionnaires that measure personality characteristics. Let’s say that I did this and I found that 75% of contestants did have narcissistic personality disorder. These data support my observation: a lot of Big Brother contestants have extreme personalities.

  1.4. Generating theories and testing them

  The next logical thing to do is to explain these data (Figure 1.2). One explanation could be that people with narcissistic personality disorder are more likely to audition for Big Brother than those without. This is a theory. Another possibility is that the producers of Big Brother are more likely to select people who have narcissistic personality disorder to be contestants than those with less extreme personalities. This is another theory. We verified our original observation by collecting data, and we can collect more data to test our theories. We can make two predictions from these two theories.
  The first is that the number of people turning up for an audition that have narcissistic personality disorder will be higher than the general level in the population (which is about 1%). A prediction from a theory, like this one, is known as a hypothesis (see Jane Superbrain Box 1.1). We could test this hypothesis by getting a team of clinical psychologists to interview each person at the Big Brother audition and diagnose them as having narcissistic personality disorder or not. The prediction from our second theory is that if the Big Brother selection panel are more likely to choose people with narcissistic personality disorder then the rate of this disorder in the final contestants will be even higher than the rate in the group of people going for auditions. This is another hypothesis. Imagine we collected these data; they are in Table 1.1. In total, 7662 people turned up for the audition. Our first hypothesis is that the percentage of people with narcissistic personality disorder will be higher at the audition than the general level in the population.

  Table 1.1 A table of the number of people at the Big Brother audition split by whether they had narcissistic personality disorder and whether they were selected as contestants by the producers

  |          | No Disorder | Disorder | Total |
  |----------|-------------|----------|-------|
  | Selected | 3           | 9        | 12    |
  | Rejected | 6805        | 845      | 7650  |
  | Total    | 6808        | 854      | 7662  |

  ⁴ This disorder is characterized by (among other things) a grandiose sense of self-importance, arrogance, lack of empathy for others, envy of others and belief that others envy them, excessive fantasies of brilliance or beauty, the need for excessive admiration and exploitation of others.
- 36. We can see in the table that of the 7662 people at the audition, 854 were diagnosed with the disorder, this is about 11% (854/7662 × 100) which is much higher than the 1% we’d expect. Therefore, hypothesis 1 is supported by the data. The second hypothesis was that the Big Brother selection panel have a bias to choose people with narcissistic personality disorder. If we look at the 12 contestants that they selected, 9 of them had the disorder (a massive 75%). If the producers did not have a bias we would have expected only 11% of the contestants to have the disorder. The data again support our hypothesis. Therefore, my initial observation that contestants have personality disorders was verified by data, then my theory was tested using specific hypotheses that were also verified using data. Data are very important!

  I would now be smugly sitting in my office with a contented grin on my face about how my theories and observations were well supported by the data. Perhaps I would quit while I’m ahead and retire. It’s more likely, though, that having solved one great mystery, my excited mind would turn to another. After another few hours (well, days probably) locked up at home watching Big Brother I would emerge triumphant with another profound observation, which is that these personality-disordered contestants, despite their obvious character flaws, enter the house convinced that the public will love them and that they will win.⁶ My hypothesis would, therefore, be that if I asked the contestants if they thought that they would win, the people with a personality disorder would say yes.

  JANE SUPERBRAIN 1.1 When is a hypothesis not a hypothesis?

  A good theory should allow us to make statements about the state of the world. Statements about the world are good things: they allow us to make sense of our world, and to make decisions that affect our future. One current example is global warming. Being able to make a definitive statement that global warming is happening, and that it is caused by certain practices in society, allows us to change these practices and, hopefully, avert catastrophe. However, not all statements are ones that can be tested using science. Scientific statements are ones that can be verified with reference to empirical evidence, whereas non-scientific statements are ones that cannot be empirically tested. So, statements such as ‘The Led Zeppelin reunion concert in London in 2007 was the best gig ever’,⁵ ‘Lindt chocolate is the best food’, and ‘This is the worst statistics book in the world’ are all non-scientific; they cannot be proved or disproved. Scientific statements can be confirmed or disconfirmed empirically. ‘Watching Curb Your Enthusiasm makes you happy’, ‘having sex increases levels of the neurotransmitter dopamine’ and ‘Velociraptors ate meat’ are all things that can be tested empirically (provided you can quantify and measure the variables concerned). Non-scientific statements can sometimes be altered to become scientific statements, so ‘The Beatles were the most influential band ever’ is non-scientific (because it is probably impossible to quantify ‘influence’ in any meaningful way) but by changing the statement to ‘The Beatles were the best-selling band ever’ it becomes testable (we can collect data about worldwide record sales and establish whether The Beatles have, in fact, sold more records than any other music artist). Karl Popper, the famous philosopher of science, believed that non-scientific statements were nonsense, and had no place in science. Good theories should, therefore, produce hypotheses that are scientific statements.

  ⁵ It was pretty awesome actually.

  ⁶ One of the things I like about Big Brother in the UK is that year upon year the winner tends to be a nice person, which does give me faith that humanity favours the nice.
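The two percentages used to evaluate the hypotheses can be recomputed directly from the counts in Table 1.1. The following is a minimal Python sketch; the variable names are my own and Python is an assumption of convenience, since the book carries out its analyses in SPSS.

```python
# Counts taken from Table 1.1
selected = {"no_disorder": 3, "disorder": 9}
rejected = {"no_disorder": 6805, "disorder": 845}

# Everyone who auditioned, and everyone diagnosed with the disorder
total_auditioned = sum(selected.values()) + sum(rejected.values())
total_disorder = selected["disorder"] + rejected["disorder"]

# Hypothesis 1: rate of the disorder among all auditionees (compare with about 1% in the population)
pct_audition = 100 * total_disorder / total_auditioned

# Hypothesis 2: rate of the disorder among the selected contestants (compare with the audition rate)
pct_selected = 100 * selected["disorder"] / sum(selected.values())

print(f"{pct_audition:.1f}% of auditionees, {pct_selected:.0f}% of selected contestants")
```

This only reproduces the descriptive comparison made in the text (about 11% versus 1%, and 75% versus about 11%); it is not a significance test.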
- 37. Let’s imagine I tested my hypothesis by measuring their expectations of success in the show, by just asking them, ‘Do you think you will win Big Brother?’. Let’s say that 7 of 9 contestants with personality disorders said that they thought that they would win, which confirms my observation. Next, I would come up with another theory: these contestants think that they will win because they don’t realize that they have a personality disorder. My hypothesis would be that if I asked these people about whether their personalities were different from other people they would say ‘no’. As before, I would collect some more data and perhaps ask those who thought that they would win whether they thought that their personalities were different from the norm. All 7 contestants said that they thought their personalities were different from the norm. These data seem to contradict my theory. This is known as falsification, which is the act of disproving a hypothesis or theory. It’s unlikely that we would be the only people interested in why individuals who go on Big Brother have extreme personalities and think that they will win. Imagine these researchers discovered that: (1) people with narcissistic personality disorder think that they are more interesting than others; (2) they also think that they deserve success more than others; and (3) they also think that others like them because they have ‘special’ personalities. This additional research is even worse news for my theory: if they didn’t realize that they had a personality different from the norm then you wouldn’t expect them to think that they were more interesting than others, and you certainly wouldn’t expect them to think that others will like their unusual personalities. In general, this means that my theory sucks: it cannot explain all of the data, predictions from the theory are not supported by subsequent data, and it cannot explain other research findings.
At this point I would start to feel intellectually inadequate and people would find me curled up on my desk in floods of tears wailing and moaning about my failing career (no change there then). At this point, a rival scientist, Fester Ingpant-Stain, appears on the scene with a rival theory to mine. In his new theory, he suggests that the problem is not that personality-disordered contestants don’t realize that they have a personality disorder (or at least a personality that is unusual), but that they falsely believe that this special personality is perceived positively by other people (put another way, they believe that their personality makes them likeable, not dislikeable). One hypothesis from this model is that if personality-disordered contestants are asked to evaluate what other people think of them, then they will overestimate other people’s positive perceptions. To test this hypothesis, Fester Ingpant-Stain collected yet more data. When each contestant came to the diary room they had to fill out a questionnaire evaluating all of the other contestants’ personalities, and also answer each question as if they were each of the contestants responding about them. (So, for every contestant there is a measure of what they thought of every other contestant, and also a measure of what they believed every other contestant thought of them.) He found out that the contestants with personality disorders did overestimate their housemates’ view of them; in comparison the contestants without personality disorders had relatively accurate impressions of what others thought of them. These data, irritating as it would be for me, support the rival theory that the contestants with personality disorders know they have unusual personalities but believe that these characteristics are ones that others would feel positive about. Fester Ingpant-Stain’s theory is quite good: it explains the initial observations and brings together a range of research findings.
The end result of this whole process (and my career) is that we should be able to make a general statement about the state of the world. In this case we could state: ‘Big Brother contestants who have personality disorders overestimate how much other people like their personality characteristics’.

  SELF-TEST Based on what you have read in this section, what qualities do you think a scientific theory should have?

  Are Big Brother contestants odd?
- 38. 1.5. Data collection 1: what to measure

  We have seen already that data collection is vital for testing theories. When we collect data we need to decide on two things: (1) what to measure, (2) how to measure it. This section looks at the first of these issues.

  1.5.1. Variables

  1.5.1.1. Independent and dependent variables

  To test hypotheses we need to measure variables. Variables are just things that can change (or vary); they might vary between people (e.g. IQ, behaviour) or locations (e.g. unemployment) or even time (e.g. mood, profit, number of cancerous cells). Most hypotheses can be expressed in terms of two variables: a proposed cause and a proposed outcome. For example, if we take the scientific statement ‘Coca-Cola is an effective spermicide’⁷ then the proposed cause is ‘Coca-Cola’ and the proposed effect is dead sperm. Both the cause and the outcome are variables: for the cause we could vary the type of drink, and for the outcome, these drinks will kill different amounts of sperm. The key to testing such statements is to measure these two variables. A variable that we think is a cause is known as an independent variable (because its value does not depend on any other variables). A variable that we think is an effect is called a dependent variable because the value of this variable depends on the cause (independent variable). These terms are very closely tied to experimental methods in which the cause is actually manipulated by the experimenter (as we will see in section 1.6.2). In cross-sectional research we don’t manipulate any variables, and we cannot make causal statements about the relationships between variables, so it doesn’t make sense to talk of dependent and independent variables because all variables are dependent variables in a sense. One possibility is to abandon the terms dependent and independent variable and use the terms predictor variable and outcome variable.
  In experimental work the cause, or independent variable, is a predictor, and the effect, or dependent variable, is simply an outcome. This terminology also suits cross-sectional work where, statistically at least, we can use one or more variables to make predictions about the other(s) without needing to imply causality.

  ⁷ Actually, there is a long-standing urban myth that a post-coital douche with the contents of a bottle of Coke is an effective contraceptive. Unbelievably, this hypothesis has been tested and Coke does affect sperm motility, and different types of Coke are more or less effective – Diet Coke is best apparently (Umpierre, Hill, & Anderson, 1985). Nevertheless, a Coke douche is ineffective at preventing pregnancy.

  CRAMMING SAM’s Tips: Some important terms

  When doing research there are some important generic terms for variables that you will encounter:

  - Independent variable: A variable thought to be the cause of some effect. This term is usually used in experimental research to denote a variable that the experimenter has manipulated.
  - Dependent variable: A variable thought to be affected by changes in an independent variable. You can think of this variable as an outcome.
  - Predictor variable: A variable thought to predict an outcome variable. This is basically another term for independent variable (although some people won’t like me saying that; I think life would be easier if we talked only about predictors and outcomes).
  - Outcome variable: A variable thought to change as a function of changes in a predictor variable. This term could be synonymous with ‘dependent variable’ for the sake of an easy life.
- 39. 1.5.1.2. Levels of measurement

  As we have seen in the examples so far, variables can take on many different forms and levels of sophistication. The relationship between what is being measured and the numbers that represent what is being measured is known as the level of measurement. Broadly speaking, variables can be categorical or continuous, and can have different levels of measurement.

  A categorical variable is made up of categories. A categorical variable that you should be familiar with already is your species (e.g. human, domestic cat, fruit bat, etc.). You are a human or a cat or a fruit bat: you cannot be a bit of a cat and a bit of a bat, and neither a batman nor (despite many fantasies to the contrary) a catwoman (not even one in a nice PVC suit) exists. A categorical variable is one that names distinct entities. In its simplest form it names just two distinct types of things, for example male or female. This is known as a binary variable. Other examples of binary variables are being alive or dead, pregnant or not, and responding ‘yes’ or ‘no’ to a question. In all cases there are just two categories and an entity can be placed into only one of the two categories. When two things that are equivalent in some sense are given the same name (or number), but there are more than two possibilities, the variable is said to be a nominal variable.

  It should be obvious that if the variable is made up of names it is pointless to do arithmetic on them (if you multiply a human by a cat, you do not get a hat). However, sometimes numbers are used to denote categories. For example, consider the numbers worn by players in a rugby or football (soccer) team. In rugby, the numbers of shirts denote specific field positions, so the number 10 is always worn by the fly-half (e.g. England’s Jonny Wilkinson),⁸ and the number 2 is always the hooker (the ugly-looking player at the front of the scrum). These numbers do not tell us anything other than what position the player plays. We could equally have shirts with FH and H instead of 10 and 2. A number 10 player is not necessarily better than a number 2 (most managers would not want their fly-half stuck in the front of the scrum!). It is equally daft to try to do arithmetic with nominal scales where the categories are denoted by numbers: the number 10 takes penalty kicks, and if the England coach found that Jonny Wilkinson (his number 10) was injured he would not get his number 4 to give number 6 a piggyback and then take the kick. The only way that nominal data can be used is to consider frequencies. For example, we could look at how frequently number 10s score tries compared to number 4s.

  ⁸ Unlike, for example, NFL American football where a quarterback could wear any number from 1 to 19.

  JANE SUPERBRAIN 1.2 Self-report data

  A lot of self-report data are ordinal. Imagine if two judges at our beauty pageant were asked to rate Billie’s beauty on a 10-point scale. We might be confident that a judge who gives a rating of 10 found Billie more beautiful than one who gave a rating of 2, but can we be certain that the first judge found her five times more beautiful than the second? What about if both judges gave a rating of 8, could we be sure they found her equally beautiful? Probably not: their ratings will depend on their subjective feelings about what constitutes beauty. For these reasons, in any situation in which we ask people to rate something subjective (e.g. rate their preference for a product, their confidence about an answer, how much they have understood some medical instructions) we should probably regard these data as ordinal although many scientists do not.
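Returning to the point that nominal data can only be used to consider frequencies, here is a minimal Python sketch of counting how often each shirt number scores a try. The list of try scorers is invented purely for illustration (it is not data from the book), and Python is an assumption of convenience.

```python
from collections import Counter

# Hypothetical record of the shirt number that scored each try (made-up data).
try_scorers = [10, 4, 10, 14, 10, 4, 11]

# Shirt numbers are labels, so the only sensible summary is a frequency count.
tries_by_shirt = Counter(try_scorers)

print("Number 10s:", tries_by_shirt[10])
print("Number 4s:", tries_by_shirt[4])
```

Counting is meaningful here, whereas averaging the shirt numbers would not be: the mean of these labels says nothing about the players, in the same way that swapping 10 for FH would change nothing about the counts.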