# Appendix
## Interpreting Interactions in a Regression Model Overview
### Two-Way Interactions
#### General
Let our regression model follow this form:
$$
Y = A + B + A*B
$$
where $Y$ represents our dependent (outcome) variable and $A*B$ represents
the interaction between $A$ and $B$.
- The regression coefficient for $A$ shows the effect of $A$ when $B=0$.
- The regression coefficient for $B$ shows the effect of $B$ when $A=0$.
- The regression coefficient for $A*B$ shows how the effect of $A$ on $Y$
  changes with a one-unit increase in $B$. Equivalently, it shows how the
  effect of $B$ on $Y$ changes with a one-unit increase in $A$.
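
Writing the model with explicit coefficients makes these statements easier to check. The symbols $b_0$ through $b_3$ below are notation assumed here for illustration; they are not part of the schematic model above.

$$
Y = b_0 + b_1 A + b_2 B + b_3 (A*B)
$$

The change in $Y$ for a one-unit increase in $A$, holding $B$ fixed, is then

$$
\big[b_0 + b_1 (A+1) + b_2 B + b_3 (A+1) B\big] - \big[b_0 + b_1 A + b_2 B + b_3 A B\big] = b_1 + b_3 B
$$

Setting $B=0$ leaves $b_1$ alone, and each one-unit increase in $B$ shifts the effect of $A$ by $b_3$. By symmetry, the effect of a one-unit increase in $B$ is $b_2 + b_3 A$.
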
#### Two Categorical Variables
- Let $A$ represent gender
  - 0=Female
  - 1=Male
- Let $B$ represent treatment condition
  - 0=Control
  - 1=Experimental
- The interaction regression coefficient shows whether the effect of
treatment condition is different for males and females.
- The regression coefficient for $A$ shows the difference in $Y$ between
males and females for the 'control' treatment group.
- The regression coefficient for $B$ shows the difference in $Y$ between
treatment and control groups for females.
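
The interpretation above can be checked with a small numeric sketch. The coefficient values below are invented purely for illustration (they do not come from any dataset in this document), and the function name `predicted` is hypothetical.

```python
# Hypothetical coefficients, chosen only for illustration.
b0, b_gender, b_treat, b_inter = 10.0, 2.0, 3.0, 1.5


def predicted(gender: int, treat: int) -> float:
    """Predicted Y for gender (0=Female, 1=Male) and treatment (0=Control, 1=Experimental)."""
    return b0 + b_gender * gender + b_treat * treat + b_inter * gender * treat


# Coefficient for A: male-female difference within the control group.
gender_gap_control = predicted(1, 0) - predicted(0, 0)

# Coefficient for B: treatment effect among females.
treat_effect_female = predicted(0, 1) - predicted(0, 0)

# Interaction coefficient: how much the treatment effect differs for males.
treat_effect_male = predicted(1, 1) - predicted(1, 0)
interaction = treat_effect_male - treat_effect_female
```

In this sketch the difference between the two treatment effects recovers the interaction coefficient, which is the sense in which the interaction "shows whether the effect of treatment condition is different for males and females".
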
#### One Categorical and One Continuous Variable
- Let $A$ represent gender
  - 0=Female
  - 1=Male
- Let $B$ represent a continuous variable: age in years.
- The interaction regression coefficient shows if the effect of age on
$Y$ is different for males and females.
- The regression coefficient for $A$ shows the difference between males
and females when age is equal to zero.
- The regression coefficient for $B$ shows the effect of age for
females.
#### Two Continuous Variables
- Let $A$ represent a continuous variable: IQ score.
- Let $B$ represent a continuous variable: Age.
- The interaction regression coefficient shows
  - if the relationship between age and $Y$ differs according to IQ
  - if the relationship between IQ and $Y$ differs according to age.
- The regression coefficient for $A$ shows the relationship between IQ
and $Y$ when age equals zero.
- The regression coefficient for $B$ shows the relationship between age
and $Y$ when IQ equals zero.
### Three-Way Interactions
The same principles apply from above. The general model:
$$
Y = A + B + C + A*B + A*C + B*C + A*B*C
$$
- The coefficient for $A$ shows the effect of $A$ on $Y$ when both $B$ and $C$
are zero.
- The coefficient for $B$ shows the effect of $B$ on $Y$ when both $A$ and $C$
are zero.
- The coefficient for $C$ shows the effect of $C$ on $Y$ when both $A$ and $B$
are zero.
- The coefficient for $A*B$ shows the interaction between $A$ and $B$ when
$C$ is zero.
- The coefficient for $A*C$ shows the interaction between $A$ and $C$ when
$B$ is zero.
- The coefficient for $B*C$ shows the interaction between $B$ and $C$ when
$A$ is zero.
- The coefficient for $A*B*C$ (the three-way interaction) shows whether the
  relationship between
  - $A$ and $Y$ differs according to $B$ and $C$
  - $B$ and $Y$ differs according to $A$ and $C$
  - $C$ and $Y$ differs according to $A$ and $B$.
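
As a rough illustration of the three-way case, the sketch below computes the effect of $A$ at chosen values of $B$ and $C$. All coefficient values and the names `COEF`, `effect_of_a`, and `ab_interaction` are hypothetical, introduced only for this example.

```python
# Hypothetical coefficients for Y = A + B + C + A*B + A*C + B*C + A*B*C.
COEF = {"A": 1.0, "B": 0.5, "C": 0.25, "AB": 0.5, "AC": -0.25, "BC": 0.75, "ABC": 2.0}


def effect_of_a(b: float, c: float) -> float:
    """Change in predicted Y for a one-unit increase in A at the given B and C."""
    return COEF["A"] + COEF["AB"] * b + COEF["AC"] * c + COEF["ABC"] * b * c


def ab_interaction(c: float) -> float:
    """Change in the effect of A per one-unit increase in B, at the given C."""
    return COEF["AB"] + COEF["ABC"] * c


effect_at_zero = effect_of_a(0, 0)   # equals the coefficient for A
ab_at_c0 = ab_interaction(0)         # equals the coefficient for A*B
ab_at_c1 = ab_interaction(1)         # shifted by the A*B*C coefficient
```

The coefficient for $A$ is recovered only when both $B$ and $C$ are zero, and the $A*B$ interaction itself changes with $C$ by the amount of the three-way coefficient.
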
## Exercise Solutions
### Exercise 1
In order to analyze data properly in SPSS, we need to follow the
guidelines set out above. Open [exercise1_data.sav](data/exercise1/exercise1_data.sav) and see what guidelines
we have ignored.
#### Exercise 1 Solution
[![](./media/spss-exer-image1.png){width="4.229166666666667in" height="3.2511712598425198in"}](./media/spss-exer-image1.png){target="_blank"}
- Too much information is contained in one variable
  (CTSSurgTypeCatCodeDesc, LOS, SURGLOS, DCDate, etc.)
- Errors can easily be found by sorting (errors in Year, AGE)
- The same content is entered differently within a single variable (SEX,
  HTN, SMOKING)
- Anything else?
### Exercise 2
Open [exercise2_data.xls](data/exercise2/exercise2_data.xls) (an Excel file). Modify this Excel file so that
it can be imported into SPSS properly. Save the file and close it.
Open the file in SPSS (import it). Export this file back into Excel,
but only save the following variables: id, salary, minority.
#### Exercise 2 Solution
[![](./media/spss-exer-image2.png){width="6.5in" height="4.440387139107612in"}](./media/spss-exer-image2.png){target="_blank"}
- Delete the first three rows of data (remove heading)
- Remove rows 23 and 24 (they contain summary information)
- Remove the formatting (fill color)
- Save the file as Exercise2\_Data\_Ready
[![](./media/spss-exer-image3.png){width="6.5in" height="4.440387139107612in"}](./media/spss-exer-image3.png){target="_blank"}
- Close Exercise2_Data_Ready
- Open SPSS
- Select File -> Open -> Data
- Under "Files of Type" select either "All Files" or "Excel" to view
Exercise2_Data_Ready, select the file, then select "Open"
[![](./media/spss-exer-image4.png){width="6.239583333333333in" height="3.3229166666666665in"}](./media/spss-exer-image4.png){target="_blank"}
- A window appears
- Check the box so the variable names will be imported
- Select the sheet of the Excel file that you would like to be read
in, then select "Ok"
[![](./media/spss-exer-image5.png){width="4.041666666666667in" height="2.5in"}](./media/spss-exer-image5.png){target="_blank"}
- The Excel data should now open in the Data Editor
- Delete any "blank" rows or columns of data (indicated by `.`) by
highlighting them, right-clicking, and selecting "Cut"
[![](./media/spss-exer-image6.png){width="4.550594925634296in" height="3.4982699037620297in"}](./media/spss-exer-image6.png){target="_blank"}
- Select File -> Save As
- Let the file name be Exercise2_Data_Ready_short
- Change the file type to Excel 97 through 2003 (\*.xls)
- Select the "Variables..." button
- Select the "Drop All" button
- Under the "Keep" column, check the box for id, salary, minority
- Select "Continue"
[![](./media/spss-exer-image7.png){width="3.7843613298337706in" height="2.914981408573928in"}](./media/spss-exer-image7.png){target="_blank"}
- Select "Save"
- Open the new file (Exercise2_Data_Ready_short) to investigate the
results
### Exercise 3
Open [exercise3_data.sav](data/exercise3/exercise3_data.sav) and go to Variable View. Practice defining the
correct attributes for each variable by following the code book.
|Name |Label |Value Label |Missing Values |Measure
|:-------------|:--------------------------------------|:--------------------------------|:--------------|:-------
|IDnum |<none> |<none> |<none> |Scale
|sex |Respondent's Sex |1 = Male |<none> |Nominal
| | |2 = Female | |
|race |Race of Respondent |1 = White |<none> |Nominal
| | |2 = Black | |
| | |3 = Other | |
|region |Region of the United States |1 = North East |<none> |Nominal
| | |2 = South East | |
| | |3 = West | |
|happy |General Happiness |0 = NAP |0, 8, 9 |Ordinal
| | |1 = Very Happy | |
| | |2 = Pretty Happy | |
| | |3 = Not too Happy | |
| | |8 = DK | |
| | |9 = NA | |
|life |Is Life Exciting or Dull |0 = NAP |0, 8, 9 |Ordinal
| | |1 = Exciting | |
| | |2 = Routine | |
| | |3 = Dull | |
| | |8 = DK | |
| | |9 = NA | |
|sibs |Number of Brothers and Sisters |98 = DK |98, 99 |Scale
| | |99 = NA | |
|childs |Number of Children |8 = Eight or More |9 |Scale
| | |9 = NA | |
|age |Age of Respondent |98 = DK |0, 98, 99 |Scale
| | |99 = NA | |
|educ |Highest Year of School Completed |97 = NAP |97, 98, 99 |Scale
| | |98 = DK | |
| | |99 = NA | |
|paeduc |Highest Year School, Father |97 = NAP |97, 98, 99 |Scale
| | |98 = DK | |
| | |99 = NA | |
|maeduc |Highest Year School, Mother |97 = NAP |97, 98, 99 |Scale
| | |98 = DK | |
| | |99 = NA | |
|speduc |Highest Year School, Spouse |97 = NAP |97, 98, 99 |Scale
| | |98 = DK | |
| | |99 = NA | |
|prestg80 |Occupational Prestige Score |0 = DK,NA,NAP |0 |Scale
|occcat80 |Occupational Category |1 = Managerial and Professional |<none> |Nominal
| | |2 = Technical and Sales | |
| | |3 = Service | |
| | |4 = Farming, Forest, and Fishing | |
| | |5 = Production and Craft | |
| | |6 = General Labor | |
#### Exercise 3 Solution
- In Variable View, the first four columns do not need to be modified
- To modify the variable label, click in the cell that you wish to
edit and start typing in the label
- To modify the value labels, click the cell that you wish to edit and
then select the box with three small dots. The following window will
appear:
[![](./media/spss-exer-image9.png){width="4.854166666666667in" height="3.125in"}](./media/spss-exer-image9.png){target="_blank"}
- Enter the value and label, then select "Add". Once all possible
value labels are added, select "OK"
- When value labels (or other attributes such as label or missing)
repeat for a variable, you can copy and paste the attribute values.
Right click on the cell you want to copy, select copy. Then right
click on the cell that you would like to paste in, and select paste.
[![](./media/spss-exer-image10.png){width="6.5in" height="3.5605489938757655in"}](./media/spss-exer-image10.png){target="_blank"}
- Enter missing values in a similar fashion; here we have discrete
missing values
- Use the drop down menu for "Measure" to specify the correct
measurement type
[![](./media/spss-exer-image11.png){width="3.3541666666666665in" height="2.4895833333333335in"}](./media/spss-exer-image11.png){target="_blank"}
### Exercise 4
Open [exercise4_data.sav](data/exercise4/exercise4_data.sav).
Compute a new variable that is the change from beginning salary to
current salary for each employee.
Recode the education variable into a new variable according to the
following
- 1=High School or Less (educ\<=12)
- 2=Some College (12\<educ\<=16)
- 3=Bachelor's Degree or Higher (educ\>=17)
#### Exercise 4 Solution
Compute a new variable that is the change from beginning salary to
current salary for each employee.
- Transform -> Compute Variable
- Select "Reset"
- Enter the following information
- Target Variable: salchange
- Double click (or use the arrow) to move salary to the Numeric
Expression window
- Use the calculator box below the numeric expression box to enter a
minus sign (alternatively, you could type a minus sign) then select
salbegin
- Select OK, and the new variable will appear in the data set
[![](./media/spss-exer-image12.png){width="6.459686132983377in" height="5.278486439195101in"}](./media/spss-exer-image12.png){target="_blank"}
Recode the education variable into a new variable according to the
following
- 1=High School or Less (educ\<=12)
- 2=Some College (12\<educ\<=16)
- 3=Bachelor's Degree or Higher (educ\>=17)
<!-- -->
- Transform -> Recode into different variables
- Move education (educ) into the Input Variable -> Output Variable window
by double clicking on it or using the arrow
- Name: EducRecode
- Label: Leave Blank
- Click the change button
- Under old value, select the radio button for Range, LOWEST through
value: enter 12
- Under new value, select the radio button for Value: enter 1
- Select Add
- Under old value, select the radio button for Range: enter 13 through
16
- Under new value, select the radio button for Value: enter 2
- Select Add
- Under old value, select the radio button for Range, value through
HIGHEST: enter 17
- Under new value, select the radio button for Value: enter 3
- Select Add
- Select Continue
- Select OK
- Check the dataset in Data View
[![](./media/spss-exer-image13.png){width="6.5in" height="3.7894663167104112in"} ](./media/spss-exer-image13.png){target="_blank"}
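
For readers who want to check the logic outside the dialogs, both transformations in this exercise can be sketched in a few lines of Python. This is only an illustration of the rules stated in the exercise, not SPSS syntax; the function names and the sample values are hypothetical, while `salary`, `salbegin`, `educ`, `salchange`, and `EducRecode` are the variable names used above.

```python
def salary_change(salary, salbegin):
    """Change from beginning salary to current salary."""
    return salary - salbegin


def recode_educ(educ):
    """Recode years of education following the rule stated in the exercise."""
    if educ is None:      # leave missing values missing
        return None
    if educ <= 12:
        return 1          # High School or Less
    if educ <= 16:
        return 2          # Some College (12 < educ <= 16)
    return 3              # Bachelor's Degree or Higher (educ >= 17)


# Hypothetical employee records, invented for illustration.
employees = [
    {"salary": 57000, "salbegin": 27000, "educ": 15},
    {"salary": 21450, "salbegin": 12000, "educ": 12},
    {"salary": 45000, "salbegin": 21000, "educ": 19},
]
for employee in employees:
    employee["salchange"] = salary_change(employee["salary"], employee["salbegin"])
    employee["EducRecode"] = recode_educ(employee["educ"])
```

Checking the boundary values (12, 13, 16, 17) by hand in Data View is a quick way to confirm that the recode ranges were entered as intended.
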
### Exercise 5
Open [exercise5_data.sav](data/exercise5/exercise5_data.sav).
Select male managers. What is their average age?
(You can obtain the average age by choosing Analyze -> Descriptive Statistics -> Descriptives and moving “Age of Respondent (age)” to the right hand side.)
Use the “Split File” procedure to get the average age for each job category.
#### Exercise 5 Solution
Select male managers. What is their average age?
- Check Values for sex and occcat80 to see what values correspond to
"male" and "manager" (it's 1 and 1).
- Data -> Select Cases
- Under Select: Select the If Condition is Satisfied radio button and
select the If button
[![](./media/spss-exer-image14.png){width="4.327660761154855in" height="4.396551837270342in"}](./media/spss-exer-image14.png){target="_blank"}
- Enter the following information
- Open box should read as follows: sex=1 & occcat80=1
- Continue
[![](./media/spss-exer-image15.png){width="4.5in" height="3.0317311898512687in"}](./media/spss-exer-image15.png){target="_blank"}
- Under Output: Select Filter Out Unselected Cases
- Select OK
- Inspect the data in Data View
- Analyze -> Descriptive Statistics -> Descriptives
[![](./media/spss-exer-image16.png){width="4.387911198600175in" height="2.52586176727909in"}](./media/spss-exer-image16.png){target="_blank"}
- Select the age variable, select OK
- Turn off the filter!
[![](./media/MM_age.png){width="4in" height="1in"}](./media/MM_age.png){target="_blank"}
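
The Select Cases condition `sex=1 & occcat80=1` amounts to filtering rows before averaging. A minimal Python sketch of that idea follows; the rows are invented for illustration and are not taken from exercise5_data.sav.

```python
from statistics import mean

# Hypothetical respondents; sex: 1=Male, 2=Female; occcat80: 1=Managerial and Professional.
rows = [
    {"sex": 1, "occcat80": 1, "age": 40},
    {"sex": 1, "occcat80": 1, "age": 50},
    {"sex": 2, "occcat80": 1, "age": 30},
    {"sex": 1, "occcat80": 3, "age": 25},
]

# Keep only cases that satisfy both conditions, as the filter does.
male_manager_ages = [row["age"] for row in rows if row["sex"] == 1 and row["occcat80"] == 1]
mean_age = mean(male_manager_ages)
```

Only the first two rows satisfy both conditions, so only their ages enter the average; the unselected cases are ignored rather than deleted, which mirrors the "Filter Out Unselected Cases" option.
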
Use the “Split File” procedure to get the average age for each job category.
- Data -> Split File
- Select Compare Groups
- Select occcat80 (Occupational Category) and move it into the Groups
Based On window by double clicking (or using the arrow)
- Select Sort the File by Grouping Variables
- Select Ok
[![](./media/spss-exer-image17.png){width="4.004309930008749in" height="2.963445975503062in"}](./media/spss-exer-image17.png){target="_blank"}
- Analyze -> Descriptive Statistics -> Descriptives
- Select the age variable and OK
[![](./media/spss-exer-image16.png){width="4.387911198600175in" height="2.52586176727909in"}](./media/spss-exer-image16.png){target="_blank"}
- Turn off the split file!
[![](./media/OCC_age.png)](./media/OCC_age.png){target="_blank"}
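
Conceptually, Split File groups the cases by the chosen variable and then runs the requested statistics once per group. The sketch below shows that grouping step in Python with invented data; it is an illustration of the idea, not of how SPSS implements it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical respondents, invented for illustration.
rows = [
    {"occcat80": 1, "age": 40},
    {"occcat80": 1, "age": 50},
    {"occcat80": 3, "age": 25},
    {"occcat80": 3, "age": 35},
    {"occcat80": 6, "age": 60},
]

# Collect ages separately for each occupational category.
ages_by_category = defaultdict(list)
for row in rows:
    ages_by_category[row["occcat80"]].append(row["age"])

# One mean per category, in sorted category order.
mean_age_by_category = {
    category: mean(ages) for category, ages in sorted(ages_by_category.items())
}
```
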
### Exercise 6
Convert exercise6_data from "Wide" format to "Long" format
#### Exercise 6 Solution
- Open [exercise6_data.sav](data/exercise6/exercise6_data.sav)
- Select Data -> Restructure to open the Wizard
- Select "Restructure selected variables into cases" then "Next"
[![](./media/spss-exer-image18.png){width="5.472672790901138in" height="5.316558398950131in"}](./media/spss-exer-image18.png){target="_blank"}
- How many variable groups do you want to restructure? Select "One"
then "Next"
[![](./media/spss-exer-image19.png){width="6.5in" height="6.314580052493438in"}](./media/spss-exer-image19.png){target="_blank"}
- Case Group Identification should be changed to "Use selected
variable" and the variable should be the ID variable
- Variables to be transposed: Move the X variables over (X1, X2, X3)
- Fixed Variable(s): Move Group and Age over
- Select "Next"
[![](./media/spss-exer-image20.png){width="6.5in" height="6.314580052493438in"}](./media/spss-exer-image20.png){target="_blank"}
- How many index variables do you want to create? Select "one" then
"Next"
[![](./media/spss-exer-image21.png){width="6.5in" height="6.314580052493438in"}](./media/spss-exer-image21.png){target="_blank"}
- What kind of index values? Select "Sequential Numbers" then select
"Next"
[![](./media/spss-exer-image22.png){width="6.5in" height="6.314580052493438in"}](./media/spss-exer-image22.png){target="_blank"}
- Handling of Variables not Selected: Select "Keep and treat as fixed
variable(s)"
- System Missing or Blank Values in All Transposed Variables: Select
"Create a case in the new file"
- Leave "Case Count Variable" unchecked
- Select "Next"
[![](./media/spss-exer-image23.png){width="6.5in" height="6.314580052493438in"} ](./media/spss-exer-image23.png){target="_blank"}
- What do you want to do? Select "Restructure the data now". In the
future you may want to keep the syntax.
- Select "Finish"
- The following message appears, click "OK"
[![](./media/spss-exer-image24.png){width="5.375in" height="1.4115113735783027in"} ](./media/spss-exer-image24.png){target="_blank"}
- Inspect the data (and change "trans1" to "X")
[![](./media/spss-exer-image25.png){width="5.207962598425197in" height="4.987069116360455in"} ](./media/spss-exer-image25.png){target="_blank"}
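
To see what the wizard does to the data, the wide-to-long restructure can be sketched in Python. The variable names `ID`, `Group`, `Age`, `X1` to `X3`, and `X` follow the steps above; the index name `Index1` and all data values are assumptions made for this example.

```python
# Hypothetical wide-format data: one row per ID, one column per measurement.
wide_rows = [
    {"ID": 1, "Group": 1, "Age": 30, "X1": 5, "X2": 6, "X3": 7},
    {"ID": 2, "Group": 2, "Age": 41, "X1": 8, "X2": 9, "X3": 10},
]
transposed = ["X1", "X2", "X3"]   # variables to be transposed
fixed = ["Group", "Age"]          # fixed variables copied to every new row

long_rows = []
for row in wide_rows:
    # Sequential index numbers 1, 2, 3 identify which column each value came from.
    for index, name in enumerate(transposed, start=1):
        new_row = {"ID": row["ID"]}
        new_row.update({key: row[key] for key in fixed})
        new_row["Index1"] = index
        new_row["X"] = row[name]
        long_rows.append(new_row)
```

Each original case produces three rows in the long file, one per transposed variable, with the fixed variables repeated on every row.
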
### Exercise 7
Convert exercise7_data from "Long" format to "Wide" format
#### Exercise 7 Solution
- Open [exercise7_data.sav](data/exercise7/exercise7_data.sav)
- Select Data -> Restructure to open the Wizard
[![](./media/spss-exer-image26.png){width="5.521132983377078in" height="5.363636264216973in"}](./media/spss-exer-image26.png){target="_blank"}
- Identifier Variable(s): ID
- Index Variable(s): Index1
- Select "Next"
[![](./media/spss-exer-image27.png){width="6.5in" height="6.314580052493438in"}](./media/spss-exer-image27.png){target="_blank"}
- Sort the current data? Yes
- Select "Next"
[![](./media/spss-exer-image28.png){width="6.5in" height="6.314580052493438in"}](./media/spss-exer-image28.png){target="_blank"}
- Order of New Variable Groups: Group by original variable
- Leave the other options unchecked
- Select "Next"
[![](./media/spss-exer-image29.png){width="6.5in" height="6.314580052493438in"}](./media/spss-exer-image29.png){target="_blank"}
- Select "Restructure the Data Now" and "Finish"
[![](./media/spss-exer-image30.png){width="6.375in" height="6.193145231846019in"}](./media/spss-exer-image30.png){target="_blank"}
- The following message will appear, select "OK". Inspect the data and
save!
[![](./media/spss-exer-image24.png){width="5.375in" height="1.4115113735783027in"} ](./media/spss-exer-image24.png){target="_blank"}
[![](./media/spss-exer-image31.png){width="6.482638888888889in" height="3.345138888888889in"} ](./media/spss-exer-image31.png){target="_blank"}
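
The reverse restructure can be sketched the same way. `ID` and `Index1` are the identifier and index variables named above; the measured variable `X`, the new column names `X1`, `X2`, `X3`, and the data values are hypothetical, and SPSS may name the new columns differently.

```python
# Hypothetical long-format data: one row per ID and index value.
long_rows = [
    {"ID": 2, "Index1": 1, "X": 8},
    {"ID": 1, "Index1": 2, "X": 6},
    {"ID": 1, "Index1": 1, "X": 5},
    {"ID": 1, "Index1": 3, "X": 7},
    {"ID": 2, "Index1": 3, "X": 10},
    {"ID": 2, "Index1": 2, "X": 9},
]

wide_by_id = {}
# Sort by identifier and index first, mirroring the "Sort the current data" step.
for row in sorted(long_rows, key=lambda r: (r["ID"], r["Index1"])):
    record = wide_by_id.setdefault(row["ID"], {"ID": row["ID"]})
    # One new column per index value.
    record[f"X{row['Index1']}"] = row["X"]

wide_rows = list(wide_by_id.values())
```
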
### Exercise 8
Open [exercise8_data.sav](data/exercise8/exercise8_data.sav)
**Part 1**: Investigate the variable attributes. Determine which
variables are categorical variables (nominal and ordinal), and which
variables are continuous (scale).
Obtain the appropriate descriptive statistics for each variable.
Remember, continuous variables should be investigated with Descriptives
and categorical variables should be investigated with frequency tables.
*Hint*: Select more than one variable in the Analyze -> Descriptive
Statistics -> Descriptives or Analyze -> Descriptive Statistics -> Frequencies dialog boxes.
**Part 2**: Assess the distribution of the Occupational Prestige Score
("prestg80") with both a histogram (normal curve displayed) and a Q-Q
plot. Is the assumption that the population of Occupational Prestige
Scores is normally distributed reasonable?
**Part 3**: Compare the average highest year of school completed
("educ") for males and females.
*Hint*: First split the file by "sex" (Data -> Split File), then
calculate the descriptive statistics. Be sure to return to the Split
File menu when you are done with this question and return the dialog
box to "Analyze all cases".
**Part 4**: Produce a pie chart for the variable "region". (We didn't
cover this, you can use either Chart Builder or Legacy Dialogs.)
#### Exercise 8 Solution
Open the dataset [exercise8_data.sav](data/exercise8/exercise8_data.sav)
**Part 1**
Investigate the variable attributes. Determine which
variables are categorical variables (nominal and ordinal), and which
variables are continuous (scale).
- Select the "Variable View" tab
- Investigate the labels and measure of each variable
[![](./media/spss-exer-image32.png){width="6.637844488188977in" height="3.8053280839895014in"} ](./media/spss-exer-image32.png){target="_blank"}
Obtain the appropriate descriptive statistics for each variable in the
dataset. Remember, continuous variables should be investigated with
5-point summary descriptives and categorical variables should be
investigated with frequency tables.
- Select Analyze -> Descriptive Statistics -> Descriptives
- Select the following variables: sibs, childs, age, educ, paeduc,
maeduc, speduc, prestg80
[![](./media/spss-exer-image33.png){width="4.870138888888889in" height="3.077777777777778in"}](./media/spss-exer-image33.png){target="_blank"}
- Select "OK"
- Notice there are only 519 respondents that have valid data points
for all of the continuous variables.
[![](./media/spss-exer-image33b.png){width="4.870138888888889in" height="3.077777777777778in"}](./media/spss-exer-image33b.png){target="_blank"}
Frequency Tables:
- Select Analyze -> Descriptive Statistics -> Frequencies
- Select the following variables: sex, region, race, happy, life,
occcat80
[![](./media/spss-exer-image34.png){width="4.870138888888889in" height="3.077777777777778in"}](./media/spss-exer-image34.png){target="_blank"}
- Investigate the output
**Part 2**: Assess the distribution of the Occupational Prestige Score
("prestg80") with both a histogram (normal curve displayed) and a Q-Q
plot. Is the assumption that the population of Occupational Prestige
Scores is normally distributed reasonable?
- Histogram in Legacy Dialogs
- Select Graphs -> Legacy Dialogs -> Histogram
- Variable: prestg80
- Check box to display normal curve
- Select OK
[![](./media/spss-exer-image35.png){width="4.979166666666667in" height="4.484458661417323in"}](./media/spss-exer-image35.png){target="_blank"}
Investigate the output
[![](./media/spss-exer-image36.png){width="3.9791666666666665in" height="3.177302055993001in"}](./media/spss-exer-image36.png){target="_blank"}
- Q-Q Plot
- Select Analyze -> Descriptive Statistics -> Q-Q Plots
- Select the variable prestg80
- Select OK
[![](./media/spss-exer-image37.png){width="4.135010936132983in" height="2.990590551181102in"}](./media/spss-exer-image37.png){target="_blank"}
- Investigate the output
- Look to see how well the plotted points follow the solid diagonal
line
- It is particularly important to pay attention to the "tails", or the
left most and right most points to see if they follow the line
[![](./media/spss-exer-image38.png){width="4.264880796150481in" height="3.40544072615923in"}](./media/spss-exer-image38.png){target="_blank"}
**Part 3**: Compare the average highest year of school completed
("educ") for males and females.
- Set up the dataset such that the output is split by groups based on
sex
- Select Data -> Split File
- Select "Compare Groups"
- Select the variable sex for "Groups Based on:"
- Select "OK"
[![](./media/spss-exer-image39.png){width="4.628517060367454in" height="3.5987948381452317in"}](./media/spss-exer-image39.png){target="_blank"}
- Compute the 5-Point Summary Descriptives for "educ"
- Select Analyze -> Descriptive Statistics -> Descriptives
- Select the variable "educ"
- Select "OK"
[![](./media/spss-exer-image40.png){width="4.664867672790901in" height="2.9480522747156606in"}](./media/spss-exer-image40.png){target="_blank"}
- Investigate the output
- Males have an average of 13.23 years of education
- Females have an average of 12.63 years of education
[![](./media/EDUC_sex.png)](./media/EDUC_sex.png){target="_blank"}
- Turn the split file feature off
- Select Data -> Split File
- Select "Analyze all cases, do not create groups" (Alternatively,
"Reset" can be selected)
- Select "OK"
**Part 4**: Produce a pie chart for the variable "region". Use "Legacy
Dialogs".
- Select Graphs -> Legacy Dialogs -> Pie
- Under "Data in Chart Are" select "Summaries for groups of cases"
- Select "Define"
[![](./media/spss-exer-image41.png){width="2.5505949256342957in" height="1.8178455818022747in"}](./media/spss-exer-image41.png){target="_blank"}
- Select the variable "region" for "Define Slices by:"
- The default for "Slices Represent" is "N of cases", and leave this
at the default
- Select "OK"
[![](./media/spss-exer-image42.png){width="4.602543744531934in" height="4.990481189851269in"}](./media/spss-exer-image42.png){target="_blank"}
- Investigate the output
[![](./media/spss-exer-image43.png){width="4.5in" height="3.93in"}](./media/spss-exer-image43.png){target="_blank"}
## Additional Exercises
### Exercise A1 -- Categorical Data Analysis
**Question 1**
Open [exercisea1_data](data/exercisea1/exercisea1_data.sav). What percent of
respondents said they were "Very Happy"? What about "Not too happy"? "Pretty
happy"? Use a graph to display the variable.
**Question 2**
Do women appear to be more or less happy than men? Would you say this
apparent relationship is statistically significant?
**Question 3**
Create a scatter plot of respondent's education vs. their spouse's
education. Does this relationship appear to be linear? Add a linear
regression line to the plot. Inspect the correlation between the
respondent's education and their spouse's education. Is this correlation
positive or negative? Is it statistically significant?
### Exercise A1 Solution
**Question 1**
Open [exercisea1_data](data/exercisea1/exercisea1_data.sav). What percent of
respondents said they were "Very Happy"? What about "Not too happy"? "Pretty
happy"? Use a graph to display the variable.
**Solution:**
- We have one categorical variable that we would like to
investigate, so check the all-on-one-page handout!
- Analyze -> Descriptive Statistics -> Frequencies
[![](./media/spss-exer-image44.png){width="4.864583333333333in" height="3.0833333333333335in"}](./media/spss-exer-image44.png){target="_blank"}
- Enter the following information
- Select happy
- Select Charts
- Under Chart Type, select Bar Chart
- Under Chart Values, select Percentages
- Select Continue
- Select the box for Display Frequency Tables
- Select OK
[![](./media/spss-exer-image45.png){width="2.8333333333333335in" height="2.9791666666666665in"}](./media/spss-exer-image45.png){target="_blank"}
[![](./media/Happy.png){width="4.84in" height="3in"}](./media/Happy.png){target="_blank"}
[![](./media/spss-exer-image48.png){width="6.5in" height="5.197916666666667in"}](./media/spss-exer-image48.png){target="_blank"}
**Question 2**
Do women appear to be more or less happy than men? Would you say this
apparent relationship is statistically significant?
**Solution:**
- We are going to compare two categorical variables. From our handout,
we will use Pearson Chi-Square crosstabs to do this!
- Analyze -> Descriptive Statistics -> Crosstabs
<!-- -->
- Enter the following information
- Rows: sex
- Columns: happy
[![](./media/spss-exer-image49.png){width="4.628517060367454in" height="3.82830271216098in"}](./media/spss-exer-image49.png){target="_blank"}
- Select the Statistics button
- Check the box for Chi-Square
- Select Continue
[![](./media/spss-exer-image50.png){width="3.0051410761154855in" height="3.3971161417322833in"}](./media/spss-exer-image50.png){target="_blank"}
- Select the Cells button
- Check the box for Row under Percentages (leave the rest as
default)
- Check the box for Adjusted Standardized Residuals under
Residuals (leave the rest as default)
- Select Continue
- Select the box for Display Clustered Bar Charts
- Select OK
[![](./media/spss-exer-image51.png){width="3.4375in" height="3.8125in"}](./media/spss-exer-image51.png){target="_blank"}
- The Pearson Chi-Square statistic indicates that the differences
between men and women are statistically significant
(p-value/asymptotic significance \< .05).
- The residuals, clustered bar chart, and row percentages can tell us
where these differences arise
- An adjusted standardized residual with an absolute value greater than
2 shows us where the differences between groups occur. Here,
we see that "not too happy" has a residual with an absolute value
greater than 2 for both males and females.
- The row proportions indicate that there is a higher proportion
of females that responded "not too happy" when compared to
males.
- The clustered bar chart also shows that there are greater
numbers of women that indicate that they are "not too happy".
[![](./media/Sex_Happy.png){width="6in" height="5.33in"}](./media/Sex_Happy.png){target="_blank"}
[![](./media/spss-exer-image55.png){width="4.507012248468941in" height="3.6041666666666665in"}](./media/spss-exer-image55.png){target="_blank"}
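
The quantities in the Crosstabs output can be reproduced by hand for a small table. The sketch below uses invented counts (not the counts from exercisea1_data) and the standard formulas for expected counts, the Pearson chi-square statistic, and adjusted standardized residuals. The p-value shortcut used here is valid only for exactly 2 degrees of freedom, which is the case for a 2 x 3 table.

```python
import math

# Hypothetical counts; columns: Very Happy, Pretty Happy, Not too Happy.
observed = [
    [200, 370, 50],    # sex = 1 (Male)
    [260, 500, 120],   # sex = 2 (Female)
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
adjusted_residuals = []
for i, row in enumerate(observed):
    residual_row = []
    for j, count in enumerate(row):
        # Expected count under independence of sex and happiness.
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (count - expected) ** 2 / expected
        # Adjusted standardized residual for this cell.
        spread = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
        residual_row.append((count - expected) / math.sqrt(spread))
    adjusted_residuals.append(residual_row)

df = (len(observed) - 1) * (len(observed[0]) - 1)
# With exactly 2 degrees of freedom, the upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi_square / 2) if df == 2 else None
```

With these made-up counts, the "Not too Happy" column carries the largest residuals, positive for females and negative for males, which is the same pattern of reading the output described above.
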
**Question 3**
Create a scatter plot of respondent's education vs. their spouse's
education. Does this relationship appear to be linear? Add a linear
regression line to the plot. Inspect the correlation between the
respondent's education and their spouse's education. Is this correlation
positive or negative? Is it statistically significant?
**Solution:**
- Graphs -> Legacy Dialogs -> Scatter/Dot
- Simple Scatter and Define
- Enter the following information
- Y Axis: speduc
- X Axis: educ
- Select OK
- Check the output for the scatter plot
- Double click the plot in the Output Viewer to open Chart Editor
- Select the button for Add Fit Line at Total (first bar above the
plot, axis with straight line plot)
- Select Linear Fit, Apply, Close
- Close out of chart editor (red X in the upper right corner) and the
updated chart will appear in the Output Viewer.
[![](./media/spss-exer-image56.png){width="3.53125in" height="2.0625in"}](./media/spss-exer-image56.png){target="_blank"}
[![](./media/spss-exer-image57.png){width="5.625in" height="6.114583333333333in"}](./media/spss-exer-image57.png){target="_blank"}
[![](./media/spss-exer-image58.png){width="3.6666666666666665in" height="5.270833333333333in"}](./media/spss-exer-image58.png){target="_blank"}
[![](./media/spss-exer-image59.png){width="3.966178915135608in" height="4.2551870078740155in"}](./media/spss-exer-image59.png){target="_blank"}
[![](./media/spss-exer-image60.png){width="4.316828521434821in" height="3.45207895888014in"}](./media/spss-exer-image60.png){target="_blank"}
- Analyze -> Correlate -> Bivariate
- Enter the following information
- Variables: educ, speduc
- Correlation coefficients: Pearson, Spearman
- Significance: Two-tailed
- Check the box for Flag significant correlations
- Select OK
- The output indicates that the correlation between the respondent's
education and their spouse's education is positive and statistically
significant.
[![](./media/spss-exer-image61.png){width="3.992153324584427in" height="3.5929374453193352in"}](./media/spss-exer-image61.png){target="_blank"}
[![](./media/Corr.png){width="6in" height="7in"} ](./media/Corr.png){target="_blank"}
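As a rough illustration of what the fit line and the Pearson coefficient represent, here is a minimal sketch in plain Python. The two lists are hypothetical years of schooling, not the survey data used in the exercise, and the helper name `pearson_and_fit` is an assumption made for this example. It returns the correlation, the least-squares slope and intercept, and the t statistic used to test whether the correlation differs from zero.

```python
import math

# Hypothetical years of schooling, NOT the survey data from the exercise.
educ = [12, 16, 14, 12, 18, 10, 16, 13, 20, 12]
speduc = [12, 14, 14, 11, 16, 12, 18, 12, 18, 13]

def pearson_and_fit(x, y):
    """Return (r, slope, intercept, t) for paired lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)      # Pearson correlation coefficient
    slope = sxy / sxx                   # least-squares line: y = intercept + slope * x
    intercept = mean_y - slope * mean_x
    t = r * math.sqrt((n - 2) / (1 - r ** 2))  # test statistic with n - 2 df
    return r, slope, intercept, t

r, slope, intercept, t = pearson_and_fit(educ, speduc)

# For n = 10 pairs (8 degrees of freedom) the two-tailed critical t at the
# .05 level is 2.306, so |t| above that value means p < .05.
significant = abs(t) > 2.306
print(round(r, 3), round(slope, 3), round(intercept, 3), significant)
```

A positive `r` corresponds to an upward-sloping fit line, which is the pattern the exercise asks you to look for in the scatter plot.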
### Exercise A2 -- Continuous Data Analysis
Open [exercisea2_data.sav](data/exercisea2/exercisea2_data.sav).
**Research Question 1:** Is there a relationship between a student's
socio-economic status and whether or not the student would participate
in a racially insensitive joke?
What techniques would you use to investigate the relationship between
SES and whether or not a student would participate in a racially
insensitive joke?
Investigate this relationship graphically and statistically. What did
you find?
**Research Question 2:** Is there a relationship between a student's
race and their post-intervention behavior intention scale?
What techniques would you use to investigate the relationship between a
student's race and their post-intervention behavior intention scale?
Investigate this relationship graphically and statistically. What did
you find?
**Research Question 3:** Is there a relationship between the race of a
student and their socio-economic status?
What techniques would you use to investigate the relationship between
race and SES?
Investigate this relationship graphically and statistically. What did
you find?
### Exercise A2 Solution
**Research Question 1:** Is there a relationship between a student's
socio-economic status and whether or not the student would participate
in a racially insensitive joke?
What techniques would you use to investigate the relationship between
SES and whether or not a student would participate in a racially
insensitive joke?
**ANSWER:** SES is an ordinal variable with 4 levels that should be
treated as a categorical variable. Whether or not a student would
participate in a derogatory joke is measured with the "Joke" variable
and it is a categorical variable. The appropriate statistical procedure
to use to compare two categorical variables is the Chi-Square Test of
Independence (crosstabs). The appropriate graphical procedure is a
clustered bar chart.
Investigate this relationship graphically and statistically. What did
you find?
**ANSWER:** There is not a statistically significant relationship
between "SES" and "Joke". We do not have enough evidence to say that
there is a relationship between a student's socio-economic status and
whether or not the student would participate in a racially insensitive
joke.
[![](./media/SES_joke.png){width="8in" height="8in"}](./media/SES_joke.png){target="_blank"}
[![](./media/spss-exer-image66.png){width="8in" height="6in"}](./media/spss-exer-image66.png){target="_blank"}
**Research Question 2:** Is there a relationship between a student's
race and their post-intervention behavior intention scale? What
techniques would you use to investigate the relationship between a
student's race and their post-intervention behavior intention scale?
**ANSWER:** "Race" is a categorical variable that can take on up to 9
values, and a student's post-intervention behavior intention scale
("BIndBehint_post") is a continuous variable. The appropriate
statistical procedure is a one-way ANOVA. The appropriate graphical
procedure is a side-by-side box plot.
Investigate this relationship graphically and statistically. What did
you find?
**ANSWER:** There is not a statistically significant relationship
between "Race" and "BIndBehint_Post". We do not have enough evidence to
say that there is a relationship between a student's race and their
post-intervention behavior intention score.
[![](./media/BEHAV_race.png){width="8in" height="6in"}](./media/BEHAV_race.png){target="_blank"}
[![](./media/spss-exer-image70.png){width="7.6in" height="7in"}](./media/spss-exer-image70.png){target="_blank"}
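The F statistic reported by the one-way ANOVA can be reproduced by hand. The sketch below uses plain Python and hypothetical scores for three unnamed groups, not the exercise data; the function name `one_way_anova` is an assumption for illustration. It compares the variation between group means with the variation within groups.

```python
# Hypothetical scale scores for three groups, NOT the exercise data.
groups = {
    "group_a": [3.2, 3.8, 4.1, 3.5, 3.9],
    "group_b": [3.6, 3.4, 4.0, 3.7, 3.9],
    "group_c": [3.5, 3.9, 3.6, 4.2, 3.4],
}

def one_way_anova(samples):
    """Return (F, df_between, df_within) for a list of samples."""
    values = [v for s in samples for v in s]
    grand_mean = sum(values) / len(values)
    means = [sum(s) / len(s) for s in samples]
    # Variation of the group means around the grand mean
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Variation of the observations around their own group mean
    ss_within = sum((v - m) ** 2 for s, m in zip(samples, means) for v in s)
    df_between = len(samples) - 1
    df_within = len(values) - len(samples)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f_stat, df_between, df_within = one_way_anova(list(groups.values()))

# For (2, 12) degrees of freedom the critical F at the .05 level is about 3.89.
significant = f_stat > 3.89
print(round(f_stat, 4), df_between, df_within, significant)
```

When the group means are close together relative to the spread inside each group, F stays near zero and the test is not significant, which is the situation described in the answer above.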
**Research Question 3:** Is there a relationship between the race of a
student and their socio-economic status? What techniques would you use
to investigate the relationship between race and SES?
**ANSWER:** "Race" and "SES" are both categorical variables. The
appropriate statistical procedure to use to compare two categorical
variables is the Chi-Square Test of Independence (crosstabs). The
appropriate graphical procedure is a clustered bar chart.
Investigate this relationship graphically and statistically. What did
you find?
**ANSWER:** There is a statistically significant relationship between
"Race" and "SES". Notice the footnote under the Chi-Square results
table about cells with small expected counts. In this case, we need to
verify the statistically significant result with Fisher's Exact Test,
which is also significant (p-value = .025).
[![](./media/SES_race.png){width="8in" height="6in"}](./media/SES_race.png){target="_blank"}
[![](./media/spss-exer-image73.png){width="6in" height="5in"}](./media/spss-exer-image73.png){target="_blank"}
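The reason for turning to Fisher's Exact Test can be checked directly from the expected counts. The sketch below is plain Python with a hypothetical race-by-SES table, not the exercise data, and hypothetical helper names. It counts how many cells have an expected count below 5 and applies the common rule of thumb that the chi-square approximation is doubtful when more than 20% of cells fall below that value.

```python
# Hypothetical race-by-SES counts, NOT the exercise data.
race_by_ses = [
    [12, 9, 4, 2],
    [3, 6, 5, 1],
    [2, 1, 4, 6],
]

def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def small_expected_share(table, threshold=5.0):
    """Return (cells below threshold, total cells, share below threshold)."""
    cells = [e for row in expected_counts(table) for e in row]
    n_small = sum(1 for e in cells if e < threshold)
    return n_small, len(cells), n_small / len(cells)

n_small, n_cells, share = small_expected_share(race_by_ses)

# Rule of thumb (an assumption of this sketch, not an SPSS setting):
# prefer an exact test when more than 20% of cells have expected counts below 5.
prefer_exact_test = share > 0.20
print(n_small, n_cells, share, prefer_exact_test)
```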
### Exercise A3 -- Methodology Choice Practice
For each question below, first determine the appropriate analysis method
based on the variables of interest, then carry out that method in
SPSS.
**A)** From [exercisea3_data_a.sav](data/exercisea3/exercisea3_data_a.sav)
1. Is there a relationship between sex (gender) and job category
(jobcat)?
2. Is there a relationship between job category (jobcat) and minority
status (minority)?
3. Is there a relationship between job category (jobcat) and salary
(salary)?
4. Is there a relationship between experience (jobtime) and salary
(salary)?
**B)** From [exercisea3_data_b.sav](data/exercisea3/exercisea3_data_b.sav)
1. Is there a relationship between general happiness (happy) and
occupational prestige score (prestg80)?
2. Is there a relationship between age (age) and occupational prestige
score (prestg80)?
3. Is there a relationship between general happiness (happy) and
perception of life being exciting or dull (life)?
**Exercise A3 Hints!**
**A)**
1. Two Categorical Variables -> Clustered Bar Chart, Pearson Chi-Square
(Crosstabs)
2. Two Categorical Variables -> Clustered Bar Chart, Pearson Chi-Square
(Crosstabs)
3. Categorical IV (3+ Groups) & Continuous DV -> One-Way ANOVA, Side-by-Side
Boxplot
4. Two Continuous Variables -> Pearson Correlation Coefficient, Scatterplot
**B)**
1. Categorical IV (3+ Groups) & Continuous DV -> One-Way ANOVA, Side-by-Side
Boxplot
2. Two Continuous Variables -> Pearson Correlation Coefficient, Scatterplot
3. Two Categorical Variables -> Clustered Bar Chart, Pearson Chi-Square
(Crosstabs)
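The hints above follow a simple lookup from variable types to methods. As a study aid, that lookup can be sketched as a small Python function; the function name `suggest_method` and the labels "categorical" and "continuous" are assumptions made for this illustration, and it only covers the three cases that appear in the hints.

```python
def suggest_method(kind_a, kind_b):
    """Map two variable types to (statistical procedure, graph).

    Each argument is "categorical" or "continuous". The categorical plus
    continuous case assumes the categorical variable has 3 or more groups,
    as in the hints.
    """
    kinds = sorted([kind_a, kind_b])
    if kinds == ["categorical", "categorical"]:
        return ("Pearson chi-square (crosstabs)", "clustered bar chart")
    if kinds == ["continuous", "continuous"]:
        return ("Pearson correlation coefficient", "scatterplot")
    if kinds == ["categorical", "continuous"]:
        return ("one-way ANOVA", "side-by-side boxplot")
    raise ValueError(f"unknown variable types: {kind_a!r}, {kind_b!r}")

# Example: job category (categorical) and salary (continuous)
print(suggest_method("categorical", "continuous"))
```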
### Exercise A4 -- Case Study I: Salary (Regression)
Open [exercisea4_data.sav](data/exercisea4/exercisea4_data.sav).
**Background**
This data set contains information on faculty from Bowling Green State
University for the 1993 to 1994 academic year (DeMaris 2004). The purpose of the
exercises below is to investigate whether there was any evidence of
gender inequality in faculty salaries at BGSU.
**Activity 1: Describing the Dataset**
Investigate the 'Faculty' data set using descriptive statistics, one
variable graphing procedures, and bivariate procedures.
**Investigate 'Salary' with descriptive statistics, box plot, and
histogram**
[![](./media/spss-exer-image74.png){width="4.870138888888889in" height="3.077777777777778in"}](./media/spss-exer-image74.png){target="_blank"}
[![](./media/Salary.png){width="6.16875in" height="1.0777777777777777in"}](./media/Salary.png){target="_blank"}
[![](./media/spss-exer-image76.png){width="5.49375in" height="4.947916666666667in"}](./media/spss-exer-image76.png){target="_blank"}
[![](./media/spss-exer-image77.png){width="4.277867454068241in" height="3.415810367454068in"}](./media/spss-exer-image77.png){target="_blank"}
[![](./media/spss-exer-image78.png){width="4.329816272965879in" height="4.666395450568679in"}](./media/spss-exer-image78.png){target="_blank"}
[![](./media/spss-exer-image79.png){width="4.316828521434821in" height="3.446920384951881in"}](./media/spss-exer-image79.png){target="_blank"}
**Investigate 'Gender' with a frequency table and bar chart**
[![](./media/spss-exer-image80.png){width="4.870138888888889in" height="3.077777777777778in"}](./media/spss-exer-image80.png){target="_blank"}
[![](./media/spss-exer-image81.png){width="2.83125in" height="2.9743055555555555in"}](./media/spss-exer-image81.png){target="_blank"}
[![](./media/Male.png){width="6in" height="3in"}](./media/Male.png){target="_blank"}
[![](./media/spss-exer-image83.png){width="4.576569335083114in" height="3.6653280839895013in"}](./media/spss-exer-image83.png){target="_blank"}
**Investigate the average salary for males and females separately
(descriptive statistics, histogram, side-by-side box plot)**
Remember to split the file by the gender variable ('male').
[![](./media/spss-exer-image84.png){width="4.610416666666667in" height="3.584722222222222in"}](./media/spss-exer-image84.png){target="_blank"}
[![](./media/spss-exer-image85.png){width="4.870138888888889in" height="3.077777777777778in"}](./media/spss-exer-image85.png){target="_blank"}
[![](./media/Salary_gender.png){width="6.5in" height="2in"}](./media/Salary_gender.png){target="_blank"}
The descriptive statistics table above indicates that males earn more
than females on average.
[![](./media/spss-exer-image87.png){width="4.5in" height="4in"}](./media/spss-exer-image87.png){target="_blank"}
[![](./media/spss-exer-image88.png){width="5in" height="5in"}](./media/spss-exer-image88.png){target="_blank"}
[![](./media/spss-exer-image89.png){width="5in" height="5in"}](./media/spss-exer-image89.png){target="_blank"}
Also remember to turn off the 'Split File' option afterwards so that
later analyses use the full data set.
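Splitting the file by 'male' amounts to computing the same descriptive statistics separately within each group. A minimal sketch in plain Python is shown here; the records are hypothetical rather than the BGSU data, the coding 1 = male and 0 = female is an assumption, and `describe_by` is a made-up helper name.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records, NOT the BGSU faculty data.
# Assumed coding: male = 1 for men, male = 0 for women.
faculty = [
    {"male": 1, "salary": 52000},
    {"male": 1, "salary": 61000},
    {"male": 1, "salary": 47000},
    {"male": 1, "salary": 58000},
    {"male": 0, "salary": 45000},
    {"male": 0, "salary": 50000},
    {"male": 0, "salary": 43000},
    {"male": 0, "salary": 48000},
]

def describe_by(records, group_key, value_key):
    """Return n, mean, and standard deviation of value_key per group."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec[group_key]].append(rec[value_key])
    return {
        group: {"n": len(vals), "mean": mean(vals), "sd": stdev(vals)}
        for group, vals in buckets.items()
    }

summary = describe_by(faculty, "male", "salary")
for group, stats in sorted(summary.items()):
    print(group, stats["n"], stats["mean"], round(stats["sd"], 1))
```

Because the grouping here is an argument to the function rather than a setting on the data, there is nothing to switch off afterwards, unlike Split File in SPSS.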