In a typical maths textbook,
data are ‘clean’, easy to use and rather boring. The example below comes from
Chapter 10 of STP National Curriculum Mathematics 9B which just happened, along
with a calculus text, to be within easy reach of my computer as I was typing.
Since there weren't any data in the calculus text, the STP text
became the scapegoat. For the most part, I quite like the STP National
Curriculum Maths series and I am sure that I could have made my point equally
well with any of 10 or 15 other texts on my shelves.
Scatter graphs are introduced by considering the hypothesis: “Tall people have larger
feet than short people.” The authors give the following data:

Person                  |   1 |   2 |   3 |   4 |   5 |   6 |   7 |   8 |   9 |  10 |  11 |  12
Height (cm)             | 160 | 166 | 174 | 158 | 167 | 161 | 170 | 171 | 166 | 163 | 164 | 168
Shoe size (continental) |  36 |  38 |  40 |  37 |  37 |  38 |  42 |  41 |  40 |  39 |  37 |  39

I used an Excel spreadsheet to obtain the following graph. (The authors’ graph is similar but does not include a regression line.)
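
For readers without Excel, the regression line can be approximated with a few lines of Python. This is only a sketch: it assumes the line on the graph is an ordinary least-squares fit of shoe size on height (which is what Excel's linear trendline uses), and the names heights_cm, shoe_sizes and least_squares_line are my own.

```python
# Data copied from the table in the text.
heights_cm = [160, 166, 174, 158, 167, 161, 170, 171, 166, 163, 164, 168]
shoe_sizes = [36, 38, 40, 37, 37, 38, 42, 41, 40, 39, 37, 39]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(heights_cm, shoe_sizes)
print(f"shoe size = {slope:.3f} * height + ({intercept:.2f})")
```

For these twelve people the slope works out to about 0.28, i.e. roughly one extra continental shoe size for every 3.5 cm of height.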
The authors go on to say that “The points do not all fit on a straight line… Taller
people tend to have larger feet but the relationship between height and shoe
size is not strong enough to justify the original statement.”
Who could disagree? My Excel spreadsheet calculates the correlation coefficient,
r = 0.73, which is indicative of strong correlation. As a correction, we simply
need to insert the phrase “tend to” into the original hypothesis to get, “Tall
people tend to have larger feet than short people.”
A good teacher will advise the students that correlation is not causation. When x
and y show strong correlation on a scatter graph, x could cause y, y could cause
x, or both could be the effect of an underlying cause. Most students will
understand that height is not the cause of shoe size but that genetics and
nutrition underlie both.
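
The correlation coefficient quoted above can be checked without a spreadsheet. The sketch below assumes that the figure is the usual Pearson product-moment coefficient; the function name pearson_r and the variable names are mine, not the textbook's.

```python
from math import sqrt

# Data copied from the table in the text.
heights_cm = [160, 166, 174, 158, 167, 161, 170, 171, 166, 163, 164, 168]
shoe_sizes = [36, 38, 40, 37, 37, 38, 42, 41, 40, 39, 37, 39]


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by the spread of each variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(heights_cm, shoe_sizes)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```

With the twelve pairs in the table this gives r ≈ 0.73, in agreement with Excel. Squaring it gives about 0.54, so a straight-line relationship with height accounts for roughly half of the variation in shoe size in this small sample.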
Having dispensed above, in reverse order, with ‘boring’ and ‘easy to
use’, I move on to ‘clean’. Cleanliness is a complex topic; see, e.g., http://en.wikipedia.org/wiki/Data_cleansing .
My point is that the text (and presumably the teacher and the class) just assumes that
‘height’ and ‘shoe size’ are well defined. But is that the case? How
were the heights obtained? Are all the heights up-to-date? Was an effort made
to prevent slouching or standing on tippy-toes? Could it be that some people
were wearing their shoes while others were in bare feet? Were some heights measured
in feet and inches and afterwards converted to centimetres? What about
hairstyles? If someone has a ‘beehive’ hairstyle, how do we measure her
height?
Amy Winehouse: http://en.wikipedia.org/wiki/File:WinehouseLA.jpg
Next, consider the shoes. Currently, I have three pairs of shoes from
different manufacturers. Their sizes are 45, 45, and 47 but they all fit! It
is unlikely that everybody was wearing the same make of shoe, unless it is part
of a school uniform or they were all interviewed while shopping at Bata.
Of course, it is possible that the interviewees were asked for their average shoe size. Is that a
statistic that you know about yourself? An interested student could find a
project here: what is the percentage of people who know the range and central
tendency (mean, median and/or mode) of their shoe size?
On the other hand, real data tend to be untidy, more difficult to
interpret and potentially more exciting. I hope to follow up
tomorrow by looking at some data from TIMSS (Trends in International
Mathematics and Science Study). TIMSS is an assessment of the maths and science
skills of fourth-grade and eighth-grade students around the world.