## Saturday, December 15, 2012

### Data in the Maths Text Book

In a typical maths textbook, data are ‘clean’, easy to use and rather boring. The example below comes from Chapter 10 of STP National Curriculum Mathematics 9B, which just happened, along with a calculus text, to be within easy reach of my computer as I was typing.

Since there weren't any data in the calculus text, the STP text became the scapegoat. For the most part, I quite like the STP National Curriculum Maths series, and I am sure that I could have made my point equally well with any of 10 or 15 other texts on my shelves.

Scatter graphs are introduced by considering the hypothesis: “Tall people have larger feet than short people.”  The authors give the following data:

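| Person | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Height (cm) | 160 | 166 | 174 | 158 | 167 | 161 | 170 | 171 | 166 | 163 | 164 | 168 |
| Shoe size (continental) | 36 | 38 | 40 | 37 | 37 | 38 | 42 | 41 | 40 | 39 | 37 | 39 |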

I used an Excel spreadsheet to obtain the following graph.  (The authors’ graph is similar but does not include a regression line.)

The authors go on to say that “The points do not all fit on a straight line… Taller people tend to have larger feet but the relationship between height and shoe size is not strong enough to justify the original statement.” Who could disagree? My Excel spreadsheet calculates the correlation coefficient, r = 0.73, which is indicative of strong correlation. As a correction, we simply need to insert the phrase “tend to” into the original hypothesis to get, “Tall people tend to have larger feet than short people.”

A good teacher will advise the students that correlation is not causation. When x and y show strong correlation on a scatter graph, x could cause y, y could cause x, or both could be the effect of an underlying cause. Most students will understand that height is not the cause of shoe size but that genetics and nutrition underlie both.
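For readers without Excel, the correlation coefficient can be checked with a few lines of Python. This is only a sketch: the function name `pearson_r` and the variable names are my own, and it uses the standard Pearson formula, which I am assuming is what the spreadsheet computes. The data are the twelve height and shoe-size pairs given above.

```python
from math import sqrt

# The twelve (height, shoe size) pairs from the textbook example.
heights = [160, 166, 174, 158, 167, 161, 170, 171, 166, 163, 164, 168]
shoe_sizes = [36, 38, 40, 37, 37, 38, 42, 41, 40, 39, 37, 39]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(heights, shoe_sizes)
print(round(r, 2))  # 0.73
```

The result agrees with the spreadsheet to two decimal places.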

Having dispensed above, in reverse order, with ‘boring’ and ‘easy to use’, I move on to ‘clean’. Cleanliness is a complex topic; see, e.g., http://en.wikipedia.org/wiki/Data_cleansing .

My point is that the text (and presumably the teacher and the class) simply assumes that ‘height’ and ‘shoe size’ are well defined. But is that the case? How were the heights obtained? Are all the heights up-to-date? Was an effort made to prevent slouching or standing on tippy-toes? Could it be that some people were wearing their shoes while others were in bare feet? Were some heights measured in feet and inches and afterwards converted to centimeters? What about hair styles? If someone has a ‘bee-hive’ hairstyle, how do we measure her height?

Next consider the shoes. Currently, I have three pairs of shoes from different manufacturers. Their sizes are 45, 45, and 47 but they all fit! It is unlikely that everybody was wearing the same make of shoe, unless it is part of a school uniform or they were all interviewed while shopping at Bata.

Of course, it is possible that the interviewees were asked for their average shoe size. Is that a statistic that you know about yourself? An interested student could find a project here: what percentage of people know the range and central tendency (mean, median and/or mode) of their shoe size?
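To show what such a self-report might look like, here is a small sketch using my own three pairs of shoes (sizes 45, 45 and 47). The variable names are mine, and three values are far too few for these statistics to mean much; the point is only that ‘my shoe size’ is not a single number.

```python
from statistics import mean, median, mode

# Sizes of the author's three pairs of shoes, as quoted in the post.
my_shoe_sizes = [45, 45, 47]

size_range = max(my_shoe_sizes) - min(my_shoe_sizes)
size_mean = mean(my_shoe_sizes)
size_median = median(my_shoe_sizes)
size_mode = mode(my_shoe_sizes)

print("range: ", size_range)           # 2
print("mean:  ", round(size_mean, 2))  # 45.67
print("median:", size_median)          # 45
print("mode:  ", size_mode)            # 45
```

Depending on which figure I reported, my point on the scatter graph could sit at 45, 46 or 47.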

Real data, on the other hand, tend to be untidy, more difficult to interpret and potentially more exciting. I hope to follow up tomorrow by looking at some data from TIMSS (Trends in International Mathematics and Science Study). TIMSS is an assessment of the maths and science skills of fourth-grade and eighth-grade students around the world.