
Frequently Asked Questions

  1. How many is representative in a survey? Here is an actual email received from an admirer.
    Your web-site was very informative. Can you answer a question for us regarding percentage of returns. If a survey was mailed to 4,000 households asking three simple questions and included a postage-paid return postcard, how many responses would be considered representative? Any guidance you could give would be appreciated.

    The ``number of returns'' is not what makes a sample representative. For example, suppose that you sent the survey to 4,000 households and 2,000 returned it, but these were all from families with no children. Despite the 50% return rate, the responses would tell you nothing about households with children.

    It is very difficult in a self-selection survey (only those households that bother to return the survey will do so) to ensure 'representativeness'. The responses will be representative of your population only if the decision to return a survey is unrelated to any of the survey responses or to other attributes in the population.
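
    To see how self-selection can distort results, here is a minimal simulation sketch. All of the numbers (40% of households with children, and the response rates of 20% and 60%) are invented for illustration and do not come from the survey in the question.

```python
import random

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical mailing: 4,000 households, about 40% with children (True).
households = [rng.random() < 0.40 for _ in range(4000)]

def responds(has_children):
    # Assumed (invented) response rates: the decision to return the
    # postcard is related to an attribute of the household.
    return rng.random() < (0.20 if has_children else 0.60)

respondents = [h for h in households if responds(h)]

true_share = sum(households) / len(households)
observed_share = sum(respondents) / len(respondents)

print(f"returned postcards:              {len(respondents)}")
print(f"share with children (mailed):    {true_share:.2f}")
print(f"share with children (returned):  {observed_share:.2f}")
```

    Under these assumptions well over a thousand postcards come back, yet households with children are badly under-represented among them. The number of returns does not repair the problem.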

    You can do some cross checks to avoid obvious biases, e.g. check the household sizes of the respondents against the distribution in the population, but you can never be sure of catching all possible problems.
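
    One way such a cross check could be carried out is a chi-square goodness-of-fit comparison. The sketch below is an assumption about how to do it, not a prescribed method, and the household-size shares and respondent counts are hypothetical.

```python
# Hypothetical figures for illustration only.
# Household size categories: 1, 2, 3, and 4 (meaning four or more).
population_share = {1: 0.28, 2: 0.34, 3: 0.16, 4: 0.22}
respondent_counts = {1: 700, 2: 800, 3: 250, 4: 250}  # 2,000 returns

def chi_square_gof(observed, expected_share):
    """Chi-square goodness-of-fit statistic of counts against known shares."""
    total = sum(observed.values())
    stat = 0.0
    for category, share in expected_share.items():
        expected = total * share
        stat += (observed[category] - expected) ** 2 / expected
    return stat

stat = chi_square_gof(respondent_counts, population_share)

# 5% critical value of the chi-square distribution with 3 degrees of freedom.
CRITICAL_5PCT_DF3 = 7.815

print(f"chi-square statistic: {stat:.1f}")
print("respondents differ from population:", stat > CRITICAL_5PCT_DF3)
```

    A large statistic flags an obvious mismatch on household size. A small one only says that this particular attribute looks fine; it says nothing about attributes you did not check.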

  2. Randomization = Representativeness?? A student wrote:
    I was just going over the notes and was thinking a bit harder about the soup analogy and am a little confused. . . in the notes, there are two instances where we talked about sample size, at one point it says ``random sampling tends to give representative samples if our sample sizes are big enough'' . . but then I remember in class when you spoke about the soup analogy you said that it doesn't matter the sample size as long as it is the ``key'' number. . .however, I was thinking that if you had a bowl of soup or a large container of soup, even though it is mixed properly, of course what ever sample size you take of each, the soup will still be the same but isn't that because all the particles of the soup are the same?. . . if you were to take a survey, everybody is DIFFERENT, so even though it's thoroughly mixed, every ``particle'' is different, so why wouldn't a larger sample size be better?

    While it is true that a random sample is representative regardless of sample size, this really applies only in an 'average' sense. For example, selecting one person at random from a population will, over repeated selections, tend to select people in the same proportions as in the population, but to say that a single person is 'representative' of the population is a bit misleading.

    That is why one part of the notes stated ``if sample sizes are big enough''. I've fixed the notes at this point to clarify for the next time I teach this class - thanks for pointing this out.

    The soup analogy is not perfect - it is true that the particles are all the same - but the idea still holds: a soup with various constituents needs to be properly stirred (randomization) before a sample is taken, to ensure representativeness. Any ideas what to use rather than salt?

    She further wrote:

    See, my reasoning is that, ideally, you want to survey EVERYBODY, right?. . . but that's not possible, so you want to take a sample, so wouldn't it make sense that the closer the number of the sample size is to your whole population, the more representative it would be of your pop, becuase you're getting closer to pop. size with the more ppl you survey?. . . does that make sense?. . I'm just a little bit confused with this point. . . because I think of an analogy if you have 50 people just say, you want to determine how many hours they study. . . now everybody studies differnt amt of hours. now if you have a sample size of 5 out of 50 that's only 10% of the ppl's opinion. . now ideally you want to ask everybody, but you can't so you need to take a sample size. . . now if you take 25 people, that's 50% of the ppls opinion, that's closer to the total pop. so it would be more representative right?, and then if you take 40 people out of 50, that's even better, so doesn't a larger sample size give a more representative picture?

    There are two issues going on here, and some will be explored in more detail in about 2 weeks when I start talking about sampling distributions. First, every time you take a sample, you would likely get a different result. You saw this when you sampled from StatVillage - your two samples led to different estimates of the true mean income. As your sample size increases, the variation among the estimates over repeated sampling from the same population should decrease. Using your example above, estimates from samples of size 40 taken from a population of 50 will be less variable than estimates from samples of size 5 taken from the same population. The variation of estimates around the true value is called 'precision', and sample size controls precision.
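
    The effect of sample size on precision can be sketched with a small simulation. The population of 50 study-hour values below is invented, and the number of repeated samples (2000) is an arbitrary choice.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: weekly study hours for 50 students.
population = [rng.uniform(0, 30) for _ in range(50)]
true_mean = statistics.mean(population)

def spread_of_estimates(pop, n, repeats=2000):
    """Standard deviation of the sample mean over repeated random samples
    of size n drawn without replacement from pop."""
    estimates = [statistics.mean(rng.sample(pop, n)) for _ in range(repeats)]
    return statistics.stdev(estimates)

sd_5 = spread_of_estimates(population, 5)
sd_40 = spread_of_estimates(population, 40)

print(f"true mean:                      {true_mean:.2f}")
print(f"spread of estimates, n = 5:     {sd_5:.2f}")
print(f"spread of estimates, n = 40:    {sd_40:.2f}")
```

    Both sample sizes give estimates that center on the true mean; the samples of 40 simply scatter much less around it.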

    What makes a sample representative? As you mentioned above, you believe that a sample of 25 is more representative of the population than a sample of size 5. However, suppose that I ask only the 25 males in your population, but select 5 people at random. Which is more representative? I think you would agree that selecting 25 people without randomization leads to all sorts of risk that your sample will be biased in some way - that is correct - it is randomization that tries to prevent these biases from creeping in. If both samples are selected at random, then the larger sample will have less variability in its estimate, i.e. it will be more precise, but not any more 'representative' than the sample of size 5.
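
    The contrast between asking the 25 males and asking 5 people at random can also be sketched in code. The study hours below are invented, and the assumption that males and females study different amounts on average is made only so that the bias is visible.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical population of 50: (sex, weekly study hours).
males = [("M", rng.gauss(10, 3)) for _ in range(25)]
females = [("F", rng.gauss(16, 3)) for _ in range(25)]
population = males + females
true_mean = statistics.mean(hours for _, hours in population)

# Non-random sample of 25: ask only the males.
males_only_estimate = statistics.mean(hours for _, hours in males)

# Random samples of 5, repeated many times.
random_estimates = [
    statistics.mean(hours for _, hours in rng.sample(population, 5))
    for _ in range(5000)
]
average_random_estimate = statistics.mean(random_estimates)

print(f"true mean:                        {true_mean:.2f}")
print(f"estimate from the 25 males:       {males_only_estimate:.2f}")
print(f"average of random n = 5 estimates: {average_random_estimate:.2f}")
```

    Any single random sample of 5 may land well away from the true mean, but on average the random samples are on target. The males-only sample is off no matter how many times you repeat it.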

    It seems weird to think that a sample of 5/50 is just as representative as 25/50, but part of the problem is the small numbers. Consider instead a sample of 1000 in Canada vs a sample of 1000 in the US. With such large sample sizes, it seems likely that the various groups of people will occur in your sample in roughly the same proportion as in the population - this is true even though the US has about 10 times as many people.
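
    A short calculation suggests why the population size matters so little here. The sketch uses the usual standard error of a sample proportion under simple random sampling without replacement, including the finite population correction. The population sizes are round, hypothetical figures chosen only to differ by a factor of 10, and p = 0.5 is an arbitrary choice.

```python
import math

def se_proportion(p, n, N):
    """Standard error of a sample proportion for a simple random sample
    of size n drawn without replacement from a population of size N."""
    fpc = (N - n) / (N - 1)  # finite population correction
    return math.sqrt(p * (1 - p) / n * fpc)

n = 1000
p = 0.5  # assumed true proportion, for illustration

se_small = se_proportion(p, n, 30_000_000)    # hypothetical smaller country
se_large = se_proportion(p, n, 300_000_000)   # hypothetical country 10x larger

print(f"standard error, N = 30 million:  {se_small:.5f}")
print(f"standard error, N = 300 million: {se_large:.5f}")
```

    The two standard errors agree to several decimal places, so a sample of 1000 is about equally precise in either population.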

    It seems weird only because people use the term 'representative' rather loosely to mean both 'similar in composition to the actual population' and 'providing an estimate close to the true value'. The latter is technically precision.


Copyright 2008: Carl J. Schwarz cschwarz@stat.sfu.ca