Misreport Error - Darren's Public Notes

# 37. Misreport errors

## 37.1. Definition

This is the deliberate misreporting or obfuscation of the "estimated error" in a statistical analysis in order to manipulate the results.

### 37.1.1. Errors and Probabilities

In most studies and statistical analyses it is impractical to evaluate an entire population, so a random sample is taken instead. Both the size of that sample and its randomness matter. A larger sample gives us greater confidence that the results are representative of the wider population, and the level of confidence can be calculated using well-established mathematics, notably the central limit theorem.

Confidence is expressed as the probability that the true result (for the whole population) lies within a certain range of the estimate (the figure for the smaller sample group). That range is the "plus or minus" figure often quoted in statistical surveys. The probability part, the confidence level, is usually not mentioned; it is typically some standard value such as 95% or 99%.

The two numbers are linked: for the same survey, demanding a higher confidence level widens the range. If a survey has an estimated error of +/- 5% at 95% confidence, it has an estimated error of about +/- 6.6% at 99% confidence. In general, +/- x% at 95% confidence corresponds to roughly +/- 1.31x% at 99% confidence, because that is the ratio of the two normal critical values (2.576 / 1.960).

Conversely, at a given confidence level, a smaller estimated error requires a larger sample. This is pretty intuitive, really: the bigger the (random) sample, the more trustworthy the result. More precisely, the margin shrinks in proportion to 1/√n, so halving it needs roughly four times as many respondents.

### 37.1.2. Confusion is easy

However, because the confidence level is usually omitted from study results, most people assume there is 100% certainty that the true result lies within the estimated error. This is not mathematically correct: at 95% confidence, roughly one survey in twenty will miss the true value by more than the quoted margin, which can make a huge difference when interpreting a study result.

A poll with perfectly unbiased sampling and truthful answers has a mathematically determined margin of error which, for a chosen confidence level, depends only on the number of people polled.
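The relationships described above (margin versus sample size, and 95% versus 99% confidence) can be checked with a few lines of code. This is a minimal sketch, assuming a simple random sample, a yes/no question and the normal approximation that the central limit theorem justifies. The function names are hypothetical; `statistics.NormalDist` is part of Python's standard library (3.8+).

```python
from math import ceil, sqrt
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Two-sided normal critical value z, e.g. about 1.960 for 95%."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def margin_of_error(n: int, confidence: float = 0.95, p: float = 0.5) -> float:
    """Margin of error (as a fraction) for a proportion p estimated from n
    randomly chosen respondents, using the normal approximation.
    p = 0.5 gives the widest (worst-case) margin, the figure polls usually quote."""
    return critical_value(confidence) * sqrt(p * (1 - p) / n)


def required_sample_size(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Smallest n whose worst-case margin of error is no larger than `margin`."""
    z = critical_value(confidence)
    return ceil((z / margin) ** 2 * p * (1 - p))


if __name__ == "__main__":
    # Same survey, two confidence levels: the 99% range is about 1.31x wider.
    ratio = critical_value(0.99) / critical_value(0.95)
    print(f"99%/95% ratio: {ratio:.3f}")

    # Margin at 95% and 99% confidence for two sample sizes.
    for n in (100, 1000):
        print(n, f"{margin_of_error(n, 0.95):.1%}", f"{margin_of_error(n, 0.99):.1%}")

    # Halving the target margin needs roughly four times the sample.
    print(required_sample_size(0.05), required_sample_size(0.025))
```

Run as a script, it prints a 99%/95% ratio of about 1.314 and, for 1000 respondents, margins of about 3.1% at 95% and 4.1% at 99% confidence. The required sample sizes for +/- 5% and +/- 2.5% come out at 385 and 1537, showing the roughly fourfold cost of halving the margin.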
However, often only one margin of error is reported for a whole survey, even though the survey also evaluates sub-groups. When results are reported for population sub-groups (with necessarily smaller sample sizes), a larger margin of error applies, but this is often not made clear in study results. For instance, a survey of 1000 people may contain 100 people from a certain ethnic or economic group, and results that focus on that group will be much less reliable than results for the full sample. Because the margin of error scales with 1/√n, a subgroup one-tenth the size has a margin about √10 ≈ 3.2 times larger: if the margin of error for the full sample was 4%, say, the margin for that subgroup would be around 13%. This can make a very significant difference to how results are interpreted.

Often the misreporting of errors is just sloppy work, or the researcher has (mistakenly) assumed that readers already understand it. However, it is an easy option for a manipulator, because almost no member of the public will realise that they are being manipulated.

## 37.2. Persistence

Long. It's unlikely that most people are going to get their heads around this anytime soon.

## 37.3. Accessibility

Low. As a method of manipulation, misusing or allowing misinterpretation of statistical errors is only for the specialist.

## 37.4. Conditions/Opportunity/Effectiveness

It is highly effective because usually only a statistician will realise that errors are not being properly represented, or are being deliberately left unreported. Results are easy to translate into headlines because few people even read articles these days, never mind the statistical small print in a full report. However, it is only for use by a statistician who knows how to carefully bend the results to convey a particular message, leaving only obscure statistical caveats behind.

## 37.5. Methodology/Refinements/Sub-species

None known.

## 37.6. Avoidance and Counteraction

To recognise this you need to understand quite a lot about statistical methods or, better still, hire an honest statistician to audit a statistical report in detail.
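Short of a full audit, one rough self-check is to rescale a report's headline margin to the size of any subgroup it makes claims about. This sketch assumes the headline margin was computed for the whole sample under simple random sampling, at the same confidence level the subgroup figure would use; `subgroup_margin` and `implied_margins` are hypothetical helper names. It deliberately ignores weighting and design effects, which in real surveys can make margins larger still, so treat its output as a sanity check rather than an audit.

```python
from math import sqrt


def subgroup_margin(reported_margin: float, full_n: int, subgroup_n: int) -> float:
    """Rescale a survey's headline margin of error to one of its subgroups.

    Assumes simple random sampling and the same confidence level for both
    figures. Because the margin scales with 1/sqrt(n), it grows by a factor
    of sqrt(full_n / subgroup_n).
    """
    if not 0 < subgroup_n <= full_n:
        raise ValueError("subgroup_n must be between 1 and full_n")
    return reported_margin * sqrt(full_n / subgroup_n)


def implied_margins(reported_margin: float, full_n: int,
                    subgroups: dict[str, int]) -> dict[str, float]:
    """Map each (hypothetical) subgroup name to its implied margin of error."""
    return {name: subgroup_margin(reported_margin, full_n, n)
            for name, n in subgroups.items()}


if __name__ == "__main__":
    # Headline: +/- 4% for 1000 people; the report also quotes two subgroups.
    for name, m in implied_margins(0.04, 1000, {"group A": 100, "group B": 400}).items():
        print(f"{name}: margin about +/- {m:.1%}")
```

With the figures from section 37.1.2, `subgroup_margin(0.04, 1000, 100)` returns about 0.126, i.e. a subgroup margin of roughly +/- 12.6%, more than three times the headline figure.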