Biased Sample Selection - Darren's Public Notes

# 35. Statistical manipulation - Biased Sampling - Selective bias

## 35.5. Methodology/Refinements/Sub-species

### 35.5.2. Biased sample selection

This is the accidental or deliberate biased selection of samples by researchers in order to support a particular hypothesis, for example, "this drug has this benefit". It is a distortion of statistical analysis that results from a non-random method of collecting samples. If the selection bias is accidental and is not taken into account, then the conclusions drawn may well be wrong. When the bias is deliberate, the main objective is to prove a fraudulent proposition.

The bias means that some members of the population are less likely to be included in the sample than others. The result is a non-random sample in which some members were favoured over others because of a particular characteristic which the manipulator knows will tend to produce a pre-ordained result.

#### 35.5.2.1. Examples

Taking a poll about healthcare whilst standing outside a hospital would probably bias the results, because it biases the selection of participants. This is an instance of the "healthy user" and "geographical" biases (below).

In the same way, polling by fixed line telephone at dinner time every day will bias the sample, because it will only include those who have a fixed line telephone (this already biases economic status and age group). It is also biased by time, because it only reaches people who are habitually at home at a certain time, thus excluding some shift workers, certain professions and some age groups. This is a typical case of exclusion bias.

A classic case of selection bias is called the "caveman effect." Much of our archaeological understanding of prehistoric man comes from caves. This includes cave paintings made tens of thousands of years ago. They survived because they were made inside protected caves.
Any other evidence of this period which might have existed on trees, cliff walls, and animal skins has long been eroded by the weather. In a similar way, other physical evidence such as fire pits and burial sites is more likely to remain intact up to the present day in caves. Prehistoric people are therefore associated with caves, because that is where the data still exists. But that does not necessarily mean that most of them lived in caves. This is an instance of pre-screening and geographical bias.

#### 35.5.2.2. Types of biased sample selection

There are several ways in which sampling can be biased:

- **Healthy user bias:** This is where the sample population tends to be healthier or less healthy than the general population.
- **Exclusion bias:** This occurs when a methodology excludes certain subsets of the population on economic, social, demographic or geographical grounds. It can also occur when a sample group is migratory and a study fails to follow up with the original sample group because it has left the area.
- **Pre-screening:** This is where participants are attracted to or rejected from the sample population by means of some deliberate or accidental screening mechanism. For example, looking for volunteers of a certain age group and socio-economic background to prove that smoking is bad for you may pre-screen out younger, wealthier and healthier members of society. The conditions placed on the sample population act as an unknown screen, which can bias the sample population in unexpected ways.
- **Geographical bias:** In this case a sample is biased towards a particular geographic area. Any street poll, for instance, will tend to exclude those members of the population whose disabilities keep them off the street, and to favour those who are able-bodied. So putting a question about aid for the disabled to a sample of the able-bodied public may elicit an unrepresentative response.

#### 35.5.2.3. Conclusion

A biased sample can lead to an over- or under-representation of a particular parameter in the sample population. This causes problems because any statistical conclusion based on that sample has the potential to be systematically incorrect.

In practice, almost every sample is somewhat biased, because it is almost impossible to ensure a perfectly random sample. However, if the degree of under- or over-representation is small, the sample can be treated as a reasonable approximation to a random sample. In addition, if the under-represented group does not differ much from the other groups in the variable being measured, then a biased sample can still be very similar to a truly random and completely representative sample.

The fraudulent use of deliberate bias in sample selection can lead to pre-determined results and mislead a manipulated victim. It is very difficult for a victim to determine if and where the manipulation has taken place, especially when the results look credible and the source seems respectable, such as a pharmaceutical company or a large chemical corporation.
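The points about how much a biased sample matters can be illustrated with a small simulation of the fixed line telephone poll. This is only a sketch under invented assumptions: the population size, the 40% share of people reachable by telephone at dinner time, the support rates in each group, and the function names `build_population`, `share_supporting` and `compare_polls` are all hypothetical and chosen purely for illustration.

```python
import random


def build_population(size, reachable_share, support_reachable, support_other, rng):
    """Build a hypothetical population as a list of (reachable, supports) pairs."""
    population = []
    for _ in range(size):
        # Assumption: a fixed share of people can be reached by the poll.
        reachable = rng.random() < reachable_share
        # Assumption: support for the proposition depends only on the group.
        p = support_reachable if reachable else support_other
        population.append((reachable, rng.random() < p))
    return population


def share_supporting(people):
    """Fraction of the given people who support the proposition."""
    return sum(supports for _, supports in people) / len(people)


def compare_polls(support_reachable, support_other, seed=0):
    """Compare a random sample with a sample drawn only from reachable people."""
    rng = random.Random(seed)
    population = build_population(100_000, 0.4, support_reachable, support_other, rng)

    # Unbiased poll: every member of the population is equally likely to be picked.
    random_sample = rng.sample(population, 5_000)

    # Biased poll: only people reachable by telephone at dinner time can be picked.
    reachable_only = [person for person in population if person[0]]
    biased_sample = rng.sample(reachable_only, 5_000)

    return {
        "true": share_supporting(population),
        "random": share_supporting(random_sample),
        "biased": share_supporting(biased_sample),
    }


# Case 1: the excluded group holds a clearly different opinion.
differing = compare_polls(support_reachable=0.7, support_other=0.4)
# Case 2: the excluded group does not differ in the variable being measured.
similar = compare_polls(support_reachable=0.5, support_other=0.5)

print("groups differ: ", differing)
print("groups similar:", similar)
```

With these made-up numbers the expected population-wide support in the first case is 0.4 × 0.7 + 0.6 × 0.4 = 0.52, while the telephone-only poll should land near 0.70, so the biased estimate overstates support by a wide margin. In the second case the excluded group holds the same opinion as everyone else, so the biased and random samples should give much the same answer, which is the situation in which a biased sample is still a reasonable approximation.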