Studying Studies
- Noah Pessin
- Feb 12
- 3 min read
Hey Everybody,
This last semester, I took an elective Computer Science course called Data Science (you can probably guess why) and I wanted to share some of the key concepts and topics I learned now that the class is over. So, my next few posts will be about the most important of these topics, starting with how to set up a data collection project.
Imagine you are tasked with determining how well a medical treatment works. You have to start somewhere, so how do you collect data for analysis? Well, there are two ways: You can conduct an experiment, where you, as the researcher, decide who, out of the subjects, gets the treatment and who doesn’t. Or, you can conduct an observational study, where the researcher plays no role in deciding who gets the treatment.
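An experiment is beneficial because the researcher decides who gets the treatment and can hold other variables steady in a controlled setting, which makes it easier to argue that the treatment caused the result. An observational study is useful because the results come from a natural setting, but since nobody is randomly assigned to a group, it is harder to rule out other explanations for what you see.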
An example of an experiment is when the researcher gives ten people the medical treatment, gives ten people nothing, and has all twenty live in the same basic bedroom setup for a week. Because everyone lives under the same conditions, any difference in results is more likely to come from the treatment itself. An observational study looks more like a researcher asking a nurse to follow 200 patients, 100 who were already taking the treatment and 100 who were not. Then, once the patients come back in a month, the nurse records the results and gives them back to the researcher. The study’s data also may come from a previously collected dataset created for a completely different study (be careful, because data gathered for another purpose may not fit your question or may not be entirely reliable).
There are also a few different ways to set up your experiment or study. Randomization is when subjects are assigned to the control (placebo) and treatment groups at random, in order to factor out any biases caused by the researcher choosing who goes where. A double-blind study is when neither the researcher nor the subjects (patients) know who gets the treatment and who gets a placebo. That way the subjects don’t act differently, and the researcher doesn’t have any bias based on who gets what.
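To make randomization a bit more concrete, here is a minimal Python sketch of randomly splitting subjects into two groups. The function name `randomize`, the subject IDs 1 through 20, and the seed are all made up for illustration; this is just one simple way to do it, not a standard procedure from any particular library.

```python
import random

def randomize(subjects, seed=None):
    """Shuffle the subjects and split them into two halves.

    Hypothetical helper: the shuffle, not the researcher, decides who
    lands in the treatment group and who lands in the control group.
    """
    rng = random.Random(seed)      # seeded only so the split can be reproduced
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Twenty hypothetical subject IDs, ten per group
groups = randomize(range(1, 21), seed=42)
print(groups["treatment"])
print(groups["control"])
```

In a double-blind setup, the mapping from subject ID to group would be kept by someone other than the researcher and the patients until the results are in.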
But what if there's a confounding variable in your study? A confounding variable is something that is not directly accounted for but affects both the treatment and the response (result). I will use the example of studying how carrying a lighter (treatment) relates to lung cancer (response). The confounding variable in this situation is whether or not the subject already smokes, because it affects both whether the subject carries a lighter and whether they have lung cancer. We need to do something to account for this variable so it doesn’t distort the overall results.
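Here is a small simulation sketch of the lighter example, in Python. Every probability in it is invented purely for illustration (30% smokers, how likely each group is to carry a lighter, and so on), and in this made-up world the lighter has no effect on cancer at all. The point is only to show how a confounder can create a gap that isn't really there.

```python
import random

def simulate(n=100_000, seed=0):
    """Generate (smoker, lighter, cancer) tuples with made-up probabilities."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        smoker = rng.random() < 0.30
        # Smoking drives BOTH lighter-carrying and cancer (the confounder)
        lighter = rng.random() < (0.80 if smoker else 0.05)
        # Cancer depends only on smoking here, never on the lighter
        cancer = rng.random() < (0.15 if smoker else 0.01)
        people.append((smoker, lighter, cancer))
    return people

def cancer_rate(people, lighter, smoker=None):
    """Cancer rate among people with the given lighter (and optional smoker) status."""
    rows = [p for p in people
            if p[1] == lighter and (smoker is None or p[0] == smoker)]
    return sum(p[2] for p in rows) / len(rows)

people = simulate()

# Naive comparison: lighter carriers vs. everyone else
crude_gap = cancer_rate(people, True) - cancer_rate(people, False)

# Same comparison, but only among people with the same smoking status
gap_smokers = cancer_rate(people, True, smoker=True) - cancer_rate(people, False, smoker=True)
gap_nonsmokers = cancer_rate(people, True, smoker=False) - cancer_rate(people, False, smoker=False)

print(f"crude gap:           {crude_gap:.3f}")
print(f"gap among smokers:   {gap_smokers:.3f}")
print(f"gap among nonsmokers:{gap_nonsmokers:.3f}")
```

The crude gap comes out large even though lighters do nothing in this simulation, while the gaps among smokers only and among non-smokers only stay close to zero. That difference is the confounding variable at work.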
One, you can do something called blocking, which works well when you have a smaller subject group. Before the study is conducted, you split the subjects into smokers and non-smokers, then distribute each of those evenly between the control and treatment groups. This matters because if a small group is assigned purely at random, the smokers could by chance end up mostly in one group, which would have a major effect on the results of the study. Two, you can stratify (stratification), which works when you have a large sample set. Here you separate the results by the confounding variable at the end of the study, looking at smokers and non-smokers on their own. With a large sample the groups are unlikely to be very skewed, hence the separating after the study, but it is still important to account for this variable in the results.
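Blocking can be sketched in a few lines of Python as well. The helper `blocked_assignment` and the 20 made-up subjects (8 smokers, 12 non-smokers) are assumptions for illustration only; the idea is simply to randomize inside each block so both groups get the same share of smokers.

```python
import random
from collections import defaultdict

def blocked_assignment(subjects, block_key, seed=None):
    """Randomize within each block so the blocking variable is balanced.

    Hypothetical helper: `block_key` maps a subject to its block
    (here, smoker vs. non-smoker).
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_key(subject)].append(subject)

    groups = {"treatment": [], "control": []}
    for members in blocks.values():
        rng.shuffle(members)               # still random, but only inside the block
        half = len(members) // 2
        groups["treatment"].extend(members[:half])
        groups["control"].extend(members[half:])
    return groups

# Made-up subjects: IDs 0-7 smoke, IDs 8-19 do not
subjects = [{"id": i, "smoker": i < 8} for i in range(20)]
groups = blocked_assignment(subjects, lambda s: s["smoker"], seed=1)

smokers_in_treatment = sum(s["smoker"] for s in groups["treatment"])
smokers_in_control = sum(s["smoker"] for s in groups["control"])
print(smokers_in_treatment, smokers_in_control)
```

Whatever the seed, each group ends up with four smokers and six non-smokers, so smoking can't be lopsided between the groups the way it could be with a plain shuffle of all 20 subjects.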
Both an observational study and an experiment have their pros and cons, so it’s important to know when to use each one to collect the data that you, as the researcher, or your client want.
Your Study Buddy,
Noah