Study Designs in Epidemiology

In this article, we will learn about the main epidemiological study designs, including cross-sectional and ecological studies, case-control and cohort studies, as well as the more complex nested case-control, case-cohort designs, and randomised controlled trials.

Pranath Fernando


March 4, 2022

1 Introduction

Choosing an appropriate study design in Epidemiology is a critical decision that can largely determine whether a study will successfully answer your research question. A quick look at the contents page of a biomedical journal or even at the health news section of a news website is enough to tell you that there are many different ways to conduct epidemiological research.

In this article, we will learn about the main epidemiological study designs, including cross-sectional and ecological studies, case-control and cohort studies, as well as the more complex nested case-control and case-cohort designs. Finally, we will look at randomised controlled trials, which are often considered the optimal study design, especially in clinical research. You will also develop the skills to identify the strengths and limitations of the various study designs.

2 Epidemiological Study Designs

Not all study designs are born equal. It is widely accepted that results from certain types of studies are more likely to reflect the truth than others. This ranking is often called the hierarchy of evidence, and it places systematic reviews, meta-analyses and randomised controlled trials at the top as the best sources of evidence.

While this is mostly true, it does not account for the quality of individual studies. Many would argue that a well-conducted case-control study can be more informative than a trial with methodological problems.

Websites where you can search for published epidemiological studies include Google Scholar and PubMed.

3 Descriptive Study Designs

Descriptive study designs include case reports, case series, cross-sectional studies and ecological studies. As the name implies, descriptive studies are used to describe patterns in a population. These patterns can relate to prevalence, incidence or trends. A descriptive study could be about a single individual, in which case it is known as a case report. An example would be an unusual set of symptoms or clinical features, such as a child with visual disturbances accompanied by abdominal pain. It can also be about several separate individuals with unusual symptoms, which is known as a case series. Descriptive studies can also be based on populations, as is the case with cross-sectional studies. These studies look at a snapshot at a given moment in time.

Using the findings of these descriptive studies, epidemiologists can then develop hypotheses about the causes of disease patterns and about the factors that affect disease risk. To further examine these hypotheses epidemiologists must turn to analytic epidemiology. Where descriptive studies describe the occurrence of disease or its determinants within a population, analytic studies are concerned with how the determinant may influence the occurrence of disease among individuals.

4 Analytic Study Designs

An analytic study aims to quantify the effect of an intervention or an exposure on an outcome. To quantify the effect, you need to know the rate of occurrence in a comparison group as well as in the exposed group. There are two types of analytic study: observational and experimental. In an observational study, you simply observe the exposure and the disease status of each participant. You don’t try to change the exposure in any way. The two most common types of observational study are case-control studies and cohort studies. In a case-control study, you identify your cases when you initiate the study and then find controls to compare them to. In this type of study, you assess past exposure in the cases and compare it with exposure in the controls. A cohort study is different in that you identify a population first, for example nurses in England, then assess exposure, for example physical activity, and then follow participants over time to see who develops the outcome.

The second type of analytic study design is the experimental study, and you can think of these as analogous to treating people like lab rats. In this case, the investigator is able to assign the exposure to individuals from a particular population, after which the outcome is measured in the exposed and unexposed groups. Ideally, the assignment of the exposure should be random. These types of experiments are called randomised controlled trials, and they are usually considered the gold standard in analytic epidemiology. An example of a randomised controlled trial would be assigning some people to receive a particular vaccine and other people to receive no vaccine, and then examining whether the vaccine reduces the occurrence of a given condition.

To summarise, we search for the determinants of disease first by relying on descriptive epidemiology to generate hypotheses about associations between exposures and outcomes and then analytic studies are undertaken to test specific hypotheses.

5 Ecological Studies

In many epidemiological studies data is collected from individuals who are compared to each other in terms of exposure and outcome. But individual data is not always available and can be difficult to collect. Alternatively, we can conduct an ecological study which does not require data from individuals.

The core principle of an ecological study is that it focuses on the comparison of groups rather than individuals. In other words, the unit of observation is the group. This implies that you analyze only aggregate-level data, which usually cannot be linked to a specific person. The size of the group can vary. You could use a school or a work site as a unit of analysis, but it could also be something much larger, such as a geographic region or an entire country. Sometimes, the unit of analysis is not geographically defined. It could be an occupation or even a time interval. The idea is the same though. You aggregate data on exposure and outcome at the group level and then take a number of groups and use the aggregate data in your analysis.

If we did an ecological study and found an association between an exposure and an outcome at the group level, does that imply the exposure caused the outcome? Not necessarily. There can be many alternative explanations for this association, from chance to bias and confounding, and these apply to all study designs. Association does not always imply causation. But there is also something specific to this study design that we should never forget: assuming that associations between groups hold for individuals is called the ecological fallacy or aggregation bias.

So why bother with ecological studies? Well, usually, an ecological study is the first step in exploring a research question and can generate hypotheses about disease etiology. Ecological studies typically use secondary data sources that are already available, so they’re relatively inexpensive and quick to complete. Sometimes, the level of inference that you’re interested in is at the population level anyway, for example when looking at the impact of tax increases on cigarette consumption, in which case conducting an ecological study is absolutely fine. Ecological studies are also suitable when the variability of exposure within each group is limited. For example, if there is little variation in individual chocolate consumption within each country, you can be more confident about a country-level association involving chocolate consumption. On the other hand, any ecological study is subject to the ecological fallacy and relies on secondary data collected for different purposes, which may not always be comparable between countries or time periods. It might also be unclear whether the exposure preceded the outcome.

Ecological studies can be a valuable tool in epidemiology especially when we have limited time and resources. However, we should not assume that group level associations are necessarily applicable to individuals.
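The ecological fallacy can be made concrete with a small numerical sketch. The data below are entirely hypothetical and were constructed so that the group-level (ecological) correlation is perfectly positive while the correlation among individuals within every group is perfectly negative. The `pearson` helper and the group names are illustrative assumptions, not part of any real study.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical individual-level data: (exposure, outcome) pairs per group.
groups = {
    "A": [(1, 5), (2, 4), (3, 3)],
    "B": [(4, 8), (5, 7), (6, 6)],
    "C": [(7, 11), (8, 10), (9, 9)],
}

# Ecological analysis: one aggregate data point (the group mean) per group.
group_exposure = [mean(x for x, _ in people) for people in groups.values()]
group_outcome = [mean(y for _, y in people) for people in groups.values()]
ecological_r = pearson(group_exposure, group_outcome)

# Individual-level analysis within each group.
within_r = {
    name: pearson([x for x, _ in people], [y for _, y in people])
    for name, people in groups.items()
}

print("Group-level correlation:", ecological_r)
print("Within-group correlations:", within_r)
```

An analyst who only had the three group means would see a strong positive association, even though within each group the individuals with higher exposure have lower outcome values.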

6 Primary and Secondary Data

Data collection is crucial for epidemiological research. Whilst there are various methods to collect data, all information which is gathered can be categorised into two different types: primary and secondary.

Primary data is data that has been collected for the first time by an investigator. Primary data can be collected via questionnaires, interviews or tests. The advantage of primary data is that collection methods can be adapted to the objectives of the study. However, collecting primary data can be costly and time intensive, which may mean that it is not always feasible to obtain.

Secondary data, also known as existing data, is data which has already been collected for other purposes. Some examples of secondary data include census data, medical records, employment records and mortality registers. Secondary data is readily available and therefore cheaper to obtain. Moreover, secondary data often has large sample sizes and is collected both comprehensively as well as on a routine basis. This can be advantageous to researchers who want to compare data over time to detect population-level changes. On the other hand, the format of secondary data may not be suitable for the researcher. Similarly, data coverage could be insufficient or the type of data collected may not be tailored to the research objectives of the researcher.

Primary and secondary data have strengths and limitations. The type of data which a researcher chooses to obtain or use can depend on a variety of factors such as the research question at hand, the time and resources available for the project, as well as the skills of the researcher. Several studies make use of both primary and secondary data to fulfil different requirements of the research.

6.1 Some COVID-19 examples

The rapid developments during the first few months of the COVID-19 pandemic created an urgent need for data and analyses that would provide much needed information about this new disease.

Examples of primary data used for such analyses include (a) results of PCR tests among travellers leaving Wuhan early in the epidemic (e.g. all passengers in a repatriation flight) to assess the prevalence of infection among them; (b) data from seroprevalence studies in which a representative sample of the population is tested to measure antibodies against the SARS-CoV-2 virus; (c) data collected during clinical trials testing the effectiveness of potential treatments of COVID-19.

Examples of secondary data used for such analyses include (a) data on the number of confirmed cases and/or deaths by country or region used to conduct ecological analyses; (b) data from the electronic health records of patients hospitalised for COVID-19 to investigate potential risk factors for worse COVID-19 outcomes.

7 Cross-sectional Studies

Cross-sectional studies are usually described as snapshots of the population of interest at a specific point in time. We use the word snapshot because we assess both the exposure and the outcome at the same moment in time.

The same moment in time may last for days or weeks if you’re collecting data from large numbers of people. The point here is that each individual is only assessed once, and there is no follow-up. As a result, you can assess the prevalence of a disease or condition with a cross-sectional study, but not the incidence rate or risk, both of which require a follow-up period. This is, as you can imagine, the main limitation of cross-sectional studies. No information regarding the temporal relationship between exposure and outcome can be collected, and therefore you’re unable to determine whether the exposure preceded the outcome. This is why cross-sectional surveys are most frequently used for descriptive purposes. If you want to investigate causal associations, you would probably choose a different study design.

The fact that there is no follow-up makes cross-sectional studies relatively cheap and easy to conduct. On the other hand, the lack of follow-up means that you only assess cases of the disease that are present at the time of the survey. Those who have been cured or have died of the disease are no longer in the sample, which limits our ability to measure the true extent of the disease. While the most frequent method of data collection in cross-sectional studies is the questionnaire, you could also collect blood samples, use diagnostic tests or take physical measurements. As long as participants are only assessed once, it will still be a cross-sectional study.

Overall, cross-sectional studies, despite all their limitations, still play a key role in epidemiology and public health, and provide valuable data for both researchers and policy makers.
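As a minimal sketch of what a cross-sectional survey can estimate, the snippet below computes prevalence from a handful of survey records. The records and field names (`exposed`, `has_condition`) are hypothetical and chosen only for illustration.

```python
# Hypothetical cross-sectional survey: each participant is assessed once,
# with exposure and outcome recorded at the same visit.
survey = [
    {"exposed": True,  "has_condition": True},
    {"exposed": True,  "has_condition": False},
    {"exposed": True,  "has_condition": True},
    {"exposed": True,  "has_condition": False},
    {"exposed": False, "has_condition": False},
    {"exposed": False, "has_condition": True},
    {"exposed": False, "has_condition": False},
    {"exposed": False, "has_condition": False},
]

def prevalence(records):
    """Proportion of surveyed participants who currently have the condition."""
    return sum(r["has_condition"] for r in records) / len(records)

overall = prevalence(survey)
among_exposed = prevalence([r for r in survey if r["exposed"]])
among_unexposed = prevalence([r for r in survey if not r["exposed"]])

print(f"Overall prevalence: {overall:.3f}")
print(f"Prevalence among exposed: {among_exposed:.3f}")
print(f"Prevalence among unexposed: {among_unexposed:.3f}")
```

Because exposure and outcome are recorded at the same time, a difference in prevalence between the two groups says nothing about which came first.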

8 Case-control Studies

A case-control study involves comparing individuals with a particular condition or disease, known as the cases, to a group of individuals with the same general characteristics but without the condition or disease of interest, known as the controls. Information on past exposure to possible risk factors is obtained for both the cases and the controls, and the frequency and intensity of exposure in the cases is then compared with that in the controls. The starting point of most case-control studies is the identification of cases. However, before selecting cases, clear eligibility criteria should be defined, based on the objectives of your study. This is referred to as the case definition. For example, you may only be concerned with a population within a certain age bracket or of a specific gender. Cases can be sourced from a variety of places such as hospitals, clinics or the community. However, you must take care to capture a representative set of cases, for example not just the more advanced cases that make it to surgery. The cases should be representative of everyone with the disease under investigation. Usually it is not too difficult to find a suitable source of cases, but selecting controls tends to be more problematic.

Assessing exposure in cases and controls has to be carefully considered. Self-reported recall of usual behaviour may not be comparable in cases and controls. For example, if you have a chronic illness such as cancer, you may be more motivated to find out why you got the disease, and so you may think about your past differently and report it differently than you would if you did not have cancer, that is, if you were a control participant. This is called recall bias. Another important factor is how many cases and controls are required. The number of cases that can be studied is often limited by the rarity of the disease being studied. If so, statistical precision can be increased by having more than one control per case. As a result, studies often allocate two or more controls per case.

The advantages of case-control studies are that they are good for studying rare diseases, because you can identify all of the existing cases that have already accrued over many years; they are relatively inexpensive to conduct; and they can be quick to obtain data from, because you can assess exposure and outcome at the same time. However, they also have disadvantages. There can be bias associated with exposure assessment, that is, the presence of disease may affect how an individual reports past exposure. There is often difficulty in selecting a good control group, and they are limited to assessing just one chosen outcome. They also cannot tell you anything about the temporal relationship between exposure and disease.

The main principle of case-control studies is that we select a group of individuals with the outcome of interest (cases) and a group of individuals without the outcome (controls), and we explore whether they have been exposed to the exposure under study.

The measure of association that can be estimated in a case-control study is the odds ratio (OR).
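A minimal sketch of the odds ratio calculation from a 2x2 table follows. The counts are hypothetical, and the function and argument names are illustrative assumptions rather than any particular library's API.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = cases_exposed / cases_unexposed
    odds_controls = controls_exposed / controls_unexposed
    return odds_cases / odds_controls

# Hypothetical study: 100 cases and 100 controls.
#                 exposed   unexposed
# cases              40         60
# controls           20         80
or_estimate = odds_ratio(40, 60, 20, 80)
print(f"Odds ratio: {or_estimate:.2f}")
```

With these made-up counts the odds of exposure are about 2.67 times higher among cases than among controls. An odds ratio of 1 would indicate no association between exposure and outcome.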

9 Cohort Studies

In relation to the hierarchy of evidence, we’re climbing up the ladder. Among observational study designs, cohort studies are considered more robust than case-control studies.

The cohort study typically involves a group of people without disease who are observed over a period of time to see what happens to them. This is also known as a longitudinal study. The first step in conducting a cohort study is to select your target population and assess their exposure status. Next, you follow these people up to check whether they develop the disease or outcome of interest. So the defining characteristic of a cohort study is that you track people forward in time: you always assess exposure prior to disease.

The key principle of a cohort study is that a number of individuals without the disease or outcome of interest are selected and followed up for a period of time. Some of them are exposed to the exposure under study, while the rest are unexposed. By the end of the study period, some individuals will have developed the disease or outcome of interest in both the exposed and the unexposed group.

Depending on the data you have collected during the follow-up period, you can calculate the risk and/or the incidence rate of the disease in the exposed and the unexposed groups. Hence, you are able to calculate the Relative Risk or Risk Ratio (RR), the Risk Difference or Attributable Risk (AR) and the Incidence Rate Ratio (IRR).
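The measures named above can be sketched with a few lines of arithmetic. The cohort counts and person-years below are hypothetical, and the variable names are illustrative assumptions.

```python
# Hypothetical cohort followed over several years.
exposed = {"n": 1000, "cases": 30, "person_years": 4800}
unexposed = {"n": 1000, "cases": 10, "person_years": 4900}

# Risk: proportion of each group that developed the outcome during follow-up.
risk_exposed = exposed["cases"] / exposed["n"]
risk_unexposed = unexposed["cases"] / unexposed["n"]

# Incidence rate: new cases per unit of person-time at risk.
rate_exposed = exposed["cases"] / exposed["person_years"]
rate_unexposed = unexposed["cases"] / unexposed["person_years"]

risk_ratio = risk_exposed / risk_unexposed           # RR
risk_difference = risk_exposed - risk_unexposed      # AR
incidence_rate_ratio = rate_exposed / rate_unexposed  # IRR

print(f"RR  = {risk_ratio:.2f}")
print(f"AR  = {risk_difference:.3f}")
print(f"IRR = {incidence_rate_ratio:.2f}")
```

The risk-based measures only need the number of people in each group, whereas the incidence rate ratio needs the person-time each group contributed, which is why the choice depends on what was recorded during follow-up.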

10 Strengths and Weaknesses of Cohort and Case-control Studies Compared

In epidemiology, studies can be either observational or experimental. Observational studies are studies in which the investigator only observes populations or individuals, and does not interfere with or manipulate the exposure. We will look at the strengths and limitations of the two most commonly used observational study designs: cohort studies and case-control studies.

10.1 Cohort studies

In cohort studies, a group of individuals without the disease is followed up over a period of time to observe what happens to them. Cohort studies try to find associations between previously defined characteristics of a cohort and the development of disease.

Advantages of cohort studies include:

  • They enable researchers to investigate multiple outcomes simultaneously.
  • The temporal relationship between exposure and disease can be explored. In other words, we can be certain that the exposure preceded the disease.
  • Cohort studies can allow researchers to calculate incidence rates as well as risks (and the respective ratios).
  • Cohort studies suffer from fewer ethical concerns as researchers are not assigning exposures or intervening with participants.

On the other hand, there are also limitations of cohort studies which should be acknowledged.

  • One weakness of cohort studies is that they usually have a long duration which also implies larger costs.
  • Cohort studies are not useful for studying rare diseases.
  • Loss to follow-up which is likely to occur when running cohort studies can introduce bias.
  • In occupational cohorts, the healthy worker effect may introduce bias. The healthy worker effect refers to the low mortality or disease incidence in healthy populations or industrial cohorts compared to the general population.

Cohort studies are warranted when the time between exposure and disease is relatively short, the occurrence of the disease is not rare, and when adequate funding is available.

10.2 Case-control studies

Case-control studies are another type of observational study where the investigator does not interfere or manipulate the exposure. In case-control studies, individuals with a particular disease are compared with individuals without the disease with regard to their exposure status.

Advantages of case-control studies include:

  • One of the major strengths of a case-control study is that it is good for studying rare diseases.
  • Compared to cohort studies, it is also relatively inexpensive and has a shorter duration, reducing the time required to acquire results.

On the other hand, like all study designs, case-control studies have limitations.

  • Case-control studies are prone to selection bias. Selection bias can occur as a result of how the participants are recruited into the study; this bias can be related to the case-control status of the participant or the exposure status.
  • Case-control studies do not allow the investigation of multiple outcomes.

11 Nested Studies

Cohort studies are often extremely large national or international studies, and consequently they are very rich data sources. It is therefore important that epidemiologists use these data effectively. One way to do so is to conduct new studies within these cohorts. One such study is a nested case-control study.

A nested case-control study is a case-control study embedded within a prospective cohort study. The prospective cohort study generates cases and potential controls for the nested case-control study. As a result, the cohort study provides a well-defined source population of both cases and controls. One of the main differences between a traditional case-control study and a nested case-control study is that, in the nested design, the cases are diagnosed during the follow-up period, after exposure has been assessed.

In case-cohort studies, the aim is to achieve the same goal as cohort studies, but more efficiently, using a sample of the denominators of the exposed and unexposed cohorts. If conducted properly, case-cohort studies provide information that should replicate findings from a cohort study. Case-cohort studies are very similar to nested case-control studies. The main difference is the way in which the controls are selected. In the case-cohort study, cases are defined as those participants of the cohort who develop the disease of interest, but the control group is selected from all cohort participants at baseline, before the cases develop. This means that controls are randomly selected from all cohort participants, regardless of whether they go on to develop the disease of interest or not.
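The case-cohort selection described above can be sketched as follows. This is a simplified illustration under stated assumptions: the cohort IDs, the case IDs, the subcohort size and the function name `case_cohort_sample` are all hypothetical, and real case-cohort analyses also require appropriate weighting, which is not shown here.

```python
import random

def case_cohort_sample(cohort_ids, case_ids, subcohort_size, seed=0):
    """Draw a random subcohort at baseline and combine it with all cases.

    The subcohort is sampled from the whole cohort, so it may include
    people who later become cases.
    """
    rng = random.Random(seed)
    subcohort = set(rng.sample(sorted(cohort_ids), subcohort_size))
    cases = set(case_ids)
    analysis_set = subcohort | cases
    return subcohort, cases, analysis_set

# Hypothetical cohort of 1,000 participants, four of whom develop the disease.
cohort_ids = range(1, 1001)
case_ids = {17, 230, 512, 888}

subcohort, cases, analysis_set = case_cohort_sample(cohort_ids, case_ids, 100)
print("Subcohort size:", len(subcohort))
print("Cases that happen to be in the subcohort:", sorted(subcohort & cases))
print("Participants needing exposure measurement:", len(analysis_set))
```

Only the participants in the analysis set need their exposure data or samples processed, which is where the efficiency gain over a full cohort analysis comes from.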

Case-cohort studies share the advantages of nested case-control studies, including efficiency, flexibility and the reduction of information and selection bias. However, they also have some additional benefits. These include the ability to examine multiple outcomes and the ability to include person-time in the analyses, and they are a good choice when excluding cases from the control group is logistically difficult. For example, in diseases with a high proportion of subclinical cases, such as prostate cancer, you would have to detect all prostate cancers by screening in order to exclude them. However, case-cohort studies are not always feasible. In particular, they’re not suitable when exposures change over time, for example if exposure is measured at the beginning of the follow-up period and differs from the overall exposure during the entire study period. To summarise, the case-cohort study is an efficient alternative to analysing the full cohort. When carefully planned and analysed, it is a strong choice for follow-up studies with multiple outcomes of interest.

Nested case-control and case-cohort studies are studies nested within cohort studies.

  • One of the major strengths of nested case-control and case-cohort studies is that the data or biospecimens are collected prior to the disease, ensuring that the exposure preceded the disease. This also means there is less chance of bias when assessing the exposure. Finally, nested studies also reduce selection bias.
  • When dealing with valuable biological samples, it may be too costly to analyse all of them, or researchers may want to use samples for investigating multiple research questions. In that case, it is more advantageous to use nested case-control or case-cohort studies than full cohort analyses. Similarly, data entry costs can be high, and it may be more cost-effective to only analyse data from those who become cases and a sub-cohort of non-cases.
  • Overall, they allow for the most efficient use of resources.
  • Nested studies are useful for studying rare outcomes.
  • A strength specific to case-cohort studies is that they allow for the estimation of risk factor distributions and prevalence rates, as well as unbiased assessment of correlations among variables, and can also include person-time in the analyses.
  • Nested case-control and case-cohort studies have limitations as well. For example, nested case-control studies can suffer from reduced precision and power as a result of the sampling of controls.

12 Randomised Controlled Trials

Whether practicing clinical medicine, or working on a research project, all you’re ever trying to look for are associations. In medicine, this could be an association between a clinical symptom, like a cough, or a potential cause, like smoking, with a diagnosis, say heart failure or lung cancer. There are two basic approaches for assessing whether an exposure is associated with a particular outcome: using experimental or observational studies. However, the strength of an association is judged by the robustness of the evidence. We’ve already learnt about observational study designs where, as the name suggests, you simply observe the study sample.

A major problem with observational studies is that the observed groups may differ in many other characteristics in addition to the one being investigated. As a result, clinical medicine puts most emphasis on robust evidence from experimental studies or clinical trials, which are considered gold standards in terms of evidence. The best sort of trials are randomised controlled trials. Randomised controlled trials are experimental studies which compare and assess the effectiveness of two or more treatments, to see if one treatment is better than another. The treatment being tested could be a drug or some method of care, but there must always be a comparator group which acts as the control.

Treatments being tested could be compared with no treatment, ideally using a placebo as the control. For example, if you were testing a new drug, the placebo would be a tablet which looked identical, ideally, to the active drug in every way, but does not contain any active ingredient. Trials using this method are referred to as placebo controlled trials. Alternatively, once you have a treatment that is effective and safe, you may test a new treatment against the existing standard treatment, to check if it is more effective or to examine what the side effects are, and how common they are.

Information from the follow-up of the control group allows the researchers to see whether the new treatment, or treatments, that they’re testing are any more or less effective than the existing treatment or placebo. To maximise the value of the clinical trial, the choice of controls is clearly critical. There’s no point in showing that a new drug or intervention is better than one that no one uses, or than the wrong dose of a drug that people do use. Randomised trials are characterised by the fact that the study subjects are allocated by the investigator to the different study groups through the use of randomisation, and the investigators then intervene differentially on participants. It’s an experiment. While randomised controlled trials are recognised as the gold standard study design for evaluating the impact of an intervention on an outcome, the process of randomisation alone does not wholly protect against bias. Incorrect analyses of the data can introduce bias, even where randomisation has been correctly implemented. It’s important to preserve the advantages of randomisation during the conduct of the study and in the analysis. If you don’t, investigators may reach an incorrect and biased assessment of the results.

One way this can happen is by not analysing patients according to the group to which they were originally assigned. The principle of analysing patients according to the group to which they were originally assigned is called ‘intention to treat’. Imagine you have 200 patients who have had an acute myocardial infarction, a heart attack. You randomise them so that 100 go to the coronary care unit and 100 go mountain climbing. In the coronary care unit, 18 died and 82 went home, so the survival rate is 82%. On the other hand, with the mountain climbers, 1 died because he was daft enough to go up the mountain, but 9 others who went up the mountain lived. The other 90 were lost, or if they were wise, they went home - we don’t know whether they went home or died on the mountain. So indeed, they might have died on the mountain somewhere - you don’t know. But if you just analyse the data for the 10 participants that you do have outcome information on, mountain climbing gives you a survival rate of 90% - one died out of the 10 you found. So, mountain climbing appears to be better than the coronary care unit?

This story also emphasises that you have to try very hard not to lose patients. What happened to the 90 lost mountaineers is critical to interpreting your trial, but equally importantly, you must include them in your analysis. If patients withdraw from the trial, you try to find out whether they are alive at the end, and what happened to them, and you include them in their original groups, because that is what they were randomised to, even if they didn’t take the drugs or carry out the instructions they were supposed to. This is the basic idea of why a trial should be randomised and controlled, and of the importance of selecting control interventions. Remember, importantly, you must account for missing trial participants, and include all participants in your analysis and in their original groups, regardless of whether or not they followed their allocated intervention.
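The arithmetic of the mountain climbing example can be sketched as below. The counts come from the story above. The best-case and worst-case bounds are an added illustration (an assumption of this sketch, not a method prescribed in the text) showing how much the 90 participants with unknown outcomes could change the result once everyone randomised is kept in the denominator.

```python
# Counts taken from the worked example in the text.
coronary_care = {"randomised": 100, "survived": 82, "died": 18, "lost": 0}
mountain = {"randomised": 100, "survived": 9, "died": 1, "lost": 90}

def observed_only_survival(arm):
    """Survival among participants with known outcomes only (ignores the lost)."""
    return arm["survived"] / (arm["survived"] + arm["died"])

def survival_bounds(arm):
    """Survival with everyone randomised in the denominator.

    Worst case assumes all lost participants died,
    best case assumes all lost participants survived.
    """
    worst = arm["survived"] / arm["randomised"]
    best = (arm["survived"] + arm["lost"]) / arm["randomised"]
    return worst, best

for name, arm in [("Coronary care", coronary_care), ("Mountain climbing", mountain)]:
    worst, best = survival_bounds(arm)
    print(f"{name}: observed-only {observed_only_survival(arm):.0%}, "
          f"all randomised between {worst:.0%} and {best:.0%}")
```

The observed-only figure of 90% for mountain climbing rests on 10 people. With all 100 randomised participants in the denominator, survival could be anywhere between 9% and 99%, which is why tracing the lost participants matters so much.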

12.1 Strengths and Weaknesses of Randomised Controlled Trials

Randomised controlled trials (RCTs) are often considered the optimal study design for a number of reasons.

  • Randomisation substantially reduces the risk of bias in the study.
  • RCTs are also relevant to actual interventions in populations and settings of interest.
  • They can provide precise measures of efficacy which we can use to evaluate interventions.

However, RCTs are also subject to certain limitations, including:

  • The results may not be generalisable to populations that are different from the sample used in the study.
  • They can be quite complex and costly to conduct.
  • Due to cost and practical considerations, they often rely on surrogate endpoints. For example, a biomarker is measured instead of a health outcome which might require a long time to develop.
  • They are experimental studies, which raises ethical issues. Some exposures (e.g. smoking or radiation) cannot be studied with RCTs because it is unethical to intentionally expose people to them.

13 Conclusion

We have looked at the main types of epidemiological study designs. There are many classifications of study designs, which may differ slightly from each other depending on the criteria they use to characterise studies. We looked at two main categories of studies, analytic and descriptive, but one could also start with the contrast between experimental and observational studies.

Note that a cross-sectional study can also be considered descriptive when, for example, its main purpose is to describe the prevalence of a disease. Experimental studies are, by definition, analytic. Study designs such as nested case-control and case-cohort also belong to the analytic studies.