The International Classification of Diseases (ICD) System

In this article we will look at the history of the International Classification of Diseases (ICD) system, which has been developed collaboratively so that the medical terms and information in death certificates can be grouped together for statistical purposes. In practical examples, we will look at how to extract ICD-9 codes from the MIMIC-III database and visualise them.

Pranath Fernando


March 18, 2022

1 Introduction

In an earlier article we looked at how to extract clinical outcomes and patient-level data from the MIMIC-III EHR (Electronic Health Record) database. In this article we will look at the history of the International Classification of Diseases (ICD) system, which has been developed collaboratively so that the medical terms and information in death certificates can be grouped together for statistical purposes. In practical examples, we will look at how to extract ICD-9 codes from the MIMIC-III database and visualise them.

2 The International Classification of Diseases (ICD) System

The World Health Organization is an agency that works on behalf of 194 member states. Its aim is to promote the highest standards of health for all people, regardless of social and economic condition, race, gender, religion or political beliefs. The main principle behind the organization's work is that access to affordable, good-quality healthcare is a human right. For this reason, it promotes universal health coverage.

Several determinants influence human health: biomedical and genetic factors, health behaviors, socioeconomic factors and environmental factors. The organization recognizes that we need common metrics to measure health and wellbeing. Some of these metrics, such as life expectancy and mortality, are objective; others are subjective and depend on how well a person feels. Disability, illness and comorbidity are also measures of health and wellbeing.

The World Health Organization aims to coordinate action among member states in order to intervene and improve health globally. To achieve this, data must be collected from patients and then analyzed by researchers, statisticians and clinicians to estimate indices of health and wellbeing. Advances in technology and machine learning can improve healthcare and narrow the gap between rich and poor countries.

To collect data that can be compared across different locations and times, we need common notations and definitions. For this reason, the World Health Organization maintains a family of international classifications: a set of integrated classifications that provide a common language for health information across the world. The International Classification of Diseases is the international standard diagnostic tool for epidemiology, health management and clinical purposes.

The International Classification of Diseases has been designed to describe various aspects of health and health systems in a consistent manner. In this way, it supports the development of reliable statistical systems at local, national and international levels, with the aim of improving health status and health care. In practice, it is used to translate diagnoses of diseases and other health problems from words into alphanumeric codes, which provides a systematic way to store, retrieve and analyze the data.
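To illustrate why translating words into codes matters for analysis, here is a minimal Python sketch. The free-text diagnoses and the `TEXT_TO_ICD9` dictionary are hypothetical and hand-written for illustration only (real coding uses the full classification, not a lookup table); the point is that different wordings of the same condition collapse onto one code, so they can be counted together.

```python
from collections import Counter

# Hypothetical free-text diagnoses, as they might appear in clinical notes.
# Different wordings for the same condition make direct counting unreliable.
free_text = [
    "atrial fibrillation",
    "AF",
    "essential hypertension",
    "high blood pressure",
    "atrial fibrillation",
]

# Hypothetical, illustrative mapping from wording to ICD-9 codes.
TEXT_TO_ICD9 = {
    "atrial fibrillation": "427.31",
    "af": "427.31",
    "essential hypertension": "401.9",
    "high blood pressure": "401.9",
}

def encode(diagnosis: str) -> str:
    """Translate a free-text diagnosis into an ICD-9 code via the lookup table."""
    return TEXT_TO_ICD9[diagnosis.strip().lower()]

# Once coded, diagnoses can be grouped and counted consistently.
codes = [encode(d) for d in free_text]
counts = Counter(codes)
print(counts)  # Counter({'427.31': 3, '401.9': 2})
```

Counting the raw strings would have produced five separate entries; counting the codes yields two conditions.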

The first group of users of these classifications is clinical: physicians, nurses and other health workers, who use this information to support decision-making for their patients. The second group is administrative: health information managers, policymakers, insurers and national health program managers.

These data are also of paramount importance for population health, epidemiology and research. They allow disability, diseases and risk factors to be quantified at a global level, and they enable research into decision support systems based on artificial intelligence.

In summary, the International Classification of Diseases is one of the oldest and most important classifications in medicine. It enables the optimal application of computer technology to the processing and retrieval of clinical information. Importantly, it is recognized internationally, which enables sound statistical comparison of data across different regions and times.

3 The Evolution of the ICD System

The first effort to systematically classify diseases goes back to the 17th century. John Graunt, an epidemiologist and statistician, studied the deaths of children who were born alive but died before the age of six. He recognized the need to organize mortality data into a logical form, and his analysis of the London Bills of Mortality is regarded as the first statistical study of disease.

William Farr was the first medical statistician of the General Register Office of England and Wales. In 1855 he submitted his report on the Nomenclature and Statistical Classification of Diseases, which included most of the fatal diseases affecting health. By the mid-19th century, the need for a uniform and internationally accepted classification of diseases had been recognized. Farr pointed out that medicine had progressed to the point where many diseases could be attributed to particular organs, and he argued for classifying diseases according to the organ systems they affect. He also regarded previous classifications as largely symptomatic, with arrangements that could not be used for statistical analysis.

Modern classification can be considered to begin in 1893. Jacques Bertillon, chief of the statistical services of the City of Paris, prepared a classification based on the principle of distinguishing between general diseases and those localized to a particular organ or anatomical site. Bertillon presented his report on causes of death and incapacity for work, including hospital admissions. His main headings included general diseases; diseases of the nervous system and sense organs; and diseases of the circulatory, respiratory and digestive systems, among many others. In 1893 the International Statistical Institute adopted the first edition of the international classification system, the International List of Causes of Death.

The ICD-10 coding system was endorsed by the 43rd World Health Assembly in May 1990 and came into use in World Health Organization member states from 1994. ICD-10 involved a thorough rethinking of the classification's structure and an effort to devise a stable and flexible classification that would not require fundamental changes. The code structure also changed from numeric to alphanumeric, which allows for significant expansion. ICD-11 was adopted by the 72nd World Health Assembly in 2019 and came into effect in January 2022. ICD-11 has been designed for digital use and is fully electronic. It aims to ease implementation, reduce errors in diagnosis and be more adaptable to local needs. The system has an improved ability to code for the quality and safety of health care, and it highlights socioeconomic factors that directly and indirectly contribute to people's health. Finally, it also tries to simplify diagnostic descriptions, particularly in relation to mental health.
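A rough count of the three-character code space shows why moving to alphanumeric codes allows expansion. As a simplifying assumption, this ignores ICD-9's V and E supplementary codes and the letters ICD-10 reserves for special purposes, so both figures are upper bounds: purely numeric three-digit categories can take at most $10^3$ values, whereas a letter followed by two digits can take $26 \times 10^2$.

$$
N_{\text{ICD-9}} = 10^{3} = 1000, \qquad N_{\text{ICD-10}} = 26 \times 10^{2} = 2600
$$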

In summary, the need to organize disease data systematically was recognized in the 17th century. However, it was not until the late 19th century that the first international list of causes of death was adopted. ICD codes are used ubiquitously in medicine, and they are necessary for comparing statistics across different countries and over time.


ICD-9 is the disease classification system used in MIMIC-III. We will review its main structure and see how ICD codes can help us extract summary statistics from the MIMIC-III database, for example the number and age distribution of patients diagnosed with a specific disease. We will also see how to put together queries that extract the most common ICD codes in the MIMIC database and how these codes are distributed across ICUs.
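As a minimal sketch of the kind of queries described above, the snippet below uses Python's built-in `sqlite3` with small in-memory tables that mimic a few columns of the MIMIC-III `DIAGNOSES_ICD` and `ICUSTAYS` tables. The rows are invented for illustration, and the schema is a simplified assumption, not the full MIMIC-III schema. MIMIC-III stores ICD-9 codes without the decimal point (for example `4019` for 401.9). Diagnoses are recorded per hospital admission (`hadm_id`), not per ICU stay, so joining them to ICU stays is an approximation.

```python
import sqlite3

# Toy in-memory tables mimicking a few MIMIC-III columns (invented rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE diagnoses_icd (subject_id INTEGER, hadm_id INTEGER,
                            seq_num INTEGER, icd9_code TEXT);
CREATE TABLE icustays (subject_id INTEGER, hadm_id INTEGER,
                       icustay_id INTEGER, first_careunit TEXT);
INSERT INTO diagnoses_icd VALUES
  (1, 100, 1, '4019'), (1, 100, 2, '42731'),
  (2, 200, 1, '4019'), (3, 300, 1, '42731'),
  (3, 300, 2, '4019'), (4, 400, 1, '5849');
INSERT INTO icustays VALUES
  (1, 100, 1000, 'MICU'), (2, 200, 2000, 'CCU'),
  (3, 300, 3000, 'MICU'), (4, 400, 4000, 'SICU');
""")

# Most common ICD-9 codes, counting each hospital admission once per code.
top_codes = conn.execute("""
    SELECT icd9_code, COUNT(DISTINCT hadm_id) AS n_admissions
    FROM diagnoses_icd
    GROUP BY icd9_code
    ORDER BY n_admissions DESC, icd9_code
    LIMIT 3
""").fetchall()

# Distribution of one code across the first care unit of each ICU stay.
by_unit = conn.execute("""
    SELECT i.first_careunit, COUNT(DISTINCT d.hadm_id) AS n_admissions
    FROM diagnoses_icd AS d
    JOIN icustays AS i ON d.hadm_id = i.hadm_id
    WHERE d.icd9_code = ?
    GROUP BY i.first_careunit
    ORDER BY n_admissions DESC, i.first_careunit
""", ("4019",)).fetchall()

print(top_codes)  # [('4019', 3), ('42731', 2), ('5849', 1)]
print(by_unit)    # [('MICU', 2), ('CCU', 1)]
conn.close()
```

Counting `DISTINCT hadm_id` rather than rows avoids double-counting an admission that has several ICU stays in the same unit.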

The ICD-9 code structure consists of three digits that identify a category, followed by up to two digits that specify the cause or location. The World Health Organization requires a minimum three-character category level for international reporting and comparison, so the three category digits must always be provided, whereas the fourth digit is filled with an X when there is no further information about the subdivision.
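The category/subdivision split described above can be sketched in Python for codes stored the way MIMIC-III stores them, without a decimal point. The helper names `split_icd9` and `with_decimal` are hypothetical. One detail not covered in the text: external-cause (E) codes have a four-character category such as `E888`, so the sketch treats them separately. The X filler convention is not handled here.

```python
from typing import Tuple

def split_icd9(code: str) -> Tuple[str, str]:
    """Split an ICD-9 code stored without a decimal point (as in MIMIC-III)
    into its category and its subdivision digits."""
    code = code.strip().upper()
    # E codes (external causes) use a four-character category, e.g. E888;
    # numeric and V codes use a three-character category, e.g. 427 or V30.
    width = 4 if code.startswith("E") else 3
    return code[:width], code[width:]

def with_decimal(code: str) -> str:
    """Format a stored ICD-9 code with its conventional decimal point."""
    category, detail = split_icd9(code)
    return f"{category}.{detail}" if detail else category

for stored in ["42731", "4019", "V3000", "E8889"]:
    print(stored, "->", with_decimal(stored))
# 42731 -> 427.31
# 4019 -> 401.9
# V3000 -> V30.00
# E8889 -> E888.9
```

Grouping on `split_icd9(code)[0]` gives the three-character category level that the WHO requires for international comparison.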

Here, we see a more detailed overview of the ICD-9 categories. The left column shows the ranges of the first three digits of the ICD-9 code, and the right column gives the description of each category. The list starts with infectious and parasitic diseases, followed by neoplasms; endocrine, nutritional and metabolic diseases and immunity disorders; diseases of the blood and blood-forming organs; mental disorders; and then diseases of specific systems, such as the nervous system and sense organs, the circulatory system, the respiratory system, the digestive system, the genitourinary system and so on.

These are followed by developmental conditions, such as congenital anomalies, and by the injury and poisoning category. Finally, in the last two categories the first character can be a letter (V or E); both of these categories are supplementary classifications. Next, we will see how to extract these codes from MIMIC-III, where ICD codes are stored in the Diagnoses_icd table.
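To connect the category table to the codes stored in Diagnoses_icd, here is a minimal Python sketch that maps a stored code (no decimal point) to its ICD-9-CM chapter. The chapter boundaries follow the standard ICD-9-CM tabular list; the function name `icd9_chapter` is hypothetical, and the sketch assumes well-formed codes (no validation of malformed input).

```python
from bisect import bisect_right

# ICD-9-CM chapters for numeric codes: (first category, chapter title).
# Each chapter runs up to the category before the next chapter's start.
CHAPTERS = [
    (1, "Infectious and parasitic diseases"),
    (140, "Neoplasms"),
    (240, "Endocrine, nutritional and metabolic diseases, and immunity disorders"),
    (280, "Diseases of the blood and blood-forming organs"),
    (290, "Mental disorders"),
    (320, "Diseases of the nervous system and sense organs"),
    (390, "Diseases of the circulatory system"),
    (460, "Diseases of the respiratory system"),
    (520, "Diseases of the digestive system"),
    (580, "Diseases of the genitourinary system"),
    (630, "Complications of pregnancy, childbirth, and the puerperium"),
    (680, "Diseases of the skin and subcutaneous tissue"),
    (710, "Diseases of the musculoskeletal system and connective tissue"),
    (740, "Congenital anomalies"),
    (760, "Certain conditions originating in the perinatal period"),
    (780, "Symptoms, signs, and ill-defined conditions"),
    (800, "Injury and poisoning"),
]
_STARTS = [start for start, _ in CHAPTERS]

def icd9_chapter(code: str) -> str:
    """Return the ICD-9 chapter for a code stored without a decimal point."""
    code = code.strip().upper()
    # The two supplementary classifications start with a letter.
    if code.startswith("E"):
        return "Supplementary classification of external causes of injury and poisoning"
    if code.startswith("V"):
        return ("Supplementary classification of factors influencing "
                "health status and contact with health services")
    category = int(code[:3])
    # Find the last chapter whose first category is <= this category.
    return CHAPTERS[bisect_right(_STARTS, category) - 1][1]

for stored in ["4019", "5849", "0389", "V3000", "E8889"]:
    print(stored, "->", icd9_chapter(stored))
```

Applied to a column of codes, this gives a chapter-level grouping that is convenient for summary plots.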