Artificial intelligence in the diagnosis and prevention of multi-stage colorectal cancer: potential algorithmic bias and systematic mitigation

Anupama Nair

doi:10.21037/jmai-24-169

Review Article

Honors College, Virginia Commonwealth University, Richmond, VA, USA

Correspondence to: Anupama Nair, CPhT. Honors College, Virginia Commonwealth University, 701 West Grace Street, Richmond, VA 23220, USA. Email: anunair1933@gmail.com.

Abstract: In the United States, colorectal cancer (CRC) rates reported from 2015–2019 revealed that CRC was the third most commonly diagnosed cancer among men and women and the second highest in terms of cancer-related deaths among men 50 years of age or younger. Artificial intelligence (AI), through the mode of machine learning (ML) models, is a novel approach for screening and diagnosis of CRC that requires further study to determine its appropriate mode of application. This study implemented an analytic review of: (I) the role of etiological factors in disproportionate incidence of CRC across racial lines in the United States population; (II) the use of race and ethnicity as predictors in CRC ML models; (III) the use of glandular segmentation and manipulation of prediction thresholds in CRC ML models, in order to understand the potential consequences of biases in artificial intelligence implementation in the United States. Six peer-reviewed studies were chosen for analysis in order to mitigate potential bias in the study. Each study was identified through an internet search via Google, last run on 15 December 2023. The search was conducted through the Google Scholar database, using the search terms “colorectal cancer incidence AND race/ethnicity”, “colorectal cancer diagnosis AND artificial intelligence”, and “technical modifications + machine learning + colorectal cancer”. This search yielded a total of nine viable peer-reviewed articles assessing the three main areas of study. It was found that the Black and African American (BAA) population had a higher incidence of CRC due to etiological factors including higher rates of non-insulin dependent diabetes mellitus (NIDDM), obesity, smoking, and limitation of resources. Such etiological factors negatively impacted ML models, where inconsistencies in data acquisition led to poor model performance for this population. Modifications to ML models in the form of the use of race and ethnicity as predictors increased model viability. Glandular segmentation and manipulation of prediction thresholds of CRC histopathological or whole slide imaging (WSI) in region and subsite-specific identification allowed for more accurate diagnosis of CRC imaging among the BAA population, revealing that such modifications may improve diagnostic ML model outputs across racial lines. Further research should continue to explore the implementation of race-based prediction for ML models diagnosing CRC due to its novel nature, demonstrated biases, and improvement given technical modifications.

Keywords: Colorectal cancer (CRC); machine learning (ML); racial bias; artificial intelligence-based tumor biomarker (AI-based tumor biomarker); molecular pathological epidemiology

Received: 04 June 2024; Accepted: 19 September 2024; Published online: 25 November 2024.

Introduction

Background

According to the Surveillance, Epidemiology, and End Results (SEER) program, colorectal cancer (CRC) rates reported from 2015–2019 in the United States revealed that CRC was the third most commonly diagnosed cancer among men and women (1). Although CRC was the third most commonly diagnosed cancer among the adult population, it was found to be the second highest in terms of cancer-related deaths in the United States among men 50 years of age or younger (1). Over one half of these deaths were related to modifiable risk factors among this population, including physical inactivity, smoking, poor diet, high alcohol consumption, and high body weight (1-4). Early screening for CRC is essential for the purposes of early intervention and thus improved treatment outcomes (1,5).

In the United States, the Black and African American (BAA) population faces the highest incidence of CRC, an observed 20% higher incidence compared with the White population (5,6). BAAs face limitations in resources including access to screening and healthcare, which exacerbate CRC incidence and mortality rates among this group of individuals (6). Excess body weight has been linked with the development of cancer, and obesity is disproportionately prevalent among African American adults in the United States (49.6%) compared with non-Hispanic White adults (42.2%) (3). Obesity further plays a role in the development of non-insulin dependent diabetes mellitus (NIDDM), a risk factor for the development of CRC (1,4). Adult BAAs experience NIDDM at a rate of 13.2%, almost double the rate observed among the White population (7.6%) (4). Another etiological factor that increases CRC risk is smoking, which is slightly more prevalent among African Americans (16.8%) than among non-Hispanic White individuals (16.6%) (2). The existence of such barriers to healthcare access, in addition to etiologic factors that increase CRC risk, has the potential to create a systematic pattern of CRC incidence in the BAA population (7,8).

CRC can be split into four stages: Stage I, Stage II, Stage III, and Stage IV (9). Stage I CRC involves cancer cells found in the mucosa of the colon and rectum (9). Stage II CRC involves cancer cells growing in the peritoneum (9). Stage III CRC involves cancer cells growing in the peritoneum where there are metastases to one to three lymph nodes (9). Stage IV CRC involves the cancer cells growing in the peritoneum where there are metastases to over four lymph nodes (9).

Current treatment methods for CRC are diverse, and include surgical resection of cancerous tissue, resection with the possibility of the insertion of a colostomy, radiation therapy of cancerous tissue, ablation therapy, chemotherapy, radioembolization, and targeted therapies to reverse tumor growth and stop cancer spread (8,9). However, no one treatment can guarantee that CRC will be eliminated, and oftentimes, patients are subjected to a combination of such treatments in accordance with cancer stage and disease progression (5,9).

Rationale and knowledge gap

Early diagnosis is essential for CRC, as there is extensive evidence from clinical records citing improved patient outcomes correlated with early diagnosis and associated treatment (1,8,10-12). Current screening is limited to fecal occult blood tests, fecal immunochemical tests, barium enema, colonoscopy, sigmoidoscopy, virtual colonoscopy, and genetic tests, while diagnosis for CRC can be conducted through blood tests, biopsy, and genetic analysis (10). When a patient is given a diagnosis, further staging is done to determine the appropriate course of treatment, and these staging measures include computerized tomography (CT) scan, positron emission tomography (PET) scan, magnetic resonance imaging (MRI), ultrasound, chest X-ray, lymph node biopsy, or tumor marker analysis (8,10). Artificial intelligence (AI), through the mode of machine learning (ML) models, is a novel approach to screening and diagnosis of CRC that requires further study to determine its appropriate mode of application (8,10).

The ML model creation process involves data collection, model training based on the collected data, testing of the model to ensure its accuracy, and finally, its deployment onto external datasets (7,8). Sample bias can leave a model able to make predictions for a general population but not for minority groups (7,8). In the deployment stage, after the ML model has been put into clinical practice, domain shift can occur, a situation in which the data used to train the model differs vastly from the patient population to which it is applied (7,13). This has the potential to lead to negative patient outcomes where ML models misdiagnose patient imaging (7). Thus, it is imperative that action be taken to correct existing biases in ML models and to ensure that new models are not created from biased frameworks, so as not to perpetuate a system of inequitable healthcare among individuals belonging to underrepresented communities (7,8).
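
One simple way to make the idea of domain shift concrete is to compare how each patient group is represented in the training data versus the population a model is deployed on. The sketch below is illustrative only: the group labels, the counts, and the helper names `group_shares` and `share_gaps` are hypothetical and are not taken from any of the reviewed studies.

```python
from collections import Counter


def group_shares(groups):
    """Return the fraction of records belonging to each group label."""
    counts = Counter(groups)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def share_gaps(train_groups, deploy_groups):
    """Deployment share minus training share for every group seen in either set."""
    train = group_shares(train_groups)
    deploy = group_shares(deploy_groups)
    all_groups = sorted(set(train) | set(deploy))
    return {g: deploy.get(g, 0.0) - train.get(g, 0.0) for g in all_groups}


# Hypothetical data: group "B" is 10% of training records but 40% at deployment.
train = ["A"] * 90 + ["B"] * 10
deploy = ["A"] * 60 + ["B"] * 40
gaps = share_gaps(train, deploy)
print(gaps)
```

A large positive gap for a group signals that the model will be applied to that group far more often than it was trained on it, which is the kind of mismatch the paragraph above describes. This checks representation only; it does not capture differences in image quality or acquisition.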

Objective

Because the BAA population faces a systematic lack of resources and due to etiological factors including higher rates of NIDDM, obesity, and smoking, BAAs have a higher incidence of CRC. This observed lack of resources presents itself in limited diagnostic capabilities among CRC ML models. As such, the use of race and ethnicity as predictors in CRC ML models has the potential to reduce racial bias in model output compared to ML models that do not include these predictors, as their inclusion lowered area under the receiver operating characteristic curve (AUC) values and reduced racial bias in model output. Due to the importance of accurate diagnosis in CRC, it may be beneficial for diagnostic outputs from ML models utilizing race-based detection to be confirmed by clinical professionals. Because the use of glandular segmentation and manipulation of prediction thresholds of CRC histopathological or whole slide imaging (WSI) in region and subsite-specific identification allows for more accurate diagnosis of CRC imaging among the BAA population, such modifications may improve diagnostic ML model outputs across racial lines. The author presents this article in accordance with the PRISMA reporting checklist (available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-169/rc).

Methods

This study implemented a systematic review of: (I) the role of etiological factors in disproportionate incidence of CRC across racial lines in the United States population; (II) the use of race and ethnicity as predictors in CRC ML models; (III) the use of glandular segmentation and manipulation of prediction thresholds in CRC ML models, in order to understand the potential consequences of biases in artificial intelligence implementation in the United States. Six peer-reviewed studies were chosen for analysis in order to mitigate potential bias in the study (5,8,13-17).

Search strategy

Each study was identified through an internet search via Google, last run on 15 December 2023. The search was conducted through the Google Scholar database, using the search terms “colorectal cancer incidence AND race/ethnicity”, “colorectal cancer diagnosis AND artificial intelligence”, and “technical modifications + machine learning + colorectal cancer” (Figure 1). This search yielded a total of nine viable peer-reviewed articles assessing the three main areas of study. Studies were chosen for inclusion based on how well they corresponded to the three main areas of study: the role of etiological factors in the development of colorectal cancer, the use of race and ethnicity as predictors in CRC ML models, and the use of specific technical modifications to ML models diagnosing CRC. Due to the limited nature of data in this area of study, there were no set exclusion factors that resulted in study removal. However, of the nine studies, six were chosen for final analysis; the other three mainly assessed criteria outside the scope of this study and were thus removed (Figure 1).

Figure 1 PRISMA 2020 flow diagram for new systematic reviews which included searches of databases and registers only. CRC, colorectal cancer.

Results

Because the BAA population faces a systematic lack of resources and due to etiological factors including higher rates of NIDDM, obesity, and smoking, BAAs have a higher incidence of CRC. Since environmental limitations lead to limitations in image acquisition, BAAs are studied at lower rates than their White counterparts, which may lead to a limited amount of data to inform ML models for diagnosing CRC.

Khor et al. obtained the unadjusted rate of CRC recurrence by racial and ethnic subgroup from the Kaiser Permanente (Southern California, USA) linked cancer registry in the creation of four ML models to assess the role of race and ethnicity as predictors for the diagnosis of CRC recurrence in patients who had undergone surgical resection from medical imaging (5). Khor et al. utilized unadjusted CRC recurrence rates as a baseline for understanding the true recurrence of CRC among the Kaiser Permanente linked cancer registry dataset (5). Khor et al. calculated that the cumulative recurrence of CRC following 3 years was 11.2% among all racial and ethnic groups evaluated (5). Khor et al. found that the cumulative recurrence of CRC following 3 years was highest among BAA patients at 13.8% and lowest among non-Hispanic White patients at 9.5% (5).

Khor et al. found that BAA individuals demonstrated higher recurrence of CRC 3 years after initial resection and argued that high CRC recurrence rates among the BAA population are indicative of a disparity in either diagnostic or environmental factors, with the possibility of both playing a role in CRC recurrence among this group of individuals (5).

Murphy et al. collected data on incidence rates of colorectal cancer from the National Cancer Institute’s SEER program (14). Murphy et al. specified that the images studied were those containing malignant tumors of the colon and rectum from 1992–2006. Murphy et al. excluded images containing lymphomas and appendiceal carcinomas (14). Murphy et al. utilized the SEER Statistical Model to calculate 95% confidence intervals for the male-to-female incidence rate ratio (MF IRR) of CRC, specified by anatomic subsite (14). Murphy et al. noted a significant difference in CRC diagnoses among Black males and females, as they had a disproportionately higher diagnosis rate for proximal cancer (MF IRR 1.17) compared to the other studied ethnic groups (non-Hispanic White, Hispanic White, Asian/Pacific Islander, and American Indian/Alaskan Native), all of which had MF IRRs above 1.18 (14). Murphy et al. attributed the higher CRC rates among Black individuals to external etiologic factors, citing health-care coverage and unspecified “other factors” (14). Murphy et al. proposed that Black individuals were less likely to get screened for diagnostic abnormalities compared to White individuals (14).
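
To illustrate what a rate ratio with a 95% confidence interval involves, the sketch below computes a crude incidence rate ratio and a Wald interval on the log scale. This is a textbook simplification and is an assumption on our part: it is not the procedure implemented in the SEER software used by Murphy et al., which works with age-adjusted rates. The case counts and population sizes are hypothetical and were chosen only so that the ratio equals 1.17.

```python
import math


def rate_ratio_ci(cases_a, pop_a, cases_b, pop_b, z=1.96):
    """Crude incidence rate ratio (A over B) with a Wald CI on the log scale.

    The standard error of log(IRR) is approximated as sqrt(1/cases_a + 1/cases_b),
    which treats the case counts as Poisson and the populations as fixed.
    """
    irr = (cases_a / pop_a) / (cases_b / pop_b)
    se_log = math.sqrt(1 / cases_a + 1 / cases_b)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Hypothetical counts: 585 male and 500 female cases per 1,000,000 each.
irr, lower, upper = rate_ratio_ci(585, 1_000_000, 500, 1_000_000)
print(f"MF IRR = {irr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1.0 indicates that the male and female rates differ at the chosen confidence level; the width of the interval shrinks as the case counts grow.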

Troisi et al. obtained data from the period 1975 to 1994 among White and Black individuals in the United States diagnosed with primary invasive carcinomas of the colon or rectum in order to examine changes in CRC incidence rates in the United States, similar to the recurrence reporting of Khor et al. and the incidence reporting of Murphy et al. (5,14-16). Troisi et al. excluded lymphomas and appendiceal carcinomas from the collected data, mirroring the methodology of Murphy et al. (14,16). Troisi et al. specified the evaluated subsites as the proximal colon, distal colon, and the rectum (16). Troisi et al. further noted that any data regarding the splenic flexure were grouped with the proximal colon as it is part of the transverse colon, which originally contained the anatomical structure (16). Troisi et al. utilized the SEER program, the same program utilized by Murphy et al., to compute incidence rates using the population estimates obtained from the data and plot them on a logarithmic scale (14,16). Troisi et al. specified that the cases were split into categories by age, race, and gender, where it was noted that the only races included were White and Black (16). Troisi et al. evaluated incidence rates over time periods ranging from 1975 to 1994, with 3-year sub-intervals from 1975–1978, 1979–1982, 1983–1986, 1987–1990, and 1991–1994 (16).

Troisi et al. examined incidence rates of CRC in the United States population from the years 1975–1995 and observed that Black individuals experienced the highest rates of proximal CRC increase compared to the White population over this period, with a 30% increase observed in Black males and a 22.5% increase observed in Black females, in agreement with the findings of Khor et al. and Murphy et al. (5,14,16). Such studies revealing higher incidence of CRC among the BAA population indicated the extent to which CRC is prevalent among this population, and longitudinal studies including Troisi et al. showed that such increases in incidence have grown over a relatively short period of time, giving ground to the idea that potential systemic or etiological factors exist among the BAA population that feed into a cycle of increasing CRC incidence (16). Etiological and systemic factors can play a large role in CRC incidence among the BAA population, and as such it is necessary that these factors be identified so that steps can be taken toward their mitigation.

Troisi et al., similar to Murphy et al., identified etiological factors as an explanation for the high incidence of CRC among the BAA population, where changes to environment, smoking, diet, and NIDDM were argued to be prevalent in this population and additionally correlated with the development of CRC (14,16). The observed high incidence of CRC among the BAA population, together with the etiological and systemic factors that have contributed to it, provides a basis for the conclusion that diagnostic or treatment measures may be improved to decrease CRC incidence among BAAs.

The link between diet, lifestyle, and environment and the development and progression of CRC is discussed at length by Bahrami et al., who describe CRC as a disease that is highly related to diet, including consumption of red meats, alcohol, and dairy products, factors that increase the likelihood of CRC tumor development (17). Individuals from racial or ethnic backgrounds that consume these products may thus be observed to have disproportionately higher rates of CRC. Another tie to increased CRC incidence is among individuals with gut microbiome-related issues, particularly in relation to excessive consumption of lacto-fermented foods. In relation to lifestyle practices, Bahrami et al. discuss bowel emptying and how varying practices, such as sitting on the toilet, observed in Western countries, versus squatting, observed in Eastern countries and particularly in Asian communities, influence CRC occurrence (17). This is because squatting during bowel emptying results in a higher rate of bowel emptying, associated with lower CRC incidence. All of these factors play a role in CRC development, vary between individual patients, and may be an avenue for explaining race and ethnicity-based CRC incidence. In addition, Ogino et al., reviewing the molecular pathological epidemiology of CRC, noted that this transdisciplinary field provides great insight into the overlap between exogenous and endogenous genetic factors and CRC tumor progression (18). Tied in with personal tumor biomarkers, genome-wide association studies can show the pathogenic progression of CRC and offer a mode of highly personalized CRC prevention and cancer therapy.

Related to the role of lifestyle, diet, and environment in differential CRC presentation along racial and ethnic lines is the use of personalized AI-based tumor biomarkers. In a study conducted by Chang et al., a team of researchers used chromatin packing scaling D in rectal mucosa as an indicator of potentially cancerous or cancerous lesions, applying it to data from 256 patients with various stages of CRC (19). Increased chromatin packing scaling D was found among patients with CRC; furthermore, when rectal D was applied as a screening biomarker for patients with advanced adenomas, it yielded an AUC value of 0.85. This finding provides grounds for the idea that personalized AI-based tumor biomarkers may be applied in race-based CRC diagnosis, an area for future study in this novel area of research.

Khor et al. concluded that higher recurrence of CRC among the BAA population along with observed longitudinal increase of CRC in this group indicated a need for improvements in time-based screening and diagnostic procedures, where current ML models may be inadequate in application to the United States population (5). Waljee et al. created two ML models to diagnose CRC among the sub-Saharan African population, a longitudinal model which contained data regarding patient CRC recurrence over an extended time period as proposed by Khor et al., and a single time point regression model. The longitudinal model outperformed the single time point regression model, achieving an AUC score of 0.735 as opposed to the single timepoint regression model that achieved a score of 0.683 (5,8).
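
Because AUC is the yardstick used to compare the longitudinal and single time point models, it may help to see how it is computed. The sketch below uses the rank-based definition (the probability that a randomly chosen positive case is scored above a randomly chosen negative case, with ties counted as one half). The labels and scores are hypothetical and do not reproduce the data or models of Waljee et al.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = CRC) and risk scores from two illustrative models.
labels = [1, 1, 1, 0, 0, 0, 0]
model_a_scores = [0.90, 0.60, 0.40, 0.50, 0.30, 0.20, 0.10]
model_b_scores = [0.90, 0.35, 0.25, 0.50, 0.30, 0.20, 0.10]

auc_a = auc(labels, model_a_scores)
auc_b = auc(labels, model_b_scores)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a difference such as 0.735 versus 0.683 reflects a modest improvement in how well cases are ranked above non-cases.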

Waljee et al. concluded that longitudinal models may be proposed to the diagnosis of CRC as they contain a greater amount of data and thus are better equipped to create accurate predictions for this disease among diverse populations such as the United States (8). Although the study conducted by Waljee et al. was implemented among the sub-Saharan African population, the same methodology of the usage of a longitudinal model has the potential to improve CRC diagnosis among the BAA population, as opposed to models operating from single samples of data, reflecting time-based changes of CRC incidence among this population (8).

Gichoya et al. trained AI models from the public and private MXR, CXP, EMX, NLST, RSPECT, EM-CT, DHA, EM-Mammo, and EM-CMS datasets for lung and breast cancer to diagnose patient race from the medical imaging data (15). Gichoya et al. indicated that three main experiments were used to explore existing disparities along racial lines for AI in medical imaging: race recognition from medical imaging, evaluation of anatomic and phenotypic characteristics as potential confounding variables to the performance of such models, and investigation of the possible mechanism by which models classify by race (15). Gichoya et al. found a possible barrier to image classification capability to be image quality, which may be influenced by patient race among the non-White population (15). Gichoya et al. noted that image acquisition practices vary across different locales and among different communities, suggesting that lower quality images may be obtained from non-White populations (15).

Waljee et al. similarly detailed a lack of resources among the sub-Saharan African population, which may be responsible for the greater incidence of CRC among this group (8). Waljee et al., like Gichoya et al., cited insufficiencies in trained clinical personnel and equipment in middle- and low-income countries, concluding from this information that racial bias in imaging can be due to a lack of resources in historically marginalized communities (8,15). The lack of resources and etiological factors that play a role in the increased incidence of CRC among the BAA population can be associated with poor performance in CRC ML models where there are limitations in data acquisition and proper model training, which perpetuate a cycle of incorrect or inaccurate diagnoses. Incorrect diagnoses are linked to continued disease progression among BAAs, who experience lower diagnosis rates as well as later diagnosis, at a time when disease has progressed significantly further. All of the above factors (etiological, systemic, and model-based) have the potential to feed into a cycle of increased CRC incidence among the BAA population, and as such it is essential to modify existing CRC ML models so that diagnosis of CRC is attained at earlier stages for BAAs with higher accuracy.

Because the use of race and ethnicity as predictors when applied to ML models diagnosing CRC lowered AUC values and reduced racial bias in model output compared to ML models that did not include these predictors, race and ethnicity predictors play a prominent role in accurate race-based CRC diagnosis. Due to the importance of accurate diagnosis in CRC, it may be beneficial for diagnostic outputs from ML models utilizing race-based detection to be confirmed by clinical professionals.

Khor et al. created four prediction models that were applied to the linked CRC cancer registry and electronic health record data from Kaiser Permanente Southern California data from 2008–2013: Model 1 (race-neutral), excluded race and ethnicity as predictors; Model 2 (race-sensitive), included race and ethnicity as predictors; Model 3 (interaction), evaluated interaction terms for race and ethnicity; Model 4 (race-stratified), was separated into a model for each race with covariates so that each group would have separate coefficients (5). Khor et al. observed the discriminative ability of the created models in terms of the AUC and found overall the race-neutral model had the highest AUC for the BAA population (0.79) opposed to the race-sensitive model which produced a slightly lower AUC of 0.76 for this group of individuals (5). When race and ethnicity were included as predictors in the form of the race-sensitive model created by Khor et al., AUC decreased among the BAA group and also produced the lowest racial bias score of 0.047 out of the race-sensitive, race-neutral, race-stratified, and interaction models (5).
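
The four specifications differ mainly in how race and ethnicity enter the model. The sketch below shows one way the corresponding inputs could be constructed; it is an illustration under stated assumptions rather than the implementation used by Khor et al. The covariates `age` and `stage`, the group labels, and the function names `build_features` and `stratify` are all hypothetical.

```python
from collections import defaultdict


def build_features(record, spec):
    """Return the feature dictionary for one patient under a given specification."""
    base = {"age": record["age"], "stage": record["stage"]}  # hypothetical covariates
    if spec == "race_neutral":
        return base  # race and ethnicity excluded entirely
    indicator = {f"race_{record['race']}": 1.0}
    if spec == "race_sensitive":
        return {**base, **indicator}  # race enters as an additional predictor
    if spec == "interaction":
        # Each covariate also gets a race-specific term, so its effect can differ by group.
        interactions = {f"{name}_x_race_{record['race']}": value
                        for name, value in base.items()}
        return {**base, **indicator, **interactions}
    raise ValueError(f"unknown specification: {spec}")


def stratify(records):
    """Race-stratified approach: split records so each group gets its own model."""
    by_group = defaultdict(list)
    for record in records:
        by_group[record["race"]].append(build_features(record, "race_neutral"))
    return dict(by_group)


patient = {"age": 62.0, "stage": 2.0, "race": "B"}
for spec in ("race_neutral", "race_sensitive", "interaction"):
    print(spec, sorted(build_features(patient, spec)))
```

The race-neutral and race-stratified versions use the same covariates; the difference is that stratification fits separate coefficients per group, at the cost of smaller training sets for each one.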

In their study evaluating four machine learning models based on how they responded to race as a predictor, Khor et al. found the highest net benefit of 72.5% among the race-neutral model for the BAA population where it was closely followed by the race-sensitive model at 70.9%, while the lowest net benefit of 60.4% was found for the interaction model for the non-Hispanic White population (5). Khor et al. stratified for race and ethnicity, and found that the model was able to provide greater benefit for the BAA population at the 5% risk threshold in the validation data (5). Based on the race-stratified model providing the greatest net benefit for the African American population compared to other racial and ethnic groups, Khor et al. concluded that race and ethnic inclusion as predictors can aid in improved CRC recurrence diagnosis among this population (5). Additionally, the decrease of AUC values in the BAA population when race and ethnicity were applied as model predictors highlighted the advantageous nature of the utilization of race and ethnicity as ML model predictors in the diagnosis of CRC recurrence, providing basis to the idea that the inclusion of such predictors increases model performance for BAAs and this may be a potential solution to increasing CRC incidence among this population.
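
For readers unfamiliar with net benefit, the sketch below applies the standard decision-curve definition, in which true positives are credited and false positives are penalized by the odds of the risk threshold: NB = TP/n − (FP/n) × pt/(1 − pt). Whether the percentages reported by Khor et al. use this raw form or a standardized variant is not stated here, so the example should be read as an illustration of the concept only. The outcomes and predicted risks are hypothetical.

```python
def net_benefit(labels, risks, threshold):
    """Net benefit of treating everyone whose predicted risk is at or above threshold."""
    n = len(labels)
    true_pos = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 1)
    false_pos = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 0)
    # False positives are weighted by the odds implied by the threshold.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


# Hypothetical cohort of 10 patients (1 = recurrence) with predicted risks.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
risks = [0.30, 0.10, 0.02, 0.04, 0.06, 0.01, 0.03, 0.20, 0.02, 0.01]

nb = net_benefit(labels, risks, threshold=0.05)
print(f"net benefit at the 5% threshold = {nb:.4f}")
```

At a 5% threshold a false positive costs only about one nineteenth of what a true positive gains, which is why a model can show a favorable net benefit even when its false positive rate is high.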

The rationale for such model creation is supported by the findings of Troisi et al. where it was found that there were increases of proximal CRC rates over the period 1975–1995 among Black females and males in the United States, with Black females experiencing a 22.5% increase over this period and a 30% increase observed in Black males (16). Increases of CRC among the BAA population may be indicative of etiologic and systemic factors, and thus CRC rates may be lessened among this group with proper ML model diagnosis at early stages of disease progression in addition to accurate model output. Accurate model output is essential to appropriate treatment administration, therefore ML models diagnosing for CRC based on race may present an advantage to incidence mitigation, as race-based ML model diagnosis had the ability to improve diagnostic outputs among CRC ML models and reduce scores of racial bias in CRC ML models.

Khor et al. found that at the 5% risk threshold, the model false positive rate (FPR) and false negative rate (FNR) was 68.2% in the race-neutral model. Khor et al. added that the FPR was highest among the Hispanic subgroup at 70.0% and lowest among the White group at 64.1% (5). Khor et al. found that the smallest difference in FPR was found in the race-neutral model, with the largest difference belonging to the race-stratified model (5). Khor et al. noted that when interaction terms were applied in addition to stratification by race and ethnicity, overall FNR worsened but became more equal among racial subgroups (5). When race was excluded as a predictor, FPR was highest among the Hispanic subgroup, a minority group in the United States, and was lowest among the White subgroup. From such findings, it is plausible to conclude that the omission of race as a predictor when diagnosing CRC through ML models decreased model performance and may function to inhibit the application of proper treatment among minority groups, including the Hispanic and BAA populations in the United States. Thus, it is essential that model output is monitored closely so that limitations in diagnostic capabilities and model design do not lead to the creation of further inequities for the BAA population, in addition to the etiologic and systemic factors that play a role in observed increased CRC incidence.
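
Subgroup comparisons of this kind come down to computing FPR and FNR separately for each group at a fixed risk threshold. The sketch below shows that calculation; the group labels, outcomes, and predicted risks are hypothetical and are not drawn from Khor et al.

```python
from collections import defaultdict


def error_rates_by_group(rows, threshold):
    """Compute FPR and FNR per group from (group, label, risk) rows."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, label, risk in rows:
        flagged = risk >= threshold
        if flagged:
            key = "tp" if label == 1 else "fp"
        else:
            key = "fn" if label == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        rates[group] = {
            # FPR: share of true negatives that were flagged.
            "fpr": c["fp"] / negatives if negatives else None,
            # FNR: share of true positives that were missed.
            "fnr": c["fn"] / positives if positives else None,
        }
    return rates


# Hypothetical rows: (group, outcome, predicted risk).
rows = [
    ("A", 0, 0.10), ("A", 0, 0.02), ("A", 1, 0.20), ("A", 1, 0.03),
    ("B", 0, 0.08), ("B", 0, 0.07), ("B", 0, 0.01), ("B", 1, 0.40),
]
rates = error_rates_by_group(rows, threshold=0.05)
fpr_gap = rates["B"]["fpr"] - rates["A"]["fpr"]
print(rates, fpr_gap)
```

The between-group difference in FPR (or FNR) is the quantity being compared across the four model specifications; a smaller gap means errors are distributed more evenly across subgroups, independent of how large the overall error rate is.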

Gichoya et al. found that the AI models trained on the MXR, CXP, EMX, NLST, RSPECT, EM-CT, DHA, EM-Mammo, and EM-CMS datasets were able to identify race from cropped and noised images, which medical professionals cannot do, obtaining AUC scores that ranged from 0.91 to 0.99, higher values than those obtained by Khor et al., for chest X-ray images for the evaluated lung and non-lung segments (5,15). Gichoya et al. stated this poses a risk for future model deployment in terms of effective diagnosis and image classification. While the ability to diagnose race is essential to proper model functioning, Gichoya et al. argued there is a chance for uncontrolled model deployment to have the opposite of the intended impact when medical professionals lack the ability to differentiate race from the imaging themselves (15). Gichoya et al. expanded that such occurrences put marginalized communities at risk, including the Black community in the United States, where existing disproportionate incidence of CRC may be exacerbated by poor model performance (15). Gichoya et al. concluded that appropriate model supervision in the deployment stage is essential to the development of accurate ML models that improve diagnostic outcomes for marginalized communities in the United States (15).

Goyal et al. discussed the use of artificial intelligence as a measure to diagnose medical scans when used in conjunction with physician oversight (10). Goyal et al. trained an ML model from 8,641 images that had the capability of detecting polyps during endoscopies that screen for CRC (10). Goyal et al. trained the convolutional neural network to have an accuracy of 96.4%, demonstrating its effectiveness in CRC diagnosis (10). Goyal et al. argued that the use of such technology aids physicians in not missing polyps during these screenings, but noted that AI modalities for CRC are novel in nature and should simply serve as an additional form of support for physicians in the diagnostic process, to aid in disease treatment, echoing the conclusions made by Gichoya et al. surrounding the supervision of AI ML models for CRC (10,15).

Yin et al. applied artificial intelligence to adjuvant therapies involving CRC as well as targeted therapies. Yin et al. evaluated the use of multiple convolutional neural networks on the diagnosis of CRC, where it was discovered that the combination of knowledge of these networks resulted in improved detection rates of CRC at early stages (11). Yin et al. emphasized the importance of such a finding since it helped to diagnose and treat patients at earlier stages of this disease (11). Yin et al. added that early detection not only helps to treat patients at earlier stages, resulting in better outcomes, but also aids the use of concurrent adjuvant therapies that are given in addition to standard treatment measures (11). Yin et al. concluded that artificial intelligence should continue to be improved over time, helping to better the current diagnosis and treatment measures for CRC (11). Yin et al. argued, as did Goyal et al. and Gichoya et al., that the use of AI should not overpower the opinion and evaluation of physicians, meaning both should be considered when creating patient treatment plans (10,11,15).

Ho et al. created and evaluated an AI algorithm based on two models in order to determine the best approach for diagnosing multi-stage CRC. Erroneous outputs in risk classification, in the form of false positive diagnoses, indicated the model’s limited use in deployment for CRC (13). Having obtained these false positive outputs, Ho et al. proposed a workflow for the use of AI in pathology in which a pathologist or clinical expert evaluates all ML model findings before the submission of a final report, putting into practice the conclusions drawn by Yin et al., Goyal et al., and Gichoya et al. (10,11,13,15). Ho et al. cited other possible factors for the false positive outputs of the models, including the data set’s relatively small sample size along with the institution’s need for assistive workflow, yet maintained that ML models require appropriate oversight to ensure accurate diagnosis (13). Although CRC ML models can diagnose CRC from medical imaging, they may still produce false outputs, which require further supervision from clinical professionals. ML models applied to diagnosing CRC are novel in approach, and evaluation of model output by clinical professionals before it is utilized for treatment assignment may be beneficial for the mitigation of improper treatment courses and for early disease diagnosis.

Because the use of glandular segmentation and manipulation of prediction thresholds on CRC histopathological images or WSI in region and subsite-specific identification allows for more accurate diagnosis of CRC imaging among the BAA population, such modifications may improve diagnostic ML model outputs across racial lines.

Gichoya et al. created the following models: MXR Resnet34, CXP Resnet34, EMX Resnet34, MXR Resnet34 to CXP, MXR Resnet34 to EMX, CXP Resnet34 to MXR, CXP Resnet34 to EMX, EMX Resnet34 to MXR, and EMX Resnet34 to CXP, which were highly accurate in identifying patient race from chest X-ray, mammogram, and spine imaging. From the MXR dataset, Gichoya et al. found that when bone density was removed as a predictor for patient race, the resulting AUC value for Black patients was 0.960, with average pixel thresholds indicating no signals that could be used to detect race (15). Gichoya et al. concluded from this that race could not be identified in the images from pixel brightness and that racial identity recognition did not differ across age and gender lines (15).

Alternatively, Ho et al. suggested the use of gland segmentation and ML classifiers as a method for anatomical identification of CRC risk sites from histopathological CRC imaging, posing a possible alternative to race-based identification (13). Ho et al. stated that gland segmentation of medical imaging allows for the analysis of smaller image patches and provides a greater amount of qualitative data compared to non-segmented images, which is useful for data analysis by either ML models or pathologists (13). Ho et al. proposed that race-based recognition of medical imaging may be improved by this modality, which may in turn help improve ML models’ diagnostic capabilities (13).

Ho et al. detailed the unique nature of this particular study, in that segmentation was utilized as output, a novel approach in ML model creation (13). Ho et al. emphasized the advantage of segmentation over heatmaps, as segmentation allows images to be broken into smaller parts for analysis, with more detailed annotations and, overall, more information for pathologists (13).
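The idea of analyzing an image in smaller parts can be illustrated with a minimal sketch. The code below is not the segmentation method of Ho et al.; it only shows simple non-overlapping tiling of a two-dimensional grid of pixel values, which is assumed here as a stand-in for dividing a WSI into patches. The function name `tile_image`, the patch size, and the toy image are hypothetical.

```python
def tile_image(image, patch_size):
    """Yield (top, left, patch) for non-overlapping square patches.

    `image` is a list of equal-length rows of pixel values. Patches on the
    right and bottom edges may be smaller if the image size is not a
    multiple of `patch_size`.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            yield top, left, patch


# Toy 4 x 6 "image" whose pixel values are simply their row-major index.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
patches = list(tile_image(image, patch_size=2))

for top, left, patch in patches:
    print(f"patch at row {top}, col {left}: {patch}")
```

Each patch keeps its position in the original image, so any annotation produced for a patch can be mapped back to the region it came from.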

Waljee et al. utilized an approach similar to gland segmentation, called the bag of words (BoW) approach, in which the ML model was created to determine specific tissue regions that were risk sites for the development of CRC (8). When provided with a greater amount of information through gland segmentation and similar tissue analysis processes, Waljee et al. found that pathologists are able to make more informed decisions in the diagnostic and treatment process for CRC (8). Thus, Waljee et al. concluded that segmentation of images allowed for the extraction and evaluation of a greater amount of image-based data, which may lead to more informed decision making for patients with CRC (8). Glandular segmentation provided advantages not only for medical imaging analysis in CRC ML models, but also for data output, given the greater amount of data that could be extracted from the CRC imaging that underwent glandular segmentation. Thus, the use of this approach to decrease the incidence of CRC may be beneficial for imaging involving BAA, as it may work to produce diagnostic outputs at higher accuracy when race is included as a predictor.

Adjusting imaging methods poses both advantages and drawbacks in several applications. Khor et al. found that the smallest difference in FPR occurred in the race-neutral model, which did not include race as a predictor of model output, compared with the other three models, which did; the largest difference belonged to the race-stratified model (5). Khor et al. found that when interaction terms were applied in addition to stratifying by race and ethnicity, overall FNR worsened but became more equal among racial subgroups (5). When race and ethnicity were used as predictors in the model, the result was a large difference in FPR and a worsened overall FNR, providing a basis for the argument made by Khor et al. that the consideration of patient race in ML models has advantages in terms of racial bias reduction, but also has drawbacks in the form of inaccurate diagnostic outputs (5).
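The comparison of FPR and FNR across subgroups can be made concrete with a short sketch. It is not the analysis performed by Khor et al.; it only computes the two error rates per subgroup and the largest gap between subgroups, under the assumption of binary labels (1 = positive). The function names `rates_by_group` and `max_gap`, the group labels "A" and "B", and the data are hypothetical and made up for illustration.

```python
from collections import defaultdict


def rates_by_group(y_true, y_pred, groups):
    """Return {group: (FPR, FNR)} for binary labels where 1 = positive."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        counts[group][key] += 1

    result = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        # FPR = FP / (FP + TN); FNR = FN / (FN + TP)
        fpr = c["fp"] / negatives if negatives else float("nan")
        fnr = c["fn"] / positives if positives else float("nan")
        result[group] = (fpr, fnr)
    return result


def max_gap(rates, index):
    """Largest between-group difference; index 0 = FPR, index 1 = FNR."""
    values = [r[index] for r in rates.values()]
    return max(values) - min(values)


# Made-up toy data: two subgroups with four cases each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = rates_by_group(y_true, y_pred, groups)
print(rates)
print("FPR gap:", max_gap(rates, 0), "FNR gap:", max_gap(rates, 1))
```

A smaller gap indicates more equal error rates among subgroups, although, as noted above, equal rates can coincide with worse overall rates.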

Ho et al. created an ML model to diagnose multi-stage CRC from histopathological imaging and discovered a similar trade-off in the use of gland segmentation on CRC WSI (13). Although image segmentation increased the true positive rate (TPR) for CRC WSI, it also increased the FPR of the model created by Ho et al. (13). Based on these findings, Ho et al. concluded that the use of gland segmentation must be monitored closely so that FPR and FNR do not increase in CRC ML models (13).

Another novel approach with both advantages and drawbacks was discussed by Ho et al., who found that manipulating prediction thresholds in ML models diagnosing multi-stage CRC led to an increase in specificity and sensitivity percentages (13). Ho et al. described that when this occurred and the percentages exceeded the 80% level, the WSIs could be classified as low or high risk (13). Ho et al. emphasized the adjustability of prediction thresholds, in that they can be changed to meet institutional operational requirements (13). When changed to meet these requirements, Ho et al. explained, WSIs are more likely to be classified correctly in histopathological CRC detection, although there is still room for error, as shown by the false positive outputs obtained by the model (13). Ho et al. concluded that when deployed with the appropriate prediction threshold, a model can attain a higher TPR while FPR also increases, and that prediction thresholds can be adjusted to institutional requirements to achieve the most accurate TPR (13). Manipulating prediction thresholds may increase CRC ML model diagnostic capabilities in terms of higher specificity and sensitivity rates; however, due to its novel nature, further research may need to be completed on a wide array of ML AI models in order to determine the appropriate application of prediction threshold manipulation to diagnosing CRC.
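The effect of moving a prediction threshold can be sketched as follows. This is an illustration only and not the procedure used by Ho et al.: it assumes each WSI receives a risk score between 0 and 1, classifies a slide as high risk when its score is at or above the threshold, and searches for thresholds at which both sensitivity and specificity reach an 80% target. The function names, scores, labels, and candidate thresholds are hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when score >= threshold means high risk."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_high = score >= threshold
        if label == 1:
            if predicted_high:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_high:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def thresholds_meeting_target(scores, labels, candidates, target=0.8):
    """Return the candidate thresholds where both metrics reach the target."""
    selected = []
    for threshold in candidates:
        sens, spec = sensitivity_specificity(scores, labels, threshold)
        if sens >= target and spec >= target:
            selected.append(threshold)
    return selected


# Made-up risk scores for ten slides (label 1 = CRC present).
scores = [0.05, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90, 0.95, 0.30]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]

for threshold in (0.3, 0.5, 0.7):
    print(threshold, sensitivity_specificity(scores, labels, threshold))
print(thresholds_meeting_target(scores, labels, [0.3, 0.5, 0.7]))
```

In this toy data, lowering the threshold raises sensitivity at the cost of specificity and raising it does the reverse, which mirrors the TPR and FPR trade-off described above.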

Nartowt et al. utilized seven different ML models: logistic regression, random forest, linear discriminant analysis, artificial neural network, linear kernel support vector machine, decision tree, and naïve Bayes (12). Nartowt et al. trained the models with data from the National Health Interview Survey as well as the Prostate, Lung, Colorectal, and Ovarian Cancer Screening datasets (12). Nartowt et al. implemented imputation methods for datasets that had missing data (12). Nartowt et al. discovered that following the imputation process, the artificial neural network possessed the best overall concordance, sensitivity, and specificity, similar to the sensitivity and specificity findings made by Ho et al. (12,13). Nartowt et al. explained that the remaining misclassifications consisted of 2% of negative cases that were wrongly classified as high risk and 6% of positive cases that were wrongly classified as low risk (12). Nartowt et al. concluded that the artificial neural network was optimal for CRC screening and prevention purposes, and in addition presented a low-cost and effective method for potentially decreasing mortality related to CRC as well as increasing early-stage detection, in support of the conclusion made by Waljee et al. regarding the low-cost nature of AI implementation in the diagnosis of CRC (8,12).
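Imputation of missing data can be illustrated with a minimal sketch. The specific imputation methods used by Nartowt et al. are not described here, so the code below assumes the simplest option, column-mean imputation, purely as an example. The function name `impute_column_means`, the column meanings, and the values are hypothetical.

```python
from statistics import mean


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumption: every row has the same number of numeric columns. A column
    with no observed values is left as None.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(mean(observed) if observed else None)
    return [
        [col_means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


# Made-up records: [age in years, body mass index]; None marks a missing value.
records = [
    [50, 27.0],
    [None, 31.0],
    [64, None],
    [58, 29.0],
]

filled = impute_column_means(records)
for row in filled:
    print(row)
```

Mean imputation keeps every record available for training but narrows the spread of the imputed variable, so more careful methods may be preferable when a large share of values is missing.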

Sample bias can lead a model to make accurate predictions for a general population but not for minority groups, as argued by Vokinger et al. (7). Vokinger et al. proposed the use of continual learning, a process through which ML models may be improved by the incorporation of, and concurrent training on, new data, in order to mitigate and reduce systematic error (7). Vokinger et al. concluded that it is imperative that action is taken to correct existing biases in ML models and to ensure that new models are not created from biased frameworks, so as not to perpetuate a system of inequitable healthcare among individuals belonging to underrepresented communities (7). By conducting a study in sub-Saharan Africa, Waljee et al. explained the possible link between CRC diagnosis and patient race, where ML models created from American frameworks were not accurate in identifying CRC in the Black sub-Saharan African population (8). Thus, Waljee et al. reached a conclusion similar to that of Vokinger et al.: ML models with bias require modification so that erroneous diagnostic output does not perpetuate racial bias in AI (7,8).

Discussion

Key findings

The rising incidence of CRC among the BAA population, tied to etiologic and systemic factors including NIDDM, obesity, smoking, and a lack of resources that perpetuate CRC incidence, indicated a need for improvements in the diagnosis of CRC among this population. Early treatment is associated with improved outcomes among patients with CRC, and thus the environmental factors that limit image acquisition and data collection among the BAA population provide a basis for proposing improvements to diagnostic measures, namely AI-based ML models that diagnose CRC based on race among adults in the United States. Since limited data acquisition led to poor model performance among the BAA population, changes to model creation yielded improvements to CRC diagnosis, observed through lowered AUC values and reduced racial bias scores when race and ethnicity were applied as predictors to CRC ML models compared to CRC ML models that did not include race.

Strengths and limitations

The scope of the study is limited by the fact that research at the intersection of artificial intelligence and colorectal cancer is novel in nature. Thus, there is limited data available for study and analysis. As a result, the conclusions made may not reflect the true relationship between the proposed variables.

Implications and actions needed

Due to the importance of accurate diagnosis in CRC, it may be beneficial for diagnostic outputs from ML models utilizing race-based detection to be confirmed by clinical professionals. Further improvements to race-based CRC diagnosis may lie in the use of glandular segmentation and manipulation of prediction thresholds on CRC histopathological images or WSI in region and subsite-specific identification, as these allowed for more accurate diagnosis of CRC imaging among the BAA population.

Conclusions

Due to the novel nature of the application of ML models in the diagnosis of CRC, further research to improve disease diagnosis may be explored through models created for the purpose of gender-based diagnosis, since there are disparities in the United States across these lines for CRC. Additionally, an area of further study for race-based CRC diagnosis lies in technical modifications that can be made to ML models to improve model output when either race or race interaction terms are applied. Such modifications to CRC ML models aid in the improvement of accurate diagnosis for BAA individuals, a marginalized yet prominent population in the United States. It is essential that action is taken to implement and research such ML model modifications, so that novel systems of diagnosis do not contribute to existing inequitable health outcomes among the BAA population in the United States.

Acknowledgments

None.

Footnote

Reporting Checklist: The author has completed the PRISMA reporting checklist. Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-169/rc

Peer Review File: Available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-169/prf

Funding: None.

Conflicts of Interest: The author has completed the ICMJE uniform disclosure form (available at https://jmai.amegroups.com/article/view/10.21037/jmai-24-169/coif). The author has no conflicts of interest to declare.

Ethical Statement: The author is accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. No experiments were performed in this study, and thus there is no need for external ethics board approval and informed consent.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Siegel RL, Wagle NS, Cercek A, et al. Colorectal cancer statistics, 2023. CA Cancer J Clin 2023;73:233-54.
2. Tobacco use in racial and ethnic populations. American Lung Assoc. n.d. Available online: https://www.lung.org/quit-smoking/smoking-facts/impact-of-tobacco-use/tobacco-use-racial-and-ethnic
3. Lofton H, Ard JD, Hunt RR, et al. Obesity among African American people in the United States: A review. Obesity (Silver Spring) 2023;31:306-15.
4. Rodríguez JE, Campbell KM. Racial and Ethnic Disparities in Prevalence and Care of Patients With Type 2 Diabetes. Clin Diabetes 2017;35:66-70.
5. Khor S, Haupt EC, Hahn EE, et al. Racial and Ethnic Bias in Risk Prediction Models for Colorectal Cancer Recurrence When Race and Ethnicity Are Omitted as Predictors. JAMA Netw Open 2023;6:e2318495.
6. Augustus GJ, Ellis NA. Colorectal Cancer Disparity in African Americans: Risk Factors and Carcinogenic Mechanisms. Am J Pathol 2018;188:291-303.
7. Vokinger KN, Feuerriegel S, Kesselheim AS. Mitigating bias in machine learning for medicine. Commun Med (Lond) 2021;1:25.
8. Waljee AK, Weinheimer-Haus EM, Abubakar A, et al. Artificial intelligence and machine learning for early detection and diagnosis of colorectal cancer in sub-Saharan Africa. Gut 2022;71:1259-65.
9. Colon cancer stages. The Ohio State University Comprehensive Cancer Center. n.d. Available online: https://cancer.osu.edu/for-patients-and-caregivers/learn-about-cancers-and-treatments/cancers-conditions-and-treatment/cancer-types/gastrointestinal-cancers/colon-cancer/stages
10. Goyal H, Mann R, Gandhi Z, et al. Scope of Artificial Intelligence in Screening and Diagnosis of Colorectal Cancer. J Clin Med 2020;9:3313.
11. Yin Z, Yao C, Zhang L, et al. Application of artificial intelligence in diagnosis and treatment of colorectal cancer: A novel Prospect. Front Med (Lausanne) 2023;10:1128084.
12. Nartowt BJ, Hart GR, Muhammad W, et al. Robust Machine Learning for Colorectal Cancer Risk Prediction and Stratification. Front Big Data 2020;3:6.
13. Ho C, Zhao Z, Chen XF, et al. A promising deep learning-assistive algorithm for histopathological screening of colorectal cancer. Sci Rep 2022;12:2222.
14. Murphy G, Devesa SS, Cross AJ, et al. Sex disparities in colorectal cancer incidence by anatomic subsite, race and age. Int J Cancer 2011;128:1668-75.
15. Gichoya JW, Banerjee I, Bhimireddy AR, et al. AI recognition of patient race in medical imaging: a modelling study. Lancet Digit Health 2022;4:e406-e414.
16. Troisi RJ, Freedman AN, Devesa SS. Incidence of colorectal carcinoma in the U.S.: an update of trends by gender, race, age, subsite, and stage, 1975-1994. Cancer 1999;85:1670-6.
17. Bahrami H, Tafrihi M. Global trends of cancer: The role of diet, lifestyle, and environmental factors. Cancer Innov 2023;2:290-301.
18. Ogino S, Chan AT, Fuchs CS, et al. Molecular pathological epidemiology of colorectal neoplasia: an emerging transdisciplinary and interdisciplinary field. Gut 2011;60:397-411.
19. Chang A, Prabhala S, Daneshkhah A, et al. Early screening of colorectal cancer using feature engineering with artificial intelligence-enhanced analysis of nanoscale chromatin modifications. Sci Rep 2024;14:7808.

doi: 10.21037/jmai-24-169
Cite this article as: Nair A. Artificial intelligence in the diagnosis and prevention of multi-stage colorectal cancer: potential algorithmic bias and systematic mitigation. J Med Artif Intell 2025;8:63.
