2. Division of Biostatistics, Department of Population Health, School of Medicine, New York University, New York 10016, USA
Melanoma accounts for 75% of deaths from skin malignancies[1]. Its incidence has increased steadily over the past two decades and is expected to keep rising[2]. Melanoma is a worldwide burden, not only in the United States, Europe, and Australia but also in Asian countries[3].
Although the incidence of melanoma is substantially higher in whites than in other racial groups[4], most studies have found worse survival outcomes in non-whites[5-7]. However, whether race/ethnicity is an important prognostic factor for melanoma remains controversial. For example, in a retrospective study of patients diagnosed with acral lentiginous melanoma (ALM) in Northern California between 1987 and 2013, non-white patients had a lower hazard ratio (HR) than White patients (HR = 0.88)[8]. In contrast, among patients diagnosed between 2004 and 2016 in the National Cancer Database, Black patients had a slightly higher risk of death than White patients (HR = 1.1)[9]. Studies explaining the worse survival of non-white groups are still lacking.
In this study, we investigated in depth whether race/ethnicity is a key prognostic factor for melanoma using Cox proportional hazards (CoxPH) analysis and machine learning (ML) methods. Having found that race/ethnicity was not an independent prognostic factor, we examined the interactions and correlations between race and other factors to explain the survival disadvantage of non-white patients.
Materials and Methods
Data source and prefiltering
We retrieved data on patients with cutaneous melanoma diagnosed between 2004 and 2014 from the SEER database (SEER-18 Regs Custom registry, Nov 2018) using SEER*Stat 8.3.6 (https://seer.cancer.gov/data-software/). Data collection began in 2004 because the Collaborative Staging System, which stages tumors more precisely, was introduced that year (https://seer.cancer.gov/tools/collabstaging/). Patients with any of the following records were excluded at the outset: cancer diagnosed at autopsy; race classified as "unknown race"; survival months flag of "0 day of survival" or "unknown"; type of follow-up "not active"; or diagnostic confirmation "not microscopically confirmed". We also excluded 49 926 patients who had more than one cancer or who were diagnosed with melanoma after another cancer. In total, 149 115 patients with malignant melanoma of the skin were included in this study.
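The exclusion criteria above amount to a single record-level filter. The following is a minimal Python sketch of that logic, not the authors' R code: the `Record` class, its field names, and the string values such as `"complete"` are hypothetical simplifications, since real SEER*Stat exports use their own variable names and codes.

```python
from dataclasses import dataclass

# Hypothetical, simplified patient record. Field names and values are
# illustrative only; real SEER*Stat exports use their own variable names and codes.
@dataclass
class Record:
    patient_id: int
    diagnosed_at_autopsy: bool
    race: str                  # e.g. "NHW", "NHB", "unknown race"
    survival_months_flag: str  # e.g. "complete", "0 day of survival", "unknown"
    follow_up_active: bool
    microscopically_confirmed: bool
    n_cancers: int             # number of cancers recorded for the patient
    melanoma_first: bool       # melanoma was the first cancer diagnosed


def passes_prefilter(r: Record) -> bool:
    """True if the record survives every exclusion criterion listed in the text."""
    if r.diagnosed_at_autopsy:
        return False
    if r.race == "unknown race":
        return False
    if r.survival_months_flag in ("0 day of survival", "unknown"):
        return False
    if not r.follow_up_active:
        return False
    if not r.microscopically_confirmed:
        return False
    # More than one cancer, or melanoma diagnosed after another cancer.
    if r.n_cancers > 1 or not r.melanoma_first:
        return False
    return True


records = [
    Record(1, False, "NHW", "complete", True, True, 1, True),
    Record(2, False, "unknown race", "complete", True, True, 1, True),
    Record(3, False, "NHB", "complete", True, True, 2, False),
    Record(4, False, "Hispanic", "0 day of survival", True, True, 1, True),
]
kept = [r.patient_id for r in records if passes_prefilter(r)]
print(kept)  # [1]
```

Writing each criterion as its own early return keeps the filter auditable: each exclusion can be counted separately to reproduce a flow diagram such as the 49 926 patients removed for multiple cancers.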
Variable selection
Race/ethnicity was classified as non-Hispanic White (NHW), non-Hispanic Black (NHB), non-Hispanic Asian and Pacific Islander (NHAPI), non-Hispanic American Indian/Alaska Native (NHAI/AN), and Hispanic. Covariates included demographic and social characteristics (age, sex, year of diagnosis, poverty, education, and marital status), clinical factors (stage, tumor thickness, ulceration, histologic subtype, tumor site, and metastasis status), and treatment data (primary surgery, other surgery, chemotherapy, and radiation); survival data comprised survival time in months and survival status. The outcome of the survival analysis was the time in months from diagnosis to death from any cause (overall survival) or to death from melanoma (melanoma-specific survival). Stage was defined according to the Derived AJCC Stage Group, 6th edition (2004-2015)[10]. Detailed descriptions of all SEER variables are available at https://seer.cancer.gov/manuals/2023/SPCSM_2023_MainDoc.pdf.
Race importance evaluation and interaction with prognostic factors
Survival analysis and two ML models were used to determine whether race was an independent prognostic factor for melanoma, and a logistic regression model was used to investigate interactions between race and stage/ulceration. Kaplan-Meier (KM) curves were used to describe overall and melanoma-specific survival[11], and the log-rank test was applied to compare racial groups. Because follow-up consists of annual visits and lasts for at least 5 years, 5-year survival rates were estimated with the KM method, which accounts for right-censored data. Log[-log(survival)] curves were used to check whether covariates satisfied the proportional hazards assumption. Because the NHAI/AN group (n = 313) violated this assumption, it was excluded from the Cox analysis. Univariable and multivariable CoxPH regression models were fitted with NHWs as the reference race; the multivariable model was adjusted for pathological, demographic, surgical, and socioeconomic factors.
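The KM estimate of 5-year survival multiplies, at each distinct death time t, the conditional survival 1 - d/n (d deaths among n patients still at risk); censored patients simply leave the risk set. The Python sketch below implements this from scratch with Greenwood's variance. It is an illustration on toy data, not the authors' R code, and the symmetric normal-approximation CI is only one choice; log or log-log transformed intervals are also common.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate for right-censored data.

    times  : follow-up time per patient (e.g. months)
    events : 1 if death was observed at that time, 0 if censored
    Returns [(t, S(t), Greenwood variance of S(t))] at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    greenwood_sum = 0.0  # running sum of d / (n * (n - d))
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Everyone with this exact time is still at risk at t (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            if n_at_risk > deaths:
                greenwood_sum += deaths / (n_at_risk * (n_at_risk - deaths))
            curve.append((t, surv, surv ** 2 * greenwood_sum))
        n_at_risk -= removed  # deaths and censored patients leave the risk set after t
    return curve


def survival_at(curve, t):
    """S(t) and its variance, reading the step function at time t."""
    s, var = 1.0, 0.0
    for time, s_t, var_t in curve:
        if time > t:
            break
        s, var = s_t, var_t
    return s, var


# Toy data (months, death indicator); not SEER data.
times = [6, 12, 20, 30, 45, 60, 70, 80]
events = [1, 0, 1, 0, 1, 0, 0, 1]
curve = kaplan_meier(times, events)
s60, var60 = survival_at(curve, 60)  # 5-year survival
half_width = 1.96 * math.sqrt(var60)
print(f"S(60) = {s60:.4f}, 95% CI {max(0.0, s60 - half_width):.3f}-{min(1.0, s60 + half_width):.3f}")
```

Here S(60) = (7/8)(5/6)(3/4) = 0.5469; the patients censored at 12 and 30 months shrink the risk set without lowering the curve, which is how the KM method uses right-censored follow-up.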
Two ML algorithms, random forest (RF) and gradient boosting machine (GBM), were implemented to predict melanoma-specific mortality using the same variables as the CoxPH model. Because the data were highly imbalanced with respect to survival outcome, we down-sampled surviving patients by randomly selecting one-third of them and up-sampled deceased patients to twice their number. To obtain consensus feature importance, we repeated sampling and training 50 times, each time splitting the data 7:3 into training and testing sets. Hyperparameters were tuned by exhaustive grid search with 5-fold cross-validation. In each repetition, the model was refitted on the training set with the optimal parameters and then used to predict survival status in the testing set. The mean area under the receiver operating characteristic curve (AUC) and mean accuracy were calculated for both models.
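The resampling scheme can be sketched as follows in Python. This is not the authors' R code, and two points are assumptions of the sketch: the text does not state whether rebalancing happens before or after the 7:3 split, so here it is applied to the training part only (keeping duplicated deaths out of the test set), and the grid search with 5-fold cross-validation and the model fitting itself are omitted.

```python
import random


def rebalance(alive, dead, rng, alive_frac=1 / 3, dead_factor=2):
    """Down-sample alive patients (without replacement) and up-sample dead
    patients (with replacement), using the 1/3 and 2x ratios from the text."""
    alive_kept = rng.sample(alive, round(len(alive) * alive_frac))
    dead_boosted = rng.choices(dead, k=dead_factor * len(dead))
    return alive_kept + dead_boosted


def repeated_splits(ids, labels, n_repeats=50, train_frac=0.7, seed=1):
    """Yield (train_ids, test_ids) for each repetition of a random 7:3 split.

    Assumption of this sketch: rebalancing is applied to the training part only,
    so the test set keeps the natural outcome mix and shares no patients with
    the resampled training set.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        shuffled = list(ids)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_frac)
        train, test = shuffled[:cut], shuffled[cut:]
        alive = [i for i in train if labels[i] == 0]
        dead = [i for i in train if labels[i] == 1]
        yield rebalance(alive, dead, rng), test


# Toy cohort: 1000 patients, every 10th one died of melanoma (label 1).
ids = list(range(1000))
labels = {i: int(i % 10 == 0) for i in ids}
splits = list(repeated_splits(ids, labels))
train, test = splits[0]
print(len(splits), len(test))  # 50 300
```

Each of the 50 (train, test) pairs would then feed one tuning-and-refit cycle, and feature importances would be averaged across the pairs to give the consensus ranking.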
A multivariable logistic regression model was fitted to evaluate the interaction between race/ethnicity and clinical factors (ulceration and stage) on the odds of melanoma-specific death. Within both early and later stages, the odds ratios (ORs) of melanoma-specific death and their 95% CIs were estimated for each racial/ethnic group relative to NHWs, after adjusting for demographic and clinical factors.
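One common way to read stratum-specific ORs off a model with a race × stage product term is shown below; the text does not say whether the authors used a product term or fitted separate stratified models, so this is an assumption. With an early/late stage indicator, the race OR in the early stratum is exp(b_race), and in the late stratum it is exp(b_race + b_int), whose variance needs the covariance of the two coefficients. The coefficients in the example are made up for illustration.

```python
import math


def stratum_or(b_race, b_int, var_race, var_int, cov_race_int, late_stage, z=1.96):
    """OR (with 95% CI) for one non-white group vs NHW within a stage stratum.

    Sketch model: logit(p) = b0 + b_race*race + b_late*late + b_int*race*late + ...
      early stage (late = 0): log OR = b_race
      late stage  (late = 1): log OR = b_race + b_int
    """
    if late_stage:
        log_or = b_race + b_int
        var = var_race + var_int + 2 * cov_race_int  # Var(b_race + b_int)
    else:
        log_or = b_race
        var = var_race
    se = math.sqrt(var)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Made-up coefficients and (co)variances, for illustration only.
early = stratum_or(0.40, -0.35, 0.010, 0.020, -0.008, late_stage=False)
late = stratum_or(0.40, -0.35, 0.010, 0.020, -0.008, late_stage=True)
print([round(v, 2) for v in early])
print([round(v, 2) for v in late])
```

In this toy case the early-stage OR (about 1.49) excludes 1, while a negative interaction pulls the late-stage OR toward 1 (about 1.05, CI spanning 1). This is the pattern the Results describe for several non-white groups.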
Statistical tests
Collinearity between categorical variables was assessed with Cramér's V, and Pearson's correlation was used for continuous variables. The Kruskal-Wallis test was used to examine racial differences in tumor thickness and poverty proportion, and the chi-squared test was used for ulceration and stage. For pairwise comparisons between NHWs and each non-white group, the Wilcoxon rank-sum test was applied to tumor thickness and poverty proportion, and the test of proportions to ulceration and stage.
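Cramér's V rescales the Pearson chi-squared statistic of a contingency table to the range 0 to 1: V = sqrt(chi2 / (n (k - 1))), where k is the smaller of the number of rows and columns. A from-scratch Python sketch (no small-sample bias correction; the toy race and ulceration values are invented) is:

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Cramér's V for two categorical sequences (no small-sample bias correction)."""
    n = len(x)
    row_totals = Counter(x)
    col_totals = Counter(y)
    cells = Counter(zip(x, y))
    chi2 = 0.0
    for r, n_r in row_totals.items():
        for c, n_c in col_totals.items():
            expected = n_r * n_c / n
            chi2 += (cells[(r, c)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        raise ValueError("both variables need at least two categories")
    return math.sqrt(chi2 / (n * (k - 1)))


race = ["NHW", "NHW", "NHB", "NHB", "Hispanic", "Hispanic"]
ulcer = ["no", "no", "yes", "yes", "no", "yes"]
print(round(cramers_v(race, ulcer), 3))  # 0.816
```

Values below 0.1, the cut-off reported for race in the Results, indicate a weak association between two categorical variables.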
Data availability statement
The R code and raw data used for this analysis are available at https://github.com/MinxiBen/Ethinic-Disparity-in-Melanoma.
Results
General characteristics
Among the 149 115 patients with malignant melanoma of the skin included in this study, the median follow-up was 66 months. Detailed demographic characteristics of the study population are given in Supplementary Tab 1. The sample consisted mostly of NHWs (95.1%), followed by Hispanics (3.5%), NHAPIs (0.7%), NHBs (0.5%), and NHAIs/ANs (0.2%). Histology varied substantially by race (P < 0.001). Superficial spreading melanoma was much more common in NHWs (30.7%). Acral lentiginous melanoma was rare in NHWs (0.7%) but more frequent in NHAPIs (12.3%) and NHBs (17.4%). Race showed low correlations with other variables (Cramér's V < 0.1; Supplementary Fig 1). Although stage was correlated with several factors, it was retained for more comprehensive adjustment. Racial differences in ulceration, stage, tumor thickness, and poverty proportion at diagnosis are shown in Fig 1. NHWs had thinner tumors (mean [SD]: 1.19 [1.61]) than other races (Kruskal-Wallis test, P < 0.001). The neighborhood poverty proportion was lower in NHWs than in all other racial groups except NHAPIs (Wilcoxon test, P < 0.001). In addition, NHWs had a lower proportion of ulceration (12.9%) and of stage Ⅲ/Ⅳ disease (11.1%) than other races (chi-squared test, P < 0.001).
Tab 1 Multivariable CoxPH regression model (adjusted HRs)

| Factor | HR | 95% CI | P |
|---|---|---|---|
| Race | | | |
| NHW | Ref | | |
| NHB | 1.21 | 1.02-1.43 | 0.03 |
| NHAPI | 1.19 | 1.00-1.41 | 0.05 |
| Hispanic | 1.15 | 1.05-1.26 | < 0.001 |
| Sex | | | |
| Female | Ref | | |
| Male | 1.33 | 1.27-1.39 | < 0.001 |
| Age | 1.03 | 1.02-1.03 | < 0.001 |
| Marital status | | | |
| Partner | Ref | | |
| No partner | 1.26 | 1.21-1.32 | < 0.001 |
| Unknown | 0.83 | 0.77-0.89 | < 0.001 |
| Site | | | |
| Upper limb and shoulder | Ref | | |
| Lower limb and hip | 1.08 | 1.01-1.15 | 0.02 |
| Trunk | 1.28 | 1.21-1.35 | < 0.001 |
| Head and neck | 1.44 | 1.36-1.53 | < 0.001 |
| Overlapping lesions | 1.72 | 0.89-3.31 | 0.11 |
| Unknown | 1.58 | 1.39-1.81 | < 0.001 |
| Histology | | | |
| Superficial spreading | Ref | | |
| Lentigo maligna | 0.84 | 0.74-0.96 | 0.01 |
| Desmoplastic | 0.84 | 0.72-0.97 | 0.02 |
| Melanoma NOS | 1.17 | 1.11-1.24 | < 0.001 |
| Amelanotic | 1.04 | 0.83-1.30 | 0.73 |
| Nodular | 1.30 | 1.22-1.39 | < 0.001 |
| Acral lentiginous | 1.56 | 1.38-1.77 | < 0.001 |
| Poverty | 1.01 | 1.01-1.02 | < 0.001 |
| Education | 1.01 | 1.00-1.01 | 0.01 |
| Diagnosis year | 0.96 | 0.95-0.97 | < 0.001 |
| Stage | | | |
| Ⅰ | Ref | | |
| Ⅱ | 3.45 | 3.23-3.69 | < 0.001 |
| Ⅲ | 8.74 | 8.19-9.32 | < 0.001 |
| Ⅳ | 27.33 | 25.04-29.84 | < 0.001 |
| Ulceration | | | |
| No | Ref | | |
| Yes | 1.71 | 1.63-1.79 | < 0.001 |
| Thickness | 1.11 | 1.1-1.2 | < 0.001 |
| Other surgery | | | |
| Not performed | Ref | | |
| Performed | 0.71 | 0.65-0.78 | < 0.001 |
| Unknown | 1.32 | 0.73-2.38 | 0.36 |
| Primary surgery | | | |
| No | Ref | | |
| Yes | 0.48 | 0.44-0.53 | < 0.001 |
| Radiation | | | |
| No | Ref | | |
| Yes | 1.72 | 1.61-1.84 | < 0.001 |
| Chemotherapy | | | |
| No | Ref | | |
| Yes | 1.56 | 1.45-1.67 | < 0.001 |
Survival analysis
In the overall and melanoma-specific KM curves stratified by race, NHBs had the worst survival, followed by NHAPIs, Hispanics, and NHWs (Fig 2). The 5-year melanoma-specific survival rates for NHWs, NHBs, NHAPIs, and Hispanics were 90.1% (95% CI: 89.9%-90.3%), 69.3% (95% CI: 65.7%-73.0%), 76.2% (95% CI: 73.3%-79.1%), and 84.6% (95% CI: 83.6%-85.7%), respectively. In the unadjusted CoxPH model, the mortality HRs for NHBs, NHAPIs, and Hispanics compared with NHWs were 3.90 (95% CI: 3.31-4.60), 2.10 (95% CI: 2.03-2.85), and 1.59 (95% CI: 1.45-1.74), respectively. In the adjusted model, however, racial differences in survival were attenuated, and the difference remained clearly significant only for Hispanics compared with NHWs (HR 1.15, 95% CI: 1.05-1.26, P < 0.001) (Tab 1). The HRs for NHAPIs and NHBs were of borderline significance (NHAPI: 1.19, 95% CI: 1.00-1.41, P = 0.05; NHB: 1.21, 95% CI: 1.02-1.43, P = 0.03) (Tab 1). Clinical factors (e.g., stage, ulceration, and tumor thickness; P < 0.001), demographic factors (e.g., age and diagnosis year; P < 0.001), and socioeconomic factors such as education and poverty remained significant after adjustment (Tab 1).
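The link between a reported HR, its 95% CI, and its P value can be checked directly. Assuming the CI was built on the log scale as log(HR) ± 1.96·SE (the usual construction for Cox models), SE = (log(upper) - log(lower)) / 3.92, z = log(HR) / SE, and the two-sided P = erfc(|z| / √2). The Python sketch below applies this to the rounded Tab 1 values for NHBs and NHAPIs, giving approximately 0.027 and 0.047, consistent with the reported 0.03 and 0.05.

```python
import math


def p_from_ci(ratio, lower, upper, z_crit=1.96):
    """Approximate two-sided Wald P value from a ratio estimate and its 95% CI.

    Assumes the CI was built on the log scale as log(ratio) +/- z_crit * SE,
    which is the usual construction for Cox HRs.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(ratio) / se
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))


# Rounded values from Tab 1, so the recovered P values are only approximate.
p_nhb = p_from_ci(1.21, 1.02, 1.43)
p_nhapi = p_from_ci(1.19, 1.00, 1.41)
print(round(p_nhb, 3), round(p_nhapi, 3))
```

Because a lower CI bound of exactly 1.00 corresponds to P ≈ 0.05, the NHAPI estimate sits on the conventional significance threshold.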
Machine learning for the importance of race
Unlike traditional survival analysis, ML techniques are less affected by multicollinearity[12] and can improve the accuracy of outcome prediction in clinical cancer research[13]. Before evaluating the importance of race for predicting survival status, we assessed the accuracy of the ML models. The average ROC curves of both models over 50 rounds of training are shown in Supplementary Fig 2. The GBM model achieved a mean AUC of 0.915 6 (95% CI: 0.915 2-0.916 0) and a mean accuracy of 83.97% (95% CI: 83.91%-84.03%). The RF model outperformed GBM, with a mean AUC of 0.967 1 (95% CI: 0.966 8-0.967 4) and a mean accuracy of 90.75% (95% CI: 90.70%-90.81%). Tumor thickness was the most important factor in both models. The rankings of the top 10 variables were similar in the two models, except that education and poverty ranked below stage and ulceration in the GBM model (Fig 3). Race, by contrast, ranked consistently low in both models (not shown in the figure), suggesting that race/ethnicity may not be a key prognostic factor in melanoma. These results agree with the CoxPH model, in which clinical and social factors remained more significant than race after adjustment.
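The AUC reported above equals the probability that a randomly chosen patient who died is assigned a higher predicted risk than a randomly chosen survivor (the normalized Mann-Whitney U statistic), and accuracy is the share of correctly classified patients. The Python sketch below computes both for one test set. The text does not say how the 95% CIs across the 50 repetitions were obtained, so `mean_with_ci` assumes a normal approximation (mean ± 1.96·SD/√n); the scores are toy values.

```python
import math
import statistics


def auc_mann_whitney(scores, labels):
    """AUC as P(score of a random death > score of a random survivor); ties count 1/2.

    O(n_pos * n_neg) pairwise comparison, adequate for a small illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(pred_labels, labels):
    """Share of patients whose predicted status matches the observed status."""
    return sum(p == y for p, y in zip(pred_labels, labels)) / len(labels)


def mean_with_ci(values, z=1.96):
    """Mean over repetitions with a normal-approximation 95% CI (assumed method)."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - z * se, m + z * se


scores = [0.9, 0.8, 0.4, 0.3]  # predicted probability of melanoma death (toy values)
labels = [1, 0, 1, 0]
print(auc_mann_whitney(scores, labels))  # 0.75
print(accuracy([int(s >= 0.5) for s in scores], labels))  # 0.5
print(mean_with_ci([0.96, 0.97, 0.965]))
```

Averaging over 50 repetitions makes the standard error very small, which explains why the reported CIs for mean AUC and accuracy are so narrow.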
Logistic regression model
Since race is not an independent prognostic factor for melanoma, racial disparities may be attributable to interactions between race and other factors. In the logistic regression model, all races except NHAIs/ANs had higher odds of death than NHWs in the early phase of melanoma (stage Ⅰ/Ⅱ or absence of ulceration) (Fig 4). NHAIs/ANs, in contrast, had higher odds of death than NHWs in advanced disease. Among patients without ulceration, the odds of death were significantly higher in NHAPIs and Hispanics than in NHWs; among patients with ulceration, the odds in NHAPIs and Hispanics did not differ significantly from those in NHWs, whereas NHAIs/ANs had significantly higher odds of death than NHWs (Fig 4A). At stage Ⅰ/Ⅱ, NHAPIs, Hispanics, and NHBs had significantly higher odds of death than NHWs; at stage Ⅲ/Ⅳ, their odds did not differ significantly from those of NHWs, whereas NHAIs/ANs had higher odds of death than NHWs (Fig 4B).
Discussion
Previous studies of race as a prognostic factor for melanoma spanned long time periods[5, 7], during which the staging systems in use were less comprehensive and systematic than the Collaborative Stage system introduced in 2004. The SEER data used in our study benefited from this updated staging system, which improved the objectivity and accuracy of our results. Compared with prior studies, our Cox model also adjusted for additional confounders such as education level and economic status, which increases the reliability of the results. Differences in databases and adjusted factors may explain minor discrepancies between our findings and those of previous studies. Our analysis showed that race itself has only a marginal effect on melanoma survival. ML models with high predictive performance further supported the conclusions from the Cox model. Racial disparities in survival may instead be attributable to interactions between race and other factors.
Because stage and ulceration are important prognostic factors in melanoma[14], we used a logistic model to investigate their interaction with race on melanoma-specific mortality, an aspect rarely analyzed in previous studies. With whites as the reference group, the ORs for death in non-white patients were higher among those diagnosed at an early stage or without ulceration than among those diagnosed at an advanced stage or with ulceration. This may be related to faster progression of melanoma from the early stage. On the one hand, melanoma pathogenesis may differ across races. For instance, ALM, a major subtype in non-white patients, has a higher propensity to metastasize[15], which could lead to a poorer prognosis in non-white patients at an early stage. In SEER, ALM accounts for 17.4% and 12.7% of melanomas in NHBs and NHAPIs, respectively, but only 0.7% of melanomas in NHWs. Although other cutaneous melanoma subtypes (e.g., superficial spreading melanoma, SSM) carry more than 18 times as many single-nucleotide variants as ALM, somatic structural variants are considerably more frequent in ALM, suggesting that melanoma pathogenesis may differ among racial groups[16].
On the other hand, socioeconomic factors not captured by the SEER program may also play a role. First, lack of timely treatment may lead to higher odds of death among non-white patients at an early stage. In the National Cancer Database, Black patients had more than twice the odds of a prolonged interval from diagnosis to surgery compared with White patients, and a longer interval between diagnosis and surgery is associated with increased melanoma-specific mortality[17]. Second, unfavorable insurance status may hinder access to timely, high-quality care. With a higher proportion of Medicaid coverage and a lower proportion of private insurance[18], non-white groups may have difficulty accessing high-quality preventive and health care services, which may in turn increase the odds of death among those diagnosed at an early stage. Third, undocumented immigration status among some non-white individuals may delay the diagnosis and treatment of melanoma, worsening prognosis. Given that non-white groups are more likely to be diagnosed at later stages, they warrant more attention in melanoma screening and early therapeutic intervention.
Our study has several limitations. The socioeconomic factors examined are area-level measures and do not capture individual economic and cultural backgrounds. Insurance status and immigration status could not be considered because they are not available in the database. Genetic prognostic factors and medical care variables may also be important confounders, but because SEER does not include them, the mechanisms underlying survival differences between ethnic groups could not be fully analyzed or explained. Further studies of biological differences between races are needed to explain the survival disparities quantitatively. Finally, because non-white groups are underrepresented in the SEER dataset, additional data on non-white populations are needed.
Authors' Contributions
BEN Min-xi: data collection and analysis, visualization, conceptualization, writing, and revision; ZHOU Bo-yan: conceptualization, writing, and revision; LI Hui: conceptualization, writing, and revision.
Conflict of Interest Declaration
All authors declare that there are no conflicts of interest.
References
[1] OSSIO R, ROLDAN-MARIN R, MARTINEZ-SAID H, et al. Melanoma: a global perspective[J]. Nat Rev Cancer, 2017, 17(7): 393-394.
[2] TRIPP MK, WATSON M, BALK SJ, et al. State of the science on prevention and screening to reduce melanoma incidence and mortality: the time is now[J]. CA Cancer J Clin, 2016, 66(6): 460-480.
[3] WU Y, WANG Y, WANG L, et al. Burden of melanoma in China, 1990–2017: findings from the 2017 global burden of disease study[J]. Int J Cancer, 2020, 147(3): 692-701.
[4] AMERICAN CANCER SOCIETY. Cancer facts and figures[R]. Atlanta: American Cancer Society, 2005.
[5] DAWES SM, TSAI S, GITTLEMAN H, et al. Racial disparities in melanoma survival[J]. J Am Acad Dermatol, 2016, 75(5): 983-991.
[6] CHE G, HUANG B, XIE Z, et al. Trends in incidence and survival in patients with melanoma, 1974–2013[J]. Am J Cancer Res, 2019, 9(7): 1396.
[7] WARD-PETERSON M, ACUNA JM, ALKHALIFAH MK, et al. Association between race/ethnicity and survival of melanoma patients in the United States over 3 decades: a secondary analysis of SEER data[J]. Medicine, 2016, 95(17): e3315.
[8] ASGARI M, SHEN L, SOKIL M, et al. Prognostic factors and survival in acral lentiginous melanoma[J]. Br J Dermatol, 2017, 177(2): 428-435.
[9] BEHBAHANI S, MADDUKURI S, CADWELL JB, et al. Gender differences in cutaneous melanoma: Demographics, prognostic factors, and survival outcomes[J]. Dermatol Ther, 2020, 33(6): e14131.
[10] GAL TJ, SILVER N, HUANG B. Demographics and treatment trends in sinonasal mucosal melanoma[J]. Laryngoscope, 2011, 121(9): 2026-2033.
[11] RICH JT, NEELY JG, PANIELLO RC, et al. A practical guide to understanding Kaplan-Meier curves[J]. Otolaryngol Head Neck Surg, 2010, 143(3): 331-336.
[12] DUMANCAS GG, BELLO G. Comparison of machine-learning techniques for handling multicollinearity in big data analytics and high-performance data mining[C]. International Conference for High Performance Computing Networking Storage and Analysis, 2015.
[13] RYU SM, LEE S-H, KIM E-S, et al. Predicting survival of patients with spinal ependymoma using machine learning algorithms with the SEER database[J]. World Neurosurg, 2019, 124: e331-e339.
[14] KEUNG EZ, GERSHENWALD JE. The eighth edition American Joint Committee on Cancer (AJCC) melanoma staging system: implications for melanoma treatment and care[J]. Expert Rev Anticancer Ther, 2018, 18(8): 775-784.
[15] BIAN SX, HWANG L, HWANG J, et al. Acral lentiginous melanoma-population, treatment, and survival using the NCDB from 2004 to 2015[J]. Pigment Cell Melanoma Res, 2021, 34(6): 1049-1061.
[16] HAYWARD NK, WILMOTT JS, WADDELL N, et al. Whole-genome landscapes of major melanoma subtypes[J]. Nature, 2017, 545(7653): 175-180.
[17] TRIPATHI R, ARCHIBALD LK, MAZMUDAR RS, et al. Racial differences in time to treatment for melanoma[J]. J Am Acad Dermatol, 2020, 83(3): 854-859.
[18] KOOISTRA L, CHIANG K, DAWES S, et al. Racial disparities and insurance status: an epidemiological analysis of Ohio melanoma patients[J]. J Am Acad Dermatol, 2018, 78(5): 998-1000.