Addition of Durvalumab After Chemoradiotherapy Improves Progression-Free Survival in Unresectable Stage III Non-Small-Cell Lung Cancer
Study Overview
Objective. To evaluate the efficacy of the PD-L1 antibody durvalumab in the treatment of patients with unresectable stage III non-small-cell lung cancer (NSCLC) following completion of standard chemoradiotherapy.
Design. Interim analysis of the phase III PACIFIC study, a randomized, double-blind, international study.
Setting and participants. A total of 709 patients underwent randomization between May 2014 and April 2016. Eligible patients had histologically proven stage III, locally advanced and unresectable NSCLC with no evidence of disease progression following chemoradiotherapy. The enrolled patients had received at least 2 cycles of platinum-based chemotherapy concurrently with definitive radiation therapy (54 Gy to 66 Gy). Initially, patients were randomized within 2 weeks of completing radiation; however, the protocol was amended to allow randomization up to 42 days following completion of therapy. Patients were not eligible if they had previous exposure to anti-PD-1 or PD-L1 antibodies or active or prior autoimmune disease in the last 2 years. All patients were required to have a WHO performance status of 0 or 1. The patients were stratified at randomization by age (< 65 or ≥ 65 years), sex, and smoking status. Enrollment was not restricted by level of PD-L1 expression.
Intervention. Patients were randomized in a 2:1 ratio to receive consolidation durvalumab 10 mg/kg or placebo every 2 weeks for up to 12 months. The intervention was discontinued if there was evidence of confirmed disease progression, treatment with an alternative anticancer therapy, toxicity or patient preference. The response to treatment was assessed every 8 weeks for the first year and then every 12 weeks thereafter.
Main outcome measures. The primary endpoints of the study were progression-free survival (PFS) by blinded independent review and overall survival (OS). Secondary endpoints were the percentage of patients alive without disease progression at 12 and 18 months, objective response rate, duration of response, safety, and time to death or metastasis. Patients were given the option to provide archived tumor specimens for PD-L1 testing.
Results. The baseline characteristics were balanced. The median age at enrollment was 64 years and 91% of the patients were current or former smokers. The vast majority of patients (> 99% in both groups) received concurrent chemoradiotherapy. The response to initial concurrent therapy was similar in both groups, with complete response rates of 1.9% and 3% in the durvalumab and placebo groups, respectively, and partial response rates of 48.7% and 46.8%. Archived tumor samples showed ≥ 25% PD-L1 expression in 22.3% of patients (24% in the durvalumab group versus 18.6% in the placebo group) and < 25% in 41% of patients (39.3% in the durvalumab group versus 44.3% in the placebo group). PD-L1 status was unknown in 36.7% of the enrolled patients. Of note, 6% of patients enrolled had EGFR mutations.
After a median follow-up of 14.5 months, the median PFS was 16.8 months with durvalumab versus 5.6 months with placebo (P < 0.001; hazard ratio [HR] 0.52, 95% confidence interval [CI] 0.42–0.65). The 12-month PFS rate was 55.9% and 35.3% in the durvalumab and placebo groups, respectively. The 18-month PFS rate was 44.2% and 27% in the durvalumab and placebo groups, respectively. The PFS results were consistent across all subgroups. The PFS benefit was observed regardless of PD-L1 expression. The median time to death or metastasis was 23.2 months in the durvalumab group versus 14.6 months with placebo (HR 0.52; 95% CI 0.39–0.69). The objective response rate was significantly higher in the durvalumab group (28.4% vs. 16%, P < 0.001). The median duration of response was longer with durvalumab. Of the patients who responded to durvalumab, 73% had an ongoing response at 18 months compared with 47% in the placebo group. OS was not assessed at this interim analysis.
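As a rough illustration of how the reported hazard ratio relates to the "percent reduction in risk" quoted for this trial, the short Python sketch below converts an HR into a relative reduction in the hazard of progression or death. The function name is hypothetical, and the calculation is simple arithmetic on the published HR and its confidence limits, not a reanalysis of trial data.

```python
def hazard_ratio_to_risk_reduction(hr: float) -> float:
    """Relative reduction in hazard (as a percentage) implied by a hazard ratio."""
    return (1.0 - hr) * 100.0


# PFS values reported above: HR 0.52, 95% CI 0.42-0.65.
point = hazard_ratio_to_risk_reduction(0.52)

# The CI limits swap roles: the upper HR limit gives the smallest reduction.
smallest = hazard_ratio_to_risk_reduction(0.65)
largest = hazard_ratio_to_risk_reduction(0.42)

print(f"Reduction in hazard: {point:.0f}% (range {smallest:.0f}% to {largest:.0f}%)")
```

Note that this is a relative reduction in the instantaneous hazard; it is not the same as the absolute difference in PFS rates at 12 or 18 months.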
Adverse events (AEs) of any grade occurred in approximately 95% of patients in both groups. Grade 3 or 4 AEs occurred in 29.9% of the durvalumab group and 26.1% of the placebo group. The most common grade 3 or 4 AE was pneumonia, occurring in about 4% of patients in each group. More patients in the durvalumab group discontinued treatment because of AEs (15.4% vs 9.8%). Death due to an AE occurred in 4.4% of the durvalumab group and 5.6% of the placebo group. The most frequent AEs leading to discontinuation were pneumonitis or radiation pneumonitis, and pneumonia. Pneumonitis or radiation pneumonitis occurred in 33.9% (3.4% grade 3 or 4) and 24.8% (2.6% grade 3 or 4) of the durvalumab and placebo groups, respectively. Immune-mediated AEs of any grade were more common in the durvalumab group, occurring in 24% of patients (vs. 8% with placebo). Of these, 14% of patients in the durvalumab group required glucocorticoids compared with 4.3% in the placebo group. The most common AE of interest was diarrhea, which occurred in 18% of the patients in both groups.
Conclusion. The addition of consolidative durvalumab following completion of concurrent chemoradiotherapy in patients with stage III, locally advanced NSCLC significantly improved PFS without a significant increase in treatment-related adverse events.
Commentary
Pre-clinical evidence has suggested that chemotherapy and radiation therapy may lead to upregulation of PD-L1 expression by tumor cells, leading to increased PD-L1-mediated T cell apoptosis [1,2]. Given prior studies documenting PD-L1 expression as a predictive biomarker for response to durvalumab, the authors of the current trial hypothesized that the addition of durvalumab after chemoradiotherapy would provide clinical benefit, likely mediated by upregulation of PD-L1. The results from this pre-planned interim analysis show a significant improvement in progression-free survival with the addition of durvalumab, with a 48% decrease in the risk of progression. This benefit was noted across all patient subgroups. In addition, responses to durvalumab were durable, with 73% of the patients who responded having an ongoing response at 18 months. Interestingly, the response to durvalumab was independent of PD-L1 expression, which is in contrast to previous studies showing PD-L1 expression to be a good biomarker for durvalumab response [3].
The results of the PACIFIC trial represent a clinically meaningful benefit and suggest an excellent option for patients with unresectable stage III NSCLC. One important point to highlight is that the addition of durvalumab was well tolerated and did not appear to significantly increase the rate of severe adverse events. Of particular interest are the similar rates of grade 3 or 4 pneumonitis, which were around 3% in each group. Overall survival data remain immature at the time of this analysis; however, given the acceptable toxicity profile and improved PFS, this combination should be considered for these patients in clinical practice. Trials are underway to evaluate the role of single-agent durvalumab in the front-line setting for NSCLC.
Applications for Clinical Practice
In patients with unresectable stage III NSCLC who have no evidence of disease progression following completion of chemoradiotherapy, the addition of durvalumab provided a significant and clinically meaningful improvement in progression-free survival without an increase in serious adverse events. While the overall survival data are immature, the 48% reduction in the risk of progression supports the incorporation of durvalumab into standard practice in this patient population.
—Daniel Isaac, DO, MS
1. Deng L, Liang H, Burnette B, et al. Irradiation and anti-PD-L1 treatment synergistically promote antitumor immunity in mice. J Clin Invest 2014;124:687–95.
2. Zhang P, Su DM, Liang M, Fu J. Chemopreventive agents induce programmed death-1-ligand 1 (PD-L1) surface expression in breast cancer cells and promote PD-L1 mediated T cell apoptosis. Mol Immunol 2008;45:1470–6.
3. Antonia SJ, Brahmer JR, Khleif S, et al. Phase 1/2 study of the safety and clinical activity of durvalumab in patients with non-small cell lung cancer (NSCLC). Presented at the 41st European Society for Medical Oncology Annual Meeting, Copenhagen, October 7–11, 2016.
Mepolizumab for Eosinophilic Chronic Obstructive Pulmonary Disease
Study Overview
Objective. To determine the effect of mepolizumab on the annual rate of chronic obstructive pulmonary disease (COPD) exacerbations in high-risk patients.
Design. Two randomized double-blind placebo-controlled parallel trials (METREO and METREX).
Setting and participants. Participants were recruited at more than 100 investigative sites in more than 15 countries. Inclusion criteria were adults (40 years or older) with a diagnosis of COPD for at least 1 year with: airflow limitation (FEV1/FVC < 0.7); some bronchodilator reversibility (post-bronchodilator FEV1 > 20% and ≤ 80% of predicted values); current COPD therapy for at least 3 months prior to enrollment (a high-dose inhaled corticosteroid [ICS] with at least 2 other classes of medications, to obtain “triple therapy”); and a high risk of exacerbations (at least 1 severe [requiring hospitalization] or 2 moderate [treatment with systemic corticosteroids and/or antibiotics] exacerbations in the past year).
Notable exclusion criteria were a diagnosis of asthma in never-smokers, alpha-1 antitrypsin deficiency, recent exacerbations (in the past month), lung volume reduction surgery (in the past year), eosinophilic or parasitic diseases, or recent monoclonal antibody treatment. Patients with the asthma-COPD overlap syndrome were included only if they had a history of smoking and met the COPD inclusion criteria listed above.
Intervention. The treatment period lasted for a total of 52 weeks, with an additional 8 weeks of follow-up. In the METREX study, patients were randomized 1:1 to placebo or low-dose medication (100 mg) using permuted-block randomization regardless of eosinophil count, but were stratified at screening for a modified intention-to-treat analysis into either a low eosinophil count (< 150 cells/µL) or a high eosinophil count (≥ 150 cells/µL). In the METREO study, patients were randomized 1:1:1 to placebo, low-dose (100 mg), or high-dose (300 mg) medication only if blood eosinophilia was present (≥ 150 cells/µL at screening or ≥ 300 cells/µL in the past 12 months). Investigators and patients were blinded to presence of drug or placebo. Sample size calculations indicated that, to provide 90% power to detect a 30% decrease in the rate of exacerbations in METREX and a 35% decrease in METREO, a total of 800 patients and 660 patients would need to be enrolled in METREX and METREO, respectively. Both studies met their enrollment quota.
Main outcome measures. The primary outcome was the annual rate of exacerbations that were either moderate (requiring systemic corticosteroids and/or antibiotics) or severe (requiring hospitalization). Secondary outcomes included the time to first moderate/severe exacerbation, change from baseline in the COPD Assessment Test (CAT) and St. George’s Respiratory Questionnaire (SGRQ), and change from baseline in blood eosinophil count, FEV1, and FVC. Safety and adverse events endpoints were also assessed.
A modified intention-to-treat analysis was performed overall and, in the METREX study, stratified by eosinophil count at screening; all patients who underwent randomization and received at least 1 dose of medication or placebo were analyzed in their assigned group. Multiple comparisons were accounted for using the Benjamini-Hochberg procedure, exacerbations were assumed to follow a negative binomial distribution, and a Cox proportional-hazards model was used to model the relationship between covariates of interest and the primary outcome.
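For readers unfamiliar with the Benjamini-Hochberg approach to multiple comparisons, the following is a minimal Python sketch of the general step-up procedure. The function name and the P values are hypothetical and chosen only for illustration; how the investigators applied the adjustment across their specific comparisons is not described here.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    # Rank the hypotheses by ascending P value, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose P value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis ranked at or below k (the "step-up" rule).
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical P values, not taken from the trials.
example = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(example))
```

Because the rule works upward from the largest qualifying rank, a P value that misses its own threshold can still be rejected when a higher-ranked P value meets its threshold.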
Main results. In the METREX study, 1161 patients were enrolled and 836 underwent randomization and received at least 1 dose of medication or placebo. In METREO, 1071 patients were enrolled and 674 underwent randomization and received at least one dose of medication or placebo. In both studies the patients in the medication and placebo groups were well balanced at baseline across demographics (age, gender, smoking history, duration of COPD) and pulmonary function (FEV1, FVC, FEV1/FVC, CAT, SGRQ). In METREX, a total of 462 (55%) patients had an eosinophilic phenotype and 374 (45%) did not.
There was no difference between groups in the primary endpoint of annual exacerbation rate in METREO (1.49/yr in placebo vs. 1.19/yr in low-dose and 1.27/yr in high-dose mepolizumab, rate ratio of high-dose to placebo 0.86, 95% confidence interval [CI] 0.7–1.05, P = 0.14). There was no difference in the primary outcome in the overall intention-to-treat analysis in the METREX study (1.49/yr in mepolizumab vs. 1.52/yr in placebo, P > 0.99). Only when analyzing the high eosinophilic phenotype in the stratified intention-to-treat METREX group was there a significant difference in the primary outcome (1.41/yr in mepolizumab vs. 1.71/yr in placebo, P = 0.04, rate ratio 0.82, 95% CI 0.68–0.98).
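The rate ratios above can be sanity-checked with simple arithmetic. The Python sketch below (helper names are hypothetical) computes crude ratios from the reported annual rates and checks whether a confidence interval excludes 1. The published ratios come from a negative binomial model, so a crude ratio need not match them exactly.

```python
def crude_rate_ratio(treated_rate: float, control_rate: float) -> float:
    """Unadjusted ratio of two annual exacerbation rates."""
    return treated_rate / control_rate


def ci_excludes_unity(lower: float, upper: float) -> bool:
    """True when a ratio's confidence interval does not contain 1."""
    return upper < 1.0 or lower > 1.0


# METREX, high-eosinophil stratum: 1.41/yr (mepolizumab) vs. 1.71/yr (placebo).
metrex_ratio = crude_rate_ratio(1.41, 1.71)
# METREO, high-dose arm: 1.27/yr (mepolizumab) vs. 1.49/yr (placebo).
metreo_ratio = crude_rate_ratio(1.27, 1.49)

print(f"METREX crude ratio: {metrex_ratio:.2f}, CI excludes 1: {ci_excludes_unity(0.68, 0.98)}")
print(f"METREO crude ratio: {metreo_ratio:.2f}, CI excludes 1: {ci_excludes_unity(0.70, 1.05)}")
```

The METREX interval lies entirely below 1, although its upper limit is close to it, whereas the METREO interval spans 1, which is consistent with the P values reported for the two comparisons.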
There were no significant differences in any secondary endpoint in the METREO study. In the METREX study, mepolizumab treatment resulted in a significantly longer time to first exacerbation (192 days vs. 141 days, hazard ratio 0.75, 95% CI 0.60–0.94, P = 0.04) but no difference in the change in SGRQ (–2.8 vs. –3.0, P > 0.99) or CAT score (–0.8 vs. 0, P > 0.99). There was no significant difference in any measures of pulmonary function between the treatment and placebo groups (FEV1, FVC, FEV1/FVC). As expected, there was a significant decrease in peripheral blood eosinophil count in both studies in the medication arm. The incidence of adverse events and safety endpoints were similar between the trial groups in METREX and METREO.
Conclusions. In this pair of placebo-controlled, double-blind, randomized parallel studies, mepolizumab significantly reduced the annual exacerbation rate in patients with an eosinophilic phenotype in a stratified intention-to-treat analysis of one study (METREX). However, there was no significant difference in the primary outcome of the other study (METREO), which included only patients with an eosinophilic phenotype. Apart from a longer time to first exacerbation in METREX, there was no significant difference in any secondary endpoint in either study. The medication was generally safe and well tolerated.
Commentary
Mepolizumab is a humanized monoclonal antibody that targets and blocks interleukin-5, a key mediator of eosinophilic activity. Due to its ability to decrease eosinophil number and function, it is currently approved as a therapy for severe asthma with an eosinophilic phenotype [1]. While asthma and COPD have historically been thought of as separate entities with distinct pathophysiologic mechanisms, recent evidence has suggested that a subset of COPD patients experience significant eosinophilic inflammation. This group may behave more like asthmatic patients, and may have a different response to medications such as inhaled corticosteroids, but the role of eosinophils to guide prognostication and treatment in this group is still unclear [2,3].
In this study, Pavord and colleagues investigated the use of the anti-IL5 drug mepolizumab in COPD patients at risk of exacerbations who demonstrated an eosinophilic phenotype. The physiologic rationale for the study was that eosinophilic inflammation is thought to be a driver of exacerbations in COPD patients with an eosinophilic phenotype, and therefore a decrease in eosinophilic number and function should result in a decrease in exacerbations. The authors conducted a well-designed placebo-controlled double-blind study with a clearly defined endpoint, met their enrollment goals as determined by their power calculations, and used COPD patients at high risk of exacerbations to enrich their study.
There was no difference in the primary outcome in the METREO study, which included only patients with baseline eosinophilia (≥ 150 cells/µL), or in the overall intention-to-treat analysis in METREX (which did not screen patients on baseline eosinophil count). Only when the METREX analysis was stratified by baseline eosinophil count was a significant treatment effect found: patients with a high eosinophil count at baseline (≥ 150 cells/µL) had a decreased rate of exacerbations when treated with mepolizumab. Notably, there was no difference in any secondary outcome in METREO or in METREX aside from a longer time to first exacerbation in the mepolizumab group in METREX. The authors use these data to conclude that mepolizumab treatment results in a lower rate of exacerbations and a longer time to the first exacerbation in COPD patients with an eosinophilic phenotype, and that the extent of the treatment effect is related to blood eosinophil counts.
The authors conducted a well-designed and rigorous study and used robust and appropriate statistical analysis; however, significant questions remain regarding their conclusions. The primary concern is that the role of mepolizumab in decreasing exacerbations in COPD patients may be overstated. In METREO, which included only patients with baseline eosinophilia, there was no significant difference between placebo and low- or high-dose mepolizumab; however, there was an appropriate and expected decrease in blood eosinophils, indicating the medication worked as intended. In the overall intention-to-treat analysis in METREX, there was no difference between mepolizumab and placebo, and only in the analysis of METREX stratified by eosinophil count was there a significant difference (with the upper bound of the 95% confidence interval for the rate ratio [0.98] approaching unity).
Additionally, there was no significant difference between the mepolizumab and placebo groups across a number of clinically important secondary endpoints, including pulmonary function measurements and symptom scores. Only the time to first exacerbation was significantly longer in the mepolizumab group in METREX.
Taken together, this calls into question the conclusion that a decrease in eosinophil counts due to mepolizumab has resulted in a lower rate of exacerbations, particularly as a higher dose of mepolizumab did not result in a stronger effect. The lack of difference between groups in secondary endpoints is also concerning, as those would be expected to improve with a decrease in exacerbations [4,5]. As the authors point out, their evidence suggests that eosinophils may be an important biomarker in COPD and may aid in the therapeutic decision-making process. However, given the inconsistencies in the data as noted above, it would be difficult to rely on the evidence from this study alone to support their conclusion regarding the clinical utility of mepolizumab in COPD.
The authors discuss a number of limitations that may account for the lack of consistent effect seen in this study. Aside from the standard limitations applicable to any clinical trial, they note the potential confounding effect of previous oral glucocorticoid therapy in reducing eosinophil counts. This may have masked the eosinophilic phenotype in some study patients, leading to the attenuated effect of mepolizumab seen in this study.
The authors also note that information that might be valuable for identifying treatment responders, such as a history of allergies and atopy, was not available. Including patients with such a history may help enrich future trials with potential treatment responders, and future studies may benefit from focusing on COPD patients with a more atopic phenotype who more closely resemble those with the asthma-COPD overlap syndrome.
A final limitation to discuss is the focus on blood eosinophil counts. Due to the difficulty of measuring sputum eosinophils, and the reasonable degree of correlation between blood and sputum eosinophils in asthmatic patients, blood eosinophils have largely supplanted sputum eosinophils as markers of TH2 CD4 T-cell activity in the pulmonary system [6]. This substitution is also used in the COPD population; however, due to the differences in pathophysiology, it is unclear if eosinophils in asthmatic patients behave similarly to those in COPD patients [7]. Additionally, the cutoff of 150 cells/µL has been obtained primarily from subgroup analyses of previous studies on COPD patients, and it is unclear if this cutoff truly reflects elevated sputum eosinophilia. While there is likely some degree of correlation between blood and sputum eosinophilia in COPD patients, the lack of a consistent significant effect in this study may be due to an incorrect cutoff for elevated eosinophilia and a reliance on blood eosinophils over sputum counts. Further studies utilizing sputum eosinophils may be of value in addressing this limitation.
Applications for Clinical Practice
In this study, Pavord and colleagues found a potential benefit of mepolizumab treatment for reducing exacerbations in COPD patients with an eosinophilic phenotype. The conflicting results regarding the underlying physiology and the weak treatment effect suggest this medication may not be ready for use in clinical practice without additional supporting evidence. From a practical standpoint, the high cost of medication (~$2500 per month) and marginal benefit of treatment imply that treatment with mepolizumab in COPD patients may not be cost-effective, and even treatment in individual patients on a trial basis should be discouraged until additional supporting data become available. Of primary concern are the optimal selection of COPD patients who will benefit from mepolizumab treatment and the optimal dose of medication to achieve that benefit. The results presented here do not satisfactorily answer these questions, and additional studies are required.
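As a rough illustration of the cost argument, the sketch below divides the annual drug cost by the number of exacerbations avoided per patient-year. It uses the approximate monthly cost quoted above and the exacerbation rates reported for the eosinophilic stratum of METREX (1.71/yr with placebo, 1.41/yr with mepolizumab). The function name is hypothetical, and the calculation assumes drug cost only, ignoring administration costs and any savings from avoided exacerbations, so it should not be read as a formal cost-effectiveness analysis.

```python
# Back-of-envelope drug cost per exacerbation avoided (illustrative only).
# Assumption: drug acquisition cost is the only cost considered.

MONTHLY_DRUG_COST_USD = 2500   # approximate figure quoted in the text
RATE_PLACEBO = 1.71            # exacerbations/yr, METREX eosinophilic stratum
RATE_MEPOLIZUMAB = 1.41        # exacerbations/yr, METREX eosinophilic stratum


def cost_per_exacerbation_avoided(monthly_cost, rate_control, rate_treated):
    """Annual drug cost divided by exacerbations avoided per patient-year."""
    avoided_per_year = rate_control - rate_treated
    if avoided_per_year <= 0:
        raise ValueError("treatment did not reduce the exacerbation rate")
    return 12 * monthly_cost / avoided_per_year


cost = cost_per_exacerbation_avoided(
    MONTHLY_DRUG_COST_USD, RATE_PLACEBO, RATE_MEPOLIZUMAB
)
print(f"Approximate drug cost per exacerbation avoided: ${cost:,.0f}")
```

Under these assumptions, the figure is on the order of $100,000 per moderate or severe exacerbation avoided, which is consistent with the concern about cost-effectiveness raised above.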
—Arun Jose, MD, The George Washington University, Washington, DC
1. Pelaia C, Vatrella A, Busceti MT, et al. Severe eosinophilic asthma: from the pathogenic role of interleukin-5 to the therapeutic action of mepolizumab. Drug Des Devel Ther 2017;11:3137–44.
2. Kim VL, Coombs NA, Staples KJ, et al. Impact and associations of eosinophilic inflammation in COPD: analysis of the AERIS cohort. Eur Respir J 2017;50:pii:1700853.
3. Roche N, Chapman KR, Vogelmeier CF, et al. Blood eosinophils and response to maintenance chronic obstructive pulmonary disease treatment. Data from the FLAME trial. Am J Respir Crit Care Med 2017;195:1189–97.
4. Halpin DMG, Decramer M, Celli BR, et al. Effect of a single exacerbation on decline in lung function in COPD. Respir Med 2017;128:85–91.
5. Rassouli F, Baty F, Stolz D, et al. Longitudinal change of COPD assessment test (CAT) in a telehealthcare cohort is associated with exacerbation risk. Int J COPD 2017;12:3103–9.
6. Gauthier M, Ray A, Wenzel SE. Evolving concepts of asthma. Am J Respir Crit Care Med 2015;192:660–8.
7. Negewo NA, McDonald VM, Baines KJ, et al. Peripheral blood eosinophils: a surrogate marker for airway eosinophilia in stable COPD. Int J COPD 2016;11:1495–504.
Study Overview
Objective. To determine the effect of mepolizumab on the annual rate of chronic obstructive pulmonary disease (COPD) exacerbations in high-risk patients.
Design. Two randomized double-blind placebo-controlled parallel trials (METREO and METREX).
Setting and participants. Participants were recruited at more than 100 investigative sites in more than 15 countries. Eligible patients were adults (40 years or older) who had a diagnosis of COPD for at least 1 year with: airflow limitation (FEV1/FVC < 0.7); some bronchodilator reversibility (post-bronchodilator FEV1 > 20% and ≤ 80% of predicted values); current COPD therapy for at least 3 months prior to enrollment (a high-dose inhaled corticosteroid [ICS] with at least 2 other classes of medications, to obtain “triple therapy”); and a high risk of exacerbations (at least 1 severe [requiring hospitalization] or 2 moderate [treatment with systemic corticosteroids and/or antibiotics] exacerbations in the past year).
Notable exclusion criteria were a diagnosis of asthma in a never-smoker, alpha-1 antitrypsin deficiency, a recent exacerbation (in the past month), lung volume reduction surgery (in the past year), eosinophilic or parasitic disease, and recent monoclonal antibody treatment. Patients with the asthma-COPD overlap syndrome were included only if they had a history of smoking and met the COPD inclusion criteria listed above.
Intervention. The treatment period lasted for a total of 52 weeks, with an additional 8 weeks of follow-up. In the METREX study, patients were randomized 1:1 to placebo or low-dose mepolizumab (100 mg) using permuted-block randomization regardless of eosinophil count, but were stratified at screening, for a modified intention-to-treat analysis, into low (< 150 cells/µL) or high (≥ 150 cells/µL) eosinophil count groups. In the METREO study, patients were randomized 1:1:1 to placebo, low-dose (100 mg), or high-dose (300 mg) mepolizumab only if blood eosinophilia was present (≥ 150 cells/µL at screening or ≥ 300 cells/µL in the past 12 months). Investigators and patients were blinded to treatment assignment. Sample size calculations indicated that, to provide 90% power to detect a 30% decrease in the rate of exacerbations in METREX and a 35% decrease in METREO, a total of 800 patients and 660 patients would need to be enrolled in METREX and METREO, respectively. Both studies met their enrollment targets.
Main outcome measures. The primary outcome was the annual rate of exacerbations that were either moderate (requiring systemic corticosteroids and/or antibiotics) or severe (requiring hospitalization). Secondary outcomes included the time to first moderate/severe exacerbation, change from baseline in the COPD Assessment Test (CAT) and St. George’s Respiratory Questionnaire (SGRQ), and change from baseline in blood eosinophil count, FEV1, and FVC. Safety and adverse events endpoints were also assessed.
A modified intention-to-treat analysis was performed overall and, in the METREX study, stratified by eosinophil count at screening; all patients who underwent randomization and received at least 1 dose of medication or placebo were included in their respective groups. Multiple comparisons were accounted for using the Benjamini-Hochberg procedure, exacerbations were assumed to follow a negative binomial distribution, and a Cox proportional-hazards model was used to model the relationship between covariates of interest and the primary outcome.
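For readers unfamiliar with the Benjamini-Hochberg procedure, the following minimal sketch shows how a set of raw P values is adjusted to control the false discovery rate. The P values in the example are hypothetical and are not taken from the trial, and the exact way the investigators applied the procedure across endpoints is not described here, so this is an illustration of the general technique rather than a reconstruction of the trial's analysis.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity so that
    # an adjusted value never exceeds the one ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four comparisons (not trial data).
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
for raw, adj in zip(raw_p, adjusted_p):
    print(f"raw P = {raw:.2f} -> adjusted P = {adj:.3f}")
```

Note that a raw P value below 0.05 can rise above 0.05 after adjustment, which is why adjusted values in a trial with many comparisons can look much larger than unadjusted ones.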
Main results. In the METREX study, 1161 patients were enrolled and 836 underwent randomization and received at least 1 dose of medication or placebo. In METREO, 1071 patients were enrolled and 674 underwent randomization and received at least 1 dose of medication or placebo. In both studies, the patients in the medication and placebo groups were well balanced at baseline across demographics (age, gender, smoking history, duration of COPD), pulmonary function (FEV1, FVC, FEV1/FVC), and symptom scores (CAT, SGRQ). In METREX, a total of 462 (55%) patients had an eosinophilic phenotype and 374 (45%) did not.
There was no significant difference between groups in the primary endpoint of annual exacerbation rate in METREO (1.49/yr in placebo vs. 1.19/yr in low-dose and 1.27/yr in high-dose mepolizumab; rate ratio of high-dose to placebo 0.86, 95% confidence interval [CI] 0.70–1.05, P = 0.14). There was no difference in the primary outcome in the overall intention-to-treat analysis in the METREX study (1.49/yr in mepolizumab vs. 1.52/yr in placebo, P > 0.99). Only in the high-eosinophil stratum of the stratified intention-to-treat analysis in METREX was there a significant difference in the primary outcome (1.41/yr in mepolizumab vs. 1.71/yr in placebo, P = 0.04, rate ratio 0.82, 95% CI 0.68–0.98).
There were no significant differences in any secondary endpoint in the METREO study. In the METREX study, mepolizumab treatment resulted in a significantly longer time to first exacerbation (192 days vs. 141 days, hazard ratio 0.75, 95% CI 0.60–0.94, P = 0.04) but no difference in the change in SGRQ (–2.8 vs. –3.0, P > 0.99) or CAT score (–0.8 vs. 0, P > 0.99). There was no significant difference in any measure of pulmonary function between the treatment and placebo groups (FEV1, FVC, FEV1/FVC). As expected, there was a significant decrease in peripheral blood eosinophil count in the medication arms of both studies. The incidence of adverse events and other safety endpoints was similar between the trial groups in METREX and METREO.
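The rate ratios reported above can be sanity-checked with simple arithmetic. The sketch below computes crude rate ratios from the reported annual exacerbation rates and checks whether each reported 95% CI excludes 1.0, which corresponds to statistical significance at the 0.05 level. The helper names are hypothetical, and the crude ratios are only approximations: the published ratios come from a negative binomial model and so may differ slightly (for example, 0.85 crude vs. 0.86 reported for high-dose mepolizumab in METREO).

```python
def rate_ratio(rate_treated, rate_control):
    """Crude ratio of two annual exacerbation rates."""
    return rate_treated / rate_control


def ci_excludes_null(lower, upper, null_value=1.0):
    """True if the confidence interval lies entirely on one side of the null."""
    return upper < null_value or lower > null_value


# Rates (exacerbations/yr) and 95% CIs as reported in the Main results.
comparisons = {
    "METREO, high-dose vs. placebo": {
        "treated": 1.27, "control": 1.49, "ci": (0.70, 1.05),
    },
    "METREX, eosinophilic stratum": {
        "treated": 1.41, "control": 1.71, "ci": (0.68, 0.98),
    },
}

for label, c in comparisons.items():
    crude = rate_ratio(c["treated"], c["control"])
    significant = ci_excludes_null(*c["ci"])
    print(f"{label}: crude rate ratio {crude:.2f}, "
          f"CI excludes 1.0: {significant}")
```

Only the METREX eosinophilic stratum has a CI that excludes 1.0, and its upper bound of 0.98 leaves little margin.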
Conclusions. In this pair of placebo-controlled double-blind randomized parallel studies, there was a significant decline in annual exacerbation rate in patients with an eosinophilic phenotype treated with mepolizumab in a stratified intention-to-treat analysis of one of two parallel studies (METREX). However, there was no significant difference in the primary outcome of the other parallel study (METREO), which included only those patients with an eosinophilic phenotype. Additionally, there was no significant difference in any secondary endpoints in either study. The medication was generally safe and well tolerated.
Commentary
Mepolizumab is a humanized monoclonal antibody that targets and blocks interleukin-5, a key mediator of eosinophilic activity. Due to its ability to decrease eosinophil number and function, it is currently approved as a therapy for severe asthma with an eosinophilic phenotype [1]. While asthma and COPD have historically been thought of as separate entities with distinct pathophysiologic mechanisms, recent evidence has suggested that a subset of COPD patients experience significant eosinophilic inflammation. This group may behave more like asthmatic patients, and may have a different response to medications such as inhaled corticosteroids, but the role of eosinophils to guide prognostication and treatment in this group is still unclear [2,3].
In this study, Pavord and colleagues investigated the use of the anti-IL5 drug mepolizumab in COPD patients at risk of exacerbations who demonstrated an eosinophilic phenotype. The physiologic rationale for the study was that eosinophilic inflammation is thought to be a driver of exacerbations in COPD patients with an eosinophilic phenotype, and therefore a decrease in eosinophilic number and function should result in a decrease in exacerbations. The authors conducted a well-designed placebo-controlled double-blind study with a clearly defined endpoint, met their enrollment goals as determined by their power calculations, and used COPD patients at high risk of exacerbations to enrich their study.
There was no difference in the primary outcome in the METREO arm of the study, which included patients with baseline eosinophilia (> 150 cells/uL) or in the overall intention-to-treat analysis in METREX (which did not screen patients on baseline eosinophil count). Only when stratified on baseline eosinophil count in the METREX study was a significant treatment effect found, where patients with high eosinophil count at baseline (> 150 cells/uL) had a decreased risk of exacerbations when treated with mepolizumab. Notably there was no difference in any secondary outcome in METREO or in METREX aside from a longer time to first exacerbation in METREX in the mepolizumab group. The authors use this data to conclude that mepolizumab treatment results in a lower rate of exacerbations and a longer time to the first exacerbation in COPD patients with an eosinophilic phenotype, and the extent of the treatment effect is related to blood eosinophil counts.
The authors conducted a well-designed and rigorous study, and used robust and appropriate statistical analysis; however, significant questions remain regarding their conclusions. The primary concern is the role of mepolizumab in the treatment of COPD patients to decrease exacerbations may be overstated. When including only those with baseline eosinophilia in the METREO arm, there was no significant difference between placebo and low or high dose of mepolizumab; however, there was an appropriate and expected decrease in blood eosinophils, indicating the medication worked as intended. In the overall intention-to-treat analysis in the METREX arm, there was no difference between mepolizumab and placebo, and only in the analysis of METREX stratified to eosinophil count was there a significant difference (with an upper confidence interval rate ratio [0.98] approaching unity).
Additionally there was no significant difference between the 2 groups across a number of clinically important secondary endpoints, including pulmonary function measurements and symptomatic scores. Only the time to exacerbation was significantly longer in the mepolizumab group in METREX.
Taken together, this calls into question the conclusion that a decrease in eosinophil counts due to mepolizumab has resulted in a lower rate of exacerbations, particularly as a higher dose of mepolizumab did not result in a stronger effect. The lack of difference between groups in secondary endpoints is also concerning, as those would be expected to improve with a decrease in exacerbations [4,5]. As the authors point out, their evidence suggests that eosinophils may be an important biomarker in COPD and may aid in the therapeutic decision-making process. However, given the inconsistencies in the data as noted above, it would be difficult to rely on the evidence from this study alone to support their conclusion regarding the clinical utility of mepolizumab in COPD.
The authors discuss a number of limitations that may account for the lack of consistent effect seen in this study. Aside from the standard limitations applicable to any clinical trial, they note the potential confounding effect of previous oral glucocorticoid therapy in reducing eosinophil counts. This may have masked the eosinophilic phenotype in some study patients, leading to the attenuated effect of mepolizumab seen in this study.
The authors also note that information that might be potentially valuable for identifying treatment responders, such as a history of allergies and atopy, were not available. Inclusion of those patients may be helpful in enriching the trial with potential treatment-responders, and future studies may benefit from focusing on COPD patients with a more atopic phenotype who more closely resemble those with the asthma-COPD overlap syndrome.
A final limitation to discuss is the focus on blood eosinophilic counts. Due to the difficulty of measuring sputum eosinophils, and the reasonable degree of correlation between blood and sputum in asthmatic patients, blood eosinophils have largely supplanted sputum eosinophils as markers of TH2 CD4 T-cell activity in the pulmonary system [6]. This substitution is also used in the COPD population, however, due to the differences in pathophysiology it is unclear if eosinophils in asthmatic patients behave similarly to those in COPD patients [7]. Additionally, the cutoff of 150 cells/uL has been obtained primarily from sub-group analysis of previous studies on COPD patients, but it is unclear if this cutoff truly reflects elevated sputum eosinophilia. While there is likely some degree of correlation between blood and sputum eosinophilia in COPD patients, a lack of significant effect seen in this study may be due to an incorrect cutoff for elevated eosinophilia and a reliance on blood eosinophils over sputum counts. Further studies utilizing sputum eosinophils may be of value in addressing this limitation.
Applications for Clinical Practice
In this study, Pavord and colleagues found a potential benefit of mepolizumab treatment for reducing exacerbations in COPD patients with an eosinophilic phenotype. The conflicting results regarding the underlying physiology and the weak treatment effect suggest this medication may not be ready for use in clinical practice without additional supporting evidence. From a practical standpoint, the high cost of medication (~$2500 per month) and marginal benefit of treatment imply that treatment with mepolizumab in COPD patients may not be cost-effective, and even treatment in individual patients on a trial basis should be discouraged until additional supporting data becomes available. Of primary concern are the optimal selection of COPD patients that will achieve benefit with mepolizumab treatment, and the optimal dose of medication to achieve that benefit. The results presented here do not satisfactorily answer these questions, and additional studies are required.
—Arun Jose, MD, The George Washington University, Washington, DC
Study Overview
Objective. To determine the effect of mepolizumab on the annual rate of chronic obstructive pulmonary disease (COPD) exacerbations in high-risk patients.
Design. Two randomized double-blind placebo-controlled parallel trials (METREO and METREX).
Setting and participants. Participants were recruited from over 15 countries in over 100 investigative sites. Inclusion criteria were adults (40 years or older) with a diagnosis of COPD for at least 1 year with: airflow limitation (FEV1/FVC < 0.7); some bronchodilator reversibility (post-bronchodilator FEV1 > 20% and ≤ 80% of predicted values); current COPD therapy for at least 3 months prior to enrollment (a high-dose inhaled corticosteroid, ICS, with at least 2 other classes of medications, to obtain “triple therapy”); and a high risk of exacerbations (at least 1 severe [requiring hospitalization] or 2 moderate [treatment with systemic corticosteroids and/or antibiotics] exacerbations in past year).
Notable exclusion criteria were patients with diagnoses of asthma in never-smokers, alpha-1 antitrypsin deficiency, recent exacerbations (in past month), lung volume reduction surgery (in past year), eosinophilic or parasitic diseases, or those with recent monoclonal antibody treatment. Patients with the asthma-COPD overlap syndrome were included only if they had a history of smoking and met the COPD inclusion criteria listed above.
Intervention. The treatment period lasted for a total of 52 weeks, with an additional 8 weeks of follow-up. Patients were randomized 1:1 to placebo or low-dose medication (100 mg) using permuted-block randomization in the METREX study regardless of eosinophil count (but they were stratified for a modified intention-to-treat analysis at screening into either low eosinophilic count [< 150 cells/uL] or high [≥ 150 cells/uL]). In the METREO study, patients were randomized 1:1:1 to placebo, low-dose (100 mg), or high-dose (300 mg) medication only if blood eosinophilia was present (≥ 150 cells/uL at screening or ≥ 300 cells/uL in past 12 months). Investigators and patients were blinded to presence of drug or placebo. Sample size calculations indicated that in order to provide a 90% power to detect a 30% decrease in the rate of exacerbations in METREX and 35% decrease in METREO, a total of 800 patients and 660 patients would need to be enrolled in METREX and METREO respectively. Both studies met their enrollment quota.
Main outcome measures. The primary outcome was the annual rate of exacerbations that were either moderate (requiring systemic corticosteroids and/or antibiotics) or severe (requiring hospitalization). Secondary outcomes included the time to first moderate/severe exacerbation, change from baseline in the COPD Assessment Test (CAT) and St. George’s Respiratory Questionnaire (SGRQ), and change from baseline in blood eosinophil count, FEV1, and FVC. Safety and adverse events endpoints were also assessed.
A modified intention-to-treat analysis was performed overall and in the METREX study stratified on eosinophilic count at screening; all patients who underwent randomization and received at least one dose of medication or placebo were included in that respective group. Multiple comparisons were accounted for using the Benjamini-Hochberg Test, exacerbations were assumed to follow a negative binomial distribution, and Cox proportional-hazards was used to model the relationship between covariates of interest and the primary outcome.
Main results. In the METREX study, 1161 patients were enrolled and 836 underwent randomization and received at least 1 dose of medication or placebo. In METREO, 1071 patients were enrolled and 674 underwent randomization and received at least one dose of medication or placebo. In both studies the patients in the medication and placebo groups were well balanced at baseline across demographics (age, gender, smoking history, duration of COPD) and pulmonary function (FEV1, FVC, FEV1/FVC, CAT, SGRQ). In METREX, a total of 462 (55%) patients had an eosinophilic phenotype and 374 (45%) did not.
There was no difference between groups in the primary endpoint of annual exacerbation rate in METREO (1.49/yr in placebo vs. 1.19/yr in low-dose and 1.27/yr in high-dose mepolizumab, rate ratio of high-dose to placebo 0.86, 95% confidence interval [CI] 0.7–1.05, P = 0.14). There was no difference in the primary outcome in the overall intention-to-treat analysis in the METREX study (1.49/yr in mepolizumab vs. 1.52/yr in placebo, P > 0.99). Only when analyzing the high eosinophilic phenotype in the stratified intention-to-treat METREX group was there a significant difference in the primary outcome (1.41/yr in mepolizumab vs. 1.71/yr in placebo, P = 0.04, rate ratio 0.82, 95% CI 0.68–0.98).
There were no significant differences in any secondary endpoint in the METREO study. In the METREX study, mepolizumab treatment resulted in a significantly longer time to first exacerbation (192 days vs. 141 days, hazard ratio 0.75, 95% CI 0.60–0.94, P = 0.04) but no difference in the change in SGRQ (–2.8 vs. –3.0, P > 0.99) or CAT score (–0.8 vs. 0, P > 0.99). There was no significant difference in any measures of pulmonary function between the treatment and placebo groups (FEV1, FVC, FEV1/FVC). As expected, there was a significant decrease in peripheral blood eosinophil count in both studies in the medication arm. The incidence of adverse events and safety endpoints were similar between the trial groups in METREX and METREO.
Conclusions. In this pair of placebo-controlled double-blind randomized parallel studies, there was a significant decline in annual exacerbation rate in patients with an eosinophilic phenotype treated with mepolizumab in a stratified intention-to-treat analysis of one of two parallel studies (METREX). However, there was no significant difference in the primary outcome of the other parallel study (METREO), which included only those patients with an eosinophilic phenotype. Additionally, there was no significant difference in any secondary endpoints in either study. The medication was generally safe and well tolerated.
Commentary
Mepolizumab is a humanized monoclonal antibody that targets and blocks interleukin-5, a key mediator of eosinophilic activity. Due to its ability to decrease eosinophil number and function, it is currently approved as a therapy for severe asthma with an eosinophilic phenotype [1]. While asthma and COPD have historically been thought of as separate entities with distinct pathophysiologic mechanisms, recent evidence has suggested that a subset of COPD patients experience significant eosinophilic inflammation. This group may behave more like asthmatic patients, and may have a different response to medications such as inhaled corticosteroids, but the role of eosinophils to guide prognostication and treatment in this group is still unclear [2,3].
In this study, Pavord and colleagues investigated the use of the anti-IL5 drug mepolizumab in COPD patients at risk of exacerbations who demonstrated an eosinophilic phenotype. The physiologic rationale for the study was that eosinophilic inflammation is thought to be a driver of exacerbations in COPD patients with an eosinophilic phenotype, and therefore a decrease in eosinophilic number and function should result in a decrease in exacerbations. The authors conducted a well-designed placebo-controlled double-blind study with a clearly defined endpoint, met their enrollment goals as determined by their power calculations, and used COPD patients at high risk of exacerbations to enrich their study.
There was no difference in the primary outcome in the METREO arm of the study, which included patients with baseline eosinophilia (> 150 cells/uL) or in the overall intention-to-treat analysis in METREX (which did not screen patients on baseline eosinophil count). Only when stratified on baseline eosinophil count in the METREX study was a significant treatment effect found, where patients with high eosinophil count at baseline (> 150 cells/uL) had a decreased risk of exacerbations when treated with mepolizumab. Notably there was no difference in any secondary outcome in METREO or in METREX aside from a longer time to first exacerbation in METREX in the mepolizumab group. The authors use this data to conclude that mepolizumab treatment results in a lower rate of exacerbations and a longer time to the first exacerbation in COPD patients with an eosinophilic phenotype, and the extent of the treatment effect is related to blood eosinophil counts.
The authors conducted a well-designed and rigorous study, and used robust and appropriate statistical analysis; however, significant questions remain regarding their conclusions. The primary concern is the role of mepolizumab in the treatment of COPD patients to decrease exacerbations may be overstated. When including only those with baseline eosinophilia in the METREO arm, there was no significant difference between placebo and low or high dose of mepolizumab; however, there was an appropriate and expected decrease in blood eosinophils, indicating the medication worked as intended. In the overall intention-to-treat analysis in the METREX arm, there was no difference between mepolizumab and placebo, and only in the analysis of METREX stratified to eosinophil count was there a significant difference (with an upper confidence interval rate ratio [0.98] approaching unity).
Additionally there was no significant difference between the 2 groups across a number of clinically important secondary endpoints, including pulmonary function measurements and symptomatic scores. Only the time to exacerbation was significantly longer in the mepolizumab group in METREX.
Taken together, this calls into question the conclusion that a decrease in eosinophil counts due to mepolizumab has resulted in a lower rate of exacerbations, particularly as a higher dose of mepolizumab did not result in a stronger effect. The lack of difference between groups in secondary endpoints is also concerning, as those would be expected to improve with a decrease in exacerbations [4,5]. As the authors point out, their evidence suggests that eosinophils may be an important biomarker in COPD and may aid in the therapeutic decision-making process. However, given the inconsistencies in the data as noted above, it would be difficult to rely on the evidence from this study alone to support their conclusion regarding the clinical utility of mepolizumab in COPD.
The authors discuss a number of limitations that may account for the lack of consistent effect seen in this study. Aside from the standard limitations applicable to any clinical trial, they note the potential confounding effect of previous oral glucocorticoid therapy in reducing eosinophil counts. This may have masked the eosinophilic phenotype in some study patients, leading to the attenuated effect of mepolizumab seen in this study.
The authors also note that information that might be potentially valuable for identifying treatment responders, such as a history of allergies and atopy, were not available. Inclusion of those patients may be helpful in enriching the trial with potential treatment-responders, and future studies may benefit from focusing on COPD patients with a more atopic phenotype who more closely resemble those with the asthma-COPD overlap syndrome.
A final limitation to discuss is the focus on blood eosinophil counts. Due to the difficulty of measuring sputum eosinophils, and the reasonable degree of correlation between blood and sputum in asthmatic patients, blood eosinophils have largely supplanted sputum eosinophils as markers of TH2 CD4 T-cell activity in the pulmonary system [6]. This substitution is also used in the COPD population; however, because of the differences in pathophysiology, it is unclear if eosinophils in asthmatic patients behave similarly to those in COPD patients [7]. Additionally, the cutoff of 150 cells/µL has been obtained primarily from subgroup analysis of previous studies on COPD patients, but it is unclear if this cutoff truly reflects elevated sputum eosinophilia. While there is likely some degree of correlation between blood and sputum eosinophilia in COPD patients, the lack of significant effect seen in this study may be due to an incorrect cutoff for elevated eosinophilia and a reliance on blood eosinophils over sputum counts. Further studies utilizing sputum eosinophils may be of value in addressing this limitation.
Applications for Clinical Practice
In this study, Pavord and colleagues found a potential benefit of mepolizumab treatment for reducing exacerbations in COPD patients with an eosinophilic phenotype. The conflicting results regarding the underlying physiology and the weak treatment effect suggest this medication may not be ready for use in clinical practice without additional supporting evidence. From a practical standpoint, the high cost of medication (~$2500 per month) and marginal benefit of treatment imply that treatment with mepolizumab in COPD patients may not be cost-effective, and even treatment in individual patients on a trial basis should be discouraged until additional supporting data becomes available. Of primary concern are the optimal selection of COPD patients that will achieve benefit with mepolizumab treatment, and the optimal dose of medication to achieve that benefit. The results presented here do not satisfactorily answer these questions, and additional studies are required.
—Arun Jose, MD, The George Washington University, Washington, DC
1. Pelaia C, Vatrella A, Busceti MT, et al. Severe eosinophilic asthma: from the pathogenic role of interleukin-5 to the therapeutic action of mepolizumab. Drug Des Devel Ther 2017;11:3137–44.
2. Kim VL, Coombs NA, Staples KJ, et al. Impact and associations of eosinophilic inflammation in COPD: analysis of the AERIS cohort. Eur Respir J 2017;50:pii:1700853.
3. Roche N, Chapman KR, Vogelmeier CF, et al. Blood eosinophils and response to maintenance chronic obstructive pulmonary disease treatment. Data from the FLAME trial. Am J Respir Crit Care Med 2017;195:1189–97.
4. Halpin DMG, Decramer M, Celli BR, et al. Effect of a single exacerbation on decline in lung function in COPD. Respir Med 2017;128:85–91.
5. Rassouli F, Baty F, Stolz D, et al. Longitudinal change of COPD assessment test (CAT) in a telehealthcare cohort is associated with exacerbation risk. Int J COPD 2017;12:3103–9.
6. Gauthier M, Ray A, Wenzel SE. Evolving concepts of asthma. Am J Respir Crit Care Med 2015;192:660–8.
7. Negewo NA, McDonald VM, Baines KJ, et al. Peripheral blood eosinophils: a surrogate marker for airway eosinophilia in stable COPD. Int J COPD 2016;11:1495–504.
1. Pelaia C, Vatrella A, Busceti MT, et al. Severe eosinophilic asthma: from the pathogenic role of interleukin-5 to the therapeutic action of mepolizumab. Drug Des Devel Ther 2017;11:3137–44.
2. Kim VL, Coombs NA, Staples KJ, et al. Impact and associations of eosinophilic inflammation in COPD: analysis of the AERIS cohort. Eur Respir J 2017;50:pii:1700853.
3. Roche N, Chapman KR, Vogelmeier CF, et al. Blood eosinophils and response to maintenance chronic obstructive pulmonary disease treatment. Data from the FLAME trial. Am J Respir Crit Care Med 2017;195:1189–97.
4. Halpin DMG, Decramer M, Celli BR, et al. Effect of a single exacerbation on decline in lung function in COPD. Respir Med 2017;128:85–91.
5. Rassouli F, Baty F, Stolz D, et al. Longitudinal change of COPD assessment test (CAT in a telehealthcare cohort is associated with exacerbation risk. Int J COPD 2017;12:3103–9.
6. Gauthier M, Ray A, Wenzel SE. Evolving concepts of asthma. Am J Respir Crit Care Med 2015;192:660–8.
7. Negewo NA, McDonald VM, Baines KJ, et al. Peripheral blood eosinophils: a surrogate marker for airway eosinophilia in stable COPD. Int J COPD 2016;11:1495–504.
Teens with PID underscreened for HIV, syphilis
CHICAGO – Adolescents with pelvic inflammatory disease (PID) were unlikely to be screened for HIV or syphilis, and many didn’t receive an appropriate antibiotic regimen, according to a recent study reported at the annual meeting of the American Academy of Pediatrics.
Patients who were sent home rather than admitted were especially likely to miss screening, as were Hispanic patients and those with private insurance.
The Centers for Disease Control and Prevention strongly recommends that all women diagnosed with PID be tested for HIV, and that high-risk individuals also be tested for syphilis, wrote Amanda Jichlinski, MD, and her coauthors at Children’s National Health System, Washington.
The study, presented during a poster session, used data from the national Pediatric Health Information System database from 2010 to 2015. A total of 10,698 records with a diagnostic code for PID were included; patients were females aged 12-21 years seen in a pediatric emergency department.
In addition to the primary outcome of syphilis and HIV testing, the authors also looked at whether antibiotic administration for PID was in line with CDC recommendations – and it wasn’t. “Fewer than half of patients in the ED received antibiotic regimens adherent to CDC guidelines,” wrote Dr. Jichlinski and her coauthors.
Forty-six percent of patients received ceftriaxone and doxycycline, 21% received ceftriaxone and azithromycin, and 6% received ceftriaxone and metronidazole. Ceftriaxone monotherapy was given to 15% of patients. One in 10 patients with a PID diagnosis received no antibiotic at all; 2% of patients received some other regimen.
The researchers used multivariable analysis to examine separately which patient and hospital characteristics were associated with an increased likelihood of testing for both HIV and syphilis. With white, non-Hispanic adolescents used as the referent, Hispanic females with PID were less likely to receive screening for either HIV or syphilis (adjusted odds ratio, 0.8 for both; 95% confidence interval, 0.7-1.0 for both).
In contrast, black non-Hispanic females were screened more often; the aOR for HIV screening was 1.4 (95% CI, 1.2-1.6), and the aOR for syphilis screening was 1.8 (95% CI, 1.6-2.0) for this group of adolescents.
Patients were dichotomized into older (17-21 years of age; n = 4,737, 44%) and younger (12-16 years of age; n = 5,961, 56%) age groups; younger patients were slightly more likely to receive HIV (aOR, 1.2) and syphilis (aOR, 1.1) screening.
Just under a third of patients in the study were seen in a hospital with fewer than 300 beds, and these facilities were more likely to screen for HIV (aOR, 1.4) and syphilis (aOR, 1.1) than the larger hospitals.
By far the largest predictor of whether HIV and syphilis screening was done, though, was a hospital admission. Patients who were admitted (n = 4,043, 38%) were 7 times more likely to be screened for HIV and 4.6 times more likely to be screened for syphilis than those who were sent home from the emergency department.
Although the large, nationally representative study had many strengths, Dr. Jichlinski and her coauthors acknowledged that the data they were provided couldn’t account for medication that was prescribed, rather than administered in the emergency department. Also, the results may not be generalizable to adolescents treated in nonpediatric emergency departments or other facilities, such as urgent care centers.
“Adolescents with PID are underscreened for HIV and syphilis,” wrote Dr. Jichlinski and her coauthors. They called for pediatricians to receive more education about management of PID in adolescents. From a practical perspective, the investigators also suggested incorporating order sets for sexually transmitted infection testing and antibiotic administration into electronic medical records; in this way, a PID diagnosis code would trigger simplified testing and treatment choices.
Dr. Jichlinski reported no conflicts of interest. Monika Goyal, MD, senior author on the study, reported funding support by the National Institute of Child Health and Human Development. Dr. Goyal also holds an appointment at the George Washington University, Washington.
SOURCE: Jichlinski A et al. AAP 2017 Abstract 5, AAP Section on Emergency Medicine.
REPORTING FROM AAP 2017
Key clinical point:
Major finding: Hispanic females were least likely to be screened (adjusted OR, 0.8), compared with non-Hispanic white females.
Study details: Retrospective study of 10,698 adolescent patients with PID from a national database.
Disclosures: The study was funded in part by the National Institute of Child Health and Human Development. The authors had no relevant financial disclosures.
Source: Jichlinski A et al. AAP 2017 Abstract 5, AAP Section on Emergency Medicine
Simple Patient Care Instructions Translate Best: Safety Guidelines for Physician Use of Google Translate
From the University of Arizona College of Medicine – Tucson, Tucson, AZ.
Abstract
- Objective: To determine predictors of quality and safety of machine translation (Google Translate) of patient care instructions (PCIs), and to determine if machine back translation is useful in quality assessment.
- Methods: 100 sample English PCIs were contributed by 88 clinical faculty. Each example PCI was up to 3 sentences of typical patient instruction that might be included in an after visit summary. Google Translate was used to first translate the English to Spanish, then back to English. A panel of 6 English/Spanish translators assessed the Spanish translations for safety and quality. A panel of 6 English-speaking health care workers assessed the back translation. A 5-point scale was used to assess quality. Safety was assessed as safe or unsafe.
- Results: Google Translate was usually (> 90%) capable of safe and comprehensible translation from English to Spanish. Instructions with increased complexity, especially regarding medications, were prone to unsafe translation. Back translation was not reliable in detecting unsafe Spanish.
- Conclusion: Google Translate is a continuously evolving resource for clinicians that offers the promise of improved physician-patient communication. Simple declarative sentences are most reliably translated with high quality and safety.
Keywords: translation; machine translation; electronic health record; after-visit summary; patient safety; physician-patient communication.
A core measure of the meaningful use of electronic health records incentive program is the generation and provision of the after visit summary (AVS), a mechanism for physicians to provide patients with a written summary of the patient encounter [1,2]. Although not a required element for meaningful use, free text patient care instructions (PCIs) provide the physician an opportunity to improve patient engagement either at the time of service or through the patient portal [3] by providing a short written summary of the key points of the office visit based upon the visit’s clinical discussion. For patients who do not speak English, a verbal translation service is required [4], but seldom are specific patient instructions provided in writing in the patient’s preferred language. A mechanism to improve communication might be through translation of the PCI into the patient’s preferred language. Spanish is the most common language, other than English, spoken at home in the United States [5,6]. For this reason, we chose to investigate if it is feasible to use machine translation (Google Translate) to safely and reliably translate a variety of PCIs from English to Spanish, and to assess the types of translation errors and ambiguities that might result in unsafe communication. We further investigate if machine back translation might allow the author of patient care instructions to evaluate the quality of the Spanish machine translation.
There is evidence to suggest that patient communication and satisfaction will improve if portions of the AVS are communicated in Spanish to primarily Spanish-speaking patients. Pavlik et al conducted a randomized controlled trial on the association of patient recall, satisfaction, and adherence to the information communicated in an AVS, in a largely Hispanic (61%) primary care clinic setting [7]. The AVS was provided in English. They noted that Spanish speakers wished to receive information in Spanish, although most had access to translation by a family member. They also noted that a lack of ability to provide an AVS in Spanish was a concern among providers. There was no difference in recall or satisfaction between English and Spanish speakers with respect to medications and allergies, suggesting that not all portions of the AVS might need to be translated.
Machine translation refers to the automated process of translating one language to another. The most recent methods of machine translation, as exemplified by Google Translate (Google Inc., Mountain View, CA), do not use rules of grammar and dictionaries to perform translations but instead use artificial neural networks to learn from “millions of examples” of translation [8]. However, unsupervised machine translation can result in serious errors [9]. As an example of a serious error, Patil gives the translation of the English “Your child is fitting” into Swahili as “Your child is dead.” In British parlance, “fitting” is a term for “having a seizure” and represents an example of a term that is context sensitive. However, others note that there is reason to be optimistic about the state of machine translation for biomedical text [10].
One method of assessing translation quality is through back translation, where one translator takes the author’s work into the desired target language, and then a different translator takes the target language back to the language of the author. Like the children’s game Chinese Whispers (Telephone in the United States) [11], where a “secret message” is whispered from one child to the next and spoken aloud at the end of the line of children, back translation can test to see if a message “gets through.” In this analogy, when information is machine translated from English to Spanish, and then machine translated from Spanish to English (Figure), we can compare the initial message to the final translation to see if the message “gets through.” We further investigate if machine back translation might allow a non-Spanish speaking author of PCIs to evaluate the quality of the Spanish translation.
Our intention was to determine if machine back translation [12] could be used by an English-only author to assess the quality of an intermediate Spanish translation. If poorly worded Spanish translated back into poorly worded English, the author might choose to either refine their original message until an acceptable machine back translation was achieved or to not release the Spanish translation to the patient. We were also concerned that there might be instances where the intermediate Spanish was unacceptable, but when translated back into English by machine translation, relatively acceptable English might result. If this were the case, then back translation would fail to detect a relatively poor intermediate Spanish translation.
Methods
Patient Care Instructions
Original English PCIs
Example original English PCIs were solicited from the clinical faculty and resident staff of the University of Arizona College of Medicine by an email-based survey tool (Qualtrics, Inc, Provo UT). The solicitation stated the following:
We are conducting a study to assess how well Google Translate might perform in translating patient instructions from English to Spanish. Would you please take the time to type three sentences that might comprise a typical “nugget” of patient instruction using language that you would typically include in an After Visit Summary for a patient? An example might be: “Take two Tylenol 325 mg tablets every four hours while awake for the next two days. If you have a sudden increase in pain or fever, or begin vomiting, call our office. Drink plenty of fluids.”
A total of 100 PCIs were collected. They represent the breadth of clinical practice and writing styles of a College of Medicine faculty: not all were completely clear or consisted of well-formed sentences, but all were examples provided by busy clinicians of the typical language that they would include in an AVS PCI.
Machine Translation into Spanish
The 100 original English (OE) PCIs were submitted to the Google Translate web interface (https://translate.google.com/) by cutting and pasting and selecting “Spanish,” resulting in machine Spanish. The translations were performed in January 2016. No specific version number is provided by Google on their web page, and the service is described to be constantly evolving (https://translate.google.com/about/intl/en_ALL/contribute.html).
Machine Back Translation into English (MBTE)
Google Translate was then used to translate the machine Spanish back into English. MBTE represents the content that a monolingual English speaker might use to evaluate the machine Spanish.
Ratings of Translation Quality and Safety
Two panels of 6 raters evaluated machine Spanish and MBTE quality and safety. A bilingual English/Spanish speaking panel simultaneously evaluated the machine Spanish and MBTE compared to OE, with the goal of inferring where in the process an undesirable back translation error occurred. Bilingual raters were experienced bilingual clinicians or certified translators. A monolingual English speaking panel also evaluated the MBTE (compared to OE). They could only infer the quality and safety of the machine Spanish indirectly through inspection of MBTE, and their assessment was free of the potential bias of knowledge of the intermediate Spanish translation.
The raters used Likert scales to rate grammar similarity and content similarity (scale from 1 to 5: 1 = very dissimilar, 5 = identical). For each PCI, grammar and content scores for each rater were summed and then divided by 10 to yield a within-rater quality score ranging from 0 to 1. A panel-level (bilingual or monolingual) quality score was calculated by averaging the quality scores across raters.
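The scoring arithmetic described above can be sketched as follows. This is a minimal illustration only: the function names and the example ratings are hypothetical and are not taken from the study data.

```python
from statistics import mean

def rater_quality(grammar: int, content: int) -> float:
    """Within-rater quality: two 1-5 Likert ratings summed and divided by 10."""
    for value in (grammar, content):
        if not 1 <= value <= 5:
            raise ValueError("Likert ratings must be between 1 and 5")
    return (grammar + content) / 10

def panel_quality(ratings: list[tuple[int, int]]) -> float:
    """Panel-level quality: mean of the within-rater scores for one PCI."""
    return mean(rater_quality(g, c) for g, c in ratings)

# Hypothetical (grammar, content) ratings from a panel of 6 raters for one PCI.
example = [(5, 5), (4, 5), (4, 4), (5, 4), (3, 4), (5, 5)]
score = panel_quality(example)
```

One consequence of this arithmetic is that, because each Likert rating is at least 1, the lowest score a single rater can produce under this formula is 0.2.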
Safety of translation was rated as 0 or Safe (“While the translation may be awkward, it is not dangerous”) or 1 or Unsafe (“A dangerous translation error is present that might cause harm to the patient if instructions were followed”). If any panel member considered an item to be unsafe, the item as a whole was scored as unsafe.
Data Analysis
Descriptive Summary of PCI Contributions
The 100 PCIs were summarized in terms of volume (word count), complexity (Flesch-Kincaid Grade Level index [13]), and content (medication names, references, formatting) (Table 1). Word count and grade level were calculated using Microsoft Word (Microsoft Corp, Redmond WA).
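For reference, the Flesch-Kincaid Grade Level in its standard published form combines average sentence length and average syllables per word. The expression below is that standard formula; how Microsoft Word counts sentences and syllables internally is not described here and may differ in detail.

```latex
\text{Grade level} = 0.39 \left( \frac{\text{total words}}{\text{total sentences}} \right)
  + 11.8 \left( \frac{\text{total syllables}}{\text{total words}} \right) - 15.59
```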
Safety Analysis
Concordance analysis. A safety translation concern as defined in this study (“might cause harm”) is very subjective. To reduce some of the variation in assessment of safety, we identified 4 members of the bilingual panel whose safety assessments of MBTE were most similar to the most concordant 4 monolingual raters’ assessment of MBTE safety. The goal was to select the bilingual panel of 4 that was most “typical” of the behavior of a “typical” monolingual individual with respect to assessing the safety of an individual MBTE translation. We then used this bilingual panel to identify 2 sets of “unsafe” machine Spanish and MBTE PCI translations: PCIs where ANY of the 4 bilingual raters identified a safety concern in machine Spanish or MBTE, and PCIs where MOST (at least 3) of the 4 bilingual raters agree that PCI translation was “unsafe”.
An expansion of Cohen’s kappa was used to identify the most concordant pairing of 4 bilingual panel members and 4 monolingual panel members [14]. All pairwise comparisons of monolingual and bilingual panel members were coded as follows: +1 was scored when 2 raters were concordant (both scored safe or both scored unsafe) and –1 was scored for discordant pairs. For each of the 225 possible pairings of 4 panel members (15 combinations of 4 of 6 bilingual raters, 15 combinations of 4 of 6 monolingual raters), the score for each of the 100 PCI items ranged from +16 (absolute agreement of the 2 panels of 4) to –16 (absolute discordance). For each pairing, we summed the scores for the 100 PCIs to determine the most concordant 4 monolingual and 4 bilingual raters (highest summed scores), which were then used for all subsequent analyses of safety and quality.
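The pairing search described above can be sketched in a few lines. This is a minimal illustration of the +1/–1 scoring as described in the text, not the authors' actual implementation; the function names and the data layout (one list of 0/1 safety ratings per rater) are assumptions.

```python
from itertools import combinations, product

def pair_score(bilingual: list[int], monolingual: list[int]) -> int:
    """Score one PCI: +1 per concordant cross-panel rater pair, -1 per discordant pair."""
    return sum(1 if b == m else -1 for b, m in product(bilingual, monolingual))

def most_concordant(bi: list[list[int]], mono: list[list[int]], k: int = 4):
    """bi[r][i] and mono[r][i] hold rater r's safety rating (0 safe, 1 unsafe) for item i.

    Returns (best_total, bilingual_subset, monolingual_subset) over all k-of-n pairings.
    """
    n_items = len(bi[0])
    best = None
    for bi_idx in combinations(range(len(bi)), k):
        for mono_idx in combinations(range(len(mono)), k):
            total = sum(
                pair_score([bi[r][i] for r in bi_idx], [mono[r][i] for r in mono_idx])
                for i in range(n_items)
            )
            if best is None or total > best[0]:
                best = (total, bi_idx, mono_idx)
    return best

# 15 ways to choose 4 of 6 raters on each panel gives 15 * 15 pairings.
n_pairings = len(list(combinations(range(6), 4))) ** 2
```

With 4 raters on each side there are 16 cross-panel comparisons per item, which is why each item's score falls between –16 and +16.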
Original English characteristics of unsafe translation. A logistic regression was performed with safety as the dependent variable (safe/unsafe defined by bilingual raters) with explanatory variables of word count, grade level, and reference to medication in OE.
Quality Assessment
Bilingual and monolingual raters’ assessments of translation quality. We assessed the correlation between the bilingual quality ratings of machine Spanish vs. MBTE and conducted paired t tests comparing mean bilingual machine Spanish and MBTE ratings. High correlation and absence of a significant difference in means would support the notion that MBTE could be used to reliably assess machine Spanish quality.
We also assessed the correlation between bilingual quality assessments of machine Spanish vs. monolingual raters’ assessments of MBTE, and conducted paired t tests comparing bilingual machine Spanish and monolingual MBTE quality ratings. These analyses assess the ability of an English-only reader of MBTE to predict the quality of machine Spanish, as determined by a bilingual rater. High correlation and absence of a significant difference in means would support the notion that MBTE could be used by an English-only speaker to reliably assess machine Spanish quality.
Associations between original English content and translation quality. Objective measures of original English were correlated via stepwise linear regression with bilingual assessment of machine Spanish quality.
Results
PCI Contributions
Example PCIs were contributed by 88 individuals and are summarized in Table 1. The 100 original English PCIs and the machine Spanish and MBTE translations obtained via Google Translate are available from the authors upon request.
Safety
Concordance Analysis
The 6 monolingual and bilingual raters agreed on the safety of 73 MBTE PCIs. The most concordant pairings of 4 agreed on 81 items. The least and most concordant pairings had concordance values of 0.68 and 0.84, respectively. Subsequent analyses include data from only the 4 most concordant monolingual and bilingual raters.
Bilingual and Monolingual Safety Ratings
Both bilingual and monolingual raters assessed MBTE. On average, bilingual safety ratings of MBTE were higher (0.987) than monolingual ratings (0.925) (t = –3.897, P = 0.0002).
Identification of Unsafe Translations in Machine Spanish and MBTE
The bilingual panel identified 11 translations (either machine Spanish or MBTE) as unsafe: the machine Spanish translation was unsafe for 9 items and the MBTE was unsafe for 5 items, with some items identified as unsafe in both machine Spanish and MBTE. The original English, machine Spanish, and MBTE for these PCIs are listed in Table 2. One item (#93) revealed a machine Spanish drug dosing ambiguity that was not present in the MBTE, with safety concern expressed by 3 of 4 bilingual raters.
Original English Characteristics of Unsafe Translation
A stepwise logistic regression was performed to evaluate whether characteristics of the original English text predicted the PCI being judged as having a safe or unsafe machine Spanish translation. The explanatory variables (listed in Table 1) evaluated were word count, reading grade level, inclusion of reference to a specific medication, inclusion of numbers (as in "take 2 tablets"), and inclusion of numbered statements (as in "1. Call if your cough worsens"). The stepwise selection procedure dropped number references and numbered sentences, although post hoc analysis showed that number references and medication references occurred so commonly together that they were essentially interchangeable. The final regression model included word count, reading grade level, and medication reference. The significant factors of reading grade level and medication reference had odds ratios (95% confidence interval) of 1.12 (1.01 to 1.41) and 4.91 (1.07 to 22.7), respectively (P = 0.042 each). As reading grade level includes word count per sentence and syllable count per word as linear predictors, the inclusion of word count in the model is likely to increase the discrimination of complex words of many syllables in predicting the occurrence of unsafe machine Spanish.
Quality
Bilingual and Monolingual Raters’ Assessments of Quality
The bilingual evaluators found similar mean quality for machine Spanish (mean 0.855, SD 0.0859) and MBTE (0.857, SD 0.0755) (P = 0.811). However, the correlation of R² = 0.355 (P < 0.001) suggests that despite similarity in mean ratings, a good forward translation from original English to machine Spanish did not assure a good back translation from machine Spanish to MBTE. No difference in mean MBTE quality was identified between bilingual (0.857, SD 0.0754) and monolingual (0.852, SD 0.126) raters (P = 0.598), with correlation R² = 0.565 (P < 0.001).
Discussion
In this article, we have collected a corpus of example PCIs across a large number of authors, and investigated how well Google Translate was able to translate the example instructions first to Spanish, and then back again to English. We learned that one cannot always spot a problem in the intermediate Spanish by inspection of the back-translated English. We also learned that simple sentences were least likely to be associated with troublesome translations, and that specific instructions about medication usage should probably be approached with great care.
We learned that some authors readily use simple language (eg, “Have your blood work drawn in the lab in the next two weeks,” reading level 1.2) while others gravitate to very complex language (“If you develop headache, chest pain, abdominal pain or back pain, or if you have any spontaneous bleeding please go to the emergency department, advise them that you were recently treated for rattlesnake envenomation and have them call the poison center,” reading level 20.2).
The development of confidence in machine translation can be compared to the development of self-driving cars. At early stages of development, the self-driving cars had drivers with a foot near the brake and hands near the steering wheel, ready to take over at any instant. Now, after much data has been collected, there is evidence that the machine may operate more predictably and safely than some human drivers [15,16]. Should the self-driving cars always have an operator behind the wheel, supervising the function of the software, and ready to take over at any instant, or is the purpose of the self-driving car to allow non-drivers to be transported in an automobile that they either cannot operate or choose not to operate at that time?
The benefit of using professional interpreters in communicating clinically significant data is unquestioned, especially when compared to ad-hoc interpreters who lack professional understanding of context [4]. Like a good human driver (as compared to a self-driving car that is operated by a program that is still learning), a qualified human translator will outperform machine translation in complex tasks. Similarly, for relatively simple translations that are meant to be generated by human speakers to be understood by individuals with a grammar school education and vocabulary, is the state of machine translation such that less human translation is now required?
Our use of 2 teams of evaluators allowed us to use the game of Telephone analogy to provide insight into how well the machine translation proceeded, first to Spanish, then back to English. Mostly (90 times in 100), an acceptable Spanish translation resulted in an acceptable English back translation. In 2 instances (Samples 7 and 32), the first translation into Spanish was unacceptable, and a subsequent translation back to English was also unacceptable, as might be expected. In 2 instances (Samples 60 and 92), the Spanish translation was acceptable, but the translation back to English was unacceptable. The rules of Telephone worked 94 times in 100.
Still, 6 times in 100, the unexpected occurred, where a relatively poor Spanish translation returned a relatively acceptable English back translation. The rules of Telephone were not followed. The Spanish in the middle was garbled, but became acceptable when translated back to English. A fluent Spanish speaker found the intermediate Spanish to be of concern, and the back translation did not identify the concern. This argues against widespread adoption of machine back translation for quality assessment, at least until the limitations of machine back translation are better understood. Looking at examples where back translation “worked” is useful. In the 6 instances where the intermediate Spanish was judged to be unacceptable, but the English back translation acceptable, complex sentence structures were found, along with medication instructions.
We did not test whether the raters found the original English instructions to be unclear or unsafe as a starting point. Here is where we find the potential benefit of the present study, as it provides insight into the type of content that seems to translate well in this set of data. Overall, ratings of translation quality by bilingual and monolingual raters were high, suggesting that there may be some utility in the machine translation with safeguards other than, or in addition to, inspection of machine back translation of machine Spanish. We found there was an astonishing range in reading difficulty across the contributed samples. While the average estimated grade level for comprehension of the original English contributions was the 8th grade, the maximum was 22, indicating extreme complexity of both words used and sentence length.
In gathering the example PCIs, we did not give any additional instructions to the authors to limit complexity; we asked only for their “typical” language. If the examples received are indeed typical, the instructions we provide are often quite complex. Wu [17] explored the readability of medical information intended for the public and found that on average, 18 years of education would be required to read and understand the clinical trial descriptions available at ClinicalTrials.gov. It seems apparent that the first step to improving the safety of machine translation is to simplify the task of the translator, by making the language that is used for translation as unambiguous and straightforward as possible. The article by Patil and Davies on the use of Google Translate in the clinic [9] generated a considerable number of rapid responses (similar to letters to the editor) [18]. The responses emphasized the need to keep the language used simple, the sentences short, and the communication direct.
A simple and straightforward suggestion to improve all patient care instructions (not just those anticipated to be translated) would be to display the Flesch-Kincaid reading level in real time as the content is generated. The computer resources required to perform reading level analysis are nearly identical to those required for real-time spell checking: a dictionary that breaks words into syllables. Showing authors the reading level in real time would provide a tool to improve all instructions, not just those intended for translation. Limiting the dictionary to specifically exclude potentially dangerous, complex, or confusing words as well as forbidden abbreviations would further identify troublesome language to the author, and would improve communication overall. Implementing such real-time feedback to authors of patient instructions is a logical next step in adding utility to the electronic health record.
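The reading level calculation suggested above can be sketched in a few lines. The grade level formula is the published Flesch-Kincaid formula [13]; the syllable counter, however, is a rough vowel-group heuristic assumed here in place of the syllable dictionary described in the text, and the function names are hypothetical. Results will therefore differ somewhat from those reported by Microsoft Word.

```python
import re


def count_syllables(word: str) -> int:
    """Heuristic syllable count (an assumption, not a dictionary lookup):
    count vowel groups and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Called on each keystroke or on each completed sentence, a function of this kind could drive the real-time display proposed here; a production tool would likely replace the heuristic with a proper syllable dictionary.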
It is important that culture and contextual understanding are taken into consideration when organizations use interpretation services. In the United States, federal law requires that language interpreters employed by health care organizations receiving federal funds be not only bilingual but also bicultural [16]. We did not find examples of dangerous synonyms being misapplied in translation, but we cannot rule out the possibility that such errors can occur. This is beyond the scope of typical machine translation software.
Our data suggest that use of medication names and dosing frequencies should not be repeated in the PCI where confusion can arise from imprecise language translation. Translation ambiguities that generate safety concerns in PCI might be mitigated by moving such content into structured areas of the AVS.
Conclusion
This study suggests that 9 times out of 10, the quality of machine translation using Google Translate is acceptable in terms of quality and safety. Currently, machine back translation may fail to reveal a relatively poor translation from English to Spanish. This study showed that increasing sentence complexity, as measured by the reading level index, was associated with a significant (P < 0.05) increase in unsafe machine translation. Similarly, including medication instructions in machine translations was associated with increased risk (P < 0.05) of machine translation safety error in this study.
A simple way to improve communication now would be to display the reading level to authors of patient communication content in real time, and limit the dictionary of acceptable words to forbid the use of known ambiguous terms or forbidden abbreviations. This would teach authors to use simple language, and increase the chance that translation (either human or machine) would be effective. This preliminary study suggests that keeping medication dosing instructions in a structured format is advisable, as is keeping sentences simple. As with spoken language [4], starting with clear, simple to understand English instructions provides the best machine translations into Spanish.
The Clinical Machine Translation Study Group: Todd W. Altenbernd, Steven Bedrick, Mark D. Berg, Nerida Berrios, Mark A. Brown, Colleen K. Cagno, Charles B. Cairns, Elizabeth Calhoun, Raymond Carmody, Tara F. Carr, Clara Choo, Melissa L. Cox, Janiel Cragun, Rachel E.M. Cramton, Paola Davis, Archita Desai, Sarah M. Desoky, Sean Elliot, Mindi J. Fain, Albert Fiorello, Hillary Franke, Kimberly Gerhart, Victor Jose Gonzalez, Aaron John Goshinska, Lynn M. Gries, Erin M. Harvey, Karen Herbst, Elizabeth Juneman, Lauren Marie Imbornoni, Anita Koshy, Lisa Laughlin, Christina M. Laukaitis, Kwan Lee, Hong Lei, Joseph M. Miller, Prashanthinie Mohan, Wayne J. Morgan, Jarrod Mosier, Leigh A. Neumayer, Valentine Nfonsam, Vivienne Ng, Terence O'Keeffe, Merri Pendergrass, Jessie M. Pettit, John Leander Po, Claudia Marie Prospero Ponce, Sydney Rice, Marie Anoushka Ricker, Arielle E. Rubin, Robert J. Segal, Aurora A.G. Selpides, Whitney A. Smith, Jordana M. Smith, William Stevenson, Amy N. Sussman, Ole J. Thienhaus, Patrick Tsai, J. Daniel Twelker, Richard Wahl, Jillian Wang, Mingwu Wang, Samuel C. Werner, Mark D. Wheeler, Jason Wild, Sun Kun Yi, Karl Andrew Yousef, Le Yu.
Corresponding author: Joseph M. Miller, MD, MPH, Department of Ophthalmology and Vision Science, University of Arizona, 655 North Alvernon Way, Suite 108, Tucson AZ 85711, jmiller@eyes.arizona.edu.
Financial disclosures: None.
1. Hummel J, Evans P. Providing clinical summaries to patients after each office visit: a technical guide. Qualis Health 2012. Accessed 14 Mar 2016 at http://hit.qualishealth.org/sites/default/files/hit.qualishealth.org/Providing-Clinical-Summaries-0712.pdf.
2. Neuberger M, Dontje K, Holzman G, et al. Examination of office visit patient preferences for the after-visit summary (AVS). Persp Health Infor Manage 2014;11:1d.
3. Kruse CS, Bolton K, Freriks G. The effect of patient portals on quality outcomes and its implications to meaningful use: a systematic review. J Med Internet Res 2015;17:e44.
4. Schoonover K. Using a medical interpreter with persons of limited English proficiency. J Clin Outcomes Manage 2016;23:567–75.
5. Shin HB, Bruno R. Language use and English-speaking ability: 2000. Census 2000 Brief. Accessed 9 Nov 2017 at https://census.gov/content/dam/Census/library/publications/2013/acs/acs-22.pdf.
6. Lewis MP, Simons GF, Fennig CD, editors. Ethnologue: languages of the Americas and the Pacific. 19th ed. Dallas: Sil International; 2016.
7. Pavlik V, Brown AE, Nash S, et al. Association of patient recall, satisfaction, and adherence to content of an electronic health record (EHR)-generated after visit summary: a randomized clinical trial. J Am Board Fam Med 2014;27:209–18.
8. Johnson M, Schuster M, Le QV, et al. Google’s multilingual neural machine translation system: enabling zero-shot translation. Accessed 9 Nov 2017 at https://arxiv.org/pdf/1611.04558.pdf.
9. Patil S, Davies P. Use of Google Translate in medical communication: evaluation of accuracy. BMJ 2014;349:g7392.
10. Kaliyadan F, Gopinathan Pillai S. The use of Google language tools as an interpretation aid in cross-cultural doctor-patient interaction: a pilot study. Inform Prim Care 2010;18:141–3.
11. Zhang Y, Zhou S, Zhang Z, et al. Rumor evolution in social networks. Physical Review E 2013;87.
12. Shingenobu T. Evaluation and usability of back translation for intercultural communication. In: N. Aykin, editor. Usability and internationalization. Global and local user interfaces. UI-HCII 2007, Lecture Notes in Computer Science, vol 4560. Springer, Berlin, Heidelberg.
13. Kincaid JP, Fishburne Jr RP, Rogers RL, et al. Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for Navy enlisted personnel. Naval Technical Training Command Millington TN Research Branch. 1975. Accessed 7 May 2016 at http://www.dtic.mil/dtic/tr/fulltext/u2/a006655.pdf.
14. Kwiecien R, Kopp-Schneider A, Blettner M. Concordance analysis—part 16 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2011;108:515–21.
15. Goodall N. Ethical decision making during automated vehicle crashes. Transportation Research Record: Journal of the Transportation Research Board 2014;2424:58–65.
16. Kalra N, Groves D. The enemy of good: estimating the cost of waiting for nearly perfect automated vehicles. Santa Monica, CA: RAND Corporation, 2017.
17. Wu DT, Hanauer DA, Mei Q, et al. Assessing the readability of ClinicalTrials.gov. J Am Med Inform Assoc 2016;23:269–75.
18. Responses to: Use of Google Translate in medical communication: evaluation of accuracy. BMJ 2014;349:g7392. Accessed 10 Dec 2017 at www.bmj.com/content/349/bmj.g7392/rapid-responses.
19. Nápoles AM, Santoyo-Olsson J, Karliner LS, et al. Inaccurate language interpretation and its clinical significance in the medical encounters of Spanish-speaking Latinos. Med Care 2015;53:940–7.
From the University of Arizona College of Medicine – Tucson, Tucson, AZ.
Abstract
- Objective: To determine predictors of quality and safety of machine translation (Google Translate) of patient care instructions (PCIs), and to determine if machine back translation is useful in quality assessment.
- Methods: 100 sample English PCIs were contributed by 88 clinical faculty. Each example PCI was up to 3 sentences of typical patient instruction that might be included in an after visit summary. Google Translate was used to first translate the English to Spanish, then back to English. A panel of 6 English/Spanish translators assessed the Spanish translations for safety and quality. A panel of 6 English-speaking health care workers assessed the back translation. A 5-point scale was used to assess quality. Safety was assessed as safe or unsafe.
- Results: Google Translate was usually (> 90%) capable of safe and comprehensible translation from English to Spanish. Instructions with increased complexity, especially regarding medications, were prone to unsafe translation. Back translation was not reliable in detecting unsafe Spanish.
- Conclusion: Google Translate is a continuously evolving resource for clinicians that offers the promise of improved physician-patient communication. Simple declarative sentences are most reliably translated with high quality and safety.
Keywords: translation; machine translation; electronic health record; after-visit summary; patient safety; physician-patient communication.
A core measure of the meaningful use of electronic health records incentive program is the generation and provision of the after visit summary (AVS), a mechanism for physicians to provide patients with a written summary of the patient encounter [1,2]. Although not a required element for meaningful use, free text patient care instructions (PCIs) provide the physician an opportunity to improve patient engagement either at the time of service or through the patient portal [3] by providing a short written summary of the key points of the office visit based upon the visit’s clinical discussion. For patients who do not speak English, a verbal translation service is required [4], but seldom are specific patient instructions provided in writing in the patient’s preferred language. A mechanism to improve communication might be through translation of the PCI into the patient’s preferred language. Spanish is the most common language, other than English, spoken at home in the United States [5,6]. For this reason, we chose to investigate if it is feasible to use machine translation (Google Translate) to safely and reliably translate a variety of PCIs from English to Spanish, and to assess the types of translation errors and ambiguities that might result in unsafe communication. We further investigate if machine back translation might allow the author of patient care instructions to evaluate the quality of the Spanish machine translation.
There is evidence to suggest that patient communication and satisfaction will improve if portions of the AVS are communicated in Spanish to primarily Spanish-speaking patients. Pavlik et al conducted a randomized controlled trial on the association of patient recall, satisfaction, and adherence to the information communicated in an AVS, in a largely Hispanic (61%) primary care clinic setting [7]. The AVS was provided in English. They noted that Spanish speakers wished to receive information in Spanish, although most had access to translation by a family member. They also noted that a lack of ability to provide an AVS in Spanish was a concern among providers. There was no difference in recall or satisfaction between English and Spanish speakers with respect to medications and allergies, suggesting that not all portions of the AVS might need to be translated.
Machine translation refers to the automated process of translating one language to another. The most recent methods of machine translation, as exemplified by Google Translate (Google Inc., Mountain View, CA), do not use rules of grammar and dictionaries to perform translations but instead use artificial neural networks to learn from “millions of examples” of translation [8]. However, unsupervised machine translation can result in serious errors [9]. Patil gives as an example a serious error of translation from English (“Your child is fitting”) to Swahili (“Your child is dead”). In British parlance, “fitting” is a term for “having a seizure” and represents an example of a term that is context sensitive. However, others note that there is reason to be optimistic about the state of machine translation for biomedical text [10].
One method of assessing translation quality is through back translation, where one translator takes the author’s work into the desired target language, and then a different translator takes the target language back to the language of the author. Like the children’s game Chinese Whispers (Telephone in the United States) [11], where a “secret message” is whispered from one child to the next and spoken aloud at the end of the line of children, back translation can test to see if a message “gets through.” In this analogy, when information is machine translated from English to Spanish, and then machine translated from Spanish to English (Figure), we can compare the initial message to the final translation to see if the message “gets through.” We further investigate if machine back translation might allow a non-Spanish speaking author of PCIs to evaluate the quality of the Spanish translation.
Our intention was to determine if machine back translation [12] could be used by an English-only author to assess the quality of an intermediate Spanish translation. If poorly worded Spanish translated back into poorly worded English, the author might choose to either refine their original message until an acceptable machine back translation was achieved or to not release the Spanish translation to the patient. We were also concerned that there might be instances where the intermediate Spanish was unacceptable, but when translated back into English by machine translation, relatively acceptable English might result. If this were the case, then back translation would fail to detect a relatively poor intermediate Spanish translation.
Methods
Patient Care Instructions
Original English PCIs
Example original English PCIs were solicited from the clinical faculty and resident staff of the University of Arizona College of Medicine by an email-based survey tool (Qualtrics, Inc, Provo UT). The solicitation stated the following:
We are conducting a study to assess how well Google Translate might perform in translating patient instructions from English to Spanish. Would you please take the time to type three sentences that might comprise a typical “nugget” of patient instruction using language that you would typically include in an After Visit Summary for a patient? An example might be: “Take two Tylenol 325 mg tablets every four hours while awake for the next two days. If you have a sudden increase in pain or fever, or begin vomiting, call our office. Drink plenty of fluids.”
A total of 100 PCIs were collected. The breadth of the clinical practice and writing styles of a College of Medicine faculty are represented: not all were completely clear or were well-formed sentences, but did represent examples provided by busy clinicians of typical language that they would provide in an AVS PCI.
Machine Translation into Spanish
The 100 original English (OE) PCIs were submitted to the Google Translate web interface (https://translate.google.com/) by cutting and pasting and selecting “Spanish,” resulting in machine Spanish. The translations were performed in January 2016. No specific version number is provided by Google on their web page, and the service is described to be constantly evolving (https://translate.google.com/about/intl/en_ALL/contribute.html).
Machine Back Translation into English (MBTE)
Google Translate was then used to translate the machine Spanish back into English. MBTE represents the content that a monolingual English speaker might use to evaluate the machine Spanish.
Ratings of Translation Quality and Safety
Two panels of 6 raters evaluated machine Spanish and MBTE quality and safety. A bilingual English/Spanish speaking panel simultaneously evaluated the machine Spanish and MBTE compared to OE, with the goal of inferring where in the process an undesirable back translation error occurred. Bilingual raters were experienced bilingual clinicians or certified translators. A monolingual English speaking panel also evaluated the MBTE (compared to OE). They could only infer the quality and safety of the machine Spanish indirectly through inspection of MBTE, and their assessment was free of the potential bias of knowledge of the intermediate Spanish translation.
The raters used Likert scales to rate grammar similarity and content similarity (scale from 1 to 5: 1 = very dissimilar, 5 = identical). For each PCI, grammar and content scores for each rater were summed and then divided by 10 to yield a within-rater quality score ranging from 0 to 1. A panel-level (bilingual or monolingual) quality score was calculated by averaging the quality scores across raters.
Safety of translation was rated as 0 or Safe (“While the translation may be awkward, it is not dangerous”) or 1 or Unsafe (“A dangerous translation error is present that might cause harm to the patient if instructions were followed”). If any panel member considered an item to be unsafe, the item as a whole was scored as unsafe.
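The quality and safety scoring rules described above can be written out as a short sketch. The function names and the input layout (one grammar/content pair per rater, one 0/1 safety flag per rater) are assumptions made for illustration. Note that with two items each scored from 1 to 5, the lowest attainable within-rater quality score under this rule is 0.2.

```python
from statistics import mean


def rater_quality(grammar: int, content: int) -> float:
    """Within-rater quality: the two 1-to-5 Likert scores summed, divided by 10."""
    for score in (grammar, content):
        if not 1 <= score <= 5:
            raise ValueError("Likert scores must be between 1 and 5")
    return (grammar + content) / 10


def panel_quality(ratings) -> float:
    """Panel-level quality: average of the within-rater scores.
    `ratings` is an iterable of (grammar, content) pairs, one per rater."""
    return mean(rater_quality(g, c) for g, c in ratings)


def item_is_unsafe(safety_flags) -> bool:
    """An item is unsafe if any rater flagged it (0 = safe, 1 = unsafe)."""
    return any(flag == 1 for flag in safety_flags)
```

For example, panel_quality([(5, 5), (4, 5), (4, 4), (5, 4)]) averages the four within-rater scores 1.0, 0.9, 0.8, and 0.9.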
Data Analysis
Descriptive Summary of PCI Contributions
The 100 PCIs were summarized in terms of volume (word count), complexity (Flesch-Kincaid Grade Level index [13]), and content (medication names, references, formatting) (Table 1). Word count and grade level were calculated using Microsoft Word (Microsoft Corp, Redmond WA).
Safety Analysis
Concordance analysis. A safety translation concern as defined in this study (“might cause harm”) is very subjective. To reduce some of the variation in assessment of safety, we identified 4 members of the bilingual panel whose safety assessments of MBTE were most similar to the most concordant 4 monolingual raters’ assessment of MBTE safety. The goal was to select the bilingual panel of 4 that was most “typical” of the behavior of a “typical” monolingual individual with respect to assessing the safety of an individual MBTE translation. We then used this bilingual panel to identify 2 sets of “unsafe” machine Spanish and MBTE PCI translations: PCIs where ANY of the 4 bilingual raters identified a safety concern in machine Spanish or MBTE, and PCIs where MOST (at least 3) of the 4 bilingual raters agreed that the PCI translation was “unsafe”.
An expansion of Cohen’s kappa was used to identify the most concordant pairing of 4 bilingual panel members and 4 monolingual panel members [14]. All pairwise comparisons of monolingual and bilingual panel members were coded as follows: +1 was scored when 2 raters were concordant (both scored safe or unsafe) and –1 was scored for discordant pairs. For the 225 possible pairings of 4 panel members (15 combinations of 4 of 6 bilingual, 15 combinations of 4 of 6 monolingual raters), the 100 PCI items scores ranged from +16 (absolute agreement of the 2 panels of 4) to –16 (absolute discordance). For each pairing, we summed the scores for the 100 PCIs to determine the most concordant 4 monolingual and 4 bilingual raters (highest summed scores), which were then used for all subsequent analyses of safety and quality.
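The pairing search described above can be sketched as an exhaustive enumeration. This is an illustration of the +1/–1 scoring as stated, not the authors' code, and it does not compute Cohen's kappa itself; the data layout (a list of per-rater lists of 0/1 safety ratings) and the function names are assumptions.

```python
from itertools import combinations


def pairing_score(bilingual, monolingual, bi_panel, mono_panel):
    """Sum, over all items, of +1 for each concordant bilingual/monolingual
    rater pair and -1 for each discordant pair.
    bilingual[r][i] is rater r's 0/1 safety rating of item i."""
    n_items = len(bilingual[0])
    total = 0
    for i in range(n_items):
        for b in bi_panel:
            for m in mono_panel:
                total += 1 if bilingual[b][i] == monolingual[m][i] else -1
    return total


def most_concordant_pairing(bilingual, monolingual, size=4):
    """Try every panel of `size` raters from each group and return
    (score, bilingual_panel, monolingual_panel) for the highest score."""
    best = None
    for bi_panel in combinations(range(len(bilingual)), size):
        for mono_panel in combinations(range(len(monolingual)), size):
            score = pairing_score(bilingual, monolingual, bi_panel, mono_panel)
            if best is None or score > best[0]:
                best = (score, bi_panel, mono_panel)
    return best
```

With 6 raters per group and panels of 4, the two nested loops visit 15 × 15 = 225 pairings, and each item contributes between –16 and +16 to a pairing's score, matching the ranges given in the text. Ties are resolved here by keeping the first pairing found, which is an arbitrary choice.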
Original English characteristics of unsafe translation.
A logistic regression was performed with safety as the dependent variable (safe/unsafe defined by bilingual raters) with explanatory variables of word count, grade level, and reference to medication in OE.
Quality Assessment
Bilingual and monolingual raters assessments of translation quality. We assessed the correlation between the bilingual quality ratings of machine Spanish vs. MBTE and conducted paired t tests comparing mean bilingual machine Spanish and MBTE ratings. High correlation and absence of a significant difference in means would support the notion that MBTE could be used to reliably assess machine Spanish quality.
We also assessed the correlation between bilingual quality assessments of MS vs. monolingual raters’ assessments of MBTE, and conducted paired comparison t tests comparing bilingual machine Spanish and monolingual MBTE quality ratings. These analyses assess the ability of an English-only reader of MBTE to predict the quality of machine Spanish, as determined by a bilingual rater. High correlation and absence of a significant difference in means would support the notion that MBTE could be used by an English-only speaker to reliably assess machine Spanish quality.
Associations between original English content and translation quality. Objective measures of original English were correlated via stepwise linear regression with bilingual assessment of machine Spanish quality.
Results
PCI Contributions
Example PCIs were contributed by 88 individuals and are summarized in Table 1. The 100 original English PCIs and the machine Spanish and MBTE translations obtained via Google Translate are available from the authors upon request.
Safety
Concordance Analysis
The 6 monolingual and bilingual raters agreed on the safety of 73 MBTE PCIs. The most concordant pairings of 4 agreed on 81 items. The least and most concordant pairings had concordance values of 0.68 and 0.84, respectively. Subsequent analyses include data from only the 4 most concordant monolingual and bilingual raters.
Bilingual and Monolingual Safety Ratings
Both bilingual and monolingual raters assessed MBTE. On average, bilingual ratings of MBTE of safety were higher (0.987) than monolingual ratings (0.925) (t = –3.897, P = 0.0002).
Identification of Unsafe Translations in Machine Spanish and MBTE
The bilingual panel identified 11 translations (either machine Spanish or MBTE) as unsafe: MS translation was unsafe for 9 items, MBTE unsafe for 5 items, with some items identified as unsafe in terms of both machine Spanish and MBTE. The original English, machine Spanish, and MBTE for these PCIs are listed in Table 2. One item (#93) revealed a machine Spanish drug dosing ambiguity that was not present in the MBTE, with safety concern expressed by 3 of 4 bilingual raters.
Original English c haracteristics of Unsafe Translation
A stepwise logistic regression was performed to evaluate whether characteristics of the original English text predicted the PCI being judged as having a safe or unsafe machine Spanish translation. The explanatory variables (listed in Table 1) evaluated were word count, reading grade level, inclusion of reference to a specific medication, inclusion of numbers (as in "take 2 tablets"), and inclusion of numbered statements (as in "1. Call if your cough worsens"). The stepwise selection procedure dropped number references and numbered sentences, although post hoc analysis showed that number references and medication references occurred so commonly together that they were essentially interchangeable. The final regression model included word count, reading grade level, and medication reference. The significant factors of reading grade level and medication reference had odds ratio (95% confidence interval) of 1.12 (1.01 to 1.41) and 4.91 (1.07 to 22.7) respectively (P = 0.042 each). As reading grade level includes word count per sentence and syllable count per word as linear predictors, the inclusion of word count in the model is likely to increase the discrimination of complex words of many syllables in predicting the occurrence of unsafe machine Spanish.
Quality
Bilingual and Monolingual Raters’ Assessments of Quality
The bilingual evaluators found similar mean quality for machine Spanish (mean 0.855, SD 0.0859) and MBTE (0.857, SD 0.0755) (P = 0.811). However, the correlation of R² = 0.355 (P < 0.001) suggests that despite similarity in mean ratings, a good forward translation from original English to machine Spanish did not assure a good back translation from machine Spanish to MBTE. No difference in mean MBTE quality was identified between bilingual (0.857, SD 0.0754) and monolingual (0.852, SD 0.126) raters (P = 0.598), with correlation R² = 0.565 (P < 0.001).
Discussion
In this article, we collected a corpus of example PCIs from a large number of authors and investigated how well Google Translate was able to translate the example instructions first to Spanish and then back again to English. We learned that one cannot always spot a problem in the intermediate Spanish by inspection of the back-translated English. We also learned that simple sentences were least likely to be associated with troublesome translations, and that specific instructions about medication usage should probably be approached with great care.
We learned that some authors readily use simple language (eg: “Have your blood work drawn in the lab in the next two weeks,” reading level 1.2) while others gravitate to very complex language (“If you develop headache, chest pain, abdominal pain or back pain, or if you have any spontaneous bleeding please go to the emergency department, advise them that you were recently treated for rattlesnake envenomation and have them call the poison center,” reading level 20.2).
The development of confidence in machine translation can be compared to the development of self-driving cars. At early stages of development, the self-driving cars had drivers with a foot near the brake and hands near the steering wheel, ready to take over at any instant. Now, after much data has been collected, there is evidence that the machine may operate more predictably and safely than some human drivers [15,16]. Should the self-driving cars always have an operator behind the wheel, supervising the function of the software, and ready to take over at any instant, or is the purpose of the self-driving car to allow non-drivers to be transported in an automobile that they either cannot operate or choose not to operate at that time?
The benefit of using professional interpreters in communicating clinically significant data is unquestioned, especially when compared to ad-hoc interpreters who lack professional understanding of context [4]. Like a good human driver (as compared to a self-driving car that is operated by a program that is still learning), a qualified human translator will outperform machine translation in complex tasks. Similarly, for relatively simple translations that are meant to be generated by human speakers to be understood by individuals with a grammar school education and vocabulary, is the state of machine translation such that less human translation is now required?
Our use of 2 teams of evaluators allowed us to use the analogy of the game of Telephone to provide insight into how well the machine translation proceeded, first to Spanish, then back to English. Mostly (90 times in 100), an acceptable Spanish translation resulted in an acceptable English back translation. In 2 instances (Samples 7 and 32), the first translation into Spanish was unacceptable, and a subsequent translation back to English was also unacceptable, as might be expected. In 2 instances (Samples 60 and 92), the Spanish translation was acceptable, but the translation back to English was unacceptable. The rules of Telephone worked 94 times in 100.
Still, 6 times in 100, the unexpected occurred, where a relatively poor Spanish translation returned a relatively acceptable English back translation. The rules of Telephone were not followed. The Spanish in the middle was garbled, but became acceptable when translated back to English. A fluent Spanish speaker found the intermediate Spanish to be of concern, and the back translation did not identify the concern. This argues against widespread adoption of machine back translation for quality assessment, at least until the limitations of machine back translation are better understood. Looking at examples where back translation “worked” is useful. In the 6 instances where the intermediate Spanish was judged to be unacceptable, but the English back translation acceptable, complex sentence structures were found, along with medication instructions.
We did not test whether the raters found the original English instructions to be unclear or unsafe as a starting point. Here is where we find the potential benefit of the present study, as it provides insight into the type of content that seems to translate well in this set of data. where the machine Spanish error was not present in MBTE. Overall, ratings of translation quality by bilingual and monolingual raters were high, suggesting that there may be some utility in the machine translation with safeguards other than, or in addition to, inspection of machine back translation of machine Spanish. We found there was an astonishing range in reading difficulty across the contributed samples. While the average estimated grade level for comprehension of the original English contributions was the 8th grade, the maximum was 22, indicating extreme complexity of both words used and sentence length.
In gathering the example PCIs, we did not give any additional instructions to the authors to limit complexity; we only asked for their “typical” language. If the examples received are indeed typical, the instructions we provide are often quite complex. Wu [17] explored the readability of medical information intended for the public and found that on average, 18 years of education would be required to read and understand the clinical trial descriptions available at ClinicalTrials.gov. It seems apparent that the first step to improving the safety of machine translation is to simplify the task of the translator, by making the language that is used for translation as unambiguous and straightforward as possible. The article by Patil and Davies on the use of Google Translate in the clinic [9] generated a considerable number of rapid responses (similar to letters to the editor) [18]. The responses emphasized the need to keep the language used simple, the sentences short, and the communication direct.
A simple and straightforward suggestion to improve all patient care instructions (not just those anticipated to be translated) would be to display the Flesch-Kincaid reading level in real time as the content is generated. The computer resources required to perform reading level analysis are nearly identical to those required for real-time spell checking: a dictionary that breaks words into syllables. Showing authors the reading level in real time would provide a tool to improve all instructions, not just those intended for translation. Limiting the dictionary to specifically exclude potentially dangerous, complex, or confusing words as well as forbidden abbreviations would further identify troublesome language to the author, and would improve communication overall. Implementing such real-time feedback to authors of patient instructions is a logical next step in adding utility to the electronic health record.
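As a rough illustration of how little machinery such real-time feedback needs, the following Python sketch computes the standard Flesch-Kincaid Grade Level, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. It is only a sketch under stated assumptions: the function names are hypothetical, the syllable counter is a crude vowel-group heuristic rather than the dictionary lookup described above, and the results will not necessarily match the values reported by Microsoft Word.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (heuristic, not a dictionary)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent unless the word ends in 'le' or 'ee'.
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level of `text`; returns 0.0 for empty input."""
    # Naive sentence split: a decimal point such as "2.5 mg" would be
    # miscounted as a sentence boundary.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    simple = "Drink plenty of fluids."
    complex_text = (
        "If you develop headache, chest pain, abdominal pain or back pain, "
        "please go to the emergency department."
    )
    print(round(flesch_kincaid_grade(simple), 2))
    print(round(flesch_kincaid_grade(complex_text), 2))
```

An editor integration could call a function like this after each keystroke or sentence and show the running grade level beside the text box, in the same way a spell checker underlines words as they are typed.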
It is important that culture and contextual understanding are taken into consideration when organizations use interpretation services. In the United States, federal law requires that language interpreters employed by health care organizations receiving federal funds are not only bilingual but also bicultural [16]. We did not find examples of dangerous synonyms being misapplied in translation, but we cannot rule out the possibility that such errors can occur. This is beyond the scope of typical machine translation software.
Our data suggest that medication names and dosing frequencies should not be repeated in the PCI, where confusion can arise from imprecise language translation. Translation ambiguities that generate safety concerns in PCI might be mitigated by moving such content into structured areas of the AVS.
Conclusion
This study suggests that 9 times out of 10, machine translation using Google Translate is acceptable in terms of quality and safety. Currently, machine back translation may fail to reveal a relatively poor translation from English to Spanish. This study showed that increasing sentence complexity, as measured by the reading level index, was associated with a significant (P < 0.05) increase in unsafe machine translation. Similarly, including medication instructions in machine translations was associated with increased risk (P < 0.05) of machine translation safety error in this study.
A simple way to improve communication now would be to display the reading level to authors of patient communication content in real time, and limit the dictionary of acceptable words to forbid the use of known ambiguous terms or forbidden abbreviations. This would teach authors to use simple language, and increase the chance that translation (either human or machine) would be effective. This preliminary study suggests that keeping medication dosing instructions in a structured format is advisable, as is keeping sentences simple. As with spoken language [4], starting with clear, simple to understand English instructions provides the best machine translations into Spanish.
The Clinical Machine Translation Study Group: Todd W. Altenbernd, Steven Bedrick, Mark D. Berg, Nerida Berrios, Mark A. Brown, Colleen K. Cagno, Charles B. Cairns, Elizabeth Calhoun, Raymond Carmody, Tara F. Carr, Clara Choo, Melissa L. Cox, Janiel Cragun, Rachel E.M. Cramton, Paola Davis, Archita Desai, Sarah M. Desoky, Sean Elliot, Mindi J. Fain, Albert Fiorello, Hillary Franke, Kimberly Gerhart, Victor Jose Gonzalez, Aaron John Goshinska, Lynn M. Gries, Erin M. Harvey, Karen Herbst, Elizabeth Juneman, Lauren Marie Imbornoni, Anita Koshy, Lisa Laughlin, Christina M. Laukaitis, Kwan Lee, Hong Lei, Joseph M. Miller, Prashanthinie Mohan, Wayne J. Morgan, Jarrod Mosier, Leigh A. Neumayer, Valentine Nfonsam, Vivienne Ng, Terence O'Keeffe, Merri Pendergrass, Jessie M. Pettit, John Leander Po, Claudia Marie Prospero Ponce, Sydney Rice, Marie Anoushka Ricker, Arielle E. Rubin, Robert J. Segal, Aurora A.G. Selpides, Whitney A. Smith, Jordana M. Smith, William Stevenson, Amy N. Sussman, Ole J. Thienhaus, Patrick Tsai, J. Daniel Twelker, Richard Wahl, Jillian Wang, Mingwu Wang, Samuel C. Werner, Mark D. Wheeler, Jason Wild, Sun Kun Yi, Karl Andrew Yousef, Le Yu.
Corresponding author: Joseph M. Miller, MD, MPH, Department of Ophthalmology and Vision Science, University of Arizona, 655 North Alvernon Way, Suite 108, Tucson AZ 85711, jmiller@eyes.arizona.edu.
Financial disclosures: None.
From the University of Arizona College of Medicine – Tucson, Tucson, AZ.
Abstract
- Objective: To determine predictors of quality and safety of machine translation (Google Translate) of patient care instructions (PCIs), and to determine if machine back translation is useful in quality assessment.
- Methods: 100 sample English PCIs were contributed by 88 clinical faculty. Each example PCI was up to 3 sentences of typical patient instruction that might be included in an after visit summary. Google Translate was used to first translate the English to Spanish, then back to English. A panel of 6 English/Spanish translators assessed the Spanish translations for safety and quality. A panel of 6 English-speaking health care workers assessed the back translation. A 5-point scale was used to assess quality. Safety was assessed as safe or unsafe.
- Results: Google Translate was usually (> 90%) capable of safe and comprehensible translation from English to Spanish. Instructions with increased complexity, especially regarding medications, were prone to unsafe translation. Back translation was not reliable in detecting unsafe Spanish.
- Conclusion: Google Translate is a continuously evolving resource for clinicians that offers the promise of improved physician-patient communication. Simple declarative sentences are most reliably translated with high quality and safety.
Keywords: translation; machine translation; electronic health record; after-visit summary; patient safety; physician-patient communication.
A core measure of the meaningful use of electronic health records incentive program is the generation and provision of the after visit summary (AVS), a mechanism for physicians to provide patients with a written summary of the patient encounter [1,2]. Although not a required element for meaningful use, free text patient care instructions (PCIs) provide the physician an opportunity to improve patient engagement either at the time of service or through the patient portal [3] by providing a short written summary of the key points of the office visit based upon the visit’s clinical discussion. For patients who do not speak English, a verbal translation service is required [4], but seldom are specific patient instructions provided in writing in the patient’s preferred language. A mechanism to improve communication might be through translation of the PCI into the patient’s preferred language. Spanish is the most common language, other than English, spoken at home in the United States [5,6]. For this reason, we chose to investigate if it is feasible to use machine translation (Google Translate) to safely and reliably translate a variety of PCIs from English to Spanish, and to assess the types of translation errors and ambiguities that might result in unsafe communication. We further investigate if machine back translation might allow the author of patient care instructions to evaluate the quality of the Spanish machine translation.
There is evidence to suggest that patient communication and satisfaction will improve if portions of the AVS are communicated in Spanish to primarily Spanish-speaking patients. Pavlik et al conducted a randomized controlled trial on the association of patient recall, satisfaction, and adherence to the information communicated in an AVS, in a largely Hispanic (61%) primary care clinic setting [7]. The AVS was provided in English. They noted that Spanish speakers wished to receive information in Spanish, although most had access to translation by a family member. They also noted that a lack of ability to provide an AVS in Spanish was a concern among providers. There was no difference in recall or satisfaction between English and Spanish speakers with respect to medications and allergies, suggesting that not all portions of the AVS might need to be translated.
Machine translation refers to the automated process of translating one language to another. The most recent methods of machine translation, as exemplified by Google Translate (Google Inc., Mountain View, CA), do not use rules of grammar and dictionaries to perform translations but instead use artificial neural networks to learn from “millions of examples” of translation [8]. However, unsupervised machine translation can result in serious errors [9]. Patil gives, as an example of a serious error, the translation from English (“Your child is fitting”) to Swahili (“Your child is dead”). In British parlance, “fitting” is a term for “having a seizure” and represents an example of a term that is context sensitive. However, others note that there is reason to be optimistic about the state of machine translation for biomedical text [10].
One method of assessing translation quality is through back translation, where one translator takes the author’s work into the desired target language, and then a different translator takes the target language back to the language of the author. Like the children’s game Chinese Whispers (Telephone in the United States) [11], where a “secret message” is whispered from one child to the next and spoken aloud at the end of the line of children, back translation can test to see if a message “gets through.” In this analogy, when information is machine translated from English to Spanish, and then machine translated from Spanish to English (Figure), we can compare the initial message to the final translation to see if the message “gets through.” We further investigate if machine back translation might allow a non-Spanish speaking author of PCIs to evaluate the quality of the Spanish translation.
Our intention was to determine if machine back translation [12] could be used by an English-only author to assess the quality of an intermediate Spanish translation. If poorly worded Spanish translated back into poorly worded English, the author might choose to either refine their original message until an acceptable machine back translation was achieved or to not release the Spanish translation to the patient. We were also concerned that there might be instances where the intermediate Spanish was unacceptable, but when translated back into English by machine translation, relatively acceptable English might result. If this were the case, then back translation would fail to detect a relatively poor intermediate Spanish translation.
Methods
Patient Care Instructions
Original English PCIs
Example original English PCIs were solicited from the clinical faculty and resident staff of the University of Arizona College of Medicine by an email-based survey tool (Qualtrics, Inc, Provo UT). The solicitation stated the following:
We are conducting a study to assess how well Google Translate might perform in translating patient instructions from English to Spanish. Would you please take the time to type three sentences that might comprise a typical “nugget” of patient instruction using language that you would typically include in an After Visit Summary for a patient? An example might be: “Take two Tylenol 325 mg tablets every four hours while awake for the next two days. If you have a sudden increase in pain or fever, or begin vomiting, call our office. Drink plenty of fluids.”
A total of 100 PCIs were collected. The breadth of the clinical practice and writing styles of a College of Medicine faculty are represented: not all were completely clear or were well-formed sentences, but did represent examples provided by busy clinicians of typical language that they would provide in an AVS PCI.
Machine Translation into Spanish
The 100 original English (OE) PCIs were submitted to the Google Translate web interface (https://translate.google.com/) by cutting and pasting and selecting “Spanish,” resulting in machine Spanish. The translations were performed in January 2016. No specific version number is provided by Google on their web page, and the service is described to be constantly evolving (https://translate.google.com/about/intl/en_ALL/contribute.html).
Machine Back Translation into English (MBTE)
Google Translate was then used to translate the machine Spanish back into English. MBTE represents the content that a monolingual English speaker might use to evaluate the machine Spanish.
Ratings of Translation Quality and Safety
Two panels of 6 raters evaluated machine Spanish and MBTE quality and safety. A bilingual English/Spanish speaking panel simultaneously evaluated the machine Spanish and MBTE compared to OE, with the goal of inferring where in the process an undesirable back translation error occurred. Bilingual raters were experienced bilingual clinicians or certified translators. A monolingual English speaking panel also evaluated the MBTE (compared to OE). They could only infer the quality and safety of the machine Spanish indirectly through inspection of MBTE, and their assessment was free of the potential bias of knowledge of the intermediate Spanish translation.
The raters used Likert scales to rate grammar similarity and content similarity (scale from 1 to 5: 1 = very dissimilar, 5 = identical). For each PCI, grammar and content scores for each rater were summed and then divided by 10 to yield a within-rater quality score ranging from 0 to 1. A panel-level (bilingual or monolingual) quality score was calculated by averaging the quality scores across raters.
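The scoring rule just described can be sketched in a few lines of Python. The function names and the example ratings below are hypothetical and are not taken from the study data. Note that, because both Likert scales start at 1, the lowest value this formula can actually produce is 0.2.

```python
from statistics import mean


def rater_quality(grammar: int, content: int) -> float:
    """Within-rater quality: (grammar + content) / 10, each rated 1 to 5."""
    for score in (grammar, content):
        if not 1 <= score <= 5:
            raise ValueError("Likert scores must be between 1 and 5")
    return (grammar + content) / 10


def panel_quality(ratings) -> float:
    """Panel-level quality: mean of within-rater scores.

    `ratings` is an iterable of (grammar, content) pairs, one per rater.
    """
    return mean(rater_quality(g, c) for g, c in ratings)


if __name__ == "__main__":
    # Hypothetical ratings from a 4-member panel for one PCI.
    example = [(5, 5), (4, 5), (4, 4), (5, 4)]
    print(panel_quality(example))
```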
Safety of translation was rated as 0 or Safe (“While the translation may be awkward, it is not dangerous”) or 1 or Unsafe (“A dangerous translation error is present that might cause harm to the patient if instructions were followed”). If any panel member considered an item to be unsafe, the item as a whole was scored as unsafe.
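A minimal Python sketch of this item-level rule follows. The function name and the `min_votes` parameter are hypothetical and added for illustration; the default of 1 corresponds to the "any panel member" rule stated here, and a value of 3 would correspond to the stricter "MOST (at least 3 of 4)" criterion used in the concordance analysis.

```python
def item_is_unsafe(rater_flags, min_votes: int = 1) -> bool:
    """Return True when at least `min_votes` raters flagged the item.

    `rater_flags` holds one value per rater: 0 = Safe, 1 = Unsafe.
    `min_votes` is an illustrative parameter, not part of the study protocol.
    """
    if not all(flag in (0, 1) for flag in rater_flags):
        raise ValueError("flags must be 0 (Safe) or 1 (Unsafe)")
    return sum(rater_flags) >= min_votes


if __name__ == "__main__":
    print(item_is_unsafe([0, 0, 1, 0]))               # one rater flagged the item
    print(item_is_unsafe([1, 0, 1, 0], min_votes=3))  # only 2 of 4 flagged it
```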
Data Analysis
Descriptive Summary of PCI Contributions
The 100 PCIs were summarized in terms of volume (word count), complexity (Flesch-Kincaid Grade Level index [13]), and content (medication names, references, formatting) (Table 1). Word count and grade level were calculated using Microsoft Word (Microsoft Corp, Redmond WA).
Safety Analysis
Concordance analysis. A safety translation concern as defined in this study (“might cause harm”) is very subjective. To reduce some of the variation in assessment of safety, we identified 4 members of the bilingual panel whose safety assessments of MBTE were most similar to the most concordant 4 monolingual raters’ assessment of MBTE safety. The goal was to select the bilingual panel of 4 that was most “typical” of the behavior of a “typical” monolingual individual with respect to assessing the safety of an individual MBTE translation. We then used this bilingual panel to identify 2 sets of “unsafe” machine Spanish and MBTE PCI translations: PCIs where ANY of the 4 bilingual raters identified a safety concern in machine Spanish or MBTE, and PCIs where MOST (at least 3) of the 4 bilingual raters agree that PCI translation was “unsafe”.
An expansion of Cohen’s kappa was used to identify the most concordant pairing of 4 bilingual panel members and 4 monolingual panel members [14]. All pairwise comparisons of monolingual and bilingual panel members were coded as follows: +1 was scored when 2 raters were concordant (both scored safe or both scored unsafe) and –1 was scored for discordant pairs. For the 225 possible pairings of 4 panel members (15 combinations of 4 of 6 bilingual raters, 15 combinations of 4 of 6 monolingual raters), the score for each of the 100 PCI items ranged from +16 (absolute agreement of the 2 panels of 4) to –16 (absolute discordance). For each pairing, we summed the scores for the 100 PCIs to determine the most concordant 4 monolingual and 4 bilingual raters (highest summed scores), which were then used for all subsequent analyses of safety and quality.
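The exhaustive search over the 225 pairings can be sketched as follows. This is an illustrative Python reading of the +1/–1 scoring described above, not the authors' code, and it does not compute a kappa statistic; the function names and the data layout (one list of 0/1 safety flags per rater) are assumptions.

```python
from itertools import combinations, product


def item_score(bilingual_flags, monolingual_flags) -> int:
    """+1 for each concordant cross-panel rater pair, -1 for each discordant pair."""
    return sum(
        1 if b == m else -1
        for b, m in product(bilingual_flags, monolingual_flags)
    )


def most_concordant_pairing(bilingual, monolingual, k: int = 4):
    """Search every choice of k raters per panel; return (score, b_idx, m_idx).

    `bilingual` and `monolingual` are lists with one entry per rater, each a
    list of 0/1 safety flags across the same items. Ties keep the first
    pairing found.
    """
    n_items = len(bilingual[0])
    best = None
    for b_idx in combinations(range(len(bilingual)), k):
        for m_idx in combinations(range(len(monolingual)), k):
            total = sum(
                item_score(
                    [bilingual[i][j] for i in b_idx],
                    [monolingual[i][j] for i in m_idx],
                )
                for j in range(n_items)
            )
            if best is None or total > best[0]:
                best = (total, b_idx, m_idx)
    return best


if __name__ == "__main__":
    # Toy data: 6 raters per panel, 3 items; raters 4 and 5 flag everything.
    agree = [[0, 0, 0]] * 4 + [[1, 1, 1]] * 2
    print(most_concordant_pairing(agree, agree))
```

With 6 raters per panel and k = 4, the two nested loops visit 15 × 15 = 225 pairings, and each item contributes between –16 and +16 to a pairing's total.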
Original English characteristics of unsafe translation.
A logistic regression was performed with safety as the dependent variable (safe/unsafe defined by bilingual raters) with explanatory variables of word count, grade level, and reference to medication in OE.
Quality Assessment
Bilingual and monolingual raters’ assessments of translation quality. We assessed the correlation between the bilingual quality ratings of machine Spanish vs. MBTE and conducted paired t tests comparing mean bilingual machine Spanish and MBTE ratings. High correlation and absence of a significant difference in means would support the notion that MBTE could be used to reliably assess machine Spanish quality.
We also assessed the correlation between bilingual quality assessments of machine Spanish vs. monolingual raters’ assessments of MBTE, and conducted paired comparison t tests comparing bilingual machine Spanish and monolingual MBTE quality ratings. These analyses assess the ability of an English-only reader of MBTE to predict the quality of machine Spanish, as determined by a bilingual rater. High correlation and absence of a significant difference in means would support the notion that MBTE could be used by an English-only speaker to reliably assess machine Spanish quality.
Associations between original English content and translation quality. Objective measures of original English were correlated via stepwise linear regression with bilingual assessment of machine Spanish quality.
Results
PCI Contributions
Example PCIs were contributed by 88 individuals and are summarized in Table 1. The 100 original English PCIs and the machine Spanish and MBTE translations obtained via Google Translate are available from the authors upon request.
Safety
Concordance Analysis
The 6 monolingual and bilingual raters agreed on the safety of 73 MBTE PCIs. The most concordant pairings of 4 agreed on 81 items. The least and most concordant pairings had concordance values of 0.68 and 0.84, respectively. Subsequent analyses include data from only the 4 most concordant monolingual and bilingual raters.
Bilingual and Monolingual Safety Ratings
Both bilingual and monolingual raters assessed MBTE. On average, bilingual ratings of MBTE of safety were higher (0.987) than monolingual ratings (0.925) (t = –3.897, P = 0.0002).
Identification of Unsafe Translations in Machine Spanish and MBTE
The bilingual panel identified 11 translations (either machine Spanish or MBTE) as unsafe: MS translation was unsafe for 9 items, MBTE unsafe for 5 items, with some items identified as unsafe in terms of both machine Spanish and MBTE. The original English, machine Spanish, and MBTE for these PCIs are listed in Table 2. One item (#93) revealed a machine Spanish drug dosing ambiguity that was not present in the MBTE, with safety concern expressed by 3 of 4 bilingual raters.
Original English c haracteristics of Unsafe Translation
A stepwise logistic regression was performed to evaluate whether characteristics of the original English text predicted the PCI being judged as having a safe or unsafe machine Spanish translation. The explanatory variables (listed in Table 1) evaluated were word count, reading grade level, inclusion of reference to a specific medication, inclusion of numbers (as in "take 2 tablets"), and inclusion of numbered statements (as in "1. Call if your cough worsens"). The stepwise selection procedure dropped number references and numbered sentences, although post hoc analysis showed that number references and medication references occurred so commonly together that they were essentially interchangeable. The final regression model included word count, reading grade level, and medication reference. The significant factors of reading grade level and medication reference had odds ratio (95% confidence interval) of 1.12 (1.01 to 1.41) and 4.91 (1.07 to 22.7) respectively (P = 0.042 each). As reading grade level includes word count per sentence and syllable count per word as linear predictors, the inclusion of word count in the model is likely to increase the discrimination of complex words of many syllables in predicting the occurrence of unsafe machine Spanish.
Quality
Bilingual and Monolingual Raters Assessments of Quality
The bilingual evaluators found similar mean quality for machine Spanish (mean 0.855, SD 0.0859) and MBTE (0.857, SD 0.0755) (P = 0.811). However, the correlation of R2=0.355 (P = 0.000) suggests that despite similarity in mean ratings, a good forward translation from original English to machine Spanish did not assure a good back translation from machine Spanish to MBTE. No difference in mean MBTE quality was identified between bilingual (0.857, SD 0.0754) and monolingual (0.852, SD 0.126) raters (P = 0.598), with correlation R2=0.565 (P = 0.000).
Discussion
In this article, we have collected a corpus of example PCIs across a large number of authors, and investigated how well Google Translate was able to translate the example instructions first to Spanish, and then back again to English. We learned that one can not always spot a problem in the intermediate Spanish by inspection of the back-translated English. We also learned that simple sentences were least likely to be associated with troublesome translations, and that specific instructions about medication usage should probably be approached with great care.
We learned that some authors readily use simple language (eg: “Have your blood work drawn in the lab in the next two weeks,” reading level 1.2) while others gravitate to very complex language (“If you develop headache, chest pain, abdominal pain or back pain, or if you have any spontaneous bleeding please go to the emergency department, advise them that you were recently treated for rattlesnake envenomation and have them call the poison center,” reading level 20.2).
The development in confidence in machine translation can be compared to development of self-driving cars. At early stages of development, the self-driving cars had drivers with a foot near the brake and hands near the steering wheel, ready to take over at any instant. Now, after much data has been collected, there is evidence that the machine may operate more predictably and safely than some human drivers [15,16]. Should the self-driving cars always have an operator behind the wheel, supervising the function of the software, and ready to take over at any instant, or is the purpose of the self-driving car to allow non-drivers to be transported in an automobile that they either cannot operate or choose not to operate at that time?
The benefit of using professional interpreters in communicating clinically significant data is unquestioned, especially when compared to ad-hoc interpreters who lack professional understanding of context [4]. Like a good human driver (as compared to a self-driving car that is operated by a program that is still learning), a qualified human translator will outperform machine translation in complex tasks. Similarly, for relatively simple translations that are meant to be generated by human speakers to be understood by individuals with a grammar school education and vocabulary, is the state of machine translation such that less human translation is now required?
Our use of 2 teams of evaluators allowed us to use the game of Telephone analogy to provide insight into how well the machine translation proceeded, first to Spanish, then back to English. Mostly (90 times in 100), an acceptable Spanish translation resulted in an acceptable English back translation. In 2 instances (Samples 7 and 32), the first translation into Spanish was unacceptable, and a subsequent translation back to English was also unacceptable, as might be expected. In 2 instances (Samples 60 and 92), the Spanish translation was acceptable, but the translation back to English was unacceptable. The rules of Telephone worked 94 times in 100.
Still, 6 times in 100, the unexpected occurred, where a relatively poor Spanish translation returned a relatively acceptable English back translation. The rules of Telephone were not followed. The Spanish in the middle was garbled, but became acceptable when translated back to English. A fluent Spanish speaker found the intermediate Spanish to be of concern, and the back translation did not identify the concern. This argues against widespread adoption of machine back translation for quality assessment, at least until better understanding of the limitations of machine back translation are better understood. Looking at examples where back translation “worked” is useful. In the 6 instances where the intermediate Spanish was judged to be unacceptable, but the English back translation acceptable, complex sentence structures were found, along with medication instructions.
We did not test whether the raters found the original English instructions to be unclear or unsafe as a starting point. Here is where we find the potential benefit of the present study, as it provides insight into the type of content that seems to translate well in this set of data, where the machine Spanish error was not present in MBTE. Overall, ratings of translation quality by bilingual and monolingual raters were high, suggesting that there may be some utility in the machine translation with safeguards other than, or in addition to, inspection of machine back translation of machine Spanish. We found an astonishing range in reading difficulty across the contributed samples. While the average estimated grade level for comprehension of the original English contributions was the 8th grade, the maximum was 22, indicating extreme complexity in both the words used and sentence length.
In gathering the example PCIs, we did not give the authors any additional instructions to limit complexity; we asked only for their “typical” language. If the examples received are indeed typical, the instructions we provide are often quite complex. Wu et al [17] explored the readability of medical information intended for the public and found that, on average, 18 years of education would be required to read and understand the clinical trial descriptions available at ClinicalTrials.gov. It seems apparent that the first step to improving the safety of machine translation is to simplify the task of the translator by making the language that is used for translation as unambiguous and straightforward as possible. The article by Patil and Davies on the use of Google Translate in the clinic [9] generated a considerable number of rapid responses (similar to letters to the editor) [18]. The responses emphasized the need to keep the language used simple, the sentences short, and the communication direct.
A simple and straightforward suggestion to improve all patient care instructions (not just those anticipated to be translated) would be to display the Flesch-Kincaid reading level in real time as the content is generated. The computer resources required to perform reading level analysis are nearly identical to those required for real-time spell checking: a dictionary that breaks words into syllables. Showing authors the reading level in real time would provide a tool to improve all instructions, not just those intended for translation. Limiting the dictionary to specifically exclude potentially dangerous, complex, or confusing words as well as forbidden abbreviations would further identify troublesome language to the author, and would improve communication overall. Implementing such real-time feedback to authors of patient instructions is a logical next step in adding utility to the electronic health record.
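As a rough illustration of how little computation such feedback requires, the following minimal sketch computes the Flesch-Kincaid grade level using the standard published formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. Several parts are assumptions for illustration only: the syllable counter is a vowel-group heuristic standing in for the syllable dictionary described above, the sentence splitter is naive, and the function names and the short list of discouraged abbreviations are hypothetical.

```python
import re

# Hypothetical examples of discouraged abbreviations; a real system would
# load the institution's own list of forbidden terms.
FORBIDDEN_TERMS = {"qd", "qod", "iu"}

WORD_PATTERN = re.compile(r"[A-Za-z']+")


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (heuristic, not a dictionary)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent ("take"), except in "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid grade level of the text (0.0 for empty input)."""
    # Naive sentence split on ., !, and ?; decimals and abbreviations will
    # be miscounted as sentence breaks.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD_PATTERN.findall(text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def flag_terms(text: str) -> list:
    """Return any listed terms found in the text, lowercased and sorted."""
    found = {w.lower() for w in WORD_PATTERN.findall(text)}
    return sorted(found & FORBIDDEN_TERMS)


simple = "Take one pill each day. Call us if you feel sick."
complex_text = (
    "Administer the ophthalmic suspension bilaterally following "
    "discontinuation of anticoagulation therapy."
)
print(flesch_kincaid_grade(simple), flesch_kincaid_grade(complex_text))
print(flag_terms("Take 1 tablet QD"))
```

Both functions run in time proportional to the length of the text, so they could be re-evaluated on each keystroke in the same way a spell checker is. A production version would need a proper syllable dictionary and sentence segmentation to give reliable grade levels.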
It is important that culture and contextual understanding are taken into consideration when organizations use interpretation services. In the United States, federal law requires that language interpreters employed by health care organizations receiving federal funds be not only bilingual but also bicultural [16]. We did not find examples of dangerous synonyms being misapplied in translation, but we cannot rule out the possibility that such errors can occur. Cultural and contextual understanding of this kind is beyond the scope of typical machine translation software.
Our data suggest that medication names and dosing frequencies should not be repeated in the PCI, where confusion can arise from imprecise language translation. Translation ambiguities that generate safety concerns in PCI might be mitigated by moving such content into structured areas of the AVS.
Conclusion
This study suggests that 9 times out of 10, machine translation using Google Translate is acceptable in terms of quality and safety. Currently, machine back translation may fail to reveal a relatively poor translation from English to Spanish. This study showed that increasing sentence complexity, as measured by the reading level index, was associated with a significant (P < 0.05) increase in unsafe machine translation. Similarly, including medication instructions in machine translations was associated with an increased risk (P < 0.05) of machine translation safety error in this study.
A simple way to improve communication now would be to display the reading level to authors of patient communication content in real time and to limit the dictionary of acceptable words so that it excludes known ambiguous terms and forbidden abbreviations. This would teach authors to use simple language and increase the chance that translation (either human or machine) would be effective. This preliminary study suggests that keeping medication dosing instructions in a structured format is advisable, as is keeping sentences simple. As with spoken language [4], starting with clear, simple-to-understand English instructions provides the best machine translations into Spanish.
The Clinical Machine Translation Study Group: Todd W. Altenbernd, Steven Bedrick, Mark D. Berg, Nerida Berrios, Mark A. Brown, Colleen K. Cagno, Charles B. Cairns, Elizabeth Calhoun, Raymond Carmody, Tara F. Carr, Clara Choo, Melissa L. Cox, Janiel Cragun, Rachel E.M. Cramton, Paola Davis, Archita Desai, Sarah M. Desoky, Sean Elliot, Mindi J. Fain, Albert Fiorello, Hillary Franke, Kimberly Gerhart, Victor Jose Gonzalez, Aaron John Goshinska, Lynn M. Gries, Erin M. Harvey, Karen Herbst, Elizabeth Juneman, Lauren Marie Imbornoni, Anita Koshy, Lisa Laughlin, Christina M. Laukaitis, Kwan Lee, Hong Lei, Joseph M. Miller, Prashanthinie Mohan, Wayne J. Morgan, Jarrod Mosier, Leigh A. Neumayer, Valentine Nfonsam, Vivienne Ng, Terence O'Keeffe, Merri Pendergrass, Jessie M. Pettit, John Leander Po, Claudia Marie Prospero Ponce, Sydney Rice, Marie Anoushka Ricker, Arielle E. Rubin, Robert J. Segal, Aurora A.G. Selpides, Whitney A. Smith, Jordana M. Smith, William Stevenson, Amy N. Sussman, Ole J. Thienhaus, Patrick Tsai, J. Daniel Twelker, Richard Wahl, Jillian Wang, Mingwu Wang, Samuel C. Werner, Mark D. Wheeler, Jason Wild, Sun Kun Yi, Karl Andrew Yousef, Le Yu.
Corresponding author: Joseph M. Miller, MD, MPH, Department of Ophthalmology and Vision Science, University of Arizona, 655 North Alvernon Way, Suite 108, Tucson AZ 85711, jmiller@eyes.arizona.edu.
Financial disclosures: None.
1. Hummel J, Evans P. Providing clinical summaries to patients after each office visit: a technical guide. Qualis Health 2012. Accessed 14 Mar 2016 at http://hit.qualishealth.org/sites/default/files/hit.qualishealth.org/Providing-Clinical-Summaries-0712.pdf.
2. Neuberger M, Dontje K, Holzman G, et al. Examination of office visit patient preferences for the after-visit summary (AVS). Persp Health Infor Manage 2014;11:1d.
3. Kruse CS, Bolton K, Freriks G. The effect of patient portals on quality outcomes and its implications to meaningful use: a systematic review. J Med Internet Res 2015;17:e44.
4. Schoonover K. Using a medical interpreter with persons of limited English proficiency. J Clin Outcomes Manage 2016;23:567–75.
5. Shin HB, Bruno R. Language use and English-speaking ability: 2000. Census 2000 Brief. Accessed 9 Nov 2017 at https://census.gov/content/dam/Census/library/publications/2013/acs/acs-22.pdf.
6. Lewis MP, Simons GF, Fennig CD, editors. Ethnologue: languages of the Americas and the Pacific. 19th ed. Dallas: SIL International; 2016.
7. Pavlik V, Brown AE, Nash S, et al. Association of patient recall, satisfaction, and adherence to content of an electronic health record (EHR)-generated after visit summary: a randomized clinical trial. J Am Board Fam Med 2014;27:209–18.
8. Johnson M, Schuster M, Le QV, et al. Google’s multilingual neural machine translation system: enabling zero-shot translation. Accessed 9 Nov 2017 at https://arxiv.org/pdf/1611.04558.pdf.
9. Patil S, Davies P. Use of Google Translate in medical communication: evaluation of accuracy. BMJ 2014;349:g7392.
10. Kaliyadan F, Gopinathan Pillai S. The use of Google language tools as an interpretation aid in cross-cultural doctor-patient interaction: a pilot study. Inform Prim Care 2010;18:141–3.
11. Zhang Y, Zhou S, Zhang Z, et al. Rumor evolution in social networks. Physical Review E 2013;87.
12. Shingenobu T. Evaluation and usability of back translation for intercultural communication. In: N. Aykin, editor. Usability and internationalization. Global and local user interfaces. UI-HCII 2007, Lecture Notes in Computer Science, vol 4560. Springer, Berlin, Heidelberg.
13. Kincaid JP, Fishburne Jr RP, Rogers RL, et al. Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for Navy enlisted personnel. Naval Technical Training Command Millington TN Research Branch. 1975. Accessed 7 May 2016 at http://www.dtic.mil/dtic/tr/fulltext/u2/a006655.pdf.
14. Kwiecien R, Kopp-Schneider A, Blettner M. Concordance analysis—part 16 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2011;108:515–21.
15. Goodall N. Ethical decision making during automated vehicle crashes. Transportation Research Record: Journal of the Transportation Research Board 2014;2424:58–65.
16. Kalra N, Groves D. The enemy of good: estimating the cost of waiting for nearly perfect automated vehicles. Santa Monica, CA: RAND Corporation, 2017.
17. Wu DT, Hanauer DA, Mei Q, et al. Assessing the readability of ClinicalTrials.gov. J Am Med Inform Assoc 2016;23:269–75.
18. Responses to: Use of Google Translate in medical communication: evaluation of accuracy. BMJ 2014;349:g7392. Accessed 10 Dec 2017 at www.bmj.com/content/349/bmj.g7392/rapid-responses.
19. Nápoles AM, Santoyo-Olsson J, Karliner LS, et al. Inaccurate language interpretation and its clinical significance in the medical encounters of Spanish-speaking Latinos. Med Care 2015;53:940–7.
1. Hummel J, Evans P. Providing clinical summaries to patients after each office visit: a technical guide. Qualis Health 2012. Accessed 14 Mar 2016 at http://hit.qualishealth.org/sites/default/files/hit.qualishealth.org/Providing-Clinical-Summaries-0712.pdf.
2. Neuberger M, Dontje K, Holzman G, et al. Examination of office visit patient preferences for the after-visit summary (AVS). Persp Health Infor Manage 2014;11:1d.
3. Kruse CS, Bolton K, Freriks G. The effect of patient portals on quality outcomes and its implications to meaningful use: a systematic review. J Med Internet Res 2015;17:e44.
4. Schoonover, K. Using a medical interpreter with persons of limited English proficiency. J Clin Outcomes Manage 2016;23:567–75.
5. Shin HB, Bruno R. Language use and English-speaking ability: 2000. Census 2000 Brief. Accessed 9 Nov 2017 at https://census.gov/content/dam/Census/library/publications/2013/acs/acs-22.pdf.
6. Lewis MP, Simons GF, Fennig CD, editors. Ethnologue: languages of the Americas and the Pacific. 19th ed. Dallas: Sil International; 2016.
7. Pavlik V, Brown AE, Nash S, et al. Association of patient recall, satisfaction, and adherence to content of an electronic health record (EHR)-generated after visit summary: a randomized clinical trial. J Am Board Fam Med 2014;27:209–18.
8. Johnson M, Schuster M, Le QV, et al. Google’s multilingual neural machine translation system: enabling zero-shot translation. Accessed 9 Nov 2017 at https://arxiv.org/pdf/1611.04558.pdf.
9. Patil S, Davies P. Use of Google Translate in medical communication: evaluation of accuracy. BMJ 2014;349:g7392.
10. Kaliyadan F, Gopinathan Pillai S. The use of Google language tools as an interpretation aid in cross-cultural doctor-patient interaction: a pilot study. Inform Prim Care 2010;18:141–3.
11. Zhang Y, Zhou S, Zhang Z, et al. Rumor evolution in social networks. Physical Review E 2013;87.
12. Shingenobu T. Evaluation and usability of back translation for intercultural communication. In: N. Aykin, editor. Usability and internationalization. Global and local user interfaces. UI-HCII 2007, Lecture Notes in Computer Science, vol 4560. Springer, Berlin, Heidelberg.
13. Kincaid JP, Fishburne Jr RP, Rogers RL, et al. Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for Navy enlisted personnel. Naval Technical Training Command Millington TN Research Branch. 1975. Accessed 7 May 2016 at http://www.dtic.mil/dtic/tr/fulltext/u2/a006655.pdf.
14. Kwiecien R, Kopp-Schneider A, Blettner M. Concordance analysis—part 16 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2011;108:515–21.
15. Goodall N. Ethical decision making during automated vehicle crashes. Transportation Research Record: Journal of the Transportation Research Board 2014;2424:58–65.
16. Kalra N, Groves D. The enemy of good: estimating the cost of waiting for nearly perfect automated vehicles. Santa Monica, CA: RAND Corporation, 2017.
17. Wu DT, Hanauer DA., Mei Q, et al. Assessing the readability of ClinicalTrials.gov. J Am Med Inform Assoc 2016;23:269–75.
18. Responses to: Use of Google Translate in medical communication: evaluation of accuracy. BMJ 2014;349:g7392 Accessed 10 Dec 2017 at www.bmj.com/content/349/bmj.g7392/rapid-responses.
19. Nápoles AM, Santoyo-Olsson J, Karliner LS, et al. Inaccurate language interpretation and its clinical significance in the medical encounters of Spanish-speaking Latinos. Med Care 2015;53:940–7.
Nondrug Treatments May Benefit Patients With Epilepsy
WASHINGTON, DC—Many patients with pharmacoresistant epilepsy may benefit from nondrug treatments, including vagus nerve stimulation (VNS), the ketogenic diet, and corpus callosotomy, according to a study presented at the 71st Annual Meeting of the American Epilepsy Society. The treatments may reduce generalized and focal seizures, and most parents whose children underwent these procedures would opt for the same treatment under similar circumstances, the researchers said.
About 20% to 30% of patients have pharmacoresistant epilepsy. The ketogenic diet, corpus callosotomy, and VNS have been studied as alternatives to antiepileptic drugs (AEDs) for these patients, but few studies have compared the modalities.
Dave F. Clarke, MD, MBBS, Professor of Pediatric Neurology at the Baylor College of Medicine and Clinical Director of Epilepsy at Texas Children’s Hospital in Houston, and colleagues compared seizure control, cognitive and behavioral factors, quality of life, and parent satisfaction among patients who received VNS, underwent corpus callosotomy, or initiated the ketogenic diet. They identified 336 patients who had received one of these treatments at Dell Children’s Medical Center of Central Texas in Austin between January 2010 and November 2015. Parents of 210 of the patients completed a nine-item telephone survey.
Of the 210 patients whose parents completed the survey, 98 (33.6%) had initiated the ketogenic diet, 150 (51.4%) had received VNS, and 44 (15.1%) had undergone corpus callosotomy. Patients were between the ages of 8 months and 20 years. Patients who had initiated the ketogenic diet had a mean age of about 7, and patients who received VNS or underwent corpus callosotomy had a mean age of about 10. Patients had failed more than three AEDs on average (range, two to 13).
Parents reported a 50% or greater reduction in generalized seizures in 63% of patients who went on the ketogenic diet, 54% of patients who underwent corpus callosotomy, and 52% of patients who received VNS. Parents reported a 50% or greater reduction in focal seizures in 56% of children who went on the ketogenic diet, 56% of patients who had corpus callosotomy, and 53% of patients who received VNS.
In addition, parents reported improved quality of life in 48% of patients on the ketogenic diet, 63% of patients who had corpus callosotomy, and 44% of patients who received VNS. Overall, 80% of parents whose children were on the ketogenic diet or received VNS and 75% of parents whose children underwent corpus callosotomy reported that they were satisfied with the treatment that their child had received.
“Higher health-related quality of life after intervention was predicted by improved behavior, increased engagement, diminished frequency of atonic or generalized tonic-clonic seizures, and reduction in epilepsy-related injuries,” the researchers concluded. Parents were more likely to say that they would repeat the procedure if, after the treatment, “their child was more engaged, had diminished frequency of atonic or generalized tonic-clonic seizures, and had a reduction in epilepsy-related injuries.”
“Unfortunately, many doctors keep trying medications without considering alternatives,” said Dr. Clarke. “Based on the parents’ feedback, I would suggest doctors introduce the concept of alternatives after two AEDs fail to control seizures.” If surgery to ablate or remove the area of the brain where seizures originate is not an option, neurologists should talk to parents about the ketogenic diet, VNS, or corpus callosotomy. “If parents think the diet can be tolerated, trying it first may not be a bad option,” he said.
—Jake Remaly
Homelessness: Whose job is it?
Despite programs to end homelessness, it remains a substantial and growing problem in many cities in the United States.1,2 In 2016, there were an estimated 10,550 homeless people living in my home state of Colorado, a 6% increase from the prior year.2 A recent point-estimate study found that there were more than 5,000 homeless individuals in the Denver metropolitan area on a single night in January 2017.3 Because of the relative scarcity of housing, a growing number of cities like Denver now utilize a practice known as vulnerability indexing to prioritize homeless persons at high risk of mortality from medical conditions for placement in permanent supportive housing.4
Although hospitalists like myself frequently care for vulnerable homeless patients in the hospital, most have little formal training in how best to care for and advocate for these individuals beyond treating their acute medical need, and little direct contact with community organizations with expertise in doing so. Instead, we have learned informally through experience. Hospital providers are often frustrated by the perceived lack of services and support available to these patients, and there is substantial variability in the extent to which providers engage patients and community partners during and after hospitalization. Despite the growing practice of vulnerability indexing in the community, hospital-based providers do not routinely assess vulnerability with respect to housing. Previous research indicates that housing status is assessed in only a minority of homeless patients during their hospital stay.12 Thus, hospitalization often represents a missed opportunity to identify vulnerability and utilize it to connect patients with housing and other resources.
Addressing the significant known health disparities faced by homeless persons is one of the greatest health equity challenges of our time.13 We need better ways of understanding, identifying, and addressing vulnerability among homeless patients who are hospitalized, paired with improved integration with local community organizations. This will require moving beyond the idea that homelessness is the social worker’s job to one of shared responsibility and advocacy.
Collaborative research and other partnerships that engage both community organizations and individuals affected by homelessness are crucial to further understand the specific needs, barriers, challenges, and opportunities for improving hospital care and care transitions in this population. As well-respected community members and systems thinkers who witness these inequities on a daily basis, hospitalists are well positioned to help lead this work.
Dr. Stella is a hospitalist at Denver Health and Hospital Authority, and an associate professor of medicine at the University of Colorado. She is a member of The Hospitalist editorial advisory board.
References
1. Ending Chronic Homelessness. (Aug 2017). U.S. Interagency Council on Homelessness. Available at: https://www.usich.gov/goals/chronicsness. Accessed: Oct 21, 2017.
2. 2016 Annual Homeless Assessment Report (AHAR) to Congress. (Nov 2016). U.S. Department of Housing and Urban Development Office of Community Planning and Development, Part 1. Available at: https://www.hudexchange.info/resources/documents/2016-AHAR-Part-1.pdf. Accessed: Oct 21, 2017.
3. 2017 Point-In-Time Report, Seven-County Metro Denver Region. Metro Denver Homeless Initiative. Available at: http://www.mdhi.org/2017_pit. Accessed Oct 22, 2017.
4. Henwood BF et al. Examining mortality among formerly homeless adults enrolled in Housing First: An observational study. BMC Public Health. 2015;15:1209.
5. Weinstein LC et al. Moving from street to home: Health status of entrants to a Housing First program. J Prim Care Community Health. 2011;2:11–5.
6. Kushel MB et al. Factors associated with the health care utilization of homeless persons. JAMA. 2001;285(2):200-6.
7. Kushel MB et al. Emergency department use among the homeless and marginally housed: Results from a community-based study. Am J Public Health. 2002;92(5):778-84.
8. Baggett TP et al. Mortality among homeless adults in Boston: Shifts in causes of death over a 15-year period. JAMA Intern Med. 2013 Feb 11;173(3):189–95.
9. Johnson et al. For many patients who use large amounts of health care services, the need is intense yet temporary. Health Aff (Millwood). 2015 Aug;34(8):1312-9.
10. Durfee J et al. The impact of tailored intervention services on charges and mortality for adult super-utilizers. Healthc (Amst). 2017 Aug 25. pii: S2213-0764(17)30057-X. doi: 10.1016/j.hjdsi.2017.08.004. [Epub ahead of print]
11. Rinehart DJ et al. Identifying subgroups of adult super utilizers in an urban safety-net system using latent class analysis: Implications for clinical practice. Med Care. 2016 Sep 14. doi: 10.1097/MLR.0000000000000628. [Epub ahead of print]
12. Greysen RS et al. Understanding transitions of care from hospital to homeless shelter: A mixed-methods, community-based participatory approach. J Gen Intern Med. 2012;27(11):1484-91.
13. National Health Care for the Homeless Council. (Oct 2012). Improving Care Transitions for People Experiencing Homelessness. (Lead author: Sabrina Edgington, policy and program specialist.) Available at: www.nhchc.org/wp-content/uploads/2012/12/Policy_Brief_Care_Transitions.pdf. Accessed Oct 21, 2017.
14. Koh HK et al. Improving healthcare for homeless people. JAMA. 2016;316(24):2586-7.
Despite programs to end homelessness, it remains a substantial and growing problem in many cities in the United States.1,2 In 2016, there were an estimated 10,550 homeless people living in my home state of Colorado, a 6% increase from the prior year.2 A recent point-estimate study found that there were more than 5,000 homeless individuals in the Denver metropolitan area on a single night in January 2017.3 Because of the relative scarcity of housing, a growing number of cities like Denver now utilize a practice known as vulnerability indexing to prioritize homeless persons at high risk of mortality from medical conditions for placement in permanent supportive housing.4
Although hospitalists like myself frequently care for vulnerable homeless patients in the hospital, most have little formal training in how best to care for and advocate for these individuals beyond treating their acute medical need, and little direct contact with community organizations with expertise in doing so. Instead, we have learned informally through experience. Hospital providers are often frustrated by the perceived lack of services and support available to these patients, and there is substantial variability in the extent to which providers engage patients and community partners during and after hospitalization. Despite the growing practice of vulnerability indexing in the community, hospital-based providers do not routinely assess vulnerability with respect to housing. Previous research indicates that housing status is assessed in only a minority of homeless patients during their hospital stay.12 Thus, hospitalization often represents a missed opportunity to identify vulnerability and utilize it to connect patients with housing and other resources.
Addressing the significant known health disparities faced by homeless persons is one of the greatest health equity challenges of our time.13 We need better ways of understanding, identifying, and addressing vulnerability among homeless patients who are hospitalized, paired with improved integration with local community organizations. This will require moving beyond the idea that homelessness is the social worker’s job to one of shared responsibility and advocacy.
Collaborative research and other partnerships that engage both community organizations and individuals affected by homelessness are crucial to further understand the specific needs, barriers, challenges, and opportunities for improving hospital care and care transitions in this population. As well-respected community members and systems thinkers who witness these inequities on a daily basis, hospitalists are well positioned to help lead this work.
Dr. Stella is a hospitalist at Denver Health and Hospital Authority, and an associate professor of medicine at the University of Colorado. She is a member of The Hospitalist editorial advisory board.
References
1. Ending Chronic Homelessness. (Aug 2017). U.S. Interagency Council on Homelessness. Available at: https://www.usich.gov/goals/chronicsness. Accessed: Oct 21, 2017.
2. 2016 Annual Homeless Assessment Report (AHAR) to Congress. (Nov 2016). U.S. Department of Housing and Urban Development Office of Community Planning and Development, Part 1. Available at: https://www.hudexchange.info/resources/documents/2016-AHAR-Part-1.pdf. Accessed: Oct 21, 2017.
3. 2017 Point-In-Time Report, Seven-County Metro Denver Region. Metro Denver Homeless Initiative. Available at: http://www.mdhi.org/2017_pit. Accessed: Oct 22, 2017.
4. Henwood BF et al. Examining mortality among formerly homeless adults enrolled in Housing First: An observational study. BMC Public Health. 2015;15:1209.
5. Weinstein LC et al. Moving from street to home: Health status of entrants to a Housing First program. J Prim Care Community Health. 2011;2:11-5.
6. Kushel MB et al. Factors associated with the health care utilization of homeless persons. JAMA. 2001;285(2):200-6.
7. Kushel MB et al. Emergency department use among the homeless and marginally housed: Results from a community-based study. Am J Public Health. 2002;92(5):778-84.
8. Baggett TP et al. Mortality among homeless adults in Boston: Shifts in causes of death over a 15-year period. JAMA Intern Med. 2013 Feb 11;173(3):189-95.
9. Johnson et al. For many patients who use large amounts of health care services, the need is intense yet temporary. Health Aff (Millwood). 2015 Aug;34(8):1312-9.
10. Durfee J et al. The impact of tailored intervention services on charges and mortality for adult super-utilizers. Healthc (Amst). 2017 Aug 25. pii: S2213-0764(17)30057-X. doi: 10.1016/j.hjdsi.2017.08.004. [Epub ahead of print]
11. Rinehart DJ et al. Identifying subgroups of adult super utilizers in an urban safety-net system using latent class analysis: Implications for clinical practice. Med Care. 2016 Sep 14. doi: 10.1097/MLR.0000000000000628. [Epub ahead of print]
12. Greysen RS et al. Understanding transitions of care from hospital to homeless shelter: A mixed-methods, community-based participatory approach. J Gen Intern Med. 2012;27(11):1484-91.
13. National Health Care for the Homeless Council. (Oct 2012). Improving Care Transitions for People Experiencing Homelessness. (Lead author: Sabrina Edgington, policy and program specialist.) Available at: www.nhchc.org/wp-content/uploads/2012/12/Policy_Brief_Care_Transitions.pdf. Accessed: Oct 21, 2017.
14. Koh HK et al. Improving healthcare for homeless people. JAMA. 2016;316(24):2586-7.
Continue to opt for HDT/ASCT for multiple myeloma
High-dose therapy with melphalan followed by autologous stem cell transplant (HDT/ASCT) is still the best option for multiple myeloma even after almost 2 decades with newer and highly effective induction agents, according to a recent systematic review and two meta-analyses.
Given the “unprecedented efficacy” of “modern induction therapy with immunomodulatory drugs and proteasome inhibitors (also called ‘novel agents’),” investigators “have sought to reevaluate the role of HDT/ASCT,” wrote Binod Dhakal, MD, of the Medical College of Wisconsin, and his colleagues. The report is in JAMA Oncology.
To address that question, they analyzed five randomized controlled trials conducted since 2000 and concluded that HDT/ASCT is still the preferred treatment approach.
Despite a lack of demonstrable overall survival benefit, there is a significant progression-free survival (PFS) benefit, low treatment-related mortality, and potential high minimal residual disease-negative rates conferred by HDT/ASCT in newly diagnosed multiple myeloma, the researchers noted.
The combined odds ratio for complete response was 1.27 (95% confidence interval, 0.97-1.65, P = .07) with HDT/ASCT, compared with standard-dose therapy (SDT). The combined hazard ratio (HR) was 0.55 for PFS (95% CI, 0.41-0.7, P less than .001) and 0.76 for overall survival (95% CI, 0.42-1.36, P = .20) in favor of HDT.
PFS was best with tandem HDT/ASCT (HR, 0.49, 95% CI, 0.37-0.65) followed by single HDT/ASCT with bortezomib, lenalidomide, and dexamethasone consolidation (HR, 0.53, 95% CI, 0.37-0.76) and single HDT/ASCT alone (HR, 0.68, 95% CI, 0.53-0.87), compared with SDT. However, none of the HDT/ASCT approaches had a significant impact on overall survival.
Meanwhile, treatment-related mortality with HDT/ASCT was minimal, at less than 1%.
“The achievement of high [minimal residual disease] rates with HDT/ASCT may render this approach the ideal platform for testing novel approaches (e.g., immunotherapy) aiming at disease eradication and cures,” the researchers wrote.
The researchers reported relationships with a number of companies, including Takeda, Celgene, and Amgen, that make novel induction agents.
SOURCE: Dhakal B et al. JAMA Oncol. 2018 Jan 4. doi: 10.1001/jamaoncol.2017.4600.
FROM JAMA ONCOLOGY
Key clinical point: HDT/ASCT is still the preferred treatment approach for newly diagnosed multiple myeloma.
Major finding: The combined odds ratio for complete response was 1.27 (95% CI, 0.97-1.65, P = .07) with HDT/ASCT, compared with standard-dose therapy (SDT).
Study details: A systematic review and two meta-analyses examining five phase 3 clinical trials reported since 2000.
Disclosures: The researchers reported relationships with a number of companies, including Takeda, Celgene, and Amgen, that make novel induction agents.
Source: Dhakal B et al. JAMA Oncol. 2018 Jan 4. doi: 10.1001/jamaoncol.2017.4600.
Folic acid and multivitamin supplements associated with reduced autism risk
Taking folic acid and/or multivitamin supplements preceding and during pregnancy is associated with a lower risk of offspring developing autism spectrum disorder (ASD), an observational epidemiologic study published Jan. 3 showed.
The findings could have important public health implications, reported Stephen Z. Levine, PhD, and his associates.
Of the 45,300 children in the study, 572 (1.3%) received an ASD diagnosis. Dr. Levine and his associates found that children whose mothers took folic acid and/or multivitamin supplements during pregnancy had a lower risk of developing ASD (relative risk, 0.27; 95% confidence interval, 0.22-0.33; P less than .001), compared with those whose mothers took no supplements. Similarly, there was reduced risk among those whose mothers took only folic acid during pregnancy (RR, 0.32; 95% CI, 0.26-0.41; P less than .001) or only multivitamins (RR, 0.35; 95% CI, 0.28-0.44; P less than .001). Likewise, lower risks were seen among offspring whose mothers took supplements before pregnancy: Compared with no supplements, the RR was 0.39 for folic acid and/or multivitamins (95% CI, 0.30-0.50; P less than .001), 0.56 for just folic acid (95% CI, 0.42-0.74; P = .001), and 0.36 for just multivitamins (95% CI, 0.24-0.52; P less than .001). Similar associations were found among male and female offspring.
“This finding may reflect noncompliance, higher rates of vitamin deficiency, or poor diet among persons with psychiatric conditions,” wrote Dr. Levine, of the department of community mental health at the University of Haifa, Israel, and his associates in JAMA Psychiatry.
Another important finding is that maternal exposure to folic acid and multivitamin supplements 2 years before pregnancy is tied to a lower ASD risk.
The investigators acknowledged that the study was limited by their inability to determine possible confounding factors, such as the vehicle of vitamin dispensations, use of over-the-counter supplements, false-positive classifications from noncompliance, and absence of information on gestational age. In addition, they said, “causality cannot be inferred from observational studies such as this one.” In light of those limitations, investigators said, additional studies replicating these findings are needed.
The study was funded by several entities, including the National Institutes of Health, the Fredrik and Ingrid Thuring Foundation, and the Swedish Society of Medicine. Dr. Levine reported receiving support from Shire Pharmaceuticals, and coauthor Arad Kodesh, MD, is an employee of Meuhedet Health Services. No other relevant financial disclosures were reported.
ezimmerman@frontlinemedcom.com
SOURCE: Levine SZ et al. JAMA Psychiatry. 2018 Jan 3. doi: 10.1001/jamapsychiatry.2017.4050.
Key clinical point: Taking folic acid and/or multivitamin supplements before and during pregnancy is associated with a reduced risk of autism in children.
Major finding: Children whose mothers took folic acid and/or multivitamin supplements during pregnancy had a decreased risk of developing ASD, compared with those whose mothers did not (relative risk, 0.27; 95% confidence interval, 0.22-0.33; P less than .001).
Study details: Observational epidemiologic study of 45,300 Israeli children born between January 2003 and December 2007 and followed until January 2015.
Disclosures: The study was funded by several entities, including the National Institutes of Health, the Fredrik and Ingrid Thuring Foundation, and the Swedish Society of Medicine. Dr. Levine reported receiving support from Shire Pharmaceuticals, and coauthor Arad Kodesh, MD, is an employee of Meuhedet Health Services. No other relevant financial disclosures were reported.
Source: Levine SZ et al. JAMA Psychiatry. 2018 Jan 3. doi: 10.1001/jamapsychiatry.2017.4050.
Sacred cows
Within the academic pediatric community, there is little argument that the concepts “evidence based” and “early intervention” are gold standards against which we must measure our efforts.
It should be obvious to everyone that if we can intervene early in a child’s developmental trajectory, our chances of affecting his/her outcome are improved. And the earlier the better. If we aren’t supremely committed to prevention, then what sets pediatrics apart from the other specialties?
Likewise, if we aren’t willing to systematically measure our efforts at improving the health of our patients, we run the risk of simply spinning our wheels and even worse, squandering our patients’ time and their parents’ energies. However, a recent article in Pediatrics and a companion commentary suggest that we need to be more careful as we interpret the buzz that surrounds the terms “early intervention” and “evidence based.”
In their one-sentence conclusion of a paper reviewing 48 studies of early intervention in early childhood development, the authors observe, “Although several interventions resulted in improved child development outcomes age 0 to 3 years, comparison across studies and interventions is limited by the use of different outcome measures, time of evaluation, and variability of results” (“Primary Care Interventions for Early Childhood Development: A Systematic Review,” Peacock-Chambers et al. Pediatrics. 2017, Nov 14. doi: 10.1542/peds.2017-1661). Unless you are looking for another reason to slip further into an abyss of despair, I urge that you skip reading the details of the Peacock-Chambers paper and turn instead to Dr. Jack P. Shonkoff’s excellent commentary (“Rethinking the Definition of Evidence-Based Interventions to Promote Early Child Development,” Pediatrics. 2017, Dec. doi: 10.1542/peds.2017-3136).
Dr. Shonkoff observes that there is ample evidence to support the general concept of early intervention as it relates to childhood development. However, he acknowledges that the improvements observed generally have been small. And there has been little success in scaling these few successes to larger populations. It would seem that the sacred cow of early intervention remains standing, albeit on somewhat shaky legs.
Dr. Shonkoff points out that an obsession with statistical significance often has blinded some of us to the importance of the magnitude of (or the lack of) impact when interpreting studies of early intervention. As a result, we may have failed to realize how far research in early childhood development has fallen behind the other fields of biomedical research such as cancer, HIV, and AIDS. His plea is that we begin to leverage our successes in fields such as molecular biology, epigenetics, and neuroscience when designing future studies of early childhood development. He asserts that this kind of basic science – in concert with “on-the-ground experience” (that’s you and me) and “authentic parental engagement” – is more likely to result in greater scalable impact for our patients threatened by developmental delays.
It is refreshing and encouraging to read a critical consideration of the evidence-based sacred cow. Evidence can be viewed from a variety of perspectives. If we continue to filter all of our observations through a statistical significance filter, we run the risk of missing both the forest and the trees.
Dr. Wilkoff practiced primary care pediatrics in Brunswick, Maine, for nearly 40 years. He has authored several books on behavioral pediatrics, including “How to Say No to Your Toddler.” Email him at pdnews@frontlinemedcom.com.
Within the academic pediatric community, there is little argument that the concepts “evidence based” and “early intervention” are gold standards against which we must measure our efforts.
It should be obvious to everyone that if we can intervene early in a child’s developmental trajectory, our chances of affecting his/her outcome are improved. And the earlier the better. If we aren’t supremely committed to prevention, then what sets pediatrics apart from the other specialties?
Likewise, if we aren’t willing to systematically measure our efforts at improving the health of our patients, we run the risk of simply spinning our wheels and even worse, squandering our patients’ time and their parents’ energies. However, a recent article in Pediatrics and a companion commentary suggest that we need to be more careful as we interpret the buzz that surrounds the terms “early intervention” and “evidence based.”
In their one-sentence conclusion of a paper reviewing 48 studies of early intervention in early childhood development, the authors observe, “Although several interventions resulted in improved child development outcomes age 0 to 3 years, comparison across studies and interventions is limited by the use of different outcome measures, time of evaluation, and variability of results” (“Primary Care Interventions for Early Childhood Development: A Systematic Review,” Peacock-Chambers et al. Pediatrics. 2017, Nov 14. doi: 10.1542/peds.2017-1661). Unless you are looking for another reason to slip further into an abyss of despair, I urge that you skip reading the details of the Peacock-Chambers paper and turn instead to Dr. Jack P. Shonkoff’s excellent commentary (“Rethinking the Definition of Evidence-Based Interventions to Promote Early Child Development,” Pediatrics. 2017, Dec. doi: 10.1542/peds.2017-3136).
Dr. Shonkoff observes that there is ample evidence to support the general concept of early intervention as it relates to childhood development. However, he acknowledges that the improvements observed generally have been small. And there has been little success in scaling these few successes to larger populations. It would seem that the sacred cow of early intervention remains standing, albeit on somewhat shaky legs.
Dr. Shonkoff points out that an obsession with statistical significance often has blinded some of us to the importance of the magnitude of (or the lack of) impact when interpreting studies of early intervention. As a result, we may have failed to realize how far research in early childhood development has fallen behind the other fields of biomedical research such as cancer, HIV, and AIDS. His plea is that we begin to leverage our successes in fields such as molecular biology, epigenetics, and neuroscience when designing future studies of early childhood development. He asserts that this kind of basic science – in concert with “on-the-ground experience” (that’s you and me) and “authentic parental engagement” – is more likely to result in greater scalable impact for our patients threatened by developmental delays.
It is refreshing and encouraging reading a critical consideration of the evidence-based sacred cow. Evidence can be viewed from a variety of perspectives. If we continue to filter all of our observations through a statistical significance filter, we run the risk of missing both the forest and the trees.
Dr. Wilkoff practiced primary care pediatrics in Brunswick, Maine for nearly 40 years. He has authored several books on behavioral pediatrics, including “How to Say No to Your Toddler.” Email him at pdnews@frontlinemedcom.com.
Within the academic pediatric community, there is little argument that the concepts “evidence based” and “early intervention” are gold standards against which we must measure our efforts.
It should be obvious to everyone that if we can intervene early in a child’s developmental trajectory, our chances of affecting his/her outcome are improved. And the earlier the better. If we aren’t supremely committed to prevention, then what sets pediatrics apart from the other specialties?
Likewise, if we aren’t willing to systematically measure our efforts at improving the health of our patients, we run the risk of simply spinning our wheels and, even worse, squandering our patients’ time and their parents’ energies. However, a recent article in Pediatrics and a companion commentary suggest that we need to be more careful as we interpret the buzz that surrounds the terms “early intervention” and “evidence based.”
In their one-sentence conclusion of a paper reviewing 48 studies of early intervention in early childhood development, the authors observe, “Although several interventions resulted in improved child development outcomes age 0 to 3 years, comparison across studies and interventions is limited by the use of different outcome measures, time of evaluation, and variability of results” (“Primary Care Interventions for Early Childhood Development: A Systematic Review,” Peacock-Chambers et al. Pediatrics. 2017, Nov 14. doi: 10.1542/peds.2017-1661). Unless you are looking for another reason to slip further into an abyss of despair, I urge that you skip reading the details of the Peacock-Chambers paper and turn instead to Dr. Jack P. Shonkoff’s excellent commentary (“Rethinking the Definition of Evidence-Based Interventions to Promote Early Child Development,” Pediatrics. 2017, Dec. doi: 10.1542/peds.2017-3136).
Dr. Shonkoff observes that there is ample evidence to support the general concept of early intervention as it relates to childhood development. However, he acknowledges that the improvements observed generally have been small. And there has been little success in scaling these few successes to larger populations. It would seem that the sacred cow of early intervention remains standing, albeit on somewhat shaky legs.
Dr. Shonkoff points out that an obsession with statistical significance often has blinded some of us to the importance of the magnitude of (or the lack of) impact when interpreting studies of early intervention. As a result, we may have failed to realize how far research in early childhood development has fallen behind the other fields of biomedical research such as cancer, HIV, and AIDS. His plea is that we begin to leverage our successes in fields such as molecular biology, epigenetics, and neuroscience when designing future studies of early childhood development. He asserts that this kind of basic science – in concert with “on-the-ground experience” (that’s you and me) and “authentic parental engagement” – is more likely to result in greater scalable impact for our patients threatened by developmental delays.
It is refreshing and encouraging to read a critical consideration of the evidence-based sacred cow. Evidence can be viewed from a variety of perspectives. If we continue to filter all of our observations through a statistical significance filter, we run the risk of missing both the forest and the trees.
Dr. Wilkoff practiced primary care pediatrics in Brunswick, Maine, for nearly 40 years. He has authored several books on behavioral pediatrics, including “How to Say No to Your Toddler.” Email him at pdnews@frontlinemedcom.com.
Minorities less likely to seek treatment for psoriasis
Black, Asian, and other non-Hispanic minority Americans are less likely than whites to seek treatment for psoriasis, according to data on 842 patients, reported Alexander H. Fischer, MD, of the University of Pennsylvania, Philadelphia, and his colleagues.
Data from previous studies have shown that racial and ethnic minorities have more severe psoriasis and a lower quality of life as a result of the disease, compared with white patients, the researchers noted in a study published as a research letter in the Journal of the American Academy of Dermatology.
A total of 51% of non-Hispanic whites with psoriasis sought treatment from a dermatologist, compared with 47% of Hispanic whites and 38% of non-Hispanic minorities (blacks, Asians, native Hawaiians, Pacific Islanders, and others). In addition, non-Hispanic minorities had significantly fewer ambulatory visits for psoriasis per year than did whites (a mean of 1.30 visits vs. 2.69 visits). Black, Asian, and other non-Hispanic minorities were about 40% less likely than were non-Hispanic whites to seek care for psoriasis.
The number of psoriasis prescriptions obtained was not significantly different among the racial/ethnic groups, the researchers reported.
The study is important because of the lack of data on psoriasis in nonwhite populations, senior author Junko Takeshita, MD, PhD, also of the University of Pennsylvania, said in an interview.
“Based on a few existing studies, we know that psoriasis is less common among minorities, but minorities, particularly blacks, may have more severe disease,” she said. “Also, minorities report poorer quality of life due to psoriasis than whites, independent of psoriasis severity. Furthermore, we previously published a study among Medicare beneficiaries with psoriasis that revealed that blacks are about 70% less likely to receive biologic therapies than whites, independent of socioeconomic status and access to medical care,” she added.
“The take-home message for clinicians is that while psoriasis is less common among minorities than whites, minorities may suffer from a larger burden of disease, yet have fewer visits and are less likely to see a dermatologist for their psoriasis,” Dr. Takeshita said. “This disparity in health care utilization for psoriasis does not seem to be entirely explained by racial/ethnic differences in socioeconomic status and health insurance. It is yet unknown why this disparity exists, and I’m not sure that minority patients being ‘hesitant to pursue care’ is the entire answer, though it may be a contributing factor,” she noted.
The study findings were limited by several factors, including the relatively small sample size and the use of self-reports.
Many factors could be contributing to the disparity, including patient, physician/other health care provider, and health care system factors, but “once we identify the major causes of the disparity, we can develop methods to address the causes and reduce the disparity,” said Dr. Takeshita, who is a dermatologist and an epidemiologist. In the meantime, she added, “some things I think that are important to ensure equitable care for psoriasis are making sure that clinicians/dermatologists are comfortable diagnosing and treating psoriasis in nonwhite individuals, and encouraging clinicians to help increase awareness of psoriasis by educating their minority patients that psoriasis is still a common skin disease among nonwhite individuals.”
The study was supported in part by the National Institute of Arthritis and Musculoskeletal and Skin Diseases. Dr. Takeshita has received a research grant from Pfizer; she and another author, Joel Gelfand, MD, have received payment for psoriasis-related continuing medical education work supported indirectly by Eli Lilly; Dr. Gelfand’s other disclosures included serving as a consultant for, and having received research grants from, several other pharmaceutical companies. Dr. Fischer, a medical student at Johns Hopkins University, Baltimore, at the time of the research, and a fourth author had no financial disclosures.
SOURCE: Fischer AH et al. J Am Acad Dermatol. 2018 Jan;78[1]:200-3. doi: 10.1016/j.jaad.2017.07.052.
FROM THE JOURNAL OF THE AMERICAN ACADEMY OF DERMATOLOGY
Key clinical point: Black, Asian, and other non-Hispanic minority patients with psoriasis often have more severe disease than do white patients but are significantly less likely to seek care.
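Major finding: Black, Asian, and other non-Hispanic minorities were about 40% less likely than non-Hispanic whites to seek care for psoriasis.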
Data source: A cohort study of data from the Medical Expenditure Panel Survey on 842 psoriasis patients in the United States.
Disclosures: The study was supported in part by the National Institute of Arthritis and Musculoskeletal and Skin Diseases. Two of the four authors had no financial disclosures. One author has received a research grant from Pfizer and payment for psoriasis-related continuing medical education work supported indirectly by Eli Lilly; another author’s disclosures included the latter, as well as serving as a consultant for, and having received research grants from, several other pharmaceutical companies.
Source: Fischer AH et al. J Am Acad Dermatol. 2018 Jan;78[1]:200-3. doi: 10.1016/j.jaad.2017.07.05