Addition of Durvalumab After Chemoradiotherapy Improves Progression-Free Survival in Unresectable Stage III Non-Small-Cell Lung Cancer

Article Type

Changed

Wed, 04/29/2020 - 11:32

Author(s)

Study Overview

Objective. To evaluate the efficacy of the PD-L1 antibody durvalumab in the treatment of patients with unresectable stage III non-small-cell lung cancer (NSCLC) following completion of standard chemoradiotherapy.

Design. Interim analysis of the phase III PACIFIC study, a randomized, double-blind, international study.

Setting and participants. A total of 709 patients underwent randomization between May 2014 and April 2016. Eligible patients had histologically proven stage III, locally advanced and unresectable NSCLC with no evidence of disease progression following chemoradiotherapy. The enrolled patients had received at least 2 cycles of platinum-based chemotherapy concurrently with definitive radiation therapy (54 Gy to 66 Gy). Initially, patients were randomized within 2 weeks of completing radiation; however, the protocol was amended to allow randomization up to 42 days following completion of therapy. Patients were not eligible if they had previous exposure to anti-PD-1 or anti-PD-L1 antibodies or had active or prior autoimmune disease in the last 2 years. All patients were required to have a WHO performance status of 0 or 1. The patients were stratified at randomization by age (< 65 or ≥ 65 years), sex, and smoking status. Enrollment was not restricted by level of PD-L1 expression.

Intervention. Patients were randomized in a 2:1 ratio to receive consolidation durvalumab 10 mg/kg or placebo every 2 weeks for up to 12 months. The intervention was discontinued if there was evidence of confirmed disease progression, treatment with an alternative anticancer therapy, toxicity or patient preference. The response to treatment was assessed every 8 weeks for the first year and then every 12 weeks thereafter.

Main outcome measures. The primary endpoints of the study were progression-free survival (PFS) by blinded independent review and overall survival (OS). Secondary endpoints were the percentage of patients alive without disease progression at 12 and 18 months, objective response rate, duration of response, safety, and time to death or metastasis. Patients were given the option to provide archived tumor specimens for PD-L1 testing.

Results. The baseline characteristics were balanced. The median age at enrollment was 64 years and 91% of the patients were current or former smokers. The vast majority of patients (> 99% in both groups) received concurrent chemoradiotherapy. The response to initial concurrent therapy was similar in both groups, with complete response rates of 1.9% and 3% in the durvalumab and placebo groups, respectively, and partial response rates of 48.7% and 46.8%. Archived tumor samples showed ≥ 25% PD-L1 expression in 22.3% of patients (24% in the durvalumab group versus 18.6% in the placebo group) and < 25% in 41% of patients (39.3% in the durvalumab group versus 44.3% in the placebo group). PD-L1 status was unknown in 36.7% of the enrolled patients. Of note, 6% of patients enrolled had EGFR mutations.

After a median follow-up of 14.5 months, the median PFS was 16.8 months with durvalumab versus 5.6 months with placebo (P < 0.001; hazard ratio [HR] 0.52, 95% confidence interval [CI] 0.42–0.65). The 12-month PFS rate was 55.9% and 35.3% in the durvalumab and placebo groups, respectively. The 18-month PFS rate was 44.2% and 27% in the durvalumab and placebo groups, respectively. The PFS results were consistent across all subgroups. The PFS benefit was observed regardless of PD-L1 expression. The median time to death or metastasis was 23.2 months in the durvalumab group versus 14.6 months with placebo (HR 0.52; 95% CI 0.39–0.69). The objective response rate was significantly higher in the durvalumab group (28.4% vs. 16%, P < 0.001). The median duration of response was longer with durvalumab. Of the patients who responded to durvalumab, 73% had ongoing response at 18 months compared with 47% in the placebo group. OS was not assessed at this interim analysis.

Adverse events (AE) of any grade occurred in approximately 95% of patients in both groups. Grade 3 or 4 AE occurred in 29.9% of the durvalumab group and 26.1% of the placebo group. The most common grade 3 or 4 AE was pneumonia, occurring in about 4% of patients in each group. More patients in the durvalumab group discontinued treatment because of an AE (15.4% vs 9.8%). Death due to an AE occurred in 4.4% of the durvalumab group and 5.6% of the placebo group. The most frequent AEs leading to discontinuation were pneumonitis or radiation pneumonitis and pneumonia. Pneumonitis or radiation pneumonitis occurred in 33.9% (3.4% grade 3 or 4) and 24.8% (2.6% grade 3 or 4) of the durvalumab and placebo groups, respectively. Immune-mediated AE of any grade were more common in the durvalumab group, occurring in 24% of patients (vs. 8% in the placebo group). Of these, 14% of patients in the durvalumab group required glucocorticoids compared with 4.3% in the placebo group. The most common AE of interest was diarrhea, which occurred in 18% of the patients in both groups.

Conclusion. The addition of consolidative durvalumab following completion of concurrent chemoradiotherapy in patients with stage III, locally advanced NSCLC significantly improved PFS without a significant increase in treatment-related adverse events.

Commentary

Pre-clinical evidence has suggested that chemotherapy and radiation therapy may lead to upregulation of PD-L1 expression by tumor cells, leading to increased PD-L1 mediated T cell apoptosis [1,2]. Given prior studies documenting PD-L1 expression as a predictive biomarker for response to durvalumab, the authors of the current trial hypothesized that the addition of durvalumab after chemoradiotherapy would provide clinical benefit, likely mediated by upregulation of PD-L1. The results from this pre-planned interim analysis show a significant improvement in progression-free survival with the addition of durvalumab, with a 48% decrease in the risk of progression. This benefit was noted across all patient subgroups. In addition, responses to durvalumab were durable, with 73% of the patients who responded having an ongoing response at 18 months. Interestingly, the response to durvalumab was independent of PD-L1 expression, which is in contrast to previous studies showing PD-L1 expression to be a good biomarker for durvalumab response [3].
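The 48% figure follows directly from the reported hazard ratio. As a minimal sketch (the function name `relative_reduction` is illustrative and not from the trial report), the relative reduction in hazard is one minus the hazard ratio, and the same conversion can be applied to the confidence limits:

```python
def relative_reduction(hazard_ratio: float) -> float:
    """Return the relative reduction in hazard, in percent, implied by a hazard ratio."""
    return (1.0 - hazard_ratio) * 100.0


# PFS values reported in the interim analysis: HR 0.52 (95% CI 0.42-0.65).
hr, ci_low, ci_high = 0.52, 0.42, 0.65

point = relative_reduction(hr)
# A lower hazard ratio means a larger reduction, so the CI bounds swap places.
reduction_low = relative_reduction(ci_high)
reduction_high = relative_reduction(ci_low)

print(f"{point:.0f}% reduction (95% CI {reduction_low:.0f}% to {reduction_high:.0f}%)")
```

This is a relative measure of the hazard of progression or death; it does not by itself describe the absolute difference in PFS rates at a given time point.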

The results of the PACIFIC trial represent a clinically meaningful benefit and suggest an excellent option for patients with unresectable stage III NSCLC. One important point to highlight is that the addition of durvalumab was well tolerated and did not appear to significantly increase the rate of severe adverse events. Of particular interest are the similar rates of grade 3 or 4 pneumonitis, which were around 3% in each group. Overall survival data remain immature at the time of this analysis; however, given the acceptable toxicity profile and improved PFS, this combination should be considered for these patients in clinical practice. Trials are underway to evaluate the role of single-agent durvalumab in the front-line setting for NSCLC.

Applications for Clinical Practice

In patients with unresectable stage III NSCLC who have no evidence of disease progression following completion of chemoradiotherapy, the addition of durvalumab provided a significant and clinically meaningful improvement in progression-free survival without an increase in serious adverse events. While the overall survival data are immature, the 48% reduction in the risk of progression supports the incorporation of durvalumab into standard practice in this patient population.

—Daniel Isaac, DO, MS

References

1. Deng L, Liang H, Burnette B, et al. Irradiation and anti-PD-L1 treatment synergistically promote antitumor immunity in mice. J Clin Invest 2014;124:687–95.

2. Zhang P, Su DM, Liang M, Fu J. Chemopreventive agents induce programmed death-1-ligand 1 (PD-L1) surface expression in breast cancer cells and promote PD-L1 mediated T cell apoptosis. Mol Immunol 2008;45:1470–6.

3. Antonia SJ, Brahmer JR, Khleif S, et al. Phase 1/2 study of the safety and clinical activity of durvalumab in patients with non-small cell lung cancer (NSCLC). Presented at the 41st European Society for Medical Oncology Annual Meeting, Copenhagen, October 7–11, 2016.

Article PDF

JCOM02501007.PDF

Issue

Journal of Clinical Outcomes Management - 25(1)

Publications

Journal of Clinical Outcomes Management

Topics

Oncology

Sections

Outcomes Research in Review

Mepolizumab for Eosinophilic Chronic Obstructive Pulmonary Disease

Article Type

Article

Changed

Wed, 04/29/2020 - 11:30

Author(s)

Arun Jose, MD

Study Overview

Objective. To determine the effect of mepolizumab on the annual rate of chronic obstructive pulmonary disease (COPD) exacerbations in high-risk patients.

Design. Two randomized double-blind placebo-controlled parallel trials (METREO and METREX).

Setting and participants. Participants were recruited at more than 100 investigative sites in more than 15 countries. Inclusion criteria were adults (40 years or older) with a diagnosis of COPD for at least 1 year with: airflow limitation (FEV1/FVC < 0.7); some bronchodilator reversibility (post-bronchodilator FEV1 > 20% and ≤ 80% of predicted values); current COPD therapy for at least 3 months prior to enrollment (a high-dose inhaled corticosteroid, ICS, with at least 2 other classes of medications, to obtain “triple therapy”); and a high risk of exacerbations (at least 1 severe [requiring hospitalization] or 2 moderate [treatment with systemic corticosteroids and/or antibiotics] exacerbations in the past year).

Notable exclusion criteria were patients with diagnoses of asthma in never-smokers, alpha-1 antitrypsin deficiency, recent exacerbations (in past month), lung volume reduction surgery (in past year), eosinophilic or parasitic diseases, or those with recent monoclonal antibody treatment. Patients with the asthma-COPD overlap syndrome were included only if they had a history of smoking and met the COPD inclusion criteria listed above.

Intervention. The treatment period lasted for a total of 52 weeks, with an additional 8 weeks of follow-up. In the METREX study, patients were randomized 1:1 to placebo or low-dose medication (100 mg) using permuted-block randomization regardless of eosinophil count (but they were stratified at screening for a modified intention-to-treat analysis into either low eosinophil count [< 150 cells/uL] or high eosinophil count [≥ 150 cells/uL]). In the METREO study, patients were randomized 1:1:1 to placebo, low-dose (100 mg), or high-dose (300 mg) medication only if blood eosinophilia was present (≥ 150 cells/uL at screening or ≥ 300 cells/uL in the past 12 months). Investigators and patients were blinded to presence of drug or placebo. Sample size calculations indicated that in order to provide 90% power to detect a 30% decrease in the rate of exacerbations in METREX and a 35% decrease in METREO, a total of 800 patients and 660 patients would need to be enrolled in METREX and METREO, respectively. Both studies met their enrollment quota.

Main outcome measures. The primary outcome was the annual rate of exacerbations that were either moderate (requiring systemic corticosteroids and/or antibiotics) or severe (requiring hospitalization). Secondary outcomes included the time to first moderate/severe exacerbation, change from baseline in the COPD Assessment Test (CAT) and St. George’s Respiratory Questionnaire (SGRQ), and change from baseline in blood eosinophil count, FEV1, and FVC. Safety and adverse events endpoints were also assessed.

A modified intention-to-treat analysis was performed overall and, in the METREX study, stratified by eosinophil count at screening; all patients who underwent randomization and received at least one dose of medication or placebo were included in their respective group. Multiple comparisons were accounted for using the Benjamini-Hochberg procedure, exacerbations were assumed to follow a negative binomial distribution, and Cox proportional-hazards modeling was used to model the relationship between covariates of interest and the primary outcome.
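For readers unfamiliar with the Benjamini-Hochberg procedure, the following is a minimal sketch of its standard step-up rule. It is not the trial's analysis code: the function name, the false discovery rate of 0.05, and the p-values are all hypothetical and chosen only for illustration.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], fdr: float = 0.05) -> List[bool]:
    """Return a reject flag per hypothesis using the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    # Sort indices by ascending p-value so original positions are remembered.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, not taken from METREX or METREO.
example = [0.001, 0.008, 0.039, 0.041, 0.20]
decisions = benjamini_hochberg(example, fdr=0.05)
print(decisions)
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the ranking meets its threshold.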

Main results. In the METREX study, 1161 patients were enrolled and 836 underwent randomization and received at least 1 dose of medication or placebo. In METREO, 1071 patients were enrolled and 674 underwent randomization and received at least one dose of medication or placebo. In both studies the patients in the medication and placebo groups were well balanced at baseline across demographics (age, gender, smoking history, duration of COPD) and pulmonary function (FEV1, FVC, FEV1/FVC, CAT, SGRQ). In METREX, a total of 462 (55%) patients had an eosinophilic phenotype and 374 (45%) did not.

There was no difference between groups in the primary endpoint of annual exacerbation rate in METREO (1.49/yr in placebo vs. 1.19/yr in low-dose and 1.27/yr in high-dose mepolizumab, rate ratio of high-dose to placebo 0.86, 95% confidence interval [CI] 0.7–1.05, P = 0.14). There was no difference in the primary outcome in the overall intention-to-treat analysis in the METREX study (1.49/yr in mepolizumab vs. 1.52/yr in placebo, P > 0.99). Only when analyzing the high eosinophilic phenotype in the stratified intention-to-treat METREX group was there a significant difference in the primary outcome (1.41/yr in mepolizumab vs. 1.71/yr in placebo, P = 0.04, rate ratio 0.82, 95% CI 0.68–0.98).
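As a quick check on the arithmetic, a crude rate ratio can be formed by dividing the annual exacerbation rates quoted above. This is only a sketch: the function name is illustrative, and the published rate ratios come from the trials' negative binomial models, so a crude ratio may differ slightly from the reported value (as it does for the METREO high-dose comparison, reported as 0.86).

```python
def crude_rate_ratio(treated_rate: float, control_rate: float) -> float:
    """Ratio of two annualized exacerbation rates (treated / control)."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return treated_rate / control_rate


# Annual rates quoted in the results above (exacerbations per year).
metrex_eos = crude_rate_ratio(1.41, 1.71)   # METREX, eosinophilic phenotype
metreo_high = crude_rate_ratio(1.27, 1.49)  # METREO, high-dose vs placebo

print(f"METREX eosinophilic phenotype: {metrex_eos:.2f}")
print(f"METREO high-dose vs placebo:   {metreo_high:.2f}")
```

A ratio below 1 favors mepolizumab; whether the difference is statistically significant depends on the confidence interval, which cannot be recovered from the two rates alone.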

There were no significant differences in any secondary endpoint in the METREO study. In the METREX study, mepolizumab treatment resulted in a significantly longer time to first exacerbation (192 days vs. 141 days, hazard ratio 0.75, 95% CI 0.60–0.94, P = 0.04) but no difference in the change in SGRQ (–2.8 vs. –3.0, P > 0.99) or CAT score (–0.8 vs. 0, P > 0.99). There was no significant difference in any measures of pulmonary function between the treatment and placebo groups (FEV1, FVC, FEV1/FVC). As expected, there was a significant decrease in peripheral blood eosinophil count in both studies in the medication arm. The incidence of adverse events and safety endpoints were similar between the trial groups in METREX and METREO.

Conclusions. In this pair of placebo-controlled double-blind randomized parallel studies, there was a significant decline in annual exacerbation rate in patients with an eosinophilic phenotype treated with mepolizumab in a stratified intention-to-treat analysis of one of two parallel studies (METREX). However, there was no significant difference in the primary outcome of the other parallel study (METREO), which included only those patients with an eosinophilic phenotype. Additionally, there was no significant difference in secondary endpoints in either study, apart from a longer time to first exacerbation in METREX. The medication was generally safe and well tolerated.

Commentary

Mepolizumab is a humanized monoclonal antibody that targets and blocks interleukin-5, a key mediator of eosinophilic activity. Due to its ability to decrease eosinophil number and function, it is currently approved as a therapy for severe asthma with an eosinophilic phenotype [1]. While asthma and COPD have historically been thought of as separate entities with distinct pathophysiologic mechanisms, recent evidence has suggested that a subset of COPD patients experience significant eosinophilic inflammation. This group may behave more like asthmatic patients, and may have a different response to medications such as inhaled corticosteroids, but the role of eosinophils to guide prognostication and treatment in this group is still unclear [2,3].

In this study, Pavord and colleagues investigated the use of the anti-IL5 drug mepolizumab in COPD patients at risk of exacerbations who demonstrated an eosinophilic phenotype. The physiologic rationale for the study was that eosinophilic inflammation is thought to be a driver of exacerbations in COPD patients with an eosinophilic phenotype, and therefore a decrease in eosinophilic number and function should result in a decrease in exacerbations. The authors conducted a well-designed placebo-controlled double-blind study with a clearly defined endpoint, met their enrollment goals as determined by their power calculations, and used COPD patients at high risk of exacerbations to enrich their study.

There was no difference in the primary outcome in the METREO arm of the study, which included patients with baseline eosinophilia (≥ 150 cells/uL), or in the overall intention-to-treat analysis in METREX (which did not screen patients on baseline eosinophil count). Only when stratified on baseline eosinophil count in the METREX study was a significant treatment effect found, where patients with a high eosinophil count at baseline (≥ 150 cells/uL) had a decreased risk of exacerbations when treated with mepolizumab. Notably, there was no difference in any secondary outcome in METREO or in METREX aside from a longer time to first exacerbation in METREX in the mepolizumab group. The authors use these data to conclude that mepolizumab treatment results in a lower rate of exacerbations and a longer time to the first exacerbation in COPD patients with an eosinophilic phenotype, and that the extent of the treatment effect is related to blood eosinophil counts.

The authors conducted a well-designed and rigorous study, and used robust and appropriate statistical analysis; however, significant questions remain regarding their conclusions. The primary concern is that the role of mepolizumab in decreasing exacerbations in COPD patients may be overstated. When including only those with baseline eosinophilia in the METREO arm, there was no significant difference between placebo and low or high dose of mepolizumab; however, there was an appropriate and expected decrease in blood eosinophils, indicating the medication worked as intended. In the overall intention-to-treat analysis in the METREX arm, there was no difference between mepolizumab and placebo, and only in the analysis of METREX stratified by eosinophil count was there a significant difference (with the upper bound of the rate ratio confidence interval [0.98] approaching unity).

Additionally, there was no significant difference between the 2 groups across a number of clinically important secondary endpoints, including pulmonary function measurements and symptom scores. Only the time to first exacerbation was significantly longer in the mepolizumab group in METREX.

Taken together, this calls into question the conclusion that a decrease in eosinophil counts due to mepolizumab has resulted in a lower rate of exacerbations, particularly as a higher dose of mepolizumab did not result in a stronger effect. The lack of difference between groups in secondary endpoints is also concerning, as those would be expected to improve with a decrease in exacerbations [4,5]. As the authors point out, their evidence suggests that eosinophils may be an important biomarker in COPD and may aid in the therapeutic decision-making process. However, given the inconsistencies in the data as noted above, it would be difficult to rely on the evidence from this study alone to support their conclusion regarding the clinical utility of mepolizumab in COPD.

The authors discuss a number of limitations that may account for the lack of consistent effect seen in this study. Aside from the standard limitations applicable to any clinical trial, they note the potential confounding effect of previous oral glucocorticoid therapy in reducing eosinophil counts. This may have masked the eosinophilic phenotype in some study patients, leading to the attenuated effect of mepolizumab seen in this study.

The authors also note that information that might be valuable for identifying treatment responders, such as a history of allergies and atopy, was not available. Inclusion of those patients may be helpful in enriching the trial with potential treatment-responders, and future studies may benefit from focusing on COPD patients with a more atopic phenotype who more closely resemble those with the asthma-COPD overlap syndrome.

A final limitation to discuss is the focus on blood eosinophil counts. Due to the difficulty of measuring sputum eosinophils, and the reasonable degree of correlation between blood and sputum in asthmatic patients, blood eosinophils have largely supplanted sputum eosinophils as markers of TH2 CD4 T-cell activity in the pulmonary system [6]. This substitution is also used in the COPD population; however, because of the differences in pathophysiology, it is unclear whether eosinophils in asthmatic patients behave similarly to those in COPD patients [7]. Additionally, the cutoff of 150 cells/uL has been obtained primarily from sub-group analysis of previous studies on COPD patients, but it is unclear if this cutoff truly reflects elevated sputum eosinophilia. While there is likely some degree of correlation between blood and sputum eosinophilia in COPD patients, the lack of significant effect seen in this study may be due to an incorrect cutoff for elevated eosinophilia and a reliance on blood eosinophils over sputum counts. Further studies utilizing sputum eosinophils may be of value in addressing this limitation.

Applications for Clinical Practice

In this study, Pavord and colleagues found a potential benefit of mepolizumab treatment for reducing exacerbations in COPD patients with an eosinophilic phenotype. The conflicting results regarding the underlying physiology and the weak treatment effect suggest this medication may not be ready for use in clinical practice without additional supporting evidence. From a practical standpoint, the high cost of medication (~$2500 per month) and marginal benefit of treatment imply that treatment with mepolizumab in COPD patients may not be cost-effective, and even treatment in individual patients on a trial basis should be discouraged until additional supporting data become available. Of primary concern are the optimal selection of COPD patients who will benefit from mepolizumab treatment, and the optimal dose of medication to achieve that benefit. The results presented here do not satisfactorily answer these questions, and additional studies are required.

—Arun Jose, MD, The George Washington University, Washington, DC

References

1. Pelaia C, Vatrella A, Busceti MT, et al. Severe eosinophilic asthma: from the pathogenic role of interleukin-5 to the therapeutic action of mepolizumab. Drug Des Devel Ther 2017;11:3137–44.

2. Kim VL, Coombs NA, Staples KJ, et al. Impact and associations of eosinophilic inflammation in COPD: analysis of the AERIS cohort. Eur Respir J 2017;50:pii:1700853.

3. Roche N, Chapman KR, Vogelmeier CF, et al. Blood eosinophils and response to maintenance chronic obstructive pulmonary disease treatment. Data from the FLAME trial. Am J Respir Crit Care Med 2017;195:1189–97.

4. Halpin DMG, Decramer M, Celli BR, et al. Effect of a single exacerbation on decline in lung function in COPD. Respir Med 2017;128:85–91.

5. Rassouli F, Baty F, Stolz D, et al. Longitudinal change of COPD assessment test (CAT) in a telehealthcare cohort is associated with exacerbation risk. Int J COPD 2017;12:3103–9.

6. Gauthier M, Ray A, Wenzel SE. Evolving concepts of asthma. Am J Respir Crit Care Med 2015;192:660–8.

7. Negewo NA, McDonald VM, Baines KJ, et al. Peripheral blood eosinophils: a surrogate marker for airway eosinophilia in stable COPD. Int J COPD 2016;11:1495–504.

Article PDF

JCOM02501007.PDF

Issue

Journal of Clinical Outcomes Management - 25(1)

Publications

Journal of Clinical Outcomes Management

Topics

Pulmonology

Sections

Outcomes Research in Review

Author(s)

Author(s)

Article PDF

Article PDF

Study Overview

Objective. To determine the effect of mepolizumab on the annual rate of chronic obstructive pulmonary disease (COPD) exacerbations in high-risk patients.

Design. Two randomized double-blind placebo-controlled parallel trials (METREO and METREX).

Commentary

Applications for Clinical Practice

—Arun Jose, MD, The George Washington University, Washington, DC

Study Overview

Objective. To determine the effect of mepolizumab on the annual rate of chronic obstructive pulmonary disease (COPD) exacerbations in high-risk patients.

Design. Two randomized double-blind placebo-controlled parallel trials (METREO and METREX).

Commentary

Applications for Clinical Practice

—Arun Jose, MD, The George Washington University, Washington, DC

References

1. Pelaia C, Vatrella A, Busceti MT, et al. Severe eosinophilic asthma: from the pathogenic role of interleukin-5 to the therapeutic action of mepolizumab. Drug Des Devel Ther 2017;11:3137–44.

2. Kim VL, Coombs NA, Staples KJ, et al. Impact and associations of eosinophilic inflammation in COPD: analysis of the AERIS cohort. Eur Respir J 2017;50:pii:1700853.

4. Halpin DMG, Decramer M, Celli BR, et al. Effect of a single exacerbation on decline in lung function in COPD. Respir Med 2017;128:85–91.

5. Rassouli F, Baty F, Stolz D, et al. Longitudinal change of COPD assessment test (CAT in a telehealthcare cohort is associated with exacerbation risk. Int J COPD 2017;12:3103–9.

6. Gauthier M, Ray A, Wenzel SE. Evolving concepts of asthma. Am J Respir Crit Care Med 2015;192:660–8.

7. Negewo NA, McDonald VM, Baines KJ, et al. Peripheral blood eosinophils: a surrogate marker for airway eosinophilia in stable COPD. Int J COPD 2016;11:1495–504.

References

1. Pelaia C, Vatrella A, Busceti MT, et al. Severe eosinophilic asthma: from the pathogenic role of interleukin-5 to the therapeutic action of mepolizumab. Drug Des Devel Ther 2017;11:3137–44.

2. Kim VL, Coombs NA, Staples KJ, et al. Impact and associations of eosinophilic inflammation in COPD: analysis of the AERIS cohort. Eur Respir J 2017;50:pii:1700853.

4. Halpin DMG, Decramer M, Celli BR, et al. Effect of a single exacerbation on decline in lung function in COPD. Respir Med 2017;128:85–91.

5. Rassouli F, Baty F, Stolz D, et al. Longitudinal change of COPD assessment test (CAT in a telehealthcare cohort is associated with exacerbation risk. Int J COPD 2017;12:3103–9.

6. Gauthier M, Ray A, Wenzel SE. Evolving concepts of asthma. Am J Respir Crit Care Med 2015;192:660–8.

7. Negewo NA, McDonald VM, Baines KJ, et al. Peripheral blood eosinophils: a surrogate marker for airway eosinophilia in stable COPD. Int J COPD 2016;11:1495–504.

Issue

Journal of Clinical Outcomes Management - 25(1)

Issue

Journal of Clinical Outcomes Management - 25(1)

Publications

Journal of Clinical Outcomes Management

Publications

Journal of Clinical Outcomes Management

Topics

Pulmonology

Article Type

Article

Sections

Outcomes Research in Review

Disallow All Ads

Content Gating

No Gating (article Unlocked/Free)

Alternative CME

Disqus Comments

Default

Consolidated Pubs: Do Not Show Source Publication Logo

Use ProPublica

Article PDF Media

JCOM02501007.PDF

Teambase ID

18000D00.SIG

Teens with PID underscreened for HIV, syphilis

Article Type

News

Changed

Fri, 01/18/2019 - 17:18

Author(s)

Kari Oakes

CHICAGO – Adolescents with pelvic inflammatory disease (PID) were unlikely to be screened for HIV or syphilis, and many didn’t receive an appropriate antibiotic regimen, according to a recent study reported at the annual meeting of the American Academy of Pediatrics.

Patients who were sent home rather than admitted were especially likely to miss screening, as were Hispanic patients and those with private insurance.

Although PID is known to be associated with an increased risk for HIV and syphilis, fewer than one in three PID patients aged 12-21 years received the tests. Over 80% of patients were tested for gonorrhea and chlamydia, and a similar number received a pregnancy test, according to cross-sectional data drawn from a national database over a 5-year period.

The Centers for Disease Control and Prevention strongly recommends that all women diagnosed with PID be tested for HIV, and that high-risk individuals also be tested for syphilis, wrote Amanda Jichlinski, MD, and her coauthors at Children’s National Health System, Washington.

The study, presented during a poster session, used data from the national Pediatric Health Information System database from 2010 to 2015. A total of 10,698 records with a diagnostic code for PID were included; patients were females aged 12-21 years seen in a pediatric emergency department.

In addition to the primary outcome of syphilis and HIV testing, the authors also looked at whether antibiotic administration for PID was in line with CDC recommendations – and it wasn’t. “Fewer than half of patients in the ED received antibiotic regimens adherent to CDC guidelines,” wrote Dr. Jichlinski and her coauthors.

Forty-six percent of patients received ceftriaxone and doxycycline, 21% received ceftriaxone and azithromycin, and 6% received ceftriaxone and metronidazole. Ceftriaxone monotherapy was given to 15% of patients. One in 10 patients with a PID diagnosis received no antibiotic at all; 2% of patients received some other regimen.

The researchers used multivariable analysis to examine separately which patient and hospital characteristics were associated with an increased likelihood of testing for both HIV and syphilis. With white, non-Hispanic adolescents used as the referent, Hispanic females with PID were less likely to receive screening for either HIV or syphilis (adjusted odds ratio, 0.8 for both; 95% confidence interval, 0.7-1.0 for both).

In contrast, black non-Hispanic females were screened more often; the aOR for HIV screening was 1.4 (95% CI, 1.2-1.6), and the aOR for syphilis screening was 1.8 (95% CI, 1.6-2.0) for this group of adolescents.

Patients were dichotomized into older (17-21 years of age; n = 4,737, 44%) and younger (12-16 years of age; n = 5,961, 56%) age groups; younger patients were slightly more likely to receive HIV (aOR, 1.2) and syphilis (aOR, 1.1) screening.

Just under a third of patients in the study were seen in a hospital with fewer than 300 beds, and these facilities were more likely to screen for HIV (aOR, 1.4) and syphilis (aOR, 1.1) than the larger hospitals.

Over two-thirds of patients had public insurance, and these females also were more likely to be screened for HIV and syphilis than patients with private insurance (aOR, 1.3 and 1.4, respectively). Being uninsured further upped the odds for screening to an aOR of 1.5 for HIV and 1.6 for syphilis, compared with privately insured patients.

By far the largest predictor of whether HIV and syphilis screening was done, though, was a hospital admission. Patients who were admitted (n = 4,043, 38%) were 7 times more likely to be screened for HIV and 4.6 times more likely to be screened for syphilis than those who were sent home from the emergency department.

Although the large, nationally representative study had many strengths, Dr. Jichlinski and her coauthors acknowledged that the data they were provided couldn’t account for medication that was prescribed, rather than administered in the emergency department. Also, the results may not be generalizable to adolescents treated in nonpediatric emergency departments or other facilities, such as urgent care centers.

“Adolescents with PID are underscreened for HIV and syphilis,” wrote Dr. Jichlinski and her coauthors. They called for pediatricians to receive more education about management of PID in adolescents. From a practical perspective, the investigators also suggested incorporating order sets for sexually transmitted infection testing and antibiotic administration into electronic medical records; in this way, a PID diagnosis code would trigger simplified testing and treatment choices.

Dr. Jichlinski reported no conflicts of interest. Monika Goyal, MD, senior author on the study, reported funding support by the National Institute of Child Health and Human Development. Dr. Goyal also holds an appointment at the George Washington University, Washington.

koakes@frontlinemedcom.com

SOURCE: Jichlinski A et al. AAP 2017 Abstract 5, AAP Section on Emergency Medicine.

Meeting/Event

American Academy of Pediatrics (AAP): 2017 National Conference and Exhibition

Publications

Pediatric News

Family Practice News

Infectious Disease Practitioner

Ob.Gyn. News

MDedge ObGyn

MDedge Infectious Disease

MDedge Pediatrics

MDedge Family Medicine

Patients who were sent home rather than admitted were especially likely to miss screening, as were Hispanic patients and those with private insurance.

REPORTING FROM AAP 2017

Vitals

Key clinical point: Fewer than one in three adolescent patients with pelvic inflammatory disease in a national dataset received appropriate STD screening.

Major finding: Hispanic females were least likely to be screened (adjusted OR, 0.8), compared with non-Hispanic white females.

Study details: Retrospective study of 10,698 adolescent patients with PID from a national database.

Disclosures: The study was funded in part by the National Institute of Child Health and Human Development. The authors had no relevant financial disclosures.

Source: Jichlinski A et al. AAP 2017 Abstract 5, AAP Section on Emergency Medicine

Simple Patient Care Instructions Translate Best: Safety Guidelines for Physician Use of Google Translate

Article Type

News

Changed

Wed, 04/29/2020 - 11:35

Author(s)

Joseph M. Miller, MD, MPH

Erin M. Harvey, PhD

Steven Bedrick, PhD

Prashanthinie Mohan, MBA

Elizabeth Calhoun, MEd, PhD

on behalf of the Clinical Machine Translation Study Group of Banner University Medicine and the University of Arizona College of Medicine – Tucson

From the University of Arizona College of Medicine – Tucson, Tucson, AZ.

Abstract

Objective: To determine predictors of quality and safety of machine translation (Google Translate) of patient care instructions (PCIs), and to determine if machine back translation is useful in quality assessment.
Methods: 100 sample English PCIs were contributed by 88 clinical faculty. Each example PCI was up to 3 sentences of typical patient instruction that might be included in an after visit summary. Google Translate was used to first translate the English to Spanish, then back to English. A panel of 6 English/Spanish translators assessed the Spanish translations for safety and quality. A panel of 6 English-speaking health care workers assessed the back translation. A 5-point scale was used to assess quality. Safety was assessed as safe or unsafe.
Results: Google Translate was usually (> 90%) capable of safe and comprehensible translation from English to Spanish. Instructions with increased complexity, especially regarding medications, were prone to unsafe translation. Back translation was not reliable in detecting unsafe Spanish.
Conclusion: Google Translate is a continuously evolving resource for clinicians that offers the promise of improved physician-patient communication. Simple declarative sentences are most reliably translated with high quality and safety.

Keywords: translation; machine translation; electronic health record; after-visit summary; patient safety; physician-patient communication.

A core measure of the meaningful use of electronic health records incentive program is the generation and provision of the after visit summary (AVS), a mechanism for physicians to provide patients with a written summary of the patient encounter [1,2]. Although not a required element for meaningful use, free text patient care instructions (PCIs) provide the physician an opportunity to improve patient engagement either at the time of service or through the patient portal [3] by providing a short written summary of the key points of the office visit based upon the visit’s clinical discussion. For patients who do not speak English, a verbal translation service is required [4], but seldom are specific patient instructions provided in writing in the patient’s preferred language. A mechanism to improve communication might be through translation of the PCI into the patient’s preferred language. Spanish is the most common language, other than English, spoken at home in the United States [5,6]. For this reason, we chose to investigate if it is feasible to use machine translation (Google Translate) to safely and reliably translate a variety of PCIs from English to Spanish, and to assess the types of translation errors and ambiguities that might result in unsafe communication. We further investigate if machine back translation might allow the author of patient care instructions to evaluate the quality of the Spanish machine translation.

There is evidence to suggest that patient communication and satisfaction will improve if portions of the AVS are communicated in Spanish to primarily Spanish-speaking patients. Pavlik et al conducted a randomized controlled trial on the association of patient recall, satisfaction, and adherence to the information communicated in an AVS, in a largely Hispanic (61%) primary care clinic setting [7]. The AVS was provided in English. They noted that Spanish speakers wished to receive information in Spanish, although most had access to translation by a family member. They also noted that a lack of ability to provide an AVS in Spanish was a concern among providers. There was no difference in recall or satisfaction between English and Spanish speakers with respect to medications and allergies, suggesting that not all portions of the AVS might need to be translated.

Machine translation refers to the automated process of translating one language to another. The most recent methods of machine translation, as exemplified by Google Translate (Google Inc., Mountain View, CA), do not use rules of grammar and dictionaries to perform translations but instead use artificial neural networks to learn from “millions of examples” of translation [8]. However, unsupervised machine translation can result in serious errors [9]. Patil and Davies give as an example a serious error of translation from English (“Your child is fitting”) to Swahili (“Your child is dead”). In British parlance, “fitting” is a term for “having a seizure” and represents an example of a term that is context sensitive. However, others note that there is reason to be optimistic about the state of machine translation for biomedical text [10].

One method of assessing translation quality is through back translation, where one translator takes the author’s work into the desired target language, and then a different translator takes the target language back to the language of the author. Like the children’s game Chinese Whispers (Telephone in the United States) [11], where a “secret message” is whispered from one child to the next and spoken aloud at the end of the line of children, back translation can test to see if a message “gets through.” In this analogy, when information is machine translated from English to Spanish, and then machine translated from Spanish to English (Figure), we can compare the initial message to the final translation to see if the message “gets through.” We further investigate if machine back translation might allow a non-Spanish speaking author of PCIs to evaluate the quality of the Spanish translation.

Our intention was to determine if machine back translation [12] could be used by an English-only author to assess the quality of an intermediate Spanish translation. If poorly worded Spanish translated back into poorly worded English, the author might choose to either refine their original message until an acceptable machine back translation was achieved or to not release the Spanish translation to the patient. We were also concerned that there might be instances where the intermediate Spanish was unacceptable, but when translated back into English by machine translation, relatively acceptable English might result. If this were the case, then back translation would fail to detect a relatively poor intermediate Spanish translation.

Methods

Patient Care Instructions

Original English PCIs

Example original English PCIs were solicited from the clinical faculty and resident staff of the University of Arizona College of Medicine by an email-based survey tool (Qualtrics, Inc, Provo UT). The solicitation stated the following:

We are conducting a study to assess how well Google Translate might perform in translating patient instructions from English to Spanish. Would you please take the time to type three sentences that might comprise a typical “nugget” of patient instruction using language that you would typically include in an After Visit Summary for a patient? An example might be: “Take two Tylenol 325 mg tablets every four hours while awake for the next two days. If you have a sudden increase in pain or fever, or begin vomiting, call our office. Drink plenty of fluids.”

A total of 100 PCIs were collected. The breadth of the clinical practice and writing styles of a College of Medicine faculty are represented: not all were completely clear or were well-formed sentences, but did represent examples provided by busy clinicians of typical language that they would provide in an AVS PCI.

Machine Translation into Spanish

The 100 original English (OE) PCIs were submitted to the Google Translate web interface (https://translate.google.com/) by cutting and pasting and selecting “Spanish,” resulting in machine Spanish. The translations were performed in January 2016. No specific version number is provided by Google on their web page, and the service is described to be constantly evolving (https://translate.google.com/about/intl/en_ALL/contribute.html).

Machine Back Translation into English (MBTE)

Google Translate was then used to translate the machine Spanish back into English. MBTE represents the content that a monolingual English speaker might use to evaluate the machine Spanish.

Ratings of Translation Quality and Safety

Two panels of 6 raters evaluated machine Spanish and MBTE quality and safety. A bilingual English/Spanish speaking panel simultaneously evaluated the machine Spanish and MBTE compared to OE, with the goal of inferring where in the process an undesirable back translation error occurred. Bilingual raters were experienced bilingual clinicians or certified translators. A monolingual English speaking panel also evaluated the MBTE (compared to OE). They could only infer the quality and safety of the machine Spanish indirectly through inspection of MBTE, and their assessment was free of the potential bias of knowledge of the intermediate Spanish translation.

The raters used Likert scales to rate grammar similarity and content similarity (scale from 1 to 5: 1 = very dissimilar, 5 = identical). For each PCI, grammar and content scores for each rater were summed and then divided by 10 to yield a within-rater quality score ranging from 0 to 1. A panel-level (bilingual or monolingual) quality score was calculated by averaging the quality scores across raters.

Safety of translation was rated as 0, or Safe (“While the translation may be awkward, it is not dangerous”), or 1, or Unsafe (“A dangerous translation error is present that might cause harm to the patient if instructions were followed”). If any panel member considered an item to be unsafe, the item as a whole was scored as unsafe.
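The scoring rules described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as stated in the text, not the authors' analysis code; the function names are hypothetical. Note that with Likert ratings of 1 to 5, the smallest attainable within-rater score is 0.2.

```python
from statistics import mean

def rater_quality(grammar: int, content: int) -> float:
    """Within-rater quality: (grammar + content) / 10, each rated 1 to 5."""
    for score in (grammar, content):
        if not 1 <= score <= 5:
            raise ValueError("Likert ratings must be between 1 and 5")
    return (grammar + content) / 10

def panel_quality(ratings: list[tuple[int, int]]) -> float:
    """Panel-level quality: mean of the within-rater quality scores.

    `ratings` holds one (grammar, content) pair per rater.
    """
    return mean(rater_quality(g, c) for g, c in ratings)

def item_unsafe(flags: list[int]) -> int:
    """Return 1 (unsafe) if any rater flagged the item as unsafe (1), else 0."""
    return int(any(flags))
```

For example, `panel_quality` would be called once per PCI with the ratings of one panel, and `item_unsafe` with that panel's 0/1 safety flags for the same PCI.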

Data Analysis

Descriptive Summary of PCI Contributions

The 100 PCIs were summarized in terms of volume (word count), complexity (Flesch-Kincaid Grade Level index [13]), and content (medication names, references, formatting) (Table 1). Word count and grade level were calculated using Microsoft Word (Microsoft Corp, Redmond WA).

Safety Analysis

Concordance analysis. A safety translation concern as defined in this study (“might cause harm”) is very subjective. To reduce some of the variation in assessment of safety, we identified 4 members of the bilingual panel whose safety assessments of MBTE were most similar to the most concordant 4 monolingual raters’ assessment of MBTE safety. The goal was to select the bilingual panel of 4 that was most “typical” of the behavior of a “typical” monolingual individual with respect to assessing the safety of an individual MBTE translation. We then used this bilingual panel to identify 2 sets of “unsafe” machine Spanish and MBTE PCI translations: PCIs where ANY of the 4 bilingual raters identified a safety concern in machine Spanish or MBTE, and PCIs where MOST (at least 3) of the 4 bilingual raters agreed that the PCI translation was “unsafe”.

An expansion of Cohen’s kappa was used to identify the most concordant pairing of 4 bilingual panel members and 4 monolingual panel members [14]. All pairwise comparisons of monolingual and bilingual panel members were coded as follows: +1 was scored when 2 raters were concordant (both scored safe or both scored unsafe) and –1 was scored for discordant pairs. For each of the 225 possible pairings of 4 panel members (15 combinations of 4 of 6 bilingual raters by 15 combinations of 4 of 6 monolingual raters), each PCI item yielded 16 rater-pair comparisons, so the score for each of the 100 PCI items ranged from +16 (absolute agreement of the 2 panels of 4) to –16 (absolute discordance). For each pairing, we summed the scores for the 100 PCIs to determine the most concordant 4 monolingual and 4 bilingual raters (highest summed scores), which were then used for all subsequent analyses of safety and quality.
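The pairing search described above can be illustrated with a short Python sketch. It is a reconstruction from the description in the text, not the authors' code; the function names and the input layout (one list of 0/1 safety flags per rater) are assumptions made for the example.

```python
from itertools import combinations

def pair_score(bi: list[int], mono: list[int]) -> int:
    """Score one PCI item: +1 per concordant rater pair, -1 per discordant pair."""
    return sum(1 if b == m else -1 for b in bi for m in mono)

def most_concordant(bilingual: list[list[int]],
                    monolingual: list[list[int]],
                    k: int = 4):
    """Find the k-by-k rater pairing with the highest summed score.

    bilingual[r][i] is bilingual rater r's 0/1 safety flag for item i
    (assumed layout); monolingual is laid out the same way.
    Returns (summed score, bilingual rater indices, monolingual rater indices).
    """
    n_items = len(bilingual[0])
    best = None
    for bi_idx in combinations(range(len(bilingual)), k):
        for mono_idx in combinations(range(len(monolingual)), k):
            total = sum(
                pair_score([bilingual[r][i] for r in bi_idx],
                           [monolingual[r][i] for r in mono_idx])
                for i in range(n_items)
            )
            # Keep the first pairing that reaches the highest total.
            if best is None or total > best[0]:
                best = (total, bi_idx, mono_idx)
    return best
```

With 6 raters per panel and k = 4, the two nested loops visit 15 × 15 = 225 pairings, matching the count given in the text. Ties are resolved here by keeping the first pairing found, which is a choice made for this sketch.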

Original English characteristics of unsafe translation. A logistic regression was performed with safety as the dependent variable (safe/unsafe defined by bilingual raters) with explanatory variables of word count, grade level, and reference to medication in OE.

Quality Assessment

Bilingual and monolingual raters’ assessments of translation quality. We assessed the correlation between the bilingual quality ratings of machine Spanish vs. MBTE and conducted paired t tests comparing mean bilingual machine Spanish and MBTE ratings. High correlation and absence of a significant difference in means would support the notion that MBTE could be used to reliably assess machine Spanish quality.

We also assessed the correlation between bilingual quality assessments of machine Spanish vs. monolingual raters’ assessments of MBTE, and conducted paired comparison t tests comparing bilingual machine Spanish and monolingual MBTE quality ratings. These analyses assess the ability of an English-only reader of MBTE to predict the quality of machine Spanish, as determined by a bilingual rater. High correlation and absence of a significant difference in means would support the notion that MBTE could be used by an English-only speaker to reliably assess machine Spanish quality.
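The two statistics named in these comparisons, a correlation coefficient and a paired t statistic, can be computed as in the minimal Python sketch below. The function names are hypothetical and the sketch is not the authors' analysis code. It returns only the statistics; obtaining a P value requires the t distribution with n − 1 degrees of freedom, for which a statistics package (for example, `scipy.stats.ttest_rel` and `scipy.stats.pearsonr`) would normally be used.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of quality scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def paired_t(x: list[float], y: list[float]) -> float:
    """Paired t statistic: mean difference divided by its standard error."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))
```

Here `x` and `y` would hold the per-PCI quality scores being compared, for instance bilingual ratings of machine Spanish and monolingual ratings of MBTE, in the same PCI order. Squaring `pearson_r` gives an R² value of the kind reported in the Results.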

Associations between original English content and translation quality. Objective measures of original English were correlated via stepwise linear regression with bilingual assessment of machine Spanish quality.

Results

PCI Contributions

Example PCIs were contributed by 88 individuals and are summarized in Table 1. The 100 original English PCIs and the machine Spanish and MBTE translations obtained via Google Translate are available from the authors upon request.

Safety

Concordance Analysis

The 6 monolingual and bilingual raters agreed on the safety of 73 MBTE PCIs. The most concordant pairings of 4 agreed on 81 items. The least and most concordant pairings had concordance values of 0.68 and 0.84, respectively. Subsequent analyses include data from only the 4 most concordant monolingual and bilingual raters.

Bilingual and Monolingual Safety Ratings

Both bilingual and monolingual raters assessed MBTE. On average, bilingual ratings of MBTE safety were higher (0.987) than monolingual ratings (0.925) (t = –3.897, P = 0.0002).

Identification of Unsafe Translations in Machine Spanish and MBTE

The bilingual panel identified 11 translations (either machine Spanish or MBTE) as unsafe: the machine Spanish translation was unsafe for 9 items and the MBTE was unsafe for 5 items, with some items identified as unsafe in terms of both machine Spanish and MBTE. The original English, machine Spanish, and MBTE for these PCIs are listed in Table 2. One item (#93) revealed a machine Spanish drug dosing ambiguity that was not present in the MBTE, with safety concern expressed by 3 of 4 bilingual raters.

Original English Characteristics of Unsafe Translation

A stepwise logistic regression was performed to evaluate whether characteristics of the original English text predicted the PCI being judged as having a safe or unsafe machine Spanish translation. The explanatory variables (listed in Table 1) evaluated were word count, reading grade level, inclusion of reference to a specific medication, inclusion of numbers (as in "take 2 tablets"), and inclusion of numbered statements (as in "1. Call if your cough worsens"). The stepwise selection procedure dropped number references and numbered sentences, although post hoc analysis showed that number references and medication references occurred so commonly together that they were essentially interchangeable. The final regression model included word count, reading grade level, and medication reference. The significant factors of reading grade level and medication reference had odds ratio (95% confidence interval) of 1.12 (1.01 to 1.41) and 4.91 (1.07 to 22.7) respectively (P = 0.042 each). As reading grade level includes word count per sentence and syllable count per word as linear predictors, the inclusion of word count in the model is likely to increase the discrimination of complex words of many syllables in predicting the occurrence of unsafe machine Spanish.

Quality

Bilingual and Monolingual Raters’ Assessments of Quality

The bilingual evaluators found similar mean quality for machine Spanish (mean 0.855, SD 0.0859) and MBTE (mean 0.857, SD 0.0755) (P = 0.811). However, the correlation of R² = 0.355 (P < 0.001) suggests that despite similarity in mean ratings, a good forward translation from original English to machine Spanish did not assure a good back translation from machine Spanish to MBTE. No difference in mean MBTE quality was identified between bilingual (0.857, SD 0.0754) and monolingual (0.852, SD 0.126) raters (P = 0.598), with correlation R² = 0.565 (P < 0.001).

Discussion

In this article, we have collected a corpus of example PCIs across a large number of authors, and investigated how well Google Translate was able to translate the example instructions first to Spanish, and then back again to English. We learned that one cannot always spot a problem in the intermediate Spanish by inspection of the back-translated English. We also learned that simple sentences were least likely to be associated with troublesome translations, and that specific instructions about medication usage should probably be approached with great care.

We learned that some authors readily use simple language (eg: “Have your blood work drawn in the lab in the next two weeks,” reading level 1.2) while others gravitate to very complex language (“If you develop headache, chest pain, abdominal pain or back pain, or if you have any spontaneous bleeding please go to the emergency department, advise them that you were recently treated for rattlesnake envenomation and have them call the poison center,” reading level 20.2).

The development in confidence in machine translation can be compared to development of self-driving cars. At early stages of development, the self-driving cars had drivers with a foot near the brake and hands near the steering wheel, ready to take over at any instant. Now, after much data has been collected, there is evidence that the machine may operate more predictably and safely than some human drivers [15,16]. Should the self-driving cars always have an operator behind the wheel, supervising the function of the software, and ready to take over at any instant, or is the purpose of the self-driving car to allow non-drivers to be transported in an automobile that they either cannot operate or choose not to operate at that time?

The benefit of using professional interpreters in communicating clinically significant data is unquestioned, especially when compared to ad-hoc interpreters who lack professional understanding of context [4]. Like a good human driver (as compared to a self-driving car that is operated by a program that is still learning), a qualified human translator will outperform machine translation in complex tasks. Similarly, for relatively simple translations that are meant to be generated by human speakers to be understood by individuals with a grammar school education and vocabulary, is the state of machine translation such that less human translation is now required?

Our use of 2 teams of evaluators allowed us to use the game of Telephone analogy to provide insight into how well the machine translation proceeded, first to Spanish, then back to English. Mostly (90 times in 100), an acceptable Spanish translation resulted in an acceptable English back translation. In 2 instances (Samples 7 and 32), the first translation into Spanish was unacceptable, and a subsequent translation back to English was also unacceptable, as might be expected. In 2 instances (Samples 60 and 92), the Spanish translation was acceptable, but the translation back to English was unacceptable. The rules of Telephone worked 94 times in 100.

Still, 6 times in 100, the unexpected occurred, where a relatively poor Spanish translation returned a relatively acceptable English back translation. The rules of Telephone were not followed. The Spanish in the middle was garbled, but became acceptable when translated back to English. A fluent Spanish speaker found the intermediate Spanish to be of concern, and the back translation did not identify the concern. This argues against widespread adoption of machine back translation for quality assessment, at least until the limitations of machine back translation are better understood. Looking at examples where back translation “worked” is useful. In the 6 instances where the intermediate Spanish was judged to be unacceptable, but the English back translation acceptable, complex sentence structures were found, along with medication instructions.

We did not test whether the raters found the original English instructions to be unclear or unsafe as a starting point. Here is where we find the potential benefit of the present study, as it provides insight into the type of content that seems to translate well in this set of data. Overall, ratings of translation quality by bilingual and monolingual raters were high, suggesting that there may be some utility in the machine translation with safeguards other than, or in addition to, inspection of machine back translation of machine Spanish. We found there was an astonishing range in reading difficulty across the contributed samples. While the average estimated grade level for comprehension of the original English contributions was the 8th grade, the maximum was 22, indicating extreme complexity of both words used and sentence length.

In gathering the example PCIs, we did not give any additional instructions to the authors to limit complexity; we only asked for their “typical” language. If the examples received are indeed typical, the instructions we provide are often quite complex. Wu [17] explored the readability of medical information intended for the public and found that on average, 18 years of education would be required to read and understand the clinical trial descriptions available at ClinicalTrials.gov. It seems apparent that the first step to improving the safety of machine translation is to simplify the task of the translator, by making the language that is used for translation as unambiguous and straightforward as possible. The article by Patil and Davies on the use of Google Translate in the clinic [9] generated a considerable number of rapid responses (similar to letters to the editor) [18]. The responses emphasized the need to keep the language used simple, the sentences short, and the communication direct.

A simple and straightforward suggestion to improve all patient care instructions (not just those anticipated to be translated) would be to display the Flesch-Kincaid reading level in real time as the content is generated. The computer resources required to perform reading level analysis are nearly identical to those required for real-time spell checking: a dictionary that breaks words into syllables. Showing authors the reading level in real time would provide a tool to improve all instructions, not just those intended for translation. Limiting the dictionary to specifically exclude potentially dangerous, complex, or confusing words as well as forbidden abbreviations would further identify troublesome language to the author, and would improve communication overall. Implementing such real-time feedback to authors of patient instructions is a logical next step in adding utility to the electronic health record.
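To make the suggestion concrete, the following Python sketch estimates the Flesch-Kincaid grade level of a short instruction using the published formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. It is an illustration only: the syllable counter is a rough vowel-group heuristic rather than the dictionary approach described above, the function names are hypothetical, and the results will not exactly match the grade levels reported by Microsoft Word.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups and drop a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Estimate the Flesch-Kincaid grade level of `text`.

    Sentences are split on '.', '!' and '?', so abbreviations and decimal
    numbers will be miscounted; this is acceptable only for a sketch.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

A real-time display would call a function such as `flesch_kincaid_grade` on the instruction text each time the author pauses typing and show the result next to the text box.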

It is important that culture and contextual understanding are taken into consideration when organizations use interpretation services. In the United States, federal law requires that language interpreters employed by health care organizations receiving federal funds are not only bilingual but also bicultural [16]. We did not find examples of dangerous synonyms being misapplied in translation, but we cannot rule out the possibility that such errors can occur. This is beyond the scope of typical machine translation software.

Our data suggest that use of medication names and dosing frequencies should not be repeated in the PCI where confusion can arise from imprecise language translation. Translation ambiguities that generate safety concerns in PCI might be mitigated by moving such content into structured areas of the AVS.

Conclusion

This study suggests that 9 times out of 10, the quality of machine translation using Google Translate is acceptable in terms of quality and safety. Currently, machine back translation may fail to reveal a relatively poor translation from English to Spanish. This study showed that increasing sentence complexity, as measured by the reading level index, was associated with a significant (P < 0.05) increase in unsafe machine translation. Similarly, including medication instructions in machine translations was associated with increased risk (P < 0.05) of machine translation safety error in this study.

A simple way to improve communication now would be to display the reading level to authors of patient communication content in real time, and limit the dictionary of acceptable words to forbid the use of known ambiguous terms or forbidden abbreviations. This would teach authors to use simple language, and increase the chance that translation (either human or machine) would be effective. This preliminary study suggests that keeping medication dosing instructions in a structured format is advisable, as is keeping sentences simple. As with spoken language [4], starting with clear, simple to understand English instructions provides the best machine translations into Spanish.

The Clinical Machine Translation Study Group: Todd W. Altenbernd, Steven Bedrick, Mark D. Berg, Nerida Berrios, Mark A. Brown, Colleen K. Cagno, Charles B. Cairns, Elizabeth Calhoun, Raymond Carmody, Tara F. Carr, Clara Choo, Melissa L. Cox, Janiel Cragun, Rachel E.M. Cramton, Paola Davis, Archita Desai, Sarah M. Desoky, Sean Elliot, Mindi J. Fain, Albert Fiorello, Hillary Franke, Kimberly Gerhart, Victor Jose Gonzalez, Aaron John Goshinska, Lynn M. Gries, Erin M. Harvey, Karen Herbst, Elizabeth Juneman, Lauren Marie Imbornoni, Anita Koshy, Lisa Laughlin, Christina M. Laukaitis, Kwan Lee, Hong Lei, Joseph M. Miller, Prashanthinie Mohan, Wayne J. Morgan, Jarrod Mosier, Leigh A. Neumayer, Valentine Nfonsam, Vivienne Ng, Terence O'Keeffe, Merri Pendergrass, Jessie M. Pettit, John Leander Po, Claudia Marie Prospero Ponce, Sydney Rice, Marie Anoushka Ricker, Arielle E. Rubin, Robert J. Segal, Aurora A.G. Selpides, Whitney A. Smith, Jordana M. Smith, William Stevenson, Amy N. Sussman, Ole J. Thienhaus, Patrick Tsai, J. Daniel Twelker, Richard Wahl, Jillian Wang, Mingwu Wang, Samuel C. Werner, Mark D. Wheeler, Jason Wild, Sun Kun Yi, Karl Andrew Yousef, Le Yu.

Corresponding author: Joseph M. Miller, MD, MPH, Department of Ophthalmology and Vision Science, University of Arizona, 655 North Alvernon Way, Suite 108, Tucson AZ 85711, jmiller@eyes.arizona.edu.

Financial disclosures: None.

References

1. Hummel J, Evans P. Providing clinical summaries to patients after each office visit: a technical guide. Qualis Health 2012. Accessed 14 Mar 2016 at http://hit.qualishealth.org/sites/default/files/hit.qualishealth.org/Providing-Clinical-Summaries-0712.pdf.

2. Neuberger M, Dontje K, Holzman G, et al. Examination of office visit patient preferences for the after-visit summary (AVS). Persp Health Infor Manage 2014;11:1d.

3. Kruse CS, Bolton K, Freriks G. The effect of patient portals on quality outcomes and its implications to meaningful use: a systematic review. J Med Internet Res 2015;17:e44.

4. Schoonover K. Using a medical interpreter with persons of limited English proficiency. J Clin Outcomes Manage 2016;23:567–75.

5. Shin HB, Bruno R. Language use and English-speaking ability: 2000. Census 2000 Brief. Accessed 9 Nov 2017 at https://census.gov/content/dam/Census/library/publications/2013/acs/acs-22.pdf.

6. Lewis MP, Simons GF, Fennig CD, editors. Ethnologue: languages of the Americas and the Pacific. 19th ed. Dallas: SIL International; 2016.

7. Pavlik V, Brown AE, Nash S, et al. Association of patient recall, satisfaction, and adherence to content of an electronic health record (EHR)-generated after visit summary: a randomized clinical trial. J Am Board Fam Med 2014;27:209–18.

8. Johnson M, Schuster M, Le QV, et al. Google’s multilingual neural machine translation system: enabling zero-shot translation. Accessed 9 Nov 2017 at https://arxiv.org/pdf/1611.04558.pdf.

9. Patil S, Davies P. Use of Google Translate in medical communication: evaluation of accuracy. BMJ 2014;349:g7392.

10. Kaliyadan F, Gopinathan Pillai S. The use of Google language tools as an interpretation aid in cross-cultural doctor-patient interaction: a pilot study. Inform Prim Care 2010;18:141–3.

11. Zhang Y, Zhou S, Zhang Z, et al. Rumor evolution in social networks. Physical Review E 2013;87.

12. Shigenobu T. Evaluation and usability of back translation for intercultural communication. In: Aykin N, editor. Usability and internationalization. Global and local user interfaces. UI-HCII 2007, Lecture Notes in Computer Science, vol 4560. Springer, Berlin, Heidelberg.

13. Kincaid JP, Fishburne Jr RP, Rogers RL, et al. Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for Navy enlisted personnel. Naval Technical Training Command Millington TN Research Branch. 1975. Accessed 7 May 2016 at http://www.dtic.mil/dtic/tr/fulltext/u2/a006655.pdf.

14. Kwiecien R, Kopp-Schneider A, Blettner M. Concordance analysis—part 16 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2011;108:515–21.

15. Goodall N. Ethical decision making during automated vehicle crashes. Transportation Research Record: Journal of the Transportation Research Board 2014;2424:58–65.

16. Kalra N, Groves D. The enemy of good: estimating the cost of waiting for nearly perfect automated vehicles. Santa Monica, CA: RAND Corporation, 2017.

17. Wu DT, Hanauer DA, Mei Q, et al. Assessing the readability of ClinicalTrials.gov. J Am Med Inform Assoc 2016;23:269–75.

18. Responses to: Use of Google Translate in medical communication: evaluation of accuracy. BMJ 2014;349:g7392. Accessed 10 Dec 2017 at www.bmj.com/content/349/bmj.g7392/rapid-responses.

19. Nápoles AM, Santoyo-Olsson J, Karliner LS, et al. Inaccurate language interpretation and its clinical significance in the medical encounters of Spanish-speaking Latinos. Med Care 2015;53:940–7.

Article PDF

JCOM02501018.PDF

Issue

Journal of Clinical Outcomes Management - 25(1)

Publications

Journal of Clinical Outcomes Management

Topics

Practice Management

Business of Medicine

Read more about Simple Patient Care Instructions Translate Best: Safety Guidelines for Physician Use of Google Translate

Sections

Reports From the Field

Author(s)

Joseph M. Miller, MD, MPH

Erin M. Harvey, PhD

Steven Bedrick, PhD

Prashanthinie Mohan, MBA

Elizabeth Calhoun, MEd, PhD

on behalf of the Clinical Machine Translation Study Group of Banner University Medicine and the University of Arizona College of Medicine – Tucson

From the University of Arizona College of Medicine – Tucson, Tucson, AZ.

Abstract

Objective: To determine predictors of quality and safety of machine translation (Google Translate) of patient care instructions (PCIs), and to determine if machine back translation is useful in quality assessment.
Methods: 100 sample English PCIs were contributed by 88 clinical faculty. Each PCI consisted of up to 3 sentences of typical patient instruction that might be included in an after-visit summary. Google Translate was used to translate the English first to Spanish and then back to English. A panel of 6 English/Spanish translators assessed the Spanish translations for safety and quality. A panel of 6 English-speaking health care workers assessed the back translation. A 5-point scale was used to assess quality. Safety was rated as safe or unsafe.
Results: Google Translate was usually (> 90%) capable of safe and comprehensible translation from English to Spanish. Instructions with increased complexity, especially regarding medications, were prone to unsafe translation. Back translation was not reliable in detecting unsafe Spanish.
Conclusion: Google Translate is a continuously evolving resource for clinicians that offers the promise of improved physician-patient communication. Simple declarative sentences are most reliably translated with high quality and safety.

Keywords: translation; machine translation; electronic health record; after-visit summary; patient safety; physician-patient communication.