ATS: Bayesian Network Modeling: The Future of Pulmonary Arterial Hypertension Risk Stratification Through the PHORA Initiative

Scott, J.V., Kraisangka, J., Kanwar, M., Druzdzel, M., Antaki, J., Vizza, D., Simon, M., Correa-Jaque, P., Benza, R.L. Bayesian Network Modeling: The Future of Pulmonary Arterial Hypertension Risk Stratification Through the PHORA Initiative. American Thoracic Society; 2020:A4245-A.

Introduction: Pulmonary arterial hypertension (PAH) is a fatal disease that is difficult to treat because of variability between patients. Accurate patient risk stratification is necessary to guide treatment, but current PAH risk calculators leave room for improvement. The goal of this study was to explore the use of Bayesian network modeling to predict one-year survival in patients with PAH.

Methods: Patient-level data were aggregated and harmonized across five contemporary PAH clinical trials (AMBITION, PATENT-1 and PATENT-2, GRIPHON, SERAPHIN, FREEDOM-EV). All patients were assessed at baseline (for PATENT placebo patients who crossed over, baseline was the start of the extension study). Forty-one clinical variables were initially considered, selected on the basis of their p-value ranking in a meta-analysis, their availability across trials, and expert opinion. The training set was created by randomly sampling 80% of the harmonized dataset and dropping early-censored patients (N = 2483); the remaining 20% of the data formed a hold-out set (N = 707) that was used only for final validation. A Tree-Augmented Naïve Bayes (TAN) classifier was trained to predict one-year survival. Continuous variables were discretized with individual supervised decision trees using leave-one-out cross-validation. Dimensionality reduction was performed with a set of “dummy” variables, i.e., uniformly distributed integers assigned at random to each patient: clinical variables whose J-divergence in an initial Naïve Bayes model did not exceed the median J-divergence of their dummy-variable counterparts were dropped. Structure learning for the TAN was performed on the subset of training-set patients with no missing values, followed by parameter learning on the full training set. As a final validation step, receiver operating characteristic (ROC) curves were generated on the hold-out set, which was evaluated in two ways: with early-censored patients imputed as “alive” (N = 707) and with early-censored patients removed (N = 608).

Results: Twenty-one key variables were retained in the final classifier (Table 1). The final TAN model (Figure 1) achieved an area under the ROC curve (AUC) of 0.83 for predicting death in the hold-out set when early-censored patients were imputed as alive, and an AUC of 0.82 when early-censored patients were removed. Ranked by J-divergence, the five most important predictors of risk were, in order, NT-proBNP, six-minute walk distance, alkaline phosphatase, use of diuretics, and stroke volume.
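The Methods state that each continuous variable was discretized by its own supervised decision tree. The abstract does not give the tree settings, so the following is only a minimal Python sketch under stated assumptions: a depth-limited tree grown on a single variable that places cut points where they most reduce the Shannon entropy of the one-year outcome. The function names (`entropy`, `best_cut`, `tree_cuts`, `discretize`), the `max_depth` and `min_leaf` parameters, and the toy data are hypothetical, and the leave-one-out cross-validation the authors used to tune their trees is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def best_cut(values, labels, min_leaf=2):
    """Midpoint cut that maximizes information gain, or None if no cut helps.
    min_leaf (hypothetical) is the minimum number of patients per branch."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = entropy([y for _, y in pairs])
    best, best_gain = None, 0.0
    for i in range(min_leaf, n - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # only cut between distinct values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        child = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if parent - child > best_gain:
            best_gain = parent - child
            best = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best


def tree_cuts(values, labels, max_depth=2, min_leaf=2):
    """Sorted cut points of a depth-limited tree grown on one variable.
    Hypothetical settings; leave-one-out tuning of max_depth is not shown."""
    if max_depth == 0:
        return []
    cut = best_cut(values, labels, min_leaf)
    if cut is None:
        return []
    left = [(v, y) for v, y in zip(values, labels) if v <= cut]
    right = [(v, y) for v, y in zip(values, labels) if v > cut]
    cuts = [cut]
    for branch in (left, right):  # both non-empty because of min_leaf
        vs, ys = zip(*branch)
        cuts += tree_cuts(list(vs), list(ys), max_depth - 1, min_leaf)
    return sorted(cuts)


def discretize(value, cuts):
    """Map a continuous value to the index of its bin."""
    return sum(value > c for c in cuts)


if __name__ == "__main__":
    # Hypothetical toy data: six-minute walk distance (m), death within one year
    walk = [150, 200, 250, 300, 400, 450, 500, 550]
    died = [1, 1, 1, 0, 0, 0, 0, 0]
    cuts = tree_cuts(walk, died)
    print(cuts, [discretize(w, cuts) for w in walk])
    # → [275.0] [0, 0, 0, 1, 1, 1, 1, 1]
```

On this toy data the single cut falls at 275 m, midway between the last death and the first survivor; both branches are then pure, so the tree stops early.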
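The dimensionality-reduction step compares each clinical variable's J-divergence with that of random “dummy” variables. A minimal Python sketch, assuming that J-divergence here is the symmetric Kullback-Leibler (Jeffreys) divergence between the Laplace-smoothed class-conditional tables P(X | died) and P(X | survived) of a Naïve Bayes model, and assuming that a variable's dummy counterparts are uniform random integers with the same number of states. The function names, the smoothing constant `alpha`, the number of dummies `n_dummies`, and the seed are hypothetical choices, not the authors' settings.

```python
import math
import random
import statistics
from collections import Counter


def class_conditional(xs, ys, states, cls, alpha=1.0):
    """Laplace-smoothed P(X = s | Y = cls) for each state s (hypothetical alpha)."""
    counts = Counter(x for x, y in zip(xs, ys) if y == cls)
    total = sum(counts.values()) + alpha * len(states)
    return [(counts[s] + alpha) / total for s in states]


def j_divergence(xs, ys, states):
    """Jeffreys (symmetric KL) divergence between P(X | Y=1) and P(X | Y=0)."""
    p = class_conditional(xs, ys, states, 1)
    q = class_conditional(xs, ys, states, 0)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def passes_dummy_filter(xs, ys, n_dummies=50, seed=0):
    """Keep a discretized variable only if its J-divergence exceeds the median
    J-divergence of random dummy variables with the same number of states."""
    states = sorted(set(xs))
    rng = random.Random(seed)
    dummy_j = []
    for _ in range(n_dummies):
        dummy = [rng.choice(states) for _ in xs]  # uniform random integer per patient
        dummy_j.append(j_divergence(dummy, ys, states))
    return j_divergence(xs, ys, states) > statistics.median(dummy_j)


if __name__ == "__main__":
    died = [0, 0, 1, 1] * 20        # hypothetical one-year outcomes
    informative = died[:]           # tracks the outcome exactly
    noise = [0, 1] * 40             # equally distributed in both outcome groups
    print(passes_dummy_filter(informative, died), passes_dummy_filter(noise, died))
```

The informative variable is kept and the noise variable, whose J-divergence is exactly zero, is dropped. The same `j_divergence` score could also be used to rank the retained variables by importance, in the spirit of the ranking reported in the Results.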
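A TAN classifier makes the outcome a parent of every variable and adds a tree of arcs among the variables. The abstract does not name its structure-learning algorithm, so the following Python sketch shows the standard TAN construction as an assumption: weight each pair of discretized variables by their conditional mutual information given the class, build a maximum-weight spanning tree (Prim's algorithm here), and direct its arcs away from an arbitrary root. Consistent with the Methods, it expects complete cases only; parameter learning on the full training set is not shown. The function names and toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cond_mutual_info(xi, xj, c):
    """Empirical I(Xi; Xj | C) in nats from three aligned lists."""
    n = len(c)
    n_ijc = Counter(zip(xi, xj, c))
    n_ic = Counter(zip(xi, c))
    n_jc = Counter(zip(xj, c))
    n_c = Counter(c)
    total = 0.0
    for (a, b, k), cnt in n_ijc.items():
        # P(a,b,k) * log[P(a,b|k) / (P(a|k) P(b|k))], written with raw counts
        total += (cnt / n) * math.log(cnt * n_c[k] / (n_ic[(a, k)] * n_jc[(b, k)]))
    return total


def tan_structure(columns, c):
    """Return {child: parent} arcs among features of a TAN model.

    columns: dict of feature name -> list of discrete values (complete cases).
    The class C is implicitly a parent of every feature as well."""
    names = list(columns)
    weight = {}
    for a, b in combinations(names, 2):
        w = cond_mutual_info(columns[a], columns[b], c)
        weight[(a, b)] = weight[(b, a)] = w
    order = [names[0]]  # nodes already in the tree; names[0] is the root
    parent = {}
    while len(order) < len(names):
        u, v = max(((u, v) for u in order for v in names if v not in order),
                   key=lambda e: weight[e])
        parent[v] = u  # arc points away from the root
        order.append(v)
    return parent


if __name__ == "__main__":
    died = [0, 1] * 8                 # hypothetical class labels
    a = [0, 0, 1, 1] * 4              # hypothetical discretized variable
    cols = {"A": a, "B": a[:], "D": [0] * 16}  # B duplicates A; D is constant
    print(tan_structure(cols, died))
```

Because B duplicates A, their conditional mutual information (ln 2 here) dominates and B becomes A's child; the constant variable D carries no information and is attached with weight zero.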
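The hold-out validation reports AUCs under two treatments of early-censored patients. As a sketch of that bookkeeping, the following Python computes the AUC with the Mann-Whitney formulation (the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half) and evaluates it both ways. The function names, arguments, and toy numbers are hypothetical; the abstract does not say how its ROC curves were computed.

```python
def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic; ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def holdout_aucs(risk, died, early_censored):
    """AUC with (1) early-censored patients imputed as alive and
    (2) early-censored patients removed, as described in the Methods."""
    imputed = [0 if ec else d for d, ec in zip(died, early_censored)]
    kept = [i for i, ec in enumerate(early_censored) if not ec]
    auc_imputed = auc(risk, imputed)
    auc_removed = auc([risk[i] for i in kept], [died[i] for i in kept])
    return auc_imputed, auc_removed


if __name__ == "__main__":
    risk = [0.9, 0.8, 0.3, 0.2, 0.85, 0.1]    # hypothetical predicted P(death)
    died = [1, 1, 0, 0, 0, 0]                 # outcome set to 0 for the censored patient
    early = [False, False, False, False, True, False]
    print(holdout_aucs(risk, died, early))    # → (0.875, 1.0)
```

In this toy example the early-censored patient received a high predicted risk, so counting that patient as a survivor lowers the AUC from 1.0 to 0.875; this illustrates how the two validation choices can give different numbers.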

Conclusion: Bayesian network modeling shows compelling improvement over the published performance of traditional PAH risk calculators. Further studies will optimize model performance and validate the model on real-world registry data.

This abstract was funded by the NIH.