The role of predictive modeling is to synthesize the patient-specific information (clinical, pathological, dosimetric, and biological) into a representable, generalizable, and accurate model of the patient response. This includes models of both tumor control and normal tissue toxicity, the so-called tumor control probability (TCP) and normal tissue complication probability (NTCP) models. In OSRT, an important component of any predictive model is its flexibility to account for the ever-changing information without the need for expensive and time-consuming re-training. Additionally, transparency in design and interpretability of the results are highly desirable characteristics. The latter is especially important in the context of medical decision making, where some level of trust must be established between the user (i.e., physician) and the model in order to facilitate the decision-making process in a highly sensitive setting.

Using the predictive biomarkers developed in the "Biomarker Discovery" section, along with other known clinical and pathological predictors of response, we are developing flexible, dynamic, and accurate predictive models of RT response for tumor control and radiation-induced toxicity. We rely on advanced statistical and machine learning algorithms to build these models. These include various longitudinal survival models, random forests, support vector machines, deep learning, and Bayesian networks, as well as classical regression-based models such as Lasso, Ridge, and Elastic Net.

Another topic of interest is incorporating the vast but rarely used medical expertise available from the user (i.e., an experienced physician) and/or the medical literature into the model-building process. Such additional "outside" knowledge could complement the otherwise data-driven approach by (i) reducing the model's complexity through a smaller search space (i.e., model architecture, candidate features, etc.) and (ii) increasing the model's acceptability and the users' trust in its predictions. As such, we are actively working to develop novel "prior-enhanced" versions of all the aforementioned models by including existing expert or community-based knowledge.

Hybrid Random Forest-Bayesian Network Model

Example 1: Predicting radiation-induced lung toxicity using a hybrid Random Forest-Bayesian Network model

Although traditional machine learning (ML) algorithms such as random forests (RF) and support vector machines (SVM) can be very powerful classifiers, they generally suffer from two drawbacks: (i) they are not transparent or interpretable, and (ii) they need relatively large training datasets to generalize; otherwise, because of their highly complex architecture, they are prone to overfitting. Both drawbacks are critical in medical applications. Bayesian networks (BNs), on the other hand, are probabilistic graphical models that are not only interpretable by nature but, if appropriately designed, can also be very powerful in discovering hidden links among the data features. On the downside, setting up their structure appropriately can be very challenging, because the space of feasible structures is extremely large and grows super-exponentially with the number of features.

Using the novel biomarkers of radiation pneumonitis (RP) (see the "Biomarker Discovery" section) and other clinicopathological factors, we designed a simple but efficient hybrid RF-BN algorithm for predicting RP risk in a lung cancer dataset. The algorithm takes advantage of the relatively simple setup of RFs and exploits their decision tree (DT)-like structure to guide the structural learning phase of the BN and boost its predictive ability. Specifically, an initial structure is estimated for the BN by aggregating the overlapping "sub-structures" determined by the "top contributing trees" of a pre-trained RF. The results demonstrate that the hybrid model can improve on the classification accuracy of both the standalone RF and the standalone BN, reaching a rather high area under the curve (AUC) of 0.82 with a relatively small sample size (n=74).
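To make the aggregation idea more concrete, the following is a minimal Python sketch and not the published algorithm. It rests on several assumptions of ours: a "sub-structure" is taken to be a parent-to-child pair of split features inside a tree, the "top contributing trees" are chosen by a hypothetical per-tree `score` (for instance a validation AUC), and an edge is kept only if at least `min_support` of the top trees share it. Trees are described by the same arrays that scikit-learn exposes on a fitted tree's `tree_` attribute (`children_left`, `children_right`, `feature`), but the toy trees below are written by hand so the example has no dependencies.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple

LEAF = -1  # scikit-learn marks a missing child with -1

Edge = Tuple[int, int]  # (parent_feature, child_feature)


@dataclass
class TreeArrays:
    """Array view of one decision tree (same layout as scikit-learn's `tree_`)."""
    children_left: Sequence[int]
    children_right: Sequence[int]
    feature: Sequence[int]  # feature tested at each node (negative at leaves)
    score: float            # hypothetical per-tree quality score


def tree_edges(tree: TreeArrays) -> Set[Edge]:
    """Collect (parent_feature, child_feature) pairs of consecutive splits."""
    edges: Set[Edge] = set()
    for node, parent_feat in enumerate(tree.feature):
        for child in (tree.children_left[node], tree.children_right[node]):
            if child == LEAF:
                continue
            child_feat = tree.feature[child]
            if child_feat >= 0 and child_feat != parent_feat:
                edges.add((parent_feat, child_feat))
    return edges


def creates_cycle(edges: List[Edge], new_edge: Edge) -> bool:
    """Return True if adding new_edge (u -> v) would close a directed cycle."""
    u, v = new_edge
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(b for a, b in edges if a == node)
    return False


def initial_structure(trees: List[TreeArrays], top_k: int, min_support: int) -> List[Edge]:
    """Aggregate edges shared by the top_k trees into an acyclic edge list."""
    top = sorted(trees, key=lambda t: t.score, reverse=True)[:top_k]
    support: Counter = Counter()
    for t in top:
        support.update(tree_edges(t))
    dag: List[Edge] = []
    # Visit the most widely shared edges first; skip any that would break acyclicity.
    for edge, count in sorted(support.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_support and not creates_cycle(dag, edge):
            dag.append(edge)
    return dag


# Three hand-written toy trees over features 0, 1, 2 (illustrative only).
trees = [
    TreeArrays([1, 3, -1, -1, -1], [2, 4, -1, -1, -1], [0, 1, -2, -2, -2], score=0.80),
    TreeArrays([1, 3, 5, -1, -1, -1, -1], [2, 4, 6, -1, -1, -1, -1],
               [0, 1, 2, -2, -2, -2, -2], score=0.75),
    TreeArrays([1, 3, -1, -1, -1], [2, 4, -1, -1, -1], [2, 0, -2, -2, -2], score=0.55),
]

print(initial_structure(trees, top_k=2, min_support=2))  # [(0, 1)]
print(initial_structure(trees, top_k=3, min_support=1))  # [(0, 1), (0, 2)]
```

In the second call, the edge (2, 0) from the lowest-scoring tree is dropped because it would close a cycle with (0, 2). The resulting edge list would serve only as a starting point for a BN structure search, which could then add, remove, or reverse edges.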

Future and Ongoing Projects

We are currently working to extend our findings on three main fronts: 

I. Addressing the small sample size problem often encountered in medical datasets by taking advantage of the Bayesian inference capabilities of BNs and exploiting local structures in the data.

II. Including expert- and community-based knowledge to improve the structural learning capabilities of BNs.

III. Distributed learning, also known as federated learning, in which the model is learned across many disjoint "local" datasets instead of one central training dataset.
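The three fronts above can be illustrated together for the simplest building block of a discrete BN, a single conditional probability table (CPT). The Python sketch below is our own illustration under stated assumptions, not a description of the methods under development: a Dirichlet prior with pseudo-counts stabilizes estimates when samples are few (I), the pseudo-counts can encode expert belief (II), and because counts are additive, each site only needs to share its local counts rather than its patient records (III). The variable names `high_dose` and `rp`, the records, and the prior values are all hypothetical toy inputs.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

Pair = Tuple[int, int]  # (parent_value, child_value)


def local_counts(records: Iterable[dict], parent: str, child: str) -> Counter:
    """Count (parent_value, child_value) co-occurrences within one site's records."""
    return Counter((r[parent], r[child]) for r in records)


def posterior_cpt(counts: Counter,
                  parent_values: List[int],
                  child_values: List[int],
                  prior: Optional[Dict[Pair, float]] = None,
                  default_pseudo: float = 1.0) -> Dict[int, Dict[int, float]]:
    """Posterior-mean estimate of P(child | parent) under a Dirichlet prior.

    `prior` maps (parent_value, child_value) to a pseudo-count; entries that are
    missing fall back to `default_pseudo` (a uniform prior).
    """
    prior = prior or {}
    cpt: Dict[int, Dict[int, float]] = {}
    for p in parent_values:
        alpha = {c: prior.get((p, c), default_pseudo) for c in child_values}
        total = sum(counts.get((p, c), 0) + alpha[c] for c in child_values)
        cpt[p] = {c: (counts.get((p, c), 0) + alpha[c]) / total for c in child_values}
    return cpt


# Hypothetical toy records held at two separate sites.
site_a = [{"high_dose": 1, "rp": 1}, {"high_dose": 1, "rp": 0}, {"high_dose": 0, "rp": 0}]
site_b = [{"high_dose": 1, "rp": 1}, {"high_dose": 0, "rp": 0}, {"high_dose": 0, "rp": 0}]

# Each site shares only its counts; the sum equals the counts of the pooled data.
pooled = local_counts(site_a, "high_dose", "rp") + local_counts(site_b, "high_dose", "rp")

uniform = posterior_cpt(pooled, [0, 1], [0, 1])
# Hypothetical expert prior: RP is believed to be more likely under high dose.
expert = posterior_cpt(pooled, [0, 1], [0, 1], prior={(1, 1): 3.0, (1, 0): 1.0})

print(uniform[1][1])  # (2 + 1) / (3 + 2) = 0.6
print(expert[1][1])   # (2 + 3) / (3 + 4) = 5/7
```

Note that the group with no observed events (`high_dose` = 0) still receives a non-zero RP probability of 0.2 under the uniform prior, which is the kind of regularization that matters with small samples. A real federated setup would also have to handle structure learning, privacy, and heterogeneity between sites, none of which this sketch attempts.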

Additionally, we plan to extend these "static" models to dynamic ones that take the temporal aspect of disease evolution directly into account, without the need for multiple models. Two particular areas of interest are Dynamic Bayesian Networks (DBN) and Partially Observable Markov Decision Processes (POMDP).
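As a rough illustration of what "taking the temporal aspect into account" means in such models, the Python sketch below implements the standard POMDP belief update, which is also the filtering step of the simplest DBN (a hidden Markov model). The two hidden states, the single action, the two observation levels, and all probabilities are hypothetical values chosen only to make the example run; they are not derived from any dataset.

```python
from typing import List

Matrix = List[List[float]]


def belief_update(belief: List[float],
                  transition: List[Matrix],
                  observation: List[Matrix],
                  action: int,
                  obs: int) -> List[float]:
    """Standard POMDP belief update.

    b'(s') is proportional to O[a][s'][obs] * sum_s T[a][s][s'] * b(s),
    where T[a][s][s'] = P(s' | s, a) and O[a][s'][o] = P(o | s', a).
    """
    n = len(belief)
    # Prediction step: push the current belief through the transition model.
    predicted = [sum(transition[action][s][s2] * belief[s] for s in range(n))
                 for s2 in range(n)]
    # Correction step: weight each state by the likelihood of the observation.
    unnormalized = [observation[action][s2][obs] * predicted[s2] for s2 in range(n)]
    z = sum(unnormalized)
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / z for u in unnormalized]


# Hypothetical toy model: state 0 = responding, state 1 = not responding.
T = [[[0.9, 0.1],   # action 0 (continue treatment): P(s' | s = 0)
      [0.2, 0.8]]]  #                                P(s' | s = 1)
O = [[[0.8, 0.2],   # P(obs | s' = 0); obs 0 = favorable, obs 1 = unfavorable
      [0.3, 0.7]]]  # P(obs | s' = 1)

prior_belief = [0.5, 0.5]
posterior_belief = belief_update(prior_belief, T, O, action=0, obs=1)
print(posterior_belief)  # belief shifts toward state 1 after an unfavorable reading
```

The same update can be applied after every new mid-treatment measurement, so a single model carries the patient's estimated state forward in time instead of requiring a separate model per time point.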


Interested in Collaborations?

We are continuously looking for interested collaborators and curious students to brainstorm, start new collaborations, and exchange knowledge. If you are interested in one of the above-mentioned areas, please send us an email.


Selected Publication

[1] A. Ajdari, N. Shusharina, Z. Liao, R. Mohan, T. Bortfeld (2019). A novel machine learning-Bayesian network model for prediction of radiation pneumonitis: Importance of mid-treatment information. International Conference on the Use of Computers in Radiation Therapy. Montreal, Canada, June 17-21.