Pitfalls of predictive models in healthcare

This article concisely examines the dangers and pitfalls of ML/AI in medicine, as revealed by a recent meta-analysis of research around the COVID-19 pandemic: https://lnkd.in/gyn7MFq

Some of the issues can be resolved, such as avoidable methodological flaws. Others, however, are intrinsic: it is often impossible to get access to the medical datasets used in a study, and external validation is frequently lacking because it is very hard to do.

The article notes that “A recent review of 511 machine learning studies across multiple fields found that the ones produced in health care were particularly hard to replicate, because the underlying code and datasets were seldom disclosed. The review, conducted by MIT researchers, found that only about 23% of machine learning studies in health care used multiple datasets to establish their results, compared to 80% in the adjacent field of computer vision, and 58% in natural language processing.”

Using multiple datasets does not ensure reproducibility either if the datasets are poorly chosen, as was the case in one of the studies discussed in the article.
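
The article excerpt above does not say exactly how the datasets were poorly chosen, so the following is only an illustration of one possible failure mode, assumed here for the sake of example: an "external" dataset that shares records with the training data and therefore is not independent. The function name and the record identifiers are hypothetical.

```python
def overlap_fraction(train_ids, external_ids):
    """Return the share of external-set record IDs that also appear in the training set."""
    external = set(external_ids)
    if not external:
        raise ValueError("external_ids is empty")
    return len(external & set(train_ids)) / len(external)


# Hypothetical record identifiers, invented for illustration only.
train_ids = ["p001", "p002", "p003", "p004"]
external_ids = ["p003", "p004", "p005", "p006"]

share = overlap_fraction(train_ids, external_ids)
print(f"{share:.0%} of the external records also appear in the training data")
```

A non-zero share would suggest that the second dataset is not a truly independent check. Real studies would also need to compare sources, collection sites, and patient populations, which a simple identifier comparison cannot capture.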

The promise is there, but we also need to keep our eyes open.