Clearing Opacity on Machine Learning Models for Health Data
Knowledge is power. Ask the owner of a valuable trade secret, and they will give you multiple reasons for taking that secret to the grave. Trade secret law is credited with fostering innovation in the market, and it shelters a vast array of intellectual assets under its umbrella. However, broad claims of trade secret protection have become an unfortunate default, raising concerns about opacity. This is particularly true in the field of machine learning (“ML”). Because ML models rest on algorithms that are difficult to interpret intuitively, they have acquired a reputation as “black box” algorithms. Even the humans who build them, including data scientists, may not completely understand the complicated functions of predictive ML models.
Algorithmic opacity can be particularly harmful as ML models become the norm in areas of intense public interest, such as healthcare. This harm can occur in several ways. An erroneous treatment recommendation based on an ML model can have potentially disastrous consequences for a patient or a group of patients, and because the model is opaque, the reason for the mistake may be untraceable. Deleterious effects can also flow from opacity around the data used to train these models: bias and lack of diversity in training data produce biased models. For example, a model that absorbs racial and gender biases from its training data will make inaccurate predictions for underrepresented groups, undermining its potential usefulness.
How, then, do we clear the opacity without compromising innovation incentives? One way is to allow limited disclosure of the trade secret to a select group of people. Accountability strengthens public trust, and algorithmic accountability can come from transparency. Disclosure can occur at two levels. The first is limited disclosure to potential users of the technology, through their participation in the design and assessment process, to help build ML models that diverse groups of people can use. For example, a data subject who understands how a model works is more likely to cooperate and share valuable knowledge that might not otherwise surface. From an entity’s perspective, public participation enhances trust and helps it reach a larger and more diverse market. The second is limited disclosure to select medical professionals, clinical researchers, and other relevant parties, which can help develop a clinically interpretable tool that the clinical community will accept more broadly. Both disclosure strategies yield a fairer, more accurate, and more reliable technology.
Of course, it is not easy to demarcate which components of an ML model can be disclosed and which must remain secret. Because trade secret law defines ‘secrets’ broadly, protection could extend to ML source code, methodologies, research reports, data, and anything of value that is capable of being protected. There must also be recourse to address any privacy concerns of data subjects, and informed consent of all parties is equally important. Air-tight contracts between the data subject and those working on the ML model should allocate responsibilities, and such contracts should also allow their terms to be amended as the research progresses toward a responsible ML model.
We want to limit situations in which arbitrary or opaque ML models allow their owners to evade responsibility, leaving affected parties with a disproportionate burden when something malfunctions. Limited disclosure gives poor, under-resourced data subjects some bargaining power. In turn, a data scientist can build a predictable, transparent, and rational model when each data subject knows what information is required, how it is used in the model, and what its ultimate objective is. This also empowers affected parties to push back, if necessary, and demand better ML models, thereby reducing social costs. Limited disclosure also works best because fully open automated systems can be ‘gamed’ or subjected to premature attacks if they are too open too soon. We want more objectivity and less uncertainty in the health-tech industry. Giving affected parties enough information to understand the system yields a more comprehensive ML model, with more room for accountability. After all, if you are not privy to how a system makes decisions about your health, how can you trust the decisions it makes?