Volume of Data
The processes that may benefit the most from AI technologies are those involving a high volume of data. Whether they are highly repetitive processes, such as quality review for laboratories and batch records, or processes that generate large amounts of data, AI can help across the spectrum of drug development and healthcare delivery in many different ways.
We are starting to understand with great precision the conditions that predispose subjects to adverse effects. AI can compile, interrogate, rank, and weigh these data far faster and more accurately than manual review, producing personalized response profiles that are useful not only at the patient level but also upstream at the development level, because the variables that need to be accounted for are now visible. That ultimately means fewer adverse effects, the ability to better target therapies to subjects who will respond, and, hopefully, faster development. AI is an accelerator and enabler in these cases.
What are Effective Ways to Ensure Data is Protected?
Use of Appropriately Licensed Public Data
Using data made publicly available for this specific purpose guarantees that private, critical, and IP-protected data stays safe. However, there are times when public data is insufficient. In that scenario, an avenue to explore is data creation: either generating real data specifically for use in AI, or creating synthetic ("fake") data that captures the important patterns of the problem domain. Similarly, an organization can transform real data into "fake" data through appropriate processes, including anonymization.
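A minimal sketch of the last point, transforming real data into data safe for AI use: here a direct identifier is replaced with a salted one-way hash (pseudonymization). The record fields, salt, and token length are hypothetical, chosen purely for illustration; a production process would follow the organization's approved anonymization policy.

```python
import hashlib

# Hypothetical patient records; field names are illustrative only.
records = [
    {"patient_id": "P-1001", "age": 54, "outcome": "responder"},
    {"patient_id": "P-1002", "age": 61, "outcome": "non-responder"},
]

# The salt must be kept secret and separate from the shared data set.
SALT = "replace-with-a-secret-salt"

def pseudonymize(record):
    """Return a copy of the record with its direct identifier replaced
    by a salted one-way hash, so the original ID cannot be read back."""
    token = hashlib.sha256((SALT + record["patient_id"]).encode()).hexdigest()[:12]
    safe = dict(record)
    safe["patient_id"] = token
    return safe

anonymized = [pseudonymize(r) for r in records]
```

Note that pseudonymization alone is rarely sufficient: indirect identifiers (age, rare outcomes) can still re-identify subjects, which is why the text stresses "appropriate processes."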
Use of Private Data
If all of these avenues are insufficient then private data may be required. But private data management comes with many questions. What are the appropriate policies to put in place? Where can the data be stored? Who has access to the data? When must it be deleted, including backups? Furthermore, if no legal agreements exist between the organization owning the data and the organization using the data then one should be created. In some cases, it may be feasible to have this agreement through an opt-in process.
Managing Model Learnings
Data management may also be approached from an AI point of view. In this case, it is important to understand the AI's capacity to memorize data. For example, modeling a normal distribution requires only a mean and a variance, which, in most contexts, protects the original data by abstracting away the individual data points. On the other hand, large language models (LLMs) such as ChatGPT have an immense capacity to memorize data and reproduce it verbatim. These models could easily divulge private information (even within closed corporate environments, if the LLM is trained on sensitive information such as accounting records or HR data). A preference for models that cannot memorize data can be an important part of data management and protection.
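The normal-distribution example above can be made concrete with a short sketch. The measurement values below are hypothetical; the point is that the fitted "model" retains only two summary parameters, from which plausible new values can be drawn without storing any original data point.

```python
import random
import statistics

# Hypothetical lab measurements; treat the individual values as private.
measurements = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7]

# Fitting a normal distribution abstracts the data down to two numbers.
model = {
    "mean": statistics.mean(measurements),
    "stdev": statistics.stdev(measurements),
}

# The model can generate new, plausible values, but it cannot "leak"
# any particular original measurement, because none are stored in it.
synthetic_value = random.gauss(model["mean"], model["stdev"])
```

An LLM is at the opposite end of this spectrum: its billions of parameters give it enough capacity to store and replay exact training records, which is what makes the memorization risk in the paragraph above real.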
What are the Top Three Elements that are Especially Important to Consider for AI Related Contracts?
Data and Intellectual Property (IP) Ownership
Your contract should be very clear regarding what the data can be used for and who owns the results after data processing. Can the data be used to train models? Are models made available to other parties after training? AI built specifically under a consulting agreement will leave the IP with the paying customer. However, if the AI is presented as a product in itself, then the code and models will remain the supplier's property. In the latter case, you should have the choice to opt in or out of having your data used to train AI models.
Technical Support and Update Process
Because an AI technology must be validated, it also needs to be versioned. The update process should be made clear. How often will updates be made available? What is the process to prevent model regression upon retraining with additional data? A contract that is clear about the rate at which the AI will evolve will help you prepare for key testing and validation.
Data Security Measures
As most AI systems rely extensively on processing data to produce results and value, how the data is handled should be made clear. Will the data be hosted in the cloud? Is the data moved elsewhere for processing? The cloud provides efficiency and scalability, but at a cost in control over where data resides. Patient data should therefore be anonymized before processing and purged once processing is complete, and the contract should clearly establish that the data must not transit outside your region for processing.