The United Kingdom’s Information Commissioner’s Office has issued guidance on the accuracy of artificial intelligence.
Some key points:
- For generative AI models, both developers and deployers must consider the impact that training data has on the outputs and how the outputs will be used.
- If inaccurate training data contributes to inaccurate outputs, and the outputs have consequences for individuals, then it is likely that the developer and the deployer are not complying with the accuracy principle.
- Once the organization deploying the model has established its purpose, and confirmed with the developer that the model is appropriate for that purpose, it can then decide whether the purpose requires accurate outputs.
- The more a generative AI model is used to make decisions about people, or is relied on by its users as a source of information rather than inspiration, the more that accuracy should be a central principle in the design and testing of the model.
- For example: A model used to summarize customer complaints must have accurate outputs in order to achieve its purpose. This purpose requires both statistical accuracy (the summary needs to be a good reflection of the documents it is based on) and data protection accuracy (output must contain correct information about the customer).
- If a model is not sufficiently statistically accurate because the purpose that the developer envisaged for it does not necessarily require accuracy, developers should put in place technical and organizational controls to ensure that it is not used for purposes that require accuracy. This could involve, for example, contractual requirements limiting types of usage in customer contracts with deployers or analysis of customer usage (when the model is accessed through an API), as sketched below.
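For the API-analysis control, a minimal sketch might look like the following. The log schema, the `flag_accuracy_critical_usage` helper, and the keyword list are all hypothetical assumptions for illustration; a real developer would work from its own telemetry format and likely use a trained classifier rather than keyword matching.

```python
from collections import Counter

# Keywords suggesting a caller treats outputs as factual or decision-making
# input rather than inspiration (illustrative list only, not from the guidance).
ACCURACY_CRITICAL_TERMS = {"diagnose", "legal advice", "credit score", "medical", "eligibility"}

def flag_accuracy_critical_usage(api_logs, threshold=0.05):
    """Flag customers whose share of accuracy-critical prompts exceeds a threshold.

    api_logs: iterable of dicts like {"customer_id": str, "prompt": str}
    (a hypothetical schema assumed for this sketch).
    """
    totals, hits = Counter(), Counter()
    for record in api_logs:
        customer = record["customer_id"]
        totals[customer] += 1
        prompt = record["prompt"].lower()
        if any(term in prompt for term in ACCURACY_CRITICAL_TERMS):
            hits[customer] += 1
    # Customers above the threshold may be using the model for purposes it was
    # not designed for, and can be reviewed against their contractual terms.
    return {c: hits[c] / totals[c] for c in totals if hits[c] / totals[c] > threshold}
```

Customers flagged this way could then be followed up against the usage restrictions in their contracts.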
- Developers should also assess and communicate the risk and impact of so-called “hallucinations”, i.e. incorrect and unexpected outputs.
- Organizations that make the application available to people would need to carefully consider how people use it, and ensure that the model is not used in a way that is inappropriate for the level of accuracy the developer knows it to have.
This could include:
- Providing clear information about the statistical accuracy of the application, and easily understandable information about appropriate usage
- Monitoring user-generated content, either by analysing user query data or by monitoring outputs publicly shared by users (see the first sketch after this list)
- Conducting user engagement research to validate whether the information provided is understood and followed by users
- Labelling the outputs as generated by AI or not factually accurate (e.g., “watermarking” and “data provenance”); see the second sketch after this list
- Providing information about the reliability of the output, for example through the use of confidence scores.
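Where a deployer analyses user queries directly (the monitoring option above), even a crude heuristic can surface information-seeking usage and trigger an accuracy warning. This first sketch is a minimal illustration; the regex patterns, the warning text, and the `annotate_if_information_seeking` helper are all hypothetical, and a production system would more likely use a trained intent classifier.

```python
import re

# Patterns suggesting the user wants factual information rather than
# creative output (illustrative patterns only, not from the guidance).
INFORMATION_SEEKING = [
    re.compile(p, re.IGNORECASE)
    for p in [r"\bwhat is\b", r"\bwho (is|was)\b", r"\bwhen did\b",
              r"\bis it true\b", r"\bhow many\b"]
]

ACCURACY_WARNING = (
    "This answer is AI-generated and may contain inaccuracies. "
    "Please verify important facts independently."
)

def annotate_if_information_seeking(query: str, response: str) -> str:
    """Append an accuracy warning when the query looks information-seeking."""
    if any(p.search(query) for p in INFORMATION_SEEKING):
        return f"{response}\n\n{ACCURACY_WARNING}"
    return response
```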
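The last two measures, labelling and confidence information, can be combined at the point where an application returns a response. This second sketch assumes per-token log-probabilities are available from the model API (many LLM APIs expose them); the response schema and the `label_output` helper are hypothetical, and the geometric-mean score shown is only a crude proxy for reliability, not a standard.

```python
import math
from datetime import datetime, timezone

def label_output(text: str, token_logprobs: list[float], model_id: str) -> dict:
    """Wrap a model output with provenance metadata and a confidence score.

    The score is the geometric mean of the token probabilities, a common
    (if crude) proxy for the model's certainty about its own output.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must be non-empty")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    confidence = math.exp(mean_logprob)  # geometric mean of token probabilities
    return {
        "content": text,
        "provenance": {
            "ai_generated": True,  # explicit AI-generated label
            "model": model_id,
            "generated_at": datetime.now(timezone.utc).isoformat(),
        },
        # Deployers can surface this to users, e.g. as "low/medium/high confidence".
        "confidence_score": round(confidence, 3),
    }
```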