Can LLMs be taught to forget?
The Science article “AIs can ‘memorize’ data they shouldn’t. Can they be forced to forget?” describes newly available open source software for testing whether large language models can actually “forget” what they have learned.
Why should an LLM be able to “forget” what it has learned? Several reasons:
training input later found to be inaccurate
the potential release of private or personal information
output that too closely copies or mimics protected intellectual property
An important question is whether it is even possible for an LLM to truly forget what it has been trained on. While tools might be able to test whether targeted data have actually been removed, I can imagine that the resource cost of such removal would be substantial, especially if it has to be done on a regular basis and must also address any downstream copying or ripple effects that have occurred since the targeted data were introduced.
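To make the testing side of this concrete, here is a minimal sketch, in Python with the Hugging Face transformers library, of one naive way such a check might work: compare the average per-token loss a model assigns to a target passage before and after an unlearning step. The model name (“gpt2”), the target passage, and the “after” checkpoint are all illustrative placeholders; this is a crude memorization proxy of my own construction, not the methodology of the tools the Science article describes.

```python
# A rough sketch of one way to test for "forgetting": compare the average
# per-token loss a model assigns to a target passage before and after an
# unlearning step. All names here (gpt2, the target passage, the "after"
# checkpoint) are illustrative placeholders, not the article's tooling.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def mean_token_loss(model, tokenizer, text: str) -> float:
    """Average cross-entropy on `text`; lower means more strongly memorized.
    A crude memorization proxy, not a formal unlearning audit."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return out.loss.item()

tokenizer = AutoTokenizer.from_pretrained("gpt2")
before = AutoModelForCausalLM.from_pretrained("gpt2")
# In practice this would be the checkpoint produced by the unlearning step.
after = AutoModelForCausalLM.from_pretrained("gpt2")

target = "An example passage the model was asked to forget."
gap = (mean_token_loss(after, tokenizer, target)
       - mean_token_loss(before, tokenizer, target))
print(f"Loss increase after unlearning: {gap:.3f}")
# A clearly positive gap suggests the passage is now less strongly memorized;
# a gap near zero suggests the "forgetting" did not take.
```

Even this trivial check hints at the cost problem: it requires keeping multiple checkpoints on hand and rerunning the comparison for every targeted passage, on every update cycle.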
The impacts of such “forgetting,” along with the potential costs of decisions made or actions taken before the forgetting took place, could be substantial; perhaps they will be folded into the ongoing maintenance costs of the LLM.
This raises legal issues of liability that might keep the lawyers busy and well fed, but such issues could be real nuisances for users who base decisions on outdated, illegal, or inaccurate output.
One might logically ask whether this is any different from the question of whether any question-answering system, or any medium for that matter, can be held “liable” for actions taken or decisions made based on use of that medium.
That question is beyond the scope of this piece. We already know that there are businesses devoted to expunging inaccurate or embarrassing information from the web. I would not be surprised if similar entities emerge to address the training and output of LLMs; again, this is an issue that will most likely keep the lawyers busy for years to come.
Copyright (c) 2026 by Dennis D. McDonald