Without a doubt, Generative Artificial Intelligence (GenAI) has been the most significant new technology to enter the mainstream in the last 20 years. While the iPhone revolutionised personal computing, Google became all-powerful in search and Facebook/Meta changed the world through social networking, GenAI has the potential to touch everything from the news we read and the movies we watch to the conversations we have with online services.
A brief primer on AI
It’s easy to ascribe intelligence to GenAI. It can, on the face of it, answer quite complex questions in a form that reads as if it’s written by a person. It does this by applying complex algorithms to a massive pool of data.
All AI models rely on two elements: a pool of data and a set of algorithms. Here’s a simple example.
Let’s say you wanted to create a software program that could tell the difference between cats and dogs from a pool of photos. The algorithm would be told to look for a bunch of features that are descriptive of dogs and other attributes that describe cats. For people, this is a relatively simple task. Most children can tell the difference between cats and dogs not long after they learn how to speak.
But ‘teaching’ software to do this is much more complex. Cats and dogs are actually very alike. They both have fur, walk on four legs, have tails (mostly), longish snouts, and dozens of other similar features. We could say that dogs are generally larger than cats but tigers are larger than most dogs and chihuahuas are smaller than domestic cats.
Models are trained on millions of images to learn the difference. And when the models get it wrong, they need to be corrected.
What really happens inside these algorithms is that the model doesn’t actually know the difference between a cat and a dog. What it does is assign a probability, based on the attributes it detects, that the image is either a cat or a dog. When the probability of the image being a dog is only marginally higher than that of it being a cat, the model makes what most of us would call a best guess.
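To make that concrete, here’s a minimal sketch in Python of the ‘best guess’ idea. It is not how any real model works, and the numbers are invented, but it shows the principle: whichever label ends up with the marginally higher probability wins.

```python
# A minimal sketch, not a real model: pick whichever label has the
# higher probability, even when the margin is tiny. Scores are made up.

def classify(probabilities):
    """Return the label with the highest probability, and that probability."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]

# Hypothetical scores a trained model might produce for one ambiguous photo.
scores = {"cat": 0.48, "dog": 0.52}

label, confidence = classify(scores)
print(f"Best guess: {label} ({confidence:.0%})")  # Best guess: dog (52%)
```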
If it’s wrong and not corrected, it can repeat the mistake.
GenAI problem 1 – accuracy
Clearly, GenAI is far more complex than a tool that can tell the difference between two animals. But, in principle, it’s not that different.
When we give GenAI a prompt, we are telling it to look for words that are generally associated with each other. For example, if we asked ChatGPT, the most widely known and used GenAI tool today, “Is Anthony Caruana a journalist?” or asked it to write a bio for me, it would come back with an answer based on sources that put my name adjacent to material the system’s algorithm had tagged as biographical.
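As a toy illustration of that word-association idea, here’s a Python sketch that is vastly simpler than ChatGPT, trained on a made-up scrap of text: it counts which word follows which, then ‘predicts’ the most frequent continuation.

```python
# A toy word-association model (nothing like a real GenAI system):
# count which word follows which in some training text, then predict
# the statistically most likely next word. The text is invented.

from collections import Counter, defaultdict

training_text = (
    "anthony caruana is a journalist . "
    "anthony caruana writes about apple . "
    "anthony caruana is a writer ."
)
words = training_text.split()

# Tally how often each word is followed by each other word.
next_words = defaultdict(Counter)
for current, following in zip(words, words[1:]):
    next_words[current][following] += 1

# The 'answer' is simply the most common continuation seen in training.
print(next_words["anthony"].most_common(1)[0][0])  # caruana
print(next_words["caruana"].most_common(1)[0][0])  # is
```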
You can even ask it to emulate particular writing styles. I had a good chuckle at asking it to write my bio in the form of a Monty Python skit.
The software is very clever and can answer questions about all sorts of topics. But it’s not perfect.
A friend of mine, who is bilingual, asked ChatGPT the same question in two different languages and received wildly different answers with significant inaccuracies.
But, as in our cats-and-dogs example, when the model lacks confidence, because the question being asked is unexpected or there is not enough data, it can fill in the blanks itself and interpolate information. These fabrications are sometimes called “AI hallucinations”.
GenAI problem 2 – recursiveness
GenAI has spawned a massive wave of AI-generated content. Publishers, large and small, from all over the world are using GenAI to create content. And companies like OpenAI, Google and others are using that content to train their models further.
In other words, GenAI is using content created by GenAI to train itself. And if that content is not being checked for accuracy, then inaccurate data is being used to train the models.
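To see why that matters, here’s a purely hypothetical Python illustration of the feedback loop: if each retraining round folds in unchecked AI-generated content, and a small share of that content is wrong, the inaccuracy compounds. The 2% figures are invented for illustration only.

```python
# Purely hypothetical numbers: watch a small, uncorrected error rate
# compound as each generation of a model is trained on the last
# generation's unchecked output.

error_rate = 0.02            # assumed share of bad data in generation 0
new_errors_per_round = 0.02  # assumed errors each retraining round adds

for generation in range(1, 6):
    # Some previously accurate data gets displaced by new errors.
    error_rate += new_errors_per_round * (1 - error_rate)
    print(f"Generation {generation}: ~{error_rate:.1%} of training data inaccurate")
```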
At some point, this problem is likely to be resolved as the algorithms are trained to spot content that is AI-generated. But, for now, it’s possible that public-facing GenAI models are being trained on AI-generated data.
GenAI problem 3 – data licensing
GenAI models are trained using vast amounts of data – hundreds of billions of words and many terabytes of text. And that data has to come from somewhere. OpenAI says its training data comes from sources that are publicly accessible over the internet.
But content creators are fighting back. The New York Times is suing OpenAI for copyright infringement as are many authors.
OpenAI has, arguably, become an incredibly valuable company by leveraging information created by others. In one of the lawsuits the company faces, the plaintiffs say OpenAI’s platform is “nothing less than the rampant theft of copyrighted works”.
The creators of GenAI models will find that fewer and fewer people will tolerate their content being used to train a profit-making venture without being appropriately compensated.
And it’s not just content. If we take the example I used earlier of asking ChatGPT to write my bio in the form of a Monty Python skit, it’s also emulating someone’s distinctive style.
As a good friend of mine says, “Never be the only person in the room not getting paid”. If GenAI tools are making money using content you create, then you deserve to be appropriately compensated.
GenAI is facing an existential crisis. It must resolve these issues in order to persist as a valuable, sustainable technology asset. The data used to train the algorithms must be accurate and from reputable sources that are fairly compensated.
GenAI is a relatively new tool, yet in less than two years it has changed our perception of what a computer program can do. But, like any technology, it has strengths and weaknesses we must recognise. And if the creators of GenAI services don’t address these issues, they risk losing public trust and sliding into Gartner’s ‘Trough of Disillusionment’.
Anthony is the founder of Australian Apple News. He is a long-time Apple user and former editor of Australian Macworld. He has contributed to many technology magazines and newspapers as well as appearing regularly on radio and occasionally on TV.