Introduction
Well, not exactly, fellow writers. But with all the hype over the past couple of years, you’d think the robot invasion had taken place, particularly in the realm of generative art. Everywhere you look, news articles, social media posts, and so-called experts are touting the rise of machines creating content that once only humans could craft.
Amidst this buzz, a flood of misconceptions and misinformation has swirled, painting a picture that’s not entirely accurate. In the writing community in particular, these misconceptions have been amplified by bias. I’m not here to defend the AI industry; I believe there was wrongdoing on their part, like downloading illegally from overseas pirate sites. But I’ve also seen tons of misinformation in writers’ forums and even in writers’ meetings.
My objective in this series is to illuminate the realities of present-day AI for writers. If after reading these articles you, as a writer, still hate and detest AI, that’s your choice. But at least you’ll be hating AI itself—not some distorted perception of it.
But first, a few preliminaries. Who am I to speak to this technology? Well, I’m an author and sometimes a practitioner of neural networks, the very foundation of today’s generative AI systems. I may not be a state-of-the-art researcher, but back in the 1990s I was programming my own neural networks in a programming language called C, long before TensorFlow and PyTorch, today’s common AI frameworks, were a twinkle in AI developers’ eyes. To demonstrate that I’m not merely hopping on the bandwagon: I co-authored a 1993 paper in Neural Computation titled “On the Geometry of Feedforward Neural Network Error Surfaces” with An Chen and Robert Hecht-Nielsen, delving into the theoretical and mathematical underpinnings of neural networks. More recently, I authored a 2020 paper for the Machine Learning and Data Mining conference, titled “Multiple Imputation with Denoising Autoencoder using Metamorphic Truth and Imputation Feedback,” with José Unpingco and Giancarlo Perrone, employing a deep neural network in an outside-the-box configuration. In short, while I’m not a cutting-edge researcher, I’m not just some yutz jumping on the generative AI craze either.
A Little History
The term artificial intelligence has been around for at least half a century. I recall in the 1980s, Professor Marvin Minsky, a foundational figure in the field, quoting a prevailing sentiment: “If it works, it isn’t artificial intelligence.” Times have changed. Fast forward to this century, and the focus has landed on neural networks, specifically deep learning networks (aka deep neural networks). The concept of a neural network actually predates artificial intelligence. Its modern form traces back to the perceptron, a computational unit inspired by the biological neuron, which Frank Rosenblatt developed in the 1950s. Even today, the heart of a neural network, including state-of-the-art generative AI, remains the perceptron.
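For the curious, here’s a tiny Python sketch of what a perceptron boils down to. This is my own minimal illustration; the weights and bias are made-up numbers chosen for the example.

```python
# A bare-bones perceptron in the spirit of Rosenblatt's design: weighted inputs,
# a bias, and a hard threshold. The weights and bias below are made-up numbers.
def perceptron(inputs, weights, bias):
    # Weighted sum of the inputs plus a bias; a threshold then decides
    # whether the unit "fires" (1) or not (0)
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# This particular choice of weights and bias computes a logical AND
print(perceptron([1, 1], [0.5, 0.5], -0.7))  # 1: fires only when both inputs are on
print(perceptron([1, 0], [0.5, 0.5], -0.7))  # 0
```

Wire thousands of these units together in layers and you have a neural network; stack enough layers and you have deep learning.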
Generative AI hit its stride in 2014 with the advent of Generative Adversarial Networks (GANs). The goal was to generate outputs indistinguishable from those created by humans. The concept was simple: two neural networks pitted against each other, one attempting to generate a human-like result and the other tasked with determining whether that result was human- or machine-generated, each operating with adversarial goals. The most dramatic change came in 2017 with the introduction of a mechanism called attention, which enabled neural networks to model long streams of data such as text far more efficiently and culminated in an architecture known as the transformer, the ‘T’ in GPT.
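To make the adversarial idea concrete, here’s a toy sketch in PyTorch, entirely my own illustration and not any published model: a generator learns to fake samples from a simple bell curve while a discriminator learns to catch the fakes.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))                # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())  # discriminator
loss = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(2000):
    real = torch.randn(64, 1) * 2 + 5   # "human" data: samples from N(5, 2)
    fake = G(torch.randn(64, 8))        # machine-generated data from random noise

    # Discriminator's goal: label real data 1, fake data 0
    d_loss = loss(D(real), torch.ones(64, 1)) + loss(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator's adversarial goal: make the discriminator say 1 on fakes
    g_loss = loss(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

After enough rounds of this tug-of-war, the generator’s output becomes hard to tell apart from the real samples, which is the whole point.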
Before ending the history lesson, I’d like to mention that generative AI models today are based either on transformers (most LLMs and some generative image models fall into this category) or on latent diffusion models, which gave rise to frameworks like Stable Diffusion.
Talk the Talk
There are a ton of terms swirling around generative AI, and I’m sure it can be quite confusing. In my “history lesson,” I touched on AI and neural networks. Some terms, like transformers and latent diffusion models, might not be familiar to those outside of AI circles but are useful to understand. First, let’s clarify the distinction between a model and an architecture. A model is used for making predictions or, in the case of generative AI, generating outputs. The architecture, on the other hand, is the underlying structure on which the model runs. For instance, GPT is a model—or rather, a family of models—while the underlying architecture is a transformer. It’s easy for anyone outside the field to conflate the two.
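If you want to see the distinction in the wild, here’s a quick sketch using the Hugging Face transformers library (assuming it’s installed): the Python class implements the transformer architecture, and the trained weights loaded into it are what make it a model.

```python
from transformers import GPT2LMHeadModel

# The class implements the transformer architecture; loading trained
# weights into it is what produces a usable model.
model = GPT2LMHeadModel.from_pretrained("gpt2")
print(type(model).__name__)                        # the architecture's class
print(sum(p.numel() for p in model.parameters()))  # ~124 million parameters: the model
```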
Because the field of artificial intelligence strives to replicate human reasoning, many of its terms of art, perhaps presumptuously, mimic common language terms such as learning, training, reasoning, and understanding. The connotations of these terms have led to a lot of misconceptions, even fear. To better grasp what these terms really mean in the context of AI, here’s a very brief primer on how generative AI operates.
Generative AI models, which are statistical models, fundamentally operate no differently than the linear models you might have encountered in college, though at a vastly larger scale and complexity. The coefficients in these models are referred to as parameters. To give you an idea of the scale, a model like GPT-3 has around 175 billion parameters. The use of a model typically falls into two categories: training and prediction. During training, the parameters are adjusted using samples of data known as training data or training sets. During prediction, which occurs when you, say, prompt ChatGPT, the parameters remain unchanged and only the input data changes.
The process of adjusting these parameters (or weights), referred to by AI practitioners as learning, is markedly different from what a human or biological organism would call learning. Mathematically, it is governed by formulas known as learning laws. The takeaway for the layperson is that learning, in the context of these models, is nothing more than a mathematical adjustment to the coefficients of the model.
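To see both phases and a learning law in action on something small, here’s a sketch of a two-parameter linear model trained by gradient descent. All the numbers are invented for illustration.

```python
import random

# Training data: noisy samples of y = 3x + 1
data = [(x, 3 * x + 1 + random.gauss(0, 0.1)) for x in [i / 10 for i in range(50)]]

w, b = 0.0, 0.0  # the model's parameters (coefficients)
lr = 0.05        # learning rate

# Training phase: the parameters are adjusted to reduce prediction error
for epoch in range(500):
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x   # the "learning law": nudge each coefficient
        b -= lr * err       # against the gradient of the squared error

# Prediction phase: the parameters are frozen; only the input changes
print(w, b)         # close to 3 and 1
print(w * 2.0 + b)  # prediction for x = 2.0, close to 7
```

An LLM does conceptually the same thing, just with billions of coefficients instead of two.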
Models operate in these two phases (some frameworks add others, but all have these two). There is a common misconception that models train continuously. To improve predictions, many companies will retrain their models on increasingly larger datasets. This practice gives rise to the myth of continuous learning, but in reality, training is a controlled and finite process that occurs in distinct phases. AI systems do not learn continuously and autonomously; the training is completely under the operator’s control.
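You can even verify this yourself. Here’s a quick check, assuming the Hugging Face transformers library and the small GPT-2 model, that prompting a model leaves every parameter untouched.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

before = [p.clone() for p in model.parameters()]  # snapshot every parameter

with torch.no_grad():  # prediction phase: no learning law runs
    ids = tok("The sun rises in the", return_tensors="pt").input_ids
    model.generate(ids, max_new_tokens=5, pad_token_id=tok.eos_token_id)

# Every one of the ~124 million parameters is exactly as it was
print(all(torch.equal(a, b) for a, b in zip(before, model.parameters())))  # True
```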
Here are three terms you might encounter when discussing the performance of LLMs. Along the way, I’ll touch on a few other common terms. We’ve already looked at “parameters” and “training sets,” which are crucial for assessing the quality of an LLM. The third of these terms is “context” or “context length.” This term refers to the length of the input stream the model can process at one time—oops, almost slipped and used “understand” instead of “process.” Context length is measured in “tokens.”
Remember, all these AI models are statistical models, and statistical processing, even with non-numeric data, requires conversion to numbers, a process known as encoding. The models cannot natively handle bare text; therefore, they encode this text into tokens, which are fragments of text roughly three-quarters of a word long.
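Here’s what tokenization looks like in practice, using OpenAI’s tiktoken library, the tokenizer behind recent GPT models:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the encoding used by GPT-4-era models
tokens = enc.encode("Generative AI models cannot read bare text.")
print(tokens)              # a short list of integers: this is all the model ever sees
print(len(tokens))         # roughly 4/3 tokens per word, i.e. ~3/4 of a word per token
print(enc.decode(tokens))  # decoding the integers recovers the original text
```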
Alright, we’ve got context, parameters, and training sets. How do these relate to the capabilities of an LLM? In an anthropomorphic nutshell, parameters can be thought of as how well the model can “think,” the size of the training set as how much it has been “taught,” and context length as how much it can “remember” at one time. An LLM with 200 billion parameters might seem smart, but it could know nothing if it’s only trained on a small amount of data. Generally, the larger any of these three aspects are, the better—but it’s best if all three increase together. Of course, this is an oversimplification, as these models might have different underlying architectures. Still, it’s a good rule of thumb for the layperson.
Before we close out the terminology section, let’s touch on two other terms that are incredibly useful to understand: fine-tuning and retrieval-augmented generation (RAG). Fine-tuning is a process where a general-purpose model is used as a starting point and then retrained with domain-specific text. The idea is that while general-purpose training is time-consuming, costly, and onerous, fine-tuning an existing model is quicker and more cost-effective than training from scratch. Take, for instance, MedLLAMA2, a fine-tuned version of the LLAMA2 model, retrained on an open-source medical dataset and designed specifically to answer medical questions. Think of the LLM as a high school or freshman college student with a solid foundation in general education; entering a major to focus their studies is akin to fine-tuning.
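For the technically inclined, fine-tuning in its simplest form looks something like this sketch, assuming the Hugging Face transformers library; domain_texts is a hypothetical stand-in for whatever domain-specific corpus you have.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # general-purpose starting point

# Hypothetical domain corpus; a real fine-tune would use thousands of passages
domain_texts = ["Patient presented with acute dyspnea and elevated troponin..."]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    for text in domain_texts:
        batch = tokenizer(text, return_tensors="pt", truncation=True)
        # Causal language modeling: the labels are the input text itself
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
# The result is the same architecture with nudged parameters: a fine-tuned model
```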
To date, as far as I know, there hasn’t really been an LLM of suitable quality specifically fine-tuned to cater to the needs of the creative writer. The models fine-tuned so far have been more experimental than utilitarian. My suspicion is that a broader corpus of work than what is currently available would be needed. Because of the general pushback from the writing community towards AI, this isn’t likely to happen any time soon.
RAG is a term you might not come across unless you dive deep into the mechanics of LLM frameworks. Technically, it’s not a part of an LLM, but it’s a technique used to stretch the LLM beyond its native context length. Essentially, by some process, relevant sections of a large body of text are selected—for example, from a conversation in ChatGPT. While it would be interesting to see how effective an LLM could be in this selection process, it’s typically handled by some sort of text similarity search, which you might call a fuzzy search. These relevant text snippets are then added to your prompt to better frame the context of your request to the LLM.
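A bare-bones version of that retrieval step might look like the sketch below, using a TF-IDF similarity search as the “fuzzy search”; real systems typically use embedding-based vector search, and every name and passage here is illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A stand-in for your large body of text, pre-split into passages
chunks = [
    "Chapter 1: Mara arrives in the coastal town of Seabrook.",
    "Chapter 2: A storm washes a locked chest onto the shore.",
    "Chapter 3: The detective finds a rusted key inside the chest.",
]
query = "What did the detective find?"

# Score each passage's similarity to the query (the "fuzzy search")
matrix = TfidfVectorizer().fit_transform(chunks + [query])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
best = [chunks[i] for i in scores.argsort()[::-1][:2]]  # two most relevant passages

# The retrieved snippets are prepended to frame the request to the LLM
prompt = "Context:\n" + "\n".join(best) + "\n\nQuestion: " + query
print(prompt)
```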
AI and Creative Arts
Now that you’ve slogged through 8–10 minutes of background on AI—and probably ended up going down a rabbit hole you didn’t intend to—we’re finally getting to the good stuff: what these generative AI models can do, and more importantly, what they can’t do.
First things first, remember that these models are just statistical engines designed to predict the most probable response to a given prompt. Yes, they are incredibly advanced, but at their core, they are just predictive models, and if you’ve used them as much as I have, you know they can be pretty dumb ones at that. They don’t come up with answers by understanding or analyzing information; rather, they statistically predict likely responses based on their vast training sets. Additionally, they are stochastic models, meaning randomness plays a role in their responses. Give the same prompt twice in different random states, and you will get two different answers.
So, what does that actually mean? Well, for starters, such a model can’t reason. It can’t understand. It doesn’t actually possess knowledge. What it can do is mimic language very well; any illusion to the contrary is simply an artifact of the nature of language. Its reputation for “hallucinating,” for example, stems from patterns found in language, not from actual knowledge. In fact, it doesn’t even have programming like traditional software, which follows explicit commands and logic. There is no logical rule within these models that the sun rises in the east; instead, the model has seen enough language examples that the phrase “Which direction does the sun rise?” prompts it to output “East.” However, due to its stochastic nature, there is a finite probability that the model will respond “West,” whether because of a statement in the training set that says the sun rises in the west or out of sheer randomness.
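Here’s a toy illustration of that stochastic behavior; the probabilities are invented for the example.

```python
import random

def next_token():
    # Sample from the model's (invented) probability distribution over answers
    return random.choices(["East", "West"], weights=[0.97, 0.03])[0]

answers = [next_token() for _ in range(1000)]
print(answers.count("West"))  # about 30 out of 1000: rare, but it happens
```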
The takeaway here is that there’s no thinking, reasoning, or logic happening under the hood. What you have is an incredibly sophisticated mimicry of language, but no intelligence behind it. For all the technical sophistication, a language model is just that—a predictive model of patterns in language. Never lose sight of that.
As a language model, you might expect it to excel at writing. However, this isn’t necessarily the case. The primary goal of these models is to produce human-like output, which is a somewhat open-ended goal. As with AI image generation, generative models are very adept at producing content that appears believably human, whether that’s text, images, or now even video. But producing human-like output and producing your desired output are two completely different problems. A new breed of skilled artists, however, has begun to leverage AI’s capabilities, combining their own artistic abilities with the AI’s strengths to produce art.
Similarly, language models aren’t magic boxes: you can’t simply provide character profiles and a rough outline and expect them to churn out a novel, or even a single chapter, exactly as you envision. But, as with image generation, there will be writers who learn how to work with LLMs to fuse their own creativity with the AI’s capabilities. We’re already seeing a significant shift in the software industry, with coders leveraging these models to write code more efficiently.
As for the quality of the writing, consider that the training data is often the content found across the Internet. And on average, that’s mediocre writing. Left to its own devices, an LLM will give you, on average, mediocre writing. The benefit is that it can turn a poor writer into a mediocre one. Conversely, it can also drag a good writer down to mediocrity.
Hopefully, this level-sets the true capabilities of LLMs: for creative writing, they are deeply flawed, yet they can be powerful aids if handled properly. I don’t pretend to know how to exploit them fully, but in the next installment, I hope to share some insights and pitfalls I’ve discovered along the way.
Conclusion
My hope is that by clarifying what these generative AI models actually are, the hysteria over the use of generative AI—particularly LLMs—can be tempered by a healthy dose of reality. The truth is, these models are nowhere close to being a substitute for human writing. Whether they can be used productively in creative writing? The jury’s still out.
