Modern-day Generative AI applications are built on a statistical process that takes in text and represents it as numbers in such a way that similar words get similar numbers. This preserves some of the meaning of the underlying document. From there, all sorts of cool information processing can happen: text generation, information retrieval, classification, recommender systems, etc.
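To make that concrete, here's a minimal sketch of what "similar words get similar numbers" looks like in practice. It assumes the sentence-transformers library and the all-MiniLM-L6-v2 model, which are just convenient stand-ins; any embedding model would illustrate the point.

```python
from sentence_transformers import SentenceTransformer, util

# Turn text into vectors ("embeddings"); related text should land close together.
model = SentenceTransformer("all-MiniLM-L6-v2")
texts = ["puppy", "fluffy dog", "quarterly earnings report"]
embeddings = model.encode(texts)

# Cosine similarity: closer to 1 means the model considers the texts more related.
print(util.cos_sim(embeddings[0], embeddings[1]))  # puppy vs. fluffy dog: high
print(util.cos_sim(embeddings[0], embeddings[2]))  # puppy vs. earnings report: low
```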
What started with text is spreading to other "modes," as they're called in machine learning. The same approach now works with images, sound, and video as well. Similar words, pictures, sounds, and videos can be encoded to have similar numbers, assuming you've got a good enough dataset to train the model. OpenAI claims it does. You can type "fluffy dog" and pull up a picture of one. Have the model write a Ken Burns-style narrative for a silent video and suggest accompanying music? Sure, no problem. It's here, and this post will soon feel outdated as a result.
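For a taste of the multimodal version, here's a hedged sketch using a CLIP-style model, which places images and text in the same vector space. The clip-ViT-B-32 checkpoint and the dog.jpg file are placeholders for illustration, not a recommendation of any particular setup.

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# A CLIP-style model encodes both images and text into the same vector space.
model = SentenceTransformer("clip-ViT-B-32")
image_embedding = model.encode(Image.open("dog.jpg"))  # placeholder image file
text_embeddings = model.encode(["a fluffy dog", "a city skyline at night"])

# The caption that matches the picture should score noticeably higher.
print(util.cos_sim(image_embedding, text_embeddings))
```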
If we step back, what's the idea behind this technique? The idea is that the meaning something carries has a lot to do with the context it appears in. The American linguist Zellig Harris noticed that words occurring in similar contexts tend to have similar meanings. Soon after, the English linguist John Firth arrived at a related theory of meaning, often summarized as: "you can know a lot about a thing by the company it keeps." This insight came to be known as the Distributional Hypothesis.
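A toy example shows the Distributional Hypothesis at work without any neural networks at all: count which words show up near which other words, then compare those context profiles. The three-sentence corpus and two-word window below are made up purely for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

corpus = [
    "i drink hot coffee every morning",
    "i drink hot tea every morning",
    "the cat sleeps on the warm sofa",
]
window = 2  # how many neighbours on each side count as "context"

# Build a context-count vector for every word.
contexts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i, word in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                contexts[word][words[j]] += 1

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm = lambda c: sqrt(sum(v * v for v in c.values()))
    return dot / (norm(a) * norm(b))

# "coffee" and "tea" keep the same company, so their vectors nearly match.
print(cosine(contexts["coffee"], contexts["tea"]))   # high
print(cosine(contexts["coffee"], contexts["sofa"]))  # low
```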
Now this theory is being applied to text to understand how words relate to each other. You can model those relationships with machine learning and apply them using Generative AI. Take any dataset of words of sufficient size, derive probabilities from it, and make predictions with them. But context in writing is about more than just the position of words relative to each other. It's also about where a word appears in the document. A word in the headline often carries more meaning than one in the footnote.
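Here's what "model those relationships with machine learning" can look like at small scale: a word2vec model trained with gensim on a toy corpus. Real systems train on billions of words; the corpus and parameters below are placeholders meant to show the shape of the workflow, not settings to copy.

```python
from gensim.models import Word2Vec

# Tiny stand-in corpus; real training data would be orders of magnitude larger.
sentences = [
    ["the", "dog", "chased", "the", "ball"],
    ["the", "puppy", "chased", "the", "ball"],
    ["the", "analyst", "wrote", "the", "report"],
]

# Learn vectors by predicting words from their neighbours.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=200, seed=1)

# With enough data, "dog" and "puppy" end up as close neighbours
# because they appear in the same contexts.
print(model.wv.most_similar("dog", topn=3))
```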
Or not. The footnotes of an infomercial are quite interesting, actually. Tom Waits once crooned that "the large print giveth and the small print taketh away."
Meaning could also depend on what type of document we're dealing with (a novel versus a medical report). It could matter who wrote it. And on and on.
So context is about much more than the words. Marketers know you can drive human behavior by presenting the right message at the right time in the right place. Brand marketers go to great lengths to place their messages in contexts where other things might lend their brand some equity. Performance marketers, meanwhile, optimize a context called media. Direct marketers made a science of aligning the message, the format, and the offer.
Imagine if you could train a model on time and place data along with the message. What if you fed generative models some data about performance and asked them to learn? You can know a lot about a thing from the company it keeps. It turns out we already do, and I fully expect the next generation of AI models to use this information in some interesting ways.
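As a purely speculative sketch of that idea: one crude way to let a model see the company a message keeps is to serialize its context (placement, time, channel, even observed performance) alongside the copy before embedding it. Every field, value, and model name below is hypothetical.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

# Hypothetical record: a message plus the context it ran in.
message = {
    "text": "Limited-time offer: free shipping this weekend",
    "placement": "email subject line",
    "time": "Friday 5pm",
    "channel": "retail newsletter",
    "click_rate": 0.042,  # made-up performance signal
}

# Crude approach: fold the context into the text itself before encoding,
# so the vector reflects the message and the company it kept.
enriched = (
    f"[placement: {message['placement']}] [time: {message['time']}] "
    f"[channel: {message['channel']}] {message['text']}"
)
print(model.encode(enriched).shape)
```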