(Caution: This is yet another ChatGPT-4 experience report, the kind everyone has been writing these days. If you don’t like those, you’d better stop reading.)
My spouse had an idea: When we travel by airplane, it might be nice to have recordings of us reading children’s books aloud. Although some audiobooks are available, the selection of Japanese children’s books is limited in the audio market. And we read these books every night anyway, so why not record them?
Inspired by that, I got another idea: Maybe we can generate some personalized stories for my daughter using ChatGPT and read them to her. My father once wrote one for me when I was a kid, and I remember loving it.
Even better, I can possibly make a podcast based on these “AI stories”. (What a stupid dad – you might think, and you’re probably right.)
So I signed up for ChatGPT Plus, enabled GPT-4, and gave it a try.
It was fascinating and disappointing at the same time: It does generate some short stories. What it spits out might be too short with a vanilla prompt, but you can tell it to generate “chapter N” to split it up.
Besides the length, I have encountered a few other problems:
The first problem is that it stutters: It stops generating the text in the midst of a sentence. It feels like a bug. I hope they implement some backtracking to keep it moving.
The second problem – although I’m not sure if I should call it a problem – is the boredom of the story: What it writes is extremely stereotypical and boring. That is, however, probably by design. The role of the language model is to generate the “most likely text”, and that basically means being stereotypical and boring. In theory, OpenAI can “align” the model towards interestingness. But I’m not sure that’s the direction they would like. (This is not about the “temperature” – the randomness of the generated text. I don’t think making the output less statistically likely makes the text more interesting. It would just make it less intelligible.)
There is an escape hatch: You can mitigate the problem by explicitly directing the plot. Reject the boring part and ask for a retake. Give some hints on where you’d like to go.
Also, you can maybe ask for options for the plot first, then pick one of them to proceed. This “breadth-first search” is something I’d like to try on the next iteration.
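Here is a minimal sketch of what that options-first exploration could look like. Everything in it is an assumption for illustration: `explore_plots`, `propose_options`, and `keep` are made-up names, and `fake_options` is a stub standing in for a real "give me a few ways the story could continue" request to the model. No actual API is called.

```python
from collections import deque


def explore_plots(premise, propose_options, depth, keep=lambda outline: True):
    """Expand every surviving outline by one plot step before going deeper.

    propose_options(outline) returns candidate next steps (hypothetical hook
    where a model call would go). keep(outline) is the human taste filter:
    return False to reject a boring branch.
    """
    queue = deque([[premise]])
    finished = []
    while queue:
        outline = queue.popleft()
        # The premise itself does not count as a plot step.
        if len(outline) - 1 == depth:
            finished.append(outline)
            continue
        for option in propose_options(outline):
            candidate = outline + [option]
            if keep(candidate):
                queue.append(candidate)
    return finished


def fake_options(outline):
    """Stub that pretends to be the model: always offers two options."""
    step = len(outline)
    return [f"step {step}, option A", f"step {step}, option B"]


outlines = explore_plots("A girl finds a talking cat.", fake_options, depth=2)
for outline in outlines:
    print(" -> ".join(outline))
```

In practice the `keep` filter is where the picking happens: rejecting an option early prunes everything that would have grown from it, which keeps the number of requests manageable.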
The third problem is more serious and, for now, a deal breaker for me: Its writing style is so cheap!
It’s known that you can ask for the writing style of a famous writer, like Haruki Murakami, as long as the author is famous enough that the crawler has collected a good amount of text to mimic. But Yuko’s and my favorite writer of children’s books, Miyoko Matsutani, doesn’t reach that threshold. GPT-4 knows something about Matsutani and tweaks the style a bit, but the result is far from satisfying. After all, I’m very picky about Japanese writing style and probably don’t care for the “Internet Japanese” that GPT is based on. I long for more boldness, beauty and authenticity. (What a grumpy old guy – you might think, and you’re probably right.)
The situation will change once OpenAI opens up fine-tuning for their new models. Once it does, I’ll spend days and nights typing in Matsutani’s work for training… well, or maybe scanning all the books for OCR, or cracking the ebooks to scrape them, or whatever.
One thing I haven’t explored is longer prompts: GPT-4 is known to be capable of ingesting much longer prompts than its predecessors. My current prompts are a few hundred characters or less. They could go up to a few thousand. (I’m not sure how byte pair encoding handles Japanese text. My conservative guess is about one Japanese character per token.)
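A rough way to put bounds on that guess without the real tokenizer: assuming the tokenizer is a byte-level BPE working on UTF-8 bytes (my understanding, not something checked here), every token covers at least one byte, so the token count can never exceed the byte count. The sketch below only measures characters and bytes. `utf8_stats` is a made-up helper, and the sample is just a stock fairy-tale opening.

```python
def utf8_stats(text):
    """Return character count, UTF-8 byte count, and average bytes per character."""
    chars = len(text)
    utf8_bytes = len(text.encode("utf-8"))
    bytes_per_char = utf8_bytes / chars if chars else 0.0
    return {"chars": chars, "utf8_bytes": utf8_bytes, "bytes_per_char": bytes_per_char}


# "Once upon a time, in a certain place" -- a generic opening, not from any book.
sample = "むかしむかし、あるところに"
stats = utf8_stats(sample)
print(stats)
# Under the byte-level assumption, this text costs at most stats["utf8_bytes"] tokens.
```

Hiragana and most kanji take three bytes each in UTF-8, so under that assumption the worst case is about three tokens per character. One character per token would sit in the middle of the possible range; an actual count needs the real tokenizer.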
What should I ask to align the output to the Matsutani style? Is it even possible given her work isn’t in the model’s memory? Should I give some snippets as an example? I have no idea. It clearly needs more experiments and creativity.
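One experiment worth sketching is the "snippets as examples" idea: paste a few short passages into the prompt and ask for the same style, while staying under a character budget. This is only a guess at a reasonable prompt shape, not a known recipe. `build_style_prompt`, the wording of the header and footer, and the 3000-character budget are all assumptions, and the snippets below are placeholders rather than real text.

```python
def build_style_prompt(snippets, request, max_chars=3000):
    """Assemble a prompt from example passages, stopping before the budget is exceeded."""
    header = "Here are short passages in the style I want:\n\n"
    footer = f"\nWrite in the same style. {request}"
    budget = max_chars - len(header) - len(footer)
    chosen = []
    for i, snippet in enumerate(snippets, start=1):
        block = f"Passage {i}:\n{snippet}\n\n"
        if len(block) > budget:
            break  # keep whole passages only; never cut one in the middle
        chosen.append(block)
        budget -= len(block)
    return header + "".join(chosen) + footer


# Placeholders: real passages would be typed in by hand.
snippets = ["(first example passage goes here)", "(second example passage goes here)"]
prompt = build_style_prompt(snippets, "Tell a bedtime story about a fox.")
print(prompt)
```

The budget is counted in characters rather than tokens to keep the sketch simple; whether a handful of passages is enough to move the style at all is exactly what the experiment would have to show.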
Another possibility is to accept the “Internet Japanese” and the absurdity of the generated story, and push the model to the limit to see how crazy it can go. I’d never read the result to my daughter, but it could make a great podcast episode. Maybe someone has already done that, but a few more are probably fine.
Despite the technical limitations, I consider my own creativity and taste the largest limiting factors: How hard can I push the model? How can I judge the “interesting-ness” of the story? How can I give it the right direction? It’s a creative challenge that might be worth pursuing.
That said, I’m not sure if it’s my kind of challenge. I should probably spend that time and energy reading more existing stories to my daughter: Stories are abundant out there, and time with my little one is a bit more scarce.
(For the record, here is my first attempt as a GPT-powered story writer. I’ll report back if I get better.)