A Story Writer

(Caution: This is yet another ChatGPT-4 experience report that everyone has been writing these days. If you don’t like those, you’d better stop reading.)

My spouse had an idea: for airplane trips, we could record ourselves reading children’s books aloud. Some audiobooks are available, but the selection of Japanese children’s books in the audio market is limited. And we read these books every night anyway, so why not record them?

Inspired by that, I got another idea: maybe I can generate some personalized stories for my daughter using ChatGPT and read them to her. My father once wrote one for me when I was a kid, and I remember loving it.

Even better, I can possibly make a podcast based on these “AI stories”. (What a stupid dad – you might think, and you’re probably right.)

So I signed up for ChatGPT Plus, enabled GPT-4, and gave it a try.

It was fascinating and disappointing at the same time: it does generate short stories. What it spits out with a vanilla prompt might be too short, but you can ask it to write “chapter N” one at a time to stretch the story out.

Besides the length, I have encountered a few other problems:

The first problem is that it stutters: it stops generating text in the middle of a sentence. It feels like a bug. I hope they implement some backtracking to keep it moving.

The second problem – although I’m not sure I should call it a problem – is how boring the stories are: what it writes is extremely stereotypical and bland. That is, however, probably by design. The role of the language model is to generate the “most likely text”, and that basically means being stereotypical and boring. In theory, OpenAI could “align” the model towards interestingness, but I’m not sure that’s the direction they want to take. (This is not about the “temperature”, the randomness of the generated text. I don’t think making the output less statistically likely makes it more interesting; it would just make it less intelligible.)

There is an escape hatch: you can mitigate the problem by explicitly directing the plot. Reject the boring parts and ask for a retake. Give it some hints about where you’d like the story to go.

Also, you could ask for several plot options first, then pick one to proceed with. This “breadth-first search” is something I’d like to try in the next iteration.

The third problem is more serious and, for now, a deal breaker for me: its writing style is so cheap!

It’s known that you can ask for the writing style of a famous writer, like Haruki Murakami, as long as the author is famous enough that the crawler has gathered a good amount of data to mimic. But our (mine and Yuko’s) favorite writer of children’s books, Miyoko Matsutani, doesn’t reach that threshold. GPT-4 knows something about Matsutani and tweaks the style a bit, but it’s far from satisfying. After all, I’m very picky about Japanese writing style and probably don’t care for the “Internet Japanese” that GPT is trained on. I long for more boldness, beauty, and authenticity. (What a grumpy old guy – you might think, and you’re probably right.)

The situation will change once OpenAI opens up fine-tuning for their new models, and when they do, I’ll spend days and nights typing in Matsutani’s work for training… well, maybe scanning all the books for OCR, or cracking the ebooks to scrape them, or whatever.


One thing I haven’t explored is longer prompts: GPT-4 is known to be capable of ingesting much longer prompts than its predecessors. My current prompts are a few hundred characters or less; it should be able to go up to a few thousand. (I’m not sure how byte pair encoding handles Japanese text. My conservative guess is about one token per Japanese character.)
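(If I ever bother to check that guess, a quick script would do. Below is only a sketch, and it assumes the third-party js-tiktoken package, which I haven’t actually set up for this; the sample sentence is arbitrary.)

// Rough token count for a Japanese sentence, assuming the js-tiktoken package.
import { getEncoding } from "js-tiktoken";

const enc = getEncoding("cl100k_base"); // the encoding used by GPT-3.5/4-era models
const sample = "むかしむかし、あるところにおじいさんとおばあさんがすんでいました。";
console.log(`${sample.length} characters -> ${enc.encode(sample).length} tokens`);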

What should I ask to align the output with Matsutani’s style? Is that even possible, given that her work isn’t in the model’s memory? Should I give it some snippets as examples? I have no idea. It clearly needs more experimentation and creativity.

Another possibility is to accept the “Internet Japanese” and the absurdity of the generated stories, and push the model to its limit to see how crazy it can get. I’d never read that to my daughter, but it could make a great podcast episode. Maybe someone has already done it, but a few more are probably fine.

Despite the technical limitations, I consider my own creativity and taste the biggest limiting factors: How hard can I push the model? How do I judge the “interestingness” of a story? How do I give it the right direction? It’s a creative challenge that might be worth pursuing.

That said, I’m not sure it’s my kind of challenge. I should probably spend that time and energy reading more existing stories to my daughter: stories are abundant out there, and time with my little one is rather scarce.


(For the record, here is my first attempt as a GPT-powered story writer. I’ll report back if I get better.)

It’s Time to Get Back to Work

Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.

gpt-4.pdf

“GPT-4 is not easy to develop. It took pretty much all of OpenAI working together for a very long time to produce this thing. And there are many many companies who want to do the same thing, so from a competitive side, you can see this as a maturation of the field.”

OpenAI co-founder on company’s past approach to openly sharing research: ‘We were wrong’ – The Verge

It dawned on me that they have officially ended the era of AI as research and switched gears towards AI as a product. For many startups, AI has been a product rather than research, and OpenAI is one such startup, but they have been pretending to be a research lab like those of big tech companies or elite universities. Although there have been signs of this shift, like not releasing the GPT-3 model to the public, they kept publishing research papers and people loved them. Until now.

Another iconic part of the GPT-4 “technical report” is the Authorship section (p.15). It looks a lot like the closing credits of a movie, and it asks: please cite this work as “OpenAI (2023)”. This would never happen in a research paper. It signifies that GPT-4 is a product and that the whole company worked towards building it.

What does this mean? When their researchers do research-y work, it will be folded into the next version of GPT instead of becoming standalone research. No published papers, that is.

And I think it is already happening. Look at the paper about the vision-language model CLIP. It was their early iteration of visual understanding, and you can consider it a building block of multimodal models like GPT-4.

This paper is two years old. If they had kept doing the “research”, there should have been another paper for its successor. That didn’t happen. There is no CLIP-2; they built the progress into GPT-4 instead. This is how product development works: the innovations occur inside the product. (Compare this to MSR, which has up to three generations of VLP models!)

After all, there are no papers about, say, Google’s ranking algorithms (forget PageRank), Facebook’s feed scoring, Amazon’s logistics, or Apple’s CPUs. These are all marvels of computer science achieved by smart Ph.D. people, but they all happened inside products. Few papers, if any, were written. GPT has morphed into one of these monumental products.

Am I disappointed? Definitely. Do I complain? Not really. Academic research has its limitations. Sometimes innovation needs a product. It needs users. It requires real problems. It demands money to be made. That’s why tech companies hire Ph.D.s and let them write production code instead of academic papers.

The downside is that it mostly happens behind closed doors. This darkness, however, is worth it, at least for the innovating engineers and researchers themselves.

For us paper readers left begging, however, it feels like the end of a great season. It’s time to get back to work.

I still hold out hope: someday, once this war is over and victory has made the GPT gang rich, one of them will go back to school and write a textbook about AI, with a chapter called “History”, where they share their retrospective as an insider: what they did right, what their peers overlooked, what the vision was, to what extent it was achieved, and, don’t forget, some technical details. It’d be a fascinating read.

Until that book arrives, I’ll keep reading. Well, I should get back to work. But after business hours, I’ll keep reading.

McDonald’s Theory Of Bug Fix

I like Jon Bell’s McDonald’s Theory and have put it to use from time to time. Last Monday was one such day.

My boss gave me a random bug. I debugged it and figured out what was wrong, but I was still unsure how to fix it. The code seemed to have some layering violations, but the right way to distribute the responsibility between the layers was unclear.

Looking at the commit history, it became apparent who could fix this. I wondered whether I should just reassign the bug to them, but that kind of move has had mixed results: people tend to push back, dodge, or ignore. No one wants to waste their time on bug fixes.

Then the McDonald’s Theory came to mind: instead of reassigning, I “fixed” the bug anyway and sent the PR to that person. To my credit, I didn’t make a McChicken fix. I tried my best, and I’d consider it more of a McCrispy.

Still, I knew they would push back, and they did. It was still a cheap McDonald’s fix, and it worked exactly like McDonald’s: they thoroughly explained in the PR comments how wrong my “fix” was. So I praised their analysis, pointed out that they were much better positioned to fix it, and asked them to take it over.

As the theory goes, they agreed, and a couple of days later they gave me their version of the fix. It was better for sure: instead of piling crap on top like mine did, they removed the code in question. The layering violation seemed to be a leftover from some past refactoring, and the smelly code wasn’t necessary after all. My faith in the theory has only grown.

I should acknowledge my luck in landing on a good engineer. The patterns I had seen before were either 1) people just took the crap or 2) people pushed back but did nothing. Neither happened. They did what was supposed to be done, and I appreciate this healthy ending. People shouldn’t eat McDonald’s every day; they need the ability to cook a good meal.

Morning Spinach: February

2023-02-15

2023-02-17

  • Overslept yesterday and couldn’t do this, but today: podcast prep.

2023-02-18

  • Woke up exhausted from a nightmare about missing a flight. 05:15.
  • Podcast prep, continued.

2023-02-20

  • Tired out from the hike; woke up at five.
  • Podcast prep, continued.
  • Done. The recording itself is next week, though.
  • Now, what should I do next… but before that, I have to do taxes…
  • Taxes. I meant to finish them, but the statement from the brokerage that handles my company stock hasn’t arrived. Apparently it won’t arrive until the middle of next month. It sucks, as always. And entering that one is the most tedious part… It’s tedious because I have everything set to sell as soon as it becomes sellable; stopping that would make this easier, and over the past ten years not selling would have made far more money, but that era is over… or rather, holding company stock is too scary for me now. Oh well, fine. There are no actionable items until 3/14, so I’ll spend the time on something else.

2023-02-21

  • 04:35
  • Thought I’d do some thinking… but before that, I should check whether anything in the piled-up Todoist can be knocked out.

2023-02-22

  • 04:22
  • More thinking.

2023-02-23

  • 04:38. A nightmare about losing my wallet.

2023-02-24

  • Rain today, huh.
  • Accepting The Funny Mediocrity – Spinach Forest
  • I wrote out the things I want to do and realized the first should be crawling my kid’s kindergarten site to collect the photos, because the service is being discontinued and the site will shut down in about a month. Awful.
  • Do it in Python (Playwright) or in JS (Puppeteer)? I’m more used to Python, but Puppeteer seems better built as a headless browser, so it’s a tough call. Maybe this is a good opportunity to finally pick up JS/TS.

2023-02-25

$ npm install --save-dev @types/puppeteer
npm WARN deprecated @types/puppeteer@7.0.4: This is a stub types definition. puppeteer provides its own type definitions, so you do not need this installed
  • The world is making progress!
  • Now, let’s log in to the Shutterfly site. (A rough sketch of this step follows after this list.)
  • What is this $x thing that lets me run XPath queries? Where does it come from? Turns out it’s a “Console utility function”. Handy…
  • Conveniences like this only exist because they hire lots of devrel people, you know, dear CEO… but dwelling on such gloomy thoughts erodes the happiness, so I should stop.
  • Out of time.
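(For the record, here is a bare-bones sketch of what that login step looks like in Puppeteer with TypeScript. The URL, selectors, and environment variable names are placeholders I made up; Shutterfly’s actual login form is different.)

// Minimal Puppeteer login sketch. URL, selectors, and env var names are
// hypothetical placeholders, not Shutterfly's real ones.
import puppeteer from "puppeteer";

async function login(): Promise<void> {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  await page.goto("https://accounts.shutterfly.example/login", { waitUntil: "networkidle2" });

  // Fill in credentials from environment variables.
  await page.type("#email", process.env.SF_EMAIL ?? "");
  await page.type("#password", process.env.SF_PASSWORD ?? "");

  // Submit and wait for the post-login navigation to settle.
  await Promise.all([
    page.waitForNavigation({ waitUntil: "networkidle2" }),
    page.click("button[type=submit]"),
  ]);

  await browser.close();
}

login().catch((e) => { console.error(e); process.exit(1); });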

2023-02-26

  • 04:30
  • Continuing.
  • I can’t imagine writing async/await code without TypeScript. Vanilla JS is scary. Glad I made the effort to go with TS…
  • Maybe the site is just poorly built, but I can’t reliably detect when a page load has finished, so I’m getting by with a pile of silly workarounds. Then again, that’s essentially what this kind of work is…
  • Mysterious page transitions keep killing the Puppeteer script, so I have to take checkpointing seriously so a run can resume even after it dies midway. Out of time.
  • Noon. Wife and kid are out, so continuing.
  • The obnoxious design is that you click a menu that only appears on hover, and a download starts out of nowhere (via Content-Disposition). How am I supposed to deal with that… and then it turns out there is a Page.hover() method! Puppeteer is great!
  • I wondered how downloads would be handled… but since it’s a headless browser, apparently they just download normally. As for where they land, you use a backdoor API. I see… (A sketch of this hover-and-download dance follows after this list.)
  • It works! Puppeteer is amazing!
  • Node.js 16: setTimeout with async/await | LinkedIn
  • I wrote it the old way, so I’ll rewrite it later.
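(Roughly, the hover-and-download dance with the “backdoor” API looks like the sketch below. The selectors and the wait are made up for illustration, and the sleep at the end is exactly the old-style wrapper I was complaining about.)

// Hover to reveal the menu, then click the item that triggers a
// Content-Disposition download. Selectors and paths are hypothetical.
import puppeteer from "puppeteer";

async function downloadOne(dayUrl: string, downloadDir: string): Promise<void> {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // The "backdoor": tell the Chrome DevTools Protocol to allow downloads
  // and route them into a known directory.
  const cdp = await page.createCDPSession();
  await cdp.send("Page.setDownloadBehavior", {
    behavior: "allow",
    downloadPath: downloadDir,
  });

  await page.goto(dayUrl, { waitUntil: "networkidle2" });

  // The menu only shows up on hover; Page.hover() moves the mouse there.
  await page.hover(".photo-tile");
  await page.click(".photo-tile .menu-download");

  // A Content-Disposition download has no navigation to await,
  // so just give the file some time to land (crude, but it works).
  await new Promise((resolve) => setTimeout(resolve, 5_000));

  await browser.close();
}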

2023-02-27

  • 04:25. Let’s keep at it.
  • After adding a few more setTimeout() calls and retries, it started working.
  • Or so I thought; it falls over again. It’s a cycle of adding an ad-hoc workaround every time it breaks. I’d like to rewrite this retry as a higher-order function (see the sketch after this list), but the urge to just get it over with wins out.
  • Still, there is 100-200+ MB of data per day, for 200 days. I was thinking of uploading it to Google Photos, but that’s a bit big. Come to think of it, I once saw a post on secon.dev about compressing JPEG to WEBP. I’ll copy that once everything is downloaded.
  • Turning off the PC’s auto suspend for now. And since there’s nothing to do while I wait, maybe I’ll look into the Google Photos API…
  • Library API overview  |  Google Photos APIs  |  Google Developers
  • Looks about 10x more tedious than scraping Shutterfly. Isn’t there a CLI tool or something… There are some, it turns out, but none of them look very solid.
  • gphotosuploader/gphotos-uploader-cli: Command line tool to mass upload media folders to your google photos account(s) (Mac OS / Linux) How about this one? How to actually upload isn’t documented… but given how tedious it would be to hit the API myself, using this seems like the way to go. Reading its feature list makes it clear how much I would otherwise have to implement, which settles it. And on a closer look, the secon.dev article above used the same tool.
  • That said, a lot of preschools these days use app-based services; if there’s no Web UI, scraping isn’t even an option, is it? I wonder what people do. Probably nothing, I suppose.
  • For what it’s worth, the school photos aren’t posed; they are taken from a distance so as not to disturb the kids playing or working, so they aren’t that interesting. Even less so because, until very recently, everyone was wearing masks. They also zoom in rather aggressively. I’m grateful they shoot with a Pixel 6a, and it makes me want to gift them a Pixel 7 Pro. That’s absurd, and yet I’ve seriously considered it a few times, but since I no longer work in the phone division, it’s definitely off the table.
  • Out of time. The download is about 1/3 done. Unless it falls over midway, it should finish today.
  • Right before work starts: it’s finished. And it’s about 50 GB. What am I supposed to do with this… Even if compression halved it, it’s still a size I’d rather not put in Google Photos. My mental budget is around 10 GB, so maybe I’ll downsample everything to roughly 1/5… No wonder Shutterfly wants to shut this service down.
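(The higher-order retry I keep postponing would look something like this, using the awaitable setTimeout from node:timers/promises instead of the old-style wrapper. The names and defaults are mine, not from the actual script.)

// A retry higher-order function: wrap any async step and re-run it a few
// times with a pause in between. Uses the awaitable setTimeout from Node 16+.
import { setTimeout as sleep } from "node:timers/promises";

async function withRetry<T>(
  step: () => Promise<T>,
  attempts = 3,
  delayMs = 2_000,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await step();
    } catch (e) {
      lastError = e;
      console.warn(`attempt ${i + 1}/${attempts} failed, retrying...`);
      await sleep(delayMs);
    }
  }
  throw lastError;
}

// Usage: wrap a flaky step, e.g. a single day's download.
// await withRetry(() => downloadOne(dayUrl, downloadDir));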

2023-02-28

  • 04:45. Overslept…
  • For now, let’s zip up the downloaded photos and move them out of the working directory. Ideally I’d evacuate them to GCS, but they’re so big it would take forever, so some other time.
  • Melted time away on xargs, then escaped to Python. My shell-fu is too weak… Actually, had I written it in Python from the start, I could have parallelized it too… or maybe I should have written it in TS…
  • Ugh, this is getting tiresome. Out of time.
  • But thinking about it calmly, am I really going to create 200+ albums in Google Photos? Isn’t that madness?
  • Delete all photos from an album from Google Photos : googlephotos Right, deleting an album doesn’t delete the photos. Uploading the preschool photos to Photos for sharing really is too risky. Maybe Drive is the better fit. Should I try at least one?
  • prasmussen/gdrive: Google Drive CLI Client Apparently this is the kind of tool to use. Google Drive has no directory-based permissions, which makes this sort of thing nerve-racking…
  • Then again, everyone else is on iPhones and Macs, so the moment Google Photos is out of the loop, WEBP is a no-go. End of that idea. HEIF… I thought for a second, but then I wouldn’t be able to view it myself. This is what you call the first world social divide.
  • Nothing to do but brace myself and upload the 50 GB. Well, it’s temporary, so fine…
  • Tomorrow I have to do the podcast release work.