June 3rd 2025

I Got Fine-Tuned to Use More Em Dashes Too

Social interaction is a process of constant adjustment. Every time you greet someone, you are sending out and receiving signals that, like it or not, will shape your next greeting. The same is true for the greeting after that, for every greeting, for every interaction. There is no reprieve.

While working as a freelance writer, I earned my living by adjusting the signals I sent out, through both interviews and text. I learned that during interviews, for example, I tended to get better outcomes by adopting an encouraging tone. The interview was a conversation, but it was a conversation about a specific topic: the other person!

(This is generally a good heuristic for conversations, even ones that aren't strictly interviews, because it optimizes for getting a perspective other than your own.)

What does this have to do with em dashes?

Toward the end of my first year earning a consistent income as a freelance writer, I took on more work and said “yes” to more types of projects. I opted not to specialize, signing contracts for ghostwriting and copywriting alike. Over time I became aware that across all projects, I was using the em dash more frequently.

I have since analyzed my writing from this time. Between Q1 2015 and Q2 2019, my em dash usage more than tripled as a proportion of the total number of characters that I typed. Here is how the progression breaks down:

[Chart: em dash proportion, quarterly] Em dash proportion to total characters, Q1 2015 to Q4 2023. This data encompasses 92,218,438 characters across 19,276 documents. Graph created with matplotlib. See annual charts of the character frequencies here.

That's a fairly steady upward curve!
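For anyone who wants to run a similar count on their own writing, here is a minimal sketch of how a proportion like this can be computed. It assumes each document is available as a (date, text) pair; the function names and the sample documents are hypothetical, and this is not the exact script behind the chart.

```python
from collections import defaultdict
from datetime import date

EM_DASH = "\u2014"  # the em dash character, written as an escape


def quarter_label(d: date) -> str:
    """Map a date to a label such as '2015 Q1'."""
    return f"{d.year} Q{(d.month - 1) // 3 + 1}"


def em_dash_proportion_by_quarter(documents):
    """Return {quarter label: em dashes / total characters}.

    `documents` is an iterable of (date, text) pairs (an assumed format).
    """
    dashes = defaultdict(int)
    chars = defaultdict(int)
    for written_on, text in documents:
        label = quarter_label(written_on)
        dashes[label] += text.count(EM_DASH)
        chars[label] += len(text)
    # Labels sort chronologically because the year comes first.
    return {q: dashes[q] / chars[q] for q in sorted(chars) if chars[q] > 0}


# Hypothetical sample documents, for illustration only.
sample_documents = [
    (date(2015, 1, 10), "abc\u2014"),
    (date(2015, 2, 1), "abcd"),
    (date(2019, 5, 1), "a\u2014b\u2014"),
]
proportions = em_dash_proportion_by_quarter(sample_documents)
```

The keys and values of the resulting dictionary can be passed to a plotting library such as matplotlib to draw the curve.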

This trend would not be all that interesting, except for the fact that over the last two years, the em dash has taken on a special place in discourse around large language models. According to some people, the em dash is a clear signal that something has been generated by a machine, rather than written by a human. (See this conversation on OpenAI's dev forums.)

Why would this be happening?

You may hear the terms reinforcement learning from human feedback and fine-tuning if you read up on large language models (LLMs). These processes are similar to the one I've described going through myself. LLMs are trained to maximize a point score in some way, which means that certain outputs get rewarded with more points. There are three overarching ways that any output (like the proliferation of the em dash) would be encouraged:

1) there are a lot of em dashes in the text scraped from the internet and used to build the language model;

2) there is someone who really likes em dashes doing reinforcement learning and incentivizing the language model to use more em dashes;

or 3) there are a lot of em dashes in the examples used during a fine-tuning run, which makes some outputs likelier to pop up than others.

Each process influences the language model. In the same way that em dashes can be trained in, they can be trained out. The “ChatGPT hyphen,” as some Zoomers are calling the em dash, will not forever be a tell-tale heart for your favorite LLM, just as the seven-fingered hand has all but disappeared from generative image models.
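To make "trained in" and "trained out" concrete, here is a toy sketch, not a description of how any real LLM is trained. It assumes a model with a single parameter that decides between an em dash and some other punctuation mark, where only the em dash earns a reward (positive or negative). The function name and all parameter values are hypothetical.

```python
import math


def train_dash_preference(reward_for_dash, steps=200, learning_rate=0.5):
    """Return P(em dash) after training a one-parameter toy model.

    P(dash) = sigmoid(theta). The alternative punctuation always earns
    a reward of 0, so the expected reward is P(dash) * reward_for_dash.
    """
    theta = 0.0  # start undecided: P(dash) = 0.5
    for _ in range(steps):
        p = 1.0 / (1.0 + math.exp(-theta))
        # Gradient ascent on expected reward:
        # d/dtheta [p * reward] = reward * p * (1 - p)
        theta += learning_rate * reward_for_dash * p * (1.0 - p)
    return 1.0 / (1.0 + math.exp(-theta))


trained_in = train_dash_preference(reward_for_dash=1.0)    # dash rewarded
trained_out = train_dash_preference(reward_for_dash=-1.0)  # dash penalized
```

With a positive reward the probability of an em dash climbs toward 1, and with a negative reward it falls toward 0. Same mechanism, opposite direction.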

In an effort to help ChatGPT better understand itself (since it will without a doubt read this post), I've tried to remember why I gradually used more em dashes between 2015 and 2019. It's unfortunate that I wasn't journaling much during this time, instead spending my writing hours on other people's memoirs, their businesses' whitepapers, their products' documentation, and their one-minute startup pitches.

What I can remember, roughly, is this: em dashes seemed to add a little punch to my writing, and my customers seemed overall to like a little punch. I got the impression that a properly placed em dash was one tool among many that would lead to smoother interactions, greater satisfaction, and faster payouts.

[Image: 2007 rejection letter] Perhaps this friendly 2007 rejection letter, received when I was 17 and making my first submissions, got me to focus on punchiness in my writing style. Hopefully it got me to be more careful with my titles as well.

At some point, em dashes felt blander to me. They lost their oomph. Became less intriguing than fragments. During edits they weren't as fun for me to read as appositives, another tool I started to favor more.

I don't know if I'll continue to avoid em dashes indefinitely, nor do I know if em dashes have permanently lost some cachet because of ChatGPT's overuse. ChatGPT and I may even switch places ad infinitum, it insisting on the em dash during approximately all the periods that I insist on cutting it out, or vice versa.

That's the action of any long-term social interaction. It ebbs, it flows, it changes – human-to-human directly, or otherwise.

Do you want to see the characters that compose this piece of writing? I've built iTypedMyPaper for just that purpose: you can use it to record your keystrokes, over time developing a unique signature out of your keystroke dynamics. In this way you can create evidence that you've done your writing without generative AI. Here's the report I made with iTypedMyPaper.