
The GPT Era: From GPT-1 to GPT-5 - A Complete History

Kishore Gunnam

Developer & Writer

If transformers were the engine, GPT was the car that showed the world what it could do.

Seven years. From a modest experiment to systems millions use daily.

The big story of GPT isn’t “a model got bigger.” It’s three separate shifts happening together:

  • Pre-training made models broadly capable (learn general patterns from lots of text; see the sketch after this list).
  • Instruction tuning / RLHF made them usable in conversation (follow the user, not just autocomplete).
  • Product + tools made them practical (search, retrieval, function calling, memory-like UX).
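
To make the first shift concrete, here is a minimal sketch of the next-token prediction objective that pre-training optimizes, assuming PyTorch; the vocabulary size, logits, and token IDs are toy stand-ins rather than output from a real model.

```python
import torch
import torch.nn as nn

vocab_size = 50_000  # toy value; real GPT vocabularies are in this ballpark

# Stand-in for a language model's output: one score per vocabulary token
# at each of 8 positions in a single sequence (batch=1, seq_len=8).
logits = torch.randn(1, 8, vocab_size)

# Stand-in token IDs for a 9-token training snippet. The target at each
# position is simply the next token in the text (shifted by one).
tokens = torch.randint(0, vocab_size, (1, 9))
targets = tokens[:, 1:]

# Pre-training loss: cross-entropy between the predicted next-token
# distributions and the tokens that actually came next.
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
print(f"next-token loss: {loss.item():.4f}")
```

Everything else in GPT's story sits on top of this one simple objective.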

The Scaling Journey

GPT Parameter Growth

  • GPT-1 (2018): 117M parameters
  • GPT-2 (2019): 1.5B parameters
  • GPT-3 (2020): 175B parameters
  • GPT-4 (2023): ~1.7T parameters (unofficial estimate)
  • GPT-5 (2025): parameter count not disclosed

From GPT-1 to GPT-4 alone, that's roughly a 10,000x increase in parameter count in five years.


GPT-1 (June 2018)

Modest by today's standards at 117M parameters, but it proved the core idea: generative pre-training on unlabeled text with the transformer architecture, followed by light fine-tuning, transfers to many language tasks.


GPT-2 (February 2019)

OpenAI released it in stages over roughly nine months, citing concerns about misuse. Looking back, its output seems quaint, but it started important conversations about how to release powerful models safely.


GPT-3 (June 2020)

No more fine-tuning for each task - just write the right prompt, often with a few examples included (few-shot learning), as sketched below. The prompt engineering era began.
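
Here is a minimal sketch of what that looked like in practice, assuming a sentiment-classification task; the reviews, labels, and prompt wording are invented for illustration.

```python
# A few-shot prompt: the "task" is defined entirely inside the prompt,
# with no fine-tuning involved. Examples below are made up for illustration.

few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It stopped working after a week and support never replied.
Sentiment: Negative

Review: Setup took five minutes and everything just worked.
Sentiment:"""

# With a base completion model, you would send this string to a
# text-completion endpoint and let it continue the text; the
# continuation ("Positive") is the answer.
print(few_shot_prompt)
```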


InstructGPT & RLHF (2022)

Raw GPT-3 just predicted the next token; it completed text rather than following instructions. RLHF - reinforcement learning from human feedback - changed that.

RLHF: Teaching AI to Help

  1. Collect examples - humans write ideal responses to prompts.
  2. Train a reward model - it learns which responses humans prefer.
  3. Optimize with RL - the model is trained to maximize that reward.
  4. Result - a model that follows instructions.

This transformed GPT from "text predictor" to "helpful assistant." Full explanation in Part 6: AI Alignment.
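
To give step 2 some flavor, here is a minimal sketch of the pairwise preference loss a reward model is trained with, assuming PyTorch; the linear "reward model" and random embeddings are toy stand-ins for a real language-model-based scorer.

```python
import torch
import torch.nn as nn

# Toy reward model: maps a (pretend) response embedding to a scalar score.
# In practice this head sits on top of a full language model.
reward_model = nn.Linear(768, 1)

# Stand-in embeddings for a batch of (chosen, rejected) response pairs,
# where "chosen" is the response human labelers preferred.
chosen = torch.randn(4, 768)
rejected = torch.randn(4, 768)

r_chosen = reward_model(chosen)      # scores for preferred responses
r_rejected = reward_model(rejected)  # scores for dispreferred responses

# Pairwise preference loss: push the chosen response's score above the
# rejected one's, i.e. maximize the probability it outscores the rejected.
loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
print(f"preference loss: {loss.item():.4f}")
```

Step 3 then uses the trained reward model as the training signal for a reinforcement learning algorithm (InstructGPT used PPO).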


ChatGPT (November 2022)

The underlying model (a fine-tuned GPT-3.5) wasn't new. The interface was. Suddenly everyone could talk to AI.


GPT-4 (March 2023)

GPT-3.5 vs. GPT-4:

  • Bar exam: ~10th percentile → ~90th percentile
  • Images: text only → understands images
  • Reasoning: often fails → much better

GPT-4 Turbo & 4o (2023-2024)


o1: Reasoning Era (September 2024)


GPT-5 (August 2025)


Key Lessons

Scaling works. Each 10x jump brought new capabilities.

RLHF was crucial. It turned a text predictor into a helpful assistant.

Interfaces matter. GPT-3 existed for 2 years before ChatGPT made it viral.

Reasoning is next. o1 showed that how a model thinks can matter as much as how big it is.


Common beginner mistakes

  • Thinking GPT models “know the truth.” They generate likely text unless grounded with sources (see the sketch after this list).
  • Thinking “more parameters” automatically means “better.” Training data, tuning, and evaluation matter as much.
  • Mixing up “ChatGPT features” (tools, browsing, memory) with “model capability.”
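
To make the first point concrete, here is a minimal sketch of grounding a prompt with a source; the source text and question are invented for illustration, and the resulting string would be sent to whatever model or API you use.

```python
# Grounding sketch: the model is constrained to a supplied source
# rather than asked to recall facts from its training data.
# The source snippet and question below are invented for illustration.

source = (
    "GPT-1 was released in June 2018 with 117 million parameters. "
    "GPT-2 followed in February 2019 with 1.5 billion parameters."
)
question = "How many parameters did GPT-2 have?"

grounded_prompt = (
    "Answer the question using only the source below. "
    "If the source does not contain the answer, say so.\n\n"
    f"Source: {source}\n\n"
    f"Question: {question}\n"
    "Answer:"
)

# Without grounding like this, the model simply produces the most
# likely-sounding continuation, whether or not it is true.
print(grounded_prompt)
```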

What's Next?

In Part 4, we explore the competition - Claude, Gemini, LLaMA - and how the landscape took shape in 2025.