AI-101

Paper #7

GPT-4 Technical Report (2023)

AI Confidence: 80%

AI-generated

TL;DR

GPT-4 is a large multimodal model that accepts text and image inputs and produces text outputs. Compared to GPT-3.5, it represents a significant leap in reasoning, knowledge, and instruction following.

What It Does

GPT-4 was OpenAI's most capable model at the time of its release. It processes both text and images, performs at or above human level on many professional and academic benchmarks (scoring around the 90th percentile on a simulated bar exam and 5s on several AP exams), and demonstrates markedly improved reasoning over its predecessors.

The technical report is notably opaque about architecture details, training data, and model size. OpenAI explicitly chose not to disclose these details, citing competitive and safety concerns.

Why It Matters

GPT-4 demonstrated that continued scaling and training improvements produce qualitatively new capabilities. Tasks that GPT-3.5 could not do reliably (multi-step reasoning, following complex instructions, understanding images) became routine with GPT-4.

It set a new bar for what commercial AI systems can do and triggered an industry-wide race to match its capabilities. Subsequent model releases from Anthropic, Google, Meta, and others have routinely been benchmarked against GPT-4.

The multimodal capability (image understanding) opened new application categories: visual question answering, diagram interpretation, OCR, and more.

Key Details

Authors: OpenAI (no individual authors listed on the technical report).

Notable benchmark: 90th percentile on the Uniform Bar Exam (up from 10th percentile for GPT-3.5).

Link to paper: https://arxiv.org/abs/2303.08774

Sources & Further Reading

Full paper: https://arxiv.org/abs/2303.08774

OpenAI: "GPT-4" - https://openai.com/research/gpt-4

Sparks of AGI paper (Microsoft Research) - https://arxiv.org/abs/2303.12712