Could GPT-3 (or ChatGPT) beat IBM Watson in Jeopardy!?
I used the clues from both rounds of both games of the Jeopardy! IBM Challenge.
I scripted a GPT-3 call for each clue using a simple prompt, then manually compared its responses with the correct ones.
Read on to see the results.
Prompt and Settings
On my first attempt, GPT-3 responded with single-word answers rather than in the question format that Jeopardy! requires.
So, I rephrased the prompt to have GPT-3 provide the answer as a question:
We are playing Jeopardy! The category and clue are provided, then the answer must be provided as a question.
Current category is: {category}
Clue: {clue}
Answer:
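As a rough illustration, the template above could be filled in for each clue like this. The name `build_prompt` and the exact line breaks are my assumptions based on how the prompt is shown here; they are not taken from the original script.

```python
# Template text copied from the prompt above; the line breaks are assumed.
PROMPT_TEMPLATE = (
    "We are playing Jeopardy! The category and clue are provided, "
    "then the answer must be provided as a question.\n"
    "Current category is: {category}\n"
    "Clue: {clue}\n"
    "Answer:"
)


def build_prompt(category: str, clue: str) -> str:
    """Fill the template with one category/clue pair (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(category=category.strip(), clue=clue.strip())


if __name__ == "__main__":
    # Placeholder clue text, not a real clue from the show.
    print(build_prompt("U.S. CITIES", "<clue text goes here>"))
```

Ending the prompt with "Answer:" leaves the model to complete the line with its response.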
I scripted calls to GPT-3, with these settings:
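import openai  # legacy openai-python (<1.0) interface, which exposes Completion.create

# `prompt` holds the filled-in template above; `max_tokens` caps the response length.
response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    temperature=0.5,  # moderate sampling randomness
    max_tokens=max_tokens,
    top_p=1,
    frequency_penalty=0.0,
    presence_penalty=0.0
)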
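The scripting around that call might look something like the sketch below. It is not the original script: `complete` is a hypothetical callable standing in for the API call above (it takes a prompt and returns the response text), `make_prompt` is a hypothetical prompt builder, and `is_question` is a loose check I added to flag responses that are not phrased as a question.

```python
import re
from typing import Callable, Iterable, List, Tuple

# Loose check: starts with an interrogative word and ends with "?".
QUESTION_RE = re.compile(r"^(who|what|where|when)\b.*\?$", re.IGNORECASE | re.DOTALL)


def is_question(response: str) -> bool:
    """Return True if the response looks like a Jeopardy!-style question."""
    return bool(QUESTION_RE.match(response.strip()))


def run_clues(
    clues: Iterable[Tuple[str, str]],
    complete: Callable[[str], str],
    make_prompt: Callable[[str, str], str],
) -> List[Tuple[str, str, str, bool]]:
    """Query the model once per (category, clue) pair and collect the responses."""
    results = []
    for category, clue in clues:
        response = complete(make_prompt(category, clue)).strip()
        results.append((category, clue, response, is_question(response)))
    return results


if __name__ == "__main__":
    # Stubbed model so the sketch runs without network access or an API key.
    fake_complete = lambda prompt: " Who is Grendel?\n"
    simple_prompt = lambda category, clue: f"{category}\n{clue}\nAnswer:"
    for row in run_clues([("LITERARY CHARACTER APB", "<clue>")], fake_complete, simple_prompt):
        print(row)
```

Passing the completion function in as a parameter keeps the loop testable offline; the correctness judgment itself was still done by hand.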
Results
Game 1 Jeopardy! (30 questions):
- Watson answered 19 questions, making 4 errors (79% accuracy)
 
- GPT-3 answered all 30 questions, making 7 errors (77% accuracy)
 
Game 1 Double Jeopardy! and Final Jeopardy! (31 questions):
- Watson answered 25 questions, making 2 errors (92% accuracy)
 
- GPT-3 answered all 31 questions, making 5 errors (84% accuracy)
 
Game 2 Jeopardy! (30 questions):
- Watson answered 13 questions, making 2 errors (85% accuracy)
 
- GPT-3 answered all 30 questions, making 6 errors (80% accuracy)
 
Game 2 Double Jeopardy! and Final Jeopardy! (31 questions):
- Watson answered 20 questions, making 1 error (95% accuracy)
 
- GPT-3 answered all 31 questions, making 3 errors (90% accuracy)
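The percentages above are accuracy over the questions each system actually answered, that is, (answered - errors) / answered. A small sketch that recomputes them from the counts listed above follows; the `Round` class and the shortened round labels are just illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Round:
    name: str
    answered: int
    errors: int

    @property
    def accuracy(self) -> float:
        """Fraction of answered questions that were correct."""
        return (self.answered - self.errors) / self.answered


# Counts copied from the results above.
WATSON = [
    Round("Game 1 Jeopardy!", 19, 4),
    Round("Game 1 Double/Final", 25, 2),
    Round("Game 2 Jeopardy!", 13, 2),
    Round("Game 2 Double/Final", 20, 1),
]
GPT3 = [
    Round("Game 1 Jeopardy!", 30, 7),
    Round("Game 1 Double/Final", 31, 5),
    Round("Game 2 Jeopardy!", 30, 6),
    Round("Game 2 Double/Final", 31, 3),
]


def overall(rounds: List[Round]) -> float:
    """Pooled accuracy across all rounds."""
    answered = sum(r.answered for r in rounds)
    errors = sum(r.errors for r in rounds)
    return (answered - errors) / answered


for w, g in zip(WATSON, GPT3):
    print(f"{w.name}: Watson {w.accuracy:.0%}, GPT-3 {g.accuracy:.0%}")
print(f"Overall: Watson {overall(WATSON):.1%}, GPT-3 {overall(GPT3):.1%}")
```

Pooling the four rounds this way gives roughly 88% for Watson (68 of 77 answered) and 83% for GPT-3 (101 of 122).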
 
Conclusion
IBM Watson wins on accuracy here. However, GPT-3 performs well, especially given that it was forced to answer every question, whereas Watson's figures cover only the questions it chose to answer. I didn't allow GPT-3 to ‘abstain’ on difficult questions or where it was uncertain.
It's incredible that with no fine-tuning or experimentation on my part, GPT-3 came within 2 to 8 percentage points of IBM Watson in every round!
There is a possibility that GPT-3 saw the clues as part of its training. As a validation check, I tried again using clues from show #8696, which aired on 12 September 2022 and so would not have been encountered by GPT-3.
On these clues, GPT-3 still achieved similar performance: 81% accuracy.
Raw Answers
The raw responses from GPT-3 are listed below, grouped by category. Responses I judged incorrect are marked "Wrong:". You can compare them with the correct responses to decide whether you agree with my assessment.
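If you want to tally the listing yourself, a minimal sketch is shown here. It assumes the layout used in this post: a category name on its own line, then one "- " line per response, with "Wrong:" marking the ones judged incorrect. The function name `tally` is illustrative.

```python
from typing import Dict, Iterable, Tuple


def tally(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return {category: (total_responses, wrong_responses)}."""
    counts: Dict[str, Tuple[int, int]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank separator lines
        if line.startswith("- "):
            if current is None:
                continue  # response before any category header; ignore it
            total, wrong = counts[current]
            is_wrong = line[2:].startswith("Wrong:")
            counts[current] = (total + 1, wrong + int(is_wrong))
        else:
            current = line  # any other non-blank line is a category header
            counts.setdefault(current, (0, 0))
    return counts


# Sample copied from the BEATLES PEOPLE category in this post.
SAMPLE = """BEATLES PEOPLE
- Wrong: Who is "John Lennon"?

- Wrong: Who is Eleanor Rigby?
- Who is Maxwell Edison?
- Who is Eleanor Rigby?
- Who is Julia?
"""

if __name__ == "__main__":
    for category, (total, wrong) in tally(SAMPLE.splitlines()).items():
        print(f"{category}: {wrong} wrong out of {total}")
```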
LITERARY CHARACTER APB
- Who is Grendel?
 
- Who is Lord Voldemort?
 
- Who is Sauron?
 
- Who is Dr. Henry Jekyll?
 
- Who is Jean Valjean?
 
BEATLES PEOPLE
- Wrong: Who is "John Lennon"?
 
- Wrong: Who is Eleanor Rigby?
 
- Who is Maxwell Edison?
 
- Who is Eleanor Rigby?
 
- Who is Julia?
 
OLYMPIC ODDITIES
- Who is Michael Phelps?
 
- What is London?
 
- Wrong: What is a fencer?
 
- Wrong: What country was represented by one athlete in the 2004 Olympic opening ceremonies?
 
- What was George Eyser's missing leg?
 
NAME THE DECADE
- Wrong: What is the 1960s?
 
- What is the 1930s?
 
- What is the 1980s?
 
- What is the 1900s?
 
- What is the 1910s?
 
FINAL FRONTIERS
- What is "The Last Judgment"?
 
- What is the terminus?
 
- What is a envelope?
 
- What is ApeX?
 
- What is the event horizon?
 
ALTERNATE MEANINGS
- Wrong: What is a view or an opinion?
 
- What is a shoe?
 
- Wrong: What is a "stake" or "impale"?
 
- What is class or a class reunion?
 
- What is a crook?
 
ETUDE, BRUTE
- What is "study"?
 
- What is a classical guitar?
 
- What is a violin?
 
- What is Franz Liszt's "Transcendental Etudes"?
 
- Who is Sergei Rachmaninoff?
 
HEDGEHOG-PODGE
- What is hibernation?
 
- What is a hedgehog cactus?
 
- Who was Leo Tolstoy?
 
- What is keratin?
 
- What is The Elegance of the Hedgehog?
 
DON'T WORRY ABOUT IT
- What is Hemophilia?
 
- What is narcolepsy?
 
- What is albinism?
 
- Wrong: What is chikungunya?
 
- What is leprosy?
 
THE ART OF THE STEAL
- What is the Sea of Galilee?
 
- What is Cleveland?
 
- What is Baghdad?
 
- What is the Cubist Movement?
 
- Who is King Philip II of Spain?
 
CAMBRIDGE
- Who was Isaac Newton?
 
- Who is John Milton?
 
- Who was King Henry VIII?
 
- Who is Sir Christopher Wren?
 
- Who is C.S. Lewis?
 
"CHURCH" & "STATE"
- Who is the Church Lady?
 
- Wrong: What is "restoration"?
 
- What is a church key?
 
- Wrong: What is "conceive"?
 
- What is Christchurch?
 
U.S. CITIES
- What is Chicago?
 
EU, THE EUROPEAN UNION
- What is Istanbul?
 
- Wrong: What are internal borders?
 
- What is Common Agricultural Policy?
 
- Who are the members of the European Parliament?
 
- What is Slovenia?
 
ACTORS WHO DIRECT
- Who is Sylvester Stallone?
 
- Who is Clint Eastwood?
 
- Who is Sean Penn?
 
- Who is Denzel Washington?
 
- Who is Robert De Niro?
 
DIALING FOR DIALECTS
- What is German?
 
- What is Chinese?
 
- What is Sanskrit?
 
- What is Arabic?
 
- What is Ancient Greek?
 
BREAKING NEWS
- What is Steve Wynn's?
 
- Wrong: What is the Bronx?
 
- Who is Martin Luther King Jr.?
 
- Wrong: What did Charles Wells do "At Monte Carlo"?
 
- Wrong: What airline did Dave Carroll's clip "Breaks Guitars" criticize?
 
ONE BUCK OR LESS
- What is the USA Today?
 
- What is a postcard?
 
- Who is 50 Cent?
 
- What is IKEA?
 
- What is Alberto VO5?
 
ALSO ON YOUR COMPUTER KEYS
- What is "home"?
 
- What is a shift dress?
 
- Wrong: What is a quarterback?
 
- Wrong: What is GP?
 
- What is an insert?
 
NONFICTION
- Wrong: What is Hillary Clinton?
 
- What is The Blind Side?
 
- What is Strunk and White's "The Elements of Style"?
 
- What is staggering genius?
 
- Who is David McCullough?
 
LEGAL "E"s
- What is "Esquire"?
 
- What is "eavesdropping"?
 
- Who is the executor of the will?
 
- What is eminent domain?
 
- What is an escalator clause?
 
WHAT TO WEAR?
- Wrong: What is muslin?
 
- What is a tea-length dress?
 
- What is a halter top?
 
- What are rain boots?
 
- What is Marc Jacobs?
 
U.S. GEOGRAPHIC NICKNAMES
- What is the "Graveyard of the Atlantic"?
 
- What is Buffalo?
 
- What is Las Vegas?
 
- What is Pittsburgh?
 
- Wrong: What is Arizona?
 
MAGICAL MOUSE-TERY TOUR
- What is The Simpsons?
 
- Who is Mickey Mouse?
 
- What is Flowers for Algernon?
 
- What is Danger Mouse?
 
- What is The Brain from Pinky and The Brain?
 
FAMILIAR SAYINGS
- What is contempt?
 
- What is a clock?
 
- What is a Jack of all trades?
 
- What is a committee?
 
- What are his tools?
 
19th CENTURY NOVELISTS
- Who is Bram Stoker?