Paper: Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4
Researchers at Zhejiang University (among other contributors) are testing the limits of logical reasoning in AI models.
Mainly, they wanted to measure whether today's language models can truly harness logical reasoning, which has long been an endeavor for the field.
Logical reasoning has been an almost unreachable goal for Natural Language Understanding (NLU) for decades.
In the past, researchers relied on symbolic approaches such as First-Order Logic (FOL) and Natural Logic, which use symbols to encode meaning, as well as rule-based models to define logical reasoning.
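For a concrete flavor of that symbolic style (a standard textbook example, not one from the paper): the sentence "All humans are mortal" is encoded in FOL as ∀x (Human(x) → Mortal(x)), and given the fact Human(socrates), a rule-based reasoner can mechanically derive Mortal(socrates).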
In the latest era of AI, language models such as ChatGPT and GPT-4 are instead trained on large internet datasets to "learn concepts" through language patterns.
The researchers put the models through several kinds of logical reasoning tests (a minimal evaluation sketch follows the list):
Multiple Choice Reading Comprehension (like SAT or ACT questions)
Natural Language Inference (deciding whether a hypothesis follows from a premise)
Out-of-distribution tests (like Chinese Civil Servant exams)
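To make the evaluation setup concrete, here is a minimal sketch of a multiple-choice scoring loop in the style of LogiQA/ReClor items. This is not the paper's actual harness: `query_model()` is a hypothetical stand-in for whatever chat-model API you use, and the item fields (`context`, `question`, `options`, `answer`) are assumptions about the data format.

```python
import re

def query_model(prompt: str) -> str:
    """Hypothetical helper: send the prompt to a chat model and return its raw reply."""
    raise NotImplementedError("wire this up to your model API of choice")

def format_prompt(item: dict) -> str:
    """Render one reading-comprehension item as a single multiple-choice prompt."""
    options = "\n".join(f"{label}. {text}" for label, text in zip("ABCD", item["options"]))
    return (
        f"Passage: {item['context']}\n"
        f"Question: {item['question']}\n"
        f"{options}\n"
        "Answer with a single letter (A, B, C, or D)."
    )

def evaluate(items: list[dict]) -> float:
    """Accuracy = fraction of items where the model's letter matches the gold answer."""
    correct = 0
    for item in items:
        reply = query_model(format_prompt(item))
        # Crude heuristic: take the first standalone A-D letter as the model's choice.
        match = re.search(r"\b([ABCD])\b", reply.upper())
        choice = match.group(1) if match else None
        correct += choice == item["answer"]
    return correct / len(items)
```

Accuracy figures like the ones quoted below are simply this correct/total ratio computed over a benchmark's dev or test set.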
The researchers had some interesting takeaways:
ChatGPT performed consistently across the Chinese and English tests; GPT-4 did not.
ChatGPT performs well on well-known logical reasoning benchmarks like LogiQA and ReClor. However, on the newly released AR-LSAT dataset and on the LogiQA 2.0 out-of-distribution set, its performance declines significantly.
On the ReClor dev set, GPT-4 reaches 92.00% accuracy, which is remarkable! However, on the AR-LSAT test set, GPT-4 performs surprisingly poorly, with only 18.27% accuracy.
So essentially,
"Language AI models do well on known logical tests but struggle with out-of-distribution tests immensely!"