A study by Stanford and Berkeley researchers has found that OpenAI's language models performed worse on several tasks in June 2023 than they did in March 2023.

For instance, GPT-4's accuracy in identifying prime numbers dropped from 97.6% in March to 2.4% in June.
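Evaluations like this typically prompt the model with yes/no questions and score the answers against deterministic ground truth. The sketch below illustrates that kind of harness; it is a minimal illustration, not the paper's code, and `query_model` is a hypothetical stand-in for whatever chat-completion call is used.

```python
# Minimal sketch of a prime-identification accuracy check, in the spirit of
# the study's evaluation. `query_model` is a hypothetical stand-in for a
# chat-completion call returning "Yes" or "No"; it is not part of any real API.

def is_prime(n: int) -> bool:
    """Deterministic trial-division ground truth; fine for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

def score(numbers, query_model) -> float:
    """Fraction of numbers whose primality the model labels correctly."""
    correct = 0
    for n in numbers:
        answer = query_model(f"Is {n} a prime number? Answer 'Yes' or 'No'.")
        predicted = answer.strip().lower().startswith("yes")
        correct += predicted == is_prime(n)
    return correct / len(numbers)

# Usage with a dummy model that always answers "Yes" -- roughly the failure
# mode reported, where accuracy collapses toward the base rate of primes:
if __name__ == "__main__":
    print(score(range(1000, 1100), lambda prompt: "Yes"))
```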

The non-peer-reviewed study examined the performance of GPT-3.5 and GPT-4 in areas like solving math problems, answering dangerous/sensitive questions, generating code, and visual reasoning.

  • GPT-4 showed less willingness to answer sensitive questions in June, and both models had more formatting mistakes in code generation.
  • In June versus March, the share of GPT-4's code generations that were directly executable fell from 52% to 10% (see the sketch after this list).
  • The paper highlights the issue of model drift, in which a model's behavior and accuracy shift over time.
  • "Overall, our findings show that the behavior of the 'same' LLM service can change substantially in a relatively short amount of time," the researchers wrote, adding that it's important to continuously monitor the models' performance.

The study aligns with some user reports about the models becoming less intelligent.

  • However, OpenAI's vice president of product, Peter Welinder, has denied intentional changes to make the models "dumber," saying that users may notice more issues over time simply because they use ChatGPT more.
