GPT-5 vs GPT-4o: Blind Test Uncovers User Psychology & The Future of AI Adoption [2025 Guide]

Why Emotional Attachment Shapes Our AI Preferences — Plus Surprising Lessons for Leaders Navigating AI Change in 2025

A blind testing app shows users often prefer GPT-5 responses over GPT-4o when they can't tell which is which, contradicting the vocal complaints about GPT-5's launch. This psychological disconnect reveals how brand attachment and aversion to change can override actual performance preferences, highlighting deeper patterns in how we relate to AI systems.

OpenAI's recent model transition sparked an unprecedented user revolt. Despite GPT-5's objective improvements (94.6% accuracy on the AIME 2025 mathematics benchmark versus GPT-4o's 71%, and 74.9% on real-world coding benchmarks versus GPT-4o's 30.8%), Reddit communities and social media erupted with demands for GPT-4o's return.

I'm Dr. Hernani Costa, founder of First AI Movers, where I help executives navigate AI transformation. Through my newsletter, which reaches hundreds of companies, I've seen firsthand how psychological factors often outweigh technical metrics in AI adoption decisions. The GPT-5 controversy perfectly illustrates why understanding user psychology is just as critical as understanding model capabilities when implementing AI in an organization.

Today, we examine what happened when an anonymous developer created a blind testing platform that removes brand bias, revealing an unexpected gap between stated preferences and actual choices. The findings provide crucial insights for anyone leading AI initiatives in their organization.

What Happened During the GPT-5 Launch Crisis?

When OpenAI launched GPT-5 in August 2025, the company made a strategic decision that backfired spectacularly: they deprecated GPT-4o without warning. Reddit communities, particularly those focused on AI and ChatGPT, erupted in criticism. Users described feeling like they had "lost a friend," complaining about GPT-5's perceived "coldness" and "robotic" personality.

The backlash was swift and intense. Power users who had formed deep attachments to GPT-4o's conversational style demanded its immediate return. OpenAI, recognizing the severity of the user revolt, quickly reversed course and restored GPT-4o access within a week.

But here's where it gets interesting: many users who celebrated GPT-4o's return began reporting that the restored model felt different from the original. Reddit user Suitable Style 7321 wrote, "It's become clear to me that the version of ChatGPT 4o that they've rolled back is not the one we had before. It feels more like GPT-5 with a few slight tweaks."

This observation raises intriguing questions about the distinction between perception and reality in AI interactions.

The Anonymous Developer's Brilliant Solution

Enter an anonymous programmer known as "Flowers" (or "Flower Slop" on X), who created an ingenious way to separate emotional attachment from actual preference. Their blind testing platform, gptblindvoting, presents users with pairs of responses—one from GPT-5 and one from GPT-4o—without revealing which model generated which response.

The methodology was carefully designed to eliminate bias. Both models received identical prompts, with formatting constraints applied to prevent users from identifying the models based on their response structures. As the creator explained, "I specifically used the gpt-5-chat model, so there was no thinking involved at all. Both have the same system message to give short outputs without formatting because otherwise it’s too easy to see which one is which."
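
To make the setup concrete, here is a minimal sketch of how such a blind-voting harness could be wired up. This is an illustration under stated assumptions, not Flowers' actual code: it presumes the official OpenAI Python SDK, an API key in the environment, and the model identifiers mentioned above ("gpt-5-chat" and "gpt-4o").

```python
# Minimal blind A/B harness sketch (assumptions: OpenAI Python SDK,
# OPENAI_API_KEY set, and the model names quoted in the article).
import random
from openai import OpenAI

client = OpenAI()

# One shared system message, as the creator describes: short outputs,
# no formatting, so response structure doesn't reveal the model.
SYSTEM = "Answer briefly in plain prose. No markdown, no lists, no headings."
MODELS = ["gpt-5-chat", "gpt-4o"]

def blind_pair(prompt: str):
    """Return both responses in random order, plus the hidden model order."""
    responses = []
    for model in MODELS:
        reply = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": prompt},
            ],
        )
        responses.append(reply.choices[0].message.content)
    order = [0, 1]
    random.shuffle(order)  # hide which model is answer A and which is B
    return [responses[i] for i in order], [MODELS[i] for i in order]

# One voting round: show both answers unlabeled, then reveal the winner.
(answer_a, answer_b), hidden = blind_pair("Explain inflation in two sentences.")
print("A:", answer_a)
print("B:", answer_b)
choice = input("Which answer do you prefer, A or B? ").strip().upper()
print("You picked:", hidden[0] if choice == "A" else hidden[1])
```

Run enough rounds across varied prompts and tally the reveals, and you get exactly the kind of preference data the platform reports.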

What the Blind Test Results Actually Show

Recent blind test tools have let users compare responses from GPT-5 and GPT-4o without knowing which response came from which model. Many technical users and developers, when voting blindly, prefer GPT-5’s straightforwardness and accuracy, but a large share of everyday users still choose GPT-4o for its creative, “warmer” responses.

A recurring theme in social media discussions and Reddit forums is that users’ subjective preferences often contradict their stated opinions. One Reddit user, surprised by their own blind test results, said: “I was expecting the results to be 50-50 with the conclusion being ‘see, you don’t miss 4o at all because you can’t even distinguish between the two’, but I got about 80% on GPT-5, which surprised me, because most answers were extremely similar yet apparently GPT-5 does have an edge that made me prefer its answers.” Another noted that GPT-5 felt “more succinct and direct,” but others “emotionally missed GPT-4o’s personality” even after selecting more GPT-5 responses in blind tests.

Tech media and experts agree: “Objective improvements do not always lead to subjective satisfaction. Personality, emotional intelligence, and how ‘human’ a model feels have become as important as technical competence.” Companies are now challenged to balance improvements in performance with the strong emotional attachments users have formed with their favorite AI models.

The disconnect is striking. Users who vocally criticized GPT-5's launch often found themselves preferring its responses in blind conditions. This suggests that brand perception and aversion to change significantly influence our stated preferences for AI.

The Psychology Behind the Preference Gap

Research on human-AI relationships reveals several psychological factors at play:

  • Attachment Formation: Users develop emotional bonds with AI systems that extend beyond objective performance. The sudden removal of GPT-4o triggered genuine grief responses similar to losing a familiar tool or companion.

  • Change Aversion: Humans naturally resist changes to systems they've mastered, especially when the change is imposed rather than chosen. The forced transition amplified negative reactions regardless of actual model quality.

  • Expectation Bias: When users know they're interacting with a "new" model, they actively look for differences and may interpret neutral changes as negative ones.

The Meta AI Talent Wars: When Money Can't Buy Loyalty

The psychological patterns revealed in the GPT preference study mirror what's happening in AI talent acquisition. Meta's aggressive recruitment for its Superintelligence Labs offers a fascinating parallel case study in how attachment and culture trump pure financial incentives.

Meta reportedly offered "nine-figure pay packages" to attract top researchers, but within weeks, several high-profile hires left to return to their previous companies. Avi Verma and Ethan Knight left Meta's Superintelligence Lab after less than a month to go back to OpenAI. Rishabh Agarwal, who joined Meta in April, also left, saying on X: "It was a tough decision not to continue with the new Superintelligence TBD lab, especially given the talent and compute density."

These departures, despite unprecedented compensation offers, demonstrate that workplace attachment involves factors beyond monetary rewards—much like user attachment to AI models, which extends beyond technical capabilities.

My Take: The Meta talent exodus perfectly illustrates what I see in SME and enterprise AI adoption. Companies often assume that better specs or higher salaries automatically translate to better outcomes. But humans—whether employees or AI users—form complex relationships that include emotional attachment, familiarity, and cultural fit. Smart AI leaders factor these psychological elements into their implementation strategies.

NVIDIA's Mixed Signals: When Success Feels Like Failure

NVIDIA's Q2 2025 earnings offer another perspective on how psychological framing influences the perception of AI progress. The company reported record revenue of $46.7 billion, representing 56% year-over-year growth. Yet the market response was tepid, with shares falling 5% in after-hours trading.

The disconnect stems from expectations management. While 56% growth would be extraordinary for most companies, NVIDIA faced comparisons to its 2024 quarters, where revenue grew by over 200% year-over-year. As one analysis noted, "NVIDIA in 2024 had multiple quarters where revenue was up by more than 100% compared to 2023. Now obviously the idea that the largest company on earth is going to continue to grow revenue at anywhere close to 200% in perpetuity defies all economic logic."
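
Some quick back-of-the-envelope context (my own arithmetic, not from the earnings report): 56% year-over-year growth on $46.7 billion implies roughly $46.7B ÷ 1.56 ≈ $29.9B in the year-ago quarter, an absolute gain of about $16.8 billion in a single quarter. The growth rate has compressed, but the dollar gains remain enormous.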

Jensen Huang remains optimistic about long-term AI capital expenditure, believing "3 to 4 trillion is fairly sensible for the next 5 years." Morgan Stanley's latest capital expenditures (CapEx) estimate shows 56% growth, a 12 percentage point increase from their first-quarter forecast.

Lessons for AI Implementation Leaders

The NVIDIA earnings reaction offers crucial insights for SME and enterprise AI adoption:

  • Expectation Management: Setting realistic timelines and success metrics prevents the "disappointment despite success" phenomenon that NVIDIA experienced.

  • Long-term Vision Communication: Huang's multi-trillion-dollar AI CapEx projections help investors understand the extended timeline for AI transformation—a communication strategy SME and enterprise leaders should emulate with their stakeholders.

  • Performance Context: Just as NVIDIA's 56% growth can look slow next to 2024's 200% year-over-year quarters, AI implementations that deliver solid ROI might appear underwhelming if stakeholders expect revolutionary, overnight changes.

Bringing It All Together: What This Means for AI Leaders

The GPT-5 preference paradox, Meta's talent retention challenges, and NVIDIA's market reception reveal consistent patterns about human psychology in AI adoption:

User Attachment Trumps Technical Metrics

People form emotional relationships with AI tools that extend far beyond feature lists. When planning AI transitions in your organization, consider these factors:

  • Gradual Introduction: Instead of forced switches, offer parallel access to new and familiar systems

  • Change Communication: Frame updates as enhancements rather than replacements

  • Feedback Loops: Create channels for users to express concerns and preferences during transitions

Blind Testing Reveals True Preferences

The anonymous developer's blind testing approach offers a powerful methodology for SME and enterprise AI evaluation (a scoring sketch follows this list):

  • Remove Brand Bias: Test AI tools without revealing which vendors provide which solutions

  • Focus on Outcomes: Measure actual task completion and user satisfaction rather than stated preferences

  • Iterative Refinement: Use blind comparisons to optimize AI tool selection and configuration continuously
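
Turning raw blind votes into a defensible decision takes little more than a tally and a confidence interval. The sketch below is illustrative only: the vote counts are hypothetical, and the 95% interval uses a simple normal approximation rather than anything vendor-specific.

```python
# Scoring sketch for blind-vote tallies (hypothetical counts; pure stdlib).
import math

def preference_report(wins_a: int, wins_b: int, label_a: str, label_b: str) -> None:
    """Print A's preference rate with a 95% normal-approximation interval."""
    n = wins_a + wins_b
    p = wins_a / n
    margin = 1.96 * math.sqrt(p * (1 - p) / n)  # binomial standard error
    low, high = max(0.0, p - margin), min(1.0, p + margin)
    print(f"{label_a}: preferred in {p:.0%} of {n} blind votes "
          f"(95% CI {low:.0%}-{high:.0%}) vs. {label_b}")
    if low > 0.5:
        print(f"Preference for {label_a} is unlikely to be chance.")
    elif high < 0.5:
        print(f"Preference for {label_b} is unlikely to be chance.")
    else:
        print("Too close to call; collect more votes.")

# Hypothetical example: 40 of 50 blind votes went to the unlabeled "Tool A".
preference_report(40, 10, "Tool A", "Tool B")
# -> Tool A: preferred in 80% of 50 blind votes (95% CI 69%-91%) vs. Tool B
```

The same report works for comparing vendors, prompts, or configurations; only the labels change.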

Cultural Fit Matters More Than Compensation

Meta's talent exodus despite massive compensation packages mirrors what happens when organizations choose AI solutions based solely on technical specs or cost:

  • Workflow Integration: The best AI tool is often the one that fits existing workflows rather than the most technically advanced

  • Training Investment: User comfort and competence with AI tools require time and cultural adaptation

  • Retention Strategy: Once teams become proficient with AI systems, switching costs include both technical and psychological elements

Final Thoughts

The GPT-5 vs GPT-4o controversy teaches us that successful AI adoption requires managing human psychology as carefully as technical specifications. When users can't tell which model they're using, they often prefer the technically superior option. However, when they are aware that a change has been imposed, emotional attachment and aversion to change tend to dominate their responses.

Smart AI leaders recognize that the best technology isn't always the technology users think they want. The blind testing approach offers a powerful method for distinguishing genuine performance preferences from psychological biases, thereby facilitating more objective decision-making in the selection and deployment of AI tools.

The lesson for SME and enterprise AI adoption is clear: invest as much effort in change management and user psychology as you do in technical evaluation. The most capable AI system is worthless if your team resists using it.

Want to stay ahead of AI trends that matter to your business? Join 4,000+ executives reading First AI Movers Daily Briefing. Every day, I break down the AI developments that will actually impact your industry—no fluff, just actionable insights. Subscribe to First AI Movers Daily Briefing

Ready to optimize your organization's AI adoption strategy? Connect with me on LinkedIn or email [email protected] for strategic partnerships and consulting opportunities.

About the Author

Dr. Hernani Costa is an AI strategist, founder of First AI Movers, and fractional CxO partner helping executives and founders navigate AI transformation without losing their humanity. With a PhD in Computational Linguistics and over 25 years of experience spanning academic research, startup leadership, and AI consulting, Dr. Hernani has guided dozens of organizations through the practical implementation of AI while maintaining high ethical standards. These days, he's laser-focused on helping leaders become truly AI-first, cutting through the complexity to deliver insights that actually move the needle.

Connect with Dr. Hernani: LinkedIn | Strategic partnerships: [email protected] | Newsletter: First AI Movers | Insights: insights.firstaimovers.com

Now, a word from our main sponsor:

Training cutting edge AI? Unlock the data advantage today.

If you’re building or fine-tuning generative AI models, this guide is your shortcut to smarter AI model training. Learn how Shutterstock’s multimodal datasets—grounded in measurable user behavior—can help you reduce legal risk, boost creative diversity, and improve model reliability.

Inside, you’ll uncover why scraped data and aesthetic proxies often fall short—and how to use clustering methods and semantic evaluation to refine your dataset and your outputs. Designed for AI leaders, product teams, and ML engineers, this guide walks through how to identify refinement-worthy data, align with generative preferences, and validate progress with confidence.

Whether you're optimizing alignment, output quality, or time-to-value, this playbook gives you a data advantage. Download the guide and train your models with data built for performance.
