AI Transforms Work: Smarter Integrations, Human-like Voices

OpenAI's workplace connectors, ElevenLabs' expressive TTS model, and eco-friendly scheduling tools redefine productivity

Good morning and welcome to First AI Movers Pro. Today, we have two big updates shaking up the AI landscape: OpenAI is supercharging ChatGPT with deep workplace integrations, and ElevenLabs is revolutionizing text-to-speech with voices so lifelike they’ll make robotic monotones a thing of the past. Let’s get into it.

OpenAI: ChatGPT Gets Connected to Your Work

OpenAI has announced a major expansion of ChatGPT’s capabilities for business and enterprise users, effectively turning it into a workplace operating system for daily tasks. The new update allows ChatGPT to plug directly into a host of internal tools and data sources in real time, all while respecting existing user permissions. It can now search and synthesize information from your files, emails, and company apps to provide richer, more context-aware answers (with sources cited). Alongside this, OpenAI rolled out a native meeting recording feature in the ChatGPT Mac app that transcribes discussions and generates structured notes with action items. Together, these features signal OpenAI’s ambition to make ChatGPT a one-stop interface for work.

ChatGPT’s new connectors interface shows multiple data sources (email, cloud drives, calendars, etc.) that the AI can search for information. By indexing internal documents and tools, ChatGPT can answer context-rich questions (e.g., “What are our Q1 sales?”) using your own data, with citations and without breaking access permissions.

Here are the key upgrades OpenAI introduced:

  • Deep Research Connectors: Paid ChatGPT users (Plus, Pro) outside Europe and all Team/Enterprise/Edu customers can now connect ChatGPT to popular workplace apps and databases. This includes tools like Outlook email and calendars, Microsoft Teams, Google Drive, Gmail, HubSpot, Linear, and more, letting ChatGPT pull in knowledge from your emails, documents, tickets, and other internal sources in real time. For example, you can ask, “Find last week’s roadmap in Box,” and get an inline result with a link to the file. (Note: Certain connectors aren’t yet available in the EU/UK due to regulatory constraints.)

  • Enterprise Integrations: Additional cloud storage connectors – including SharePoint, OneDrive, Dropbox, and Box – are live specifically for Team, Enterprise, and Edu plans. ChatGPT can index files from these services and answer questions like “What was our Q1 revenue last year?” by fetching data from your company spreadsheets. All responses respect the user’s existing file access permissions and come with references to the source documents, addressing businesses’ privacy concerns.

  • Custom Connectors (MCP): Workspace admins can now build custom connectors to proprietary or third-party apps using the new Model Context Protocol (MCP), co-developed with Anthropic. Currently in beta, MCP lets developers integrate any tool or database into ChatGPT’s “deep research” mode. This means companies can hook ChatGPT into internal systems (CRM, wiki, etc.) by writing an MCP-compliant connector. OpenAI noted that Pro, Team, and Enterprise users can leverage MCP starting now, which could quickly expand the list of compatible platforms as others adopt the protocol. In short, if it has an API, you can likely make ChatGPT talk to it (a minimal connector sketch follows this list).

  • “Record Mode” for Meetings: Taking aim at AI note-takers, OpenAI launched a recording feature in the ChatGPT desktop app for Mac. Team, Enterprise, and Edu users can hit a Record button during meetings or brainstorms, and ChatGPT will automatically capture the audio, transcribe it, and produce organized notes and summaries. The notes come complete with timestamped citations and suggested follow-up actions. Uniquely, ChatGPT’s recorder doesn’t join the call as a participant (unlike Zoom’s or Teams’ bots) – it just listens locally through your device’s mic and turns what’s said into a structured report. For now, this is rolling out on macOS, but broader support is expected soon.
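To make the MCP piece concrete, here is a minimal sketch of what a custom connector could look like, built with the official MCP Python SDK. The "internal-wiki" name, the search/fetch tool pair, and the in-memory document store are illustrative assumptions rather than OpenAI's required schema; a real connector would sit in front of your CRM, wiki, or ticket system and enforce the same access permissions as the source.

```python
# Minimal sketch of an MCP server exposing an internal knowledge base.
# Assumes the official `mcp` Python SDK (`pip install mcp`); tool names
# and the fake document store below are illustrative, not a spec.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-wiki")

# Stand-in for a real document store (CRM, wiki, ticketing system, ...).
DOCS = {
    "q1-sales": "Q1 revenue came in at $4.2M, up 12% quarter over quarter.",
    "roadmap": "2025 roadmap: connectors beta in June, general availability in Q4.",
}

@mcp.tool()
def search(query: str) -> list[str]:
    """Return IDs of documents whose text matches the query."""
    q = query.lower()
    return [doc_id for doc_id, text in DOCS.items() if q in text.lower()]

@mcp.tool()
def fetch(doc_id: str) -> str:
    """Return the full text of a document so ChatGPT can cite it."""
    return DOCS.get(doc_id, "Document not found.")

if __name__ == "__main__":
    # Serve over stdio for local testing; a production connector would run
    # behind HTTPS and mirror the permissions of the underlying system.
    mcp.run(transport="stdio")
```

Once a server like this is hosted, a workspace admin would register it as a custom connector so ChatGPT’s deep research mode can call the tools it exposes.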

My Take: This feels like OpenAI’s biggest step yet toward making ChatGPT a true all-in-one workplace assistant. While meeting transcription itself is old hat (quality transcripts are a commodity now), the real battleground will be UX and workflow integration – basically, how seamlessly ChatGPT weaves into your daily tools and processes. As VC Olivia Moore noted, “quality transcription has been a commodity… it will come down to UI choices”. And on that front, ChatGPT has an enormous head start with an estimated 500 million-plus weekly active users fueling its distribution. However, this move also raises red flags for every startup building single-feature AI apps. When the platform (OpenAI) bundles your feature natively, you risk getting Sherlocked. The race to own the AI workflow interface has officially begun, and OpenAI just made a power play that others, from Microsoft to countless startups, will have to respond to.

ElevenLabs: Eleven v3 Drops the Mic with Expressive Voice AI

Meanwhile, ElevenLabs has launched Eleven v3 (alpha), its latest text-to-speech model – and it’s a game-changer for voice AI. Billed as their “most expressive” TTS model ever, Eleven v3 can generate speech in over 70 languages with stunning realism. It introduces new features like multi-speaker dialogue mode and inline audio style tags that let you direct tone and emotion mid-sentence. The result? AI voices that can whisper, laugh, sigh, interrupt each other, and carry on a conversation that sounds eerily human. We’re witnessing the death of the robotic monotone – this model pushes synthetic voices much closer to genuine human speech in both nuance and dynamism.

Here are the highlights from ElevenLabs’ v3 update:

  • Multi-lingual Mastery: One voice, 70+ languages. The new model supports an impressive range of languages – more than 70 – all while preserving the speaker’s vocal characteristics. You can switch a synthesized voice from English to Spanish to Mandarin on the fly, without retraining, and it will speak each with natural fluency and accent. This opens up truly global use cases, from multilingual audiobooks and games to more accessible content across regions. The voices also show deeper text understanding, handling things like stress and cadence better, so the delivery feels authentic in each language.

  • Multi-Speaker Dialogue: Eleven v3 can simulate real conversations between multiple AI voices, all in one go. In “dialogue mode,” a single prompt can generate a back-and-forth exchange between different speakers, complete with natural pacing, interruptions, and emotional shifts as they react to each other. The model maintains contextual awareness between the voices, so dialogue flows logically and with shared understanding of the scenario. This is a big leap from the typical one-voice-at-a-time limitation of earlier TTS systems. Now, AI characters in games or stories can truly talk to each other with proper timing and tone, as if a human director orchestrated the scene.

  • Granular Audio Control: Content creators get film-director level control over how the AI delivers lines. Eleven v3 introduces inline audio tags – cues you insert into the text, like [excited], [whispers], [laughs], and [sighs] – that modulate the voice’s mood and speaking style on the fly. For example, you can script: “[whispers] There’s something behind the door... [shouts] Run!” and the voice will actually whisper the first part and shout the next. These tags cover not just emotions but also non-verbal sounds and delivery quirks, giving audio producers precise control over tone and even things like pauses or breaths. It’s like directing an actor – you can literally write in stage directions for the AI narrator (see the sketch after this list).

  • Emotional Intelligence in Speech: Perhaps most impressively, v3’s voices demonstrate a new level of emotional awareness and reactivity. They can seamlessly handle interruptions, dynamic mood changes, and call-and-response interactions without sounding jarring. Under the hood, the model was redesigned for expressiveness – the voices don’t just read text plainly; they perform it. They’ll sigh, chuckle, or gasp as appropriate, making the speech feel “genuinely responsive and alive”. In ElevenLabs’ own demo, the AI voices were practically theatrical, conveying excitement, sorrow, tension, and humor with human-like nuance. This kind of emotional range has been very hard for TTS to get right until now.
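For a sense of how those audio tags are used in practice, here is a minimal sketch that sends tagged text to ElevenLabs’ REST text-to-speech endpoint and saves the returned audio. The endpoint shape follows ElevenLabs’ public API, but the voice ID, the model ID string, and the output filename are placeholders to verify against their documentation.

```python
# Minimal sketch: send tagged text to ElevenLabs' text-to-speech endpoint.
# API key, voice ID, and the model ID string are placeholders (assumptions),
# not values from this article.
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"   # from your ElevenLabs dashboard
VOICE_ID = "YOUR_VOICE_ID"            # any voice in your voice library
TEXT = "[whispers] There's something behind the door... [shouts] Run!"

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={
        "text": TEXT,
        "model_id": "eleven_v3",  # assumed v3 identifier; confirm in the docs
    },
)
resp.raise_for_status()

with open("scene.mp3", "wb") as f:
    f.write(resp.content)  # the API returns raw audio bytes
```

In dialogue mode you would script multiple speakers’ lines in a similar tagged style, though the exact request format is one to confirm in ElevenLabs’ v3 prompting guide.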

My Take: I was blown away by the demo of Eleven v3. This model’s ability to infuse personality and emotion into speech is unlike anything I’ve heard before. Imagine when we integrate this into virtual assistants or humanoid robots – it will anthropomorphize AI even further, since our machines will literally speak with human-like expression and feeling. (In fact, v3 voices can sigh and laugh, sounding “genuinely alive,” as the creators said.) One recommendation: check out ElevenLabs’ prompting guide for v3, because getting the most out of these audio tags and dialogue features will take some practice. With the right prompts, the voices can be jaw-dropping. The gap between human and artificial communication is narrowing at an extraordinary pace, and ElevenLabs just pushed it even closer.

That’s a wrap for today.
Until tomorrow,
Dr Hernani Costa @ First AI Movers Pro

