First AI Movers
Rise of the Everyday Agent—Software That Actually Gets Things Done

Browsers, retailers, cloud giants, and researchers give AI agents real jobs across nine specialties and seven languages

Dr. Hernani Costa
June 14, 2025


Good morning, First AI Movers,

Happy Saturday! The theme today is agentic AI, i.e., systems that don't just answer but act on our behalf. In the past week, we saw browsers, retailers, cloud giants, and researchers give software agents real jobs. Let's unpack what matters.

Lead Story: The Rise of the Everyday Agent

Opera Neon: a browser built around agents.

Opera's new Neon browser ships with an "AI Shelf" that hosts mini‑agents for tasks like summarizing pages, rewriting email drafts, and tracking prices in‑tab. Each agent can call the browser's DOM API, so it can click links, fill in forms, and even schedule deliveries without the user copy‑pasting anything. Opera says third‑party developers will get an SDK next quarter: think Chrome extensions, but powered by LLM‑driven action graphs.
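Opera has not published the Neon SDK, so what follows is only a minimal sketch of the general idea behind an LLM‑driven action graph, in its simplest form: a linear chain of steps. The model emits a plan of named actions, and a small dispatcher executes each one against the page. The `FakePage` class, the `click`/`fill` action names, and the hard‑coded plan are all hypothetical stand‑ins for a real DOM and a real model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser DOM: form fields plus a log of clicks."""
    fields: dict[str, str] = field(default_factory=dict)
    clicked: list[str] = field(default_factory=list)


def click(page: FakePage, selector: str) -> None:
    page.clicked.append(selector)


def fill(page: FakePage, selector: str, value: str) -> None:
    page.fields[selector] = value


# Registry of the only actions the agent may invoke; anything else is rejected.
ACTIONS: dict[str, Callable[..., None]] = {"click": click, "fill": fill}


def run_plan(page: FakePage, plan: list[tuple[str, dict[str, str]]]) -> None:
    """Execute each (action, arguments) step in order, as an LLM planner might emit them."""
    for name, args in plan:
        if name not in ACTIONS:
            raise ValueError(f"unknown action: {name}")
        ACTIONS[name](page, **args)


# In a real agent this plan would come from the model; here it is hard-coded.
plan = [
    ("fill", {"selector": "#address", "value": "1 Example Street"}),
    ("click", {"selector": "#schedule-delivery"}),
]
page = FakePage()
run_plan(page, plan)
print(page.fields, page.clicked)
```

Routing every step through a fixed registry keeps the model from invoking arbitrary code: it can only choose among actions the developer has exposed.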

Amazon's stealth agent & robotics group.

Internal memos leaked this week reveal Amazon has spun up an R&D unit codenamed Orion to merge Alexa's LLM stack with its warehouse‑robot fleet. The goal: autonomous "pick‑plan‑pack" agents that talk to humans, query inventory, and dispatch Sparrow robots on the fly. Orion inherits staff from Zoox (self‑driving) and Lab126 (Echo devices), hinting Amazon wants one agentic brain spanning home, cloud, and logistics.

Biomni: a multitask biomedical agent.

Stanford researchers open‑sourced Biomni, an agent fine‑tuned on PubMed, protein databases, and lab protocols. It reads PDFs, parses CSV assay results, drafts grant sections, and suggests follow‑up experiments—then logs everything in ELN software. In benchmarks across nine life‑science workflows, Biomni matched or beat human PhDs on task accuracy (average F1 ≈ 0.91) while finishing jobs 7× faster.
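For readers less familiar with the metric, F1 is the harmonic mean of precision and recall. The source does not say how Biomni's average was computed, so this sketch assumes a plain mean of per‑workflow scores (macro‑averaging). The workflow names and counts are invented purely to show the arithmetic; they are not Biomni's data.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall: 2PR / (P + R)."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-workflow counts (true positives, false positives, false negatives).
workflows = {
    "literature_review": (90, 5, 10),
    "assay_parsing": (88, 8, 6),
    "protocol_design": (80, 10, 12),
}

per_task = {name: f1_score(*counts) for name, counts in workflows.items()}
# Macro average: every workflow counts equally, regardless of its size.
macro_f1 = sum(per_task.values()) / len(per_task)
print({name: round(score, 3) for name, score in per_task.items()}, round(macro_f1, 3))
```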

Why this matters

Interface shift. Agent‑first browsers (Neon) and agentic platforms (Amazon's Orion) move AI from "chat on the side" into the work surface itself.

Vertical expertise. Biomni shows domain‑specific agents can outperform generic LLMs by wiring in structured data and tool APIs.

Productivity upside. Google engineers already write 30% of new code with AI, up from 25% last October. At Microsoft, CEO Satya Nadella estimates that 20–30% of the code in the company's repositories is AI‑written, and Meta targets 50% within a year. As agents mature, that curve should steepen further.

Taken together, these moves show agentic AI leaving the lab and embedding itself in everyday workflows, from lab benches to browsers to fulfillment centers.

Quick Takes!

Retail runs on agents!

Walmart is piloting LLM agents for merchant onboarding and real‑time customer support, while Amazon tests autonomous task chains for warehouse slotting and returns. Early Walmart prototypes cut SKU‑listing time by 60%.

Clinical multi‑agent pipeline shines!

A University of Pittsburgh study chained "reader," "evidence‑retriever," and "coder" agents on 6k real EMR notes, flagging cognitive‑impairment cases with F1 ≈ 0.90—on par with neurology residents.
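The study's actual prompts and models are not described here, so this is only a sketch of how such a chain composes: each stage is a function whose output feeds the next. The cue phrases and the rule‑based stages are hypothetical placeholders for what would be LLM calls in a real pipeline.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    note_id: str
    evidence: list[str]
    flagged: bool


# Hypothetical cue phrases; a real "evidence-retriever" agent would use an LLM, not keywords.
CUES = ["memory loss", "cognitive impairment", "dementia", "mmse"]


def reader(note: str) -> list[str]:
    """Stage 1: split a raw note into non-empty sentences."""
    return [s.strip() for s in re.split(r"[.!?]\s*", note) if s.strip()]


def evidence_retriever(sentences: list[str]) -> list[str]:
    """Stage 2: keep only sentences that mention a cognitive cue."""
    return [s for s in sentences if any(cue in s.lower() for cue in CUES)]


def coder(note_id: str, evidence: list[str]) -> Finding:
    """Stage 3: assign a label (here: flag the note if any evidence was retrieved)."""
    return Finding(note_id, evidence, flagged=bool(evidence))


def pipeline(note_id: str, note: str) -> Finding:
    # Chain the three agents: reader -> evidence-retriever -> coder.
    return coder(note_id, evidence_retriever(reader(note)))


result = pipeline("n1", "Pt reports memory loss over 6 months. BP stable. MMSE 22/30.")
print(result)
```

Keeping the retrieved evidence alongside the label lets a clinician audit why a note was flagged, which matters more in a clinical setting than the label alone.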

Security red flag!

Researchers found a new prompt‑injection flaw in Microsoft Copilot that let malicious web content rewrite the agent's browser actions. Microsoft has since patched the issue, but auditors warn: "Any agent that can browse or click is a potential attack surface—treat it like RPA with admin rights."
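One common mitigation in the spirit of that warning is to treat every action an agent proposes as untrusted and check it against a policy before executing it. The sketch below assumes a hypothetical action format (a dict with `type` and `url` keys) and example domain lists; it is not Microsoft's fix, and allowlisting reduces prompt‑injection risk rather than eliminating it.

```python
from urllib.parse import urlparse

# Hypothetical policy: the only action types and domains this agent may touch.
ALLOWED_ACTIONS = {"navigate", "click", "read"}
ALLOWED_DOMAINS = {"intranet.example.com", "docs.example.com"}
# Actions that always need a human in the loop, whatever the page content says.
NEEDS_CONFIRMATION = {"submit_payment", "send_email"}


def check_action(action: dict) -> str:
    """Return 'allow', 'confirm', or 'deny' for an action proposed by the agent."""
    kind = action.get("type", "")
    if kind in NEEDS_CONFIRMATION:
        return "confirm"
    if kind not in ALLOWED_ACTIONS:
        return "deny"
    host = urlparse(action.get("url", "")).hostname or ""
    if host not in ALLOWED_DOMAINS:
        return "deny"
    return "allow"


# The second action mimics an injected instruction redirecting the agent to an attacker's site.
proposed = [
    {"type": "navigate", "url": "https://docs.example.com/policy"},
    {"type": "navigate", "url": "https://evil.example.net/exfil?data=secrets"},
    {"type": "send_email", "url": ""},
]
for action in proposed:
    print(action["type"], action["url"], "->", check_action(action))
```

The key design choice is that the policy lives outside the model: no text on a web page can talk the agent past a rule it never gets to evaluate.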

Fun Fact

Google famously rented a herd of goats to mow the lawn at its Mountain View HQ back in 2009. The eco‑friendly groundskeepers—guided by a border collie named Jen—chewed weeds for a week and fertilized the grass for free. Proof that even Big Tech sometimes prefers analog agents! 🐐

Tool Highlight — Gemini Gems: Pocket‑Size Agents for Daily Tasks

How & Why: In the Gemini interface, hit "Create Gem." Give it a name (e.g., "Markdown‑Proofreader") and some seed instructions. The Gem persists those rules, so every chat follows your template, which is great for repetitive copy edits, brand‑tone checks, or quick SQL sanity tests.

When to use:

You run the same prompt daily ("summarize daily sales CSV").
Teammates need a branded voice guide.
You want a lightweight agent without spinning up your own API host.

Limitations: Custom knowledge size is capped (~20k characters today), and Gems can't yet call external APIs—so heavy‑duty actions still need Project Mariner or third‑party tools.
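Conceptually, a Gem behaves like a saved instruction block that is prepended to every conversation. The sketch below models that idea in plain Python; the `Gem` class, its `build_prompt` method, and the 20,000‑character limit (taken from the approximate figure above) are illustrative assumptions, not Google's implementation or API.

```python
from dataclasses import dataclass

# Approximate cap quoted above; the real limit may differ or change over time.
MAX_KNOWLEDGE_CHARS = 20_000


@dataclass(frozen=True)
class Gem:
    name: str
    instructions: str
    knowledge: str = ""

    def __post_init__(self) -> None:
        # Reject custom knowledge that exceeds the assumed size cap.
        if len(self.knowledge) > MAX_KNOWLEDGE_CHARS:
            raise ValueError(
                f"{self.name}: knowledge is {len(self.knowledge)} chars, "
                f"limit is {MAX_KNOWLEDGE_CHARS}"
            )

    def build_prompt(self, user_message: str) -> str:
        """Prepend the persisted rules so every chat follows the same template."""
        parts = [f"[Instructions]\n{self.instructions}"]
        if self.knowledge:
            parts.append(f"[Knowledge]\n{self.knowledge}")
        parts.append(f"[User]\n{user_message}")
        return "\n\n".join(parts)


proofreader = Gem(
    name="Markdown-Proofreader",
    instructions="Fix typos and Markdown syntax. Do not change the meaning.",
)
print(proofreader.build_prompt("Plese reveiw this *draft."))
```

Because the instructions are stored once and reused, every teammate who opens the Gem gets identical rules, which is exactly what makes it useful for brand‑voice guides.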

Wrap‑Up & CTA

Agentic AI is no longer theory—browsers, warehouses, and biotech labs are using it today. Which workflow in your stack is ripe for its first autonomous helper?

Dive deeper: I break down Google's agent roadmap and build tips in my LinkedIn Newsletter piece—read it here for insider strategies on leveraging Google's latest AI toolkit for your startup.

Share your thoughts or agent war‑stories—just hit reply.

Stay curious & keep your GPUs cool,

— The AI Sailor ⚓️

