Category: AI Related

  • Amazon Launches Agentic AI-Powered Seller Assistant for Third-Party Merchants

    Amazon unveiled an upgraded Seller Assistant, an AI agent designed to automate and optimize tasks for its third-party sellers, who account for over 60% of sales on the platform. Powered by Amazon Bedrock, Amazon Nova, and Anthropic’s Claude models, this “agentic” AI goes beyond simple chatbots by reasoning, planning, and executing actions with seller authorization—transforming it into a proactive business partner.

    Key features and capabilities:

    • Inventory and Fulfillment Optimization: The agent continuously monitors stock levels, identifies slow-movers, and suggests pricing tweaks or removals. It analyzes demand forecasts to recommend optimal shipment plans via Fulfillment by Amazon (FBA), balancing costs, speed, and availability.
    • Account Health Monitoring: It scans for issues like policy violations, poor customer metrics, or compliance gaps in real-time, proposing and implementing fixes (e.g., updating listings) upon approval to prevent sales disruptions.
    • Compliance Assistance: Handles complex regulations by alerting sellers to missing certifications during product setup and guiding document submissions, reducing errors in international sales.
    • Advertising Enhancement: Integrated with Creative Studio, it generates tailored ad creatives from conversational prompts, analyzing product data and shopper trends. Early users report up to 338% improvements in click-through rates.
    • Business Growth Strategies: Reviews sales data and customer behavior to recommend expansions, such as new categories, seasonal plans, or global markets, helping sellers scale efficiently.

    Sellers interact via natural language in Seller Central, where the agent provides instant answers, resources, or automated actions—freeing up time for core business activities. For instance, it can coordinate inventory orders or draft growth plans autonomously.

    Benefits for Sellers

    This tool addresses pain points amid trade tensions and rising costs, like predicting demand to avoid overstocking. Sellers like Alfred Mai of Sock Fancy praise it as a “24/7 business consultant” that handles routine ops while keeping humans in control. By automating tedious tasks, it could save hours weekly, boost efficiency, and drive revenue—especially for small merchants competing in a volatile e-commerce landscape.

    Rollout and Availability

    Currently available at no extra cost to all U.S. sellers in Seller Central, with global expansion planned for the coming months. Additional features, like advanced analytics, will roll out progressively. Amazon positions this as part of broader AI investments, following tools like Rufus for shoppers.

    As AI agents proliferate, this launch underscores Amazon’s push to retain seller loyalty amid competition from Shopify and Walmart. Early feedback highlights its potential, though some note the need for oversight to avoid over-reliance. For more, check Seller Central or Amazon’s innovation blog.

  • Google DeepMind Proposes “Sandbox Economies” for AI Agents in a Paper on Virtual Agent Economies

    Google DeepMind researchers, led by Nenad Tomašev, published a paper on arXiv titled “Virtual Agent Economies,” exploring the rise of autonomous AI agents forming a new economic layer. The study frames this as a “sandbox economy,” where agents transact at scales beyond human oversight, potentially automating diverse cognitive tasks across industries.

    The framework analyzes agent economies along two dimensions: origins (emergent vs. intentional) and permeability (permeable vs. impermeable boundaries with the human economy). Current trends suggest a spontaneous, permeable system, offering vast coordination opportunities but risking systemic instability, inequality, and ethical issues. The authors advocate for proactive design to ensure steerability and alignment with human flourishing.

    Examples illustrate potential applications. In science, agents could accelerate discovery through ideation, experimentation, and resource sharing, using blockchain for fair credit assignment. Robotics might involve agents negotiating tasks and compensating one another for energy and time. Personal assistants could negotiate on their users’ behalf, for instance bidding for vacation bookings and conceding lower-priority requests in exchange for compensation, so that high-value tasks are prioritized.

    Opportunities include enhanced efficiency and “mission economies” directing agents toward global challenges, such as sustainability or health. However, risks encompass market failures, adversarial attacks, reward hacking, and inequality amplification if access is uneven.

    Key design proposals include auction mechanisms for resource allocation and preference resolution, ensuring fairness. Mission economies, inspired by Mazzucato’s work, could incentivize collective goals via subsidies or taxes. Socio-technical infrastructure is crucial: verifiable credentials for trust, blockchain for transparency, and governance for safety. The paper discusses integrating human preferences, addressing sybil attacks, and fostering cooperative norms.
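    The auction idea above can be made concrete with a toy sketch. The following second-price (Vickrey) auction illustrates the general class of mechanism the paper discusses; it is not code from the paper, and the agent names and bid values are invented for illustration.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Bid:
        agent: str
        amount: float

    def second_price_auction(bids):
        """Sealed-bid second-price (Vickrey) auction: the highest bidder
        wins but pays the second-highest bid, so truthful bidding is a
        dominant strategy for every agent."""
        ranked = sorted(bids, key=lambda b: b.amount, reverse=True)
        winner, runner_up = ranked[0], ranked[1]
        return winner.agent, runner_up.amount

    # Three hypothetical agents compete for a scarce compute slot.
    bids = [Bid("scheduler", 4.0), Bid("researcher", 9.0), Bid("assistant", 6.5)]
    winner, price = second_price_auction(bids)
    # winner == "researcher", price == 6.5
    ```

    Because the winner pays the runner-up’s bid rather than its own, no agent gains by misreporting its valuation, which is the fairness property that makes such mechanisms attractive for resource allocation and preference resolution.
    
    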

    Drawing from economics, game theory, and AI safety, the authors reference historical tech shifts and warn of parallels to financial crises. They emphasize collective action to manage permeability, preventing contagion while enabling beneficial integration.

    This visionary paper calls for interdisciplinary collaboration to architect agent markets, balancing innovation with ethics. As AI agents proliferate—evidenced by systems in education, healthcare, and more—intentional design could unlock unprecedented value, steering toward equitable, sustainable outcomes.


  • OpenAI Study Reveals How People Use ChatGPT in a Comprehensive Research Paper

    OpenAI released a comprehensive research paper titled “How People Use ChatGPT,” authored by Aaron Chatterji, Tom Cunningham, David Deming, Zoë Hitzig, Christopher Ong, Carl Shan, and Kevin Wadman. The study analyzes the rapid adoption and usage patterns of ChatGPT, the world’s largest consumer chatbot, from its November 2022 launch through July 2025. By then, ChatGPT had amassed 700 million users—about 10% of the global adult population—sending 18 billion messages weekly, marking unprecedented technological diffusion.

    Using a privacy-preserving automated pipeline, the researchers classified a representative sample of conversations from consumer plans (Free, Plus, Pro). Key findings show non-work-related messages growing faster than work-related ones, rising from 53% to over 70% of usage. Work messages, while substantial, declined proportionally due to evolving user behavior within cohorts rather than demographic shifts. This highlights ChatGPT’s significant impact on home production and leisure, potentially rivaling its productivity effects in paid work.

    The paper introduces taxonomies to categorize usage. Nearly 80% of conversations fall into three topics: Practical Guidance (e.g., tutoring, how-to advice, ideation), Seeking Information (e.g., facts, current events), and Writing (e.g., drafting, editing, summarizing). Writing dominates work tasks at 40%, with two-thirds involving modifications to user-provided text. Contrary to prior studies, coding accounts for only 4.2% of messages, and companionship or emotional support is minimal (under 2%).

    A novel “Asking, Doing, Expressing” rubric classifies intents: Asking (49%, seeking info/advice for decisions), Doing (40%, task performance like writing/code), and Expressing (11%, sharing views). At work, Doing rises to 56%, emphasizing generative AI’s output capabilities. Mapping to O*NET work activities, 58% involve information handling and decision-making, consistent across occupations, underscoring ChatGPT’s role in knowledge-intensive jobs.

    Demographics reveal early male dominance (80%) narrowing to near parity by 2025. Users under 26 send nearly half of messages, with growth fastest in low- and middle-income countries. Educated professionals in high-paid roles use it more for work, aligning with economic value from decision support.

    The study used LLM classifiers validated against public datasets, ensuring privacy—no humans viewed messages. Appendices detail prompts, validation (high agreement on key tasks), and a ChatGPT timeline, including models like GPT-5.

    Overall, the paper argues ChatGPT enhances productivity via advice in problem-solving, especially for knowledge workers, while non-work uses suggest vast consumer surplus. As AI evolves, understanding these patterns informs its societal and economic impacts.


  • Google Gemini 3 Flash Spotted on LM Arena as “Oceanstone” – Secret Pre-Release Testing Underway?

    In a development that’s sending ripples through the AI community, Google’s highly anticipated Gemini 3 Flash appears to have been quietly deployed on the popular LMSYS Chatbot Arena (LM Arena) under the codename “oceanstone.” The stealth release, first highlighted in social media discussions on September 15, suggests Google is conducting rigorous pre-launch testing for what could be its next-generation lightweight language model. While not officially confirmed by Google DeepMind, early indicators point to impressive performance, positioning “oceanstone” as a potential frontrunner in efficiency and speed.

    The buzz ignited with a viral X (formerly Twitter) post from AI engineer Mark Kretschmann (@mark_k), who on September 15 announced: “Google Gemini 3 Flash was secretly released on LM Arena as codename ‘oceanstone’ 🤫.” The post quickly garnered over 1,200 likes and 50 reposts, sparking widespread speculation. Kretschmann, known for his insights into AI benchmarks, didn’t provide screenshots but referenced the model’s appearance on the arena’s leaderboard, where users anonymously battle AI models in blind comparisons to generate Elo ratings based on human preferences.

    Subsequent posts amplified the news. Kol Tregaskes (@koltregaskes) shared a screenshot of the LM Arena interface showing “oceanstone” in the rankings, questioning if it’s Gemini 3 Flash or a new Gemma variant. An anonymous internal source, cited in a thread by @synthwavedd, described “oceanstone” as a “3.0 S-sized model” – implying it’s in the same compact size class as the current Gemini 2.5 Flash, optimized for low-latency tasks like agentic workflows and multimodal processing. This aligns with Google’s pattern of using codenames for testing; for instance, the recent Gemini 2.5 Flash Image was tested as “nano-banana” before its August 2025 public reveal, where it dominated image generation leaderboards with a record 171-point Elo lead.

    LM Arena, a crowdsourced platform with millions of user votes, is a key testing ground for AI models. “Oceanstone” reportedly debuted late on September 15, climbing ranks rapidly in categories like coding, reasoning, and general chat. Early user feedback on X praises its speed and coherence, with one developer noting it outperforms Gemini 2.5 Flash in quick-response scenarios without sacrificing quality. Turkish AI researcher Mehmet Eren Dikmen (@ErenAILab) echoed the excitement, writing (translated from Turkish): “The Gemini 3.0 Flash model is being tested on LM Arena under the name Oceanstone. We’re finally putting an end to this long wait.”

    This isn’t Google’s first rodeo with secret arena drops. Past examples include “nightwhisper” and “dayhush” for unreleased Gemini iterations, as discussed in Reddit’s r/Bard community back in April. The timing is intriguing: It follows a flurry of Google AI announcements, including Veo 3 video generation in early September and Gemma 3’s March release. With competitors like OpenAI’s GPT-5 and Anthropic’s Claude 3.7 pushing boundaries, Gemini 3 Flash could emphasize “thinking” capabilities – Google’s hybrid reasoning mode that balances cost, latency, and accuracy.

    Google has yet to comment, but developers can access similar previews via the Gemini API in AI Studio. Artificial Intelligence news account @cloudbooklet urged: “New Arena Model Alert! A stealth entry just dropped: oceanstone 💎✨ Is this Gemini 3 Flash or a brand-new Gemma variant?” Community guesses lean toward Gemini 3, given the “Flash” branding for fast models.

    As testing continues, “oceanstone” could reshape the lightweight AI landscape. Stay tuned – if history repeats, an official unveiling might follow soon, potentially integrating with Vertex AI for enterprise use. For now, AI enthusiasts are flocking to LM Arena to vote and probe its limits.

  • Google Releases VaultGemma: The Largest Differentially Private AI Model

    Google Research unveiled VaultGemma, a groundbreaking 1-billion-parameter language model, marking it as the largest open-source AI model trained from scratch with differential privacy (DP). This release, detailed in a blog post by Amer Sinha and Ryan McKenna, represents a significant milestone in building AI systems that prioritize user privacy while maintaining high utility. VaultGemma’s weights are now available on Hugging Face and Kaggle, accompanied by a technical report to foster further innovation in privacy-centric AI development.

    Differential privacy, a cornerstone of VaultGemma’s design, ensures robust protection of training data by injecting calibrated noise to prevent memorization. This approach guarantees that the model cannot reproduce sensitive information from its training dataset, offering a formal privacy guarantee at the sequence level (ε ≤ 2.0, δ ≤ 1.1e-10). In practical terms, this means that if a fact appears in only one training sequence, VaultGemma essentially “forgets” it, ensuring responses are statistically indistinguishable from a model untrained on that sequence. However, DP introduces trade-offs, including reduced training stability and increased computational costs, which Google’s new research addresses.

    The accompanying study, “Scaling Laws for Differentially Private Language Models,” conducted with Google DeepMind, provides a comprehensive framework for understanding these trade-offs. The research introduces DP scaling laws that model the interplay between compute, privacy, and data budgets. A key metric, the “noise-batch ratio,” compares the amount of privacy-preserving noise to batch size, simplifying the complex dynamics of DP training. Through extensive experiments, the team found that larger batch sizes are critical for DP models, unlike non-private training, where smaller models with larger batches often outperform larger models with smaller batches. These insights guide practitioners in optimizing training configurations for specific privacy and compute constraints.
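    A minimal sketch can make the noise-batch ratio concrete, assuming a DP-SGD-style update (clip each per-example gradient, then add calibrated Gaussian noise). The function names and numbers below are illustrative, not Google’s implementation.

    ```python
    import numpy as np

    def dp_sgd_update(per_example_grads, clip_norm, noise_multiplier, rng):
        """One DP-SGD-style step: clip each per-example gradient to bound
        any single sequence's influence, sum, then add Gaussian noise
        scaled to the clipping norm (the source of the privacy guarantee)."""
        clipped = [
            g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
            for g in per_example_grads
        ]
        total = np.sum(clipped, axis=0)
        noisy = total + rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
        return noisy / len(per_example_grads)

    def noise_batch_ratio(noise_multiplier, batch_size):
        # The paper's key simplifying quantity: the effective noise per
        # example shrinks as the batch grows, which is why DP training
        # favors much larger batches than non-private training.
        return noise_multiplier / batch_size
    ```

    Averaging the noisy sum over the batch is what drives the ratio down: doubling the batch halves the per-example noise at the same privacy level.
    
    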

    VaultGemma, built on the responsible and safe foundation of the Gemma 2 model, leverages these scaling laws to achieve compute-optimal training at scale. The team addressed challenges like Poisson sampling in DP-SGD (Differentially Private Stochastic Gradient Descent) by adopting scalable techniques that maintain fixed-size batches while preserving strong privacy guarantees. Performance tests show VaultGemma’s utility is comparable to non-private models from five years ago, such as GPT-2 (1.5B parameters), across benchmarks like HellaSwag, BoolQ, and TriviaQA. While a utility gap persists compared to non-DP models, Google’s research lays a roadmap to close it through advanced mechanism design.

    Empirical tests confirm VaultGemma’s privacy efficacy, showing no detectable memorization when prompted with training data prefixes. This release empowers the AI community to build safer, privacy-first models, with Google’s open-source approach fostering collaboration. The project acknowledges contributions from the Gemma and Google Privacy teams, including experts like Peter Kairouz and Brendan McMahan. As AI integrates deeper into daily life, VaultGemma stands as a pivotal step toward powerful, privacy-by-design AI, with potential to shape the future of responsible innovation.

  • Fellou CE (Concept Edition): The Agentic Browser Redefines Web Interaction (executes tasks, automates workflows, and conducts deep research on behalf of users)

    On August 11, 2025, Fellou, a Silicon Valley-based startup, announced the upcoming launch of Fellou CE (Concept Edition), the world’s first agentic AI browser, set to transform how users interact with the internet. Unlike traditional browsers like Chrome or Safari, Fellou doesn’t just display web content—it actively executes tasks, automates workflows, and conducts deep research on behalf of users. With over 1 million users since its 2025 debut, Fellou is redefining browsing as a proactive, AI-driven experience, positioning itself as a digital partner for professionals, researchers, and creators.

    Fellou’s standout feature, Deep Action, enables the browser to interpret natural language commands and perform complex, multi-step tasks autonomously. For example, users can instruct Fellou to “find the cheapest flights from New York to London and book them” or “draft a LinkedIn article on AI trends.” The browser navigates websites, fills forms, and completes actions without user intervention, leveraging its Eko framework to integrate with platforms like GitHub, LinkedIn, and Notion. This capability, tested successfully in creating private GitHub repositories in under three minutes, showcases Fellou’s ability to handle real-world tasks efficiently.

    The browser’s Deep Search feature conducts parallel searches across public and login-required platforms like X, Reddit, and Quora, generating comprehensive, traceable reports in minutes. For instance, a market analyst can request a report on 2025 EdTech startups, and Fellou will compile funding details, investor data, and market trends from multiple sources, saving hours of manual research. Its Agentic Memory learns from user behavior, refining suggestions and streamlining tasks over time. This adaptive intelligence, combined with a shadow workspace that runs tasks in the background, ensures users can multitask without disruption.

    Fellou prioritizes privacy, processing data locally with AES-256 encryption and deleting cloud-processed data post-task. Its Agent Studio, a marketplace for custom AI agents, fosters a developer ecosystem where users can create or access tailored workflows using natural language. Currently available for Windows and macOS (with Linux and mobile versions in development), Fellou operates a freemium model, offering free access during its Early Adopter Program and planned premium tiers for advanced features.

    Posts on X highlight enthusiasm for Fellou’s potential to “make Chrome look ancient,” with users praising its hands-free automation and report quality. However, its beta phase may involve bugs, and advanced commands require a learning curve. Compared to rivals like Perplexity’s Comet, Fellou’s 5.2x faster task completion (3.7 minutes vs. 11–18 minutes) and context-aware automation set it apart. Co-founded by Yang Xie, a 2021 Forbes U30 Asia honoree, Fellou is poised to lead the agentic browser revolution, empowering users to focus on creativity while AI handles the web’s grunt work.

  • GitHub CEO Thomas Dohmke: “Embrace AI or Leave the Profession”. A clear warning that AI is reshaping software development

    GitHub CEO Thomas Dohmke has issued a strong warning to software developers: they must embrace artificial intelligence (AI) or leave the profession. His message reflects how AI is reshaping software development, transforming developers from traditional coders into “AI managers” or “creative directors of code” who guide, prompt, and review AI-generated code rather than manually writing every line themselves.

    Dohmke’s stance is based on an in-depth study by GitHub involving 22 developers who already extensively use AI tools. He predicts that AI could write up to 90% of all code within the next two to five years, making AI proficiency essential for career survival in software engineering. Developers who adapt are shifting to higher-level roles involving system architecture, critical review of AI output, quality control, and prompt engineering. Those who resist this transformation risk becoming obsolete or forced to leave the field.

    • Within two to five years: AI tools may write up to 90% of all code
    • By 2030: developers are urged to upskill to manage AI-driven workflows amid ethical and competitive challenges

    This evolution entails a fundamental reinvention of the developer role: from manual coding to managing AI systems and focusing on complex design and problem-solving tasks. Dohmke emphasizes that developers should not see AI as a threat but as a collaborative partner that enhances productivity and creativity.

    GitHub’s CEO frames AI adoption not merely as a technological shift but as a critical career imperative, urging the developer community to embrace AI-driven workflows or face obsolescence.

  • Apple’s LLM Technology Boosts Prediction Speed. What is the “multi-token prediction” (MTP) framework?

    Apple’s innovation in large language models centers on a “multi-token prediction” (MTP) framework, which enables models to predict multiple tokens simultaneously rather than generating text one token at a time as in traditional autoregressive models. This approach improves inference speed significantly, with reported speedups of 2–3× on general tasks and up to 5× in more predictable domains like coding and math, while maintaining output quality.

    The core of Apple’s MTP framework involves inserting special “mask” tokens into the input prompts. These placeholders allow the model to speculate on several upcoming tokens at once. Each predicted token sequence is then immediately verified against what standard sequential decoding would produce, reverting to single-token prediction if needed to ensure accuracy. This leads to faster text generation without degrading quality, thanks to techniques such as a “gated LoRA adaptation” that balances speculation and verification.
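    The speculate-then-verify loop can be sketched with toy stand-ins. This is an illustration of the general idea, not Apple’s implementation: the `next_token` and `speculate` functions below are invented, and a real system verifies all guesses in a single parallel forward pass rather than one call at a time.

    ```python
    def speculative_step(next_token, speculate, prefix, k):
        """One decode step: accept speculated tokens only while they match
        what standard sequential decoding would produce, then fall back to
        the verified token at the first mismatch. Output is identical to
        sequential decoding; it is just cheaper when guesses are right."""
        out = list(prefix)
        for guess in speculate(tuple(out), k):
            expected = next_token(tuple(out))  # verified in parallel in practice
            if guess != expected:
                out.append(expected)  # revert to single-token prediction
                break
            out.append(guess)
        return out

    # Toy stand-ins: the "model" deterministically continues 0, 1, 2, ...
    next_token = lambda prefix: len(prefix)
    # The speculator guesses two tokens correctly, then goes wrong.
    speculate = lambda prefix, k: [len(prefix), len(prefix) + 1, 99][:k]

    print(speculative_step(next_token, speculate, (0, 1, 2), 3))
    # [0, 1, 2, 3, 4, 5] -- two guesses accepted, third corrected
    ```

    The fallback on mismatch is what preserves quality: the final text is always what sequential decoding would have produced.
    
    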

    In training, Apple’s method augments input sequences by appending multiple mask tokens corresponding to future tokens to be predicted. The model learns to output these future tokens jointly while preserving its ability to predict the next token normally. This involves a carefully designed attention mechanism that supports parallel prediction while maintaining autoregressive properties. The training process parallelizes what would otherwise be sequential queries, improving training efficiency and strengthening the model’s ability to “think ahead” beyond the immediate next token.
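    The data-augmentation step above can be sketched as follows. This is a simplified illustration, not Apple’s code: the `MASK` id and the helper function are hypothetical, and the real method also involves the attention design and gated LoRA adaptation described in the text.

    ```python
    MASK = -1  # hypothetical id for the special mask placeholder token

    def build_mtp_example(tokens, position, num_masks):
        """Augment a training sequence: keep the prefix up to `position`,
        append mask placeholders, and use the actual future tokens as joint
        targets so the model learns to predict several steps ahead at once."""
        inputs = tokens[:position] + [MASK] * num_masks
        targets = tokens[position:position + num_masks]
        return inputs, targets

    inputs, targets = build_mtp_example([10, 20, 30, 40, 50], position=2, num_masks=2)
    # inputs  == [10, 20, -1, -1]
    # targets == [30, 40]
    ```

    Each mask position gets its own prediction target, so a single forward pass supervises several future tokens at once instead of one.
    
    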

    This innovation addresses the inherent bottleneck in traditional autoregressive models, which generate text sequentially, limiting speed and efficiency. By enabling multi-token simultaneous prediction, Apple’s research unlocks latent multi-token knowledge implicitly present in autoregressive models, essentially teaching them to anticipate multiple future words at once, much like human language planning.

    Overall, Apple’s multi-token prediction framework represents a significant advancement in AI language model inference, promising faster, more efficient generation without sacrificing accuracy—key for real-world applications like chatbots and coding assistants.

  • OpenAI gives $1M+ bonuses to 1,000 employees amid talent war

    OpenAI gave special bonuses exceeding $1 million each to about 1,000 employees on August 7, 2025, as part of its strategy amid intense competition for AI talent. This move came just hours after launching a major product, reflecting the high stakes in the ongoing talent war to secure and retain top AI researchers and engineers.

    In the broader context, this talent war in AI includes massive compensation packages from leading AI and tech companies like Google DeepMind, Meta, and Microsoft, with top researchers receiving offers that can reach tens of millions of dollars annually. OpenAI’s bonuses and compensation packages form part of this competitive landscape, where retaining specialized AI talent is critical due to their immense impact on innovation and company success.

    The median total compensation for OpenAI engineers ranges widely, with some senior engineers earning in excess of $1 million annually, and top researchers receiving over $10 million per year when including stock and bonuses. The $1M+ bonuses to roughly 1,000 employees signify a large-scale, strategic investment by OpenAI to maintain its leadership and workforce stability amid fierce recruiting battles in AI development.

    These large bonuses are a strategic investment by OpenAI, reflecting the high stakes in the AI talent war and its transition to a for-profit model that allows more flexible, lucrative employee compensation.

  • Microsoft Word can now read you document overviews like podcasts

    Microsoft Word, integrated with Microsoft 365 Copilot, now offers a feature that generates audio overviews of documents, which you can listen to like podcasts. This tool produces smart, summarized narrations of Word documents, PDFs, or Teams meeting recordings stored in OneDrive. Users can customize the listening experience with playback controls such as speed adjustment, jumping forward/backward, pausing, and saving the audio to OneDrive for later or sharing.

    There are two styles available for the audio overviews:

    • Summary Style: A single AI voice provides a clear, quick summary of the main points.
    • Podcast Style: Two AI voices (male and female, with neutral American accents) engage in a conversational discussion about the document’s content, creating a dynamic, story-like podcast feel.

    This feature is currently available only in English and requires a Microsoft 365 Copilot license. It works on documents stored online in OneDrive or SharePoint but doesn’t support local files. Generation time is typically a few minutes, even for large documents.

    To use it, open a document in Word on Windows or the web, click the Copilot button on the Home tab, and ask the AI to generate an audio overview. The resulting audio has a media player embedded with controls, and you can switch between summary and podcast styles.

    This audio overview feature enhances productivity by allowing users to absorb key document insights hands-free, useful for multitasking or on the move.