• Google DeepMind just launched Genie 3, a model that can generate detailed, interactive 3D environments from a simple text prompt or image

    Google DeepMind has just launched Genie 3, an advanced AI “world model” that can generate detailed, interactive 3D environments from a simple text prompt or image. Unlike its predecessor Genie 2, Genie 3 allows real-time exploration and modification of these worlds: users can dynamically change objects or the weather, or add characters, via what DeepMind calls “promptable world events.” The environments maintain visual consistency over time, remembering the placement of objects for up to about a minute, and run at 720p resolution and 24 frames per second.

    Genie 3 is positioned as a significant step toward artificial general intelligence (AGI) by providing complex, realistic interactive worlds that can train AI agents. This model does not rely on hard-coded physics but learns how the world works by remembering and reasoning about what it generates. It supports longer interactions than Genie 2—several minutes versus just 10-20 seconds—and enables AI agents and humans to move around and interact in these simulated worlds in real time.

    Google DeepMind is currently releasing Genie 3 as a limited research preview to select academics and creators to study its risks and safety before wider access. It is not yet publicly available for general use. It is a breakthrough world model that creates immersive, interactive 3D environments useful both for gaming-type experiences and advancing AI research toward human-level intelligence.

    Genie 3’s ability to generate and modify worlds dynamically rests on several key innovations over previous models:

    1. Frame-by-frame Real-time Generation at 24 FPS and 720p resolution: Genie 3 generates the environment live and continuously, allowing seamless, game-like interaction that feels immediate and natural.
    2. Persistent World Memory: The model retains a “long-term visual memory” of the environment for several minutes, enabling the world to keep consistent state and the effects of user actions (e.g., painted walls stay painted even after moving away and returning) without re-generating from scratch.
    3. Promptable World Events: Genie 3 supports dynamic insertion and alteration of elements in the generated world during real-time interaction via text prompts—for example, adding characters, changing weather, or introducing new objects on the fly. This is a major advancement over earlier systems that required pre-generated or less flexible environments.
    4. More Sophisticated Physical and Ecological Modeling: The system models environments with realistic physical behaviors like water flow, lighting changes, and ecological dynamics, allowing more natural interactions and consistent environment evolution.
    5. Real-time Response to User Actions: Unlike Genie 2, which processed user inputs with lag and limited real-time interaction, Genie 3 swiftly integrates user controls and environmental modifications frame by frame, resulting in highly responsive navigation and modification capabilities.
    6. Underlying Architecture Improvements: While details are proprietary, Genie 3 leverages advances from over a decade of DeepMind’s research in simulated environments and world models, emphasizing multi-layered memory systems and inference mechanisms to maintain coherence and enable prompt-grounded modification of the simulation in real time.

    Together, these technologies allow Genie 3 to generate, sustain, and modify richly detailed simulated worlds interactively, making it suitable for both immersive gaming experiences and as a robust platform for training advanced AI agents in complex, dynamic scenarios.
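
    Genie 3 is proprietary and has no public API, so the sketch below is purely conceptual: it only illustrates the interaction loop described above (frame-by-frame generation conditioned on user actions, a rolling visual memory of roughly a minute, and text-prompted world events injected mid-session). Every class and method name here is hypothetical.

    ```python
    # Conceptual sketch only; Genie 3's real architecture and API are not public.
    from collections import deque
    from dataclasses import dataclass, field

    FPS = 24
    MEMORY_SECONDS = 60  # the article cites roughly a minute of visual memory

    @dataclass
    class WorldState:
        frames: deque = field(default_factory=lambda: deque(maxlen=FPS * MEMORY_SECONDS))

    class PromptableWorldModel:
        """Hypothetical stand-in for a Genie-3-style world model."""

        def generate_frame(self, state: WorldState, action: str, events: list[str]) -> str:
            # A real model would render a 720p frame conditioned on recent frames,
            # the user's control input, and any pending text-prompted events.
            frame = f"frame#{len(state.frames)} action={action} events={events}"
            state.frames.append(frame)  # rolling memory keeps the world consistent
            return frame

    def run_session(model: PromptableWorldModel, actions: list[str]) -> None:
        state, pending = WorldState(), []
        for step, action in enumerate(actions):
            if step == 2:
                pending.append("change weather to rain")  # a "promptable world event"
            model.generate_frame(state, action, pending)
            pending = []

    run_session(PromptableWorldModel(), ["walk forward", "turn left", "look up", "walk forward"])
    ```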

  • Apple’s “Answers, Knowledge and Information” (AKI) team is developing a stripped-down ChatGPT experience

    Apple has formed a new internal team called “Answers, Knowledge and Information” (AKI) that is developing a stripped-down ChatGPT-like AI experience. This team is building an “answer engine” designed to crawl the web and respond to general-knowledge questions, effectively creating a lightweight competitor to ChatGPT. The goal is to integrate this AI-powered search capability into Apple’s products such as Siri, Spotlight, and Safari, and also potentially as a standalone app.

    This marks a shift from Apple’s previous approach, where Siri relied on third-party AI like ChatGPT via partnerships, resulting in a somewhat fragmented user experience. The new Apple-built system aims to deliver more direct, accurate answers rather than defaulting frequently to Google Search results, improving usability especially on devices without screens like HomePod. The team, led by senior director Robby Walker, is actively hiring engineers experienced in search algorithms and engine development to accelerate this project.

    Apple CEO Tim Cook has emphasized the importance of AI, considering it a major technological revolution comparable to the internet and smartphones, and is backing the investment in this AI initiative accordingly. While the project is still in early stages, it represents Apple’s growing commitment to developing its own conversational AI and search capabilities rather than relying heavily on external partnerships.

    Apple’s “Answers” team is creating a streamlined ChatGPT rival focused on delivering web-based, AI-driven answers within Apple’s ecosystem, intending to enhance Siri and other services with conversational AI search.

  • Character.AI Launches World’s First AI-Native Social Feed

    Character.AI has launched the world’s first AI-native social feed, a dynamic and interactive platform integrated into its mobile app. Unlike traditional social media, this feed centers on AI-generated characters, scenes, streams, and user-created content that can be engaged with, remixed, and expanded by users. The feed offers chat snippets, AI-generated videos, character profiles, and live debates between characters, creating a collaborative storytelling playground where the boundary between creator and consumer disappears.

    Users can interact by continuing storylines, rewriting narratives, inserting themselves into adventures, or remixing existing content with a single tap. The platform includes multimodal tools like Chat Snippets (for sharing conversation excerpts), Character Cards, Streams (live debates or vlogs), Avatar FX (video creation from images and scripts), and AI-generated backgrounds to enrich storytelling.

    According to Character.AI’s CEO Karandeep Anand, this social feed marks a significant shift from passive content consumption to active creation, effectively replacing “doomscrolling” with creative engagement. It transforms Character.AI from a one-on-one chat app into a full-fledged AI-powered social entertainment platform, enabling users to co-create and explore countless narrative possibilities.

    This innovation allows for a new kind of social media experience that blends AI-driven storytelling with user participation, fostering a unique ecosystem of interactive content creation among Character.AI’s 20 million users and beyond.

  • Google DeepMind and Kaggle launch AI chess tournament to evaluate models’ reasoning skills

    Google and Kaggle are hosting an AI chess tournament from August 5-7, 2025, to evaluate the reasoning skills of top AI models, including OpenAI’s o3 and o4-mini, Google’s Gemini 2.5 Pro and Flash, Anthropic’s Claude Opus 4, and xAI’s Grok 4.

    Organized with Google DeepMind, Chess.com, and chess streamers Levy Rozman and Hikaru Nakamura, the event will be livestreamed on Kaggle.com, featuring a single-elimination bracket with best-of-four matches. The Kaggle Game Arena aims to benchmark AI models’ strategic thinking in games like chess, Go, and Werewolf, testing skills like reasoning, memory, and adaptation.

    Models will use text-based inputs without external tools, facing a 60-minute move limit and penalties for illegal moves. A comprehensive leaderboard will rank models based on additional non-livestreamed games, with future tournaments planned to include more complex games and simulations.
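
    Kaggle has not published the exact match harness here, but a minimal sketch of what text-only play with illegal-move penalties could look like is below, using the python-chess library; ask_model is a hypothetical placeholder for a text-in/text-out call to a competing model.

    ```python
    # Hypothetical harness sketch; the real Kaggle Game Arena setup may differ.
    import chess

    def ask_model(fen: str) -> str:
        """Placeholder: send the position as text to the model whose turn it is,
        expect a UCI move (e.g. 'e2e4') back. No tools, no engine assistance."""
        raise NotImplementedError

    def play_game(max_illegal: int = 3) -> str:
        board = chess.Board()
        illegal = 0
        while not board.is_game_over():
            reply = ask_model(board.fen())               # text in, text out
            try:
                move = chess.Move.from_uci(reply.strip())
                if move not in board.legal_moves:
                    raise ValueError(reply)
            except ValueError:
                illegal += 1                             # penalise illegal moves
                if illegal > max_illegal:
                    return "forfeit: too many illegal moves"
                continue
            board.push(move)
        return board.result()                            # e.g. '1-0', '0-1', '1/2-1/2'
    ```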

    This matters because it represents a fundamental shift in AI evaluation from static tests to dynamic competition, providing transparent insights into how leading AI models reason and strategize. The platform could reshape how we measure and understand artificial intelligence capabilities.

    You can follow the tournament on Kaggle.com.

  • NVIDIA dropped a paper arguing that Small Language Models (SLMs) are the real future of agentic AI

    Forget everything you thought you knew about AI agents running on massive LLMs. A bombshell new paper from NVIDIA Research, “Small Language Models are the Future of Agentic AI,” is flipping the script on how we think about deploying intelligent agents at scale.

    You don’t need GPT-5 to run most AI agents. You need a fleet of tiny, fast, specialized SLMs. Let’s unpack what this means, why it matters, and how it could reshape the entire AI economy.

    The Big Idea in One Sentence: Small Language Models (SLMs) aren’t just good enough for AI agents — they’re better. And economically, operationally, and environmentally, they’re the inevitable future. While everyone’s chasing bigger, flashier LLMs, NVIDIA is arguing that for agentic workflows — where AI systems perform repetitive, narrow, tool-driven tasks — smaller is smarter.

    What’s an “Agentic AI” Again?

    AI agents aren’t just chatbots. They’re goal-driven systems that plan, use tools (like APIs or code), make decisions, and execute multi-step workflows — think coding assistants, customer service bots, or automated data analysts. Right now, almost all of these agents run on centralized LLM APIs (like GPT-4, Claude, or Llama 3). But here’s the catch: Most agent tasks are not open-ended conversations. They’re structured, predictable, and highly specialized — like parsing a form, generating JSON for an API call, or writing a unit test.

    So the question is: why use a 70B-parameter brain when a 7B one can do the job faster, cheaper, and locally?

    Why SLMs Win for Agents (The NVIDIA Case)

    1. They’re Already Capable Enough: SLMs today are not weak — they’re focused. Modern SLMs punch way above their weight:

    • Phi-3 (7B) performs on par with 70B-class models in code and reasoning.
    • NVIDIA’s Nemotron-H (9B) matches 30B LLMs in instruction following — at 1/10th the FLOPs.
    • DeepSeek-R1-Distill-7B beats GPT-4o and Claude 3.5 on reasoning benchmarks.
    • xLAM-2-8B leads in tool-calling accuracy — critical for agents.

    2. They’re 10–30x Cheaper & Faster to Run: Running a 7B model vs. a 70B model means:

    • Lower latency (real-time responses)
    • Less energy & compute
    • No need for multi-GPU clusters
    • On-device inference (yes, your laptop or phone)

    With tools like NVIDIA Dynamo and ChatRTX, you can run SLMs locally, offline, with strong data privacy — a game-changer for enterprise and edge use.
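
    A quick back-of-the-envelope calculation helps ground the “10–30x” claim. Using the common rule of thumb that a dense decoder model spends roughly 2 × (parameter count) FLOPs per generated token (an approximation, not a figure from the paper), compute alone already gives a ~10x gap before memory, latency, and serving overheads are counted:

    ```python
    # Rough rule of thumb: ~2 * N FLOPs per generated token for an N-parameter
    # dense decoder model (an approximation, not a number from the NVIDIA paper).
    def flops_per_token(params: float) -> float:
        return 2 * params

    ratio = flops_per_token(70e9) / flops_per_token(7e9)
    print(f"70B vs 7B: roughly {ratio:.0f}x more compute per generated token")
    ```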

    3. They’re More Flexible & Easier to Fine-Tune: Want to tweak your agent to follow a new API spec or output format? With SLMs:

    • You can fine-tune in hours, not weeks.
    • Use LoRA/QLoRA for low-cost adaptation (see the sketch below).
    • Build specialized experts for each task (e.g., one SLM for JSON, one for code, one for summaries).

    This is the “Lego approach” to AI: modular, composable, and scalable — not monolithic.
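
    As a rough illustration of how cheap that adaptation can be, here is a minimal LoRA fine-tuning sketch using Hugging Face transformers, peft, and datasets. The model choice, adapter settings, and the toy training example are placeholders, not a recipe from the NVIDIA paper; in practice the training data would be task-specific traces mined from your agent’s own logs.

    ```python
    # Minimal LoRA sketch; model, hyperparameters, and data are illustrative only.
    from datasets import Dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    base = "HuggingFaceTB/SmolLM2-1.7B-Instruct"   # any small open model works
    tok = AutoTokenizer.from_pretrained(base)
    tok.pad_token = tok.pad_token or tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(base)

    # Attach low-rank adapters to the attention projections; only these train,
    # which is why adaptation takes hours on a single GPU rather than weeks.
    model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                             target_modules=["q_proj", "v_proj"],
                                             task_type="CAUSAL_LM"))
    model.print_trainable_parameters()

    # Toy stand-in for real traces, e.g. "request -> strict JSON tool call".
    examples = ['User: list open tickets\nAssistant: {"tool": "tickets.list", "status": "open"}']
    data = Dataset.from_dict(dict(tok(examples, truncation=True)))

    Trainer(
        model=model,
        args=TrainingArguments(output_dir="slm-json-expert",
                               per_device_train_batch_size=1, num_train_epochs=1),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    ).train()
    ```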

    But Aren’t LLMs Smarter? The Great Debate: Agents don’t need generalists — they need specialists.

    • Agents already break down complex tasks into small steps.
    • The LLM is often heavily prompted and constrained — basically forced to act like a narrow tool.
    • So why not just train an SLM to do that one thing perfectly?

    And when you do need broad reasoning? Use a heterogeneous agent system (sketched below):

    • Default to SLMs for routine tasks.
    • Call an LLM only when needed (e.g., for creative planning or open-domain Q&A).

    This hybrid model is cheaper, faster, and more sustainable.
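
    A heterogeneous setup can be as simple as a router that tries a cheap local specialist first and escalates to a hosted LLM only when the task is open-ended or the specialist’s output fails validation. The sketch below assumes hypothetical call_slm / call_llm helpers; how you validate (JSON schema, unit tests, confidence scores) depends on the task.

    ```python
    # SLM-first, LLM-fallback routing sketch; call_slm and call_llm are
    # placeholders for your own local and hosted endpoints.
    ROUTINE_TASKS = {"extract_json", "generate_sql", "summarize_email", "write_unit_test"}

    def call_slm(task: str, prompt: str) -> tuple[str, bool]:
        """Placeholder: run the fine-tuned specialist for this task and return
        (answer, passed_validation), e.g. a JSON-schema check on the output."""
        raise NotImplementedError

    def call_llm(prompt: str) -> str:
        """Placeholder: call a large generalist model for open-ended requests."""
        raise NotImplementedError

    def route(task: str, prompt: str) -> str:
        if task in ROUTINE_TASKS:
            answer, ok = call_slm(task, prompt)   # cheap, fast, local
            if ok:
                return answer
        return call_llm(prompt)                   # escalate only when needed
    ```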

    So Why Aren’t We Using SLMs Already?

    1. Massive investment in LLM infrastructure — $57B poured into cloud AI in 2024 alone.
    2. Benchmarks favor generalist LLMs — we’re measuring the wrong things.
    3. Marketing hype — SLMs don’t get the headlines, even when they outperform.

    But these are inertia problems, not technical ones. And they’re solvable.

    How to Migrate from LLMs to SLMs: The 6-Step Algorithm

    NVIDIA even gives us a practical roadmap:

    1. Log all agent LLM calls (inputs, outputs, tool usage).
    2. Clean & anonymize the data (remove PII, sensitive info).
    3. Cluster requests to find common patterns (e.g., “generate SQL”, “summarize email”); a clustering sketch appears below.
    4. Pick the right SLM for each task (Phi-3, SmolLM2, Nemotron, etc.).
    5. Fine-tune each SLM on its specialized dataset (use LoRA for speed).
    6. Deploy & iterate — keep improving with new data.

    This creates a continuous optimization loop — your agent gets smarter and cheaper over time.
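
    Step 3 is plain unsupervised clustering over the logged prompts. A minimal sketch with sentence-transformers and scikit-learn is below; the embedding model, the cluster count, and the sample prompts are arbitrary illustrations, not choices from the paper.

    ```python
    # Clustering logged agent requests to surface candidate SLM specialties.
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    logged_prompts = [
        "Generate SQL for monthly revenue by region",
        "Summarize this email thread in two sentences",
        "Write a unit test for parse_invoice()",
        # ...thousands more, pulled from the cleaned, anonymized logs (steps 1-2)
    ]

    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(logged_prompts)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embeddings)

    for label, prompt in zip(labels, logged_prompts):
        print(label, prompt)   # each cluster becomes a candidate specialist (step 4)
    ```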

    Real-World Impact: Up to 70% of LLM Calls Could Be Replaced

    In case studies on popular open-source agents:

    • MetaGPT (software dev agent): 60% of LLM calls replaceable
    • Open Operator (workflow automation): 40%
    • Cradle (GUI control agent): 70%

    That’s huge cost savings — and a massive reduction in AI’s carbon footprint.

    The Bigger Picture: Sustainable, Democratized AI

    This isn’t just about cost. It’s about:

    • Democratization: Smaller teams can train and deploy their own agent models.
    • Privacy: Run agents on-device, no data sent to the cloud.
    • Sustainability: Less compute = less energy = greener AI.

    Final Thoughts: The LLM Era is Ending. The SLM Agent Era is Just Beginning.

    We’ve spent years scaling up — bigger models, more parameters, more GPUs. Now, it’s time to scale out: modular, efficient, specialized SLMs working together in intelligent agent systems. NVIDIA isn’t just making a technical argument — they’re calling for a paradigm shift. And if they’re right, the future of AI won’t be in the cloud. It’ll be on your device, running silently in the background, doing its job — fast, cheap, and smart.

  • Amazon always Bee listening! Amazon acquires AI wearable startup Bee to boost personal assistant technology

    Amazon has agreed to acquire Bee, a San Francisco-based startup that developed a $50 AI-powered wearable wristband. The device continuously listens to the wearer’s conversations and surroundings, transcribing audio to provide personalized summaries, to-do lists, reminders, and suggestions through an associated app. Bee’s technology can integrate user data such as contacts, email, calendar, photos, and location to build a searchable log of daily interactions, enhancing its AI-driven insights. The acquisition, announced in July 2025 but not yet finalized, will see Bee’s team join Amazon to integrate this wearable AI technology with Amazon’s broader AI efforts, including personal assistant functionalities.

    The AI wristband uses built-in microphones and AI models to automatically transcribe conversations unless manually muted. While the device’s accuracy can sometimes be affected by ambient sounds or media, Amazon emphasized its commitment to user privacy and control, intending to apply its established privacy standards to Bee’s technology. Bee claims it does not store raw audio recordings and uses high security standards, with ongoing tests of on-device AI models to enhance privacy.

    This acquisition complements Amazon’s previous ventures into wearable tech, such as the discontinued Halo health band and its Echo smart glasses with Alexa integration. Bee represents a cost-accessible entry into AI wearables with continuous ambient intelligence, enabling Amazon to expand in this competitive market segment, which includes other companies like OpenAI and Meta developing AI assistants and wearables.

    The financial terms of the deal have not been disclosed. Bee was founded in 2022, raised $7 million in funding, and is led by CEO Maria de Lourdes Zollo. Bee’s vision is to create personal AI that evolves with users to enrich their lives. Amazon plans to work with Bee’s team for future innovation in AI wearables post-acquisition.

  • Google MLE-STAR, A state-of-the-art machine learning engineering agent

    MLE-STAR is a state-of-the-art machine learning engineering agent developed by Google Cloud that automates various ML tasks across diverse data types, achieving top performance in competitions like Kaggle. Unlike previous ML engineering agents that rely heavily on pre-trained language model knowledge and tend to make broad code modifications at once, MLE-STAR uniquely integrates web search to retrieve up-to-date, effective models and then uses targeted code block refinement to iteratively improve specific components of the ML pipeline. It performs ablation studies to identify the most impactful code parts and refines them with careful exploration.
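
    Google has open-sourced the agent, but the loop below is only a conceptual sketch of the ablation-then-refine cycle described above, not the actual implementation; every function here is a hypothetical placeholder.

    ```python
    # Conceptual sketch of an ablation-driven refinement loop; not MLE-STAR's code.
    def evaluate(solution: dict, ablate: str = "") -> float:
        """Placeholder: train and score the pipeline on validation data,
        optionally with one component (e.g. 'ensembling') disabled."""
        raise NotImplementedError

    def propose_refinement(name: str, code_block: str) -> str:
        """Placeholder: ask an LLM to rewrite just this code block,
        informed by web-retrieved, state-of-the-art approaches."""
        raise NotImplementedError

    def refine(solution: dict, rounds: int = 5) -> dict:
        for _ in range(rounds):
            # Ablation study: the component whose removal hurts the score most
            # is the most impactful one, so target that block next.
            scores = {name: evaluate(solution, ablate=name) for name in solution}
            target = min(scores, key=scores.get)
            candidate = {**solution, target: propose_refinement(target, solution[target])}
            if evaluate(candidate) > evaluate(solution):   # keep only real improvements
                solution = candidate
        return solution
    ```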

    Key advantages of MLE-STAR include:

    • Use of web search to find recent and competitive models (such as EfficientNet and ViT), avoiding outdated or overused choices.
    • Component-wise focused improvement rather than wholesale code changes, enabling deeper exploration of feature engineering, model selection, and ensembling.
    • A novel ensembling method that combines multiple solutions into a superior single ensemble rather than simple majority voting.
    • Built-in data leakage and data usage checkers that detect unrealistic data processing strategies or neglected data sources, refining the generated code accordingly.
    • The framework won medals in 63% of MLE-Bench-Lite Kaggle competitions with 36% being gold medals.

    MLE-STAR lowers the barrier to ML adoption by automating complex workflows and continuously improving through web-based retrieval of state-of-the-art methods, ensuring adaptability as ML advances. Its open-source code is available for researchers and developers to accelerate machine learning projects.

    This innovation marks a shift toward more intelligent, web-augmented ML engineering agents that can deeply and iteratively refine models for better results.

  • Interview with Anthropic CEO Dario Amodei: AI’s Potential, OpenAI Rivalry, GenAI Business, Doomerism

    Dario Amodei, CEO of Anthropic, discusses a range of topics concerning artificial intelligence, his company’s strategy, and his personal motivations. He emphasizes that he gets “very angry when people call me a doomer” because he understands the profound benefits of AI, motivated in part by his father’s death from an illness that was later cured, highlighting the urgency of scientific progress. He believes Anthropic has a “duty to warn the world about what’s going to happen” regarding AI’s possible downsides, even while strongly appreciating its positive applications, which he articulated in his essay “Machines of Loving Grace”.

    Amodei’s sense of urgency stems from his belief in the exponential improvement of AI capabilities, which he refers to as “the exponential”. He notes that AI models are rapidly progressing from “barely coherent” to “smart high school student,” then to “smart college student” and “PhD” levels, and are beginning to “apply across the economy”. He sees this exponential growth continuing, despite claims of “diminishing returns from scaling”. He views terms like “AGI” and “super-intelligence” as “totally meaningless” marketing terms that he avoids using.

    Anthropic’s business strategy is a “pure bet on this technology”, specifically focusing on “business use cases of the model” through its API, rather than consumer-facing chatbots or integration into existing tech products like Google or OpenAI. He argues that focusing on business use cases provides “better incentives to make the models better” by aligning improvements with tangible value for enterprises like Pfizer. Coding, for example, became a key use case due to its rapid adoption and its utility in developing subsequent models.

    Financially, Anthropic has demonstrated rapid growth, going from zero to $100 million in revenue in 2023, $100 million to $1 billion in 2024, and $1 billion to “well above four” or $4.5 billion in the first half of 2025, calling it the “fastest growing software company in history” at its scale. Amodei clarifies that while the company may appear unprofitable due to significant investments in training future, more powerful models, each deployed model is actually “fairly profitable”. He also addresses concerns about large language model limitations such as the lack of “continual learning,” stating that while models don’t change underlying weights, their “context windows are getting longer,” allowing them to absorb information during interaction, and new techniques are being developed to address this.

    Regarding competition, Anthropic has raised nearly $20 billion and is confident its “data center scaling is not substantially smaller than that of any of the other companies”. Amodei emphasizes “talent density” as their core competitive advantage, noting that many Anthropic employees turn down offers from larger tech companies due to their belief in Anthropic’s mission and its fair, systematic compensation principles. He expresses skepticism about competitors trying to “buy something that cannot be bought,” referring to mission alignment.

    Amodei dismisses the notion that open source AI models pose a significant threat, calling it a “red herring”. He explains that unlike traditional open source software, AI models are “open weights” (not source code), making them hard to inspect and requiring significant inference resources, so the critical factor is a model’s quality, not its openness.

    On a personal level, Amodei’s upbringing in San Francisco instilled an interest in fundamental science, particularly physics and math, rather than the tech boom. His father’s illness and death in 2006 profoundly impacted him, driving him first to biology to address human illnesses, and then to AI, which he saw as the only technology capable of “bridg[ing] that gap” to understand and solve complex biological problems “beyond human scale”. This foundational motivation translates into a “singular obsession with having impact,” focusing on creating “positive sum situations” and bending his career arc towards helping people strategically.

    He left OpenAI, where he was involved in scaling GPT-3, because he realized that the “alignment of AI systems and the capability of AI systems is intertwined”, but that organizational-level decisions, sincere leadership motivations, and company governance were crucial for positive impact, leading him to found Anthropic to “do it our own way”. He vehemently denies claims that he “wants to control the entire industry,” calling it an “outrageous lie”. Instead, he advocates for a “race to the top”, where Anthropic sets an example for the field by publicly releasing responsible scaling policies, interpretability research, and safety measures, encouraging others to follow, thereby ensuring that “everyone wins” by building safer systems.

    Amodei acknowledges the “terrifying situation” where massive capital is accelerating AI development. He continues to speak up about AI’s dangers despite criticism and personal risk to the company, believing that control is feasible as “we’ve gotten better at controlling models with every model that we release”. His warning about risks is not to slow down progress but to “invest in safety techniques and can continue the progress”. He criticizes both “doomers” who claim AI cannot be built safely and “financially invested” parties who dismiss safety concerns or regulation, calling both positions “intellectually and morally unserious”. He believes what is needed is “more thoughtfulness, more honesty, more people willing to go against their interest” to understand the situation and add “light and some insight”.

    Source: https://www.youtube.com/watch?v=mYDSSRS-B5U

  • OpenAI is making a major investment in Norway with its first AI data center in Europe

    OpenAI is making a major investment in Norway with its first AI data center in Europe, called Stargate Norway. This project is a collaboration with British AI infrastructure company Nscale and Norwegian energy firm Aker ASA, forming a 50/50 joint venture. The initial phase will involve about a $1 billion investment to build a facility near Narvik in northern Norway, powered entirely by renewable hydropower.

    The data center will initially have a capacity of 230 MW and install 100,000 Nvidia GPUs by the end of 2026, with ambitions to expand its capacity by an additional 290 MW in future phases, potentially scaling tenfold as demand grows. OpenAI will be a primary customer (“off-taker”) of the compute capacity under its “OpenAI for Countries” program, which aims to increase AI infrastructure sovereignty and accessibility across Europe.

    The project emphasizes sustainability, leveraging Norway’s cool climate, low electricity prices, and abundant renewable energy for efficient and large-scale AI computing. It will provide secure, scalable, and sovereign AI infrastructure for customers across Norway, Northern Europe, and the UK, benefiting startups, researchers, and public/private sectors.

    OpenAI’s Norway investment is a landmark $1 billion+ AI infrastructure project to build a state-of-the-art, renewable-powered data center addressing Europe’s AI compute needs and advancing local AI ecosystem development.

  • Google NotebookLM latest updates

    Google NotebookLM, the AI-powered research and note-taking assistant, has received several significant updates in 2025 centered around enhanced ways to visualize, navigate, and interact with research content:

    • Video Overviews: As of mid-2025, Google rolled out Video Overviews, which generate narrated slide presentations that transform dense documents (notes, PDFs, images) into clear visual summaries. These overviews pull in images, diagrams, quotes, and data from your sources to explain concepts more intuitively. Users can customize focus topics, learning goals, and target audience for more tailored explanations. This offers a visual alternative to the existing Audio Overviews that provide podcast-style summaries. Video Overviews are currently available in English with more languages to come.
    • Interactive Mind Maps: A new Mind Map feature allows users to explore connections between complex topics within their notebooks, helping deepen understanding by visualizing relationships in uploaded materials. For example, this can map related concepts around a research subject like environmental issues.
    • Language and Output Flexibility: Users can now select the output language for AI-generated text, making it easier to generate study guides, briefing documents, and chat responses in various languages.
    • Studio Panel Upgrades: The redesigned Studio panel lets users create and store multiple outputs of the same type (Audio Overviews, Video Overviews, Mind Maps, Reports) within a notebook. It supports multitasking features such as listening to an Audio Overview while exploring a Mind Map simultaneously.
    • Improved User Experience and Multilingual Support: Audio Overviews now support multiple lengths and over 50 languages. Dark mode, conversation style switching, and easier sharing of notebooks have also been introduced.

    These features are broadly available for Google Workspace customers across various tiers, including Business, Education, and Nonprofits, with phased rollout continuing through 2025.