Category: AI Related

  • OpenAI is releasing GPT-5, its new flagship model, to all of its ChatGPT users and developers

    GPT-5 has officially been released as of early August 2025 and is now available to all ChatGPT users, including free, Plus, Pro, and Team accounts. OpenAI announced GPT-5 on August 7, 2025, marking it as their most advanced AI model to date, with significant improvements in intelligence, speed, reasoning, and safety over GPT-4 and earlier versions.

    Here are the key details about GPT-5:

    • Expert-level intelligence across domains such as math, coding, science, and health.
    • Reduced hallucinations and improved truthful, honest answers.
    • A reasoning model (“GPT-5 thinking”) for harder problems, automatically invoked or selectable by users.
    • Real-time model routing for efficiency and quality of responses.
    • Enhanced capabilities for creative writing and complex software generation.
    • Integrated safety mechanisms including safe completions to balance helpfulness with risk.
    • Accessibility to all ChatGPT users, including free tier, Plus, Pro, and Teams, with extended capabilities for paid users.
    • Availability in developer tools like GitHub Copilot and Azure AI.

    GPT-5 essentially replaces previous ChatGPT models and represents a significant upgrade in real-world use, combining speed, accuracy, and safety for a wide range of users and applications.
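
    For developers, access follows the usual OpenAI API pattern. Below is a minimal sketch using the official openai Python SDK; the model identifier "gpt-5" and the example prompt are assumptions, so check OpenAI's documentation for the exact names and parameters.

    ```python
    # Minimal sketch: calling GPT-5 through the OpenAI Python SDK.
    # Assumptions: the model is exposed under the identifier "gpt-5" and
    # OPENAI_API_KEY is set in the environment; verify both against OpenAI's docs.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-5",  # assumed model name
        messages=[
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Summarize the key improvements in GPT-5."},
        ],
    )

    print(response.choices[0].message.content)
    ```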

  • Are Google’s AI Features a Threat to Web Traffic? Critics claim they undermine SEO strategies and threaten online journalism

    Google Search chief Liz Reid defended Google’s AI features in a blog post by stating that total organic click volume from Google Search to websites remains “relatively stable” year-over-year, and claimed that AI is driving more searches and higher quality clicks to websites. She argued that AI Overviews are a natural evolution of previous Google search features and emphasized the increase in overall search queries and stable or slightly improved click quality, defined as clicks where users do not quickly return to search results.

    However, this defense contrasts with multiple independent studies and reports that show significant reductions in website traffic due to Google’s AI Overviews. Research by the Pew Research Center and others indicates that AI summaries reduce click-through rates by nearly half in some cases, decreasing external site visits from about 15% to around 8% on searches with AI Overviews. Studies from SEO firms such as Ahrefs, Amsive, and BrightEdge find click rate declines ranging from roughly 30% to over 50% depending on the query type, especially for informational, non-branded keywords. The rise of “zero-click” searches—where users get answers directly from AI summaries without visiting any site—has been noted as a major factor, with estimates that around 60% of Google searches now fall into this category. This trend has caused concern among publishers and SEO experts who report significant traffic drops and threats to online content monetization.

    Google disputes the methodologies and conclusions of these external studies, arguing that their internal data shows overall stable or slightly improved click volumes and that some sites are indeed losing traffic while others gain. However, Google has not publicly released detailed data to substantiate all of these claims, leading to ongoing debate about the true impact of AI features on web traffic.

    While Liz Reid asserts that total organic clicks remain stable year-over-year despite AI integration, independent research and publisher reports overwhelmingly show that Google’s AI features—particularly AI Overviews—have caused significant reductions in website traffic and click-through rates, especially for informational and non-branded queries.

  • OpenAI GPT OSS, the new open-weight model family designed for efficient on-device use and local inference

    OpenAI has released an open-weight model called gpt-oss-20b, a medium-sized model with about 21 billion parameters designed for efficient on-device use and local inference. It operates with a Mixture-of-Experts (MoE) architecture, having 32 experts but activating 4 per token, resulting in 3.6 billion active parameters during each forward pass. This design grants strong reasoning and tool-use capabilities with relatively low memory requirements — it can run on systems with as little as 16GB of RAM. The model supports up to 128k tokens of context length, enabling it to handle very long inputs.

    “gpt-oss-20b” achieves performance comparable to OpenAI’s o3-mini model across common benchmarks, including reasoning, coding, and function calling tasks. It leverages modern architectural features such as Pre-LayerNorm for training stability, Gated SwiGLU activations, and Grouped Query Attention for faster inference. This model is intended to provide strong real-world performance while being accessible for consumer hardware deployments. Both gpt-oss-20b and the larger gpt-oss-120b (117B parameters) models are released under the Apache 2.0 license, aiming to foster transparency, accessibility, and efficient usage by developers and researchers.

    In summary:

    • Parameters: ~21 billion total, 3.6 billion active per token
    • Experts: 32 total, 4 active per token (Mixture-of-Experts)
    • Context length: 128k tokens
    • Runs with as little as 16GB memory
    • Performance matches o3-mini benchmarks, strong at coding, reasoning, few-shot function calling
    • Released open-weight under Apache 2.0 license for broad developer access

    This model is a step toward more accessible, powerful reasoning AI that can run efficiently on local or edge devices.
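
    As a rough illustration of what local inference can look like in practice, here is a minimal sketch using Hugging Face Transformers. The repository id "openai/gpt-oss-20b", the memory behavior, and the generation settings are assumptions; confirm them against the official model card.

    ```python
    # Minimal sketch: running gpt-oss-20b locally with Hugging Face Transformers.
    # Assumption: the weights are published under "openai/gpt-oss-20b"; check the
    # official model card for the exact id, memory requirements, and settings.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "openai/gpt-oss-20b"  # assumed repository id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user", "content": "Explain mixture-of-experts in two sentences."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=128)
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
    ```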

  • Google DeepMind just launched Genie 3, which can generate detailed, interactive 3D environments from a simple text prompt or image

    Google DeepMind has just launched Genie 3, an advanced AI “world model” that can generate detailed, interactive 3D environments from a simple text prompt or image. Unlike its predecessor Genie 2, Genie 3 allows real-time exploration and modification of these worlds. Users can dynamically change objects and weather or add characters—so-called “promptable world events.” The environments maintain visual consistency over time, remembering the placement of objects for up to about a minute, and run at 720p resolution and 24 frames per second.

    Genie 3 is positioned as a significant step toward artificial general intelligence (AGI) by providing complex, realistic interactive worlds that can train AI agents. This model does not rely on hard-coded physics but learns how the world works by remembering and reasoning about what it generates. It supports longer interactions than Genie 2—several minutes versus just 10-20 seconds—and enables AI agents and humans to move around and interact in these simulated worlds in real time.

    Google DeepMind is currently releasing Genie 3 as a limited research preview to select academics and creators to study its risks and safety before wider access. It is not yet publicly available for general use. It is a breakthrough world model that creates immersive, interactive 3D environments useful both for gaming-type experiences and advancing AI research toward human-level intelligence.

    Genie 3’s key technical differences that enable it to modify worlds dynamically on the fly include several innovations over previous models:

    1. Frame-by-frame Real-time Generation at 24 FPS and 720p resolution: Genie 3 generates the environment live and continuously, allowing seamless, game-like interaction that feels immediate and natural.
    2. Persistent World Memory: The model retains a “long-term visual memory” of the environment for several minutes, enabling the world to keep consistent state and the effects of user actions (e.g., painted walls stay painted even after moving away and returning) without re-generating from scratch.
    3. Promptable World Events: Genie 3 supports dynamic insertion and alteration of elements in the generated world during real-time interaction via text prompts—for example, adding characters, changing weather, or introducing new objects on the fly. This is a major advancement over earlier systems that required pre-generated or less flexible environments.
    4. More Sophisticated Physical and Ecological Modeling: The system models environments with realistic physical behaviors like water flow, lighting changes, and ecological dynamics, allowing more natural interactions and consistent environment evolution.
    5. Real-time Response to User Actions: Unlike Genie 2, which processed user inputs with lag and limited real-time interaction, Genie 3 swiftly integrates user controls and environmental modifications frame by frame, resulting in highly responsive navigation and modification capabilities.
    6. Underlying Architecture Improvements: While details are proprietary, Genie 3 leverages advances from over a decade of DeepMind’s research in simulated environments and world models, emphasizing multi-layered memory systems and inference mechanisms to maintain coherence and enable prompt-grounded modification of the simulation in real time.

    Together, these technologies allow Genie 3 to generate, sustain, and modify richly detailed simulated worlds interactively, making it suitable for both immersive gaming experiences and as a robust platform for training advanced AI agents in complex, dynamic scenarios.

  • Apple’s “Answers, Knowledge and Information” (AKI) team is developing a stripped down ChatGPT experience

    Apple has formed a new internal team called “Answers, Knowledge and Information” (AKI) that is developing a stripped-down ChatGPT-like AI experience. This team is building an “answer engine” designed to crawl the web and respond to general-knowledge questions, effectively creating a lightweight competitor to ChatGPT. The goal is to integrate this AI-powered search capability into Apple’s products such as Siri, Spotlight, and Safari, and also potentially as a standalone app.

    This marks a shift from Apple’s previous approach, where Siri relied on third-party AI like ChatGPT via partnerships, resulting in a somewhat fragmented user experience. The new Apple-built system aims to deliver more direct, accurate answers rather than defaulting frequently to Google Search results, improving usability especially on devices without screens like HomePod. The team, led by senior director Robby Walker, is actively hiring engineers experienced in search algorithms and engine development to accelerate this project.

    Apple CEO Tim Cook has emphasized the importance of AI, considering it a major technological revolution comparable to the internet and smartphones, and is backing the investment in this AI initiative accordingly. While the project is still in early stages, it represents Apple’s growing commitment to developing its own conversational AI and search capabilities rather than relying heavily on external partnerships.

    Apple’s “Answers” team is creating a streamlined ChatGPT rival focused on delivering web-based, AI-driven answers within Apple’s ecosystem, intending to enhance Siri and other services with conversational AI search.

  • Character.AI Launches World’s First AI-Native Social Feed

    Character.AI has launched the world’s first AI-native social feed, a dynamic and interactive platform integrated into its mobile app. Unlike traditional social media, this feed centers on AI-generated characters, scenes, streams, and user-created content that can be engaged with, remixed, and expanded by users. The feed offers chat snippets, AI-generated videos, character profiles, and live debates between characters, creating a collaborative storytelling playground where the boundary between creator and consumer disappears.

    Users can interact by continuing storylines, rewriting narratives, inserting themselves into adventures, or remixing existing content with a single tap. The platform includes multimodal tools like Chat Snippets (for sharing conversation excerpts), Character Cards, Streams (live debates or vlogs), Avatar FX (video creation from images and scripts), and AI-generated backgrounds to enrich storytelling.

    According to Character.AI’s CEO Karandeep Anand, this social feed marks a significant shift from passive content consumption to active creation, effectively replacing “doomscrolling” with creative engagement. It transforms Character.AI from a one-on-one chat app into a full-fledged AI-powered social entertainment platform, enabling users to co-create and explore countless narrative possibilities.

    This innovation allows for a new kind of social media experience that blends AI-driven storytelling with user participation, fostering a unique ecosystem of interactive content creation among Character.AI’s 20 million users and beyond.

  • Google DeepMind and Kaggle launch AI chess tournament to evaluate models’ reasoning skills

    Google and Kaggle are hosting an AI chess tournament from August 5-7, 2025, to evaluate the reasoning skills of top AI models, including OpenAI’s o3 and o4-mini, Google’s Gemini 2.5 Pro and Flash, Anthropic’s Claude Opus 4, and xAI’s Grok 4.

    Organized with Google DeepMind, Chess.com, and chess streamers Levy Rozman and Hikaru Nakamura, the event will be livestreamed on Kaggle.com, featuring a single-elimination bracket with best-of-four matches. The Kaggle Game Arena aims to benchmark AI models’ strategic thinking in games like chess, Go, and Werewolf, testing skills like reasoning, memory, and adaptation.

    Models will use text-based inputs without external tools, facing a 60-minute move limit and penalties for illegal moves. A comprehensive leaderboard will rank models based on additional non-livestreamed games, with future tournaments planned to include more complex games and simulations.
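
    To make the rule about illegal-move penalties concrete, the sketch below shows how a harness might validate a model's text move against the current position with the python-chess library. The penalty handling is an assumption for illustration, not Kaggle's actual scoring logic.

    ```python
    # Minimal sketch: validating a model's text move with python-chess.
    # The penalty handling is illustrative only, not the Kaggle Game Arena rules.
    import chess

    def apply_text_move(board: chess.Board, move_text: str) -> bool:
        """Try to play a move given in SAN; return True if legal, False otherwise."""
        try:
            board.push_san(move_text)  # raises ValueError for illegal or malformed moves
            return True
        except ValueError:
            return False

    board = chess.Board()
    for candidate in ["e4", "Ke2"]:  # "Ke2" is illegal in the resulting position
        ok = apply_text_move(board, candidate)
        print(f"{candidate}: {'played' if ok else 'illegal -> penalty'}")
    ```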

    This matters because it represents a fundamental shift in AI evaluation from static tests to dynamic competition, providing transparent insights into how leading AI models reason and strategize. The platform could reshape how we measure and understand artificial intelligence capabilities.

    You can follow the tournament on Kaggle.com.

  • NVIDIA dropped a paper arguing that Small Language Models (SLMs) are the real future of agentic AI

    Forget everything you thought you knew about AI agents running on massive LLMs. A bombshell new paper from NVIDIA Research, “Small Language Models are the Future of Agentic AI,” is flipping the script on how we think about deploying intelligent agents at scale.

    You don’t need GPT-5 to run most AI agents. You need a fleet of tiny, fast, specialized SLMs. Let’s unpack what this means, why it matters, and how it could reshape the entire AI economy.

    The Big Idea in One Sentence: Small Language Models (SLMs) aren’t just good enough for AI agents — they’re better. And economically, operationally, and environmentally, they’re the inevitable future. While everyone’s chasing bigger, flashier LLMs, NVIDIA is arguing that for agentic workflows — where AI systems perform repetitive, narrow, tool-driven tasks — smaller is smarter.

    What’s an “Agentic AI” Again?

    AI agents aren’t just chatbots. They’re goal-driven systems that plan, use tools (like APIs or code), make decisions, and execute multi-step workflows — think coding assistants, customer service bots, or automated data analysts. Right now, almost all of these agents run on centralized LLM APIs (like GPT-4, Claude, or Llama 3). But here’s the catch: Most agent tasks are not open-ended conversations. They’re structured, predictable, and highly specialized — like parsing a form, generating JSON for an API call, or writing a unit test.

    The question, then: why use a 70B-parameter brain when a 7B one can do the job — faster, cheaper, and locally?

    Why SLMs Win for Agents (The NVIDIA Case)

    1. They’re Already Capable Enough: SLMs today are not weak — they’re focused. Modern SLMs punch way above their weight:

    • Phi-3 (7B) performs on par with 70B-class models in code and reasoning.
    • NVIDIA’s Nemotron-H (9B) matches 30B LLMs in instruction following — at 1/10th the FLOPs.
    • DeepSeek-R1-Distill-7B beats GPT-4o and Claude 3.5 on reasoning benchmarks.
    • xLAM-2-8B leads in tool-calling accuracy — critical for agents.

    2. They’re 10–30x Cheaper & Faster to Run: Running a 7B model vs. a 70B model means:

    • Lower latency (real-time responses)
    • Less energy & compute
    • No need for multi-GPU clusters
    • On-device inference (yes, your laptop or phone)

    With tools like NVIDIA Dynamo and ChatRTX, you can run SLMs locally, offline, with strong data privacy — a game-changer for enterprise and edge use.

    3. They’re More Flexible & Easier to Fine-Tune: Want to tweak your agent to follow a new API spec or output format? With SLMs:

    • You can fine-tune in hours, not weeks.
    • Use LoRA/QLoRA for low-cost adaptation (see the sketch below).
    • Build specialized experts for each task (e.g., one SLM for JSON, one for code, one for summaries).

    This is the “Lego approach” to AI: modular, composable, and scalable — not monolithic.
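
    As an illustration of how lightweight that adaptation can be, here is a minimal LoRA setup with Hugging Face PEFT. The base model id, target modules, and hyperparameters are placeholder assumptions, not a prescription from the paper.

    ```python
    # Minimal sketch: attaching LoRA adapters to a small model with PEFT.
    # The base model id and hyperparameters below are placeholder assumptions.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base_id = "HuggingFaceTB/SmolLM2-1.7B-Instruct"  # example SLM; swap in any suitable base
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForCausalLM.from_pretrained(base_id)

    lora_config = LoraConfig(
        r=16,                                 # low adapter rank keeps trainable params tiny
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],  # attention projections (names are model-dependent)
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()        # typically well under 1% of total weights

    # From here, train as usual (e.g., with the transformers Trainer) on a
    # task-specific dataset such as JSON-generation examples for one agent skill.
    ```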

    But Aren’t LLMs Smarter? The Great Debate: Agents don’t need generalists — they need specialists.

    • Agents already break down complex tasks into small steps.
    • The LLM is often heavily prompted and constrained — basically forced to act like a narrow tool.
    • So why not just train an SLM to do that one thing perfectly?

    And when you do need broad reasoning? Use a heterogeneous agent system:

    • Default to SLMs for routine tasks.
    • Call an LLM only when needed (e.g., for creative planning or open-domain Q&A).

    This hybrid model is cheaper, faster, and more sustainable.
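
    A minimal sketch of what such a router could look like, assuming hypothetical call_slm / call_llm helpers and a deliberately crude routing policy:

    ```python
    # Minimal sketch of a heterogeneous agent router: default to an SLM,
    # escalate to an LLM only for open-ended or complex requests.
    # call_slm / call_llm are hypothetical placeholders for real model clients.

    ROUTINE_TASKS = {"generate_json", "write_unit_test", "summarize_email", "generate_sql"}

    def call_slm(task: str, prompt: str) -> str:
        return f"[SLM handled {task}]"   # placeholder for a local SLM call

    def call_llm(task: str, prompt: str) -> str:
        return f"[LLM handled {task}]"   # placeholder for a hosted LLM call

    def route(task: str, prompt: str) -> str:
        # Crude illustrative policy: known narrow tasks go to the SLM,
        # everything else (creative planning, open-domain Q&A) escalates to the LLM.
        if task in ROUTINE_TASKS:
            return call_slm(task, prompt)
        return call_llm(task, prompt)

    print(route("generate_sql", "Monthly active users per region"))
    print(route("open_domain_qa", "Draft a product strategy for Q4"))
    ```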

    So Why Aren’t We Using SLMs Already?

    1. Massive investment in LLM infrastructure — $57B poured into cloud AI in 2024 alone.
    2. Benchmarks favor generalist LLMs — we’re measuring the wrong things.
    3. Marketing hype — SLMs don’t get the headlines, even when they outperform.

    But these are inertia problems, not technical ones. And they’re solvable.

    How to Migrate from LLMs to SLMs: The 6-Step Algorithm

    NVIDIA even gives us a practical roadmap:

    1. Log all agent LLM calls (inputs, outputs, tool usage).
    2. Clean & anonymize the data (remove PII, sensitive info).
    3. Cluster requests to find common patterns (e.g., “generate SQL”, “summarize email”); a sketch of this step follows below.
    4. Pick the right SLM for each task (Phi-3, SmolLM2, Nemotron, etc.).
    5. Fine-tune each SLM on its specialized dataset (use LoRA for speed).
    6. Deploy & iterate — keep improving with new data.

    This creates a continuous optimization loop — your agent gets smarter and cheaper over time.
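
    Step 3 is the pivotal one: finding recurring request patterns worth carving out into SLM specialists. Here is a minimal sketch of that clustering step, assuming the logged prompts are already cleaned; sentence-transformers and scikit-learn are stand-in tool choices, not something the paper prescribes.

    ```python
    # Minimal sketch of step 3: clustering logged agent requests by similarity.
    # sentence-transformers and scikit-learn are stand-in choices; prompts are toy examples.
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    logged_prompts = [
        "Generate SQL for monthly revenue by region",
        "Write SQL to count active users per day",
        "Summarize this email thread for my manager",
        "Summarize the attached support ticket",
    ]

    encoder = SentenceTransformer("all-MiniLM-L6-v2")   # small general-purpose embedder
    embeddings = encoder.encode(logged_prompts)

    kmeans = KMeans(n_clusters=2, random_state=0, n_init="auto").fit(embeddings)
    for prompt, label in zip(logged_prompts, kmeans.labels_):
        print(label, prompt)
    # Each resulting cluster becomes a candidate task for its own fine-tuned SLM.
    ```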

    Real-World Impact: Up to 70% of LLM Calls Could Be Replaced

    In case studies on popular open-source agents:

    • MetaGPT (software dev agent): 60% of LLM calls replaceable
    • Open Operator (workflow automation): 40%
    • Cradle (GUI control agent): 70%

    That’s huge cost savings — and a massive reduction in AI’s carbon footprint.
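
    Back-of-the-envelope, the savings compound: if a fraction r of calls moves to a model that is k times cheaper per call, spend drops by r × (1 − 1/k). The sketch below applies that formula to the replaceable-call figures above with an assumed 10x cost advantage; the cost ratio is an illustration, not a number from the paper.

    ```python
    # Illustrative arithmetic only; the 10x cost ratio is an assumption.
    def cost_reduction(replaced_fraction: float, cheapness_factor: float) -> float:
        """Fraction of inference spend saved when `replaced_fraction` of calls
        move to a model that is `cheapness_factor` times cheaper per call."""
        return replaced_fraction * (1 - 1 / cheapness_factor)

    for agent, share in [("MetaGPT", 0.60), ("Open Operator", 0.40), ("Cradle", 0.70)]:
        saving = cost_reduction(share, 10)
        print(f"{agent}: ~{saving:.0%} lower inference spend")
    ```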

    The Bigger Picture: Sustainable, Democratized AI

    This isn’t just about cost. It’s about:

    • Democratization: Smaller teams can train and deploy their own agent models.
    • Privacy: Run agents on-device, no data sent to the cloud.
    • Sustainability: Less compute = less energy = greener AI.

    Final Thoughts: The LLM Era is Ending. The SLM Agent Era is Just Beginning.

    We’ve spent years scaling up — bigger models, more parameters, more GPUs. Now, it’s time to scale out: modular, efficient, specialized SLMs working together in intelligent agent systems. NVIDIA isn’t just making a technical argument — they’re calling for a paradigm shift. And if they’re right, the future of AI won’t be in the cloud. It’ll be on your device, running silently in the background, doing its job — fast, cheap, and smart.

  • Amazon always Bee listening! Amazon acquires AI wearable startup Bee to boost personal assistant technology

    Amazon has agreed to acquire Bee, a San Francisco-based startup that developed a roughly $50 AI-powered wristband. The device continuously listens to the wearer’s conversations and surroundings, transcribing audio to provide personalized summaries, to-do lists, reminders, and suggestions through an associated app. Bee’s technology can integrate user data such as contacts, email, calendar, photos, and location to build a searchable log of daily interactions, enhancing its AI-driven insights. The acquisition, announced in July 2025 but not yet finalized, will see Bee’s team join Amazon to integrate this wearable AI technology with Amazon’s broader AI efforts, including personal assistant functionalities.

    The AI wristband uses built-in microphones and AI models to automatically transcribe conversations unless manually muted. While the device’s accuracy can sometimes be affected by ambient sounds or media, Amazon emphasized its commitment to user privacy and control, intending to apply its established privacy standards to Bee’s technology. Bee claims it does not store raw audio recordings and uses high security standards, with ongoing tests of on-device AI models to enhance privacy.

    This acquisition complements Amazon’s previous ventures into wearable tech, such as the discontinued Halo health band and its Echo smart glasses with Alexa integration. Bee represents a cost-accessible entry into AI wearables with continuous ambient intelligence, enabling Amazon to expand in this competitive market segment, which includes other companies like OpenAI and Meta developing AI assistants and wearables.

    The financial terms of the deal have not been disclosed. Bee was founded in 2022, raised $7 million in funding, and is led by CEO Maria de Lourdes Zollo. Bee’s vision is to create personal AI that evolves with users to enrich their lives. Amazon plans to work with Bee’s team for future innovation in AI wearables post-acquisition.

  • Google MLE-STAR, a state-of-the-art machine learning engineering agent

    MLE-STAR is a state-of-the-art machine learning engineering agent developed by Google Cloud that automates various ML tasks across diverse data types, achieving top performance in competitions like Kaggle. Unlike previous ML engineering agents that rely heavily on pre-trained language model knowledge and tend to make broad code modifications at once, MLE-STAR uniquely integrates web search to retrieve up-to-date, effective models and then uses targeted code block refinement to iteratively improve specific components of the ML pipeline. It performs ablation studies to identify the most impactful code parts and refines them with careful exploration.

    Key advantages of MLE-STAR include:

    • Use of web search to find recent and competitive models (such as EfficientNet and ViT), avoiding outdated or overused choices.
    • Component-wise focused improvement rather than wholesale code changes, enabling deeper exploration of feature engineering, model selection, and ensembling.
    • A novel ensembling method that combines multiple solutions into a superior single ensemble rather than simple majority voting (a generic illustration of the idea follows this list).
    • Built-in data leakage and data usage checkers that detect unrealistic data processing strategies or neglected data sources, refining the generated code accordingly.
    • The framework won medals in 63% of MLE-Bench-Lite Kaggle competitions with 36% being gold medals.
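
    As a generic illustration of the difference between simple majority voting and a learned combination of candidate solutions, here is a toy sketch that blends predicted probabilities with weights chosen on validation data. It is not MLE-STAR's actual ensembling algorithm, only the underlying idea; the data and weight grid are made up.

    ```python
    # Toy illustration: weighted blending of candidate solutions vs. majority voting.
    # This is NOT MLE-STAR's actual algorithm; data and weight grid are synthetic.
    import numpy as np

    rng = np.random.default_rng(0)
    y_val = rng.integers(0, 2, size=500)                 # toy validation labels
    # Toy probability outputs from three candidate solutions of varying quality.
    candidates = [np.clip(y_val + rng.normal(0, s, y_val.size), 0, 1)
                  for s in (0.35, 0.45, 0.60)]

    def accuracy(probs):
        return float(((probs > 0.5).astype(int) == y_val).mean())

    votes = np.mean([(p > 0.5).astype(int) for p in candidates], axis=0)
    print("majority vote accuracy:", accuracy(votes))

    # Crude grid search for blend weights, scored on the validation set.
    best = max(
        ((w1, w2, 1 - w1 - w2)
         for w1 in np.arange(0, 1.01, 0.1)
         for w2 in np.arange(0, 1.01 - w1, 0.1)),
        key=lambda w: accuracy(sum(wi * p for wi, p in zip(w, candidates))),
    )
    blended = sum(wi * p for wi, p in zip(best, candidates))
    print("weighted blend accuracy:", accuracy(blended), "weights:", np.round(best, 2))
    ```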

    MLE-STAR lowers the barrier to ML adoption by automating complex workflows and continuously improving through web-based retrieval of state-of-the-art methods, ensuring adaptability as ML advances. Its open-source code is available for researchers and developers to accelerate machine learning projects.

    This innovation marks a shift toward more intelligent, web-augmented ML engineering agents that can deeply and iteratively refine models for better results.