Category: AI Related

  • China’s AI startup Zhipu releases GLM-4.5 and GLM-4.5 Air

    Zhipu AI (also known as Z.ai or 智谱AI) is a leading Chinese AI company specializing in large language models and other artificial intelligence technologies. Originating from Tsinghua University, Zhipu AI has attracted major investment from top Chinese tech firms and international backers. By 2024, it was regarded as one of the “AI Tiger” companies in China and is a significant player in the global AI landscape. The company is known for rapidly developing innovative LLMs, releasing open-source models, and building tools focused on agentic and reasoning capabilities.

    GLM-4.5 and GLM-4.5 Air: Overview

    Both GLM-4.5 and its compact sibling, GLM-4.5 Air, are foundation large language models designed for advanced reasoning, coding, and agentic tasks. They mark Zhipu AI’s push to unify general cognitive capabilities and serve as powerful backbones for intelligent agent applications.

    GLM-4.5

    • Size: 355 billion total parameters, 32 billion active parameters at runtime.

    • Core Features:

      • Hybrid Reasoning: Supports a “thinking mode” for tool use and multi-step reasoning (e.g., solving math, code, and logical problems) and a “non-thinking mode” for instant responses.
      • Agent Readiness: Designed for agent-centric workflows, integrating tool-calling natively for seamless automation and coding.
      • Performance:
        • Ranks in the top three across many industry benchmarks, comparable to leading models such as Claude 4 Opus and Gemini 2.5 Pro.
        • Particularly excels in mathematics, coding, data analysis, and scientific reasoning, achieving results at or near the state of the art on benchmarks such as MMLU Pro and AIME24.
        • Demonstrates a high tool-calling success rate (90.6%) and strong coding benchmark performance.
    • Context Window: 128,000 tokens.
    • Open source: Weights and implementation are available for research and commercial use under the MIT license; a brief API usage sketch follows this list.
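
    Because the model is also served behind an OpenAI-compatible API on Zhipu's platform, a simple chat call is the quickest way to try the hybrid thinking mode. The snippet below is a minimal sketch, assuming an OpenAI-compatible endpoint; the base URL, model name, and the field used to toggle thinking are believed to match Zhipu's public documentation at the time of writing and should be verified before use.

    ```python
    # Minimal sketch: calling GLM-4.5 through an OpenAI-compatible chat endpoint.
    # The base_url, model name, and the "thinking" toggle below are assumptions
    # based on Zhipu's public docs at the time of writing; verify before relying on them.
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_ZHIPU_API_KEY",                      # placeholder
        base_url="https://open.bigmodel.cn/api/paas/v4/",  # assumed Zhipu endpoint
    )

    response = client.chat.completions.create(
        model="glm-4.5",
        messages=[{"role": "user", "content": "Outline a plan to refactor a legacy Python module."}],
        # Hypothetical toggle for the hybrid "thinking" mode described above;
        # the exact parameter name may differ in the official API.
        extra_body={"thinking": {"type": "enabled"}},
    )

    print(response.choices[0].message.content)
    ```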

    GLM-4.5 Air

    • Size: 106 billion total parameters, 12 billion active parameters during inference.
    • Design: Lightweight, mixture-of-experts architecture for optimal efficiency and deployment flexibility, including running locally on consumer-grade hardware.
    • Context Window: 128K tokens, the same as GLM-4.5.
    • Hybrid Reasoning & Agentic Capabilities:

      • Maintains strong reasoning and tool-use abilities, a hallmark of the GLM-4.5 family.
      • Offers a balance of performance and resource consumption, making it well suited to cost-sensitive and high-throughput applications.
      • On benchmarks, it scores competitively with other industry-leading models while using far fewer compute resources.
    • Use cases: Efficient deployment for enterprise AI assistants, automation, coding support, customer service, and affordable large-scale deployments (see the local-inference sketch after this list).
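
    Since the Air weights are open and comparatively light in active parameters, local experimentation with Hugging Face transformers is a natural starting point. The snippet below is a rough sketch: the repository id is an assumption, so check the official model card for the exact name, the required transformers version, and realistic hardware needs (the full MoE checkpoint is still large even though only ~12B parameters are active per token).

    ```python
    # Rough sketch of local inference with Hugging Face transformers.
    # The repo id below is an assumption; see the official model card for the
    # exact name, supported transformers version, and hardware guidance.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "zai-org/GLM-4.5-Air"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # let transformers pick a suitable dtype
        device_map="auto",    # spread layers across available GPUs/CPU
    )

    messages = [{"role": "user", "content": "Explain this stack trace and suggest a fix: ..."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=512)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
    ```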

    Performance and Accessibility

    • Competitive Pricing: API costs are among the lowest on the market, reflecting Zhipu AI’s strategy to undercut competitors and democratize access to advanced AI.
    • Open Source Access: Both models are available for free testing and deployment through multiple platforms like Hugging Face, Zhipu AI Open Platform, and third-party APIs.
    • Community and Ecosystem: Zhipu AI encourages developer and research engagement, providing technical blogs, documentation, and standard model APIs.

    In Summary

    • Zhipu AI is a dominant force in China’s rapidly growing AI industry, focusing on high-performance, open-source language models.
    • GLM-4.5 is a very large LLM targeting top-tier reasoning, agentic, and coding abilities.
    • GLM-4.5 Air offers similar power but much higher efficiency for wider, cost-effective deployment.

    These models are part of a new wave of AI technologies enabling more accessible, adaptable, and powerful agentic applications in both research and enterprise settings.

  • OpenAI launches Study Mode (“Study and learn”) in ChatGPT

    ChatGPT Study Mode is a new feature designed to transform the ChatGPT experience from just giving quick answers into a guided, step-by-step learning process. It helps users build a deeper understanding of any topic by engaging them with Socratic-style questions that prompt critical thinking, breaking down concepts into manageable parts, and personalizing lessons based on the user’s skill level and past interactions if memory is enabled.

    Key features of Study Mode include:

    • Asking interactive, guiding questions to stimulate reasoning rather than providing direct answers immediately.

    • Breaking down complex topics into easy-to-follow sections, progressively increasing in complexity.

    • Offering quizzes, open-ended questions, and personalized feedback to check understanding and track progress.

    • Supporting multimodal inputs like images, PDFs, and voice dictation to work with users’ learning materials.

    • Being accessible on all ChatGPT plans (Free, Plus, Pro, Team) globally, with availability coming soon to ChatGPT Edu plans.

    Study Mode was developed in collaboration with educators, scientists, and pedagogy experts, incorporating proven learning science principles such as managing cognitive load, encouraging active participation, fostering curiosity, and promoting metacognition and self-reflection.

    It is particularly useful for homework help, exam preparation, and unpacking class concepts while encouraging students to think critically rather than just completing tasks. Although it enhances learning engagement, users can still switch back to the standard ChatGPT interface for direct answers, which some critics argue may limit its impact if students prefer shortcuts.

    To activate it, users select Tools in the ChatGPT prompt window and choose Study and learn, then specify their topic, level, and goals for a tailored learning session.

    ChatGPT Study Mode is an AI-powered interactive tutoring experience aimed at supporting deeper, more active learning through guided questioning, personalized content, and multimodal inputs, rather than just providing finished answers. It represents OpenAI’s effort to make ChatGPT a more effective educational tool.

  • Rakuten launches comprehensive AI platform across services

    Rakuten has launched a comprehensive AI platform called “Rakuten AI,” designed to enhance and streamline user experiences across its entire ecosystem of services. The full-scale launch began on July 30, 2025, with initial integration into Rakuten Link, the communication app for Rakuten Mobile subscribers. Rakuten AI is accessible free of charge and also available as a standalone web app in beta, aimed at broad user engagement in digital communications.

    The platform features advanced agentic AI capabilities, including chat functions, automatic search prompts, voice-to-text and image input, AI research, personalized shopping recommendations, translation, text reading support, programming assistance, and image generation. Rakuten AI has deep Japanese language and cultural awareness, which enables it to provide personalized, context-rich interactions in areas like e-commerce, fintech, travel, education, wellness, and entertainment.

    A significant expansion is planned for autumn 2025, when Rakuten AI will be integrated into Rakuten Ichiba, Rakuten’s flagship e-commerce marketplace. This integration will offer users real-time product recommendations based on behavioral data and purchasing insights, broadening the AI’s role in personalized customer experiences.

    Additionally, Rakuten Mobile offers a corporate-focused generative AI service called “Rakuten AI for Business,” launched earlier in 2025, which supports business tasks such as document creation, translation, brainstorming, and analysis. This service is optimized for the Japanese market and business customs, emphasizing security and ease of deployment at a monthly subscription rate.

    Rakuten’s AI initiative, branded under the term “AI-nization,” reflects the company’s strategic commitment to embedding AI deeply within its product ecosystem to empower both consumers and businesses. This approach was highlighted at Rakuten AI Optimism 2025, a three-day event dedicated to showcasing AI technologies and innovations across Rakuten’s services.

  • Gemini 2.5 Flash-Lite is now stable and generally available

    Gemini 2.5 Flash-Lite is Google DeepMind’s most cost-efficient and fastest model in the Gemini 2.5 family, designed specifically for high-volume, latency-sensitive AI tasks such as translation, classification, and other real-time uses. It balances performance and low cost without compromising quality, making it ideal for applications requiring both speed and efficiency.

    Key features of Gemini 2.5 Flash-Lite include:

    • Low latency and high throughput optimized for real-time, high-volume workloads.
    • Optional native reasoning (“thinking”) capabilities that can be toggled on for more complex tasks, enhancing output quality.
    • Tool use support including abilities like search and code execution.
    • Cost efficiency at about $0.10 per million input tokens and $0.40 per million output tokens, providing an economical choice for large-scale use.
    • Supports multiple input types including text, images, video, audio, and PDF.
    • Token limit of up to 1,048,576 for input and 65,536 for output.
    • Available for production use via Google AI Studio and Vertex AI.

    It stands out for combining speed, cost-effectiveness, quality reasoning, and multitasking capabilities, making it suitable for developers needing scalable, interactive, and real-time AI services.
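
    For developers, the model is a drop-in choice in the Gemini API. The snippet below is a small sketch using the google-genai Python SDK; the model id and the thinking_budget field follow Google's documentation at the time of writing (a budget of 0 disables thinking for minimum latency, a positive budget enables it for harder tasks).

    ```python
    # Small sketch: a low-latency classification call with optional thinking.
    # Assumes the google-genai SDK and an API key; model id and config fields
    # follow Google's docs at the time of writing.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")  # placeholder

    response = client.models.generate_content(
        model="gemini-2.5-flash-lite",
        contents="Classify the sentiment of: 'The update made the app noticeably faster.'",
        config=types.GenerateContentConfig(
            # thinking_budget=0 turns reasoning off for maximum speed;
            # set a positive budget to enable it for more complex prompts.
            thinking_config=types.ThinkingConfig(thinking_budget=0),
        ),
    )

    print(response.text)
    ```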

  • Gemini Embedding now generally available in the Gemini API

    The Gemini Embedding model, called gemini-embedding-001, is a state-of-the-art text embedding model recently made generally available by Google through the Gemini API and Vertex AI. It is designed to generate dense vector representations of text that capture semantic meaning, enabling advanced natural language processing applications.

    Key features of gemini-embedding-001 include:

    • High Performance and Versatility: It consistently ranks at the top of the Massive Text Embedding Benchmark (MTEB) multilingual leaderboard, outperforming previous Google embedding models and many commercial alternatives.
    • Multilingual Support: Supports over 100 languages, making it ideal for global and cross-lingual applications such as translation, semantic search, and classification.
    • Long Input Handling: Accepts input sequences up to 2048 tokens, allowing for longer and more complex text or document embeddings.
    • Large Embedding Dimension: Outputs vectors with a default size of 3072 dimensions, offering detailed semantic representation. Developers can scale down the output dimensions to 1536 or 768 using Matryoshka Representation Learning (MRL) to balance between embedding quality, computational cost, and storage needs.
    • Unified Across Domains: Performs well across diverse fields—science, legal, finance, software development—offering a single solution for multiple enterprise and research use cases.
    • Flexible Usage: Available with free and paid tiers on Google’s Gemini API, allowing experimentation at no cost and scaling for production.

    Overall, gemini-embedding-001 provides a cutting-edge, flexible, and efficient embedding solution that can be integrated easily to enhance tasks like semantic search, classification, recommendation, and more sophisticated AI workflows across many languages and domains.
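
    A typical call is shown below as a brief sketch using the google-genai Python SDK: output_dimensionality requests a smaller MRL vector than the 3072-dimension default, and the task_type hint is optional. Field names follow Google's documentation at the time of writing; verify against the current API reference.

    ```python
    # Brief sketch: generating reduced-dimension embeddings with gemini-embedding-001.
    # Field names follow Google's docs at the time of writing; verify before use.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")  # placeholder

    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=[
            "Renewable energy adoption is accelerating worldwide.",
            "Solar panel costs have fallen sharply over the past decade.",
        ],
        config=types.EmbedContentConfig(
            output_dimensionality=768,        # scale down from the 3072 default via MRL
            task_type="SEMANTIC_SIMILARITY",  # optional hint for the intended use
        ),
    )

    for embedding in result.embeddings:
        print(len(embedding.values))  # 768
    ```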

  • Qwen3-Coder: Agentic Coding in the World

    Qwen3-Coder is a cutting-edge open-source AI coding model developed by Alibaba’s Qwen team, designed specifically for advanced programming tasks such as code generation, debugging, and managing complex software workflows. Its standout variant, Qwen3-Coder-480B-A35B-Instruct, features a massive Mixture-of-Experts (MoE) architecture with 480 billion parameters, though only 35 billion are active during inference, allowing efficient use of computational resources.

    Key features of Qwen3-Coder include:

    • Agentic Coding Capabilities: It excels at autonomous multi-step programming tasks, including generating, testing, debugging code, and integrating with external tools and browsers, making it highly interactive and capable of handling complex developer workflows.
    • Ultra-Long Context Window: Natively supports up to 256,000 tokens, with the ability to extend to 1 million tokens, which facilitates handling large codebases, multi-file projects, and long contextual interactions in one pass.
    • High Training Volume and Code Focus: Trained on 7.5 trillion tokens, with 70% from code data, ensuring strong code understanding and generation capabilities.
    • Reinforcement Learning and Post-Training: Enhanced with long-horizon reinforcement learning, enabling it to learn from multi-step interactions and improve task execution success, especially in real-world settings.
    • Multi-Language Support: Optimized for more than 40 programming languages, including Python, Java, JavaScript, C++, Rust, Go, and many others, tailored to conform with best coding practices for each.
    • Open-Source Tools and Ecosystem: Alongside the model, Alibaba released Qwen Code, a command-line interface tool optimized to leverage the model’s agentic features by allowing natural language interaction for programming tasks.
    • Competitive Performance: Qwen3-Coder achieves state-of-the-art results on major coding benchmarks and performs comparably to leading proprietary models like Claude Sonnet 4.

    Qwen3-Coder is a powerful, scalable, and versatile AI coding assistant that supports complex, multi-turn development workflows by understanding, generating, and debugging code effectively across many languages and environments. It is publicly available as open source, fostering community collaboration and practical adoption.
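
    In practice, the model can be reached through an OpenAI-compatible endpoint, whether hosted (for example, Alibaba Cloud's DashScope compatibility mode) or self-hosted from the open weights with a server such as vLLM. The sketch below assumes the hosted setup; the base URL and model id are assumptions to check against the official docs.

    ```python
    # Sketch: calling Qwen3-Coder via an OpenAI-compatible endpoint.
    # The base_url and model name are assumptions (DashScope compatibility mode);
    # point base_url at your own vLLM server instead if you self-host the weights.
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_DASHSCOPE_API_KEY",  # placeholder
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
    )

    response = client.chat.completions.create(
        model="qwen3-coder-480b-a35b-instruct",  # assumed model id; check the docs
        messages=[
            {"role": "system", "content": "You are a careful coding assistant."},
            {"role": "user", "content": "Write a Python function that parses RFC 3339 timestamps."},
        ],
    )

    print(response.choices[0].message.content)
    ```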

  • SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction

    SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction is a new method in computer vision focused on Video Object Segmentation (VOS), which is the task of tracking and segmenting target objects across video frames. Unlike traditional methods that rely mainly on feature or appearance matching, SeC introduces a concept-driven framework that progressively builds high-level, object-centric representations or “concepts” of the target object.

    Key points about SeC:

    • Concept-driven approach: It moves beyond pixel-level matching to construct a semantic “concept” of the object by integrating visual cues across multiple video frames using Large Vision-Language Models (LVLMs). This allows more human-like understanding of objects.
    • Progressive construction: The object concept is built progressively and used to robustly identify and segment the target even across drastic visual changes, occlusions, and complex scene transformations.
    • Adaptive inference: SeC dynamically balances semantic reasoning via LVLMs with enhanced traditional feature matching, adjusting computational resources based on scene complexity to improve efficiency.
    • Benchmarking: To evaluate performance in conceptually challenging video scenarios, the authors introduced the Semantic Complex Scenarios Video Object Segmentation benchmark (SeCVOS), including 160 videos with significant appearance and scene variations.
    • Performance: SeC achieved state-of-the-art results, showing an 11.8-point improvement over the prior best method (SAM 2.1) on the SeCVOS benchmark, highlighting its superior capability in handling complex videos.

    In simpler terms, SeC works like a “smart detective” that learns and refines a rich mental image or concept of the object being tracked over time, similar to how humans recognize objects by understanding their characteristics beyond just appearance. This approach significantly advances video object segmentation, especially in challenging conditions where objects undergo drastic changes or are partially obscured.
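
    To make the adaptive-inference idea concrete, the toy sketch below mimics the loop described above: expensive LVLM-based concept updates only on difficult frames, cheap feature matching otherwise. It is not the authors' implementation, and every helper function is a hypothetical placeholder.

    ```python
    # Toy illustration of the concept-driven loop, NOT the authors' code.
    # All helpers are hypothetical placeholders for the real components.
    from typing import Any, List, Tuple

    COMPLEXITY_THRESHOLD = 0.5  # arbitrary value for this sketch

    def scene_complexity(frame: Any, memory: List[Tuple[Any, Any]]) -> float:
        return 0.0  # placeholder: would measure appearance/scene change

    def update_concept_with_lvlm(memory: List[Tuple[Any, Any]], concept: Any) -> Any:
        return {"frames_seen": len(memory)}  # placeholder: LVLM-built object concept

    def segment_with_concept(frame: Any, concept: Any) -> Any:
        return "mask"  # placeholder: concept-guided (semantic) segmentation

    def segment_by_matching(frame: Any, memory: List[Tuple[Any, Any]]) -> Any:
        return "mask"  # placeholder: cheap feature/appearance matching

    def segment_video(frames: List[Any], first_mask: Any) -> List[Any]:
        concept = None
        masks = [first_mask]
        memory = [(frames[0], first_mask)]
        for frame in frames[1:]:
            if concept is None or scene_complexity(frame, memory) > COMPLEXITY_THRESHOLD:
                # Hard frame: refine the object concept, then segment semantically.
                concept = update_concept_with_lvlm(memory, concept)
                mask = segment_with_concept(frame, concept)
            else:
                # Easy frame: fall back to lightweight matching against memory.
                mask = segment_by_matching(frame, memory)
            masks.append(mask)
            memory.append((frame, mask))
        return masks
    ```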

  • What is the Hierarchical Reasoning Model (HRM)?

    The Hierarchical Reasoning Model (HRM) is a novel AI architecture designed to efficiently perform complex sequential reasoning tasks by mimicking how the human brain processes information at multiple hierarchical levels and timescales. It consists of two interconnected recurrent modules:

    • A high-level module that operates slowly to handle abstract, strategic planning.
    • A low-level module that runs quickly to perform detailed, local computations based on the high-level plan.

    This separation allows the model to achieve significant computational depth and handle long, complex reasoning sequences within a single forward pass, without requiring large amounts of training data or explicit supervision of intermediate reasoning steps.

    HRM excels at tasks like solving complex Sudoku puzzles, optimal pathfinding in large mazes, and performing well on the Abstraction and Reasoning Corpus (ARC), which is a benchmark for measuring general intelligence capabilities. Remarkably, it attains high performance using only 27 million parameters and about 1,000 training examples, far fewer than typical large language models.

    Key features include:

    • Hierarchical convergence: The low-level module converges to a local solution, which is integrated by the high-level module to update strategies and refine further processing.

    • Adaptive Computational Time (ACT): HRM dynamically adjusts the amount of computation depending on task complexity, improving efficiency.

    It does not rely on large-scale pretraining or chain-of-thought supervision. HRM’s internal dynamic reasoning processes can be decoded and visualized, offering interpretability advantages over other neural reasoning methods. Overall, HRM represents a brain-inspired approach toward universal and general-purpose AI reasoning systems, offering substantial computational efficiency and stronger reasoning capabilities compared to larger, conventional models.
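
    As a rough illustration of the two-timescale idea (not the published HRM implementation), the sketch below couples a fast low-level recurrent cell with a slow high-level one: the low-level state is re-initialized around each high-level update, mirroring the hierarchical-convergence behavior described above. Module choices and dimensions are arbitrary.

    ```python
    # Toy two-timescale recurrent sketch inspired by the description above;
    # this is an illustration, not the published HRM architecture.
    import torch
    import torch.nn as nn

    class TwoTimescaleReasoner(nn.Module):
        def __init__(self, input_dim: int, hidden_dim: int, k_low_steps: int = 4):
            super().__init__()
            self.k = k_low_steps
            self.low = nn.GRUCell(input_dim + hidden_dim, hidden_dim)   # fast, detailed module
            self.high = nn.GRUCell(hidden_dim, hidden_dim)              # slow, strategic module
            self.readout = nn.Linear(hidden_dim, input_dim)

        def forward(self, x: torch.Tensor, n_cycles: int = 8) -> torch.Tensor:
            batch = x.shape[0]
            h_high = x.new_zeros(batch, self.high.hidden_size)
            for _ in range(n_cycles):                      # slow high-level updates
                h_low = x.new_zeros(batch, self.low.hidden_size)  # reset local computation
                for _ in range(self.k):                    # fast low-level steps
                    h_low = self.low(torch.cat([x, h_high], dim=-1), h_low)
                h_high = self.high(h_low, h_high)          # integrate the local result
            return self.readout(h_high)

    model = TwoTimescaleReasoner(input_dim=16, hidden_dim=64)
    out = model(torch.randn(2, 16))
    print(out.shape)  # torch.Size([2, 16])
    ```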

  • Google Shopping introduced a new AI-powered shopping experience called “AI Mode”

    Google Shopping introduced a new AI-powered shopping experience called “AI Mode,” featuring several advanced tools to enhance product discovery, try-ons, and price tracking.

    Key updates include:

    • Virtual Try-On: Shoppers in the U.S. can upload a full-length photo of themselves and virtually try on styles from billions of apparel items across Google Search, Google Shopping, and Google Images. This tool helps visualize how clothes might look without needing to physically try them on, making shopping more personalized and interactive.

    • AI-Powered Shopping Panel: When searching for items, AI Mode runs simultaneous queries to deliver highly personalized and visually rich product recommendations and filters tailored to specific needs or preferences. For example, searching for travel bags can dynamically update to show waterproof options suitable for rainy weather.

    • Price Alerts with Agentic Checkout: Users can now “track price” on product listings, specify preferred size, color, and target price. Google will notify shoppers when the price drops to their desired range, helping them buy at the right time.

    • Personalized and Dynamic Filters: The system uses Google’s Gemini AI models paired with the Shopping Graph that contains over 50 billion fresh product listings, enabling precise filtering by attributes like size, color, availability, and price.

    • Personalized Home Feed and Dedicated Deals Page: Google Shopping offers customized feeds and dedicated deals sections tailored to individual shopping habits and preferences.

    These features are designed to make online shopping more intuitive, personalized, and efficient, leveraging AI to guide buyers from product discovery through to purchase. Google plans to roll out these features broadly in the U.S. in the coming months of 2025, enhancing the online shopping experience through AI-driven insights and assistance.

  • Microsoft Copilot adds visual avatar with real-time expressions

    Microsoft has introduced an experimental feature called Copilot Appearance, which gives its Copilot AI assistant a visual avatar capable of real-time expressions and gestures. This new feature brings non-verbal communication to Copilot, enhancing voice interactions with an animated avatar that smiles, nods, and displays a range of emotional cues, making the experience more human-like and engaging.

    Key details on Copilot Appearance:

    • What It Is: Copilot Appearance is a dynamic, blob-shaped avatar that reacts visually to conversations. It shows real-time facial and body expressions, such as smiling, nodding, or showing surprise, based on the context of your voice chat.

    • How It Works: To use the feature, enter Voice Mode on the Copilot web interface by clicking the microphone icon, then go to Voice Settings and toggle “Copilot Appearance” on. Once enabled, Copilot will react to what you say with animations and expressions.

    • Scope and Availability: The feature is currently in early experimental rollout, limited to select users in the United States, United Kingdom, and Canada. Microsoft has not announced a broader or global release yet, and the feature is only available through the browser version of Copilot—not on Windows, macOS, or mobile apps.

    • Intended Purpose: Beyond basic utility, the avatar aims to make interactions warmer, less robotic, and more relatable through non-verbal cues. According to Microsoft’s AI chief Mustafa Suleyman, the goal is to give Copilot a persistent identity and sense of presence, with potential for further personalization in the future.

    • Comparison and Context: Unlike previous Microsoft animated assistants (such as Clippy), Copilot’s avatar is designed to be less intrusive, more ambient, and focused on signaling understanding and personality rather than distracting animations.

    • Current Limitations: Access is limited, the feature is still experimental, and it does not add productivity features; the focus is on improving user engagement, with feedback being closely monitored.

    Copilot’s new visual avatar represents a significant step in making AI assistants more expressive and lifelike, but access is currently limited and it is not yet available on all platforms.