Category: News

  • Google’s mystery ‘nano banana’ AI model revealed in Gemini

    Google’s mystery “nano banana” AI model has been revealed as Gemini 2.5 Flash Image, a state-of-the-art image generation and editing model developed by Google DeepMind and integrated into the Gemini app. This model has quickly gained attention for its exceptional ability to maintain subject consistency across multiple edits, ensuring that the likeness of people, pets, or products remains intact even after numerous transformations. It allows users to make precise and natural edits to images using simple natural language prompts, such as changing backgrounds, adjusting poses, or merging multiple images seamlessly. The nano banana model also leverages Gemini’s world knowledge to better understand and generate images suitable for creative and practical applications. It is now available for free use in the Gemini app and accessible to developers via the Gemini API, Google AI Studio, and Vertex AI.

    Here are the key features of Google Nano Banana (Gemini 2.5 Flash Image):

    • Maintains subject consistency to avoid “drift” across multiple edits.
    • Allows blending and merging of multiple input images with natural language instructions.
    • Enables precise transformations like changing styles, outfits, or even adding color to black-and-white photos.
    • Uses Gemini’s world knowledge to enhance image generation accuracy.
    • Available for both consumer use and developer integration.

    This model marks a significant improvement in AI image editing by solving one of the biggest challenges—keeping the core attributes and identity of the subjects intact while enabling flexible creative control and realism in edits.
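
    For developers, the model is callable through the Gemini API. Below is a minimal sketch, assuming the google-genai Python SDK and the public preview model ID gemini-2.5-flash-image-preview; the prompt and file names are illustrative:

    ```python
    # pip install google-genai pillow
    from io import BytesIO

    from google import genai
    from PIL import Image

    # Assumes an API key in the GEMINI_API_KEY environment variable.
    client = genai.Client()

    prompt = "Change the background to a sunny beach; keep the person exactly the same."
    source = Image.open("portrait.png")  # hypothetical input image

    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",  # assumed preview model ID
        contents=[prompt, source],
    )

    # Responses can interleave text and image parts; save any returned image.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save("edited.png")
    ```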

  • Microsoft Released VibeVoice-1.5B: An Open-Source Text-to-Speech Model that can Synthesize up to 90 Minutes of Speech with Four Distinct Speakers

    Microsoft has released VibeVoice-1.5B, an open-source text-to-speech (TTS) model capable of synthesizing up to 90 minutes of continuous speech involving four distinct speakers. This cutting-edge model leverages a novel architecture combining a Large Language Model backbone with acoustic and semantic tokenizers to enable extended multi-speaker conversations with natural turn-taking and consistent vocal identities.

    VibeVoice-1.5B is available under the MIT license, making it accessible to researchers and developers. It requires about 7 GB of GPU memory, allowing users with consumer-grade GPUs like the RTX 3060 to run multi-speaker synthesis. Supported languages are English and Chinese, and the model can also perform cross-lingual synthesis and singing voice generation.
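
    Input scripts for multi-speaker synthesis are plain text with one speaker tag per line. A short sketch of the format, assuming the tagging convention used in the demo files of Microsoft’s public VibeVoice repository (names and dialogue are illustrative):

    ```text
    Speaker 1: Welcome back to the show. Today we're looking at open-source text-to-speech.
    Speaker 2: Thanks for having me. Ninety minutes of audio from a single script is a big claim.
    Speaker 1: It is, and the model is meant to keep each voice consistent for the whole session.
    ```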

    Microsoft plans to expand this line with a larger, 7-billion-parameter streaming-optimized model in the future, while also embedding safety measures like audio watermarks and restrictions against misuse such as voice impersonation or disinformation. This release marks a significant democratization of advanced TTS technology for extended, natural, multi-speaker audio generation.

  • Anthropic has released a new AI agent called “Claude for Chrome” that works in a side panel within the browser

    Anthropic has released a new AI agent called “Claude for Chrome,” which integrates directly into the Google Chrome browser as an extension. This agent is powered by Anthropic’s Claude AI models and is currently in a research preview phase. It is available to a limited group of 1,000 subscribers on Anthropic’s Max plan, with a waitlist for others interested.

    Claude for Chrome works in a side panel within the browser, allowing it to maintain context on what users are viewing and to interact with web pages by clicking buttons and filling out forms with the user’s permission. This integration aims to make the assistant more useful for tasks like managing calendars, scheduling meetings, and drafting emails, all directly within the browser environment.

    Anthropic emphasizes safety and security due to the potential risks posed by AI agents operating in browsers, such as prompt injection attacks where malicious instructions could be hidden on websites. The company has implemented several safeguards to reduce such risks, including site-level permissions for users to control Claude’s access and action confirmations for sensitive tasks. Despite improvements, Anthropic continues testing and refining its defenses before wider release.

    Overall, Claude for Chrome represents Anthropic’s effort to bring AI assistance directly into the user’s browsing experience while prioritizing safety and control.

  • Apple Event Confirmed for September 9 — iPhone 17, Apple Watch 11, AirPods Pro 3 and more

    Apple has officially confirmed its next big event for Tuesday, September 9, 2025. This eagerly awaited event will unveil the new iPhone 17 series, Apple Watch Series 11, AirPods Pro 3, and several other product updates. The event sets the stage for Apple’s latest innovations in hardware and software, including the highly anticipated iPhone 17 Air, which features a notably thin design, and the Apple Watch Ultra 3 with advanced health monitoring features. Additionally, Apple’s iOS 26 with a new “Liquid Glass” design will be showcased. While excitement is high, some analysts expect possible stock volatility following the event due to tempered expectations for revolutionary upgrades in the iPhone lineup. The event will be held at the Steve Jobs Theater in Cupertino and livestreamed globally.

  • Apple considers Google Gemini to power next-gen Siri

    Apple is in early discussions with Google to potentially use Google’s Gemini AI as the core technology to power a redesigned, next-generation Siri voice assistant. The company approached Google with the idea of creating a customized AI model that would run on Apple’s servers and serve as the foundation for the revamped Siri expected to launch in 2026.

    Currently, Apple is exploring multiple options for Siri’s AI “brain.” It is developing two versions simultaneously: one using its own internal AI models (codenamed Linwood) and another based on external technology (codenamed Glenwood), which could be Google’s Gemini or others like Anthropic’s Claude and OpenAI’s ChatGPT. Apple has not yet finalized any agreements or decided whether to fully adopt an external partner or rely on its own AI models. The talks with Google are still preliminary, and Google is training a customized Gemini model for potential use on Apple infrastructure.

    This move aims to catch up with competitors like Google and Samsung, who have integrated generative AI capabilities into their assistants. Apple’s revamped Siri has faced delays and challenges, but the new architecture promises more advanced and personalized AI features.

    In summary, Apple is considering licensing and integrating Google’s Gemini AI to power next-gen Siri but is still weighing its options among several AI providers and has not yet made a final decision.

  • Meta partners with Midjourney to license AI image and video technology

    Meta has entered into a partnership with Midjourney, an AI startup known for its advanced image and video generation technology, to license Midjourney’s “aesthetic technology” for integration into Meta’s future AI models and products. This collaboration involves a technical partnership between the research teams of both companies and aims to help Meta develop AI-powered creative tools that can compete with industry rivals like OpenAI and Google.

    Meta’s Chief AI Officer, Alexandr Wang, described the partnership as part of a comprehensive strategy to deliver the best AI products by combining top talent, ambitious computing resources, and collaborations with leading industry players. Midjourney’s technology, which includes highly advanced models for generating images from text prompts and recently released video models, will enhance Meta’s offerings in AI-generated imagery and video.

    Midjourney remains an independent, community-supported lab with no external investors. The licensing deal signifies a major step in Meta’s AI ambitions, complementing its existing in-house tools such as the AI image generator “Imagine” and AI video editor “Movie Gen.” Meta’s CEO Mark Zuckerberg has heavily invested in AI, acquiring talent and companies to boost its capabilities.

    The partnership could lead to new AI creative tools integrated across Meta’s platforms, potentially improving functionalities in apps like Facebook, Instagram, and WhatsApp by leveraging Midjourney’s unique aesthetic AI technology. The terms and timeline of the partnership’s full rollout have not been disclosed yet, but the collaboration marks a significant move for Meta in the competitive AI space.

  • Elon Musk teases Grok 5, says it could be the first real step toward true AGI (Artificial General Intelligence)

    Elon Musk has teased that his company’s upcoming AI model, Grok 5, could be “a real shot at being a true AGI” (Artificial General Intelligence) and is scheduled to launch before the end of 2025. Musk describes Grok 5 as potentially “crushingly good,” hinting it may surpass previous models and even outperform OpenAI’s GPT-5 according to recent comparisons.

    AGI refers to a type of AI that matches or surpasses human cognitive abilities across virtually all tasks, a milestone that AI companies such as OpenAI and Google are still striving to achieve. Musk’s bold claims about Grok 5 signify a strong belief that it could represent the first genuine step toward AGI, which would be a pivotal moment in AI development.

    Grok 4, the predecessor, has already received praise for faster response times, advanced multimodal support, and strong performance in mathematics and physics. Musk suggests Grok 5 will take this further, enhancing xAI’s position in the competitive AI landscape. So far, detailed capabilities of Grok 5 remain undisclosed, but anticipation is high that it will significantly raise the bar in AI intelligence and functionality.

    In summary, Musk claims Grok 5, due before the end of 2025, could be a real shot at true AGI, marking a possible breakthrough in AI technology and in the competition with other leading AI models.

  • The AI Chatbots “Big Bang”: The Full Study at a Glance (a study by OneLittleWeb)

    The 2025 study by OneLittleWeb, titled “The AI ‘Big Bang’ Study 2025,” provides a comprehensive analysis of the top 10 AI chatbots based on web traffic, media citations, and user engagement from August 2024 to July 2025. The study, utilizing data from sources like Semrush, aitools.xyz, MuckRack, and app stores, ranks chatbots across eight key performance indicators, offering insights into their market presence, growth, and user experience. The AI tools market, encompassing over 10,500 tools, recorded nearly 100 billion web visits, with the top 10 chatbots capturing 55.88 billion visits, or 58.8% of the total, highlighting significant market consolidation.

    ChatGPT, developed by OpenAI, dominates with 46.59 billion visits and a 48.36% market share, maintaining its position as the most popular chatbot due to its robust performance in language tasks, accessibility, and free availability. Grok, created by xAI, emerges as a surprising second-place contender with 686.91 million visits and a 1.17% market share, driven by its integration into the X platform and rapid user base growth. Other notable chatbots include DeepSeek, Gemini, Perplexity, Claude, Microsoft Copilot, Blackbox AI, Monica, and Meta AI, with DeepSeek and Grok showing the fastest growth rates, the former with a 113,007% year-over-year increase.

    The study reveals a 123.35% year-over-year traffic growth for these top chatbots, adding 30.9 billion visits compared to the previous year, underscoring the rising popularity of conversational AI. Media coverage significantly influences traffic, with peaks in January and February 2025 (817.6K and 1.1M citations) correlating with traffic surges to 4.3 billion and 4.4 billion visits, respectively, and a high of 5.8 billion in March. However, some chatbots like DeepSeek experienced a 39.5% traffic drop over five months, reflecting volatility tied to media attention.
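
    The headline figures hang together arithmetically; here is a quick check using only the numbers quoted above:

    ```python
    top10_visits = 55.88e9   # top 10 chatbots, Aug 2024 - Jul 2025
    yoy_growth = 1.2335      # reported 123.35% year-over-year growth

    prev_year = top10_visits / (1 + yoy_growth)
    print(f"visits added YoY: {(top10_visits - prev_year) / 1e9:.1f}B")
    # -> 30.9B, matching the "30.9 billion visits" reported above

    chatgpt_visits = 46.59e9  # ChatGPT, same period
    chatgpt_share = 0.4836    # reported 48.36% market share
    print(f"implied market total: {chatgpt_visits / chatgpt_share / 1e9:.1f}B")
    # -> ~96.3B, consistent with "nearly 100 billion web visits"
    ```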

    The methodology emphasizes transparency, using weighted scores across visibility, growth, and user experience metrics to ensure a balanced ranking. This approach helps identify trusted, high-performing chatbots and offers actionable insights for users and businesses. For instance, chatbots like Poe attract users by providing access to multiple AI models, while Meta AI benefits from integration into platforms like Instagram and WhatsApp, which likely means web-traffic figures underrepresent its true usage.

    For businesses, the study suggests a dual strategy: optimizing for traditional search engines, which still dominate with 1.86 trillion visits, while adapting content for AI-driven platforms through structured data and high-quality, concise answers. The findings dispel the notion that chatbots are replacing search engines, showing they complement them by serving distinct user needs, such as creative tasks versus navigational searches. As AI chatbots continue to evolve, their role in reshaping online discovery is undeniable, but search engines remain dominant for now.

  • Google launches a new Pixel Journal app

    Google has launched a new journaling app called Pixel Journal, introduced at the Pixel 10 launch event in August 2025. Pixel Journal uses on-device AI, specifically the Gemini Nano models running on the Pixel 10’s Tensor G5 chip, to provide writing prompts that help users fill out journal entries. These prompts are personalized based on memories, past entries, and user goals to assist in processing thoughts and maintaining journaling habits.

    The app allows users to add photos, locations, activities, and mood entries to their journal. It also provides insights about writing patterns, such as when users commonly write, the longest entry by word count in a given period, and the total number of entries per week or month. For privacy, Pixel Journal supports locking the app with a PIN to keep entries secure, and all data processing is done offline on the device.

    Pixel Journal is currently exclusive to the Pixel 10 series, with support for older Pixel devices potentially coming later. For now, the app is offered in English only.

    This launch positions Google’s Pixel Journal as a private, AI-enhanced journaling tool, comparable to Apple’s Journal app introduced in 2023 but distinguished by its focus on on-device AI and data privacy.

  • Meta Rolls Out AI-Powered Audio Translations for Video on Facebook and Instagram

    Meta has introduced an AI-powered video translation feature on Facebook and Instagram that automatically dubs videos, especially Reels, into another language while preserving the creator’s original voice tone and style. This translation tool also offers lip-syncing that matches the translated audio with the creators’ mouth movements, making it seem as if they are naturally speaking the translated language.

    Key features of the Meta video translation tool include:

    • Automatic audio translations between English and Spanish at launch, with plans to add more languages.
    • A toggle option called “Translate your voice with Meta AI” available before posting a Reel, allowing creators to enable translation and optionally add lip-syncing.
    • Creators can preview and approve translations before publishing, with the ability to turn off the feature without affecting the original video.
    • Translated Reels are shown to viewers in their preferred language, with a label indicating the use of Meta AI for translation.
    • Creator tips for best results include speaking clearly, minimizing background noise, facing the camera, and avoiding overlapping speech between two speakers.
    • The feature is available to Facebook creators with more than 1,000 followers and to public Instagram accounts in regions where Meta AI is available.
    • Meta recently added the ability for Facebook creators to upload up to 20 manually dubbed audio tracks to a single Reel to connect with audiences beyond AI-supported languages.
    • The tool also provides new analytics in the Insights panel, showing views by language to help creators gauge their expanded reach.

    This feature aims to break down language barriers, helping creators reach a broader global audience by making their video content accessible in multiple languages with natural-sounding translations and synced lip movements.