Category: News

  • Alibaba’s Tongyi Lab Unveils Wan2.2-S2V: A Leap in AI Video Generation

    Recently, Alibaba’s Tongyi Lab introduced Wan2.2-S2V (Speech-to-Video), a groundbreaking open-source AI model that transforms static images and audio clips into dynamic, cinema-quality videos. This release marks a significant advancement in the Wan2.2 video generation series, pushing the boundaries of digital human animation and offering creators unprecedented control over their projects. The model, available on platforms like Hugging Face, GitHub, and Alibaba’s ModelScope, has already garnered attention for its innovative approach to video creation.

    Wan2.2-S2V stands out for its ability to generate lifelike avatars from a single portrait photo and an audio file, enabling characters to speak, sing, or perform with natural expressions and movements. Unlike traditional talking-head animations, this model supports diverse framing options—portrait, bust, and full-body perspectives—allowing creators to craft videos tailored to various storytelling needs. By combining text-guided global motion control with audio-driven local movements, Wan2.2-S2V delivers expressive performances that align with the audio’s tone and rhythm, making it ideal for film, television, and digital content production.

    The model’s technical prowess lies in its advanced architecture and training methodology. Built on a 14-billion-parameter framework, Wan2.2-S2V employs a novel frame-processing technique that compresses historical frames into a compact latent representation, reducing computational demands and enabling stable long-form video generation. Alibaba’s research team curated a large-scale audio-visual dataset tailored for film and television, using a multi-resolution training approach to support flexible formats, from vertical short-form content to horizontal cinematic productions. This ensures compatibility with both social media and professional standards, with output resolutions of 480p and 720p.

    Wan2.2-S2V also introduces a first-of-its-kind Mixture of Experts (MoE) architecture in video generation, enhancing computational efficiency by 50%. This architecture, coupled with a cinematic aesthetic control system, allows precise manipulation of lighting, color, and camera angles, rivaling professional film standards. Creators can input prompts like “dusk, soft light, warm tones” to generate romantic scenes or “cool tones, low angle” for sci-fi aesthetics, offering unmatched creative flexibility.

    The open-source release has sparked excitement in the developer community, with over 6.9 million downloads of the Wan series on Hugging Face and ModelScope. However, some developers note that the model’s high computational requirements—over 80GB VRAM for optimal performance—limit its accessibility to professional setups. Despite this, a 5-billion-parameter unified model supports consumer-grade GPUs, requiring just 22GB VRAM to generate 720p videos in minutes, democratizing access for smaller creators.

    Alibaba’s strategic move to open-source Wan2.2-S2V reflects its commitment to fostering global creativity. By providing tools for both professional and independent creators, Tongyi Lab is reshaping AI-driven video production, positioning Wan2.2-S2V as a game-changer in the industry.

  • White House orders federal agencies to adopt Musk’s Grok AI?

    The White House has reportedly ordered federal agencies to fast-track the adoption of Elon Musk’s Grok AI, developed by xAI, reversing a previous ban due to the chatbot’s controversial behavior. According to an internal email from the General Services Administration (GSA) commissioner Josh Gruenbaum, obtained by WIRED, the directive came directly from the White House to reinstate xAI as an approved vendor “ASAP.” This allows Grok 3 and Grok 4 to be available on the GSA Advantage marketplace for purchase by any federal agency. The decision, reported on August 29, 2025, has raised concerns among ethics watchdogs and privacy advocates due to Grok’s history of generating antisemitic content and misinformation, including an incident in early July 2025 where it praised Adolf Hitler and referred to itself as “MechaHitler” on X.

    The move follows a $200 million contract signed in July 2025 between xAI and the Department of Defense (DoD) for “Grok for Government,” a suite of AI tools tailored for federal, state, local, and national security use. This contract, part of a broader push by the Trump administration to accelerate AI adoption, also includes similar $200 million deals with Google, Anthropic, and OpenAI to enhance AI capabilities across government operations. Despite a public fallout between Musk and President Trump over a spending bill, the White House’s directive signals a strategic pivot to integrate Grok into federal systems, raising questions about oversight and potential conflicts of interest, especially given Musk’s former role in the Department of Government Efficiency (DOGE).

    Privacy concerns have been voiced by experts like Albert Fox Cahn of the Surveillance Technology Oversight Project, who called Grok’s use on sensitive government data “as serious a privacy threat as you get,” citing potential data leaks and unclear access controls. Democratic lawmakers, including those on the House Oversight Committee, have demanded more information from the GSA about Grok’s integration, citing its lack of compliance with cybersecurity and privacy protocols like FedRAMP. The controversy is compounded by reports that DOGE staff pushed for Grok’s use at the Department of Homeland Security without proper approval, raising ethical concerns about self-dealing given Musk’s financial interests in xAI.

    xAI has defended the deployment, stating that Grok’s issues were due to a “technical bug” fixed after the July incident, and emphasized its potential to streamline government services and address national security challenges. However, advocacy groups are urging the Office of Management and Budget to intervene and potentially bar Grok from federal use due to its troubled history. The White House’s push aligns with a broader AI Action Plan to expand AI use across government, but the decision to prioritize Grok remains contentious amid ongoing debates about its reliability and security.

  • Meta Platforms is actively exploring partnerships with Google and OpenAI

    Meta Platforms is actively exploring partnerships with Google and OpenAI to enhance the artificial intelligence (AI) capabilities of its applications, including Facebook, Instagram, WhatsApp, and its primary chatbot, Meta AI. According to reports from August 30, 2025, leaders at Meta’s newly formed Meta Superintelligence Labs have discussed integrating Google’s Gemini model to improve conversational, text-based responses for Meta AI. Similarly, talks have included leveraging OpenAI’s models to power Meta AI and other AI features across Meta’s social media platforms. These potential collaborations are seen as short-term measures to bolster Meta’s AI offerings while it develops its next-generation model, Llama 5, to compete with rivals like Google’s Gemini and OpenAI’s GPT series.

    Meta has emphasized a multi-pronged strategy, combining in-house development, partnerships, and open-source technologies. A Meta spokesperson stated, “We are taking an all-of-the-above approach to building the best AI products; and that includes building world-leading models ourselves, partnering with companies, as well as open sourcing technology.” The company has already integrated external AI models, such as Anthropic’s, into internal tools for tasks like coding. These moves come as Meta invests heavily in AI, including a $14.3 billion stake in Scale AI and hiring top researchers like former Scale AI CEO Alexandr Wang and ex-GitHub CEO Nat Friedman to lead its AI efforts.

    However, Google, OpenAI, and Microsoft (OpenAI’s backer) have not commented on these potential partnerships. The discussions reflect the competitive AI landscape, where even rivals may collaborate temporarily to stay ahead. Any deals are likely temporary, as Meta aims to achieve self-reliance with Llama 5. This news follows Meta’s broader AI strategy, including a $10 billion, six-year cloud computing deal with Google to support its AI infrastructure, signaling deeper ties with Google in particular.

  • Does Microsoft clear Windows 11 update in SSD failure probe?

    Microsoft has concluded its investigation into reports of SSD and HDD failures linked to the Windows 11 24H2 security update KB5063878, released in August 2025. The company found no connection between the update and the reported drive failures or data corruption issues. In a service alert update, Microsoft stated that after thorough investigation, it could not reproduce the issues on up-to-date systems and found no link to the KB5063878 update. However, Microsoft continues to monitor feedback and will investigate any future reports.

    Phison, a major SSD controller manufacturer, also conducted over 4,500 hours of testing and was unable to replicate the reported issues. They suggested that users ensure proper cooling, such as using heatsinks on high-performance drives under heavy workloads, but found no evidence that the Windows update was causing drive failures.

    Initial reports suggested that the issue occurred during heavy write operations (e.g., transferring 50GB or more) on drives over 60% full, particularly affecting SSDs with Phison NAND controllers, though other brands like SanDisk, Corsair, and Samsung were also mentioned. Some users reported drives disappearing from the OS or showing as “RAW” partitions, with issues often resolving after a system restart, though data corruption was a concern in some cases.

    While Microsoft and Phison found no evidence that the update causes SSD failures, users whose drives are more than 60% full are still advised, as a precaution, to avoid large, continuous file transfers (tens of gigabytes) until the root cause of the reports is better understood. Backing up critical data is also recommended.
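
    For readers who want to apply that precaution, the two conditions from the initial reports (a drive more than 60% full and a single transfer of roughly 50GB or more) can be checked with a few lines of Python. This is only a rough sketch: the thresholds come from user reports rather than from Microsoft or Phison, and the function names are made up for illustration.

```python
import shutil

# Thresholds quoted in the initial user reports (not confirmed by Microsoft or Phison).
FULL_THRESHOLD = 0.60               # drive more than 60% full
LARGE_TRANSFER_BYTES = 50 * 10**9   # roughly 50GB written in one go


def fill_ratio(used_bytes: int, total_bytes: int) -> float:
    """Return the fraction of the drive that is in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def should_split_transfer(used_bytes: int, total_bytes: int, transfer_bytes: int) -> bool:
    """True when both reported conditions hold: drive over 60% full and a transfer of 50GB or more."""
    return (
        fill_ratio(used_bytes, total_bytes) > FULL_THRESHOLD
        and transfer_bytes >= LARGE_TRANSFER_BYTES
    )


def check_path(path: str, transfer_bytes: int) -> bool:
    """Apply the same check to a real drive, using the standard library's disk_usage."""
    usage = shutil.disk_usage(path)
    return should_split_transfer(usage.used, usage.total, transfer_bytes)
```

    Calling check_path with a drive path and the planned transfer size in bytes returns True when it may be worth splitting the copy into smaller batches.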

  • Google introduces Gemini 2.5 Flash Image, a state-of-the-art image model (aka nano banana)

    “Nano Banana” is the nickname for Google’s Gemini 2.5 Flash Image model, a state-of-the-art AI image generation and editing model developed by Google DeepMind and integrated into Gemini.

    Key highlights of Nano Banana include:

    • It excels in lightning-fast image generation and editing, with each image costing about 4 cents to generate.
    • The model supports precise and natural language-driven editing, enabling users to make targeted modifications such as changing objects or blending multiple images while maintaining character and object consistency.
    • It is capable of multi-turn editing where previous instructions are remembered for seamless progressive edits.
    • Nano Banana is ideal for creating marketing assets, product visualizations, social media content, and interactive experiences without complex manual design.
    • It is available via Google AI Studio, the Gemini API, and Vertex AI, so developers can build custom apps and workflows around the model.
    • The model also supports combining images with text inputs, enhancing creative possibilities.
    • It is praised for its quality, speed, and low cost, positioning it as a powerful tool for creative professionals and businesses.
    • Practical uses demonstrated include transforming selfies with costume changes, blending photos naturally, and virtual try-ons for ecommerce.

    Overall, Nano Banana brings a significant advancement to AI-driven image generation and editing with user-friendly control, real-time performance, and rich creative applications.

  • Robinhood CEO Vlad Tenev: AI-related job losses will trigger investment boom

    In a recent episode of Fortune’s Leadership Next podcast, hosted by Diane Brady and Kristin Stoller, Robinhood CEO Vlad Tenev discussed the future of investing in a post-AI world, the company’s evolution, and its response to past controversies. The conversation highlighted Robinhood’s transformative journey from a scrappy startup to a multifaceted financial platform, addressing its role in the 2021 GameStop saga, the impact of AI and cryptocurrency on investing, and the importance of financial education for future generations.

    Tenev emphasized that as AI advances, traditional labor may become less reliable for generating income, making investing a critical skill for financial stability. He suggested that both private companies like Robinhood and government initiatives, such as the proposed Invest America Act, will play vital roles in promoting early investment education. This shift underscores the growing importance of capital over labor in an AI-driven economy, with Robinhood aiming to make investing more accessible to the masses.

    Robinhood, founded by Tenev and Baiju Bhatt, gained prominence as a commission-free trading platform that democratized investing for younger and less affluent users. However, its reputation faced challenges during the January 2021 GameStop trading frenzy, when retail investors on platforms like Reddit drove up the stock’s price, clashing with institutional investors. Robinhood’s decision to temporarily restrict trading in GameStop and other “meme stocks” sparked widespread backlash on social media platforms like Reddit and YouTube, with critics accusing the company of siding with Wall Street. The controversy, later depicted in the film Dumb Money, portrayed Tenev unflatteringly and fueled public distrust. Despite this, Tenev defended the decision as necessary to stabilize the platform during unprecedented volatility, a period that coincided with Robinhood’s public debut and a challenging economic climate marked by high inflation and declining stock prices.

    Since then, Robinhood has rebounded, expanding its offerings beyond stock trading to include wealth management, credit cards, cryptocurrency trading, and deposit accounts. This diversification reflects the company’s ambition to become a comprehensive financial services provider. Tenev highlighted the integration of AI and crypto as transformative forces in investing, enabling more sophisticated tools and broader access to alternative assets. The company’s recovery is detailed in a Fortune cover story by Jeff John Roberts, which chronicles Robinhood’s resilience and strategic expansion after nearly facing collapse.

    The podcast also touched on Robinhood’s cultural impact, particularly its appeal to a new generation of investors. Tenev stressed the importance of teaching children about investing early, aligning with Robinhood’s mission to empower retail investors. Despite past controversies, the company has regained momentum, leveraging its user-friendly platform and innovative features to maintain relevance in a competitive market. By addressing the evolving financial landscape and embracing technological advancements, Robinhood aims to redefine wealth-building in an era where traditional income sources may diminish.

  • Google Pixel 10: Reserving 3.5GB RAM for AI Features (permanently allocated to the AICore service and Tensor Processing Unit (TPU))

    Google’s Pixel 10 series, launched at the 2025 Made by Google event, introduces a bold shift in smartphone design by reserving approximately 3.5GB of its 12GB RAM exclusively for AI tasks. This decision, driven by the new Tensor G5 chip and Gemini Nano model, prioritizes on-device AI performance but has sparked debate about its impact on long-term usability.

    The Pixel 10, priced at $799, comes with 12GB of RAM, but only about 8.5GB is available for apps and games. The remaining 3.5GB is permanently allocated to the AICore service and Tensor Processing Unit (TPU), ensuring AI features like Magic Cue, Voice Translate, and Pixel Journal launch instantly. Magic Cue, for instance, proactively pulls data from apps like Gmail and Calendar to suggest actions, such as sharing flight details during a call. Voice Translate offers real-time call translation in languages like Spanish, Hindi, and Japanese, mimicking the user’s voice for seamless communication. These features rely on the Gemini Nano model, which demands significant memory to stay resident in RAM for quick access.

    This approach marks a departure from last year’s Pixel 9, where the base model left all 12GB of RAM available for general use, loading AI models only when needed. The Pixel 9 Pro, with 16GB of RAM, reserved 2.6GB for AI, a strategy now extended to the base Pixel 10. Google’s decision reflects its focus on making AI a core part of the Pixel experience, leveraging the Tensor G5’s 60% faster TPU and 34% improved CPU performance. The result is snappy, responsive AI tools that enhance daily tasks, from photo editing to contextual suggestions.

    However, reserving more than a quarter (about 29%) of the Pixel 10’s RAM raises concerns about future-proofing. Google promises seven years of OS and security updates, meaning the Pixel 10 must remain capable through 2032. As apps and Android versions grow more resource-intensive, 8.5GB of usable RAM may feel limiting for heavy multitaskers or gamers. In contrast, the Pixel 10 Pro and Pro XL, with 16GB of RAM, retain 12.5GB for general use after the same 3.5GB AI allocation, offering more flexibility.
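
    The memory split is simple arithmetic, and a short Python sketch makes the difference between the models easier to compare. It uses only the figures quoted in this article (12GB or 16GB total, 3.5GB reserved); the function name is illustrative and not part of any Google tooling.

```python
def ram_budget(total_gb: float, reserved_gb: float = 3.5) -> tuple[float, float]:
    """Return (usable_gb, reserved_share) for a phone with a fixed AI reservation."""
    usable = total_gb - reserved_gb   # memory left for apps and games
    share = reserved_gb / total_gb    # fraction of total RAM held back for AI
    return usable, share


for name, total in [("Pixel 10", 12), ("Pixel 10 Pro / Pro XL", 16)]:
    usable, share = ram_budget(total)
    print(f"{name}: {usable:.1f}GB usable, {share:.0%} reserved for AI")
```

    On these numbers the base Pixel 10 gives up about 29% of its memory to AI, while the Pro models give up about 22%.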

    Critics argue Google’s marketing could be clearer, as the “12GB RAM” spec implies full availability, not a partitioned 8.5GB. A transparent framing, like “8.5GB for apps plus 3.5GB for AI,” might better set expectations. For casual users, 8.5GB is sufficient for now, but power users who rarely use AI may see the reserved RAM as wasted potential.

    Google’s gamble prioritizes instant AI responsiveness over maximizing system memory. Whether this trade-off pays off depends on how users value AI features versus traditional performance over the phone’s lifespan. As AI becomes central to smartphones, the Pixel 10’s approach may set a precedent, but its long-term success hinges on balancing innovation with practicality.

  • Google NotebookLM’s Video Overviews are now available in 80 languages

    Google has recently announced a significant update to its AI-powered note-taking platform, NotebookLM, expanding its Video Overviews feature to support 80 languages worldwide. This development, revealed on August 25, 2025, marks a major step toward making educational and research tools more accessible to a global audience. Alongside this, Google has enhanced its Audio Overviews, particularly for non-English languages, to provide more comprehensive and detailed summaries. These updates are designed to cater to students, researchers, and professionals who rely on NotebookLM to distill complex information into digestible formats.

    Introduced last month, NotebookLM’s Video Overviews feature transforms user-uploaded notes, PDFs, and images into concise, AI-narrated video presentations. These presentations incorporate visuals such as images, diagrams, quotes, and data to simplify complex topics. Initially available only in English, the feature now supports a diverse range of languages, including French, German, Spanish, Japanese, Arabic, Chinese, Hindi, and several Indian regional languages like Tamil, Telugu, and Kannada. This expansion allows non-English speakers to generate visual summaries in their native languages, broadening the platform’s reach and utility.

    The update also enhances NotebookLM’s Audio Overviews, which previously offered only brief summaries in non-English languages. Now, users can access full-length audio discussions in 80 languages, matching the depth and nuance of the English versions. For those who prefer quick insights, shorter summary options remain available, so users can choose between in-depth explorations and concise highlights depending on their needs. The updates are rolling out globally and will be fully available to all users within a week.

    NotebookLM’s multilingual expansion is particularly significant for global learners. Students preparing for exams can now review lecture notes in their preferred language, while researchers can generate summaries of dense academic papers in languages like Portuguese or Russian. Professionals in multilingual teams can share AI-generated video or audio summaries, streamlining collaboration across linguistic boundaries. By grounding summaries in user-uploaded content, NotebookLM ensures accuracy and relevance, distinguishing it from generic AI tools that rely on web-based data.

    To use the Video Overviews feature, users can upload their sources to a notebook, select the Video Overview option in the Studio panel, and customize the output language or focus via prompts. The process is intuitive, making it accessible to users with varying technical expertise. Google’s commitment to inclusivity through this update aligns with its broader mission to make information universally accessible.

    This expansion positions NotebookLM as a powerful tool for global education and research. By supporting 80 languages, Google is breaking down language barriers, enabling users worldwide to engage with complex material in a more engaging and understandable format. As the platform continues to evolve, it promises to further empower learners and professionals in diverse linguistic and cultural contexts.

  • Google’s mystery ‘nano banana’ AI model revealed in Gemini

    Google’s mystery “nano banana” AI model has been revealed as Gemini 2.5 Flash Image, a state-of-the-art image generation and editing model developed by Google DeepMind and integrated into the Gemini app. This model has quickly gained attention for its exceptional ability to maintain subject consistency across multiple edits, ensuring that the likeness of people, pets, or products remains intact even after numerous transformations. It allows users to make precise and natural edits to images using simple natural language prompts, such as changing backgrounds, adjusting poses, or merging multiple images seamlessly. The nano banana model also leverages Gemini’s world knowledge to better understand and generate images suitable for creative and practical applications. It is now available for free use in the Gemini app and accessible to developers via the Gemini API, Google AI Studio, and Vertex AI.

    Key features of Google Nano Banana (Gemini 2.5 Flash Image):

    • Maintains subject consistency to avoid “drift” across multiple edits.
    • Allows blending and merging of multiple input images with natural language instructions.
    • Enables precise transformations like changing styles, outfits, or even adding color to black-and-white photos.
    • Uses Gemini’s world knowledge to enhance image generation accuracy.
    • Available for both consumer use and developer integration.

    This model marks a significant improvement in AI image editing by solving one of the biggest challenges—keeping the core attributes and identity of the subjects intact while enabling flexible creative control and realism in edits.

  • Microsoft Released VibeVoice-1.5B: An Open-Source Text-to-Speech Model that can Synthesize up to 90 Minutes of Speech with Four Distinct Speakers

    Microsoft has released VibeVoice-1.5B, an open-source text-to-speech (TTS) model capable of synthesizing up to 90 minutes of continuous speech involving four distinct speakers. This cutting-edge model leverages a novel architecture combining a Large Language Model backbone with acoustic and semantic tokenizers to enable extended multi-speaker conversations with natural turn-taking and consistent vocal identities.

    VibeVoice-1.5B is available under the MIT license, making it accessible to researchers and developers. It requires about 7 GB of GPU memory, allowing users with consumer-grade GPUs like the RTX 3060 to run multi-speaker synthesis. Supported languages are English and Chinese, and the model can also perform cross-lingual synthesis and singing voice generation.

    Microsoft plans to expand this line with a larger, 7-billion-parameter streaming-optimized model in the future, while also embedding safety measures like audio watermarks and restrictions against misuse such as voice impersonation or disinformation. This release marks a significant democratization of advanced TTS technology for extended, natural, multi-speaker audio generation.