AI Weekly News Roundup - 05/8/2024

August 5, 2024 · 5 min read

COO of Neuronic AI

DeepMind's Mathematical Breakthrough

DeepMind has expanded the capabilities of its AlphaZero AI system to tackle mathematical problems, demonstrating the versatility of reinforcement learning beyond game-playing. Building on AlphaZero's success in mastering chess, shogi, and Go through self-play 1, researchers adapted the approach to search for faster versions of fundamental computer science routines. The new system, called AlphaDev, found sorting routines that run up to 70% faster on short sequences and improved a widely used hashing algorithm by 30% 2. The sorting improvements have been incorporated into the LLVM standard C++ library (libc++), potentially affecting trillions of computations daily 2. The work shows how AI systems originally designed for games can be repurposed to solve real-world computational problems and improve algorithm efficiency.
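To make the kind of routine involved more concrete, here is a minimal Python sketch of a fixed-size sort for three items, built from a fixed sequence of compare-and-swap steps. This is a textbook sorting network, not the code AlphaDev discovered, which works at the level of low-level machine instructions. The names `compare_exchange` and `sort3` are illustrative assumptions.

```python
from itertools import permutations


def compare_exchange(values, i, j):
    """Swap values[i] and values[j] in place if they are out of order."""
    if values[i] > values[j]:
        values[i], values[j] = values[j], values[i]


def sort3(a, b, c):
    """Sort three items using a fixed sequence of three compare-exchange steps.

    Illustrative textbook network; NOT the routine found by AlphaDev.
    """
    values = [a, b, c]
    compare_exchange(values, 0, 1)
    compare_exchange(values, 1, 2)
    compare_exchange(values, 0, 1)
    return tuple(values)


# Exhaustively check every ordering of three distinct values.
all_sorted = all(sort3(*p) == (1, 2, 3) for p in permutations((1, 2, 3)))
```

Because the sequence of steps is fixed, removing even a single instruction from such a routine pays off every time it is called, which is why small routines like this are attractive targets for automated search.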

Multi-Angle Video Generation

Stability AI has introduced Stable Video 4D, an AI model that transforms a single object video into multiple novel-view videos from eight different angles 2. The technology builds on the company's Stable Video Diffusion model, moving from image-based video generation to full 3D dynamic video synthesis 2. Stable Video 4D generates 5-frame videos across 8 views in about 40 seconds, with the entire 4D optimization taking approximately 20 to 25 minutes 2. Because the model generates the novel-view videos simultaneously, its outputs are more consistent across the spatial and temporal axes, and more detailed and faithful to the input than those of prior work 2. Potential applications include game development, video editing, and virtual reality, with ongoing research aimed at refining the model to handle a wider range of real-world videos 2 3.

Multilingual Speech Acceleration

ElevenLabs has released Turbo v2.5, a significant upgrade to its text-to-speech model that offers greater speed and broader language support. The new version is up to three times faster for non-English languages, with latency reduced to 300 milliseconds, making it well suited to real-time conversational AI applications 3. For English text-to-speech conversion, it is 25% faster than Turbo v2 4. Turbo v2.5 supports 32 languages, including Hindi, French, Spanish, and Mandarin, and introduces support for Vietnamese, Hungarian, and Swedish 4. The model is optimized for low-latency applications without compromising vocal performance, and it maintains high accuracy, especially with properly created instant voice clones 4. To prioritize speed, it omits the style slider feature 4. The release reflects ElevenLabs' focus on multilingual AI technology for applications ranging from education to entertainment 3.

Google's Compact AI Models

Google has released Gemma 2 2B, a lightweight and efficient large language model designed for deployment on devices with limited resources. This 2-billion-parameter model can run on just 1GB of GPU memory, making it suitable for use on laptops, desktops, and edge devices 1 2. Gemma 2 2B is available in both pre-trained and instruction-tuned variants, which makes it usable for a range of text generation tasks including question answering, summarization, and reasoning 2 3. The model's relatively small size broadens access to state-of-the-art AI capabilities by allowing deployment in resource-constrained environments 2. Gemma 2 2B can be integrated with popular AI frameworks and tools such as LangChain, LlamaIndex, and Transformers, facilitating its use in a wide range of applications 1 3.
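As a rough sanity check on the memory figure, the back-of-envelope sketch below estimates how much memory the weights alone would need at different numeric precisions. It assumes a nominal 2 billion parameters and ignores activations, caches, and runtime overhead. The helper `weight_memory_gb` is a hypothetical name, and the text does not state which precision the 1GB figure refers to.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: the nominal "2 billion" parameter count from the text.
PARAMS = 2_000_000_000

# Weight-only estimates at three common precisions.
estimates = {bits: weight_memory_gb(PARAMS, bits) for bits in (16, 8, 4)}

for bits, gigabytes in estimates.items():
    print(f"{bits}-bit weights: ~{gigabytes:.1f} GB")
```

Under these assumptions, a footprint near 1GB would correspond to roughly 4-bit weights, while 16-bit weights would need about 4GB.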

Video Segmentation Breakthrough

Meta has introduced SAM 2 (Segment Anything Model 2), an AI model for object segmentation in both images and videos. Building on its predecessor, SAM 2 offers a unified architecture that enables real-time, promptable segmentation across both media types 1. The model achieves state-of-the-art performance in video object segmentation, outperforming existing methods on various benchmarks while requiring three times fewer user interactions 1 2. SAM 2's capabilities include zero-shot generalization, which lets it segment previously unseen objects, and a streaming memory mechanism for efficient video processing 1 3. To train SAM 2, Meta created the SA-V dataset, containing over 600,000 masklet annotations across 51,000 diverse videos 2. The model opens up new possibilities for applications in fields such as video editing, augmented reality, and scientific research 1 3.

Russian AI Challenger

YandexGPT Experimental, a new and more powerful version of Yandex's basic language model, has entered the top ranks of the LLM Arena rating, performing on par with advanced models like GPT-4 Turbo and Claude 3.5 Sonnet 1. This achievement is particularly notable in the Russian language domain, where YandexGPT Experimental excels in answering questions. The LLM Arena, launched by independent developers from the Russian ML community, provides a platform for evaluating large language models through user-driven assessments, offering an objective benchmark for Russian-language AI capabilities 1. This development signals Yandex's progress in the competitive field of AI language models and highlights the growing sophistication of Russian-language AI technologies.

JPMorgan's AI Research Assistant

JPMorgan Chase has introduced LLM Suite, an in-house generative AI chatbot designed to function as a research analyst for employees in its asset and wealth management division 4 5. This AI tool, which is JPMorgan's version of OpenAI's ChatGPT, can assist with various tasks including writing, idea generation, problem-solving using spreadsheets, and summarizing documents 4. The bank has made LLM Suite available to many employees, with approximately 50,000 staff members (about 15% of its workforce) currently having access to the software 5. By developing its own AI chatbot, JPMorgan aims to enhance productivity while ensuring compliance with strict regulations and maintaining data security, addressing concerns that previously led the bank to restrict employee use of public AI tools like ChatGPT 5.
