
6 posts tagged with "LLM News"


· 5 min read
Alexander Carrington
Grok

Grok 2 Release Imminent

Elon Musk's xAI is set to release Grok 2, an upgraded version of its AI chatbot, in August 2024. This new iteration promises significant improvements, particularly in purging the outputs of other large language models (LLMs) from its internet training data, addressing concerns about data quality and relevance. Grok 2 is expected to surpass current AI models on various metrics and will feature real-time web search integration and image generation capabilities. Following Grok 2, Musk has announced plans for Grok 3, slated for release by the end of 2024. Grok 3 will undergo training on an impressive 100,000 Nvidia H100 GPUs, potentially setting new benchmarks in the field of artificial intelligence 1 2 3.

Gartner Cloud

Cloud-Driven AI Acceleration

Gartner predicts that the ongoing price war in China's Large Language Model (LLM) market will accelerate the shift of AI development towards cloud platforms. This prediction comes amid fierce competition among Chinese tech giants, including Alibaba, Tencent, and Baidu, who have significantly slashed prices for their LLM offerings. The price cuts, reaching up to 97% in some cases, are expected to drive wider adoption of AI technologies across various industries. As LLMs become more affordable and accessible, they are likely to be viewed as essential infrastructure, similar to utilities like water and electricity. This trend is anticipated to not only boost the development of China's LLM sector but also accelerate the launch of more advanced AI language models, potentially narrowing the gap with US counterparts 1 2 3.

Qwen2-Math

Alibaba's Math-Focused AI

Alibaba Cloud has unveiled Qwen2-Math, a series of mathematics-focused large language models (LLMs) that reportedly outperform leading AI models in mathematical tasks. The most advanced model, Qwen2-Math-72B-Instruct, has demonstrated superior performance on various math benchmarks compared to models like GPT-4, Claude 3.5 Sonnet, and Google's Gemini 1.5 Pro. These models, based on the general Qwen2 language models, underwent additional pre-training on a specialized math corpus, enabling them to excel in solving complex mathematical problems. While currently supporting primarily English, Alibaba plans to release bilingual and multilingual versions in the future, potentially expanding the models' applicability across different languages and regions 1 2 3.

OpenAI Watermark

AI Text Watermarking Tool

OpenAI has developed a highly effective text watermarking tool capable of detecting AI-generated content with 99.9% accuracy, but has not yet released it publicly due to ongoing internal debates. The tool works by adding an imperceptible pattern to ChatGPT's output, allowing OpenAI to identify content created by its AI model. However, concerns have been raised about potential biases against non-native English writers and the ease with which the watermarking could be circumvented through methods like translation or rephrasing. Additionally, OpenAI is weighing customer feedback, with 69% of ChatGPT users expressing concerns about false accusations of AI cheating and 30% indicating they might switch to rival LLMs if such a tool were deployed. Despite these challenges, OpenAI recognizes the societal risks posed by AI-generated content and is exploring alternative approaches to address transparency and responsible AI use 1 2 3.
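OpenAI has not disclosed how its watermark works. A well-known public scheme from academic work on LLM watermarking (the "green list" statistical watermark) conveys the general idea: the generator is nudged toward a pseudo-random subset of tokens, and a detector checks whether that subset is over-represented. The sketch below is a minimal, hypothetical illustration; the hashing scheme, green fraction, and function names are assumptions, not OpenAI's method:

```python
import hashlib
import math

def is_green(prev_token: str, token: str, green_fraction: float = 0.5) -> bool:
    """Deterministically assign `token` to a pseudo-random 'green list'
    seeded by the previous token. A watermarking generator would bias
    sampling toward green tokens; unwatermarked text shows no such bias."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] / 255 < green_fraction

def detection_z_score(tokens: list[str], green_fraction: float = 0.5) -> float:
    """z-score of the observed green-token count against the null
    hypothesis that the text was written without the watermark."""
    n = len(tokens) - 1  # number of (previous, current) token pairs
    greens = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    expected = green_fraction * n
    std = math.sqrt(n * green_fraction * (1 - green_fraction))
    return (greens - expected) / std
```

A z-score far above zero flags likely watermarked text. Translation or heavy paraphrasing replaces the token pairs and destroys the signal, which is exactly the circumvention concern described above.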

XetHub

HuggingFace Acquires XetHub

Hugging Face, a leading platform for open-source machine learning projects, has acquired XetHub, a Seattle-based data storage and collaboration startup founded by former Apple engineers. This acquisition, the largest in Hugging Face's history, aims to enhance the company's ability to host and manage large AI models and datasets. XetHub's technology enables Git-like version control for repositories up to terabytes in size, offering advanced features like content-defined chunking and deduplication. By integrating XetHub's capabilities, Hugging Face plans to upgrade its storage backend, potentially allowing for individual files larger than 1TB and total repository sizes exceeding 100TB. This move is expected to significantly boost Hugging Face's infrastructure, supporting the growing demand for larger AI models and datasets in the rapidly evolving field of artificial intelligence 1 2 3 4.

gpt-4o

GPT-4o Price Reduction

OpenAI has significantly reduced the pricing for its advanced GPT-4o model, making it more accessible to developers and businesses. The new gpt-4o-2024-08-06 model is now available at $2.50 per 1 million input tokens and $10.00 per 1 million output tokens, representing a 50% reduction for input tokens and a 33% reduction for output tokens compared to the previous version. This price cut positions GPT-4o competitively against Google's Gemini 1.5 Pro model, which costs $3.50 per 1 million input tokens and $10.50 per 1 million output tokens. Alongside the price reduction, OpenAI has introduced Structured Outputs in the API, ensuring model-generated outputs conform to specific JSON schemas provided by developers, enhancing the model's utility for various applications 1.
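To get a concrete sense of what these per-million-token rates mean per request, here is a small sketch that converts the prices quoted in this post into a dollar cost. The figures are copied from the paragraph above and may change; check each provider's pricing page before relying on them:

```python
# USD per 1 million tokens, as quoted in this post (subject to change).
PRICING = {
    "gpt-4o-2024-08-06": {"input": 2.50, "output": 10.00},
    "gemini-1.5-pro": {"input": 3.50, "output": 10.50},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the per-million-token rates above."""
    rates = PRICING[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

# A 10,000-token prompt with a 1,000-token completion:
for model in PRICING:
    print(model, round(request_cost(model, 10_000, 1_000), 4))
```

At these rates a million such requests would cost about $35,000 on gpt-4o-2024-08-06 versus $45,500 on Gemini 1.5 Pro, which is why per-token price cuts matter at scale.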

Gemini

Gemini Price War Intensifies

Google has significantly reduced the pricing for its Gemini 1.5 Flash model, cutting costs by approximately 80% effective August 12, 2024. The new pricing structure offers input tokens at $0.075 per million and output tokens at $0.30 per million for prompts up to 128,000 tokens, making it nearly 50% cheaper than OpenAI's competing GPT-4o mini model. This aggressive price reduction is part of an ongoing AI pricing war, with Google also expanding language support to over 100 languages and introducing enhanced PDF understanding capabilities. The move is expected to intensify competition in the AI market, potentially challenging smaller AI startups and forcing other major players to reconsider their pricing strategies. Despite the lower cost, Gemini 1.5 Flash still lags behind GPT-4o mini in most AI benchmarks, except for MathVista 1 2 3.

· 5 min read
Alexander Carrington
Deepmind

DeepMind's Mathematical Breakthrough

DeepMind has expanded the capabilities of its AlphaZero AI system to tackle mathematical and algorithmic problems, demonstrating the versatility of reinforcement learning approaches beyond game-playing. Building on AlphaZero's success in mastering chess, shogi, and Go through self-play 1, researchers adapted the algorithm to discover faster algorithms for fundamental computer science tasks. The new system, called AlphaDev, found ways to speed up sorting algorithms by up to 70% and improved a widely used hashing algorithm by 30% 2. These optimizations are already being implemented in widely-used programming languages like C++, potentially impacting trillions of computations daily 2. This advancement showcases how AI systems originally designed for games can be repurposed to solve real-world computational challenges, pushing the boundaries of algorithm efficiency and computer science innovation.

Stable Video

Multi-Angle Video Generation

Stability AI has introduced Stable Video 4D, a groundbreaking AI model that transforms a single object video into multiple novel-view videos from eight different angles 2. This innovative technology builds upon the company's Stable Video Diffusion model, moving from image-based video generation to full 3D dynamic video synthesis 2. Stable Video 4D generates 5-frame videos across 8 views in about 40 seconds, with the entire 4D optimization taking approximately 20 to 25 minutes 2. The model's ability to generate multiple novel-view videos simultaneously improves consistency in spatial and temporal axes, resulting in more detailed and faithful outputs compared to existing works 2. Potential applications include game development, video editing, and virtual reality, with ongoing research aimed at refining the model to handle a wider range of real-world videos 2 3.

ElevenLabs

Multilingual Speech Acceleration

ElevenLabs has released Turbo v2.5, a significant upgrade to their text-to-speech model that offers enhanced speed and language support. This new version provides a threefold increase in speed compared to its predecessor, with latency reduced to 300 milliseconds, making it ideal for real-time conversational AI applications 3. Turbo v2.5 now supports 32 languages, including Hindi, French, Spanish, and Mandarin, and introduces support for Vietnamese, Hungarian, and Swedish 4. The model is 25% faster than Turbo v2 for English text-to-speech conversion and is highly optimized for low-latency applications without compromising vocal performance 4. While it maintains high accuracy, especially with properly created instant voice clones, it omits the style slider feature in order to prioritize speed 4. This release demonstrates ElevenLabs' commitment to advancing multilingual AI technology and improving user experience across various applications, from education to entertainment 3.

Gemma 2 2B

Google's Compact AI Models

Google has released Gemma 2 2B, a lightweight and efficient large language model designed for deployment on devices with limited resources. This 2 billion parameter model can run on just 1GB of GPU memory, making it suitable for use on laptops, desktops, and edge devices 1 2. Gemma 2 2B is available in both pre-trained and instruction-tuned variants, offering versatility for various text generation tasks including question answering, summarization, and reasoning 2 3. The model's relatively small size democratizes access to state-of-the-art AI capabilities, allowing for deployment in resource-constrained environments 2. Gemma 2 2B can be easily integrated with popular AI frameworks and tools such as LangChain, LlamaIndex, and Transformers, facilitating its use in a wide range of applications 1 3.

Meta SAM 2

Video Segmentation Breakthrough

Meta has introduced SAM 2 (Segment Anything Model 2), a groundbreaking AI model for object segmentation in both images and videos. Building upon the success of its predecessor, SAM 2 offers a unified architecture that enables real-time, promptable segmentation across different media types 1. The model achieves state-of-the-art performance in video object segmentation, outperforming existing methods on various benchmarks while requiring three times less interaction time 1 2. SAM 2's capabilities include zero-shot generalization, allowing it to segment previously unseen objects, and a streaming memory mechanism for efficient video processing 1 3. To train SAM 2, Meta created the SA-V dataset, containing over 600,000 masklet annotations across 51,000 diverse videos 2. This advancement in AI technology opens up new possibilities for applications in fields such as video editing, augmented reality, and scientific research 1 3.

Russian AI Challenger

YandexGPT Experimental, a new and more powerful version of Yandex's basic language model, has entered the top ranks of the LLM Arena rating, performing on par with advanced models like GPT-4 Turbo and Claude 3.5 Sonnet. 1 This achievement is particularly notable in the Russian language domain, where YandexGPT Experimental excels in answering questions. The LLM Arena, launched by independent developers from the Russian ML community, provides a platform for evaluating large language models through user-driven assessments, offering an objective benchmark for Russian-language AI capabilities. 1 This development signals Yandex's progress in the competitive field of AI language models and highlights the growing sophistication of Russian-language AI technologies.

JP Morgan

JPMorgan's AI Research Assistant

JPMorgan Chase has introduced LLM Suite, an in-house generative AI chatbot designed to function as a research analyst for employees in its asset and wealth management division 4 5. This AI tool, which is JPMorgan's version of OpenAI's ChatGPT, can assist with various tasks including writing, idea generation, problem-solving using spreadsheets, and summarizing documents 4. The bank has made LLM Suite available to many employees, with approximately 50,000 staff members (about 15% of its workforce) currently having access to the software 5. By developing its own AI chatbot, JPMorgan aims to enhance productivity while ensuring compliance with strict regulations and maintaining data security, addressing concerns that previously led the bank to restrict employee use of public AI tools like ChatGPT 5.

· 6 min read
Alexander Carrington
SearchGPT

AI-Powered Search Revolution

OpenAI has unveiled SearchGPT, a prototype AI-powered search engine that aims to challenge Google's dominance in online search. SearchGPT combines OpenAI's AI models, including ChatGPT, with real-time web information to provide fast, conversational answers along with clear links to relevant sources 1 3. The system is designed to enhance the search experience by highlighting high-quality content in a user-friendly interface 2. SearchGPT is currently available to a limited group of 10,000 test users and select publishers; OpenAI plans to integrate its best features directly into ChatGPT in the future 1 2. This move positions OpenAI as a direct competitor to major search platforms like Google and Microsoft's Bing, potentially disrupting the online search market 3.

Yandex Research

Yandex LLM Compression Techniques

Yandex Research, in collaboration with IST Austria, NeuralMagic, and KAUST, has developed and open-sourced two innovative large language model (LLM) compression methods: Additive Quantization for Language Models (AQLM) and PV-Tuning. 1 2 These techniques aim to address the growing computational demands of LLMs by enabling efficient compression without significant loss in performance. AQLM allows for compressing LLMs to as low as 2 bits per parameter, potentially reducing model size by up to 16 times compared to standard 32-bit formats. 2 This advancement is particularly significant as it could make running large AI models more accessible and cost-effective for developers and researchers, potentially accelerating progress in the field of artificial intelligence.
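The "up to 16 times" figure follows directly from the bit widths: 2 bits per parameter versus a 32-bit float per parameter. A quick back-of-envelope sketch (the 70B parameter count is illustrative, not from the article):

```python
def weight_storage_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate storage for model weights, in gigabytes (1e9 bytes)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

fp32 = weight_storage_gb(70, 32)  # 280.0 GB for a hypothetical 70B model
aqlm = weight_storage_gb(70, 2)   # 17.5 GB at 2 bits per parameter
print(fp32 / aqlm)  # 16.0 -- the "up to 16 times" reduction quoted above
```

In practice compressed models keep some layers at higher precision and add small codebook overheads, so real savings land somewhat below this ceiling.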

Nemo Retriever

Enterprise Data Intelligence

NVIDIA has introduced NeMo Retriever, a new generative AI microservice designed to enhance the accuracy and capabilities of enterprise AI applications. Part of the NVIDIA NeMo platform, NeMo Retriever enables organizations to connect custom large language models (LLMs) to their proprietary data sources, facilitating highly accurate responses through retrieval-augmented generation (RAG) 2 4. This microservice offers GPU-accelerated tools for tasks such as document ingestion, semantic search, and interaction with existing databases, built on NVIDIA's software suite including CUDA, TensorRT, and Triton Inference Server 4. Major companies like Adobe, Cloudera, and NetApp are collaborating with NVIDIA to leverage NeMo Retriever, aiming to transform vast amounts of enterprise data into valuable business insights 4. The microservice is available through the NVIDIA API catalog and is part of the NVIDIA AI Enterprise software platform, supporting deployment across various cloud and data center environments 5.

Mistral Large 2

Powerful Multilingual AI Model

Mistral AI has unveiled Mistral Large 2 (ML2), a 123-billion-parameter large language model that claims to rival top models from OpenAI, Anthropic, and Meta in performance while using significantly fewer resources 5. ML2 boasts a 128,000 token context window, support for dozens of natural languages, and more than 80 programming languages 3. According to Mistral's benchmarks, ML2 achieves an 84% score on the Massive Multitask Language Understanding (MMLU) test, approaching the performance of GPT-4 and Claude 3.5 Sonnet 5. Notably, ML2's smaller size compared to competitors allows for higher throughput and easier deployment on single servers with multiple GPUs, making it an attractive option for commercial applications 5. The model is available through Mistral's platform "la Plateforme" and Microsoft Azure, with a research license for non-commercial use 3 2.

Llama 3.1

Open-Source AI Powerhouse

Meta has unveiled Llama 3.1, a series of open-source AI models that represent a significant advancement in the field of artificial intelligence. The flagship model, Llama 3.1 405B, boasts 405 billion parameters and is touted as the world's largest and most capable openly available foundation model 1 3. This model, along with updated 8B and 70B versions, offers improved capabilities in general knowledge, reasoning, tool use, and multilingual translation 5. Trained on over 15 trillion tokens using 16,000 Nvidia H100 GPUs, Llama 3.1 aims to compete with leading closed-source models like GPT-4 and Claude 3.5 Sonnet 3. Meta's open-source approach allows developers to customize and enhance these models for various applications, potentially accelerating innovation in AI development 4. The release of Llama 3.1 marks a shift towards open-source AI becoming an industry standard, with Meta partnering with cloud providers to make the models widely accessible for enterprise use 4 5.

Mistral Nemo

AI Model Collaboration

Mistral AI and NVIDIA have unveiled Mistral NeMo 12B, a state-of-the-art language model designed for enterprise applications such as chatbots, multilingual tasks, coding, and summarization 1 4. This 12-billion-parameter model boasts a 128K context length, allowing it to process extensive and complex information more coherently and accurately 2 4. Trained on NVIDIA's DGX Cloud AI platform using 3,072 H100 80GB Tensor Core GPUs, Mistral NeMo 12B leverages Mistral AI's expertise in training data and NVIDIA's optimized hardware and software ecosystem 1 4. The model is released under the Apache 2.0 license, supports FP8 data format for efficient inference, and is packaged as an NVIDIA NIM inference microservice for easy deployment 2 4. Designed to fit on memory-efficient accelerators like NVIDIA L40S, GeForce RTX 4090, or RTX 4500 GPUs, Mistral NeMo offers high efficiency, low compute costs, and enhanced security features 3 4.

GPT-4o Mini

Cost-Efficient Multimodal Model

OpenAI has introduced GPT-4o mini, a cost-efficient small model designed to make AI more accessible and affordable. This multimodal model supports text and vision inputs, with a 128K token context window and knowledge up to October 2023 2. GPT-4o mini outperforms GPT-3.5 Turbo on various benchmarks while being more than 60% cheaper, priced at 15 cents per million input tokens and 60 cents per million output tokens 2. The model is now available in the Assistants API, Chat Completions API, and Batch API, with fine-tuning capabilities planned for release 2 4. While OpenAI has not disclosed the exact size of GPT-4o mini, it is reportedly in the same tier as other small AI models like Llama 3 8B, Claude Haiku, and Gemini 1.5 Flash 3.

· 5 min read
Alexander Carrington
Codestral

Codestral Mamba Unveiled

Mistral AI has unveiled Codestral Mamba 7B, a groundbreaking language model specialized in code generation. Based on the innovative Mamba2 architecture, this model offers linear time inference and the ability to handle sequences of theoretically infinite length, making it particularly efficient for coding applications 1. Codestral Mamba 7B achieves an impressive 75% score on HumanEval for Python coding, outperforming other open-source models in comparative evaluations 1 3. Released under the Apache 2.0 license, the model is freely available for use, modification, and distribution, with deployment options including the mistral-inference SDK and TensorRT-LLM 1. Alongside Codestral Mamba 7B, Mistral AI also introduced Mathstral 7B, a model designed for mathematical reasoning and scientific discovery, further expanding their suite of specialized AI tools 2 3.

Fujitsu - Cohere

Fujitsu Cohere AI Partnership

Fujitsu has announced a strategic partnership with Cohere Inc., a Toronto-based AI company, to develop and provide large language models (LLMs) with advanced Japanese language capabilities. 1 2 As part of this collaboration, Fujitsu has made a significant investment in Cohere and will become the exclusive provider of jointly developed services globally. 3 The partnership will focus on creating Takane, an advanced Japanese language model based on Cohere's Command R+ LLM, which features enhanced retrieval-augmented generation capabilities to mitigate hallucinations. 2 4 Fujitsu plans to offer Takane through its Kozuchi AI services starting in September 2024, targeting private cloud environments for enterprises in highly regulated industries. 3 5 This partnership aims to accelerate the adoption of generative AI globally while addressing specific industry needs and security requirements. 4 5

Deepl

DeepL's Translation Breakthrough

DeepL, a leading Language AI company, has launched a next-generation language model specifically built for translation and editing. This new large language model (LLM) outperforms competitors like Google Translate, ChatGPT-4, and Microsoft in translation quality and fluency, according to blind tests conducted with professional linguists 1 3. The model leverages DeepL's proprietary data accumulated over seven years, focusing on content creation and translation rather than relying on public internet data 3. It also incorporates "human model tutoring," where thousands of language experts were involved in training the AI to achieve superior translation results 3. The new LLM is currently available for DeepL Pro users in English, German, Japanese, and Simplified Chinese, with more languages planned for the future 1 3.

Spreadsheet LLM

AI-Powered Spreadsheet Analysis

Microsoft has unveiled SpreadsheetLLM, an innovative large language model designed to revolutionize spreadsheet analysis and interaction. This AI system addresses the longstanding challenge of applying LLMs to complex spreadsheet structures by introducing a novel encoding method called SheetCompressor. The framework compresses spreadsheets by up to 96%, allowing LLMs to handle large datasets within processing limits while preserving data integrity 1 2. SpreadsheetLLM employs a "Chain of Spreadsheet" (CoS) approach, breaking down spreadsheet reasoning into steps like table detection, matching, and reasoning 2. In tests, the model significantly outperformed existing methods for spreadsheet table detection and enhanced the capabilities of established LLMs like GPT-3.5 and GPT-4 in understanding spreadsheets 2. This breakthrough has the potential to transform data management and analysis across various industries, particularly in finance, accounting, and business analytics 3 4.
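SheetCompressor reportedly combines several modules, including an inverted-index translation step that replaces a cell-by-cell dump with a value-to-addresses map. The toy encoder below is a rough, hypothetical illustration of that one idea only (cell addressing and data layout are invented); the real system also handles structural anchors and format-aware aggregation:

```python
from collections import defaultdict

def compress_sheet(cells: dict[tuple[int, int], str]) -> dict[str, list[str]]:
    """Toy inverted-index encoding in the spirit of SheetCompressor:
    map each distinct value to the list of cell addresses holding it,
    so empty cells vanish and repeated values collapse into one entry."""
    index = defaultdict(list)
    for (row, col), value in sorted(cells.items()):
        if value:  # skip empty cells entirely
            index[value].append(f"R{row}C{col}")
    return dict(index)

sheet = {(1, 1): "Region", (1, 2): "Sales",
         (2, 1): "EMEA", (2, 2): "100",
         (3, 1): "EMEA", (3, 2): "100",
         (4, 1): "", (4, 2): ""}
print(compress_sheet(sheet))
# {'Region': ['R1C1'], 'Sales': ['R1C2'], 'EMEA': ['R2C1', 'R3C1'], '100': ['R2C2', 'R3C2']}
```

On real spreadsheets, where most cells are empty or repeat a handful of values, this kind of encoding is what lets large sheets fit inside an LLM's context window.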

OpenGPT-X

European LLM Leaderboard Launch

The OpenGPT-X team has published a European LLM Leaderboard, addressing the need for broader language accessibility and robust evaluation metrics in multilingual language models 1 2. This initiative, part of the BMWK project OpenGPT-X, aims to advance the development and assessment of large language models (LLMs) across 21 of the 24 supported European languages 4. The leaderboard compares several publicly available state-of-the-art language models, each comprising around 7 billion parameters, on tasks such as logical reasoning, commonsense understanding, multi-task learning, truthfulness, and translation 4. To ensure comparability, common benchmarks like ARC, HellaSwag, TruthfulQA, GSM8K, and MMLU were machine-translated using DeepL, with additional multilingual benchmarks incorporated 4. This effort promotes more versatile approaches in language technology and supports the development of AI models that can effectively serve a wider European audience 4.

Search

AI Search Showdown

Perplexity AI has emerged as a formidable challenger to Google's long-standing dominance in web search, leveraging advanced AI technologies to provide a more intuitive and efficient search experience 2. Unlike Google's traditional list of links, Perplexity uses generative AI to deliver direct answers to user queries, offering a clean, minimalist interface that aims to reduce cognitive overload 2. The startup, valued at $1 billion, has gained significant traction, handling about 500 million queries in 2023 2. However, Google is not standing idle, integrating AI features into its search results and leveraging its vast resources to improve its generative AI capabilities 3. While Perplexity's innovative approach has garnered attention, experts note that unseating Google will require more than just a chatbot, as search encompasses a wide range of functions beyond information retrieval 3.

Mockingbird LLM

Mockingbird LLM Launch

Vectara, a pioneer in Retrieval-Augmented Generation (RAG) technology, has secured $25 million in Series A funding, bringing its total funding to $53.5 million. 5 Alongside this investment, the company unveiled Mockingbird LLM, a purpose-built large language model designed specifically for enterprise RAG applications. 5 Mockingbird aims to provide more transparent conclusions and adhere closely to factual information, addressing key challenges in enterprise AI such as hallucination risks and citation accuracy. 5 The model is optimized for structured outputs like JSON, crucial for agent-driven AI workflows, and integrates with Vectara's existing RAG pipeline, which includes advanced features like hallucination detection and security measures against prompt attacks. 5

· 6 min read
Alexander Carrington
Claude

Prompt Testing for Claude

Anthropic has introduced new prompt testing and evaluation features for its Claude 3.5 Sonnet language model, accessible through the Anthropic Console 1 4. Developers can now generate, test, and assess prompts using the built-in prompt generator, allowing them to optimize inputs and improve Claude's responses for specific tasks 2. The new Evaluate tab enables users to test prompts across various scenarios by uploading real-world examples or generating AI-driven test cases 3. This functionality streamlines the prompt engineering process, allowing developers to compare different prompts side-by-side and rate responses on a five-point scale 1 2. While these tools may not entirely replace prompt engineers, they aim to assist both novice and experienced users in rapidly improving their AI applications' performance 2 3.

AMD-Silo

AMD Acquires Silo AI

AMD has announced plans to acquire Silo AI, Europe's largest private AI lab, for approximately $665 million in an all-cash transaction 1 2. This strategic move aims to enhance AMD's AI capabilities and strengthen its position in the competitive AI hardware market 3. Silo AI, founded in 2017 and headquartered in Helsinki, Finland, brings extensive experience in developing tailored AI models, platforms, and solutions for leading enterprises 1. The acquisition is expected to accelerate AMD's AI strategy by providing access to top-tier AI talent, including over 300 AI experts, 125 of whom hold PhDs, and expertise in large language models (LLMs), MLOps, and AI integration solutions 2. This marks AMD's third AI-focused acquisition within a year, following Mipsology and Nod.ai, as part of its efforts to build an end-to-end silicon-to-services platform and close the gap with market leader NVIDIA 2 4.

Oracle

Oracle's HeatWave GenAI Innovations

Oracle has announced the general availability of HeatWave GenAI, a database-as-a-service platform that introduces innovative features for generative AI applications. A key highlight is the integration of in-database large language models (LLMs), including quantized versions of Llama 3 and Mistral, which Oracle claims is an industry first 3 4. This approach allows organizations to deploy LLMs directly where their data resides, using standard CPUs rather than GPUs, potentially reducing infrastructure costs 2. HeatWave GenAI also includes an automated vector store that simplifies the process of converting existing data into vector embeddings, enabling semantic search and other natural language applications without requiring extensive developer expertise 3. These features, combined with HeatWave's existing capabilities like AutoML, aim to streamline the development of AI-powered applications and provide more synergy between different AI functionalities within the database environment 3 4.

Chinese AI Model Advancements

China's AI giants Alibaba and SenseTime have recently unveiled new AI models, intensifying competition in the country's rapidly evolving AI landscape. SenseTime introduced SenseNova 5.5, claiming a 30% performance improvement over its predecessor and surpassing GPT-4 in several key metrics 1. The company also launched multimodal and terminal-based models, demonstrating advanced cross-modal information integration capabilities 1. Meanwhile, Alibaba Cloud reported significant growth in its Tongyi Qianwen model, with downloads doubling to over 20 million in two months 1. The company emphasized its commitment to open-source initiatives, aiming to narrow the gap between open-source and closed-source models 1. These developments highlight the fierce competition among Chinese tech companies to establish dominance in the domestic AI market, with both firms leveraging major industry events to showcase their progress 1 2.

GraphRAG

Microsoft's Graph-Based RAG

Microsoft Research has introduced GraphRAG, a novel approach to Retrieval Augmented Generation (RAG) that leverages knowledge graphs to enhance reasoning capabilities over complex information and private datasets 1 3. Unlike traditional RAG methods that rely on vector similarity searches, GraphRAG uses Large Language Models (LLMs) to automatically extract rich knowledge graphs from text documents 3. This approach creates a hierarchical structure of "communities" within the data, allowing for more comprehensive and diverse responses to queries 3. GraphRAG has demonstrated superior performance in answering holistic questions about large datasets and connecting disparate pieces of information, outperforming baseline RAG methods in comprehensiveness and diversity with a 70-80% win rate 3. Microsoft has made GraphRAG publicly available on GitHub, along with a solution accelerator for easy deployment on Azure, aiming to make graph-based RAG more accessible for users dealing with complex data discovery tasks 3.
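GraphRAG builds its communities with hierarchical graph clustering (Leiden) over an LLM-extracted entity graph. As a stdlib-only stand-in for that step, the sketch below groups entities into connected components; the entity names and edges are invented for illustration, and real community detection is finer-grained than this:

```python
from collections import defaultdict

def communities(edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group entities into connected components, a crude stand-in for
    the hierarchical 'communities' GraphRAG builds by graph clustering."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:  # iterative depth-first traversal
            n = stack.pop()
            if n in group:
                continue
            group.add(n)
            stack.extend(graph[n] - group)
        seen |= group
        groups.append(group)
    return groups

# Entity pairs an LLM might extract from two unrelated document sets.
edges = [("Acme", "merger"), ("merger", "Q3 report"),
         ("solar farm", "permit"), ("permit", "county board")]
print([sorted(g) for g in communities(edges)])
```

Each community can then be summarized once by an LLM, and a "holistic" query is answered from those summaries rather than from individually retrieved chunks.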

LLM Search Integration

Recent developments in integrating search capabilities with Large Language Models (LLMs) have led to two distinct approaches: Search4LLM and LLM4Search. Search4LLM enhances LLMs by incorporating external search capabilities, allowing models to access up-to-date information beyond their training data. This approach is exemplified by projects like pyLLMSearch, which offers advanced RAG (Retrieval-Augmented Generation) systems with features such as hybrid search, deep linking, and support for multiple document collections 1. Conversely, LLM4Search explores using LLMs to improve search experiences, as demonstrated by the ONS's "StatsChat" project, which uses embedding search and generative question-answering to provide more relevant and context-aware search results 2. However, challenges remain in implementing web search for local LLMs, with current solutions often relying on pre-indexed datasets rather than real-time web browsing 3 4. Additionally, using LLMs solely for search presents efficiency concerns, leading to the development of hybrid approaches like Retrieval Augmented LLMs (raLLM) that combine traditional information retrieval systems with LLM capabilities 5.
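Hybrid systems like those above typically merge a lexical ranking (e.g. BM25) with a vector-similarity ranking. One common, simple way to fuse the two lists is Reciprocal Rank Fusion; the document names and retriever outputs below are invented for illustration:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists from different retrievers (e.g. BM25
    and a vector index) with Reciprocal Rank Fusion: each document
    scores sum(1 / (k + rank)) over the lists that returned it."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_tax_2023", "doc_budget", "doc_memo"]
vector_hits = ["doc_budget", "doc_tax_2023", "doc_roadmap"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Documents surfaced by both retrievers rise to the top, while the constant k damps the influence of any single list, which is why RRF needs no score calibration between the two systems.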

Amazon's RAG Benchmark

Amazon's AWS researchers have proposed a new benchmarking process to evaluate the performance of Retrieval-Augmented Generation (RAG) systems in answering domain-specific questions. The method, detailed in a paper titled "Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation," aims to provide a standardized, scalable, and interpretable approach to scoring different RAG systems 2. The benchmark generates multiple-choice exams tailored to specific document corpora, testing large language models (LLMs) in closed-book, "Oracle" RAG, and classical retrieval scenarios 2. Key findings from the study suggest that optimizing RAG algorithms can lead to performance improvements surpassing those achieved by simply using larger LLMs, highlighting the importance of efficient retrieval methods in AI development 2. Additionally, the research emphasizes the potential risks of poorly aligned retriever components, which can degrade LLM performance compared to non-RAG versions 2.

· 6 min read
Alexander Carrington
Grok

Grok AI Upgrade Timeline

Elon Musk has announced that xAI's next large language model, Grok 2, will be released in August 2024, with Grok 3 following by the end of the year 1 3. Musk claims Grok 2 will be "a giant improvement" in purging other LLMs from its internet training data, addressing concerns about AI models training on each other's outputs 1 4. Grok 3 is set to be trained on 100,000 Nvidia H100 GPUs, which Musk suggests "should be really something special" 1 3. While Grok 1.5 showed strong performance on certain benchmarks, Grok remains less popular than competitors like ChatGPT and Gemini, largely due to its lack of a free version and high subscription costs tied to X Premium+ 3 5.

GPT-5

GPT-5 Development Insights

OpenAI CEO Sam Altman has provided insights into the development of GPT-5, describing it as a "significant leap forward" over its predecessor, GPT-4. According to Altman, GPT-5 aims to address many of the shortcomings of GPT-4, including its limitations in reasoning and tendency to make obvious mistakes 3 5. While specific details and a launch date remain undisclosed, Altman indicated that there is still substantial work to be done on the model 3. He compared the development process to that of the iPhone, suggesting that like early iPhones, initial versions may have imperfections but will be sufficiently useful 3. Altman's comments hint at GPT-5 being in the early stages of development, with the potential to revolutionize AI capabilities once released 1 3.

Open LLM Leaderboard

Hugging Face's Open LLM Leaderboard Upgraded

Hugging Face has unveiled the Open LLM Leaderboard v2, a significant upgrade designed to address the limitations of its predecessor in evaluating language models 1. The new leaderboard introduces six new, more rigorous benchmarks, including MMLU-Pro, GPQA, MuSR, MATH, IFEval, and BBH, to test a wider range of model capabilities and counter benchmark saturation issues 1. A key improvement is the adoption of normalized scores for fairer model ranking, replacing the previous method of summing raw scores 1. This revamp aims to provide more reliable insights into model capabilities, push the boundaries of model development, and enhance reproducibility in the field of language model evaluation 1. The Hugging Face team anticipates continued innovation as more models are assessed on this new, more challenging leaderboard 1.
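Score normalization of this kind typically rescales each benchmark so that the random-guess baseline maps to 0 and a perfect score maps to 100 before averaging, which stops benchmarks with high chance baselines from inflating the total. A minimal sketch, with hypothetical raw scores and baselines chosen only for illustration:

```python
def normalize(raw: float, random_baseline: float) -> float:
    """Rescale a raw accuracy (0-100) so the random-guess baseline maps to 0
    and a perfect score maps to 100; below-baseline scores clamp to 0."""
    return max(0.0, (raw - random_baseline) / (100.0 - random_baseline) * 100.0)

# Hypothetical raw accuracies for one model, and random-guess baselines
# (e.g. 10% for 10-choice questions, 25% for 4-choice questions).
raw_scores = {"MMLU-Pro": 45.0, "GPQA": 30.0}
baselines  = {"MMLU-Pro": 10.0, "GPQA": 25.0}

normalized = {b: normalize(raw_scores[b], baselines[b]) for b in raw_scores}
average = sum(normalized.values()) / len(normalized)
print(normalized, round(average, 2))
```

Note how the GPQA score, barely above its 25% chance floor, contributes far less after normalization than its raw value suggests.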

Gemma 2

Google Unveils Gemma 2

Google has officially released Gemma 2, the latest iteration of its open-weight AI model family, to researchers and developers worldwide. Available in 9 billion (9B) and 27 billion (27B) parameter sizes, Gemma 2 offers improved performance and efficiency compared to its predecessor 3 4. The 27B model is particularly noteworthy, delivering performance competitive with proprietary models more than twice its size, approaching the capabilities of larger models like Llama 3 70B and Claude 3 Sonnet 4. Gemma 2 is designed for broad compatibility, integrating with popular AI frameworks and optimized for rapid inference across various hardware setups, from high-end cloud systems to consumer-grade gaming laptops 4. The model is now accessible through Google AI Studio and will soon be available in the Vertex AI Model Garden, with model weights downloadable from Kaggle and Hugging Face Models 3 4.

CriticGPT

OpenAI CriticGPT

OpenAI has introduced CriticGPT, a new AI model based on GPT-4 designed to detect bugs in code generated by ChatGPT 1 3. This tool aims to enhance the process of AI alignment through Reinforcement Learning from Human Feedback (RLHF) 3. CriticGPT analyzes code and flags potential errors, with its critiques preferred by annotators over human critiques in 63% of cases involving naturally occurring LLM errors 1. The researchers also developed a technique called Force Sampling Beam Search (FSBS) to help CriticGPT produce more detailed code reviews while allowing users to adjust its accuracy and control false positive rates 1 3. While CriticGPT shows promise in improving AI-generated code quality, it may struggle with evaluating longer and more complex tasks 3.

LLM Compiler

Meta's LLM Compiler, AI-Powered Code Optimization

Meta has unveiled the Meta Large Language Model (LLM) Compiler, a suite of open-source models designed to revolutionize code optimization and compiler design. Trained on 546 billion tokens of LLVM-IR and assembly code, the LLM Compiler demonstrates impressive capabilities in code size optimization and disassembly tasks 1. In tests, it achieved 77% of the optimizing potential of an autotuning search and showed a 45% success rate in round-trip disassembly 1. Meta's decision to release the LLM Compiler under a permissive commercial license allows both researchers and industry practitioners to build upon this technology, potentially accelerating innovation in AI-driven compiler optimizations 1. However, some experts remain skeptical about the practical applications and accuracy of using LLMs for compiler tasks that traditionally require determinism and 100% accuracy 4.

SLMs

Shift to Smaller Models

Apple and Microsoft are leading a shift in focus from Large Language Models (LLMs) to Small Language Models (SLMs), emphasizing on-device AI capabilities and privacy. Apple recently introduced Apple Intelligence, drawing on its Reference Resolution As Language Modeling (ReALM) research, which combines a Small Language Model (SLM) for on-device processing with a larger cloud-based LLM 1. This approach allows for personalized AI experiences while preserving user privacy. Similarly, Microsoft has launched Phi-3, an SLM with 3.8 billion parameters designed to run on resource-constrained devices like smartphones 4. Phi-3 demonstrates competitive performance comparable to much larger models, achieving 69% on the MMLU benchmark 4. Both companies are leveraging techniques such as quantization to reduce model size while maintaining accuracy, enabling AI capabilities on edge devices and addressing privacy concerns 2 4. This shift towards SLMs represents a significant development in AI technology, offering benefits such as reduced latency, improved response times, and enhanced data security.
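The quantization mentioned above replaces 32-bit float weights with low-precision integers plus a scale factor. Below is a minimal sketch of symmetric per-tensor int8 quantization, which is one common variant, not either company's actual pipeline: it cuts weight storage roughly 4x while keeping the reconstruction error bounded by half the scale.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: store int8 values plus a single
    float scale instead of float32 weights (~4x smaller)."""
    scale = max(float(np.abs(weights).max()) / 127.0, 1e-12)
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.003, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q.dtype, float(np.abs(w - w_hat).max()))
```

Real deployments refine this with per-channel scales, asymmetric zero points, or 4-bit formats, but the store-integers-plus-scale principle is the same one that lets SLMs fit on phones.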