We’ve just released llamafile 0.8.14, the latest version of our popular open source AI tool. A Mozilla Builders project, llamafile turns model weights into fast, convenient executables that run on most computers, making it easy for anyone to get the most out of open LLMs using the hardware they already have.
New chat interface
The key feature of this new release is our colorful new command line chat interface. When you launch a llamafile, we now automatically open this new chat UI for you, right there in the terminal. This new interface is fast, easy to use, and an all-around simpler experience than the web-based interface we previously launched by default. (That interface, which our project inherits from the upstream llama.cpp project, is still available and supports a range of features, including image uploads. Simply point your browser at port 8080 on localhost.)
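The same local server that hosts the web interface also speaks an OpenAI-compatible HTTP API, so you can talk to your model from code as well as from the browser. Here’s a minimal sketch in Python using only the standard library; the port and the `/v1/chat/completions` route assume llamafile’s defaults, and the model name is a placeholder (a llamafile serves whatever weights it was launched with):

```python
import json
import urllib.request

LLAMAFILE_URL = "http://localhost:8080"  # llamafile's default server port

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a local llamafile.

    The "model" field is a placeholder: the server answers with the model
    it was launched with, regardless of this value.
    """
    payload = {
        "model": "local-model",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        LLAMAFILE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With a llamafile running, send the request and print the reply:
#   with urllib.request.urlopen(build_chat_request("Why is the sky blue?")) as resp:
#       print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the API follows the OpenAI convention, existing client libraries can usually be pointed at the local server just by changing the base URL.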
Other recent improvements
This new chat UI is just the tip of the iceberg. In the months since our last blog post here, lead developer Justine Tunney has been busy shipping a slew of new releases, each of which has moved the project forward in important ways. Here are just a few of the highlights:
Llamafiler: We’re building our own clean-sheet OpenAI-compatible API server, called llamafiler. This new server will be more reliable, stable, and most of all faster than the one it replaces. We’ve already shipped the embedding endpoint, which runs three times as fast as the one in llama.cpp. Justine is currently working on the completions endpoint, at which point llamafiler will become the default API server for llamafile.
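As a sketch of what using the embedding endpoint looks like, the snippet below builds a request following the OpenAI `/v1/embeddings` convention. The route, port, and model name here are assumptions for illustration; check your server’s documentation or `--help` output for the routes your build actually serves:

```python
import json
import urllib.request

def build_embedding_request(text: str,
                            base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a request for an embedding vector from a local server.

    The /v1/embeddings path follows the OpenAI convention; the "model"
    field is a placeholder, as the server embeds with the model it loaded.
    """
    payload = {"input": text, "model": "local-model"}
    return urllib.request.Request(
        base_url + "/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the server running:
#   resp = urllib.request.urlopen(build_embedding_request("hello world"))
#   vector = json.loads(resp.read())["data"][0]["embedding"]
```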
Performance improvements: With the help of open source contributors like k-quant inventor @Kawrakow, llamafile has enjoyed a series of dramatic speed boosts over the last few months. In particular, pre-fill (prompt evaluation) speed has seen dramatic gains:
- Intel Core i9 went from 100 tokens/second to 400 (4x).
- AMD Threadripper went from 300 tokens/second to 2,400 (8x).
- Even the modest Raspberry Pi 5 jumped from 8 tokens/second to 80 (10x!).
When combined with the new high-speed embedding server described above, these gains make llamafile one of the fastest options for local retrieval-augmented generation (RAG).
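To see why fast embeddings matter for RAG: retrieval is essentially nearest-neighbor search over embedding vectors, with the best-matching documents then stuffed into the prompt. A toy sketch (the vectors below are stand-ins for what an embedding endpoint would return):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=1):
    """Return the indices of the k documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy vectors; in a real pipeline each would come from the embedding server.
docs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.9, 0.1, 0.2]]
query = [1.0, 0.0, 0.0]
print(top_k(query, docs, k=2))  # → [0, 2]
```

In a real pipeline the embedding call happens once per document at index time and once per query, so the 3x-faster embedding endpoint directly shortens both indexing and lookup.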
Support for powerful new models: llamafile continues to keep pace with progress in open LLMs, adding support for dozens of new models and architectures, ranging in size from 405 billion parameters all the way down to 1 billion. Here are just a few of the new llamafiles available for download on Hugging Face:
- Llama 3.2 1B and 3B: offering extremely impressive performance and quality for their small size. (Here’s a video from our own Mike Heavers showing it in action.)
- Llama 3.1 405B: a true “frontier model” that’s possible to run at home with sufficient system RAM.
- OLMo 7B: From our friends at the Allen Institute, OLMo is one of the first truly open and transparent models available.
- TriLM: A new “1.58 bit” tiny model that is optimized for CPU inference and points to a near future where matrix multiplication might no longer rule the day.
Whisperfile, speech-to-text in a single file: Thanks to contributions from community member @cjpais, we’ve created whisperfile, which does for whisper.cpp what llamafile did for llama.cpp: that is, turns it into a multi-platform executable. Whisperfile thus makes it easy to use OpenAI’s Whisper technology to efficiently convert speech into text, no matter what kind of hardware you have.
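Driving whisperfile from a script looks much like driving any other command line tool. The helper below builds the argument list for a transcription run; the `-m`/`-f` flag names mirror upstream whisper.cpp’s CLI and are an assumption here, so confirm them against `./whisperfile --help` for your build:

```python
import subprocess

def whisperfile_cmd(audio_path: str, model_path: str,
                    binary: str = "./whisperfile") -> list[str]:
    """Argument list for a whisperfile transcription run.

    The -m (model) and -f (input file) flags follow whisper.cpp's CLI
    conventions; this is a sketch, not a guaranteed interface.
    """
    return [binary, "-m", model_path, "-f", audio_path]

# Example (requires the whisperfile binary and a GGML Whisper model):
#   subprocess.run(whisperfile_cmd("speech.wav", "ggml-tiny.en.bin"), check=True)
```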
Get involved
Our goal is for llamafile to become a rock-solid foundation for building sophisticated locally-running AI applications. Justine’s work on the new llamafiler server is a big part of that equation, but so is the ongoing work of supporting new models and optimizing inference performance for as many users as possible. We’re proud and grateful that some of the project’s biggest breakthroughs in these areas, and others, have come from the community, with contributors like @Kawrakow, @cjpais, @mofosyne, and @Djip007 routinely leaving their mark.
We invite you to join them, and us. We welcome issues and PRs in our GitHub repo, and we welcome you to become a member of Mozilla’s AI Discord server, which has a dedicated channel just for llamafile where you can get direct access to the project team. Hope to see you there!
Stephen leads open source AI projects (including llamafile) in Mozilla Builders. He previously managed social bookmarking pioneer del.icio.us; co-founded Storium, Blockboard, and FairSpin; and worked on Yahoo Search and BEA WebLogic.
More articles by Stephen Hood…