Meta on Thursday released Code Llama, a new AI model built on top of Llama 2 and designed to help developers generate programming code. Meta describes it as state-of-the-art among publicly available large language models (LLMs) on coding tasks.
Code Llama is a code-specialized version of Llama 2. Meta created it by further training Llama 2 on its code-specific datasets, sampling more data from those same datasets for longer.
“It has the potential to make workflows faster and more efficient for developers and lower the barrier to entry for people who are learning to code. Code Llama has the potential to be used as a productivity and educational tool to help programmers write more robust, well-documented software,” Meta wrote in a blog post on Thursday.
According to Meta, Code Llama has improved coding capabilities: it can generate both code and natural language about code, from either code or natural-language prompts. For example, a user can ask, “Write me a function that outputs the Fibonacci sequence,” and Code Llama will generate the corresponding code.
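For illustration only, a function of the kind that example prompt asks for might look like the hand-written Python sketch below. It is not actual Code Llama output; the name `fibonacci` and its list-returning interface are assumptions chosen for clarity.

```python
def fibonacci(n: int) -> list[int]:
    """Return the first n Fibonacci numbers, starting from 0.

    Hypothetical example function, written by hand to show what such a
    prompt requests; not generated by Code Llama.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    sequence: list[int] = []
    a, b = 0, 1
    for _ in range(n):
        sequence.append(a)
        # Each new term is the sum of the previous two.
        a, b = b, a + b
    return sequence


print(fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```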
Meta is releasing Code Llama in three sizes, with 7 billion, 13 billion, and 34 billion parameters. Each model is trained on 500 billion tokens of code and code-related data.
The 7B and 13B base and Instruct models have also been trained with fill-in-the-middle (FIM) capability, which lets them insert code into existing code. This means they can support tasks like code completion out of the box.
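Fill-in-the-middle training is commonly implemented by cutting a code sample into a prefix, a middle, and a suffix, then reordering the pieces so the model sees the code on both sides of the gap before predicting what goes in it. The Python sketch below shows that transformation in prefix-suffix-middle order. The helper `make_fim_example` and the sentinel strings `<PRE>`, `<SUF>`, and `<MID>` are illustrative assumptions, not Code Llama's exact tokenizer format; a real tokenizer would use dedicated special tokens.

```python
def make_fim_example(code: str, start: int, end: int) -> tuple[str, str]:
    """Split `code` at [start, end) and return (model_input, target_middle).

    Hypothetical helper: the sentinel strings are placeholders chosen for
    illustration, not the exact special tokens any specific model uses.
    """
    if not 0 <= start <= end <= len(code):
        raise ValueError("require 0 <= start <= end <= len(code)")
    prefix, middle, suffix = code[:start], code[start:end], code[end:]
    # Prefix-suffix-middle order: the model conditions on the text before
    # and after the gap, then is trained to generate the missing middle.
    model_input = "<PRE>" + prefix + "<SUF>" + suffix + "<MID>"
    return model_input, middle


source = "def add(a, b):\n    return a + b\n"
start = source.index("return")
end = source.index("\n", start)
prompt, target = make_fim_example(source, start, end)
print(repr(prompt))  # '<PRE>def add(a, b):\n    <SUF>\n<MID>'
print(repr(target))  # 'return a + b'
```

At inference time, the same layout lets an editor send the text before and after the cursor as prefix and suffix and insert whatever the model generates as the middle, which is the idea behind FIM-based code completion.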
The three sizes address different serving and latency requirements. The smaller 7B and 13B models are faster and better suited to low-latency tasks such as real-time code completion, while the 34B model returns the best overall results and provides better coding assistance.
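One rough way to see why model size matters for serving is the memory needed just to hold the weights. The Python sketch below assumes 16-bit weights (2 bytes per parameter) and the nominal parameter counts; it ignores activations, the KV cache, and runtime overhead, and quantized deployments would need less. The names `MODEL_SIZES` and `weight_memory_gb` are hypothetical.

```python
# Nominal parameter counts for the three Code Llama sizes.
MODEL_SIZES = {"7B": 7e9, "13B": 13e9, "34B": 34e9}


def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Assumes 2 bytes per parameter by default (fp16/bf16); excludes
    activations, KV cache, and framework overhead.
    """
    return num_params * bytes_per_param / 1e9


for name, params in MODEL_SIZES.items():
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights at 16-bit precision")
```

By this estimate the 34B model needs roughly five times as much memory for its weights as the 7B model (about 68 GB versus 14 GB), and larger models also perform more computation per generated token, which adds latency.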
In addition to the base Code Llama model, Meta released two fine-tuned variants: Code Llama – Python and Code Llama – Instruct.
Code Llama – Python, a Python-specialized version, has been further fine-tuned on 100B tokens of Python code for code generation tasks, while Code Llama – Instruct has been fine-tuned to generate helpful and safe answers in natural language.
“Programmers are already using LLMs to assist in a variety of tasks. The goal is to make developer workflows more efficient so that they can focus on the most human-centric aspects of their job, rather than repetitive tasks,” the company added.
“We believe that AI models, and LLMs for coding in particular, benefit most from an open approach, both in terms of innovation and safety. Publicly available, code-specific models can facilitate the development of new technologies that improve people’s lives. By releasing code models like Code Llama, the entire community can evaluate their capabilities, identify issues and fix vulnerabilities.”
Code Llama is free for both research and commercial use under the same community license as Llama 2.
It is also designed to support software engineers in all sectors, including research, industry, open-source projects, NGOs, and businesses.