How to Use Llama 3.1 with Langchain and Ollama?
Using Llama 3.1 with Langchain and Ollama, and exploring its multi-modal capabilities
In this tutorial, you will learn how to use Llama 3.1 and build applications with it, using Python.
In this tutorial, we will be covering the following:
- Llama 3.1 Key Features
- Llama 3.1 Usage
- Llama 3.1 with Langchain
- Llama 3.1 with Ollama
- Llama 3.1 & Multi-Modal Features
Llama 3.1 Key Features
Below are the features of Llama 3.1:
- Largest Open Model: Llama 3.1 405B is the largest openly available model with 405 billion parameters.
- Extended Context Length: Supports a context length of 128K tokens, enabling advanced use cases.
- Multilingual Support: Handles eight languages, enhancing global accessibility and usability.
- Synthetic Data Generation: Capable of generating high-quality synthetic data for model improvement and training.
- Tool Integration: Includes Llama Guard 3 and Prompt Guard for security and safety in applications.
- High-Performance Training: Trained on 15 trillion tokens using 16,000 H100 GPUs for optimized performance.
- Instruction and Chat Fine-Tuning: Improved instruction-following and chat capabilities through iterative post-training.
- Quantized Model: The 405B model is quantized to 8-bit (FP8) numerics for efficient inference on a single server node.
- Extensive Benchmarking: Evaluated on over 150 benchmark datasets, showing competitiveness with leading models like GPT-4.
- Ecosystem and Partnerships: Supported by over 25 partners including AWS, NVIDIA, and Google Cloud, facilitating immediate development and deployment.
Llama 3.1 Usage
Here is how you can access Llama from Meta and Hugging Face:
Direct Download from Meta:
- Download the model weights from Meta’s official Llama website: llama.meta.com
Hugging Face:
- Available on the Hugging Face platform for easy access to machine learning models.
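If you prefer to script the download, here is a minimal sketch using the `huggingface_hub` library. It assumes you have accepted Meta's license on the model page and created an access token; the repo id and target directory below are illustrative.

```python
from huggingface_hub import snapshot_download

# Downloads the gated repository; requires prior license acceptance
# on the model page and a valid Hugging Face access token.
snapshot_download(
    repo_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
    local_dir="llama-3.1-8b-instruct",  # illustrative target directory
    token="hf_...",  # replace with your token
)
```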
Here is how you can use Llama 3.1 with Meta AI, HuggingChat and Groq:
Meta AI (US Users):
- Visit meta.ai and sign in with your Facebook or Instagram account.
- Alternatively, use Meta AI on WhatsApp.
HuggingChat (Non-US Users):
- Visit HuggingChat and chat with the model without signing up.
Groq:
- Sign up on groq.com and choose the Llama 3.1 model from the menu.
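Groq also exposes these models through an API. Below is a minimal sketch using the official `groq` Python client (`pip install groq`); the model id `llama-3.1-8b-instant` is an assumption based on Groq's naming conventions, so verify the exact id in your Groq console.

```python
from groq import Groq

# The client reads the GROQ_API_KEY environment variable by default.
client = Groq()

completion = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # assumed model id; verify in the Groq console
    messages=[{"role": "user", "content": "Tell me about India in short."}],
)
print(completion.choices[0].message.content)
```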
Llama 3.1 with Langchain
1. Install Required Libraries: Run `pip install transformers langchain`.
2. Upgrade Transformers: Ensure you have the latest version of `transformers` by upgrading if necessary (`pip install --upgrade transformers`).
3. Set Up Hugging Face Token: Generate a Hugging Face access token and set it up in your environment so the gated Llama weights can be downloaded.
4. Choose Model: Use the instruct model variant for better question-answering capabilities, e.g. `meta-llama/Meta-Llama-3.1-8B-Instruct`.
5. Create Transformers Pipeline: Initialize the pipeline with:

from transformers import pipeline

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
pipe = pipeline("text-generation", model=model_id)
6. Configure Pipeline: Set parameters like max_length to control output length:

pipe = pipeline("text-generation", model=model_id, max_length=50)
7. Wrap Pipeline with LangChain: Import the necessary LangChain components:

from langchain import HuggingFacePipeline, PromptTemplate, LLMChain

Wrap the pipeline:

hf_pipeline = HuggingFacePipeline(pipeline=pipe)
8. Create Prompt Template: Define your prompt template for the application:

prompt = PromptTemplate.from_template("Tell me about {entity} in short.")
9. Build LLMChain: Combine the pipeline and prompt into an LLMChain:
llm_chain = LLMChain(llm=hf_pipeline, prompt=prompt)
10. Run Inference: Use the LLMChain to generate responses:

response = llm_chain.run({"entity": "Virat Kohli"})
print(response)
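Putting these steps together, here is a minimal end-to-end sketch. It assumes an older LangChain release in which `HuggingFacePipeline` and `LLMChain` are importable from the `langchain` root (newer releases move the pipeline wrapper to `langchain_community` and deprecate `LLMChain`), and that your Hugging Face token has access to the gated Meta repo.

```python
import os
from transformers import pipeline
from langchain import HuggingFacePipeline, PromptTemplate, LLMChain

# Assumption: HF_TOKEN holds a Hugging Face token with access to the
# gated meta-llama repositories.
os.environ["HF_TOKEN"] = "hf_..."  # replace with your token

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Build the text-generation pipeline; max_new_tokens caps the reply length.
pipe = pipeline("text-generation", model=model_id, max_new_tokens=100)

# Wrap the pipeline so LangChain components can drive it.
llm = HuggingFacePipeline(pipeline=pipe)

prompt = PromptTemplate.from_template("Tell me about {entity} in short.")
llm_chain = LLMChain(llm=llm, prompt=prompt)

response = llm_chain.run({"entity": "Virat Kohli"})
print(response)
```

On a CPU-only machine the 8B model will be slow; with `accelerate` installed, passing `device_map="auto"` to `pipeline()` lets transformers place the model on an available GPU.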
Llama 3.1 with Ollama
Install Ollama Software:
- Download and install Ollama from the official website.
- Ensure the Ollama instance is running in the background.
Load Llama 3.1 Model:
- Run the command `ollama run llama3.1`.
- The default 8B model (roughly a 5 GB download) will be loaded.
Start Using Llama 3.1:
- Begin chatting by asking questions directly to the model.
Example Commands:
- Basic Interaction:
Hello, how are you?
- Code Generation:
Get me a Python code for string reversal.
- Information Retrieval:
Tell me about India in short.
Performance Notes:
- Without a GPU, inference will be noticeably slower.
- With a GPU you get faster responses and can also try the larger model variants.
This setup allows you to use Llama 3.1 locally in an offline mode.
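Beyond the interactive prompt, you can also drive the local model from Python. A minimal sketch with the official `ollama` package (`pip install ollama`) is below, assuming the Ollama server is running and `llama3.1` has already been pulled:

```python
import ollama

# Sends a chat request to the local Ollama server (default: localhost:11434).
response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Get me a Python code for string reversal."}],
)
print(response["message"]["content"])
```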
Multimodal Capabilities of Llama 3.1
- Text and Image Understanding: Combines text and image inputs for enhanced comprehension and interaction.
- Extended Context Length: Handles longer context windows, improving coherence and context retention.
- Multilingual Support: Supports multiple languages for broader accessibility and usability.
- Code Generation: Enhanced capabilities in generating and understanding code.
- Enhanced Reasoning: Improved reasoning capabilities for complex problem-solving tasks.
- Advanced Instruction Following: Better at following detailed and complex instructions.
- Safety and Security Tools: Includes Llama Guard 3 and Prompt Guard for building secure applications.
- Human Evaluation: Evaluated on 1,800 prompts across 12 key use cases to ensure high performance.
Wrap up
Follow for more insights, and share your experience using Llama 3.1 in the comments.