How to Use Llama 3.1 with Langchain and Ollama?
Using Llama 3.1 with Langchain and Ollama, and exploring its multi-modal capabilities
In this tutorial, you will learn how to use Llama 3.1 and build applications with it, using Python.
In this tutorial, we will be covering the following:
- Llama 3.1 Key Features
- Llama 3.1 Usage
- Llama 3.1 with Langchain
- Llama 3.1 with Ollama
- Llama 3.1 & Multi-Modal Features
Llama 3.1 Key Features
Below are the features of Llama 3.1:
- Largest Open Model: Llama 3.1 405B is the largest openly available model with 405 billion parameters.
- Extended Context Length: Supports a context length of 128K tokens, enabling advanced use cases.
- Multilingual Support: Handles eight languages, enhancing global accessibility and usability.
- Synthetic Data Generation: Capable of generating high-quality synthetic data for model improvement and training.
- Tool Integration: Includes Llama Guard 3 and Prompt Guard for security and safety in applications.
- High-Performance Training: Trained on 15 trillion tokens using 16,000 H100 GPUs for optimized performance.
- Instruction and Chat Fine-Tuning: Improved instruction-following and chat capabilities through iterative post-training.
- Quantized Model: The 405B model is quantized to 8-bit (FP8) numerics for efficient inference on a single server node.
- Extensive Benchmarking: Evaluated on over 150 benchmark datasets, showing competitiveness with leading models like GPT-4.
- Ecosystem and Partnerships: Supported by over 25 partners including AWS, NVIDIA, and Google Cloud, facilitating immediate development and deployment.
Llama 3.1 Usage
Here is how you can access Llama from Meta and Hugging Face:
Direct Download from Meta:
- Download the model weights from Meta’s official Llama website: llama.meta.com
Hugging Face:
- Available on the Hugging Face platform for easy access to machine learning models.
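If you prefer to script the download, here is a minimal sketch using the `huggingface_hub` library. It assumes you have accepted Meta's license on the model page and created an access token; the repo id and target directory below are illustrative.

```python
from huggingface_hub import snapshot_download

# Downloads the gated repository; requires prior license acceptance
# on the model page and a valid Hugging Face access token.
snapshot_download(
    repo_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
    local_dir="llama-3.1-8b-instruct",  # illustrative target directory
    token="hf_...",  # replace with your token
)
```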
Here is how you can use Llama 3.1 with Meta AI, HuggingChat and Groq:
Meta AI (US Users):
- Visit meta.ai and sign in with your Facebook or Instagram account.
- Alternatively, use Meta AI on WhatsApp.
HuggingChat (Non-US Users):
- Visit HuggingChat and chat with the model without signing up.
Groq:
- Sign up on groq.com and choose the Llama 3.1 model from the menu.
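Groq also exposes these models through an API. Below is a minimal sketch using the official `groq` Python client (`pip install groq`); the model id `llama-3.1-8b-instant` is an assumption based on Groq's naming conventions, so verify the exact id in your Groq console.

```python
from groq import Groq

# The client reads the GROQ_API_KEY environment variable by default.
client = Groq()

completion = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # assumed model id; verify in the Groq console
    messages=[{"role": "user", "content": "Tell me about India in short."}],
)
print(completion.choices[0].message.content)
```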
Llama 3.1 with Langchain
1. Install Required Libraries: Run `pip install transformers langchain`.
2. Upgrade Transformers: Ensure you have the latest version of `transformers` by upgrading if necessary (`pip install --upgrade transformers`).
3. Set Up Hugging Face Token: Generate a Hugging Face access token and set it up in your environment so the gated Llama weights can be downloaded.
4. Choose Model: Use the instruct model variant for better question-answering capabilities, e.g. `meta-llama/Meta-Llama-3.1-8B-Instruct`.
5. Create Transformers Pipeline: Initialize the pipeline with:

from transformers import pipeline

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
pipe = pipeline("text-generation", model=model_id)
6. Configure Pipeline: Set parameters like max_length to control output length:

pipe = pipeline("text-generation", model=model_id, max_length=50)
7. Wrap Pipeline with LangChain: Import the necessary LangChain components:

from langchain import HuggingFacePipeline, PromptTemplate, LLMChain

Wrap the pipeline:

hf_pipeline = HuggingFacePipeline(pipeline=pipe)
8. Create Prompt Template: Define your prompt template for the application:

prompt = PromptTemplate.from_template("Tell me about {entity} in short.")
9. Build LLMChain: Combine the pipeline and prompt into an LLMChain:
llm_chain = LLMChain(llm=hf_pipeline, prompt=prompt)
10. Run Inference: Use the LLMChain to generate responses:

response = llm_chain.run({"entity": "Virat Kohli"})
print(response)
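Putting these steps together, here is a minimal end-to-end sketch. It assumes an older LangChain release in which `HuggingFacePipeline` and `LLMChain` are importable from the `langchain` root (newer releases move the pipeline wrapper to `langchain_community` and deprecate `LLMChain`), and that your Hugging Face token has access to the gated Meta repo.

```python
import os
from transformers import pipeline
from langchain import HuggingFacePipeline, PromptTemplate, LLMChain

# Assumption: HF_TOKEN holds a Hugging Face token with access to the
# gated meta-llama repositories.
os.environ["HF_TOKEN"] = "hf_..."  # replace with your token

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Build the text-generation pipeline; max_new_tokens caps the reply length.
pipe = pipeline("text-generation", model=model_id, max_new_tokens=100)

# Wrap the pipeline so LangChain components can drive it.
llm = HuggingFacePipeline(pipeline=pipe)

prompt = PromptTemplate.from_template("Tell me about {entity} in short.")
llm_chain = LLMChain(llm=llm, prompt=prompt)

response = llm_chain.run({"entity": "Virat Kohli"})
print(response)
```

On a CPU-only machine the 8B model will be slow; with `accelerate` installed, passing `device_map="auto"` to `pipeline()` lets transformers place the model on an available GPU.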
Llama 3.1 with Ollama
Install Ollama Software:
- Download and install Ollama from the official website.
- Ensure the Ollama instance is running in the background.
Load Llama 3.1 Model:
- Run the command `ollama run llama3.1`.
- The default 8B model (roughly a 5 GB download) will be loaded.
Start Using Llama 3.1:
- Begin chatting by asking questions directly to the model.
Example Commands:
- Basic Interaction:
Hello, how are you?
- Code Generation:
Get me a Python code for string reversal.
- Information Retrieval:
Tell me about India in short.
Performance Notes:
- Without a GPU, inference will be noticeably slower.
- With a GPU you get faster responses and can also try the larger model variants.
This setup allows you to use Llama 3.1 locally in an offline mode.
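Beyond the interactive prompt, you can also drive the local model from Python. A minimal sketch with the official `ollama` package (`pip install ollama`) is below, assuming the Ollama server is running and `llama3.1` has already been pulled:

```python
import ollama

# Sends a chat request to the local Ollama server (default: localhost:11434).
response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Get me a Python code for string reversal."}],
)
print(response["message"]["content"])
```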
Multimodal Capabilities of Llama 3.1
- Text and Image Understanding: Combines text and image inputs for enhanced comprehension and interaction.
- Extended Context Length: Handles longer context windows, improving coherence and context retention.
- Multilingual Support: Supports multiple languages for broader accessibility and usability.
- Code Generation: Enhanced capabilities in generating and understanding code.
- Enhanced Reasoning: Improved reasoning capabilities for complex problem-solving tasks.
- Advanced Instruction Following: Better at following detailed and complex instructions.
- Safety and Security Tools: Includes Llama Guard 3 and Prompt Guard for building secure applications.
- Human Evaluation: Evaluated on 1,800 prompts across 12 key use cases to ensure high performance.
Wrap up
Follow for more insights, and share your experience using Llama 3.1 in the comments.