Using LangChain with Llama 2 | Generative AI Series

Introduction

LangChain is an Artificial Intelligence (AI) framework that simplifies coding when building applications that combine external data sources with Large Language Models (LLMs).

You can use the framework to create personal assistants, chatbots, Q&A applications, and more. LangChain ships with libraries that let you interact with various data sources such as PDFs and spreadsheets, and with vector databases such as Chroma, Pinecone, Milvus, and Weaviate.

Chroma is an open-source embedding database that accelerates building LLM apps that require storing vector data and performing semantic searches.

In this guide, you'll use the LangChain framework to orchestrate LLMs with a Chroma database.
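To illustrate the core idea behind an embedding database like Chroma, the sketch below ranks toy documents by cosine similarity to a query vector using only the Python standard library. The three-dimensional vectors and the sample documents are invented for illustration; real embedding models (such as sentence-transformers) produce vectors with hundreds of dimensions.

```python
import math

# Toy 3-dimensional "embeddings" for three documents. The vectors are
# hand-picked for illustration only.
documents = {
    "Paris is the capital of France.": [0.9, 0.1, 0.0],
    "The Eiffel Tower is in Paris.": [0.8, 0.2, 0.1],
    "Python is a programming language.": [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# A query vector pointing in roughly the same direction as the Paris documents.
query = [0.85, 0.15, 0.05]

# Rank documents by similarity to the query, most similar first.
ranked = sorted(
    documents,
    key=lambda d: cosine_similarity(query, documents[d]),
    reverse=True,
)
print(ranked[0])  # → Paris is the capital of France.
```

An embedding database performs this kind of nearest-neighbor ranking at scale, with indexing structures that avoid comparing the query against every stored vector.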

Prerequisites

Before you begin:

Install the Required Python Libraries

You need to install the following Python libraries. Later in this guide, you'll run Python scripts that depend on them:

console
$ sudo apt update
$ pip install huggingface-hub pypdf langchain text_generation sentence-transformers chromadb gradio 
$ pip install -U jinja2

Run the Hugging Face Text Generation Inference Container

This guide requires access to the Llama 2 model API. You'll expose the API by running the Hugging Face Text Generation Inference Docker container. Follow the steps below:

  1. Initialize the Docker container variables. Replace $HF_TOKEN with your Hugging Face access token.

    console
    $ model=meta-llama/Llama-2-7b-chat-hf
    $ volume=$PWD/data
    $ token=$HF_TOKEN
    
  2. Run the Hugging Face text generation Docker container.

    console
    $ sudo docker run -d  \
    --name hf-tgi  \
    --runtime=nvidia  \
    --gpus all  \
    -e HUGGING_FACE_HUB_TOKEN=$token  \
    -p 8080:80  \
    -v $volume:/data  \
    ghcr.io/huggingface/text-generation-inference:1.1.0  \
    --cuda-memory-fraction 0.5  \
    --model-id $model  \
    --max-input-length 2048  \
    --max-total-tokens 4096
    
  3. Wait for the container image to download and the model to load. Then, check the logs to verify that the server is listening for incoming connections.

    console
    $ sudo docker logs -f hf-tgi
    
  4. Verify that the docker run command from step 2 printed the image download status and the new container ID, similar to the following output.

    Status: Downloaded newer image for ghcr.io/huggingface/text-generation-inference:1.1.0
    c820c75b4a29ebff7a889443e65305bbf2b632b9299ffc6f3d445353ac1dbbd4
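Before moving on to LangChain, you can sanity-check the endpoint directly. The sketch below builds a request against TGI's /generate route using only the standard library; the prompt text and max_new_tokens value are placeholders to adjust as needed.

```python
import json
import urllib.request

# The container exposes the Text Generation Inference REST API on port 8080.
# The /generate route accepts a JSON body with the prompt under "inputs".
payload = {
    "inputs": "What is the capital of France?",
    "parameters": {"max_new_tokens": 64},
}

request = urllib.request.Request(
    "http://127.0.0.1:8080/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the container is running; the completion is returned
# under the "generated_text" key of the response JSON.
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read())["generated_text"])
```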

Implement a Basic LangChain Script

Follow the steps below to create a sample LangChain application that sends a prompt to the model and prints the response:

  1. Create a new langchain-llama.py file using a text editor like nano.

    console
    $ nano langchain-llama.py
    
  2. Enter the following information into the langchain-llama.py file.

    python
    from langchain.llms import HuggingFaceTextGenInference
    
    import warnings
    warnings.simplefilter('ignore')
    
    URI = "http://127.0.0.1:8080/"
    
    llm = HuggingFaceTextGenInference(inference_server_url = URI)
    
    print(llm("What is the capital of France?").strip())
    
  3. Save and close the file.

  4. Run the file.

    console
    $ python3 langchain-llama.py
    

    Output:

    The capital of France is Paris.

Prompt a LangChain Application

The LangChain framework supports prompt templates. Incorporate a prompt template into your Python code by following the steps below:

  1. Open a new langchain-llama-prompt.py file.

    console
    $ nano langchain-llama-prompt.py
    
  2. Enter the following information into the langchain-llama-prompt.py file.

    python
    from langchain.llms import HuggingFaceTextGenInference
    from langchain import PromptTemplate
    from langchain.schema import StrOutputParser
    
    import warnings
    warnings.simplefilter('ignore')
    
    URI = "http://127.0.0.1:8080/"
    llm = HuggingFaceTextGenInference(inference_server_url = URI)
    
    template = """
        <s>[INST] <<SYS>>
        {role}
        <</SYS>>       
        {text} [/INST]
    """
    
    prompt = PromptTemplate(
        input_variables = [
            "role", 
            "text"
        ],
        template = template,
    )
    
    role = "Act as a Machine Learning engineer who is teaching high school students."
    
    text = "Explain what is artificial intelligence in 2-3 sentences"
    
    print(prompt.format(role = role, text = text))
    
    chain = prompt | llm | StrOutputParser()
    
    print(chain.invoke({"role": role,"text":text}))
    
  3. Save and close the file.

  4. Run the file.

    console
    $ python3 langchain-llama-prompt.py
    

    Output:

    Hey there, young minds! *excited tone* Artificial intelligence (AI) is like a superhero for computers - it gives them the power to think and learn like humans! AI is a field of computer science that focuses on creating machines that can perform tasks that typically require human intelligence, like recognizing images, understanding speech, and making decisions. Just like how superheroes have special powers, AI algorithms have the ability to learn from data and improve their performance over time! 
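For this template, PromptTemplate is a thin wrapper over standard string substitution. The sketch below reproduces the same rendering in plain Python; the template string is copied from langchain-llama-prompt.py above.

```python
# Llama 2's chat format, as used in langchain-llama-prompt.py.
template = """
    <s>[INST] <<SYS>>
    {role}
    <</SYS>>
    {text} [/INST]
"""

role = "Act as a Machine Learning engineer who is teaching high school students."
text = "Explain what is artificial intelligence in 2-3 sentences"

# str.format performs the same substitution as prompt.format in the script.
rendered = template.format(role=role, text=text)
print(rendered)
```

The <<SYS>> block carries the system instruction and the [INST] ... [/INST] markers wrap the user turn, which is the chat format the Llama 2 chat models were trained on.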
            
