How To Build RAG Application

RAG applications combine information retrieval with generative AI. They retrieve relevant passages from a knowledge base and pass them to a large language model (LLM), which uses them to answer the user's question. This guide walks you through building a RAG application on the initializ.ai platform.

Transforming PDFs into Knowledge: Our Application Idea

We are building a Retrieval-Augmented Generation (RAG) application that changes how users interact with PDF documents. Users upload a PDF, ask questions about its content, and receive precise, context-aware answers.

Whether it’s extracting insights from a research paper, finding critical data in a report, or simply navigating a manual with ease, our solution makes static documents dynamic, empowering users with immediate access to the knowledge they need.

Prerequisites

  1. A PostgreSQL database that can store embeddings (STEP 01).
  2. An embedding model endpoint (STEP 02).
  3. An LLM endpoint (STEP 03).

Visualizing the Flow

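RAG application flow: the text of each uploaded PDF is extracted, split into chunks, converted to embeddings, and stored in PostgreSQL with pgvector. The question is embedded the same way and used to find the closest chunk, which is sent together with the question to the LLM to generate the answer.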

Step-by-step process

STEP 01: Set up a database

  1. Log in to the initializ.ai platform.
  2. Navigate to the Database section.
  3. Create a PostgreSQL database. Databases on the initializ.ai platform come with the pgvector extension pre-installed, which lets you store and search embeddings efficiently. (See How to create database.)
  4. Once the database is active, you can connect to it using pgAdmin.

STEP 02: Deploy Embedding Model

You can also deploy the embedding model itself on initializ.ai. Log in to the initializ.ai platform to get started.

  1. Create an AI Endpoint and select the model sentence-transformers/all-MiniLM-L6-v2, which produces 384-dimensional embeddings. Creating an AI Endpoint requires a GPU-enabled workspace.

  2. Fill out the details.

  3. Configure the AI Endpoint, then click "Next".

  4. Review your details and click "Submit".

  5. Once the model is deployed, you receive an endpoint URL that opens a Swagger interface listing the available routes and their parameters. Send your text chunks to the embeddings route, and it returns their embeddings in the response.
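
The exact request and response formats are listed in the Swagger interface. Assuming the endpoint is OpenAI-compatible (the payload used by get_embeddings later in this guide suggests a vLLM server), a request carries the model name and a list of input texts, and the response returns one object per input inside a data list. The sketch below only illustrates that assumed shape; build_embedding_request and parse_embedding_response are hypothetical helpers, not part of the platform:

```python
from typing import Dict, List


def build_embedding_request(model: str, texts: List[str]) -> Dict:
    """Request body for an OpenAI-compatible embeddings route (assumed format)."""
    return {"model": model, "input": texts, "encoding_format": "float"}


def parse_embedding_response(body: Dict) -> List[List[float]]:
    """Return one embedding per input text, ordered by each item's "index" field."""
    items = sorted(body.get("data", []), key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in items]


# Assumed response shape (values shortened; all-MiniLM-L6-v2 returns 384 floats per text)
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.4, 0.5]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
}
print(parse_embedding_response(sample_response))  # [[0.1, 0.2], [0.4, 0.5]]
```

Sorting by index keeps embeddings aligned with their input texts when several chunks are sent in one request.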

STEP 03: Deploy the LLM

Deploying an LLM follows the same steps as deploying the embedding model. The only difference is the model: instead of sentence-transformers/all-MiniLM-L6-v2, select the LLM of your choice. The code in this guide uses meta-llama/Meta-Llama-3.1-8B-Instruct; if you deploy a different model, change the model name in get_custom_model_answer accordingly.

Once the LLM is deployed, you receive a Swagger URL. Open it to find the endpoint (route) to which chat requests are sent.
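
Assuming the LLM endpoint is also OpenAI-compatible (get_custom_model_answer later in this guide sends this format), a chat request lists a system message and a user message, and a non-streaming response carries the answer in choices[0].message.content. A sketch with hypothetical helper names, useful for checking the format in the Swagger interface:

```python
from typing import Dict


def build_chat_request(model: str, system_message: str, user_message: str, stream: bool = False) -> Dict:
    """Body for an OpenAI-compatible chat completions route (assumed format)."""
    return {
        "model": model,  # should name the model the endpoint actually serves
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.7,
        "stream": stream,
    }


def extract_answer(body: Dict) -> str:
    """Return the assistant's text from a non-streaming chat completion response."""
    choices = body.get("choices", [])
    if not choices:
        return ""
    return choices[0].get("message", {}).get("content") or ""


sample = {"choices": [{"index": 0, "message": {"role": "assistant", "content": "Paris."}}]}
print(extract_answer(sample))  # Paris.
```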

STEP 04: Let's create an endpoint for the RAG application

We will implement an endpoint using Python. Start by creating a folder for your project, and then create a file named app.py, where you will write the code for the endpoint.

  • STEP 04.01: Necessary Dependencies

    First, let’s import the necessary dependencies in app.py.

    Necessary Dependencies
    from flask import Flask, request, jsonify, Response
    import os
    import json
    import fitz  # PyMuPDF: extracts text, images, and other content from PDF files
    import requests
    import psycopg
    from langchain.text_splitter import CharacterTextSplitter
    from pgvector.psycopg import register_vector
    from flask_cors import CORS
    import numpy as np
    from dotenv import load_dotenv
  • STEP 04.02: Initialize Flask app

    Flask is a micro web framework in Python that allows you to build web applications, APIs, or dynamic websites. It provides the tools and features needed to handle HTTP requests, render templates, and more.

    app = Flask(__name__)
  • STEP 04.03: Create .env file

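    Create a .env file in the root of your project and put all environment variables in it: the database connection details, the embedding model name and URL, the LLM URL, and the token used to authenticate requests to the LLM. Because the code posts directly to EMBEDDING_MODEL_URL and MODEL_URL, use the full route URLs shown in the Swagger interfaces (for an OpenAI-compatible server these are typically the /v1/embeddings and /v1/chat/completions routes). Do not commit this file to version control.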

    .env
    DB_HOST=xyzdb.test.xyzorg.db.psi.initz.run
    DB_PORT=5432
    DB_NAME=xyzdb
    DB_USER=xyzdb
    DB_PASSWORD=<your database password>
    MODEL_URL=<your LLM endpoint URL>
    EMBEDDING_MODEL_URL=<your embedding model endpoint URL>
    EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
    TOKEN=<your token>
  • STEP 04.04: CORS setup

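    The following line enables Cross-Origin Resource Sharing (CORS) for every route and every origin, so a frontend served from a different domain can call the API. In production, consider replacing "*" with the URL of your frontend.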

    CORS(app, resources={r"/*": {"origins": "*"}})
  • STEP 04.05: Loading environment variables

    The load_dotenv() function, provided by the python-dotenv library, loads the variables defined in the .env file into your application's environment, where os.getenv() can read them.

    Environment Variables
    # Load environment variables from a .env file
    load_dotenv()

    # Load the environment variable for the model URL
    MODEL_URL = os.getenv("MODEL_URL")

    # Load the environment variable for the authentication token
    TOKEN = os.getenv("TOKEN")

    # Load the environment variable for the embedding model URL
    EMBEDDING_MODEL_URL = os.getenv("EMBEDDING_MODEL_URL")

    # Load the environment variable for the embedding model name
    EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL")

    # Load environment variables for the database configuration
    DB_CONFIG = {
    "host": os.getenv("DB_HOST"), # Database host
    "port": os.getenv("DB_PORT"), # Database port
    "dbname": os.getenv("DB_NAME"), # Database name
    "user": os.getenv("DB_USER"), # Database user
    "password": os.getenv("DB_PASSWORD"), # Database password
    }
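
os.getenv() returns None for a variable that is missing from .env, and the resulting error only shows up later, for example when the first database connection fails. A small check right after load_dotenv() can report every missing variable at startup. This is a sketch; require_env is a hypothetical helper, not part of python-dotenv:

```python
import os
from typing import Dict, Iterable

# Variables defined in this guide's .env file
REQUIRED_VARS = (
    "DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASSWORD",
    "MODEL_URL", "EMBEDDING_MODEL_URL", "EMBEDDING_MODEL", "TOKEN",
)


def require_env(names: Iterable[str] = REQUIRED_VARS) -> Dict[str, str]:
    """Return the named variables, or raise one error listing every missing or empty one."""
    names = list(names)
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Calling require_env() directly after load_dotenv() stops the app with a clear message instead of a connection error several steps later.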
  • STEP 04.06: Set Up database connection

    Database connection setup
    # PostgreSQL connection setup
    def get_db_connection():
        try:
            # Connect to the database using psycopg
            conn = psycopg.connect(
                host=DB_CONFIG["host"],
                port=DB_CONFIG["port"],
                dbname=DB_CONFIG["dbname"],
                user=DB_CONFIG["user"],
                password=DB_CONFIG["password"]
            )
            return conn
        except Exception as e:
            print(f"Error connecting to the database: {e}")
            # Re-raise so callers do not continue without a connection
            raise
  • STEP 04.07: Run the Flask app

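    The if __name__ == '__main__': block runs only when app.py is executed directly. It first makes sure the pgvector extension and the document_vectors table exist (the helper functions are implemented in the next step), then starts Flask's built-in development server with app.run(debug=False, host="0.0.0.0", port=8000): host="0.0.0.0" listens on all network interfaces, port=8000 sets the port, and debug=False turns off the interactive debugger and auto-reloader.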

    Run the Flask app
    # Run the Flask app
    if __name__ == '__main__':
        # Make sure the pgvector extension is installed in the database
        create_pgvector_extension()
        # Create the document_vectors table if it does not exist yet
        create_document_vectors_table()
        # Start the Flask development server on all interfaces, port 8000
        app.run(debug=False, host="0.0.0.0", port=8000)
  • STEP 04.08: Let's implement the helper functions

    create_pgvector_extension()

    # Ensures the pgvector extension is installed and available in the PostgreSQL database
    def create_pgvector_extension():
        # Connect to the database
        conn = get_db_connection()
        # A cursor executes SQL statements and retrieves results
        cursor = conn.cursor()

        cursor.execute("CREATE EXTENSION IF NOT EXISTS vector;")
        # Save the changes made during the current transaction
        conn.commit()

        cursor.close()
        conn.close()

    create_document_vectors_table()

    # Creates the table that stores each chunk, its source document, and its embedding
    def create_document_vectors_table():
        conn = get_db_connection()
        cursor = conn.cursor()

        # VECTOR(1536) fixes the embedding size; shorter embeddings are padded to 1536 before insertion
        create_table_query = """
        CREATE TABLE IF NOT EXISTS document_vectors (
            id SERIAL PRIMARY KEY,
            document_name TEXT NOT NULL,
            chunk TEXT NOT NULL,
            embedding VECTOR(1536)
        );
        """
        cursor.execute(create_table_query)
        conn.commit()

        cursor.close()
        conn.close()
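
Without an index, the ORDER BY embedding <=> ... LIMIT query in search_pgvector compares the question against every row. pgvector (version 0.5.0 and later) supports HNSW indexes for vectors of up to 2,000 dimensions, and the vector_cosine_ops operator class makes the index serve the cosine-distance operator <=>. The sketch below assumes such a pgvector version and uses a hypothetical index name. HNSW is approximate, so results can differ slightly from an exact scan; since this application deletes its vectors after each request, the index mainly pays off if you keep documents in the table.

```python
# Hypothetical index name; vector_cosine_ops matches the <=> (cosine distance) operator
HNSW_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS document_vectors_embedding_hnsw_idx
ON document_vectors
USING hnsw (embedding vector_cosine_ops);
"""


def create_embedding_index(conn) -> None:
    """Create an approximate nearest-neighbour index for the similarity search."""
    # psycopg cursors are context managers that close themselves on exit
    with conn.cursor() as cursor:
        cursor.execute(HNSW_INDEX_SQL)
    conn.commit()
```

It could be called once at startup, after create_document_vectors_table(), with a connection from get_db_connection() that is closed afterwards.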

    Route to upload multiple PDFs and ask questions

    The @app.route('/upload_pdf_and_ask', methods=['POST']) decorator maps the URL /upload_pdf_and_ask to the function below, which handles POST requests sent to it. The request is a multipart form with one or more file fields (the PDFs), a question field, and an optional stream field; setting stream to true returns the answer as a server-sent event stream. The function indexes the uploaded PDFs, retrieves the chunk closest to the question, removes the temporary files and vectors, and asks the LLM for an answer.

    @app.route('/upload_pdf_and_ask', methods=['POST'])
    def upload_pdf_and_ask():
        if 'file' not in request.files:
            return jsonify({"error": "Missing files"}), 400

        files = request.files.getlist('file')
        question = request.form.get("question")
        if not question:
            return jsonify({"error": "Missing question"}), 400
        streaming = request.form.get("stream") == 'true'

        os.makedirs('uploads', exist_ok=True)
        # Keep only the base name so a crafted filename cannot write outside the uploads folder
        filenames = [os.path.basename(file.filename) for file in files]

        try:
            for file, filename in zip(files, filenames):
                file_path = os.path.join('uploads', filename)
                file.save(file_path)  # save the uploaded file to local storage

                # Extract the text from the PDF and split it into chunks
                text = extract_text_from_pdf(file_path)
                chunks = split_text_into_chunks(text, max_chunk_size=512)

                for chunk in chunks:
                    # Convert each chunk into an embedding and store it in the database
                    embeddings = get_embeddings(chunk)
                    add_to_pgvector(embeddings, chunk, filename)

            # Embed the question and search for the closest chunks
            question_embedding = get_embeddings(question)
            search_results = search_pgvector(question_embedding)
        finally:
            # Remove the uploaded files and their vectors, even if processing failed
            for filename in filenames:
                file_path = os.path.join('uploads', filename)
                if os.path.exists(file_path):
                    os.remove(file_path)
                delete_from_pgvector(filename)

        # Each row is (document_name, chunk, embedding); use the best-matching chunk as context
        if search_results:
            best_match = search_results[0][1]
        else:
            best_match = "No relevant match found."

        # Limit the context to 4096 characters (not tokens)
        max_chars = 4096
        context = best_match[:max_chars]

        # Prepare the system and user messages for the model
        system_message = "You are a helpful assistant that answers questions to the point based on the provided documents. Please limit your answer to 200 words."
        user_message = f"Question: {question}\nContext: {context}"

        if streaming:
            return Response(event_generator(system_message, user_message, TOKEN, streaming=True),
                            content_type='text/event-stream;charset=utf-8', status=200)
        return Response(event_generator(system_message, user_message, TOKEN, streaming=False),
                        content_type='application/json', status=200)
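
The route passes only the single best-matching chunk to the model, even though search_pgvector returns up to five rows. If answers need information spread across several chunks, one option is to concatenate the top results within a character budget. A sketch; build_context is a hypothetical helper, and labelling each chunk with its file name is a design choice the route above does not make:

```python
from typing import List, Sequence, Tuple


def build_context(results: Sequence[Tuple], max_chars: int = 4096) -> str:
    """Concatenate retrieved chunks, best match first, within a character budget.

    Each row is assumed to be (document_name, chunk, embedding), as returned by search_pgvector.
    """
    parts: List[str] = []
    used = 0
    for document_name, chunk, _embedding in results:
        entry = f"[{document_name}]\n{chunk}"
        if used + len(entry) > max_chars:
            if not parts:
                # Keep at least the beginning of the best match
                parts.append(entry[:max_chars])
            break
        parts.append(entry)
        used += len(entry) + 2  # +2 for the blank line that separates entries
    return "\n\n".join(parts)
```

In the route, context = build_context(search_results) would replace the best_match lines.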

    delete_from_pgvector(source)

    # Delete all chunks of a document from pgvector
    def delete_from_pgvector(source):
        conn = get_db_connection()
        cursor = conn.cursor()

        delete_query = """
        DELETE FROM document_vectors
        WHERE document_name = %s;
        """
        cursor.execute(delete_query, (source,))
        conn.commit()

        cursor.close()
        conn.close()

    extract_text_from_pdf(pdf_path)

    # Helper function to extract text from a PDF
    def extract_text_from_pdf(pdf_path):
        full_text = ""
        # The with block closes the document when extraction is done
        with fitz.open(pdf_path) as doc:
            for page_num in range(doc.page_count):
                page = doc.load_page(page_num)
                page_text = page.get_text()
                # Skip pages without extractable text (e.g. scanned images)
                if page_text.strip():
                    full_text += page_text
        return full_text

    split_text_into_chunks(text, max_chunk_size=512)

    # Split text into smaller chunks using LangChain's CharacterTextSplitter
    def split_text_into_chunks(text, max_chunk_size=512):
        # Aim for chunks of at most max_chunk_size characters, overlapping by 100 characters
        text_splitter = CharacterTextSplitter(chunk_size=max_chunk_size, chunk_overlap=100)

        # Split the text according to the splitter's settings
        chunks = text_splitter.split_text(text)
        return chunks
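
CharacterTextSplitter first splits the text on a separator (a blank line, "\n\n", by default) and then merges the pieces into chunks of up to chunk_size characters, carrying roughly chunk_overlap characters from the end of one chunk into the next. A single paragraph longer than 512 characters therefore still becomes one oversized chunk. The simplified sketch below is not LangChain's algorithm; it ignores separators and only illustrates what chunk size and overlap mean:

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 512, overlap: int = 100) -> List[str]:
    """Fixed-size character windows; each one repeats the last `overlap` characters of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be between 0 and chunk_size - 1")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a chunk boundary available in both neighbouring chunks, at the cost of storing some text twice.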

    get_embeddings(text)

    # Helper function to get the embedding for a given text
    def get_embeddings(text):
        headers = {
            "accept": "application/json",
            "Content-Type": "application/json"
        }
        payload = {
            "model": EMBEDDING_MODEL,
            "input": [text],
            "encoding_format": "float",
            # Keep at most the last 256 tokens (the input length all-MiniLM-L6-v2 is configured for);
            # a value of 1 would embed only a single token of each chunk
            "truncate_prompt_tokens": 256,
            "add_special_tokens": False,
            "priority": 0
        }

        response = requests.post(EMBEDDING_MODEL_URL, json=payload, headers=headers, timeout=60)
        if response.status_code != 200:
            raise Exception(f"Failed to get embeddings: {response.status_code} - {response.text}")

        data = response.json().get("data", [])
        if not data:
            raise ValueError("Embedding response contains no data.")
        embedding = data[0].get("embedding", [])
        if not isinstance(embedding, list):
            raise TypeError("Embedding response should be a list of floats.")

        # Pad or truncate to the 1536 dimensions expected by the document_vectors table
        return adjust_embedding_size(embedding, desired_size=1536)
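
    adjust_embedding_size(embedding, desired_size=1536)

    # Pad or truncate an embedding to the fixed size of the VECTOR(1536) column.
    # all-MiniLM-L6-v2 returns 384 values, so its embeddings are zero-padded.
    def adjust_embedding_size(embedding, desired_size=1536):
        if not isinstance(embedding, list):
            raise TypeError("Expected 'embedding' to be a list.")
        # Always return a numpy array, which add_to_pgvector and search_pgvector expect
        vector = np.asarray(embedding, dtype=float)
        if len(vector) > desired_size:
            vector = vector[:desired_size]
        elif len(vector) < desired_size:
            vector = np.pad(vector, (0, desired_size - len(vector)), 'constant')
        return vector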
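
Zero padding does not distort the search. Appending zeros leaves both the dot product of two vectors and their norms unchanged, so the cosine distance used by search_pgvector is the same before and after padding (truncation, by contrast, would change it). A quick numerical check with random stand-in vectors instead of real embeddings; cosine_distance and pad_to are illustrative helpers:

```python
import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator computes."""
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def pad_to(vector: np.ndarray, size: int) -> np.ndarray:
    """Append zeros until the vector has `size` entries."""
    return np.pad(vector, (0, size - len(vector)), "constant")


rng = np.random.default_rng(0)
a, b = rng.normal(size=384), rng.normal(size=384)  # stand-ins for 384-dimensional MiniLM embeddings
print(np.isclose(cosine_distance(a, b), cosine_distance(pad_to(a, 1536), pad_to(b, 1536))))  # True
```

Declaring the column as VECTOR(384) would avoid padding altogether, at the cost of tying the table to this particular embedding model.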

    add_to_pgvector(embeddings, text_chunk, source)

    # Function to add an embedding to PostgreSQL using pgvector
    def add_to_pgvector(embeddings, text_chunk, source):
        if isinstance(embeddings, np.ndarray) and embeddings.ndim == 1:
            # psycopg sends the list as an array, which PostgreSQL casts to vector on insert
            embedding_vector = embeddings.tolist()
        else:
            raise TypeError(f"Expected a 1-D numpy array, but got: {type(embeddings)}")

        conn = get_db_connection()
        cursor = conn.cursor()

        insert_query = """
        INSERT INTO document_vectors (document_name, chunk, embedding)
        VALUES (%s, %s, %s);
        """

        # Execute the insert query
        cursor.execute(insert_query, (source, text_chunk, embedding_vector))
        conn.commit()

        cursor.close()
        conn.close()
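
add_to_pgvector opens a new connection and commits once per chunk, which adds up for long PDFs. A possible alternative, sketched here as the hypothetical add_many_to_pgvector, reuses one connection supplied by the caller (for example from get_db_connection()) and sends all rows of a document with psycopg's cursor.executemany() in a single transaction:

```python
from typing import Sequence

import numpy as np

INSERT_SQL = """
INSERT INTO document_vectors (document_name, chunk, embedding)
VALUES (%s, %s, %s);
"""


def add_many_to_pgvector(conn, source: str, chunks: Sequence[str], embeddings: Sequence[np.ndarray]) -> int:
    """Insert every chunk of one document in one transaction; returns the number of rows."""
    if len(chunks) != len(embeddings):
        raise ValueError("chunks and embeddings must have the same length")
    rows = [(source, chunk, np.asarray(vec).tolist()) for chunk, vec in zip(chunks, embeddings)]
    with conn.cursor() as cursor:
        cursor.executemany(INSERT_SQL, rows)
    conn.commit()
    return len(rows)
```

The route would then collect the embeddings of all chunks first and make one call per file instead of one per chunk.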

    search_pgvector(query_embedding, top_k=5)

    # Function to search pgvector for the chunks closest to the query
    def search_pgvector(query_embedding, top_k=5):
        if isinstance(query_embedding, np.ndarray):
            query_vector = query_embedding.tolist()
        else:
            raise TypeError(f"Expected query_embedding to be a numpy array, but got: {type(query_embedding)}")

        conn = get_db_connection()
        cursor = conn.cursor()

        # <=> is pgvector's cosine distance operator: smaller values mean more similar chunks
        search_query = """
        SELECT document_name, chunk, embedding
        FROM document_vectors
        ORDER BY embedding <=> %s::vector(1536)
        LIMIT %s;
        """

        # The query vector is cast to vector(1536) inside the SQL
        cursor.execute(search_query, (query_vector, top_k))
        results = cursor.fetchall()

        cursor.close()
        conn.close()

        return results
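
pgvector offers several distance operators: <-> (Euclidean distance), <#> (negative inner product), and <=> (cosine distance), which search_pgvector uses. To sanity-check results offline, the NumPy sketch below ranks stored vectors the way ORDER BY embedding <=> query LIMIT k does; top_k_by_cosine is a hypothetical helper and does not handle zero vectors:

```python
import numpy as np


def top_k_by_cosine(query: np.ndarray, matrix: np.ndarray, k: int = 5) -> np.ndarray:
    """Row indices of `matrix`, closest to `query` first by cosine distance."""
    similarities = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    distances = 1.0 - similarities
    return np.argsort(distances, kind="stable")[:k]


stored = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
print(top_k_by_cosine(np.array([1.0, 0.1]), stored, k=2))  # [0 2]
```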

    event_generator(system_message, user_message, token, streaming)

    # Event stream generator
    def event_generator(system_message, user_message, token, streaming):
        """Yield the model output; the trailing blank line ends each server-sent event."""
        try:
            for chunk in get_custom_model_answer(system_message, user_message, token, streaming):
                yield f"{chunk}\n\n"
        except Exception as e:
            yield f"data: Error - {str(e)}\n\n"

    get_custom_model_answer(system_message, user_message, token, streaming)

    # Function to get the response from the model API
    def get_custom_model_answer(system_message, user_message, token, streaming):
        headers = {
            'Content-Type': 'application/json',
            'Authorization': f'Bearer {token}'
        }

        def sanitize_input(input_text):
            # Drop characters that cannot be encoded as UTF-8 (e.g. lone surrogates)
            return input_text.encode('utf-8', 'ignore').decode('utf-8', 'ignore')

        system_message = sanitize_input(system_message)
        user_message = sanitize_input(user_message)

        payload = {
            "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",  # should match the model served by MODEL_URL
            "messages": [
                {"role": "system", "content": system_message},
                {"role": "user", "content": user_message}
            ],
            "max_tokens": 5000,
            "temperature": 0.7,
            "stream": streaming
        }

        try:
            response = requests.post(MODEL_URL, headers=headers, data=json.dumps(payload), stream=True, timeout=300)
        except requests.exceptions.RequestException as e:
            print(f"Error while making the API call: {e}")
            # This function is a generator: errors must be yielded, a returned value would be discarded
            yield f"Error: {e}"
            return

        if response.status_code != 200:
            print(f"Error Response: {response.text}")
            yield f"Error: {response.status_code}, {response.text}"
            return

        # Forward the response line by line (SSE events when streaming, the JSON body otherwise)
        for chunk in response.iter_lines():
            if chunk:
                yield chunk.decode('utf-8')
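
When stream is true, the route forwards the LLM's lines unchanged, each followed by a blank line. Assuming the LLM endpoint emits OpenAI-compatible server-sent events ("data: {...}" lines, ending with "data: [DONE]"), a client can reassemble the answer from the delta.content fields. iter_answer_text is a hypothetical client-side helper, not part of this app:

```python
import json
from typing import Iterable, Iterator


def iter_answer_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield answer fragments from OpenAI-style server-sent event lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines, comments, keep-alives
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        try:
            event = json.loads(data)
        except json.JSONDecodeError:
            yield data  # e.g. the "data: Error - ..." line that event_generator emits
            continue
        if not isinstance(event, dict):
            continue
        for choice in event.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


events = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": " world"}}]}',
    "data: [DONE]",
]
print("".join(iter_answer_text(events)))  # Hello world
```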

    Complete Code Implementation
    app.py
    from flask import Flask, request, jsonify, Response
    import os
    import json
    import fitz  # PyMuPDF: extracts text, images, and other content from PDF files
    import requests
    import psycopg
    from langchain.text_splitter import CharacterTextSplitter
    from pgvector.psycopg import register_vector
    from flask_cors import CORS
    import numpy as np
    from dotenv import load_dotenv

    app = Flask(__name__)

    # Load environment variables from a .env file
    load_dotenv()

    # Enable cross-origin resource sharing for all routes and origins
    CORS(app, resources={r"/*": {"origins": "*"}})

    # Load the environment variable for the model URL
    MODEL_URL = os.getenv("MODEL_URL")

    # Load the environment variable for the authentication token
    TOKEN = os.getenv("TOKEN")

    # Load the environment variable for the embedding model URL
    EMBEDDING_MODEL_URL = os.getenv("EMBEDDING_MODEL_URL")

    # Load the environment variable for the embedding model name
    EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL")

    # Load environment variables for the database configuration
    DB_CONFIG = {
        "host": os.getenv("DB_HOST"),  # Database host
        "port": os.getenv("DB_PORT"),  # Database port
        "dbname": os.getenv("DB_NAME"),  # Database name
        "user": os.getenv("DB_USER"),  # Database user
        "password": os.getenv("DB_PASSWORD"),  # Database password
    }

    # PostgreSQL connection setup
    def get_db_connection():
        try:
            # Connect to the database using psycopg
            conn = psycopg.connect(
                host=DB_CONFIG["host"],
                port=DB_CONFIG["port"],
                dbname=DB_CONFIG["dbname"],
                user=DB_CONFIG["user"],
                password=DB_CONFIG["password"]
            )
            return conn
        except Exception as e:
            print(f"Error connecting to the database: {e}")
            # Re-raise so callers do not continue without a connection
            raise

    # Ensures the pgvector extension is installed and available in the PostgreSQL database
    def create_pgvector_extension():
        # Connect to the database
        conn = get_db_connection()
        # A cursor executes SQL statements and retrieves results
        cursor = conn.cursor()

        cursor.execute("CREATE EXTENSION IF NOT EXISTS vector;")
        # Save the changes made during the current transaction
        conn.commit()

        cursor.close()
        conn.close()

    # Creates the table that stores each chunk, its source document, and its embedding
    def create_document_vectors_table():
        conn = get_db_connection()
        cursor = conn.cursor()

        # VECTOR(1536) fixes the embedding size; shorter embeddings are padded to 1536 before insertion
        create_table_query = """
        CREATE TABLE IF NOT EXISTS document_vectors (
            id SERIAL PRIMARY KEY,
            document_name TEXT NOT NULL,
            chunk TEXT NOT NULL,
            embedding VECTOR(1536)
        );
        """
        cursor.execute(create_table_query)
        conn.commit()

        cursor.close()
        conn.close()

    # Helper function to extract text from a PDF
    def extract_text_from_pdf(pdf_path):
        full_text = ""
        # The with block closes the document when extraction is done
        with fitz.open(pdf_path) as doc:
            for page_num in range(doc.page_count):
                page = doc.load_page(page_num)
                page_text = page.get_text()
                # Skip pages without extractable text (e.g. scanned images)
                if page_text.strip():
                    full_text += page_text
        return full_text

    # Split text into smaller chunks using LangChain's CharacterTextSplitter
    def split_text_into_chunks(text, max_chunk_size=512):
        # Aim for chunks of at most max_chunk_size characters, overlapping by 100 characters
        text_splitter = CharacterTextSplitter(chunk_size=max_chunk_size, chunk_overlap=100)

        # Split the text according to the splitter's settings
        chunks = text_splitter.split_text(text)
        return chunks

    # Helper function to get the embedding for a given text
    def get_embeddings(text):
        headers = {
            "accept": "application/json",
            "Content-Type": "application/json"
        }
        payload = {
            "model": EMBEDDING_MODEL,
            "input": [text],
            "encoding_format": "float",
            # Keep at most the last 256 tokens (the input length all-MiniLM-L6-v2 is configured for);
            # a value of 1 would embed only a single token of each chunk
            "truncate_prompt_tokens": 256,
            "add_special_tokens": False,
            "priority": 0
        }

        response = requests.post(EMBEDDING_MODEL_URL, json=payload, headers=headers, timeout=60)
        if response.status_code != 200:
            raise Exception(f"Failed to get embeddings: {response.status_code} - {response.text}")

        data = response.json().get("data", [])
        if not data:
            raise ValueError("Embedding response contains no data.")
        embedding = data[0].get("embedding", [])
        if not isinstance(embedding, list):
            raise TypeError("Embedding response should be a list of floats.")

        # Pad or truncate to the 1536 dimensions expected by the document_vectors table
        return adjust_embedding_size(embedding, desired_size=1536)

    # Pad or truncate an embedding to the fixed size of the VECTOR(1536) column.
    # all-MiniLM-L6-v2 returns 384 values, so its embeddings are zero-padded.
    def adjust_embedding_size(embedding, desired_size=1536):
        if not isinstance(embedding, list):
            raise TypeError("Expected 'embedding' to be a list.")
        # Always return a numpy array, which add_to_pgvector and search_pgvector expect
        vector = np.asarray(embedding, dtype=float)
        if len(vector) > desired_size:
            vector = vector[:desired_size]
        elif len(vector) < desired_size:
            vector = np.pad(vector, (0, desired_size - len(vector)), 'constant')
        return vector

    # Function to add an embedding to PostgreSQL using pgvector
    def add_to_pgvector(embeddings, text_chunk, source):
        if isinstance(embeddings, np.ndarray) and embeddings.ndim == 1:
            # psycopg sends the list as an array, which PostgreSQL casts to vector on insert
            embedding_vector = embeddings.tolist()
        else:
            raise TypeError(f"Expected a 1-D numpy array, but got: {type(embeddings)}")

        conn = get_db_connection()
        cursor = conn.cursor()

        insert_query = """
        INSERT INTO document_vectors (document_name, chunk, embedding)
        VALUES (%s, %s, %s);
        """

        # Execute the insert query
        cursor.execute(insert_query, (source, text_chunk, embedding_vector))
        conn.commit()

        cursor.close()
        conn.close()


    # Function to search pgvector for the chunks closest to the query
    def search_pgvector(query_embedding, top_k=5):
        if isinstance(query_embedding, np.ndarray):
            query_vector = query_embedding.tolist()
        else:
            raise TypeError(f"Expected query_embedding to be a numpy array, but got: {type(query_embedding)}")

        conn = get_db_connection()
        cursor = conn.cursor()

        # <=> is pgvector's cosine distance operator: smaller values mean more similar chunks
        search_query = """
        SELECT document_name, chunk, embedding
        FROM document_vectors
        ORDER BY embedding <=> %s::vector(1536)
        LIMIT %s;
        """

        # The query vector is cast to vector(1536) inside the SQL
        cursor.execute(search_query, (query_vector, top_k))
        results = cursor.fetchall()

        cursor.close()
        conn.close()

        return results


    @app.route('/upload_pdf_and_ask', methods=['POST'])
    def upload_pdf_and_ask():
        if 'file' not in request.files:
            return jsonify({"error": "Missing files"}), 400

        files = request.files.getlist('file')
        question = request.form.get("question")
        if not question:
            return jsonify({"error": "Missing question"}), 400
        streaming = request.form.get("stream") == 'true'

        os.makedirs('uploads', exist_ok=True)
        # Keep only the base name so a crafted filename cannot write outside the uploads folder
        filenames = [os.path.basename(file.filename) for file in files]

        try:
            for file, filename in zip(files, filenames):
                file_path = os.path.join('uploads', filename)
                file.save(file_path)  # save the uploaded file to local storage

                # Extract the text from the PDF and split it into chunks
                text = extract_text_from_pdf(file_path)
                chunks = split_text_into_chunks(text, max_chunk_size=512)

                for chunk in chunks:
                    # Convert each chunk into an embedding and store it in the database
                    embeddings = get_embeddings(chunk)
                    add_to_pgvector(embeddings, chunk, filename)

            # Embed the question and search for the closest chunks
            question_embedding = get_embeddings(question)
            search_results = search_pgvector(question_embedding)
        finally:
            # Remove the uploaded files and their vectors, even if processing failed
            for filename in filenames:
                file_path = os.path.join('uploads', filename)
                if os.path.exists(file_path):
                    os.remove(file_path)
                delete_from_pgvector(filename)

        # Each row is (document_name, chunk, embedding); use the best-matching chunk as context
        if search_results:
            best_match = search_results[0][1]
        else:
            best_match = "No relevant match found."

        # Limit the context to 4096 characters (not tokens)
        max_chars = 4096
        context = best_match[:max_chars]

        # Prepare the system and user messages for the model
        system_message = "You are a helpful assistant that answers questions to the point based on the provided documents. Please limit your answer to 200 words."
        user_message = f"Question: {question}\nContext: {context}"

        if streaming:
            return Response(event_generator(system_message, user_message, TOKEN, streaming=True),
                            content_type='text/event-stream;charset=utf-8', status=200)
        return Response(event_generator(system_message, user_message, TOKEN, streaming=False),
                        content_type='application/json', status=200)

    # Delete all chunks of a document from pgvector
    def delete_from_pgvector(source):
        conn = get_db_connection()
        cursor = conn.cursor()

        delete_query = """
        DELETE FROM document_vectors
        WHERE document_name = %s;
        """
        cursor.execute(delete_query, (source,))
        conn.commit()

        cursor.close()
        conn.close()

    # Event stream generator
    def event_generator(system_message, user_message, token, streaming):
        """Yield the model output; the trailing blank line ends each server-sent event."""
        try:
            for chunk in get_custom_model_answer(system_message, user_message, token, streaming):
                yield f"{chunk}\n\n"
        except Exception as e:
            yield f"data: Error - {str(e)}\n\n"

    # Function to get the response from the model API
    def get_custom_model_answer(system_message, user_message, token, streaming):
        headers = {
            'Content-Type': 'application/json',
            'Authorization': f'Bearer {token}'
        }

        def sanitize_input(input_text):
            # Drop characters that cannot be encoded as UTF-8 (e.g. lone surrogates)
            return input_text.encode('utf-8', 'ignore').decode('utf-8', 'ignore')

        system_message = sanitize_input(system_message)
        user_message = sanitize_input(user_message)

        payload = {
            "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",  # should match the model served by MODEL_URL
            "messages": [
                {"role": "system", "content": system_message},
                {"role": "user", "content": user_message}
            ],
            "max_tokens": 5000,
            "temperature": 0.7,
            "stream": streaming
        }

        try:
            response = requests.post(MODEL_URL, headers=headers, data=json.dumps(payload), stream=True, timeout=300)
        except requests.exceptions.RequestException as e:
            print(f"Error while making the API call: {e}")
            # This function is a generator: errors must be yielded, a returned value would be discarded
            yield f"Error: {e}"
            return

        if response.status_code != 200:
            print(f"Error Response: {response.text}")
            yield f"Error: {response.status_code}, {response.text}"
            return

        # Forward the response line by line (SSE events when streaming, the JSON body otherwise)
        for chunk in response.iter_lines():
            if chunk:
                yield chunk.decode('utf-8')

    # Run the Flask app
    if __name__ == '__main__':
        # Make sure the pgvector extension is installed in the database
        create_pgvector_extension()
        # Create the document_vectors table if it does not exist yet
        create_document_vectors_table()
        # Start the Flask development server on all interfaces, port 8000
        app.run(debug=False, host="0.0.0.0", port=8000)

    STEP 04.09: Add "requirements.txt" file

    The requirements.txt file in a Flask application (or any Python application) is used to list all the Python dependencies (libraries and packages) required to run the application. It serves as a blueprint for managing and sharing the dependencies of the application.

    requirements.txt
    flask
    numpy
    PyMuPDF
    requests
    python-dotenv
    flask-cors
    pgvector
    langchain
    psycopg[binary]
    Pillow
    Pillow

    STEP 04.10: Add Procfile

    The Procfile tells the hosting platform how to start your Flask application, such as which server to use and which application module to load.

    web: python app.py

    STEP 04.11: Run your application

    With the implementation complete, open a terminal in your project folder, install the dependencies with pip install -r requirements.txt, and run python app.py. The server starts and prints the URLs it is listening on; use one of them to test your API, for example with Postman Desktop.

    API to test:

    local_url/route_path

    For our application, the API is (your local address will differ):

    http://192.168.0.192:8000/upload_pdf_and_ask

    Send it as a POST request with form-data: one or more file fields containing PDFs, a question field, and optionally stream set to true.
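
The same request can also be built from a script. A sketch using requests; build_ask_request is a hypothetical helper, the localhost URL assumes the default port from app.run, and the PDF bytes are a placeholder rather than a real document:

```python
import requests


def build_ask_request(url: str, pdfs: dict, question: str, stream: bool = False) -> requests.PreparedRequest:
    """Multipart POST matching the route: one "file" part per PDF plus "question" and "stream" fields."""
    files = [("file", (name, content, "application/pdf")) for name, content in pdfs.items()]
    data = {"question": question, "stream": "true" if stream else "false"}
    return requests.Request("POST", url, files=files, data=data).prepare()


prepared = build_ask_request(
    "http://localhost:8000/upload_pdf_and_ask",
    {"manual.pdf": b"%PDF-1.4 placeholder bytes"},
    "What does chapter 2 cover?",
)
print(prepared.headers["Content-Type"].split(";")[0])  # multipart/form-data
# To send it to a running server: requests.Session().send(prepared, timeout=300)
```

Passing real file contents (for example open("manual.pdf", "rb").read()) instead of the placeholder bytes sends an actual document.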

STEP 05: Deploy your API (RAG Application Backend)

To deploy your application, create a GPU-enabled workspace. Click the Create button located to the left of the organization switcher, and then select Application.

Then follow these steps:

  1. Enter the application details and select the GPU-enabled workspace.

  2. Configure the runtime environment variables.

  • You can add all of them at once by uploading your .env file.

  3. Configure your application, then click "Next".

  4. Review the preview screen and confirm your details. If you need to make any changes, click "Back"; otherwise, click "Submit".

  5. Once your application is deployed, you receive an endpoint.

    Application endpoint:

    URL_getAfterApplicationDeployment/route_path

    For our application:

    https://ragapi.test.devapp.nyc1.initz.run/upload_pdf_and_ask

STEP 06: Integration of endpoint

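To use the endpoint from a user interface, build a frontend that lets users upload PDFs and type a question, and have it send a multipart POST request (file, question, and optionally stream fields) to the deployed /upload_pdf_and_ask URL. If the frontend requests a streamed answer, it should read the response incrementally as the events arrive.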