
Policy Insights: Chatbots and RAG in Health Insurance Navigation

Shreyash Panchal

Artificial Intelligence / Machine Learning

Introduction

Understanding health insurance policies can often be complicated, leaving individuals to tackle lengthy and difficult documents. The complexity introduced by these policies’ language not only adds to the confusion but also leaves policyholders uncertain about the actual extent of their coverage, the best plan for their needs, and how to seek answers to their specific policy-related questions. In response to these ongoing challenges and to facilitate better access to information, a fresh perspective is being explored—an innovative approach to revolutionize how individuals engage with their health insurance policies.

Challenges in Health Insurance Communication

Health insurance queries are inherently complex, often involving nuanced details that require precision. Traditional chatbots, lacking the finesse of generative AI (GenAI), struggle to handle the intricacies of healthcare-related questions. The envisioned health insurance chatbot powered by GenAI overcomes these limitations, offering a sophisticated understanding of queries and delivering responses that align with the complexities of the healthcare sphere.

Retrieval-Augmented Generation

Retrieval-augmented generation, or RAG, is an architectural approach that can improve the efficacy of large language model (LLM) applications by leveraging custom data. This is done by retrieving data or documents relevant to a question or task and providing them as context for the LLM. RAG has shown success in supporting chatbots and Q&A systems that need to maintain up-to-date information or access domain-specific knowledge.

To learn more about this topic, see the following resources for technical insights and additional information.

1. https://www.oracle.com/in/artificial-intelligence/generative-ai/retrieval-augmented-generation-rag/

2. https://research.ibm.com/blog/retrieval-augmented-generation-RAG

The Dual Phases of RAG: Retrieval and Content Generation

Retrieval-augmented generation (RAG) combines two essential steps: retrieval and content generation. First, retrieval algorithms search external knowledge bases for data relevant to the user's query. The retrieved information then becomes the foundation for the next phase, content generation, in which the large language model uses both the augmented prompt and its internal training data to create responses that are not only accurate but also contextually appropriate.

Advantages of Deploying RAG in AI Chatbots

Scalability is a key advantage of RAG over traditional models. Instead of relying on a monolithic model attempting to memorize vast amounts of information, RAG models can easily scale by updating or expanding the external database. This flexibility enables them to manage and incorporate a broader range of data efficiently.

Memory efficiency is another strength of RAG in comparison to models like GPT. While traditional models have limitations on the volume of data they can store and recall, RAG efficiently utilizes external databases. This approach allows RAG to fetch fresh, updated, or detailed information as needed, surpassing the memory constraints of conventional models.

Moreover, RAG offers flexibility in its knowledge sources. By modifying or enlarging the external knowledge base, a RAG model can be adapted to specific domains without the need for retraining the underlying generative model. This adaptability ensures that RAG remains a versatile and efficient solution for various applications.

The application flow is outlined below. In the development of our health insurance chatbot, the relevant PDF documents are first loaded so the model has access to the intricacies of health insurance policies. These documents are split into smaller chunks of text, and each chunk is transformed into a numerical vector through a process known as embedding. These numerical representations are stored in ChromaDB for quick retrieval.

When a user poses a question, the query is converted into the same kind of numerical representation, and the closest matching vectors are retrieved from ChromaDB. Using a large language model (LLM), the chatbot then crafts a nuanced response grounded in the retrieved content. This method ensures a smooth and efficient conversational experience. Armed with a wealth of health insurance information, the chatbot delivers precise and contextually relevant responses to user inquiries, establishing itself as a valuable resource for navigating the complexities of health insurance queries.

Role of Vector Embedding

Traditional search engines mainly focus on finding specific words in your search. For example, if you search "best smartphone," it looks for pages with exactly those words. On the other hand, semantic search is like a more understanding search engine. It tries to figure out what you really mean by considering the context of your words.

Imagine you are planning a vacation and want to find a suitable destination, so you input the query "warm places to visit in winter." In a traditional search, the engine would look for exact matches of these words on web pages. Results might include pages with those specific terms, but the relevance might vary. A semantic search, in contrast, interprets the intent behind the query and can surface tropical or sunny destinations even when a page never uses those exact words.

Text, audio, and video can be embedded:

An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness, and large distances suggest low relatedness.

For example:

bat: [0.6, -0.3, 0.8, ...]

ball: [0.4, -0.2, 0.7, ...]

wicket: [-0.5, 0.6, -0.2, ...]

In this cricket-themed example, each word (bat, ball, wicket) is represented as a vector in a multi-dimensional space, capturing the semantic relationships between cricket-related terms.
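As a rough illustration of how such embeddings are produced in practice, here is a small sketch using the pre-1.0 OpenAI Python client (the model name is an assumption; real embeddings have far more dimensions than the toy vectors above):

```python
import openai  # pre-1.0 openai package, matching the LangChain generation used later

openai.api_key = "sk-..."  # your OpenAI API key

# Embed the cricket-themed words; text-embedding-ada-002 returns 1536-dimensional vectors
response = openai.Embedding.create(
    model="text-embedding-ada-002",
    input=["bat", "ball", "wicket"],
)
vectors = [item["embedding"] for item in response["data"]]
print(len(vectors[0]))  # 1536 floats per word, not the 3-4 values shown above
```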

For a deeper understanding, you may explore additional insights in the following articles:

1. https://www.datastax.com/guides/what-is-a-vector-embedding

2. https://www.pinecone.io/learn/vector-embeddings/

3. https://weaviate.io/blog/vector-embeddings-explained/

A specialized type of database known as a vector database is essential for storing these numerical representations. In a vector database, data is stored as mathematical vectors, providing a unique way to store and retrieve information. This specialized database greatly facilitates machine learning models in retaining and recalling previous inputs, enabling powerful applications in search, recommendations, and text generation.

Vector retrieval in a database involves finding the nearest neighbors, i.e., the most similar vectors, to a given query vector. The following metrics are commonly used to measure similarity (a short NumPy sketch follows the list):

1. The Euclidean distance metric considers both magnitudes and direction, providing a comprehensive measure for assessing the spatial separation between vectors.

2. Cosine similarity focuses solely on the direction of vectors, offering insights into their alignment within the vector space.

3. The dot product similarity metric takes into account both magnitude and direction, offering a versatile approach for evaluating the relationships between vectors.
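Here is a minimal NumPy sketch of the three metrics, using the toy cricket vectors from earlier (the values are illustrative only):

```python
import numpy as np

bat = np.array([0.6, -0.3, 0.8])
ball = np.array([0.4, -0.2, 0.7])

euclidean = np.linalg.norm(bat - ball)  # smaller distance = more related
cosine = np.dot(bat, ball) / (np.linalg.norm(bat) * np.linalg.norm(ball))  # closer to 1 = more aligned
dot = np.dot(bat, ball)  # larger value = more similar (sensitive to magnitude)

print(euclidean, cosine, dot)
```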

ChromaDB, Pinecone, and Milvus are a few examples of vector databases.

For our application, we will use LangChain, OpenAI embeddings and LLMs, and ChromaDB.

1. We need to install the Python packages required for this application.

CODE: https://gist.github.com/velotiotech/d77b50b04b73d48cb0d34239f4542dd5.js
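The gist is not reproduced here, but based on the package descriptions below, the install step is likely along these lines (exact package names and versions are assumptions):

```
pip install langchain openai chromadb langchainhub pypdf2 tiktoken
```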

A. LangChain is a tool that helps you build intelligent applications using language models. It allows you to develop chatbots, personal assistants, and applications that can summarize, analyze, or respond to questions about documents or data. It's useful for tasks like coding assistance, working with APIs, and other activities that gain an advantage from AI technology.

B. OpenAI is a renowned artificial intelligence research lab. Installing the OpenAI package provides access to OpenAI's language models, including powerful ones like GPT-3. This library is crucial if you plan to integrate OpenAI's language models into your applications.

C. As mentioned earlier, ChromaDB is a vector database package designed to handle vector data efficiently, making it suitable for applications that involve similarity searches, clustering, or other operations on vectors.

D. LangChainHub is a handy tool to make your language tasks easier. It begins with helpful prompts and will soon include even more features like chains and agents.

E. PyPDF2 is a library for working with PDF files in Python. It allows reading and manipulating PDF documents, making it useful for tasks such as extracting text or merging PDF files.

F. Tiktoken is a Python library designed for counting the number of tokens in a text string without making an API call. This can be particularly useful for managing token limits when working with language models or APIs that have usage constraints.

2. Importing Libraries

CODE: https://gist.github.com/velotiotech/6aa6abe9dcb09844ab001454fd64fc4f.js
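The imports live in the gist; with a pre-1.0 LangChain release (which these steps appear to use), they might look roughly like this:

```python
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
```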

3. Initializing OpenAI LLM

CODE: https://gist.github.com/velotiotech/5915bab23f151385a2e53e80045e87cd.js

This line of code initializes a language model (LLM) using OpenAI's GPT-4 model, which supports an 8,192-token context window. The temperature parameter influences the randomness of the generated text: a higher temperature results in more creative responses, while a lower temperature leads to more focused and deterministic answers.
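A minimal sketch of what this initialization might look like (the parameters are assumptions based on the description above):

```python
# GPT-4 has an 8,192-token context window; temperature=0 keeps answers focused and deterministic
llm = ChatOpenAI(model_name="gpt-4", temperature=0)
```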

4. We will load a PDF containing the policy material the chatbot should draw on and divide it into chunks of text that can be fed to the model.

CODE: https://gist.github.com/velotiotech/b77325619b91daa216bd1835984ef0b1.js
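A sketch of this step, assuming LangChain's PyPDFLoader and RecursiveCharacterTextSplitter (the file path and chunk sizes are illustrative):

```python
# Load the health insurance policy PDF and split it into overlapping text chunks
loader = PyPDFLoader("health_insurance_policy.pdf")  # illustrative path
pages = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(pages)
```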

5. We will load these text chunks into the vector database ChromaDB using OpenAI embeddings; the store is later used for retrieval.

CODE: https://gist.github.com/velotiotech/722a8803fced89ba17293b4a5e221478.js
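A sketch of loading the chunks into ChromaDB with OpenAI embeddings (the persistence directory is an assumption):

```python
# Embed each chunk with OpenAI embeddings and store the vectors in ChromaDB
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db",  # illustrative location
)
```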

6. Create a retriever object that returns the top three most similar vector matches for a query.

CODE: https://gist.github.com/velotiotech/ae8954ac67b4903614f3c80473802488.js
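The retriever can be created directly from the vector store; setting k to 3 returns the three closest matches:

```python
# Return the 3 most similar chunks for every query
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
```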

7. Creating a prompt to pass to the LLM for obtaining specific information involves crafting a well-structured question or instruction that clearly outlines the desired details. The RAG chain starts with the retriever and the formatted documents, passes them through the custom prompt template and the LLM, and concludes with a string output parser (StrOutputParser()) to handle the resulting response.

CODE: https://gist.github.com/velotiotech/23984ff2c09a42d679549326e574b8f7.js
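A sketch of the prompt and chain, following the standard LangChain Expression Language pattern (the prompt wording is an assumption, not the author's exact template):

```python
template = """You are a helpful assistant for health insurance policy questions.
Answer using only the context below. If the answer is not in the context, say you don't know.

Context: {context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

def format_docs(docs):
    # Join the retrieved chunks into a single context string for the prompt
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```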

8. Create a function to get a response from the chatbot.

CODE: https://gist.github.com/velotiotech/ae9964370ff5e2cbb786524d7ed245a8.js
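A minimal version of such a function, assuming the chain defined above:

```python
def get_response(query: str) -> str:
    # Retrieve relevant chunks, build the prompt, call the LLM, and parse the output
    return rag_chain.invoke(query)
```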

Finally, we can use Streamlit to build a simple UI for the app, calling this function from the Streamlit application to get the AI response.

CODE: https://gist.github.com/velotiotech/2991be28b53f5a98696d3e73a3dad8c5.js
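A small sketch of the Streamlit side, assuming the get_response function above (widget labels are illustrative):

```python
import streamlit as st

st.title("Health Insurance Policy Chatbot")

question = st.text_input("Ask a question about your policy:")
if question:
    with st.spinner("Searching your policy documents..."):
        answer = get_response(question)
    st.write(answer)
```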

Performance Insights

Conclusion

In our exploration of developing health insurance chatbots, we've dived into the innovative world of retrieval-augmented generation (RAG), where advanced technologies are seamlessly combined to reshape user interactions. The adoption of RAG has proven to be a game-changer, significantly enhancing the chatbot's abilities to understand, retrieve, and generate contextually relevant responses. However, it's worth mentioning a couple of limitations, including challenges in accurately calculating premium quotes and occasional inaccuracies in semantic searches.


Did you like the blog? If yes, we're sure you'll also like to work with the people who write them - our best-in-class engineering team.

We're looking for talented developers who are passionate about new emerging technologies. If that's you, get in touch with us.

Explore current openings
