Intro to LLM Applications
Explore a micro illustration of a common LLM app pattern, using Python and OpenAI's GPT-3.5-Turbo model via API. Understand the key ideas behind tools like vector databases, Langchain/LlamaIndex, data extraction, and function calling.
from pathlib import Path
import openai  # this walkthrough uses the pre-1.0 openai package interface (openai.ChatCompletion)
We’ll demo a micro illustration of a common LLM app pattern, using basic Python along with access to OpenAI’s GPT-3.5-Turbo model (accessed via API).
The purpose is to understand the key ideas underlying more complex tools like vector databases, Langchain/LlamaIndex, structured data extraction, and function calling.
We won’t cover creating and training new large language models – we’ll assume that we already have one.
Roadmap: end-to-end overview
openaikey = Path('openaikey.txt').read_text().strip()  # strip the trailing newline so the key is sent cleanly
openai.api_key = openaikey
model="gpt-3.5-turbo"
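Most apps are based around two kinds of prompts: a system message, which sets the model's overall role and behavior, and a user message, which carries the specific request.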
There are various tricks and techniques for eliciting various behaviors from different models … but the basics are straightforward.
# Define the system message
system_msg = 'You are a helpful assistant.'
# Define the user message -- i.e., the prompt
user_msg = 'What is your favorite place to visit in San Francisco?'
Now we can ask the LLM to respond. OpenAI’s ChatCompletion API simplifies and implements the pattern.
# call GPT
response = openai.ChatCompletion.create(model=model,
                                        messages=[{"role": "system", "content": system_msg},
                                                  {"role": "user", "content": user_msg}])
response.choices[0].message["content"]
Since we’ll be interacting a lot, we can wrap this logic in a helper function. We’ll hide most of the params for now, but expose an optional “temperature” which specifies how creative (or chaotic) we would like the model to be.
def quick_chat(user, temp=1.0):
    response = openai.ChatCompletion.create(model=model, temperature=temp,
                                            messages=[{"role": "system", "content": 'You are a helpful assistant.'},
                                                      {"role": "user", "content": user}])
    return response.choices[0].message["content"]
quick_chat(user_msg)
A low temperature may produce more spare, conservative responses with less likelihood of hallucination
quick_chat(user_msg, temp=0.1)
A higher temperature produces more creative responses … but there may not be a huge difference
quick_chat(user_msg, temp=1.8)
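To see why temperature has this effect, it helps to look at what the parameter does to next-token probabilities. The following is a minimal sketch, not OpenAI's implementation: the token scores in `logits` are made-up values, and `softmax_with_temperature` is a hypothetical helper name. It divides the scores by the temperature before applying a softmax, which is the usual definition of sampling temperature.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [score / temperature for score in logits]
    # Subtract the max before exponentiating for numerical stability
    biggest = max(scaled)
    exps = [math.exp(s - biggest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next tokens (illustrative only)
logits = [2.0, 1.0, 0.5]

for temp in (0.1, 1.0, 1.8):
    probs = softmax_with_temperature(logits, temp)
    print(temp, [round(p, 3) for p in probs])
```

At a low temperature nearly all of the probability lands on the top-scoring token, so the output is close to deterministic. At a higher temperature the distribution flattens, and less likely tokens are sampled more often.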
Many common facts are heavily covered in the LLM training data, so the model can easily return them.
But what happens if we ask an unusual or impossible question?
quick_chat("Who is the CFO of Monkeylanguage LLC?")
Well-tuned LLMs should decline to provide an answer … although less-well-tuned ones may simply make up (“hallucinate”) an answer.
A common category of LLM apps attempts to use the LLM as a sort of natural language user interface to query specific information. When that information is unlikely to be in the training data, and we don’t want hallucinated answers, there is a simple trick:
Jam the relevant facts into the prompt.
Let’s try that by adding in some organization info for a fictional company, Monkeylanguage LLC, into our chatbot prompt.
base_prompt = """
You are a helpful assistant who can answer questions about the team at Monkeylanguage LLC, an AI startup.
When answering questions, use the following facts about Monkeylanguage LLC employees:
1. Juan Williams is the CEO
2. Linda Johnson is the CFO
3. Robert Jordan is the CTO
4. Aileen Xin is Engineering Lead
If you don't have information to answer a question, please say you don't know. Don't make up an answer
"""
Since we’re modifying the base prompt now, we’ll need a new version of our quick_chat shortcut that takes the system prompt along with the user prompt.
def chat(system, user):
    response = openai.ChatCompletion.create(model=model,
                                            messages=[{"role": "system", "content": system},
                                                      {"role": "user", "content": user}])
    return response.choices[0].message["content"]
Now we can ask about our fictional company
chat(base_prompt, "Who is the CFO of Monkeylanguage LLC?")
chat(base_prompt, "Who are all of the technical staff members at Monkeylanguage LLC?")
But how do we get the right content to insert into the prompt?
We use a trick: search the user's query for clues, and use them to look up the relevant facts.
In production apps, we usually use a database that matches natural language texts semantically, via embedding vector similarity: a “vector database”.
But we can demonstrate this with a toy database
database = {
    'Monkeylanguage LLC' : ['Juan Williams is the CEO', 'Linda Johnson is the CFO', 'Robert Jordan is the CTO', 'Aileen Xin is Engineering Lead'],
    'FurryRobot Corp' : ['Ana Gonzalez is the CEO', 'Corwin Hall is the CFO', 'FurryRobot employs no technical staff', 'All tech is produced by AI'],
    'LangMagic Inc' : ["Steve Jobs' ghost fulfills all roles in the company"]
}
prompt = 'Who is the CFO at Monkeylanguage LLC?'
We’ll define a trivial lookup helper that returns all of the facts for the first company whose name (the dict key) is in the query
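def lookup(prompt, database):
    # return the facts for the first company whose name appears in the prompt
    for k in database.keys():
        if k in prompt:
            return database[k]
    return []  # no company matched: return no facts instead of None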
docs = lookup(prompt, database)
docs
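The lookup above relies on an exact substring match on the company name. A vector database instead ranks stored texts by how similar their embedding vectors are to the query's. As a rough sketch of that idea, the code below swaps in a crude stand-in for a learned embedding (a bag-of-words count vector) and ranks facts by cosine similarity. The names `embed`, `cosine`, and `top_k` are hypothetical, the `facts` list is adapted from the toy database for illustration, and a real system would use a trained embedding model.

```python
import math
import re
from collections import Counter

def embed(text):
    """Crude stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query, texts, k=2):
    """Return the k texts most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(query_vec, embed(t)), reverse=True)
    return ranked[:k]

# Sample facts adapted from the toy database (illustrative only)
facts = [
    'Linda Johnson is the CFO of Monkeylanguage LLC',
    'Corwin Hall is the CFO of FurryRobot Corp',
    'Aileen Xin is Engineering Lead at Monkeylanguage LLC',
]

print(top_k('Who is the CFO at Monkeylanguage LLC?', facts, k=1))
```

Word counts only capture shared vocabulary, whereas learned embeddings can also match paraphrases, which is why production systems use them.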
We can code a helper to build the system prompt from a set of relevant documents
def make_base_prompt(docs):
    return """
You are a helpful assistant who can answer questions about the team at some AI startup companies.
When answering questions, use the following facts about employees at the firm:
""" + '\n'.join(docs) + """
If you don't have information to answer a question, please say you don't know. Don't make up an answer"""
make_base_prompt(docs)
And now we can “chat” with our “data”
def retrieve_and_chat(prompt, database):
    docs = lookup(prompt, database)
    base_prompt = make_base_prompt(docs)
    return chat(base_prompt, prompt)
retrieve_and_chat(prompt, database)
retrieve_and_chat('Who is the CFO at FurryRobot Corp?', database)
Some queries are “harder” … and the model may not get it right on the first try without either more data or more sophisticated prompting.
But in this example, the model usually gets the right answer in one or two tries
retrieve_and_chat('Who is the CFO at LangMagic Inc?', database)
The process we’ve just implemented – albeit with more data, a more sophisticated approach to storing and querying, and more complex prompts – is at the heart of many LLM-powered apps.
It’s a pattern called “Retrieval Augmented Generation”, or RAG.
To connect the LLM to the “real world”, we can ask the LLM to generate a function call or API call based on our interaction.
We can then use that API or function call to trigger a real-world result, like a grocery order.
How does this work?
The essence of teaching an LLM to use functions is just more prompt engineering.
We can either provide the LLM with all of the available tools, or we can retrieve relevant tools from a larger collection based on the user prompt. We can even have the LLM itself choose the tools, using a pattern like the RAG approach we saw earlier.
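The second option, retrieving only the relevant tools, can be sketched with a simple keyword match. This is an illustration under stated assumptions rather than how any particular framework does it: `tool_catalog`, `select_tools`, and the `SCHEDULE` tool are hypothetical, and a larger app would more likely rank tool descriptions by embedding similarity.

```python
# Hypothetical tool catalog: each entry pairs trigger keywords with the
# instruction text that would be injected into the prompt.
tool_catalog = [
    {'keywords': ['email', 'mail'],
     'description': 'If you wish to email, return the function call '
                    'EMAIL(email_subject, email_body), inserting the relevant '
                    'email_subject and email_body.'},
    {'keywords': ['meeting', 'schedule'],
     'description': 'If you wish to schedule a meeting, return the function '
                    'call SCHEDULE(attendee, date).'},  # made-up second tool
]

def select_tools(user_prompt, catalog):
    """Return descriptions of tools whose keywords appear in the prompt."""
    text = user_prompt.lower()
    return [tool['description'] for tool in catalog
            if any(keyword in text for keyword in tool['keywords'])]

print(select_tools('Please send an email to the team', tool_catalog))
```

Only the matching descriptions are then passed along to the prompt, which keeps it short when the full catalog is large.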
tools = ['If you wish to email, return the function call EMAIL(email_subject, email_body), inserting the relevant email_subject and email_body.']
We’ll inject the tool description(s) into the base prompt
def make_enhanced_base_prompt(docs, tools):
    return """
You are a helpful assistant who can answer questions about the team at some AI startup companies.
When answering questions, use the following facts about employees at the firm:
""" + '\n'.join(docs) + """
If you don't have information to answer a question, please say you don't know. Don't make up an answer.
You can also use tools to accomplish some actions.
""" + '\n'.join(tools) + """
If you use a tool, return the tool function call and nothing else.
"""
make_enhanced_base_prompt(docs, tools)
And now we can ask the AI to do something … and hopefully it will produce the right invocation
chat(make_enhanced_base_prompt(docs, tools),
     'Please send an email advertising a new role as assistant to the CFO of Monkeylanguage LLC. Name the CFO, and send the email from the CEO')
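The model's reply is still just text, so the app has to parse it and run the matching function itself. Here is one possible sketch, assuming the model returns something like `EMAIL("subject", "body")` with quoted string arguments (real replies may not be this tidy). `send_email`, `TOOL_REGISTRY`, and `run_tool_call` are hypothetical names, `send_email` only returns what it would send, and the sample `reply` is hand-written rather than actual model output. The standard-library `ast` module parses the call without executing arbitrary code.

```python
import ast

def send_email(email_subject, email_body):
    """Placeholder action: a real app would call a mail API here."""
    return {'subject': email_subject, 'body': email_body}

# Map the tool names used in the prompt to Python callables
TOOL_REGISTRY = {'EMAIL': send_email}

def run_tool_call(reply, registry):
    """Parse a reply like EMAIL("a", "b") and dispatch to the registry.

    Returns None when the reply is not a known call with literal arguments.
    """
    try:
        tree = ast.parse(reply.strip(), mode='eval')
    except (SyntaxError, ValueError):
        return None  # ordinary prose, not a function call
    call = tree.body
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
        return None
    if call.func.id not in registry or call.keywords:
        return None
    try:
        # literal_eval only accepts constants, so no code is executed
        args = [ast.literal_eval(arg) for arg in call.args]
    except ValueError:
        return None
    return registry[call.func.id](*args)

# Hand-written example reply (not real model output)
reply = 'EMAIL("New role: Assistant to the CFO", "We are hiring an assistant to Linda Johnson.")'
print(run_tool_call(reply, TOOL_REGISTRY))
```

Looking the name up in a registry, instead of calling `eval` on the reply, means the model can only trigger functions the app has explicitly exposed. A call with the wrong number of arguments still surfaces as a `TypeError` from the tool itself.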
In a nutshell, that is how many (maybe most) of the AI-powered apps being built today work.
Packages like LlamaIndex, LangChain, and others help automate sophisticated patterns of prompt generation and content/tool merging.
And semantic vector databases (along with proper “chunking” and ingestion of relevant datasets) provide knowledge to the LLM beyond what it learned in training.
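Chunking here means splitting long source documents into pieces small enough to retrieve individually and fit into a prompt. A minimal sketch follows, assuming a simple fixed-size word window with overlap, which is only one of several possible strategies; `chunk_words` and its parameters are hypothetical names.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into word-based chunks that overlap by `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(' '.join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# Synthetic text so the chunk boundaries are easy to see
sample = ' '.join(f'word{i}' for i in range(12))
for chunk in chunk_words(sample, chunk_size=5, overlap=2):
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of storing some text twice.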