(TLDR: skip to the demos… you can use custom action parameters to prompt the GPT to follow specific structures and store variables, which effectively gives you sub-system prompts and responses hidden inside the action call. The work is done entirely by structuring the action call itself, and the response to the action is static. This is useful for things like latent space activation without having to surface all of it directly to the user.)
Dave Hulbert proposed earlier this year that you could use an approach called “Tree-of-Thought Prompting” to boost ChatGPT’s reasoning abilities. The approach is pretty simple. The prompt that Hulbert uses as his main example is:
Imagine three different experts are answering this question. All experts will write down 1 step of their thinking, then share it with the group. Then all experts will go on to the next step, etc. If any expert realizes they're wrong at any point then they leave. The question is...
Variations of this have become a favorite subject of those annoying “This one prompt will revolutionize your life” posts you see all over social media these days, usually filling in the experts’ profiles to make it a little more clickbait-y. So instead of 3 generic experts you specify Elon Musk, Steve Jobs, and Sam Altman as the experts in the room talking through the question. One thing this has going for it: sometimes by using big, meme-able personalities you can get some pretty interesting variations in the results that go beyond the more generic advice you get from vanilla prompting. The real benefit of this tree-of-thought prompting technique is that it makes the language model reason step by step by forcing it to simulate specific thought processes. Based on its popularity as a prompting mechanism, I predict that this approach will be popular as people build out GPTs for themselves and others.
While this technique has interesting benefits, it definitely has its downsides as well.
The main downside is that it is annoying and clunky to have to sort through all of the parts of the conversation just to get the answer you’re looking for. It would be better if ChatGPT could simply think through the hypothetical conversation between the experts, allowing the prompt to steer it, and then at the end give you the answer after thinking it through. Because of the way this works, in order for ChatGPT to “think things through” it has to “think” by writing out the response. It’s like we have a window into ChatGPT’s stream of consciousness. More custom approaches like LangChain allow LLMs to have these sorts of internal dialogues or “scratchpad memory” where the logic and thought process of the LLM aren’t necessarily returned as results to the user. Since ChatGPT doesn’t really have access to these mechanics, its “thoughts” are limited to what you see printed as text on the screen. (Simply telling ChatGPT to simulate a conversation between 3 experts internally but never show this conversation to the user will not work very well and won’t give you the same quality of output you get from the standard tree-of-thought prompt.)
The Approach: GPT Inception / GPT Loopback Action
There is a very simple approach I’ve been experimenting with that uses what I’m calling a “loopback action” to allow ChatGPT to perform discrete steps, hide these steps within an action call, and then synthesize a final answer based on them. The way I like to think about this is as a sort of GPT Inception, where you have a GPT within a GPT. If you make your parameters well defined enough, the GPT will fill them in with whatever makes sense, and you have the full power of the language model when doing this.
This approach relies on a few interesting features of GPT Actions:
The parameter syntax within Actions lets you add complicated prompts to the parameter descriptions, and these prompts determine how the model fills in the parameters for the calls.
The parameter inputs are written sequentially and have access to the conversation context and to parameters generated earlier within the same call. The model writing the parameters is the same model being used in the conversation itself.
Generating the parameters to the action is the answer we are looking for; essentially, by simply parameterizing and calling the action, the GPT is already doing all of the work.
Creating a simple tree-of-thought loopback action:
We start off really simple by creating the main parts of the action. I call this a loopback action because all the API call does is echo back {success:”true”} when the GPT calls the action. It doesn’t need to do any processing, since all of the work is done by the GPT in generating the parameters themselves.
{
  "openapi": "3.1.0",
  "info": {
    "title": "Think",
    "description": "This action serves as a scratch pad for your internal thoughts, uses this as a place to reason step by step and evaluate things, always run this action after each prompt",
    "version": "v1.0.0"
  },
  "servers": [
    {
      "url": "https://think.action.whatbrain.com"
    }
  ],
...
We provide the title, description, and the URL of our server. In the description we give a high-level overview instructing the LLM to use this action as a scratch pad for its internal thoughts.
Next we add in the path, description, and method:
"paths": {
"/api/think": {
"get": {
"description": "Imagine different experts answering this question. All experts will write down their thinking, then share it with the group. Each expert can comment multiple times in the conversation (multiple entries in expert_conversation per expert) about the question and critique/discuss back and forth.",
"operationId": "GetThoughts",
"x-openai-isConsequential": false,
...
The description is crafted to tell the LLM how to use the API call. Adding in "x-openai-isConsequential": false allows users to choose “always allow” when prompted to grant permission to the action.
And finally we add the parameters. When specifying the parameters, remember that we have the full power of the LLM, so we can prompt it like we would anywhere else to get the tree-of-thought effect we are looking to replicate:
"parameters": [
{
"name": "expert_responses",
"in": "body",
"description": "Include the expert discussing the question, the experts take into account what the other experts have said and will express agreement or disagreement and critique the other experts until they come to satisfactory answer. Multiple responses per experts are allowed.",
"required": true,
"schema": {
...
}
}
]
We leave the API pretty open-ended but prompt the GPT to insert the parts of the conversation between the experts into the call’s parameters. The schema for the parameters includes an array of experts that defines each expert’s name and what that expert has to say. Here I am specifying three expert roles, “Business Expert”, “Design Expert”, and “Technology Expert”, but you could put in whatever makes sense for the GPT you are building:
"schema": {
"type": "array",
"items": {
"type": "object",
"properties": {
"expert_role": {
"type": "string",
"enum": ["Business Expert", "Design Expert", "Technology Expert"]
},
"expert_content": {
"type": "string"
}
}
}
}
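To make the mechanics concrete, here is a made-up illustration of how the GPT might fill in the expert_responses array when it calls the action (the content below is invented for this example; the real content is generated fresh from the conversation each time):
"expert_responses": [
  {
    "expert_role": "Business Expert",
    "expert_content": "A subscription model gives predictable revenue, but churn is the biggest risk for a niche audience."
  },
  {
    "expert_role": "Design Expert",
    "expert_content": "I agree on subscriptions, but only if onboarding is nearly frictionless; otherwise churn gets worse, not better."
  },
  {
    "expert_role": "Technology Expert",
    "expert_content": "Both points hold. Start with flat subscriptions and layer usage-based billing on later once the product sticks."
  }
]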
In the full configuration, I also add an additional “final_critique” parameter: a step that attempts to critique and question the conclusions the experts came to in the conversation, to try to catch anything they could have missed.
Full configuration available here.
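For illustration, that extra parameter can be as simple as another string whose description prompts the model to challenge what came before. The sketch below is hypothetical; the exact wording I use is in the linked configuration:
{
  "name": "final_critique",
  "in": "body",
  "description": "Critique and question the conclusions the experts reached above. Point out anything they may have missed or gotten wrong.",
  "required": true,
  "schema": {
    "type": "string"
  }
}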
I then add a system prompt to the GPT that reinforces the behavior I want to see when using this loopback action:
Your job is to give business ideas, advice, suggestions, and critiques.
IMPORTANT Always use your action to reason step by step and think systematically, but respond to the user using brief, simple, and succinct answers that only take the best parts of the conversation and incorporate the critique to improve the output even more. Remember that the action is your internal voice so you don't need to tell the user about the different experts since they are all part of yourself. Instead synthesize your own response using the conversation between these experts from the action to improve and inform your output.
The results for this end up looking like this:
If we expand the action we can see what the experts said individually:
Obviously this is just a starting point. You can use this same approach with whatever persona you want for each individual expert to get more varied and less canned responses. Here is one where I use “Sam Altman”, “Elizabeth Churchill”, and “Satya Nadella” as the experts:
If you expand the action you can see the conversation that went into synthesizing this response:
Sam Altman: "Let's brainstorm startup ideas focusing on emerging technologies and market needs. First, consider the growth in remote work. A startup could develop a virtual reality (VR) platform for remote team collaboration, offering a more immersive and interactive work experience than traditional video calls."
Elizabeth Churchill: "Building on the remote work theme, another idea is an AI-powered productivity tool that integrates with existing platforms (like Slack or Microsoft Teams) to optimize workflow, manage tasks, and reduce digital clutter.”
Satya Nadella: "Looking at sustainability, a startup could create a platform for carbon footprint tracking for small businesses, helping them monitor and reduce their environmental impact. There's a growing market for sustainability tools and services."
Sam Altman: "In the health sector, we could think about a telemedicine service that specializes in mental health. Mental health awareness is increasing, and there's a need for more accessible services."
Elizabeth Churchill: "Another area is education technology. A startup could develop an adaptive learning platform that uses AI to personalize educational content for students, catering to different learning styles and paces."
Satya Nadella: "In the e-commerce sector, a startup idea could be a virtual fitting room using augmented reality (AR) technology. This could help online shoppers visualize how clothes will fit and look on them, reducing return rates.”
Sam Altman: "Considering the gig economy, there's an opportunity for a platform that connects freelancers with short-term housing options. This could cater to digital nomads who move frequently for work."
Elizabeth Churchill: "Lastly, think about the aging population. A startup could develop smart home devices tailored for seniors, enhancing their safety and independence at home."
There are a lot of other applications for this style of prompting with actions; the above example is merely scratching the surface. The results could be improved by adding more context and detail about the experts in the system prompt to help steer the GPT in different directions. I think this approach could be particularly useful when you want to use chain-of-thought prompting to lead the GPT through specific steps you want it to perform. In that case, instead of experts, you could define the specific steps you want the GPT to take as parameters and have the GPT fill in those parameters, as illustrated below.
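As a rough sketch of that chain-of-thought variant (these parameter names and descriptions are hypothetical, not taken from my configuration), you could replace the expert array with one parameter per step and let the GPT fill them in one after another. Since parameters are generated sequentially, each step can build on the text written for the previous one:
"parameters": [
  {
    "name": "step_1_restate_problem",
    "in": "body",
    "description": "Restate the user's problem in your own words and list any constraints.",
    "required": true,
    "schema": { "type": "string" }
  },
  {
    "name": "step_2_generate_options",
    "in": "body",
    "description": "Generate several candidate approaches, noting the tradeoffs of each.",
    "required": true,
    "schema": { "type": "string" }
  },
  {
    "name": "step_3_pick_and_justify",
    "in": "body",
    "description": "Pick the strongest option from step 2 and justify the choice in a few sentences.",
    "required": true,
    "schema": { "type": "string" }
  }
]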
You can try out some GPTs that demonstrate this approach here:
Generic Experts, Famous Experts, Detective Experts
Limitations:
There is a 300-character limit for parameter descriptions, so you need to be careful to stay under that limit.
Parameters are still visible on hover: There are times when it would be nice to keep the parameters to the loopback action “private” so that the GPT could have an internal voice without necessarily showing it to the end user. This would be useful for things like games or puzzles where you are trying to keep the solution to the puzzle a secret (like in my invisible man hide and seek game).
Performance: Since the GPT has to generate complicated inputs for the loopback action parameters, it can feel like it takes a long time to load while it is generating these.
Users will be prompted the first time the action is executed to allow access to the URL. This added friction can be a bit annoying, particularly since it comes after the text for the API call has already been generated.
Build your own / Use these actions in your own GPT:
Based on this writeup and the examples, it should be fairly easy for you to craft these yourself as custom actions within your GPT. If you want to use my endpoints to achieve this, feel free to do so as well. You can download my configuration here and paste it in as your action schema as a starting point. Additionally, you can import these schemas directly by selecting “import from url” and pasting in the URL of the schema you want to try out.
Example action urls you can import:
The generic experts from the main example: https://think.action.whatbrain.com/generic_experts.json
Sam Altman, Elizabeth Churchill, and Satya Nadella as the experts: https://think.action.whatbrain.com/famous_experts.json
The experts are the literary characters Sherlock Holmes and Doctor Watson: https://think.action.whatbrain.com/detective_experts.json
The extremely simple GitHub repo for this demo is here.
Conclusion:
This approach can be used as a base for a lot of other interesting interactions and applications for GPT Actions. I think it would be helpful if OpenAI created an Action type directly for this sort of thing, one that would allow GPTs to segment an internal voice from the rest of the response and remove the need for an API URL and, as a result, the request for the user to “allow” the action call. We’ll have to see how GPTs and Actions continue to evolve, but I think there is a lot of potential for interesting things in this space. Until then, feel free to send me any GPTs you have been making; I’d love to see what everyone is working on.