Agent Oriented Programming: A Framework for building LLM Agents
How to build and orchestrate autonomous LLM agents
Hello, Hadi here 👋 I’m just back from vacation, so apologies for not sending articles these past couple of weeks.
Today’s issue is a technical one. I’m sending you a summary of my paper on building and chaining together LLM agents. The full paper can be read here: https://github.com/HeyCLI/heyCLI_client/blob/main/AOP_short.pdf
Introduction
Powerful LLMs, such as GPT-4, are very good at reasoning over text, which opens seemingly endless possibilities when hooked up to code. For example, you can ask GPT-4 to scan a web page for vulnerabilities and propose fixes. You can then hook the output to code that executes these fixes. LLMs should not be looked at as databases, where data is stored at training time and retrieved at inference time. Instead, LLMs should be regarded as reasoning engines that need external tools to reliably achieve objectives.
For example, LLMs may hallucinate WW2 events or mess up Gmail API calls. However, given access to Wikipedia and the Gmail API documentation, LLMs have a higher chance of successfully answering questions or executing tasks. With that in mind, we suggest that a new software paradigm is emerging, centered around LLM agents.
An LLM agent is a software entity, or a program, that uses an LLM for reasoning and a set of tools to execute tasks. In this paper, we attempt to define this new paradigm.
Definitions
We define the following entities:
• Agent: a software entity defined by its context, objective, and state.
• Context: text that describes the agent, its environment, and its memories.
• Agent description: text that describes the capabilities and tools available to the agent.
• Local environment: a collection of textual passages that describe the parts of the global environment relevant to the agent. The environment (global and local) evolves over time.
• Agent’s memory: a collection of textual passages that record tasks, results, and other information the agent encounters during its lifetime.
• Objective: text that describes either:
- a task that the agent needs to perform (usually a high-level, complex task), e.g. send prospecting emails to clients;
- or a result that the agent needs to achieve, e.g. emails sent to clients.
An agent can have only one objective at a given time.
• State: a word indicating whether the agent is running, idle (stopped), or terminated. An agent has only one state at a given time.
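The definitions above can be sketched as a small data structure. This is an illustrative sketch, not code from the paper; the class and field names are my own, and the equality rule encodes the note below that two agents with the same description are the same agent.

```python
from dataclasses import dataclass, field
from enum import Enum

class State(Enum):
    RUNNING = "running"
    IDLE = "idle"
    TERMINATED = "terminated"

@dataclass
class Agent:
    description: str   # the agent's only unique attribute: capabilities and tools
    objective: str     # a single objective at any given time
    environment: list[str] = field(default_factory=list)  # local environment passages
    memory: list[str] = field(default_factory=list)       # recorded tasks and results
    state: State = State.IDLE

    @property
    def context(self) -> str:
        # context = description + local environment + memories, all as text
        return "\n".join([self.description, *self.environment, *self.memory])

    def __eq__(self, other: object) -> bool:
        # two agents with the same description are considered the same agent
        return isinstance(other, Agent) and self.description == other.description
```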
Notes:
The only unique attribute of an agent is its description. Two agents can have the same objective, global context, etc., but two agents cannot have the same description. If two agents have the same description, then they are considered the same agent.
LLMs are specialised tools, as they can only output text from an input text. A function that calls an LLM is a specialised tool as well.
Tools can use other tools, just as programs in general can use other programs. For example, the Python interpreter can use the Airbnb API.
LLM agents, as described above, are universal tools. An LLM agent can use other LLM agents as tools.
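The note that agents are themselves tools can be made concrete with a tiny protocol. This is a sketch, not the paper’s code: the names `Tool`, `EchoTool`, and `AgentTool` are hypothetical, and the delegation logic is deliberately trivial.

```python
from typing import Protocol

class Tool(Protocol):
    # a tool maps a textual task to a textual result
    def run(self, task: str) -> str: ...

class EchoTool:
    # a trivial terminal tool, standing in for e.g. a CLI or API wrapper
    def run(self, task: str) -> str:
        return f"done: {task}"

class AgentTool:
    # an agent exposed as a tool: its tool list may itself contain agents,
    # which is what makes LLM agents "universal" tools
    def __init__(self, description: str, tools: list[Tool]):
        self.description = description
        self.tools = tools

    def run(self, task: str) -> str:
        # trivial delegation for illustration: hand the task to the first tool
        return self.tools[0].run(task)
```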
When to create an agent vs a function
A reasonable question to ask when working with LLMs is: should a given task be implemented as a function that uses an LLM, or should it be implemented as an agent? We suggest the following rule of thumb: if the objective requires cognitive planning into tasks and iterative reasoning over intermediary results, then an agent is more suitable. In all other cases, a plain function is simpler and sufficient.
For example:
Objective 1: “Parse a text file and extract all references to food.” Available tools: file system. This objective can be achieved in one shot by the LLM, so there is no need to build an agent for it. In this case, the function using the LLM should be considered a tool, not an agent.
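A minimal sketch of Objective 1 as a plain function: a single LLM call, no planning, no iteration. The prompt wording and the `fake_llm` stand-in are my own, for illustration; a real `llm` would be an API call.

```python
from typing import Callable

def extract_food_references(text: str, llm: Callable[[str], str]) -> list[str]:
    # one-shot call: no planning, no reasoning over intermediary results,
    # so this is a tool (a function using an LLM), not an agent
    prompt = f"List every reference to food in the text below, one per line:\n{text}"
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

# stub LLM for illustration only (deterministic, ignores the prompt)
fake_llm = lambda prompt: "apple pie\ncoffee"
```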
Objective 2: “Create an EC2 instance.” Available tools: AWS CLI. This objective is a high-level task that needs to be planned into lower-level tasks, including for example:
• Translate the objective to an AWS CLI command: aws ec2 run-instances --image-id <image-id> --count 1 --instance-type <instance-type> --key-name <key-name> --security-group-ids <security-group-id> --subnet-id <subnet-id>
• List the available images: "aws ec2 describe-images", then pick and return one of them.
• List the available instance types: "aws ec2 describe-instance-types", then pick and return one of them.
• List the available key pairs: "aws ec2 describe-key-pairs", then pick and return one of them.
• List the available security groups: "aws ec2 describe-security-groups", then pick and return one of them.
• List the available subnets: "aws ec2 describe-subnets", then pick and return one of them.
• Replace the placeholders with valid values
• Execute the command.
An agent is clearly more suitable for this case.
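The subtasks above can be sketched as a placeholder-filling loop over the command template. This is illustrative only: the `fake_lookup` table stands in for running the `aws ec2 describe-*` subtasks and having the LLM pick one value from each result.

```python
import re

# the command template from the plan, with placeholders to resolve
TEMPLATE = ("aws ec2 run-instances --image-id <image-id> --count 1 "
            "--instance-type <instance-type> --key-name <key-name> "
            "--security-group-ids <security-group-id> --subnet-id <subnet-id>")

def plan_ec2(template: str, lookup) -> str:
    # for each placeholder, run the matching "describe" subtask (via `lookup`)
    # and substitute the picked value into the command
    command = template
    for placeholder in re.findall(r"<([^>]+)>", template):
        command = command.replace(f"<{placeholder}>", lookup(placeholder))
    return command

# stub results standing in for `aws ec2 describe-*` calls plus an LLM's pick
fake_lookup = {"image-id": "ami-123", "instance-type": "t3.micro",
               "key-name": "mykey", "security-group-id": "sg-1",
               "subnet-id": "subnet-1"}.__getitem__
```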
A framework for building a (universal) LLM agent
Here we propose an intuitive framework for building a universal LLM agent that is coherent with the AOP paradigm. As defined in the “Definitions” section, there are two types of tasks:
• Atomic: tasks that can be executed by tools without any need for planning and reasoning.
• Composite: tasks that need LLM intervention for planning and reasoning over the input and intermediary results.
Agents are only relevant when solving composite tasks. To solve a composite task, the agent does the following:
Input: Given an objective O:
Plan: this step is similar to the Map step in MapReduce.
• Divide the task into subtasks, with defined inputs and outputs.
• For each subtask, decide and assign a priority.
• Decide concurrency amongst subtasks.
• For each subtask, pick the most useful tool (note: this can be another agent, i.e. a slave agent).
Note: the plan can be updated over time based on changes in the context accessible to the agent.
Act: for each subtask, launch execution according to the decided priority and concurrency.
Iterate: for each subtask execution, evaluate the returned result: (a) if the result is good, keep it; (b) else, launch a corrective task.
Aggregate and return: this step is similar to the Reduce step of MapReduce. It consists of aggregating the results of all subtask executions and returning the final result.
Figure: AOP, the Agent Oriented Programming framework.
Output: Return a final result for O.
As can be seen, this framework is an adaptation of the ReAct framework to distributed agents and complex tasks. We suggest the name PAIR: Plan, Act, Iterate, Return. Note that this framework allows for recursion: a subtask can be the objective of another agent, which can itself use other agents as tools, and so on.
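The PAIR loop can be sketched as a higher-order function. This is a minimal sketch under stated assumptions: `plan`, `act`, `evaluate`, `correct`, and `reduce` are caller-supplied callables (in practice they would wrap LLM calls and tools), and the concurrency and priority handling from the Plan step is omitted for brevity.

```python
from typing import Callable

def pair(objective: str,
         plan: Callable[[str], list[str]],
         act: Callable[[str], str],
         evaluate: Callable[[str, str], bool],
         correct: Callable[[str, str], str],
         reduce: Callable[[list[str]], str]) -> str:
    # Plan: decompose the objective into subtasks (the Map step)
    subtasks = plan(objective)
    results = []
    for task in subtasks:
        # Act: execute the subtask with the chosen tool (possibly another agent)
        result = act(task)
        # Iterate: if the result is not good, launch a corrective task
        while not evaluate(task, result):
            result = act(correct(task, result))
        results.append(result)
    # Return: aggregate all subtask results into the final result (the Reduce step)
    return reduce(results)
```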
Conclusion
LLMs are changing the way we write software, moving from pre-defined static code written by humans to dynamically generated code based on a changing context. The software building blocks used to be functions, classes, and objects. In the new paradigm, the building blocks are:
• a reasoning engine (the LLM),
• LLM agents,
• and tools.
In this paper we contributed some ideas to inch closer to this new paradigm. We will keep iterating on this paper, and we will release the AgentChain code in the near future.