The Rise of Autonomous Agents: AutoGPT, AgentGPT, and BabyAGI

In 2022, AMC released one of the best sci-fi series of this century, a short-lived animation called Pantheon, based on short stories by Ken Liu (The Hidden Girl and Other Stories). The story follows Maddie Kim, an introverted 14-year-old girl who one day discovers that her deceased father was turned into a UI (uploaded intelligence). This sends Maddie into a web of deceit and conspiracies, as businesses and governments all across the globe compete to create the first fully functional autonomous agent. The end result? A singularity, and the collapse of the world as we know it.

Take away some of the most dramatic aspects of Pantheon, and the story is almost prophetic. It’s 2023, and this looks like the year that will go down in history as the year of artificial intelligence. One day we were unaware of just how much AI was part of our daily life, and the next we are seeing hundreds of articles and social media threads about large language models and what they are accomplishing. We are seeing every tech juggernaut dropping whatever project they were working on and putting AI at the forefront.

OpenAI may have had the lead, but everyone wants a piece of the pie, even small startups. Just a few months ago, it seemed impossible to run an LLM on anything but a massive server farm, and yet here we are with numerous contenders running LLaMA-derived models on modest resources. Smaller players are innovating rapidly, leveraging new strategies—sometimes including software development outsourcing to specialized teams—to scale faster and compete with the giants.

And that’s not even taking into account the army of small businesses using APIs to work with big models. I don’t know how much data OpenAI is processing daily, but with the amount of “too many requests” errors our team has faced when working with GPT-3.5 it’s a safe guess to say that they are almost at their current capacity.

What’s the next step? Who is going to win this arms race? According to a leaked memo attributed to a Google employee, the odds favor the small developers and the open-source community. It would be a massive mistake to focus on the big fish and lose sight of some of the most interesting and powerful implementations of AI coming from small communities. Autonomous agents are a prime example.

Autonomous Agents

If you’ve played around with any of the modern large language models, you already know the gist of it. It’s a chat-like environment where you write some text, and the model returns some text. For example, if I were to write “Please write an article about AutoGPT” it would do its best to talk about it. In this particular case, if we were to use ChatGPT, it would either reply that it doesn’t know what that is, or it would hallucinate some very creative but made-up answer. Why? Because ChatGPT’s training data stops in 2021. In other words, it hasn’t been trained on anything that came afterward.

Now, of course, there are ways around this. For example, I could write a Python program that runs a web search, gathers the top 10 results, passes them to ChatGPT as context inside the prompt (the model itself is not retrained), and then prints the output. Not a perfect solution by any means, but good enough for a quick and dirty way to escape the OpenAI sandbox.
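To make that concrete, here is a minimal sketch of such a program. It assumes two hypothetical callables, search_web and ask_llm, that stand in for whatever search API and chat client you actually use; neither is a real library function. The search results are simply pasted into the prompt so the model can read them.

```python
from typing import Callable, List


def build_prompt(question: str, snippets: List[str]) -> str:
    """Paste the search results into the prompt so the model can read them."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(snippets, start=1))
    return (
        "Answer the question using only the search results below.\n\n"
        f"Search results:\n{numbered}\n\n"
        f"Question: {question}"
    )


def answer_with_search(
    question: str,
    search_web: Callable[[str, int], List[str]],  # hypothetical: (query, limit) -> snippets
    ask_llm: Callable[[str], str],                # hypothetical: prompt -> model reply
    top_k: int = 10,
) -> str:
    """Search, keep the top results, and ask the model with them as context."""
    snippets = search_web(question, top_k)[:top_k]
    return ask_llm(build_prompt(question, snippets))


if __name__ == "__main__":
    # Stand-ins so the sketch runs without network access or an API key.
    def fake_search(query: str, limit: int) -> List[str]:
        return [f"Result {n} about {query}" for n in range(1, limit + 1)]

    def fake_llm(prompt: str) -> str:
        return "(model reply would go here)"

    print(answer_with_search("What is AutoGPT?", fake_search, fake_llm))
```

Swapping the two stand-ins for real clients is all it takes to try this against a live model.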

With that, we have an AI that is “connected to the internet” (ok, not really, but it’s good enough for this example). Now imagine that I extend my Python script so that it grabs the ChatGPT output, checks whether it is Python code, and runs it. Now we have an AI that is connected to the internet and is able to run code (for those enthusiasts out there, if you want to try something like that, please use a virtual machine).
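A rough sketch of that extension could look like the following. It assumes the model wraps its code in a fenced block, which is common but not guaranteed, and the function names are made up for this example. Running the code in a separate process with a timeout is not a sandbox, so the virtual machine advice still applies.

```python
import ast
import subprocess
import sys
from typing import Optional

FENCE = "`" * 3  # three backticks, the marker chat models usually put around code


def extract_python(reply: str) -> Optional[str]:
    """Return the first fenced block in the reply if it parses as Python, else None."""
    parts = reply.split(FENCE)
    if len(parts) < 3:
        return None  # no complete fenced block in the reply
    block = parts[1]
    # Drop an optional language tag such as "python" on the opening fence line.
    first_line, _, rest = block.partition("\n")
    code = rest if first_line.strip().lower() in ("python", "py", "") else block
    try:
        ast.parse(code)  # syntax check only; nothing is executed here
    except SyntaxError:
        return None
    return code


def run_python(code: str, timeout: int = 10) -> subprocess.CompletedProcess:
    """Run the code in a separate interpreter process with a time limit."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

The returned object carries stdout, stderr, and the return code, which is everything the script needs to decide what to do next.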

At this point, we have a rudimentary agent.

A computer agent is a software program that can perform tasks on behalf of a user or another computer program. Agents are typically designed to be autonomous and proactive, meaning that they can make decisions and take action without the need for human intervention.

While not fully autonomous, our baby agent has enough independence to pull off some really quirky stuff. That’s why you should run it on a virtual machine: we really don’t know what kind of code it’s going to execute at the end of the day. We can keep on building on top of this program. For example, we could have our language model first create a series of steps to achieve our objective. Then we could pass each step to the model, test the outcome, and either retry it, move on to the next task, or create sub-tasks, depending on the result.
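The plan, execute, retry, and split cycle can be sketched as a small task queue. The plan and execute callables below are hypothetical stand-ins for calls to a language model, and the retry and step limits are arbitrary values chosen for illustration.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical signatures standing in for language model calls:
Planner = Callable[[str], List[str]]          # objective or task -> list of steps
Executor = Callable[[str], Tuple[bool, str]]  # task -> (succeeded, output)


def run_agent(
    objective: str,
    plan: Planner,
    execute: Executor,
    max_retries: int = 2,
    max_steps: int = 50,
) -> List[Tuple[str, bool, str]]:
    """Work through a task queue: retry failures, then split them into sub-tasks."""
    queue = deque(plan(objective))
    attempts: Dict[str, int] = {}
    log: List[Tuple[str, bool, str]] = []
    while queue and len(log) < max_steps:  # max_steps guards against endless loops
        task = queue.popleft()
        ok, output = execute(task)
        log.append((task, ok, output))
        if ok:
            continue  # move on to the next task
        attempts[task] = attempts.get(task, 0) + 1
        if attempts[task] <= max_retries:
            queue.appendleft(task)  # try the same task again
        else:
            # Out of retries: ask the planner to break the task into sub-tasks
            # and put them at the front of the queue, in order.
            queue.extendleft(reversed(plan(task)))
    return log
```

The hard cap on steps matters more than it looks: without it, a planner that keeps producing failing sub-tasks would run (and bill you) forever.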

And little by little, layer by layer, we keep on adding functionalities to our agent. Notice how after our first instruction (the one that gets the ball rolling), our agent is going to start using inner dialogue to keep on working on each task. For example, if the code returns an error, the agent will tell itself, “Oops, something went wrong; let’s debug this and try again”, with no need for a pesky human supervising its work. If that’s sending a chill down your spine, that’s good. It means that you are already starting to see the implications.
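That inner dialogue is less mysterious than it sounds. One simple way to get it, sketched below, is to feed the error message back into the next prompt. The ask_llm callable is a hypothetical stand-in that takes a prompt and returns Python source, and the wording of the notes is made up for this example.

```python
import subprocess
import sys
from typing import Callable, List, Optional


def solve_with_retries(
    task: str,
    ask_llm: Callable[[str], str],  # hypothetical: prompt -> Python source code
    max_attempts: int = 3,
) -> Optional[str]:
    """Generate code, run it, and feed any error back into the next prompt."""
    notes: List[str] = []  # the agent's "inner dialogue" about earlier failures
    for attempt in range(1, max_attempts + 1):
        prompt = task
        if notes:
            prompt += "\n\nPrevious attempts failed:\n" + "\n".join(notes)
        code = ask_llm(prompt)
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=10,
        )
        if result.returncode == 0:
            return result.stdout  # success: hand back whatever the code printed
        stderr = result.stderr.strip()
        # The last line of a Python traceback names the exception.
        last_line = stderr.splitlines()[-1] if stderr else "unknown error"
        notes.append(f"Attempt {attempt} raised: {last_line}. Fix this and try again.")
    return None  # gave up after max_attempts
```

As before, this runs model-written code on your machine, so keep it inside a virtual machine.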

Computer agents are like little helpers that make our digital lives easier. They can do all sorts of tasks for us without us even realizing it – kind of like how a personal assistant takes care of things behind the scenes so their boss doesn’t have to stress out about everything.

There are three main types of computer agents: reactive agents, deliberative agents, and hybrid agents.

Reactive Agents

These guys are like pure instinct. They react to specific stimuli in their environment without any awareness or analysis of context beyond what they’re explicitly programmed for. It’s like when you install antivirus software on your laptop – it immediately jumps into action when there’s a suspicious file detected in its system.

Deliberative Agents

On the flip side, we’ve got deliberative agents – these guys think before they act (just like we should!). They reason through problems by using past experiences and knowledge stored within their databases to make informed decisions based on current circumstances. Think of Siri or Alexa when you ask them a question – they process data from multiple sources before providing an answer.

Hybrid Agents

The third type is where things get wild: hybrid combinations! These bad boys mash up both reactive and deliberative characteristics. That lets them react quickly in dynamic environments where conditions keep changing, and still reason their way through goal-oriented problems more efficiently than either of the other types alone.
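The difference between the three types is easy to see in a toy example. The classes below are illustrative only: the rules, the "reasoning" step, and the action strings are all invented for this sketch.

```python
from typing import Dict, List, Optional


class ReactiveAgent:
    """Maps a stimulus straight to an action through fixed rules. No memory."""

    def __init__(self, rules: Dict[str, str]):
        self.rules = rules

    def act(self, percept: str) -> Optional[str]:
        return self.rules.get(percept)  # None when no rule matches


class DeliberativeAgent:
    """Remembers what it has seen and decides based on that history."""

    def __init__(self) -> None:
        self.history: List[str] = []

    def act(self, percept: str) -> str:
        self.history.append(percept)
        # Toy reasoning step: a problem seen before gets escalated.
        seen_before = self.history.count(percept) > 1
        return f"escalate {percept}" if seen_before else f"investigate {percept}"


class HybridAgent:
    """Uses a reflex when one applies and falls back to deliberation otherwise."""

    def __init__(self, reactive: ReactiveAgent, deliberative: DeliberativeAgent):
        self.reactive = reactive
        self.deliberative = deliberative

    def act(self, percept: str) -> str:
        reflex = self.reactive.act(percept)
        if reflex is not None:
            return reflex
        return self.deliberative.act(percept)


if __name__ == "__main__":
    agent = HybridAgent(
        ReactiveAgent({"virus detected": "quarantine file"}),
        DeliberativeAgent(),
    )
    for event in ["virus detected", "slow network", "slow network"]:
        print(event, "->", agent.act(event))
```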

Our example would fall somewhere between hybrid and deliberative. But with enough effort and dedication, we could turn it into a fully-fledged hybrid agent like AutoGPT, BabyAGI, or AgentGPT.

A New Challenger Arrives: AutoGPT Vs. ChatGPT

AutoGPT is an open-source, experimental application that uses OpenAI’s GPT-4 language model to achieve autonomous goals. It was created by game developer Toran Bruce Richards and released in March 2023.

Much like our example, AutoGPT works by breaking down a user-defined goal into a series of sub-tasks. It then uses GPT-4 to generate text and code that can be used to complete these sub-tasks. AutoGPT can be used to perform a variety of tasks, including:

Writing code
Generating text
Translating languages
Answering questions
Solving problems

AutoGPT is still under development. In fact, if you visit the GitHub repository for the project, it has more warnings than a medicine bottle. It’s unstable, unreliable, and can absolutely destroy your wallet with queries to the OpenAI API. But it also has the potential to be a powerful tool for automating tasks and improving efficiency. It is also a valuable tool for developers who want to learn more about GPT-4 and how it can be used to create autonomous applications.

Here are some of the benefits of using AutoGPT:

It can automate tasks: AutoGPT can be used to automate a variety of tasks, such as writing code, generating text, translating languages, answering questions, and solving problems. This can save you time and effort, and it can also help you to be more productive.
It is easy to use: Once AutoGPT is set up, you simply need to define a goal, and it will attempt the rest. There is no need to write any code for each task or learn any complex commands.
It is powerful: AutoGPT is powered by GPT-4, which is one of the most powerful language models in the world. This means that AutoGPT can be used to attempt a wide variety of tasks, although, as noted above, the results are not always reliable.

I can’t stress this enough: AutoGPT is the first of its kind, and it’s absolutely unreliable. It would be absolute madness to actually try and deploy it in a production environment. But on the other hand, if you are thinking about building autonomous agents, checking the GitHub repository for this project is a must. There are so many good ideas in this project that can be taken, redefined, and adapted for other environments.

The Simple Solution: AgentGPT

AgentGPT is like a Swiss army knife for any CTO looking to supercharge their team’s productivity. Imagine a super-efficient assistant that can help you with tasks ranging from developing a marketing strategy to building a website with very little human input – that’s AgentGPT for you.

You see, AgentGPT is a platform that creates AI agents to cater to your goals, much like AutoGPT. It’s an open-source project that leverages OpenAI’s GPT-3.5 and GPT-4 models. Think of it as an evolved cousin of ChatGPT that can not only converse but also autonomously create its own tasks, browse the web, and even send new agents into the digital battlefield to accomplish its assigned mission.

The best part? It’s like a friendly neighborhood superhero. You don’t need to be a coding wizard or possess any special technical knowledge to use AgentGPT. Don’t want to deal with Docker, setting up environments, and other tech stuff? Want to try out what autonomous agents have to offer right away? Then AgentGPT is the simplest solution.

Accessing AgentGPT is as straightforward as ordering a pizza. All you need to do is visit the AgentGPT website, or, if you’re more of a DIY person, you can grab the code from the official GitHub repository and install it on your local system.

Once you’re in, you have three levels of access. You can play as a guest with limited tokens and no ability to save agents. Level up by creating an account, and you can manage accounts and save deployed agents. The top level requires an OpenAI API key and unlocks advanced features like setting the agent focus level and the maximum number of loops.

Getting started with AgentGPT requires almost no work at all. You need to assemble and configure an agent, assign it a goal, and deploy it. It’s literally just giving it a name and a goal. It’s like naming your new pet and teaching it tricks. When I created my first agent, I named it “Deal Finder”. You can choose any name as long as it’s related to the agent’s function or goal.

Now, this is where it gets interesting: configuring your agent. This is where you fine-tune your agent’s behavior. It’s like choosing the ingredients for a complex recipe. You have the option to select the GPT model, execution mode, level of focus, tokens, and maximum loops. It’s crucial to strike the right balance – too high or too low, and you might end up with a burnt dish or undercooked pasta.

In this case, aim too high and you’ll have an erratic and unfocused AI; aim too low and your AI is a rather tame and predictable agent that’s going to do the absolute minimum. Once you’ve assembled and configured your agent, it’s time to let it loose in the digital wilderness. Deploy your agent, and then you can monitor its journey on the main console of the website.
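If you think of those settings as code, they amount to a small configuration object plus a loop with a hard stop. The sketch below is not AgentGPT’s actual implementation. The field names, default values, and the idea that the focus level behaves like a sampling temperature between 0 and 1 are all assumptions made for illustration, and step is a hypothetical stand-in for one model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class AgentConfig:
    name: str
    goal: str
    model: str = "gpt-3.5-turbo"  # assumed default, for illustration only
    focus_level: float = 0.7      # assumption: behaves like a sampling temperature
    max_tokens: int = 400         # assumed default
    max_loops: int = 4            # assumed default

    def __post_init__(self) -> None:
        if not 0.0 <= self.focus_level <= 1.0:
            raise ValueError("focus_level must be between 0 and 1")
        if self.max_tokens <= 0 or self.max_loops <= 0:
            raise ValueError("max_tokens and max_loops must be positive")


def deploy(config: AgentConfig, step: Callable[[AgentConfig, int], str]) -> List[str]:
    """Run at most max_loops iterations; stop early when a step returns 'DONE'."""
    outputs: List[str] = []
    for loop_number in range(config.max_loops):
        output = step(config, loop_number)
        if output == "DONE":
            break
        outputs.append(output)
    return outputs


if __name__ == "__main__":
    config = AgentConfig(name="Deal Finder", goal="Find laptop deals", max_loops=3)
    print(deploy(config, lambda cfg, i: f"{cfg.name}: loop {i}"))
```

The loop cap is what keeps an over-eager agent from burning through your tokens.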

Sounds fantastic, right? Well, much like its tech-savvy cousin AutoGPT, it’s still quite unreliable and depends on OpenAI’s models. Still, in my personal experience, it’s a fun little experiment that could really grow into a user-friendly, industry-leading tool.

The Best for Last: BabyAGI

One of the main issues with large language models is that they are amnesiacs. Close the window or delete the chat, and your trusty AI companion is gone forever. But what if we could take inspiration from humanity and give it a long-term memory? Enter Yohei Nakajima’s BabyAGI, based on his paper “Task-driven Autonomous Agent Utilizing GPT-4, Pinecone, and LangChain for Diverse Applications”. As the name implies, it’s a tech stack featuring three key components: GPT, Pinecone, and LangChain.

Pinecone

Pinecone is a vector database service designed to provide efficient and scalable vector search capabilities. It was launched with the goal of enabling businesses to build applications that leverage machine learning more easily and effectively. The service is cloud-based and fully managed, which means users don’t need to worry about managing infrastructure, scaling, or updating systems — Pinecone handles all that.

Here’s a more detailed look at how Pinecone works:

Embedding and Indexing:

The process starts by embedding data into a vector space using a machine learning model. This embedding process turns text, images, or other data into numerical vectors that capture their essential features. Pinecone then indexes the embedded vectors for efficient search.

Vector Search:

You submit a query vector, and Pinecone searches the database for similar vectors. It uses an approximate nearest neighbor (ANN) search algorithm to search large databases efficiently and at scale.
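To get a feel for embedding and similarity search, here is a toy version in plain Python. It is not the Pinecone client, and it takes two shortcuts that should be stated plainly: the "embedding" is just a word count rather than a learned model, and the search compares the query against every stored vector (exact brute force) instead of using an ANN algorithm.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: a sparse vector of word counts (real systems use a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index: Dict[str, Counter], query: str, top_k: int = 2) -> List[Tuple[str, float]]:
    """Brute-force search: score every stored vector and keep the best top_k."""
    query_vector = embed(query)
    scored = [(doc_id, cosine(query_vector, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    documents = {
        "a": "the cat sat on the mat",
        "b": "stock prices fell sharply",
        "c": "a cat and a dog",
    }
    index = {doc_id: embed(text) for doc_id, text in documents.items()}
    print(search(index, "cat sat on mat"))
```

Brute force is fine for a few thousand vectors. The point of an ANN index is to avoid scoring everything once the collection grows into the millions.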

Updating the Index:

New vectors can be added to the index without rebuilding it, which makes Pinecone well suited for applications with changing data.
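The idea of updating in place can be illustrated with a tiny in-memory index. This is a sketch of the concept, not the Pinecone client: the class and method names are chosen for this example, and the query is an exact brute-force scan rather than an ANN search.

```python
import math
from typing import Dict, List, Sequence


class TinyVectorIndex:
    """In-memory stand-in for a vector index that supports in-place updates."""

    def __init__(self, dimension: int):
        self.dimension = dimension
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, item_id: str, vector: Sequence[float]) -> None:
        """Insert a new vector or overwrite an existing one; nothing is rebuilt."""
        if len(vector) != self.dimension:
            raise ValueError(f"expected {self.dimension} values, got {len(vector)}")
        self._vectors[item_id] = list(vector)

    def delete(self, item_id: str) -> None:
        self._vectors.pop(item_id, None)

    def query(self, vector: Sequence[float], top_k: int = 1) -> List[str]:
        """Return the ids of the top_k closest vectors by Euclidean distance."""
        ranked = sorted(
            self._vectors,
            key=lambda item_id: math.dist(vector, self._vectors[item_id]),
        )
        return ranked[:top_k]


if __name__ == "__main__":
    index = TinyVectorIndex(dimension=2)
    index.upsert("a", [0.0, 0.0])
    index.upsert("b", [5.0, 5.0])
    print(index.query([4.0, 4.0]))  # closest so far
    index.upsert("c", [4.0, 4.1])   # new data is searchable immediately
    print(index.query([4.0, 4.0]))
```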

Scaling and Management:

Pinecone is built for large applications. As your database grows, it manages the infrastructure, scales capacity, and optimizes search operations, so developers can focus on building their app instead of worrying about infrastructure.

LangChain

Harrison Chase introduced the open-source project LangChain in October 2022, and it caused quite a stir in the IT industry. Thanks to its quickly expanding community on GitHub, Twitter, Discord, and other platforms, it has gained a lot of attention and investment, including a $20 million funding round from Sequoia Capital.

It’s a novel architecture that works with a broad variety of systems and services, from cloud storage providers like Amazon and Google to language model providers like OpenAI, Anthropic, and Hugging Face. It serves as a unified and expandable platform for a wide variety of applications.

The range of possible applications is enormous. It offers API wrappers for news, movie listings, and weather services. It is capable of running shell programs, crawling the web, and even generating few-shot learning prompts. From PDF manipulation to SQL, this tool has you covered.
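The core idea behind a "chain" is simple: small steps composed into a pipeline, where the output of one step is the input of the next. The sketch below shows that idea in plain Python. It deliberately does not use LangChain’s own classes, and every name in it (prompt_template, chain, fake_model) is made up for this example.

```python
from typing import Callable

Step = Callable[[str], str]  # every step takes text and returns text


def prompt_template(template: str) -> Step:
    """Build a step that drops the incoming text into a fixed template."""
    return lambda user_input: template.format(input=user_input)


def chain(*steps: Step) -> Step:
    """Compose steps left to right into one callable."""
    def run(value: str) -> str:
        for step in steps:
            value = step(value)
        return value
    return run


def fake_model(prompt: str) -> str:
    """Stand-in for a language model call, so the sketch runs offline."""
    return f"  MODEL REPLY TO: {prompt}  "


if __name__ == "__main__":
    summarize = chain(
        prompt_template("Summarize in one sentence: {input}"),
        fake_model,
        str.strip,  # a trivial output parser
    )
    print(summarize("Autonomous agents plan and execute their own tasks."))
```

Because every step has the same shape, a web search, a database lookup, or a second model call can be slotted into the pipeline without touching the rest.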

It is compatible with a wide range of document types and data sources, including non-relational (NoSQL) databases. In addition to its data management capabilities, LangChain can also generate, analyze, and debug scripts written in Python and Java. When all of these elements are combined, we get one of the most sophisticated autonomous agents possible.

Again, it’s not flawless, but it uses some cutting-edge machine learning techniques to build a capable AI companion with room for growth. BabyAGI also has the added bonus of being able to run on models based on either GPT-4 or LLaMA. So, the open source community will likely become more invested in BabyAGI.
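Put together, a BabyAGI-style agent boils down to a loop: execute a task, store the result in memory, create new tasks from that result, and reprioritize the queue. The sketch below shows only the shape of that loop. The three callables are hypothetical stand-ins for language model calls, and a plain Python list stands in for the vector database.

```python
from collections import deque
from typing import Callable, List, Tuple

Memory = List[Tuple[str, str]]  # (task, result) pairs; stand-in for a vector store


def task_loop(
    objective: str,
    first_task: str,
    execute: Callable[[str, str, Memory], str],               # hypothetical model call
    create_tasks: Callable[[str, str, str, List[str]], List[str]],  # hypothetical
    prioritize: Callable[[str, List[str]], List[str]],        # hypothetical
    max_iterations: int = 5,
) -> Memory:
    """Execute, remember, create follow-up tasks, reprioritize, repeat."""
    tasks = deque([first_task])
    memory: Memory = []
    while tasks and len(memory) < max_iterations:
        task = tasks.popleft()
        # The executor can consult everything remembered so far.
        result = execute(objective, task, memory)
        memory.append((task, result))
        new_tasks = create_tasks(objective, task, result, list(tasks))
        tasks = deque(prioritize(objective, list(tasks) + new_tasks))
    return memory


if __name__ == "__main__":
    def execute(objective, task, memory):
        return f"did '{task}' with {len(memory)} earlier results in memory"

    def create_tasks(objective, task, result, pending):
        return ["write summary", "collect sources"] if task == "make a plan" else []

    def prioritize(objective, tasks):
        return sorted(tasks)  # toy ordering; a real agent would ask the model

    for entry in task_loop("Research autonomous agents", "make a plan",
                           execute, create_tasks, prioritize):
        print(entry)
```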

What’s Next?

Perhaps it is too soon to put these tools into production for any significant assignment, but I would wager my life that autonomous agents have the potential to steal the spotlight from large language models. I can envision complicated multimodal bots in the future producing not just text but also visual and audio content. Even if computers lack consciousness, I have little doubt that they have already passed the Turing test.
