AI Agents — Rise of self-guided systems.

Shaunak Inamdar
8 min read · Oct 14, 2024


No problem can withstand the assault of sustained thinking. — Voltaire.

The evolution of sustained thoughts.

Homo habilis, considered the first species in the genus Homo, used crafted tools for hunting, cutting, and breaking. This is the first observed instance of rational behaviour in the human evolutionary chain. The gathering, carving, and use of rudimentary stone tools suggest the formation of a chain of thought and an ability to plan.

Through millions of years of experience, we Homo sapiens evolved foresight and inherent skills for organization that have helped us raise our quality of life far beyond anything observed in the history of living species [1]. Our ability to strategize tasks has carried us from hunter-gatherers to modern humans, and has brought us to the cutting edge of technology.

A system of systems working towards a solution.

In this metropolis of digital technologies, AI is undergoing a similar evolution. AI agents, digital cousins of our own neural networks, are tackling problems with perception, reasoning, and planning, mimicking to an extent the cognitive and behavioural patterns of the human brain.

Over the past years, we have grown familiar with complicated AI systems that seem to possess the Crystal Skull of knowledge: LLMs. These traditional AI models are impressive in their scale and capabilities, but they are inflexible. These monoliths of data excel at certain tasks, but struggle to adapt and interact with the world around them [2].

The Crystal Skull of Knowledge. Photo: Paramount

The Monolith Problem

Contextual Limitation
LLMs are bounded by their training data and lack the ability to learn new information in real time. They have strong recall of what they were trained on, but limited capacity to grasp anything beyond it.

Computational Expense
The attention mechanism [3] that these transformer-based LLMs run on limits the scalability of such systems. Its complexity is O(n²) in sequence length, so processing long documents consumes substantial compute and time, which makes such systems hard to scale.
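To see where the quadratic cost comes from, here is a minimal NumPy sketch of scaled dot-product attention, the core operation from [3]. The sizes and random inputs are illustrative only; the point is the intermediate n × n score matrix, which is what grows quadratically with sequence length.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Naive attention: materializes an (n x n) score matrix,
    which is the source of the O(n^2) cost in sequence length n."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # shape (n, n): quadratic in n
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # shape (n, d_v)

# Illustrative sizes: doubling n quadruples the intermediate score matrix.
n, d = 1024, 64
Q = K = V = np.random.randn(n, d)
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)   # (1024, 64); the hidden cost was the (1024, 1024) score matrix
```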

Interface Challenge
LLMs are built for question-and-answer problems; they cannot engage in the multi-step, complex problem solving that requires interfacing with other tools or resources. This restricts the usefulness of language models in situations where interacting with real-world data or specialized knowledge is required, which, coincidentally, is what most business use cases need.

Whenever confronted with existential challenges, life adapts and evolves, finding ways to overcome obstacles in a constantly changing world. AI is evolving the same way: through AI agents.

AI Agents: The Renaissance minds of the AI age.

Imagine if we took these AI models, and gave them the flexibility of a human mind.
The ability to remember, to reason, to reach out and to reflect.

Remember & Reflect | Raffaello Santi: “Two Angels”

This is exactly what AI agents, or compound AI systems, aim to achieve. These digital Renaissance minds pair the raw processing and recall power of an LLM with a suite of tools and a system for using them.

The components of cognition

  1. Language model
    At the heart of it all is a language model, typically built on a transformer architecture like GPT. It serves as the primary reasoning engine: it interprets queries, generates responses, and helps make decisions.
  2. Knowledge base
    This is the hippocampus of our system: an updatable memory bank, often implemented with a vector database for efficient similarity search. It helps overcome the contextual limitations of the model.
  3. Motor skills
    The tool integration layer lets the model perform actions in the real world, such as searching the web or interacting with IoT devices. It acts as a bridge between the model's knowledge and the APIs that carry out actions.
  4. Control Logic
    The orchestration framework is the most important component of an AI agent, as it drives the solution of a problem. It coordinates the other components and decides which actions to take, often with the help of a reinforcement learning model that lets the agent learn from experience.
  5. Memory systems
    AI agents incorporate both short-term and long-term memory systems, which allow them to maintain context during interactions, learn from past interactions, and apply that knowledge to new problems. A minimal skeleton tying these components together is sketched below.
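To make these components concrete, here is a minimal, hypothetical Python skeleton. The class names, the `llm` and `embed` callables, and the toy cosine-similarity knowledge base are assumptions made for illustration, not a reference implementation of any particular framework.

```python
import numpy as np

class ToyKnowledgeBase:
    """Stands in for a vector database: stores (embedding, text) pairs
    and retrieves by cosine similarity. The embed function is an assumption."""
    def __init__(self, embed):
        self.embed = embed
        self.items = []

    def add(self, text):
        self.items.append((self.embed(text), text))

    def search(self, query, k=3):
        q = self.embed(query)
        def cosine(v):
            return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
        ranked = sorted(self.items, key=lambda item: cosine(item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


class Agent:
    """Hypothetical skeleton mapping the five components to fields."""
    def __init__(self, llm, knowledge_base, tools):
        self.llm = llm                        # 1. language model: callable(prompt) -> text
        self.knowledge_base = knowledge_base  # 2. knowledge base: vector search over facts
        self.tools = tools                    # 3. motor skills: {"web_search": fn, ...}
        self.memory = []                      # 5. memory systems (detailed in a later sketch)
        # 4. control logic: the respond() method orchestrates the pieces

    def respond(self, user_query):
        context = self.knowledge_base.search(user_query)
        prompt = f"Context: {context}\nUser: {user_query}"
        answer = self.llm(prompt)             # tool calls and reflection come in later sections
        self.memory.append((user_query, answer))
        return answer
```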
A path through a maze.

Plotting a course — Digital Deliberation

The true magic happens when AI agents start behaving like us while solving problems: thinking, analyzing, deliberating, and then acting.

This is where a framework like ReAct comes into play [4]. ReAct (Reasoning + Acting) is a framework for approaching a problem the way a human would, in an algorithmic fashion.

  • Contemplation: The AI considers the problem, breaking it down into manageable steps.
  • Action: It takes a step towards solving the problem, whether that’s searching for information or performing a calculation.
  • Reflection: The AI observes the results of its action, much like we would assess the outcome of our efforts.
  • Adaptation: Based on what it observes, the AI adjusts its approach, learning and improving as it goes.

ReAct allows agents to alternate between reasoning about their current state and taking actions, enabling a more adaptive level of problem-solving behaviour [4].

Initial State + Thought + Action = New State
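Below is a hedged sketch of that loop in Python. It assumes an `llm` callable and a `tools` dictionary, plus a text convention where the model emits "Action: tool[argument]" or "Final Answer: ..."; that convention is an illustrative assumption, not the official ReAct prompt format.

```python
def react_loop(llm, tools, question, max_steps=5):
    """Minimal ReAct-style loop: alternate reasoning and acting until the
    model emits a final answer, or the step budget runs out."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                               # Contemplation: reason in text
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action:" in step:
            call = step.split("Action:", 1)[1].strip()       # e.g. "search[current weather]"
            name, arg = call.split("[", 1)
            observation = tools[name.strip()](arg.rstrip("]"))   # Action: use a tool
            transcript += f"Observation: {observation}\n"        # Reflection: feed result back
        # Adaptation: the growing transcript conditions the next thought
    return "No answer within the step budget."
```

The key design choice in this sketch is that the transcript itself is the agent's state: every thought, action, and observation is appended, so the next call to the model sees the full history.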

Progression of state through stimuli.

Working Memory — Attention span of a model

Working memory is crucial for maintaining coherence in conversation, for humans and agents alike. It is the short-term store of contextual information for any interaction. An agent needs it to keep track of multiple pieces of information simultaneously, maintain context across multi-turn interactions, and manage the execution of tasks. Think of it as a waiter taking five different meal orders from the same table: the waiter needs to remember key information about each person's order, such as dietary restrictions, additions, or substitutions, all at the same time. This requires a working attention span.

Episodic Memory — Active listening ability of a model

Episodic memory is required to store and retrieve information from past experiences and interactions, allowing the AI to improve and adapt over time. Agent architectures typically use temporal indexing to store these events: each 'episode', or piece of information, is stored in a database along with a timestamp indicating when it was recorded. Think of this as a waiter remembering your usual order, your dietary restrictions, and your usual table at a restaurant. The waiter learns your behaviour by observing and retaining those observations over a long time.

The working and episodic memories work together to let an agent manage its immediate task while supporting long-term learning and adaptation. Together they enable more human-like interaction patterns and problem-solving capabilities.
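As a rough sketch, and assuming simple in-process data structures in place of a real vector or time-series store, the two memory types can be modelled as a fixed-size window for working memory and a timestamped log for episodic memory.

```python
from collections import deque
from datetime import datetime, timezone

class WorkingMemory:
    """Short-term: only the most recent turns, like the waiter holding one table's orders."""
    def __init__(self, max_turns=5):
        self.turns = deque(maxlen=max_turns)   # old turns fall off automatically

    def add(self, role, text):
        self.turns.append((role, text))

    def context(self):
        return list(self.turns)                # fed into the next prompt

class EpisodicMemory:
    """Long-term: every episode kept with a timestamp (temporal indexing)."""
    def __init__(self):
        self.episodes = []                     # (timestamp, text); a vector DB would add embeddings

    def add(self, text):
        self.episodes.append((datetime.now(timezone.utc), text))

    def recall(self, keyword, since=None):
        """Retrieve past episodes by keyword, optionally restricted to a time window."""
        return [(ts, txt) for ts, txt in self.episodes
                if keyword.lower() in txt.lower() and (since is None or ts >= since)]
```

In the restaurant analogy, WorkingMemory is the waiter juggling one table's orders, while EpisodicMemory is the waiter who still remembers your usual order weeks later.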

Arguably the most exciting thing about AI agents is their ability to interact with the real world outside of their own environment. This is facilitated by robust tool integration, which extends the AI's functionality beyond mere language processing and enables it to take action.

Attention span.

Real-world Interaction

To integrate tools effectively, AI agents use a formal Tool Description Language (TDL). This is like a formal 'instruction manual' that tells the AI how to use a tool, what commands to give, and what results to expect [5].

This way, instead of hardcoding every tool, TDL helps the AI understand and interact with various tools in a more flexible and organized way. However, the process of selecting and using a tool involves several steps; the whole loop is sketched in code after the list below.

  • The agent first has to parse the user's requirements and understand what the expected result is.
  • Then the agent matches those requirements against the available tool descriptions, allowing it to identify which tools would be appropriate for executing the instructions.
  • With the right tools on hand, the agent has to extract the important parameters to pass to them. This involves techniques like named entity recognition on the user's input to identify relevant values.
  • Now comes the execution step, where the AI uses the tool with those parameters. This might involve making API calls, database requests, or even sending signals to IoT devices to trigger a specific action.
  • Once the action has been taken, the agent has to interpret the result and incorporate it into its reasoning process. The result is used to expand the memory, to trigger the next step in the process, or simply to be communicated back to the user.

This is the final step in the loop of solving problems rationally.
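Here is a hypothetical end-to-end sketch of that loop: tool 'instruction manuals' as plain dictionaries standing in for a formal TDL, a naive keyword matcher for tool selection, and a dispatch step. A production agent would let the LLM do the matching and parameter extraction against the descriptions; everything here, including the tool names and the stubbed results, is invented for illustration.

```python
# Hypothetical tool registry: each entry is a miniature "instruction manual".
TOOLS = {
    "web_search": {
        "description": "search the web for up to date information",
        "parameters": {"query": "string"},
        "run": lambda query: f"(stub) top results for {query!r}",
    },
    "calculator": {
        "description": "evaluate a simple arithmetic expression",
        "parameters": {"expression": "string"},
        "run": lambda expression: str(eval(expression, {"__builtins__": {}})),  # toy only
    },
}

def select_tool(user_request):
    """Toy matcher: pick the tool whose description shares the most words with the request.
    A real agent would ask the LLM to choose against the tool descriptions instead."""
    words = set(user_request.lower().split())
    return max(TOOLS, key=lambda name: len(words & set(TOOLS[name]["description"].split())))

def execute(user_request, argument):
    """Parameter extraction is skipped here; the caller supplies the argument directly."""
    name = select_tool(user_request)           # match the request against descriptions
    result = TOOLS[name]["run"](argument)      # execution: API call, DB query, IoT signal...
    return name, result                        # fed back into the agent's reasoning and memory

print(execute("search the web for agent frameworks", "agent frameworks"))
```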

A rational path.

Rational thinking, deduction, planning, action, and interpretation have always been signs of creative thought, given the sheer breadth of computation required to execute these steps. Humans have thrived for generations, facing a plethora of foreign terrains, environments, and conditions, by using rationality and problem solving. Through these systems we are creating digital minds, reminiscent of our own cognitive ways, that promise to shed new light on the nature of intelligence itself.

AI agents and compound AI systems represent a paradigm shift for the future of technology. The future of AI is not just smarter, but more thoughtful, more adaptable, and more human than ever before. The new era of silicon minds and digital thoughts will be driven by rational systems and creative humans. Hyper-personalization in education, entertainment, and marketing, along with smooth global connectivity driven by seamless human-technology collaboration, is what awaits.

Seamless collaboration.

What are your thoughts on the silicon cousins of our brains? How do you see them transforming the way we work, create, and tackle big challenges? Drop your thoughts in the comments or share them on social!

Thank you very much for reading :) Please subscribe for more such articles and follow me on Instagram, Medium, and LinkedIn.

References:

  1. The evolutionary and ecological roots of human organization https://www.researchgate.net/publication/348503027_The_evolutionary_and_ecological_roots_of_human_organization
  2. IBM | What are AI agents? https://www.ibm.com/think/topics/ai-agents
  3. Attention Is All You Need https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
  4. ReAct: Synergizing Reasoning and Acting in Language Models https://arxiv.org/abs/2210.03629
  5. Andrew Ng’s lecture https://youtu.be/q1XFm21I-VQ?si=wWBGuBYpmqGMS5Zh

Written by Shaunak Inamdar

Shaunak Inamdar is an AI enthusiast with a passion for writing about technologies and making them accessible to a broader audience. www.shaunak.tech
