Defining Artificial General Intelligence

Last week Sam Altman was fired as CEO of OpenAI. The true reason for his departure remains unknown. According to the board, he was removed because they "concluded that he was not consistently candid in his communications with the board, hindering its ability to exercise its responsibilities." This vague explanation led to plenty of speculation about why Altman was really let go. Some of the leading theories include:
- Altman prioritized market penetration over security & privacy testing
- Altman circumvented the board in a major deal
- Altman's ego became too big and started to conflict with the company's mission
Interestingly, one of the most compelling rumors points towards a divergence in views on AI ethics, particularly around the development of Artificial General Intelligence ("AGI"). While Altman has been a vocal proponent of AGI's potential, rumors suggest that chief scientist Ilya Sutskever harbored growing concerns over the rapid advancement of OpenAI's internal technologies.
In this post, we will summarize some concepts from Google DeepMind's new paper Levels of AGI: Operationalizing Progress on the Path to AGI. This paper helps to define AGI, sets up a framework for evaluating AGI systems, and summarizes some of the potential risks of AGI.
Defining AGI
In the Artificial Intelligence ("AI") field, AGI refers to systems that can perform most tasks at a human level. Hypothetically, these systems are capable of understanding, learning, and applying their intelligence as broadly as a human can. A complete AGI system would not be limited to the data it was trained on; instead, it could gather new information and keep learning over time. For many AI companies, AGI is either explicitly or implicitly the long-term goal. AGI hasn't been achieved yet, but given the rapid advancements in LLMs and artificial intelligence more broadly, it feels closer than ever.
In the paper, researchers at DeepMind outline the defining characteristics of AGI. They "argue that any definition of AGI should meet the following six criteria:"
- "Focus on Capabilities, not Process": It doesn't matter how the AGI system does the thing, it matters what thing(s) the system can do. AGI systems are not required to think or understand in a humanlike way, or develop humanlike consciousness. This not to say these systems will not demonstrate these things – but they are not required to meet the definition of AGI.
- "Focus on Generality and Performance": Some systems choose to emphasize generality – or the ability to handle a wide range of tasks and adapt to various environments. However, generality should not come at the cost of competence. These systems need to be able to perform a wide range of tasks at a high level of performance.
- "Focus on Cognitive and MetaCognitive Tasks": The researchers argue that systems do not need to be able to perform physical tasks to be considered AGI. Instead, these systems should focus on the ability to complete cognitive and metacognitive tasks. In this context, cognitive tasks are things like perception, memory, language and problem solving. Metacognitive tasks include the ability to learn new tasks or gather more information when required.
- "Focus on Potential, not Deployment": Any system developed that meets the criteria of AGI does not have to be deployed into the real world to be considered AGI. Instead, the system needs to demonstrate that it is capable of meeting the criteria of AGI. According to the authors, "Requiring deployment as a condition of measuring AGI introduces non-technical hurdles such as legal and social considerations, as well as potential ethical and safety concerns."
- "Focus on Ecological Validity": Any system that is developed in an effort to meet the definition of AGI should focus on tasks that people will value in the real world. In other words, AGI should not focus on highly specialized or abstract tasks like solving extremely complex theoretical problems. Something like this would not be valuable to most people in every day life.
- "Focus on the path to AGI, not a single endpoint": Outlining varying levels of AGI and attaching clear metrics, benchmarks, and risks enable easier discussions of policy and progress. Different levels can make it easier to compare systems against each other and quantify progress.
These six criteria ensure that researchers and other interested parties are talking about AGI in the same way. The criteria seek to eliminate confusion over differing terms, capabilities, or outcomes and focus the discussion on the things that matter. I'm not saying these are the right six criteria, but they do make it easier to think through systems as they relate to AGI.
Levels of AGI
In the section above, the sixth criterion mentioned outlining varying levels of AGI. The table below summarizes the various levels of AGI, as defined by DeepMind. Note that for each level, the table examines both narrow and general AI tasks, where:
- Narrow tasks are clearly scoped or defined
- General tasks are a wide variety of tasks including the ability to learn new skills
[Table: Levels of AGI from the DeepMind paper, crossing performance (Level 0: No AI, Level 1: Emerging, Level 2: Competent, Level 3: Expert, Level 4: Virtuoso, Level 5: Superhuman) with narrow and general capabilities.]
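To make the performance dimension of these levels concrete, here is a minimal Python sketch that buckets a system's benchmark score (expressed as a percentile relative to skilled adults) into the paper's performance levels. The function name and the exact cutoffs are my own illustrative assumptions; the paper describes the levels qualitatively rather than as code.

```python
def agi_performance_level(percentile: float) -> str:
    """Map a benchmark score, expressed as a percentile relative to skilled
    adults, to one of the paper's performance levels.

    Illustrative only: the cutoffs below are assumptions for demonstration,
    not an official DeepMind metric.
    """
    if percentile >= 100:
        return "Level 5: Superhuman (outperforms all humans)"
    if percentile >= 99:
        return "Level 4: Virtuoso (99th percentile of skilled adults)"
    if percentile >= 90:
        return "Level 3: Expert (90th percentile of skilled adults)"
    if percentile >= 50:
        return "Level 2: Competent (50th percentile of skilled adults)"
    if percentile > 0:
        return "Level 1: Emerging (comparable to an unskilled human)"
    return "Level 0: No AI"


# Example: a system that beats roughly 60% of skilled adults on a task
print(agi_performance_level(60))  # -> Level 2: Competent ...
```

The point of the sketch is simply that performance and generality are separate axes: a narrow system only has to clear one of these bars on one task, while a general system has to clear them across many tasks at once.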
When we're talking about narrowly focused AI, we see numerous examples of products at specific levels (like the humble calculator at level 0). Building fully autonomous solutions is easier when the use case is tightly scoped and specific. However, these systems, due to their specialized nature, fall short of DeepMind's criteria for AGI. For example, AlphaFold is a system designed to predict 3D protein structures. While its specialized capabilities are impressive, its narrow focus means it lacks the generality that is central to AGI.
When we talk about more generalized systems like ChatGPT, we encounter a different set of challenges. Initially, ChatGPT's ability to handle a wide variety of questions was impressive. Yet the more I used the tool, the more I realized that while ChatGPT might seem superhuman on the surface, it often falls short in areas requiring deep expertise. Connecting this back to the AGI levels outlined above, it only answers at the level of an unskilled human (level 1). Many of its responses sound factually correct but are actually plagued by inaccuracies. This realization highlights that today's AI systems, despite their "first of their kind" capabilities, often lack depth of performance: they cover a wide range of functions but don't excel at all of them, conflicting with DeepMind's second criterion for AGI.
Risks of AGI
In the 2004 movie I, Robot, robots exist to serve their human owners. These robots are not allowed to injure humans, they must obey human orders, and they must protect their own existence so long as doing so doesn't conflict with the first two rules. As the movie progresses, the robots begin to disobey these rules. They develop a sense of free will and emotions, and as a result they become an existential threat to humanity.
Although this doomsday scenario could become a risk in an AGI world, there are more practical risks as well. The table below summarizes the risks associated with narrow and general AI at varying levels of AI autonomy.
[Table: Levels of Autonomy from the DeepMind paper, pairing each level of autonomy (from AI as a tool through fully autonomous agents) with the risks it introduces, such as over-trust, targeted manipulation, job displacement, and concentration of power.]
Admittedly, when I first started using advanced AI tools and researching AGI as a concept, I had blinders on to the potential risks of the tools. I subconsciously refused to acknowledge that the development and advancement of these tools could come with any negative side effects. I mean, who doesn't love having 24/7 access to 30 second code reviews?
However, when I stumbled across this table it changed my perspective. It opened my eyes to some of the risks of these systems, including the tools we use today. Using AI as a consultant ("Autonomy Level 2") is similar to how many people use ChatGPT today, or how we rely on recommender systems for product or movie recommendations. Using the tools in this way can lead to chronic over-trust or targeted manipulation. In fact, I recently couldn't find anyone to review my code, so I had ChatGPT review it instead. The review came back clean, so I pushed the code to production. As it turns out, the code was riddled with bugs and had to be rolled back almost instantly. I put too much trust in ChatGPT and paid the price!
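To make the over-trust point concrete, here's a rough sketch of what that "AI as a consultant" code-review setup might look like, using the OpenAI Python client. The model name, prompt, and helper function are my own assumptions for illustration, not anything prescribed by the paper or by OpenAI.

```python
# A rough sketch of the "AI as a consultant" pattern for code review.
# Assumptions: the `openai` package is installed, OPENAI_API_KEY is set in the
# environment, and "gpt-4o-mini" is used as a stand-in model name.
from openai import OpenAI

client = OpenAI()


def ai_code_review(diff: str) -> str:
    """Ask the model for review comments on a diff.

    The output is advice from a consultant, not a verdict: a human reviewer
    still needs to read the diff and approve the change.
    """
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "You are a strict code reviewer. List concrete "
                           "bugs, edge cases, and risks in the diff.",
            },
            {"role": "user", "content": f"Review this diff:\n\n{diff}"},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    comments = ai_code_review("def add(a, b):\n    return a - b\n")
    print(comments)
    # Guard against the over-trust failure mode described above: treat a
    # "clean" AI review as one more input, not as approval to ship.
```

The lesson from my rollback applies regardless of the exact setup: the model's comments are one more opinion to weigh, not a green light to merge.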
As we continue to advance both narrow and general AI systems, the risks grow substantially. Today's systems can lead to over-trust and targeted manipulation, whereas more powerful systems could lead to large-scale job loss and a concentration of power. The technology-oriented side of me is wildly impressed with how far we've come and just as excited to see where we're going. However, the practical side of me recognizes the need to build advanced AI systems with careful thought about the risks and consequences.