
Bye, bye humans
OpenAI made no bones about it when they issued the following warning this week:
“Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems. But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction.”
What happens when AI becomes smarter than us?
According to OpenAI, superintelligence could become a reality within this decade, and steps need to be taken to manage the risks it poses. That will require new institutions for governance and a solution to the problem of superintelligence alignment. The key question is: how do we ensure AI systems follow human intent when they could be much smarter than humans? Current alignment techniques rely heavily on human feedback and supervision, which won’t scale to superintelligence; new scientific and technical breakthroughs are needed.
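For context, the dominant alignment technique today is reinforcement learning from human feedback (RLHF), in which a reward model is trained on human preference labels. The sketch below is a minimal, hypothetical reduction of that reward-modeling step, not OpenAI’s implementation; note that every training pair requires a human judgment, which is precisely the step that stops scaling once models outgrow their human evaluators.

```python
import torch
import torch.nn as nn

# Minimal, illustrative sketch of RLHF reward modeling (Bradley-Terry loss).
# The "reward model" here is a toy linear scorer over response embeddings;
# a real system would score the outputs of a large language model.

class RewardModel(nn.Module):
    def __init__(self, embed_dim: int = 16):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)  # scalar reward per response

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Hypothetical batch: embeddings of responses a human annotator preferred
# ("chosen") versus rejected. The human labels are the bottleneck: the
# whole setup assumes a human can tell which response is better.
chosen = torch.randn(8, 16)
rejected = torch.randn(8, 16)

for _ in range(100):
    # Bradley-Terry preference loss: push r(chosen) above r(rejected).
    loss = -torch.nn.functional.logsigmoid(
        model(chosen) - model(rejected)
    ).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```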
The goal is to build an “automated alignment researcher” that is roughly at the level of human intelligence, then use vast amounts of computing power to scale its efforts and align superintelligence. Achieving this means developing a scalable training method, validating the resulting models, and stress testing the entire alignment pipeline. OpenAI describes the approach as follows:
- To provide a training signal on tasks that are difficult for humans to evaluate, we can leverage AI systems to assist evaluation of other AI systems (scalable oversight); a minimal sketch of this idea follows the list. In addition, we want to understand and control how our models generalize our oversight to tasks we can’t supervise (generalization).
- To validate the alignment of our systems, we automate search for problematic behavior (robustness) and problematic internals (automated interpretability).
- Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing).
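To make the scalable-oversight bullet concrete, here is a minimal sketch under stated assumptions: worker_model, critic_model, and human_review are hypothetical stubs standing in for real model and annotation calls, not any actual API. A critic model grades every answer from a task model, and only the cases where the critic is least confident are escalated to a human, so scarce human judgment is spent where automated evaluation is weakest.

```python
import random

# Hypothetical stubs; in practice these would call a task model,
# a separate evaluator model, and a human annotation queue.
def worker_model(task: str) -> str:
    return f"answer to: {task}"

def critic_model(task: str, answer: str) -> float:
    # Critic's confidence that the answer is correct, in [0, 1].
    return random.random()

def human_review(task: str, answer: str) -> bool:
    # Expensive human judgment -- the resource scalable oversight conserves.
    print(f"escalated to human: {task!r}")
    return True

ESCALATION_THRESHOLD = 0.3  # illustrative cutoff

def evaluate(tasks: list[str]) -> dict[str, bool]:
    """AI-assisted evaluation: the critic grades every answer; humans
    only see the cases the critic is unsure about."""
    results = {}
    for task in tasks:
        answer = worker_model(task)
        confidence = critic_model(task, answer)
        if confidence < ESCALATION_THRESHOLD:
            results[task] = human_review(task, answer)  # rare, costly path
        else:
            results[task] = confidence >= 0.5  # cheap, automated path
    return results

print(evaluate([f"task {i}" for i in range(5)]))
```

The design point is the asymmetry: automated evaluation handles the bulk of cases, and the human signal is reserved for the hard residue, which is what would let oversight scale faster than the pool of human evaluators.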
Assembling a “Superalignment team”
OpenAI is assembling a team of highly skilled machine learning researchers and engineers to tackle the problem of superintelligence alignment, and has allocated 20% of its computing resources over the next four years to the work. This will be the primary research focus of the new Superalignment team.
OpenAI’s goal is to solve the core technical challenges of superintelligence alignment within four years. While this is a highly ambitious goal and there are no guarantees, OpenAI is optimistic that with focused and concerted effort, progress can be made.
Ilya Sutskever, co-founder and Chief Scientist of OpenAI, has made superintelligence alignment his primary research focus. He will be co-leading the team with Jan Leike, Head of Alignment. Researchers and engineers from the previous alignment team as well as other teams across the company will also be joining the effort.
OpenAI is actively seeking new researchers and engineers to join this effort. Superintelligence alignment is at its core a machine learning problem, and OpenAI believes that great machine learning experts, even those not currently working on alignment, will be critical to solving it.
OpenAI plans to share the results of this work with a broad audience, and considers it essential to contribute to the alignment and safety of non-OpenAI models.
We have more to worry about than just extinction
The new team’s work is in addition to OpenAI’s existing efforts to improve the safety of current models such as ChatGPT, and to identify and mitigate other risks associated with AI: misuse, economic disruption, disinformation, bias and discrimination, addiction and overreliance, and more. While the new team will concentrate on the machine learning challenges of aligning superintelligent AI systems with human intent, OpenAI is also engaging with interdisciplinary experts on related sociotechnical issues, so that its technical solutions take broader human and societal concerns into account.
Superintelligence alignment is one of the most important unsolved technical problems of our time, and solving it will require the best minds in the world.
Image Credits
In-Article Image Credits
OpenAI Logo via OpenAI with usage type - Editorial use (Fair Use)
Featured Image Credit
OpenAI Logo via OpenAI with usage type - Editorial use (Fair Use)