GPTKB – Massive Knowledge Extraction From the GPT Language Model

Type Demo

Degree Program / Chair / Company
Knowledge-Aware AI

Presenter Yujia Hu

Website https://gptkb.org

Large language models (LLMs) have greatly advanced artificial intelligence (AI) and natural language processing (NLP). Beyond their ability to perform many different tasks, much of their success rests on the large amount of factual knowledge they encode. For years, researchers have been interested in how much these models really “know”, but previous methods only probe small, pre-selected datasets, which leads to an “availability bias” (Tversky and Kahneman): researchers tend to discover only what they already expected, and may miss much of the rest. To solve this problem, we have developed a new method for systematically and comprehensively capturing the knowledge of an LLM: we prompt it with many questions and intelligently consolidate the answers. As a test run, we used GPT-4o-mini to create GPTKB, a huge knowledge base of 101 million facts about 2.9 million subjects. Best of all, we built it for just 1% of the cost of previous knowledge base construction projects. GPTKB is a significant step forward in two areas: first, it helps us better understand how LLMs “think” and what facts they know; second, it demonstrates new, efficient ways to construct large knowledge bases.
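To make the idea of “asking many questions and consolidating the answers” concrete, here is a minimal Python sketch of such an elicitation loop: starting from a seed subject, the model is prompted for facts, and newly mentioned entities become further subjects. The prompt wording, the JSON triple format, the helper names (elicit_triples, crawl), the seed entity, and the breadth-first expansion are illustrative assumptions, not the exact GPTKB pipeline; only the model name GPT-4o-mini comes from the abstract.

```python
# Minimal sketch of LLM knowledge elicitation via recursive subject expansion.
# Assumptions (not the authors' exact pipeline): prompt wording, JSON triple
# format, frontier-expansion heuristic, and use of the OpenAI Python client.
import json
from collections import deque

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "List factual knowledge about the subject '{subject}' as a JSON array of "
    'objects with keys "subject", "predicate", "object". '
    "Return only the JSON array."
)


def elicit_triples(subject: str) -> list[dict]:
    """Ask the model for facts about one subject; parse the reply as triples."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT.format(subject=subject)}],
    )
    try:
        data = json.loads(reply.choices[0].message.content)
        return data if isinstance(data, list) else []
    except (json.JSONDecodeError, TypeError):
        return []  # a real pipeline would retry or repair malformed output


def crawl(seed: str, max_subjects: int = 100) -> list[dict]:
    """Expand breadth-first from a seed: unseen objects become new subjects."""
    queue, seen, kb = deque([seed]), {seed}, []
    while queue and len(seen) <= max_subjects:
        for triple in elicit_triples(queue.popleft()):
            if not isinstance(triple, dict):
                continue
            kb.append(triple)
            obj = triple.get("object")
            # Naive frontier expansion: treat unseen string objects as entities.
            if isinstance(obj, str) and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return kb


if __name__ == "__main__":
    # "Albert Einstein" is an illustrative seed, not GPTKB's actual one.
    print(len(crawl("Albert Einstein")))
```

At scale, the consolidation step mentioned above would additionally deduplicate entities and canonicalize predicates across the millions of elicited triples; the sketch omits this and only collects raw answers.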