External research: Curate Labs did not author this paper.
Community Reading: KGGen for Text-to-KG Construction
The paper "KGGen: Extracting Knowledge Graphs from Plain Text with Language Models" proposes a pipeline that uses language models to turn plain text into knowledge graphs, then clusters similar entities and relations to reduce sparsity.
The paper also introduces MINE, a benchmark for evaluating text-to-KG extraction through information preservation and retrieval usefulness. That evaluation emphasis is as important as the extractor itself.
Why we're excited
Automatically generated KGs often fail because they are sparse, fragmented, or filled with near-duplicate entities. KGGen's aggregation and clustering stages directly target that failure mode.
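To make the near-duplicate problem concrete, here is a minimal Python sketch of entity deduplication over extracted triples. It does not reproduce KGGen's own clustering stage. Instead it swaps in plain string normalization plus `difflib` similarity with an arbitrary 0.85 threshold. The helper names (`normalize`, `cluster_entities`, `merge_triples`) and the example triples are hypothetical.

```python
from difflib import SequenceMatcher

Triple = tuple[str, str, str]


def normalize(name: str) -> str:
    """Lowercase, drop non-alphanumeric characters, and collapse whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def cluster_entities(names: list[str], threshold: float = 0.85) -> dict[str, str]:
    """Map each entity name to the first-seen representative it resembles.

    Hypothetical stand-in for a real clustering stage: greedy single-pass
    matching on normalized string similarity, not KGGen's method.
    """
    canon: dict[str, str] = {}
    reps: list[str] = []  # canonical representatives, in order of first appearance
    for name in names:
        key = normalize(name)
        match = None
        for rep in reps:
            if SequenceMatcher(None, key, normalize(rep)).ratio() >= threshold:
                match = rep
                break
        if match is None:
            reps.append(name)  # no close match: this name starts a new cluster
            match = name
        canon[name] = match
    return canon


def merge_triples(triples: list[Triple], canon: dict[str, str]) -> list[Triple]:
    """Rewrite subjects/objects to their representatives and drop exact duplicates."""
    seen: set[Triple] = set()
    merged: list[Triple] = []
    for s, p, o in triples:
        t = (canon.get(s, s), p, canon.get(o, o))
        if t not in seen:
            seen.add(t)
            merged.append(t)
    return merged


# Hypothetical extractor output with near-duplicate entity names.
triples = [
    ("Marie Curie", "won", "Nobel Prize"),
    ("marie curie", "won", "Nobel prize"),
    ("Marie Curie.", "born in", "Warsaw"),
]
# Unique entity names in order of appearance (subject, then object).
names = list(dict.fromkeys(x for s, _, o in triples for x in (s, o)))
canon = cluster_entities(names)
merged = merge_triples(triples, canon)
print(merged)
# [('Marie Curie', 'won', 'Nobel Prize'), ('Marie Curie', 'born in', 'Warsaw')]
```

Three extracted triples collapse to two, and the three spellings of one person become a single node. A fixed string threshold is brittle in both directions: it will not link "NYC" to "New York City", and it will merge "Paris" with "Parish". Those are the kinds of cases a model-based clustering step is meant to handle.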
On MINE, the paper reports strong results against GraphRAG and OpenIE baselines, and the authors release their code.
Our community read
The most useful idea is that graph extraction should be judged by downstream utility, not only by triple overlap. If the graph is meant to support retrieval or reasoning, then evaluation should measure whether it preserves useful structure.
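A toy example shows how the two kinds of score can disagree. The sketch below is hypothetical and is not MINE's protocol. It scores an extracted graph once by exact triple overlap with gold triples, and once by a crude utility proxy that asks whether the two entities of each reference fact are connected within two hops. The function names, the two-hop limit, and the example data are all assumptions for illustration.

```python
from collections import deque

Triple = tuple[str, str, str]


def triple_overlap(extracted: list[Triple], gold: list[Triple]) -> float:
    """Fraction of gold triples that appear verbatim in the extracted graph."""
    ext = set(extracted)
    return sum(t in ext for t in gold) / len(gold)


def connected_within(triples: list[Triple], a: str, b: str, max_hops: int) -> bool:
    """Breadth-first search from a to b, treating every edge as undirected."""
    adj: dict[str, set[str]] = {}
    for s, _, o in triples:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    frontier = deque([(a, 0)])
    seen = {a}
    while frontier:
        node, depth = frontier.popleft()
        if node == b:
            return True
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


def fact_recall(extracted: list[Triple], facts: list[tuple[str, str]], max_hops: int = 2) -> float:
    """Hypothetical utility proxy: share of reference facts whose two entities
    are reachable from each other within max_hops in the extracted graph."""
    hits = sum(connected_within(extracted, a, b, max_hops) for a, b in facts)
    return hits / len(facts)


# Hypothetical data: the extractor paraphrased every relation label.
gold = [("Marie Curie", "won", "Nobel Prize"), ("Marie Curie", "born in", "Warsaw")]
extracted = [("Marie Curie", "received", "Nobel Prize"), ("Marie Curie", "birthplace", "Warsaw")]
facts = [("Marie Curie", "Nobel Prize"), ("Marie Curie", "Warsaw")]

print(triple_overlap(extracted, gold))  # 0.0
print(fact_recall(extracted, facts))    # 1.0
```

Exact overlap is zero because every relation was paraphrased, yet both facts remain reachable. The proxy is deliberately crude: it ignores what an edge says, so a wrong relation between the right entities would still count. That is why a utility-oriented benchmark needs some check on meaning, not just connectivity.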
The caveat is that MINE is new, and the field still lacks broad agreement on how to evaluate graph usefulness across domains.
Source
- arXiv: 2502.09956