Curate Labs

March 7, 2025 | External research | Knowledge graphs | LLMs

External research: Curate Labs did not author this paper.

The paper "KGGen: Extracting Knowledge Graphs from Plain Text with Language Models" proposes a language-model-driven pipeline for turning plain text into knowledge graphs, with clustering to reduce entity and relation sparsity.

The paper also introduces MINE, a benchmark for evaluating text-to-KG extraction through information preservation and retrieval usefulness. That evaluation emphasis is as important as the extractor itself.

Why we're excited

Automatically generated KGs often fail because they are sparse, fragmented, or filled with near-duplicate entities. KGGen's aggregation and clustering stages directly target that failure mode.
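To make the near-duplicate problem concrete, here is a minimal sketch of merging entity surface forms and rewriting triples to use one canonical name per cluster. It is not KGGen's method: the paper's clustering is driven by a language model, while this toy version swaps in crude string normalization. The function names `normalize` and `cluster_entities` and the sample triples are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Crude canonical key (an assumption, not KGGen's rule):
    lowercase, drop punctuation, collapse spaces, strip a trailing 's'."""
    key = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    key = " ".join(key.split())
    if key.endswith("s") and len(key) > 3:
        key = key[:-1]
    return key


def cluster_entities(triples):
    """Group entity surface forms by normalized key and rewrite triples
    so every member of a cluster uses the same representative name."""
    clusters = defaultdict(set)
    for subj, _, obj in triples:
        clusters[normalize(subj)].add(subj)
        clusters[normalize(obj)].add(obj)

    # Representative: shortest surface form, ties broken alphabetically.
    canonical = {}
    for members in clusters.values():
        rep = min(members, key=lambda m: (len(m), m))
        for member in members:
            canonical[member] = rep

    # A set drops triples that become identical after merging.
    merged = {(canonical[s], r, canonical[o]) for s, r, o in triples}
    return sorted(merged), canonical


if __name__ == "__main__":
    raw = [
        ("Language Models", "extract", "knowledge graphs"),
        ("language model", "extract", "Knowledge Graph"),
        ("KGGen", "uses", "language models"),
    ]
    merged, canonical = cluster_entities(raw)
    for triple in merged:
        print(triple)
```

In this toy input, three raw triples over five distinct entity strings collapse to two triples over three entities. A real pipeline would need something far more robust than suffix stripping, which is presumably why the paper leans on a language model for this step.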

The paper reports that KGGen outperforms GraphRAG and OpenIE baselines on MINE, its own benchmark, and the authors have released the code.

Our community read

The most useful idea is that graph extraction should be judged by downstream utility, not only by triple overlap. If the graph is meant to support retrieval or reasoning, then evaluation should measure whether it preserves useful structure.
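As a rough illustration of scoring a graph by what it preserves rather than by triple overlap with a gold graph, the sketch below checks what fraction of reference facts find support in any extracted triple. This is not the MINE procedure. The token-overlap test, the `threshold` parameter, and the names `tokens`, `fact_supported`, and `preservation_score` are all assumptions made for this example, and a realistic evaluation would replace the overlap check with retrieval plus a stronger judgment of whether the fact can be inferred.

```python
def tokens(text: str) -> set:
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,;:!?").lower() for w in text.split()} - {""}


def fact_supported(fact: str, triples, threshold: float = 0.6) -> bool:
    """Stand-in support test: a fact counts as preserved if some triple
    covers at least `threshold` of the fact's tokens."""
    fact_tokens = tokens(fact)
    if not fact_tokens:
        return False
    for subj, rel, obj in triples:
        triple_tokens = tokens(f"{subj} {rel} {obj}")
        overlap = len(fact_tokens & triple_tokens) / len(fact_tokens)
        if overlap >= threshold:
            return True
    return False


def preservation_score(facts, triples, threshold: float = 0.6) -> float:
    """Fraction of reference facts that the graph appears to preserve."""
    if not facts:
        return 0.0
    supported = sum(fact_supported(f, triples, threshold) for f in facts)
    return supported / len(facts)


if __name__ == "__main__":
    graph = [
        ("KGGen", "extracts", "knowledge graphs"),
        ("MINE", "evaluates", "extractors"),
    ]
    reference_facts = [
        "KGGen extracts knowledge graphs",
        "MINE evaluates extractors",
        "GraphRAG uses community summaries",
    ]
    print(preservation_score(reference_facts, graph))
```

The point of the design is that the score depends on facts drawn from the source text, not on matching a particular gold set of triples, so two differently shaped graphs can both score well if each keeps the information.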

The caution is that MINE is new. The field still needs broader agreement about how to evaluate graph usefulness across domains.

Source

arXiv: 2502.09956

Community Reading: KGGen for Text-to-KG Construction
