Curate Labs

October 13, 2025 · External research · Knowledge graphs · Scientific text

External research: Curate Labs did not author this paper.

The paper "Automated Knowledge Graph Construction using Large Language Models and Sentence Complexity Modelling" proposes CoDe-KG, an LLM-based pipeline for constructing knowledge graphs from scientific abstracts.

The core insight is that extraction quality depends on sentence complexity. The pipeline therefore runs in stages: coreference resolution, complexity classification, conversion or decomposition of sentences, and only then relation extraction.
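To make the stage order concrete, here is a minimal sketch of such a pipeline. It is not the paper's implementation: every function name is hypothetical, the example sentences are invented for illustration, and each stage is a toy rule-based stand-in for what would be an LLM call in CoDe-KG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


def resolve_coreferences(sentences):
    """Toy stand-in: replace a leading 'It' with the previous sentence's first word."""
    resolved, last_subject = [], None
    for sentence in sentences:
        words = sentence.split()
        if last_subject and words and words[0] == "It":
            sentence = " ".join([last_subject] + words[1:])
        elif words:
            last_subject = words[0]
        resolved.append(sentence)
    return resolved


def classify_complexity(sentence):
    """Toy stand-in: treat any sentence with a coordinated predicate as complex."""
    return "complex" if " and " in sentence else "simple"


def decompose(sentence):
    """Toy stand-in: split on 'and', copying the subject into each clause."""
    subject, _, predicate = sentence.partition(" ")
    return [f"{subject} {clause.strip()}" for clause in predicate.split(" and ")]


def extract_triple(sentence):
    """Toy stand-in: read a simple sentence as 'subject relation object'."""
    parts = sentence.rstrip(".").split(" ", 2)
    return Triple(*parts) if len(parts) == 3 else None


def build_triples(sentences):
    """Run the stages in order: coreference, classification, decomposition, extraction."""
    triples = []
    for sentence in resolve_coreferences(sentences):
        if classify_complexity(sentence) == "complex":
            simple_sentences = decompose(sentence)
        else:
            simple_sentences = [sentence]
        for simple in simple_sentences:
            triple = extract_triple(simple)
            if triple is not None:
                triples.append(triple)
    return triples


# Invented example input, not taken from the paper's datasets.
abstract = ["Aspirin inhibits COX-1 and reduces inflammation", "It treats pain"]
for triple in build_triples(abstract):
    print(triple)
```

The point of the sketch is the control flow: the extractor only ever sees sentences that have already been resolved and simplified, so each stage can be inspected or swapped out on its own.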

Why we're excited

This is a more realistic picture of scientific information extraction (IE) than a single prompt over raw text. Scientific abstracts contain nested claims, compressed phrasing, and entity references that often need simplification before relation extraction works well.

The paper also contributes datasets and annotations around triples, complexity, coreference, and conversion policy.

Our community read

The design lesson is that upstream linguistic bottlenecks matter. If a sentence is too dense, the extractor may not be the only problem; the input representation may need to change first.

The trade-off is engineering complexity. Multi-stage pipelines are easier to inspect but harder to operate. That trade-off is often worth it in scientific or biomedical settings.

Source

arXiv: 2509.17289

Community Reading: CoDe-KG and Sentence Complexity
