External research: Curate Labs did not author this paper.
Community Reading: CoDe-KG and Sentence Complexity
The paper "Automated Knowledge Graph Construction using Large Language Models and Sentence Complexity Modelling" proposes CoDe-KG, an LLM-based pipeline for constructing knowledge graphs from scientific abstracts.
The core insight is that extraction quality depends on sentence complexity. The pipeline includes coreference resolution, complexity classification, sentence conversion or decomposition, and then relation extraction.
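To make the stage order concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not the paper's implementation: the `Stages` container, the function names, the "simple"/"complex" labels, and the toy string heuristics are all assumptions made for illustration. In a real system each stage would likely wrap an LLM prompt or a trained model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


@dataclass
class Stages:
    """Hypothetical stage callables; names and signatures are illustrative only."""
    resolve_coreferences: Callable[[str], str]
    classify_complexity: Callable[[str], str]  # assumed labels: "simple" or "complex"
    simplify: Callable[[str], List[str]]       # conversion or decomposition
    extract_relations: Callable[[str], List[Triple]]


def build_triples(sentence: str, stages: Stages) -> List[Triple]:
    """Run the stages in order; only complex sentences are simplified first."""
    resolved = stages.resolve_coreferences(sentence)
    label = stages.classify_complexity(resolved)
    units = stages.simplify(resolved) if label != "simple" else [resolved]
    triples: List[Triple] = []
    for unit in units:
        triples.extend(stages.extract_relations(unit))
    return triples


# Toy stand-ins so the sketch runs without any model. These are NOT real
# coreference, complexity, or extraction methods.
def toy_resolve(text: str) -> str:
    return text.replace("It ", "CoDe-KG ")


def toy_classify(text: str) -> str:
    return "complex" if " and " in text else "simple"


def toy_simplify(text: str) -> List[str]:
    subject, _, rest = text.rstrip(".").partition(" ")
    return [f"{subject} {predicate}" for predicate in rest.split(" and ")]


def toy_extract(text: str) -> List[Triple]:
    parts = text.rstrip(".").split(" ", 2)
    return [(parts[0], parts[1], parts[2])] if len(parts) == 3 else []


TOY_STAGES = Stages(toy_resolve, toy_classify, toy_simplify, toy_extract)

if __name__ == "__main__":
    print(build_triples("It resolves coreferences and extracts relations.", TOY_STAGES))
```

The point of the sketch is the routing: the complexity label decides whether a sentence goes straight to extraction or is rewritten into simpler units first.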
Why we're excited
This is a more realistic picture of scientific information extraction (IE) than a single prompt over raw text. Scientific abstracts contain nested claims, compressed phrasing, and entity references that often need simplification before relation extraction works well.
The paper also contributes datasets and annotations around triples, complexity, coreference, and conversion policy.
Our community read
The design lesson is that upstream linguistic bottlenecks matter. If a sentence is too dense, the extractor may not be the only problem; the input representation may need to change first.
The trade-off is engineering complexity. Multi-stage pipelines are easier to inspect but harder to operate. That trade-off is often worth it in scientific or biomedical settings.
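One way to see the "easier to inspect" side of this trade-off is to record every intermediate output as the stages run. The sketch below is a generic illustration, not something taken from the paper: `run_with_trace`, the stage names, and the toy lambdas are hypothetical.

```python
from typing import Any, Callable, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]  # (stage name, stage function)


def run_with_trace(value: Any, stages: List[Stage]) -> Tuple[Any, List[Tuple[str, Any]]]:
    """Apply stages in order and keep each stage's output for later inspection."""
    trace: List[Tuple[str, Any]] = []
    for name, fn in stages:
        value = fn(value)
        trace.append((name, value))
    return value, trace


# Toy stages standing in for real models (assumptions for illustration only).
toy_stages: List[Stage] = [
    ("resolve_coreferences", lambda s: s.replace("It ", "The pipeline ")),
    ("decompose", lambda s: [part.strip() for part in s.rstrip(".").split(";")]),
]

if __name__ == "__main__":
    result, trace = run_with_trace("It resolves references; it splits sentences.", toy_stages)
    for name, output in trace:
        print(f"{name}: {output!r}")
```

When a final triple looks wrong, a trace like this shows which stage introduced the error. The operational cost is that every stage is one more component to version, monitor, and evaluate.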
Source
- arXiv: 2509.17289