Chunkers API
This module provides classes for splitting documents into manageable units for processing by LLMs and for providing context to those units.
Unit Chunkers
Unit chunkers determine how a document is divided into smaller pieces for frame extraction. Each piece is a FrameExtractionUnit.
llm_ie.chunkers.UnitChunker
Bases: ABC
Abstract base class for frame extraction unit chunkers. A unit chunker splits a document into units (e.g., sentences), and the LLM processes the document one unit at a time.
Source code in package/llm-ie/src/llm_ie/chunkers.py
chunk
llm_ie.chunkers.WholeDocumentUnitChunker
Bases: UnitChunker
This class returns the whole document as a single unit (no chunking).
chunk
Parameters:
text : str The document text.
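The relationship between the abstract unit chunker and the whole-document chunker can be sketched as follows. This is an illustration only, not the library's source: `Unit`, `SketchUnitChunker`, and `SketchWholeDocumentChunker` are hypothetical stand-ins, and the `start`/`end`/`text` fields are an assumption about what a frame extraction unit carries.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    """Hypothetical stand-in for FrameExtractionUnit (fields are assumed)."""
    start: int  # character offset where the unit begins
    end: int    # character offset just past the end of the unit
    text: str   # the unit's text


class SketchUnitChunker(ABC):
    """Hypothetical base class: each subclass decides how to split the text."""

    @abstractmethod
    def chunk(self, text: str) -> List[Unit]:
        """Split the document text into units."""


class SketchWholeDocumentChunker(SketchUnitChunker):
    """Returns the entire document as one unit, i.e. no splitting."""

    def chunk(self, text: str) -> List[Unit]:
        return [Unit(start=0, end=len(text), text=text)]


document = "Patient denies chest pain."
units = SketchWholeDocumentChunker().chunk(document)
print(len(units), units[0].start, units[0].end)
```

Because the base class declares `chunk` as abstract, instantiating `SketchUnitChunker` directly raises a `TypeError`; only concrete subclasses can be used.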
llm_ie.chunkers.SentenceUnitChunker
Bases: UnitChunker
This class uses the NLTK PunktSentenceTokenizer to chunk a document into sentences.
chunk
Parameters:
text : str The document text.
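The idea of sentence-level units with character offsets can be sketched as below. Note the substitution: to stay dependency-free, this sketch uses a simple regular expression in place of the NLTK PunktSentenceTokenizer, so it will mis-split text that Punkt is designed to handle (abbreviations, for example). `Unit` and `sentence_units` are hypothetical names, not part of llm_ie.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    """Hypothetical stand-in for FrameExtractionUnit (fields are assumed)."""
    start: int
    end: int
    text: str


# Naive stand-in for Punkt: a sentence starts at a non-space, non-terminator
# character and runs through the next run of '.', '!' or '?'.
_SENTENCE = re.compile(r"[^.!?\s][^.!?]*[.!?]*")


def sentence_units(text: str) -> List[Unit]:
    """Return one unit per sentence, keeping offsets into the original text."""
    return [Unit(m.start(), m.end(), m.group()) for m in _SENTENCE.finditer(text)]


document = "No fever. Mild cough today."
units = sentence_units(document)
for u in units:
    print(u.start, u.end, repr(u.text))
```

Keeping the offsets means each unit can be mapped back to its position in the document, since `document[u.start:u.end]` equals `u.text` for every unit.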
llm_ie.chunkers.TextLineUnitChunker
Bases: UnitChunker
This class chunks a document into lines.
chunk
Parameters:
text : str The document text.
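Line-based chunking can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's implementation: `Unit` and `line_units` are hypothetical, lines are assumed to be separated by `\n`, and empty lines are kept as empty units (the actual class may treat them differently).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    """Hypothetical stand-in for FrameExtractionUnit (fields are assumed)."""
    start: int
    end: int
    text: str


def line_units(text: str) -> List[Unit]:
    """Return one unit per line, tracking character offsets in the document."""
    units = []
    offset = 0
    for line in text.split("\n"):
        units.append(Unit(start=offset, end=offset + len(line), text=line))
        offset += len(line) + 1  # +1 skips the newline that split() removed
    return units


document = "Vitals:\nBP 120/80\nHR 72"
units = line_units(document)
for u in units:
    print(u.start, u.end, repr(u.text))
```

This style of chunking suits documents where each line is a self-contained record, such as lists or form-like notes.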
Context Chunkers
Context chunkers determine what contextual information is provided to the LLM alongside a specific FrameExtractionUnit.
llm_ie.chunkers.ContextChunker
Bases: ABC
Abstract base class for context chunkers. Given a frame extraction unit, a context chunker returns the context for that unit.
chunk
Parameters:
unit : FrameExtractionUnit The frame extraction unit.
Return : str The context for the frame extraction unit.
llm_ie.chunkers.NoContextChunker
Bases: ContextChunker
This class does not provide any context.
fit
llm_ie.chunkers.WholeDocumentContextChunker
Bases: ContextChunker
This class provides the whole document as context.
fit
llm_ie.chunkers.SlideWindowContextChunker
Bases: ContextChunker
This class provides a sliding-window context: the units within a fixed distance of the current unit. For example, the two sentences before and the two sentences after a unit sentence.
fit
Parameters:
units : List[FrameExtractionUnit] The list of frame extraction units.
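The fit-then-chunk pattern for a sliding window can be sketched as follows. This is a hedged illustration, not the library's code: `Unit`, `SketchSlideWindowContext`, and the `window_size` parameter are hypothetical, and the sketch assumes the context includes the unit itself and joins neighboring unit texts with a space.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    """Hypothetical stand-in for FrameExtractionUnit (fields are assumed)."""
    start: int
    end: int
    text: str


class SketchSlideWindowContext:
    """Returns up to `window_size` units on each side of a given unit."""

    def __init__(self, window_size: int):
        self.window_size = window_size
        self.units: List[Unit] = []

    def fit(self, units: List[Unit]) -> None:
        # Remember the ordered units so neighbors can be looked up later.
        self.units = list(units)

    def chunk(self, unit: Unit) -> str:
        i = self.units.index(unit)
        lo = max(0, i - self.window_size)                    # clamp at document start
        hi = min(len(self.units), i + self.window_size + 1)  # clamp at document end
        return " ".join(u.text for u in self.units[lo:hi])


# Five toy units, each three characters long and separated by one space.
units = [Unit(i * 4, i * 4 + 3, f"S{i}.") for i in range(5)]
context_chunker = SketchSlideWindowContext(window_size=1)
context_chunker.fit(units)
middle_context = context_chunker.chunk(units[2])
first_context = context_chunker.chunk(units[0])
print(middle_context)
print(first_context)
```

Units near the start or end of the document simply get a shorter window, since the indices are clamped rather than padded.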