Class TextExtractor
Represents the Documentize.TextExtractor plugin, which is used to extract text from PDF documents.
public static class TextExtractor
Inheritance
- object
- TextExtractor
Inherited Members
- object.GetType()
- object.MemberwiseClone()
- object.ToString()
- object.Equals(object?)
- object.Equals(object?, object?)
- object.ReferenceEquals(object?, object?)
- object.GetHashCode()
Examples
The following example demonstrates how to extract the text content of a PDF document.
using System;
using Documentize;

// Create a TextExtractorOptions object that sets the extraction instructions (Pure formatting mode)
var options = new TextExtractorOptions(TextFormattingMode.Pure);
// Add the path of the input PDF file
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// Run the extraction
var results = TextExtractor.Process(options);
// Get the extracted text from the ResultContainer object
var textExtracted = results.ResultCollection[0].ToString();
// Print the extracted text
Console.WriteLine(textExtracted);
Methods
Process(TextExtractorOptions)
Extracts text from a PDF document.
public static ResultContainer Process(TextExtractorOptions options)
Parameters
options
TextExtractorOptions: An options object containing instructions for the operation.
Returns
ResultContainer: An object containing the result of the extraction.
Exceptions
Thrown if options is not set.
Namespace: Documentize
Assembly: Documentize.dll