Class TextExtractor

Info

Represents the Documentize.TextExtractor plugin, which extracts text from PDF documents.

public static class TextExtractor

Inheritance

object → TextExtractor

Inherited Members

Examples

The example demonstrates how to extract the text content of a PDF document.

// Create TextExtractorOptions object to set instructions
var options = new TextExtractorOptions(TextFormattingMode.Pure);
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// Perform the process
var results = TextExtractor.Process(options);
// Get the extracted text from the ResultContainer object
var textExtracted = results.ResultCollection[0].ToString();
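
As a follow-on to the snippet above, here is a minimal sketch of a complete console program that saves the extracted text to disk. It uses only the Documentize calls shown on this page plus the standard `System.IO.File.WriteAllText`. The output file name `extracted.txt` and the class name `ExtractToFile` are hypothetical, and the sketch assumes that `TextExtractorOptions`, `TextFormattingMode`, and `FileDataSource` live in the `Documentize` namespace and that the Documentize package is referenced by the project.

```csharp
using System;
using System.IO;
using Documentize; // assumption: the supporting types share this namespace

class ExtractToFile
{
    static void Main()
    {
        // Placeholder input path from the example above; the output path is hypothetical.
        string inputPath = "path_to_your_pdf_file.pdf";
        string outputPath = "extracted.txt";

        // Create the options object and register the input PDF.
        var options = new TextExtractorOptions(TextFormattingMode.Pure);
        options.AddInput(new FileDataSource(inputPath));

        // Run the extraction and read the first result as a string.
        var results = TextExtractor.Process(options);
        string text = results.ResultCollection[0].ToString();

        // Persist the text with the standard .NET file API.
        File.WriteAllText(outputPath, text);
        Console.WriteLine($"Wrote {text.Length} characters to {outputPath}");
    }
}
```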

Methods

Process(TextExtractorOptions)

Extracts text from a PDF document.

public static ResultContainer Process(TextExtractorOptions options)

Parameters

options TextExtractorOptions : The options object that sets the instructions for the extraction.

Returns

ResultContainer : An object containing the result of the extraction.

Exceptions

ArgumentException

Thrown if the options are not set.
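
The exception above can be handled at the call site. The following is a rough sketch, not the library's documented behavior: it assumes that passing `null` counts as "options not set" and therefore raises `ArgumentException`. The class name `ProcessWithGuard` is hypothetical. Because `ArgumentNullException` derives from `ArgumentException`, the single catch clause covers both.

```csharp
using System;
using Documentize;

class ProcessWithGuard
{
    static void Main()
    {
        // Deliberately left unset to illustrate the failure path
        // (assumption: null is treated as "options not set").
        TextExtractorOptions options = null;

        try
        {
            var results = TextExtractor.Process(options);
            Console.WriteLine(results.ResultCollection[0].ToString());
        }
        catch (ArgumentException ex)
        {
            // Also catches ArgumentNullException, which derives from ArgumentException.
            Console.Error.WriteLine($"Extraction options were not set: {ex.Message}");
        }
    }
}
```

In real code the cleaner fix is to construct and populate `TextExtractorOptions` before calling `Process`; the catch block is a safety net for options that are assembled dynamically.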

Namespace: Documentize
Assembly: Documentize.dll
