Class PdfExtractor

Info

Represents base functionality to extract text, images, and other types of content that may occur on the pages of PDF documents.

public abstract class PdfExtractor : IDisposable

Inheritance

object → PdfExtractor

Derived

Implements

IDisposable

Inherited Members

Examples

This example demonstrates how to extract the text content of a PDF document.

// create TextExtractor object to extract PDF contents
using (TextExtractor extractor = new TextExtractor())
{
    // create TextExtractorOptions object to set instructions
    TextExtractorOptions textExtractorOptions = new TextExtractorOptions();

    // add input file path (inputPath is a string holding the path to the source PDF)
    textExtractorOptions.AddInput(new FileDataSource(inputPath));

    // perform extraction process
    ResultContainer resultContainer = extractor.Process(textExtractorOptions);

    // get the extracted text from the ResultContainer object
    string textExtracted = resultContainer.ResultCollection[0].ToString();
}

Remarks

Use a Documentize.TextExtractor object to extract text, or a Documentize.ImageExtractor object to extract images.

Constructors

PdfExtractor()

protected PdfExtractor()

Methods

Dispose()

Implements IDisposable.Dispose(). Calling it is not strictly necessary for PdfExtractor.

public void Dispose()

Process(IPluginOptions)

Starts PdfExtractor processing with the specified parameters.

public ResultContainer Process(IPluginOptions pdfExtractorOptions)

Parameters

  • pdfExtractorOptions IPluginOptions: An options object containing instructions for the PdfExtractor.

Returns

ResultContainer : A ResultContainer object containing the result of the extraction.
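
The shape described above (an abstract base with a protected constructor, a public Process method, and IDisposable) can be illustrated with a small self-contained sketch. Every type below is a hypothetical stand-in declared only so the code compiles without Documentize.dll. The Extract method, DemoOptions, UpperCaseExtractor, and IsDisposed are assumptions made for illustration and are not part of the documented API. The sketch shows only the calling pattern: construct a derived extractor, call Process through a PdfExtractor reference, and read the first item of ResultCollection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins that mirror the shapes documented on this page.
// They are NOT the Documentize implementation.
public interface IPluginOptions { }

public sealed class ResultContainer
{
    public List<object> ResultCollection { get; } = new List<object>();
}

public abstract class PdfExtractor : IDisposable
{
    // Illustration only: lets the sketch observe that Dispose() was called.
    public bool IsDisposed { get; private set; }

    // Protected constructor: only derived classes can instantiate the base.
    protected PdfExtractor() { }

    public ResultContainer Process(IPluginOptions options)
    {
        if (options == null) throw new ArgumentNullException(nameof(options));
        var container = new ResultContainer();
        container.ResultCollection.Add(Extract(options));
        return container;
    }

    // Hypothetical extension point; the real class may be organized differently.
    protected abstract object Extract(IPluginOptions options);

    public void Dispose() { IsDisposed = true; }
}

// Hypothetical options type carrying the "input" for the demo extractor.
public sealed class DemoOptions : IPluginOptions
{
    public string Input { get; set; } = "";
}

// Hypothetical derived extractor that upper-cases its input.
public sealed class UpperCaseExtractor : PdfExtractor
{
    protected override object Extract(IPluginOptions options)
    {
        return ((DemoOptions)options).Input.ToUpperInvariant();
    }
}

public static class Demo
{
    public static string Run(string input)
    {
        // The base type is used for the variable; the derived type is constructed.
        using (PdfExtractor extractor = new UpperCaseExtractor())
        {
            ResultContainer result = extractor.Process(new DemoOptions { Input = input });
            return result.ResultCollection[0].ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run("hello pdf")); // prints "HELLO PDF"
    }
}
```

With the real library, the same pattern would presumably apply with TextExtractor and TextExtractorOptions in place of the stand-ins, as in the Examples section.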

Namespace: Documentize

Assembly: Documentize.dll