Class PdfExtractor


Represents the Documentize.PdfExtractor plugin. Used to extract text, images, and form data from PDF documents.

public static class PdfExtractor

Inheritance

object ← PdfExtractor


Methods

Extract(ExtractTextOptions)

Extracts text from a PDF document.

public static ResultContainer Extract(ExtractTextOptions options)

Parameters

options ExtractTextOptions: An options object containing instructions for the operation.

Returns

ResultContainer: An object containing the result of the extraction.

Examples

The following example demonstrates how to extract the text content of a PDF document.

// Create ExtractTextOptions object to set instructions
var options = new ExtractTextOptions();
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// Perform the process
var results = PdfExtractor.Extract(options);
// Get the extracted text from the ResultContainer object
var textExtracted = results.ResultCollection[0].ToString();

The following example demonstrates how to extract the text content of a PDF document using a specific TextFormattingMode.

// Create ExtractTextOptions object to set TextFormattingMode
var options = new ExtractTextOptions(TextFormattingMode.Pure);
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// Perform the process
var results = PdfExtractor.Extract(options);
// Get the extracted text from the ResultContainer object
var textExtracted = results.ResultCollection[0].ToString();
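`ResultCollection[0].ToString()` returns the extracted text as a single string. If you want to process that text line by line, the .NET base library is enough. The sketch below is a minimal example, assuming you only care about non-blank lines; the helper name `ToNonEmptyLines` is hypothetical and not part of Documentize.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Documentize): splits extracted text on
// "\r\n", "\n" or "\r" and drops lines that are empty or whitespace-only.
List<string> ToNonEmptyLines(string text)
{
    var lines = new List<string>();
    // "\r\n" is listed first so a Windows line break is treated as one separator.
    foreach (var line in text.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None))
    {
        if (!string.IsNullOrWhiteSpace(line))
            lines.Add(line);
    }
    return lines;
}
```

With the examples above, you would call it as `ToNonEmptyLines(results.ResultCollection[0].ToString())`. How many line breaks the text contains may depend on the TextFormattingMode you choose.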

Exceptions

ArgumentException

Thrown if options is not set.

Extract(ExtractImagesOptions)

Extracts images from a PDF document.

public static ResultContainer Extract(ExtractImagesOptions options)

Parameters

options ExtractImagesOptions: An options object containing instructions for the operation.

Returns

ResultContainer: An object containing the result of the operation.

Examples

The following example demonstrates how to extract images from a PDF document.

// Create ExtractImagesOptions to set instructions
var options = new ExtractImagesOptions();
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// Set output directory path
options.AddOutput(new DirectoryDataSource("path_to_results_directory"));
// Perform the process
var results = PdfExtractor.Extract(options);
// Get the file path of the first extracted image
var imageExtracted = results.ResultCollection[0].ToFile();

The following example demonstrates how to extract images from a PDF document to streams, without specifying an output folder.

using System.IO;
using Documentize;

// Create ExtractImagesOptions to set instructions
var options = new ExtractImagesOptions();
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// No output is added, so the results are written to streams
// Perform the process
var results = PdfExtractor.Extract(options);
// Get the stream of the first extracted image
var ms = results.ResultCollection[0].ToStream();
// Rewind the stream and copy its data to a file (for demonstration)
ms.Seek(0, SeekOrigin.Begin);
using (var fs = File.Create("test_file.png"))
{
    ms.CopyTo(fs);
}
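The example above saves only the first result. If a document contains several images, you may want to save every stream. The sketch below assumes you have already collected the streams into a list (for example by calling `ToStream()` on each entry of `ResultCollection`). The helper name `SaveStreams`, the `image_<index>` file naming, and the extension you pass in are hypothetical choices; match the extension to the actual image format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of Documentize): writes each stream to
// "<directory>/image_<index><extension>" and returns the created file paths.
List<string> SaveStreams(IReadOnlyList<Stream> streams, string directory, string extension)
{
    Directory.CreateDirectory(directory);
    var paths = new List<string>();
    for (int i = 0; i < streams.Count; i++)
    {
        var stream = streams[i];
        // Rewind first: a stream may still be positioned at its end after being written.
        if (stream.CanSeek)
            stream.Seek(0, SeekOrigin.Begin);
        var path = Path.Combine(directory, $"image_{i}{extension}");
        using (var fs = File.Create(path))
        {
            stream.CopyTo(fs);
        }
        paths.Add(path);
    }
    return paths;
}
```

Each seekable stream is rewound before copying, for the same reason the example above calls `Seek(0, SeekOrigin.Begin)`. A non-seekable stream is copied from its current position.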

Exceptions

ArgumentException

Thrown if options is not set.

Extract(ExtractFormDataToDsvOptions)

Extracts form data from a PDF document.

public static ResultContainer Extract(ExtractFormDataToDsvOptions options)

Parameters

options ExtractFormDataToDsvOptions: An options object containing instructions for the operation.

Returns

ResultContainer: An object containing the result of the operation.

Examples

The following example demonstrates how to export form field values to a CSV file.

// Create ExtractFormDataToDsvOptions object with ',' as the delimiter
var options = new ExtractFormDataToDsvOptions(',', true);
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// Set output file path
options.AddOutput(new FileDataSource("path_to_result_csv_file.csv"));
// Perform the process
PdfExtractor.Extract(options);
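If you need the exported values back in code, the CSV file can be read with the .NET base library. The sketch below is a minimal example under two assumptions: the `true` argument adds a first line of field names (verify this against the ExtractFormDataToDsvOptions reference), and no value is quoted or contains the delimiter. The helper name `ReadDsv` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of Documentize): reads a delimiter-separated file
// whose first line holds field names. Naive: it does NOT handle quoted fields.
List<Dictionary<string, string>> ReadDsv(string path, char delimiter)
{
    var lines = File.ReadAllLines(path);
    var rows = new List<Dictionary<string, string>>();
    if (lines.Length == 0)
        return rows;
    var header = lines[0].Split(delimiter);
    for (int i = 1; i < lines.Length; i++)
    {
        // Skip blank lines, e.g. trailing ones
        if (string.IsNullOrWhiteSpace(lines[i]))
            continue;
        var values = lines[i].Split(delimiter);
        var row = new Dictionary<string, string>();
        // Missing trailing values are stored as empty strings
        for (int c = 0; c < header.Length; c++)
            row[header[c]] = c < values.Length ? values[c] : "";
        rows.Add(row);
    }
    return rows;
}
```

For real form data, where values may contain commas or quotes, a dedicated CSV parser is a safer choice than splitting on the delimiter.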

Exceptions

ArgumentException

Thrown if options is not set.

Namespace: Documentize
Assembly: Documentize.dll