Class PdfExtractor

Info

Represents the Documentize.PdfExtractor plugin. Used to extract text, images, and form data from PDF documents.

public static class PdfExtractor

Inheritance

object → PdfExtractor

Inherited Members

Examples

This example demonstrates how to extract the text content of a PDF document.

// Create ExtractTextOptions object to set instructions
var options = new ExtractTextOptions();
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// Perform the process
var results = PdfExtractor.ExtractText(options);
// Get the extracted text from the ResultContainer object
var textExtracted = results.ResultCollection[0].ToString();

This example demonstrates how to extract the text content of a PDF document with a specific TextFormattingMode.

// Create ExtractTextOptions object to set TextFormattingMode
var options = new ExtractTextOptions(TextFormattingMode.Pure);
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// Perform the process
var results = PdfExtractor.ExtractText(options);
// Get the extracted text from the ResultContainer object
var textExtracted = results.ResultCollection[0].ToString();

This example demonstrates how to extract images from a PDF document.

// Create ExtractImagesOptions to set instructions
var options = new ExtractImagesOptions();
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// Set output directory path
options.AddOutput(new DirectoryDataSource("path_to_results_directory"));
// Perform the process
var results = PdfExtractor.ExtractImages(options);
// Get path to image result
var imageExtracted = results.ResultCollection[0].ToFile();

This example demonstrates how to extract images from a PDF document to streams, without an output folder.

// Create ExtractImagesOptions to set instructions
var options = new ExtractImagesOptions();
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// No output is set, so the results are written to streams
// Perform the process
var results = PdfExtractor.ExtractImages(options);
// Get the first result as a stream
var ms = results.ResultCollection[0].ToStream();
// Copy the data to a file for demonstration
ms.Seek(0, SeekOrigin.Begin);
using (var fs = File.Create("test_file.png"))
{
    ms.CopyTo(fs);
}

This example demonstrates how to export form values to a CSV file.

// Create ExtractFormDataToDsvOptions object to set instructions
var options = new ExtractFormDataToDsvOptions(',', true);
// Add input file path
options.AddInput(new FileDataSource("path_to_your_pdf_file.pdf"));
// Set output file path
options.AddOutput(new FileDataSource("path_to_result_csv_file.csv"));
// Perform the process
PdfExtractor.ExtractFormData(options);
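
As a quick check on the exported file, the sketch below reads delimiter-separated rows back using only the .NET standard library. DsvReader and ReadRows are hypothetical names introduced for illustration and are not part of Documentize. The naive split assumes that field values contain no quoted or embedded delimiters.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of Documentize.
public static class DsvReader
{
    // Reads every non-empty line from the reader and splits it on the delimiter.
    // Assumption: values contain no quoted or embedded delimiters.
    public static List<string[]> ReadRows(TextReader reader, char delimiter)
    {
        var rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
            {
                continue; // skip blank lines
            }
            rows.Add(line.Split(delimiter));
        }
        return rows;
    }
}
```

To inspect the file written by the example above, open path_to_result_csv_file.csv with a StreamReader inside a using statement and pass it to DsvReader.ReadRows together with ','. Each returned array holds the fields of one line.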

Methods

ExtractFormData(ExtractFormDataToDsvOptions)

Extracts form data from a PDF document.

public static ResultContainer ExtractFormData(ExtractFormDataToDsvOptions options)

Parameters

options ExtractFormDataToDsvOptions : An options object containing instructions for the operation.

Returns

ResultContainer : An object containing the result of the operation.

Exceptions

ArgumentException

Thrown if the options are not set.

ExtractImages(ExtractImagesOptions)

Extracts images from a PDF document.

public static ResultContainer ExtractImages(ExtractImagesOptions options)

Parameters

options ExtractImagesOptions : An options object containing instructions for the operation.

Returns

ResultContainer : An object containing the result of the operation.

Exceptions

ArgumentException

Thrown if the options are not set.

ExtractText(ExtractTextOptions)

Extracts text from a PDF document.

public static ResultContainer ExtractText(ExtractTextOptions options)

Parameters

options ExtractTextOptions : An options object containing instructions for the operation.

Returns

ResultContainer : An object containing the result of the extraction.

Exceptions

ArgumentException

Thrown if the options are not set.
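
All three methods above are documented as throwing ArgumentException when the options are not set. The following is a minimal sketch of one way to guard such a call. SafeExtraction and TryRun are hypothetical names introduced for illustration and are not part of Documentize. The helper takes any delegate that returns the extracted text, so it does not depend on the library itself.

```csharp
using System;

// Hypothetical helper for illustration; not part of Documentize.
public static class SafeExtraction
{
    // Runs the supplied extraction delegate.
    // Returns true and the extracted text on success.
    // Returns false and the exception message if the delegate throws
    // ArgumentException (or a type derived from it).
    public static bool TryRun(Func<string> extract, out string text, out string error)
    {
        try
        {
            text = extract();
            error = null;
            return true;
        }
        catch (ArgumentException ex)
        {
            text = null;
            error = ex.Message;
            return false;
        }
    }
}
```

For instance, the text extraction example could be wrapped as SafeExtraction.TryRun(() => PdfExtractor.ExtractText(options).ResultCollection[0].ToString(), out var text, out var error). Other exception types are left to propagate, since only ArgumentException is documented here.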

Namespace: Documentize
Assembly: Documentize.dll