97 changes: 43 additions & 54 deletions Document-Processing-toc.html
@@ -217,6 +217,49 @@
</li>
</ul>
</li>
<li>
<a href="/document-processing/data-extraction/OCR/overview">OCR Processor</a>
<ul>
<li>
<a href="/document-processing/data-extraction/OCR/net/overview">NET</a>
<ul>
<li>
<a href="/document-processing/data-extraction/OCR/net/overview">Overview</a>
</li>
<li>
<a href="/document-processing/data-extraction/OCR/net/Assemblies-Required">Assemblies Required</a>
</li>
<li>
<a href="/document-processing/data-extraction/OCR/net/NuGet-Packages-Required">NuGet Packages Required</a>
</li>
<li>
<a href="/document-processing/data-extraction/OCR/net/Getting-started-overview">Getting Started</a>
<ul>
<li><a href="/document-processing/data-extraction/OCR/net/Windows-Forms">Windows Forms</a></li>
<li><a href="/document-processing/data-extraction/OCR/net/WPF">WPF</a></li>
<li><a href="/document-processing/data-extraction/OCR/net/aspnet-mvc">ASP.NET MVC</a></li>
<li><a href="/document-processing/data-extraction/OCR/net/net-core">ASP.NET Core</a></li>
<li><a href="/document-processing/data-extraction/OCR/net/blazor">Blazor</a></li>
<li><a href="/document-processing/data-extraction/OCR/net/Docker">Docker</a></li>
<li><a href="/document-processing/data-extraction/OCR/net/azure">Azure</a></li>
<li><a href="/document-processing/data-extraction/OCR/net/Azure-Vision">Azure Vision</a></li>
<li><a href="/document-processing/data-extraction/OCR/net/Azure-Kubernetes-Service">Azure Kubernetes Service</a></li>
<li><a href="/document-processing/data-extraction/OCR/net/AWS-Textract">AWS Textract</a></li>
<li><a href="/document-processing/data-extraction/OCR/net/Linux">Linux</a></li>
<li><a href="/document-processing/data-extraction/OCR/net/Amazon-Linux-EC2-Setup-Guide">Amazon Linux EC2</a></li>
<li><a href="/document-processing/data-extraction/OCR/net/mac">Mac</a></li>
</ul>
</li>
<li>
<a href="/document-processing/data-extraction/OCR/net/Features">Features</a>
</li>
<li>
<a href="/document-processing/data-extraction/OCR/net/Troubleshooting">Troubleshooting</a>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li>
@@ -2931,60 +2974,6 @@
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-Document-Conversions">Working with Document Conversions</a>
</li>
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-OCR/Working-with-OCR">Working with OCR</a>
<ul>
<li>Getting Started
<ul>
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-OCR/Windows-Forms">Windows Forms</a>
</li>
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-OCR/WPF">WPF</a>
</li>
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-OCR/aspnet-mvc">ASP.NET MVC</a>
</li>
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-OCR/net-core">ASP.NET Core</a>
</li>
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-OCR/blazor">Blazor</a>
</li>
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-OCR/Docker">Docker</a>
</li>
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-OCR/azure">Azure</a>
</li>
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-OCR/Azure-Vision">Azure Vision</a>
</li>
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-OCR/Azure-Kubernetes-Service">Azure Kubernetes Service</a>
</li>
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-OCR/AWS-Textract">AWS Textract</a>
</li>
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-OCR/Linux">Linux</a>
</li>
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-OCR/Amazon-Linux-EC2-Setup-Guide">Amazon Linux EC2</a>
</li>
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-OCR/mac">Mac</a>
</li>
</ul>
</li>
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-OCR/Features">Features</a>
</li>
<li>
<a href="/document-processing/pdf/pdf-library/net/Working-with-OCR/Troubleshooting">Troubleshooting</a>
</li>
</ul>
</li>
<li>
<a href="/document-processing/pdf/pdf-library/net/working-with-hyperlinks">Working with Hyperlinks</a>
</li>
65 changes: 65 additions & 0 deletions Document-Processing/Data-Extraction/OCR/NET/Assemblies-Required.md
@@ -0,0 +1,65 @@
---
title: Assemblies Required for OCR | Syncfusion
description: This section describes the required Syncfusion assemblies needed to integrate and use the OCR Processor effectively in your applications
platform: document-processing
control: PDF
documentation: UG
keywords: Assemblies
---
# Assemblies Required to work with OCR processor

Download and install the OCR library installer for your platform (Windows, Linux, or Mac) to get the following required assemblies. Refer to the advanced installation steps for more details.

#### Syncfusion<sup>&reg;</sup> assemblies

<table>
<tr>
<thead>
<th><b>Platform(s)</b></th>
<th><b>Assemblies</b></th>
</thead>
</tr>
<tr>
<td>
Windows Forms, WPF, ASP.NET, and ASP.NET MVC
</td>
<td>
<ul>
<li>Syncfusion.OCRProcessor.Base.dll</li>
<li>Syncfusion.Pdf.Base.dll</li>
<li>Syncfusion.Compression.Base.dll</li>
<li>Syncfusion.ImagePreProcessor.Base.dll</li>
</ul>
</td>
</tr>
<tr>
<td>
.NET Standard 2.0
</td>
<td>
<ul>
<li>Syncfusion.OCRProcessor.Portable.dll</li>
<li>Syncfusion.PdfImaging.Portable.dll</li>
<li>Syncfusion.Pdf.Portable.dll</li>
<li>Syncfusion.Compression.Portable.dll</li>
<li>{{'[SkiaSharp](https://www.nuget.org/packages/SkiaSharp/3.119.1)'| markdownify }} package</li>
<li>Syncfusion.ImagePreProcessor.Portable.dll</li>
</ul>
</td>
</tr>
<tr>
<td>
.NET 8/.NET 9/.NET 10
</td>
<td>
<ul>
<li>Syncfusion.OCRProcessor.NET.dll</li>
<li>Syncfusion.PdfImaging.NET.dll</li>
<li>Syncfusion.Pdf.NET.dll</li>
<li>Syncfusion.Compression.NET.dll</li>
<li>{{'[SkiaSharp](https://www.nuget.org/packages/SkiaSharp/3.119.1)'| markdownify }} package</li>
<li>Syncfusion.ImagePreProcessor.NET.dll</li>
</ul>
</td>
</tr>
</table>
@@ -1,173 +1,14 @@
---
title: Perform OCR on PDF features | Syncfusion
description: Learn how to perform OCR on scanned PDF documents and images with different tesseract versions using Syncfusion .NET OCR library.
title: Getting started with OCR processor | Syncfusion
description: This section provides an introduction to getting started with the OCR processor and explains the basic concepts and workflow involved
platform: document-processing
control: PDF
documentation: UG
keywords: Assemblies
---
# Getting started with OCR processor

# Working with Optical Character Recognition (OCR)

Optical character recognition (OCR) is a technology used to convert scanned paper documents in the form of PDF files or images into searchable and editable data.

The [Syncfusion<sup>&reg;</sup> OCR processor library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) performs OCR on scanned PDF documents and images with the help of Google’s [Tesseract](https://github.com/tesseract-ocr/tesseract) Optical Character Recognition engine.

An inbuilt `image preprocessor` has been added to the OCR to prepare images for optimal recognition. This step ensures cleaner input and reduces OCR errors. The preprocessor supports the following enhancements:

* **Convert to Grayscale** – Simplifies image data by removing color information, making text easier to detect.
* **Deskew** – Corrects tilted or rotated text for proper alignment.
* **Denoise** – Removes speckles and artifacts that can interfere with character recognition.
* **Apply Contrast Adjustment** – Enhances text visibility against the background.
* **Apply Binarize** – Converts images to black-and-white for sharper text edges, using advanced thresholding methods.

The Syncfusion<sup>&reg;</sup> OCR processor library works on various platforms: Azure App Services, Azure Functions, AWS Textract, Docker, WinForms, WPF, Blazor, ASP.NET MVC, and ASP.NET Core on Windows, macOS, and Linux.

N> Starting with v20.1.0.x, if you reference Syncfusion<sup>&reg;</sup> OCR processor assemblies from the trial setup or the NuGet feed, you also have to include a license key in your projects. Please refer to this [link](https://help.syncfusion.com/common/essential-studio/licensing/overview) to learn more about registering the Syncfusion<sup>&reg;</sup> license key in your application to use its components.

## Key features

* Create a searchable PDF from scanned PDF.
* Zonal text extraction from the scanned PDF.
* Preserve Unicode characters.
* Extract text from the image.
* Create a searchable PDF from large scanned PDF documents.
* Create a searchable PDF from rotated scanned PDF.
* Get OCRed text and its bounds from a scanned PDF document.
* Native call.
* Customizing the temp folder.
* Performing OCR with different Page Segmentation Mode.
* Performing OCR with different OCR Engine Mode.
* White List.
* Black List.
* Image into searchable PDF or PDF/A.
* Improved accessibility.
* Post-processing.
* Compatible with .NET Framework 4.5 and above.
* Compatible with .NET Core 2.0 and above.

## Install .NET OCR library

Include the OCR library in your project using either of the following two approaches.

* NuGet Package Required (Recommended)
* Assemblies Required

N> Starting with v21.1.x, the package structure has changed for the Syncfusion<sup>&reg;</sup> OCR processor library referenced from the NuGet feed. The TesseractBinaries and Tesseract language data paths are now added automatically, so you do not need to add them manually.

### NuGet Package Required (Recommended)

Directly install the NuGet package to your application from [nuget.org](https://www.nuget.org/).

<table>
<tr>
<thead>
<th><b>Platform(s)</b></th>
<th><b>NuGet Package</b></th>
</thead>
</tr>
<tr>
<td>
Windows Forms<br/>
Console Application (Targeting .NET Framework)
</td>
<td>
{{'[Syncfusion.Pdf.OCR.WinForms.nupkg](https://www.nuget.org/packages/Syncfusion.Pdf.OCR.WinForms)'| markdownify }}
</td>
</tr>
<tr>
<td>
WPF
</td>
<td>
{{'[Syncfusion.Pdf.OCR.Wpf.nupkg](https://www.nuget.org/packages/Syncfusion.Pdf.OCR.Wpf)'| markdownify }}
</td>
</tr>
<tr>
<td>
ASP.NET
</td>
<td>
{{'[Syncfusion.Pdf.OCR.AspNet.nupkg](https://www.nuget.org/packages/Syncfusion.Pdf.OCR.AspNet)'| markdownify }}
</td>
</tr>
<tr>
<td>
ASP.NET MVC5
</td>
<td>
{{'[Syncfusion.Pdf.OCR.AspNet.Mvc5.nupkg](https://www.nuget.org/packages/Syncfusion.Pdf.OCR.AspNet.Mvc5)'| markdownify }}
</td>
</tr>
<tr>
<td>
ASP.NET Core (Targeting .NET Core) <br/>
Console Application (Targeting .NET Core) <br/>
Blazor
</td>
<td>
{{'[Syncfusion.PDF.OCR.Net.Core](https://www.nuget.org/packages/Syncfusion.PDF.OCR.Net.Core)'| markdownify }}
</td>
</tr>
</table>

### Assemblies Required

Download and install the OCR library installer for your platform (Windows, Linux, or Mac) to get the following required assemblies. Refer to the advanced installation steps for more details.

#### Syncfusion<sup>&reg;</sup> assemblies

<table>
<tr>
<thead>
<th><b>Platform(s)</b></th>
<th><b>Assemblies</b></th>
</thead>
</tr>
<tr>
<td>
Windows Forms, WPF, ASP.NET, and ASP.NET MVC
</td>
<td>
<ul>
<li>Syncfusion.OCRProcessor.Base.dll</li>
<li>Syncfusion.Pdf.Base.dll</li>
<li>Syncfusion.Compression.Base.dll</li>
<li>Syncfusion.ImagePreProcessor.Base.dll</li>
</ul>
</td>
</tr>
<tr>
<td>
.NET Standard 2.0
</td>
<td>
<ul>
<li>Syncfusion.OCRProcessor.Portable.dll</li>
<li>Syncfusion.PdfImaging.Portable.dll</li>
<li>Syncfusion.Pdf.Portable.dll</li>
<li>Syncfusion.Compression.Portable.dll</li>
<li>{{'[SkiaSharp](https://www.nuget.org/packages/SkiaSharp/3.119.1)'| markdownify }} package</li>
<li>Syncfusion.ImagePreProcessor.Portable.dll</li>
</ul>
</td>
</tr>
<tr>
<td>
.NET 8/.NET 9/.NET 10
</td>
<td>
<ul>
<li>Syncfusion.OCRProcessor.NET.dll</li>
<li>Syncfusion.PdfImaging.NET.dll</li>
<li>Syncfusion.Pdf.NET.dll</li>
<li>Syncfusion.Compression.NET.dll</li>
<li>{{'[SkiaSharp](https://www.nuget.org/packages/SkiaSharp/3.119.1)'| markdownify }} package</li>
<li>Syncfusion.ImagePreProcessor.NET.dll</li>
</ul>
</td>
</tr>
</table>
To quickly get started with extracting text from scanned PDF documents in .NET using the Syncfusion<sup>&reg;</sup> OCR processor Library, refer to this video tutorial:
{% youtube "https://www.youtube.com/watch?v=VhN7ETn0vyA" %}

## Prerequisites

@@ -247,11 +88,6 @@ processor.PerformOCR(lDoc);

{% endhighlight %}

## Get Started with OCR

To quickly get started with extracting text from scanned PDF documents in .NET using the Syncfusion<sup>&reg;</sup> OCR processor Library, refer to this video tutorial:
{% youtube "https://www.youtube.com/watch?v=VhN7ETn0vyA" %}

### Perform OCR using C#

Integrating the OCR processor library in any .NET application is simple. Please refer to the following steps to perform OCR in your .NET application.
@@ -354,5 +190,4 @@ Refer to [this](https://help.syncfusion.com/document-processing/pdf/pdf-library/

## Troubleshooting

Refer to [this](https://help.syncfusion.com/document-processing/pdf/pdf-library/net/working-with-ocr/troubleshooting) section for troubleshooting PDF OCR failures.

Refer to [this](https://help.syncfusion.com/document-processing/pdf/pdf-library/net/working-with-ocr/troubleshooting) section for troubleshooting PDF OCR failures.
@@ -7,7 +7,7 @@ documentation: UG
keywords: Assemblies
---

# Perform OCR in Mac
# Perform OCR on macOS

The [Syncfusion<sup>&reg;</sup> .NET OCR library](https://www.syncfusion.com/document-processing/pdf-framework/net/pdf-library/ocr-process) is used to extract text from scanned PDFs and images in a Mac application.
