Comparison of optical character recognition software
This comparison of optical character recognition software includes:
- OCR engines, which perform the actual character identification
- Layout analysis software, which divides scanned documents into zones suitable for OCR
- Graphical interfaces to one or more OCR engines
- Software development kits that are used to add OCR capabilities to other software
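As an illustration of what the layout analysis step does, the following is a minimal sketch of one simple zoning technique, a horizontal projection profile, which splits a binarized page into bands of rows that contain ink. It is not the method used by any product listed here; the function name `split_into_zones`, the `min_gap` parameter, and the list-of-lists bitmap representation are assumptions made for the example.

```python
from typing import List, Tuple


def split_into_zones(image: List[List[int]], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each horizontal band containing ink.

    `image` is a binary bitmap (1 = ink, 0 = background). Bands separated by
    at least `min_gap` blank rows are reported as separate zones.
    """
    zones = []
    start = None    # first row of the zone currently being built
    blank_run = 0   # consecutive blank rows seen since the last ink row
    for y, row in enumerate(image):
        if any(row):
            if start is None:
                start = y
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # Close the zone at the row just after the last ink row.
                zones.append((start, y - blank_run + 1))
                start = None
                blank_run = 0
    if start is not None:
        # The page ended while a zone was still open.
        zones.append((start, len(image) - blank_run))
    return zones


# A hypothetical 7-row page with two text lines separated by two blank rows.
page = [
    [0, 0],
    [1, 0],
    [1, 1],
    [0, 0],
    [0, 0],
    [0, 1],
    [0, 0],
]
zones = split_into_zones(page)
```

Each resulting zone could then be cropped and passed to an OCR engine separately. Real layout analysis also handles columns, tables, and images, which this sketch ignores.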
Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Programming language | SDK? | Languages | Fonts | Output Formats | Notes |
Google Drive OCR or Google Cloud Vision | 2015 | Yes | Browser | Browser | Browser | Unknown | Unknown | Yes | 200+ | All fonts | text | Google blog post | |||
Tesseract | 1985 | 4.1.1 | 2019 | C++, C | 100+ | Any printed font | Text, ALTO, hOCR, PDF, others with different user interfaces or the API | Created by Hewlett-Packard; under further development by Google | |||||||
ABBYY FineReader | 1989 | 15 | 2019 | C/C++ | 192 | All fonts | DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2 | ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac. | |||||||
E-aksharayan | 2010 | 14 | RTF, TXT, BRL | ||||||||||||
Asprise OCR SDK | 1998 | 15 | 2015 | Java, C#, VB.NET, C/C++/Delphi | 20+ | Plain text, searchable PDF, XML | Java, C#, VB.NET, C/C++/Delphi SDKs for OCR and Barcode recognition on Windows, Linux, Mac OS X and Unix. | ||||||||
AnyDoc Software | 1989 | VBScript | Works with structured, semi-structured, and unstructured documents. | ||||||||||||
CuneiForm | 1996 | 1.1 | 2011-04-19 | C/C++ | 28 | Any printed font | HTML, hOCR, native, RTF, TeX, TXT | Enterprise-class system, can save text formatting and recognizes complicated tables of any structure | |||||||
Dynamsoft OCR SDK | 2003 | 8.2 | 2012 | C/C++ | 40+ | PDF, TXT | |||||||||
OmniPage | 1970s | 19.2 | 2015 | C/C++, C# | 125 | Machine and handprinted fonts | DOC/DOCX, XLS/XLSX, PPTX, RTF, PDF, PDF/A, Searchable PDF, HTML, Text, XML, ePUB, MP3 | Product of Nuance Communications | |||||||
Microsoft Office OneNote 2007 | 2011 | 2007 | |||||||||||||
GOCR | 2000 | 0.52 | 2018-10-15 | C | 20+ | ||||||||||
Ocrad | 0.26 | 2017-03-31 | C++ | Latin alphabet | Command line | ||||||||||
SmartScore | 1991 | 10.5.8 | 2015-07 | For musical scores | |||||||||||
Microsoft Office Document Imaging | Office 2007 | 2007 | Uses OmniPage | ||||||||||||
Puma.NET | 2009-10-29 | C# | 28 | Any printed font | .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps the Puma COM server and provides a simplified API for .NET applications | ||||||||||
ReadSoft | 14 | Scan, capture and classify business documents such as invoices, forms and purchase orders integrated with business processes. | |||||||||||||
Scantron | For working with localized interfaces, corresponding language support is required. | ||||||||||||||
OCRFeeder | 2009-03 | 0.8.1 | 2014-12-22 | Python | Features a full user interface and has a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines like Tesseract or Ocrad | ||||||||||
OCRopus | 2007 | 1.3.3 | 2017-12-16 | Python | All languages using Latin script | Normal Latin script and Fraktur | TXT, hOCR, PDF | Pluggable framework under active development, used for Google Books | |||||||
OCRvision | 2019 | 90+ | Searchable PDF | ||||||||||||