4

What is recommended for OCR in Fedora?

asked 2015-04-30 21:28:50 -0500

mh4openfield

updated 2016-03-07 22:26:35 -0500

I'm trying to use something like ABBYY FineReader. Has anyone run it successfully under WINE? Also, is there a recommended way to search the text of documents within a folder? I assume Fedora does not keep an index of the words in each file the way Microsoft does.


Comments

You want to search within the files of the system?

ervinonfedora ( 2015-05-01 05:35:17 -0500 )

Correct, tracker does not index words in files (which would be nice). It tracks only file names.

florian ( 2015-05-01 08:35:24 -0500 )

4 Answers

4

answered 2015-05-01 08:39:32 -0500

florian

updated 2016-03-09 08:56:48 -0500

I have never used it, but I know that there is gImageReader, a GUI for tesseract-ocr. It can easily be installed from the Software Center. Have you tried it?

From a terminal, run sudo dnf install gimagereader-gtk (it will automatically install tesseract as a dependency).
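
Once tesseract is installed, it can also be driven straight from a shell, which covers the "search the text of documents within a folder" part of the question for scanned images. The following is only a rough sketch, assuming the scans are PNG files; the function names ocr_folder and search_folder are made up for illustration, and the folder and search word are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: OCR every PNG in a folder into a sibling .txt file,
# then search the extracted text. Assumes tesseract is installed.

ocr_folder() {
    local dir="$1"
    local img
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue   # nothing to do if no PNG matched
        # tesseract adds the ".txt" extension to the output base name itself
        tesseract "$img" "${img%.png}"
    done
}

search_folder() {
    # Print the .txt files under folder $1 that contain word $2 (case-insensitive)
    grep -ril --include='*.txt' -- "$2" "$1"
}
```

For example, ocr_folder ~/Documents/scans followed by search_folder ~/Documents/scans invoice would list the text files whose scan mentions "invoice". Recognition quality depends heavily on the scan, so the search is only as good as the OCR output.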


Comments

Good call. I've been looking into OCR lately, and many, many programs for both Windows and Fedora are frontends for tesseract.

randomuser ( 2015-05-02 01:46:25 -0500 )
2

answered 2016-03-08 00:52:50 -0500

davidva

gscan2pdf...

su
dnf -y install gscan2pdf tesseract
1

answered 2015-05-01 09:17:02 -0500

markito3

There is a package called "recoll" in the Fedora repository. It keeps an index of all documents in selected directories, including the words within those documents. It handles PDF by default and can index MS Word documents once the necessary helper packages are installed. It has a lot of configuration options, but the vanilla installation is still very useful. The documentation gives guidance for running index generation as a cron job. Indexing is incremental, so it doesn't have to rescan the whole directory tree on each run.

The project homepage is at http://www.lesbonscomptes.com/recoll/ .
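
As a minimal sketch of the cron setup mentioned above: recoll ships a command-line indexer, recollindex, which can be run by hand or from cron. The 03:00 schedule below is an arbitrary choice, not something taken from the recoll documentation, and the default per-user configuration is assumed.

```shell
# Build or update the index once by hand
# (uses the default per-user configuration in ~/.recoll):
recollindex

# To refresh it nightly, add a line like this with `crontab -e`:
# 0 3 * * * recollindex
```

Because indexing is incremental, the nightly run should only touch files that changed since the previous pass.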


Comments

1

Appreciate the info, will post back when I have results.

mh4openfield ( 2015-05-01 16:00:39 -0500 )
1

Appreciate the info too. Nice for the document folder.

florian ( 2015-05-01 17:41:36 -0500 )
0

answered 2015-05-01 05:39:11 -0500

ervinonfedora

If you want to use Windows programs, I suggest you install PlayOnLinux. There is a better chance they will work that way than with plain Wine. Also, if the files on the system are not indexed, I doubt any program can search inside them. They have to be indexed first.


Comments

This doesn't really say anything about OCR. Are there any specific applications from PlayOnLinux that you would recommend for extracting text from scanned images?

randomuser ( 2016-03-07 22:29:17 -0500 )

