Since I’m in the middle of my doctoral studies, I read A LOT of journal articles. Most of these articles are in PDF file format and I use Skim to read and annotate them. However, every so often I can only obtain PDFs that are images. These PDFs can’t be searched or annotated, and for my workflow this is a no-go. I know of programs that will automatically OCR (optical character recognition) documents, like DEVONthink Pro Office and PDFpen, but 1) I’m on a grad school budget and 2) I like the challenge of figuring out ways to configure and promote technology using open source resources. So this past weekend I decided that I wanted to OCR some image-only PDFs into searchable PDFs that could also be annotated correctly. The following is the fruit of my journey so far. I don’t know if this journey is over, but I can tell you my OCR process works well enough for now.

DISCLAIMER: Attempting the process outlined below may cause problems with the operation of your computer or cause you to lose data. I HIGHLY recommend backing up your system before you do anything like what’s described below. Consider yourself warned: you are attempting this at your own risk.

A lot of credit goes to mchristy at the Early Modern OCR Project; as you’ll notice, many, but not all, of the steps are the same as those he outlined for OS X 10.8.

This is written for those who have never (or barely) used the Terminal app on OS X and are new to Tesseract and OCR.

I was able to get the Tesseract 3.03 release candidate to build from source on OS X 10.9.4, and it is working with some warnings (detailed below). Everything built well and without errors (note: I did have warnings, but no errors).