Thread: DEFE3 Files
  #9  
20th July 2018, 18:52
Franek Grabowski
Alter Hase
 
Join Date: Dec 2004
Location: Warsaw, Poland
Posts: 2,352
Re: DEFE3 Files

There is one thing that could be done relatively easily. I tried to download those huge files; my PC choked on their size and my browser crawled through the pages, only for me to find that it was the wrong file.
What would make life easier is to break those PDFs down into individual page images and put them on some server. If the images are kept in order, then opening a random one shows roughly where you are in the file, which lets you narrow down to the set of images of interest.
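To illustrate the narrowing-down idea, here is a minimal Python sketch. It assumes the page images are in chronological order and that a date can be read off any single image; both are assumptions, and the function names (`find_first_page_on_or_after`, `date_of_page`) are made up for the example. Instead of picking images purely at random, it always opens the middle of the remaining range, which is just a systematic version of the same idea.

```python
from datetime import date, timedelta
from typing import Callable


def find_first_page_on_or_after(
    page_count: int,
    target: date,
    date_of_page: Callable[[int], date],
) -> int:
    """Return the index of the first page dated on or after `target`.

    Assumes pages are sorted by date (an assumption about the files).
    `date_of_page` stands in for a person opening one image and reading
    its date. Returns `page_count` if every page is older than `target`.
    """
    lo, hi = 0, page_count
    while lo < hi:
        mid = (lo + hi) // 2
        if date_of_page(mid) < target:
            lo = mid + 1  # target lies in the later half
        else:
            hi = mid  # target lies at mid or earlier
    return lo


# Made-up demo data: 1000 pages, ten pages per day from 1 January 1944.
demo_dates = [date(1944, 1, 1) + timedelta(days=i // 10) for i in range(1000)]
opened_pages = []  # records which images had to be opened


def demo_date_of_page(index: int) -> date:
    opened_pages.append(index)
    return demo_dates[index]


first_hit = find_first_page_on_or_after(
    len(demo_dates), date(1944, 2, 1), demo_date_of_page
)
```

With a thousand images in order, about ten need to be opened to land on the right spot, instead of downloading and paging through the whole PDF.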
Of course, users could then tag such images, adding a date or basic information on the contents, but I have little faith in people.
As for OCR, I have worked with it on such half-readable documents, and it always helps, even if it only narrows the search down to a file of a few hundred pages. Besides, OCR tools keep improving, so within a few years they may be capable of faithfully reading all the files.