Re: DEFE3 Files
There is one thing that could be done relatively easily. I tried to download those huge files; my PC choked on the size and my browser crawled through the pages, only for me to find that it was the wrong file.
What would make life easier is to break those PDFs down into individual images and put them on a server. If the images are kept in order, then picking one at random would let you narrow down to the set of images of interest.
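To illustrate the narrowing-down idea, here is a minimal sketch, assuming the page images are numbered in date order. The function name and the `date_of_page` callback are hypothetical: in practice `date_of_page` stands for a person opening image number N and reading the date off it. Instead of picking purely at random, it always opens the middle image of the remaining range, which is the same idea done systematically.

```python
from typing import Callable


def first_page_on_or_after(target: int, page_count: int,
                           date_of_page: Callable[[int], int]) -> int:
    """Return the index of the first page dated `target` or later.

    Assumes pages are sorted by date. Returns `page_count` if every
    page is earlier than `target`.
    """
    lo, hi = 0, page_count
    while lo < hi:
        mid = (lo + hi) // 2
        if date_of_page(mid) < target:
            lo = mid + 1   # page is too early, look in the later half
        else:
            hi = mid       # page is late enough, look in the earlier half
    return lo


# Made-up example: the year found on each of six page images.
page_years = [1960, 1960, 1961, 1963, 1963, 1965]
start = first_page_on_or_after(1963, len(page_years), page_years.__getitem__)
```

Each step halves the range, so even a file of a few thousand pages needs only about a dozen images opened, each a small download rather than the whole PDF.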
Of course, users could then tag such images, adding a date or basic content info, but I have little faith in people.
As for OCR, I have worked on such half-readable documents with OCR, and it always helps, even if it only narrows things down to a file of a few hundred pages. Besides, OCR tools keep improving, so within a few years they may be capable of faithfully reading all the files.