Need a document management system (requirements outlined below); a trial or personal software recommendation would be best

I have a PDF with 10,000 pages, one page per customer bill. I occasionally search for bills by the account number on the PDF, and it takes a long time. Is there a personal document management system that can import the 10,000 pages, save each page as its own individual document, and then automatically tag each document with the account number that is on the page?

I.e., I search the document management system's interface by account number, and it quickly retrieves a single page of the original 10,000-page document.

David Johnson, CD, MVP (Owner) commented:
What a pretty mess you've gotten yourself into. Are these standard A4 pages for each customer?
My thinking on the workflow: print each page to a separate PDF, OCR it, rename the file to the account number, and save it as DOCX or plain text. Repeat 10,000 times, and you have something that can be loaded into a database or CMS.
Joe Winograd, Fellow & MVE (Developer) commented:
Hi James,
As I mentioned at your other question, I have written many custom programs in the PDF space. I could leverage my existing code to split your 10,000-page PDF into 10,000 one-page PDFs, then rename each one with the account number and invoice number. Presumably, the combination of those two fields is unique, but if not, the program would add an integer suffix until the file name is unique. Searching for a file by account number and/or invoice number should be fast with any decent file manager, even Windows/File Explorer. I can also provide a simple GUI front-end where you can search by account number and/or invoice number, although I think that using a standard file manager should suffice, providing fast access to a specific bill.
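For readers who want to try the split-and-rename approach themselves, here is a minimal Python sketch. It is not Joe's program; it only illustrates the idea under several assumptions: the PDF has a text layer, the third-party pypdf library is installed, and each bill labels the field roughly as "Account Number: 12345678". The regular expression and the helper names (`extract_account_number`, `unique_name`, `split_by_account`) are hypothetical and would need adjusting to the real bills. For brevity it names files by account number only, not by account number plus invoice number.

```python
import re
from pathlib import Path

# Assumed label format, e.g. "Account Number: 12345678"; adjust to the real bills.
ACCOUNT_RE = re.compile(r"Account\s*(?:Number|No\.?|#)?\s*:?\s*(\d+)", re.IGNORECASE)


def extract_account_number(page_text):
    """Return the first account number found in the page text, or None."""
    match = ACCOUNT_RE.search(page_text)
    return match.group(1) if match else None


def unique_name(stem, taken):
    """Return 'stem.pdf', adding an integer suffix until the name is unused."""
    name = f"{stem}.pdf"
    suffix = 1
    while name in taken:
        suffix += 1
        name = f"{stem}_{suffix}.pdf"
    taken.add(name)
    return name


def split_by_account(source_pdf, out_dir):
    """Write each page of source_pdf to its own PDF named by account number."""
    from pypdf import PdfReader, PdfWriter  # third-party: pip install pypdf

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    taken = set()
    reader = PdfReader(source_pdf)
    for index, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        account = extract_account_number(text)
        # Pages with no recognizable account number are kept, not dropped.
        stem = account if account else f"unmatched_page_{index:05d}"
        writer = PdfWriter()
        writer.add_page(page)
        with open(out / unique_name(stem, taken), "wb") as handle:
            writer.write(handle)
```

Calling `split_by_account("bills.pdf", "bills_by_account")` (both names are placeholders) would fill the output folder with files such as `12345678.pdf` and `12345678_2.pdf`, which File Explorer can then find by typing the account number.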

Two questions for you: (1) Are these historical bills, i.e., does the 10,000-page PDF never change? Or are new bills always coming in, meaning the 10,000-page PDF is always changing? (2) Do the PDFs have text in them (e.g., PDF Normal or PDF Searchable) or are they image-only PDFs? If the latter, then OCR will have to be performed in order to extract the account number and invoice number (doable, but makes it a more difficult project).
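A quick way to answer question (2) before committing to either route is to check how much text can be extracted from each page. The sketch below is only a heuristic, and its names (`needs_ocr`, `pages_needing_ocr`) and the 20-character threshold are assumptions for illustration: a page that yields almost no text is probably image-only and would need OCR.

```python
def needs_ocr(page_text, min_chars=20):
    """Guess whether a page is image-only from its extracted text.

    min_chars is an arbitrary threshold, not a standard value.
    """
    visible = "".join((page_text or "").split())  # drop all whitespace
    return len(visible) < min_chars


def pages_needing_ocr(page_texts):
    """Return 1-based page numbers whose extracted text looks empty."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if needs_ocr(text)
    ]
```

The list of page texts can come from any extractor; with the third-party pypdf library, for example, it could be built as `[page.extract_text() for page in PdfReader("bills.pdf").pages]`, where the file name is a placeholder.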

We can probably wrap the project at your other question into this question and solve both requirements. As mentioned at your other question, drop me a PM if this interests you. Btw, if you prefer using the single, 10,000-page PDF instead of splitting it into 10,000 one-page PDFs, I have ideas for that, too — let me know. Regards, Joe
