build an online file management system

Nirvana
We have thousands of documents, PowerPoint presentations, and spreadsheets relating to health care. We would like to build a web platform where users can access them at any time.

This is where I need the experts' help: how do I build a GUI where all the documents are digitized and a search takes us to the relevant documents?

example

Business Process--->
  1. Finance and Accounting Services
  2. Debt Collection & Restructuring
  3. Business Lead Generation
  4. Payroll Administration and Processing
  5. Insurance and Mortgage Processing Services
  6. Database Management Services
     - Data Conversion Services
     - Data Mining Services
     - Data Processing Services
     - Data Entry Services
  7. Travel and Entertainment Processing
Analytics--->
    • 1.1      Data requirements
    • 1.2      Data collection
    • 1.3      Data processing
    • 1.4      Data cleaning
    • 1.5      Exploratory data analysis
    • 1.6      Modeling and algorithms
    • 1.7      Data product
    • 1.8      Communication

Technology--->
Procurement--->
David Favor
Fractional CTO
Distinguished Expert 2018

Commented:
There are many existing systems which already provide this facility.

Take a look at https://www.lesbonscomptes.com/recoll/pages/recoll-webui-install-wsgi.html for one common approach.

This provides a Google-esque type Web interface to all documents on a given server + through a few additional tricks can index many servers.
John Tsioumpris
Software & Systems Engineer

Commented:
I don't think there is a definitive answer to this question... there are just so many solutions, all with their pros and cons.
Googling for "document management system web interface" will return a ton of results, from high-end expensive solutions to free open source ones. Review what each one offers, work out what percentage of your needs it covers, and start evaluating from there.
David Favor
Fractional CTO
Distinguished Expert 2018
Commented:
https://www.opensemanticsearch.org/doc/admin/install/search_server provides another approach. This system has built-in OCR, so if you have PDFs with no text component or scanned images of documents, it might be another option.

As John says above, there are numerous options, each addressing different criteria.

Tip: I generally run software to add the missing text component to image-only PDFs + build PDF files (with text components) from scanned images, before attempting any indexing or searching.

Trying to do on the fly OCR can take forever. Better to normalize all your documents (convert them to some pure text representation), then index the pure text representation.
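
As a rough illustration of "normalize first, then index", here is a minimal sketch in Python. It assumes every document has already been converted to plain text, and it builds a simple in-memory inverted index. The function names and the `.txt` folder layout are made up for this example, and a real deployment would more likely use an existing search engine than a hand-rolled index.

```python
import re
from collections import defaultdict
from pathlib import Path


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(texts):
    """Map each word to the set of document names that contain it.

    `texts` is a dict of {document name: plain-text content}.
    """
    index = defaultdict(set)
    for name, content in texts.items():
        for word in tokenize(content):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


def load_texts(folder):
    """Read every .txt file under `folder` (a hypothetical layout)."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): path.read_text(
            encoding="utf-8", errors="replace"
        )
        for path in root.rglob("*.txt")
    }
```

Typical use would be `index = build_index(load_texts("normalized_text"))` followed by `search(index, "payroll processing")`, where `normalized_text` is whatever folder holds the converted files.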
David Favor
Fractional CTO
Distinguished Expert 2018

Commented:
https://github.com/tesseract-ocr provides the underlying code of many OCR systems these days.

Tesseract works amazingly well for running OCR across many documents.
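
For batch runs, one possible sketch is to call the Tesseract command-line tool from Python and save the recognized text once per image. This assumes the `tesseract` executable is installed and on the PATH. The folder names and helper functions are hypothetical, and the `*.png` pattern is only an example.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image_path, language="eng"):
    """Build the Tesseract CLI call that prints recognized text to stdout."""
    return ["tesseract", str(image_path), "stdout", "-l", language]


def ocr_image(image_path, language="eng"):
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        tesseract_command(image_path, language),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def ocr_folder(image_folder, text_folder, pattern="*.png"):
    """OCR every matching image once and save the text for later indexing."""
    out = Path(text_folder)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(Path(image_folder).glob(pattern)):
        target = out / (image.stem + ".txt")
        target.write_text(ocr_image(image), encoding="utf-8")
        written.append(target)
    return written
```

Running the OCR once and keeping the text files means the search side never has to wait on Tesseract.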
noci
Software Engineer
Distinguished Expert 2018

Commented:
I once worked at a place that needed to index documents (produced in that company) by data that was inside the page image. The problem was that the data was scattered all over the page.
An OCR scan of the image was a problem because certain data (amounts of money, dates, etc.) was needed and there was no single place where it appeared.

The solution: print the needed data white-on-white somewhere on an obvious blank spot of the page, in a machine-readable layout.
It was invisible to humans. (I think it ended up in a white strip at the top of the page, so it was also the first line read.)
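
To give an idea of how such a strip could be read back, here is a small sketch. The `KEY=VALUE;KEY=VALUE` layout is purely an assumption for illustration (the layout actually used is not described above), and `parse_metadata_strip` is a made-up helper name.

```python
def parse_metadata_strip(ocr_text):
    """Parse 'KEY=VALUE;KEY=VALUE' pairs from the first non-empty OCR line.

    The strip layout is an assumed example format, not a standard.
    """
    for line in ocr_text.splitlines():
        line = line.strip()
        if line:
            break
    else:
        # No non-empty line at all, so there is nothing to parse.
        return {}

    fields = {}
    for pair in line.split(";"):
        key, separator, value = pair.partition("=")
        if separator and key.strip():
            fields[key.strip().upper()] = value.strip()
    return fields
```

Because the strip is the first line the OCR produces, the indexer only has to look at that one line instead of hunting for amounts and dates across the whole page.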
Scott Fell
Developer & EE Moderator
Fellow 2018
Most Valuable Expert 2013

Commented:
"how do I make a GUI where all the documents are digitized and search can take us to relevant docs"

Are you asking about making a front end user interface? Or are you asking how to make a search engine for documents? or both?

Google Drive/Photos, Microsoft OneDrive/SharePoint, Dropbox, etc. already have this functionality, with a lot of resources behind them, and it would be very difficult to recreate on your own. For example, go to Google Photos: if you have a picture of a dog anywhere, you can search for "dog" and your dog images will appear. You didn't have to do any tagging, because it uses AI. If you want to do this on your own, you would need to tag each document so that a search returns the expected results. If you want something simple (for development) and are open to uploading documents and tagging each one individually, a homemade search is feasible, though it will not be as good as what is already built into Microsoft, Google, Dropbox, etc.
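
A homemade tag search of that simple kind could look roughly like the sketch below. The `TagCatalog` class and the example tags are hypothetical. It only finds what someone has tagged by hand, which is exactly the limitation compared with the hosted services.

```python
from collections import defaultdict


class TagCatalog:
    """Keep hand-assigned tags per document and look documents up by tag."""

    def __init__(self):
        self._docs_by_tag = defaultdict(set)

    def add(self, document, tags):
        """Register a document under each of its (case-insensitive) tags."""
        for tag in tags:
            self._docs_by_tag[tag.strip().lower()].add(document)

    def find(self, *tags):
        """Return the sorted documents that carry every requested tag."""
        wanted = [tag.strip().lower() for tag in tags]
        if not wanted:
            return []
        result = set(self._docs_by_tag.get(wanted[0], set()))
        for tag in wanted[1:]:
            result &= self._docs_by_tag.get(tag, set())
        return sorted(result)
```

For example, after `catalog.add("payroll-overview.pptx", ["payroll", "finance"])`, a call to `catalog.find("finance")` would include that file.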

Other options for private cloud are https://owncloud.org/ or https://www.worldox.com/
Nirvanalearner

Author

Commented:
wow. so many brilliant suggestions thanks a lot!!!

@Scott: we need to showcase our capabilities to clients, so we want to create a user interface; the underlying systems can be Google, MS, etc.

@John: thanks a lot, great suggestion looking into it

@David: thanks a lot for taking time and writing details, helped a ton.
John Tsioumpris
Software & Systems Engineer

Commented:
Also, a small note: try not to "chase" the documents. The OCR solutions suggested are great, but there is always a margin for error. Try to adopt a philosophy where every raw document is stored and then distributed in a less editable format such as PDF.
David Favor
Fractional CTO
Distinguished Expert 2018

Commented:
As John + noci suggested, OCR'ing many documents is complex.

Projects where 100s or 1000s of documents must be processed each month generally have a human review/edit step where columnar data is compared against original documents (like bank statements) to ensure the OCR lined up columns correctly, as OCR conversion normally requires some manual editing to fix minor alignment problems.
Scott Fell
Developer & EE Moderator
Fellow 2018
Most Valuable Expert 2013

Commented:
"we need to showcase to clients our capabilities"

After first reading your question, I thought you were trying to recreate a document management system.

What are your capabilities? What are you trying to showcase? This is the place to start.

"the underlying systems can be google, MS etc."

Good news: you don't have to reinvent the wheel and can just hit their APIs from your front end.
noci
Software Engineer
Distinguished Expert 2018

Commented:
Are you looking for something like Nextcloud? https://nextcloud.com/
Nirvanalearner

Author

Commented:
Our department does a range of work in healthcare. We want to show our best practices, the solutions we have built over time, and our Six Sigma projects. We have documents all over the place and in different formats. When the leadership team or clients visit us, we want to showcase what we have done, what we are doing, and what we will do in the future.
Scott Fell
Developer & EE Moderator
Fellow 2018
Most Valuable Expert 2013

Commented:
If what you are trying to do is show off your best work to clients on a web site, this sounds to me like you need something curated rather than have a system to internally search for documents.

In other words, you have documents relating to Business Process -> Finance and Accounting Services. On the site, you would have a navigation item for this that leads to your page for Finance and Accounting Services, where you have perhaps some images, an overview, and the start of a few case studies that lead to another page with complete information and additional navigation or links to the appropriate documents.

When you are trying to showcase your offerings, you want to keep it concise and allow the reader to dive in as they wish.  Trying to offer some type of search for documents for this type of thing could yield undesirable results. By using hyperlinks or dropdown navigation, you are in control.

For actually storing the documents on your website, it does not matter how they are stored or what they contain as long as you know what they are and where they are. It may be best to have a logical folder structure such as

Business Process
    Finance and Accounting Services
        WhitePapers
        Images
        CaseStudies


The key takeaway is building internal linking on your web pages to appropriate documents rather than a search.
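
If the folder structure is kept logical like the one above, the navigation links can even be generated from it. The following is only a sketch: `build_nav` is a made-up helper, and `/docs` is an assumed URL prefix under which the web server would serve the files.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_nav(folder, base_url="/docs"):
    """Return a nested <ul> of links mirroring the folder tree under `folder`."""
    root = Path(folder)

    def render(directory):
        items = []
        # List sub-folders first, then files, each group alphabetically.
        entries = sorted(
            directory.iterdir(), key=lambda p: (p.is_file(), p.name.lower())
        )
        for entry in entries:
            label = html.escape(entry.name)
            if entry.is_dir():
                items.append(f"<li>{label}{render(entry)}</li>")
            else:
                relative = entry.relative_to(root).as_posix()
                link = f"{base_url}/{quote(relative)}"
                items.append(f'<li><a href="{link}">{label}</a></li>')
        return "<ul>" + "".join(items) + "</ul>"

    return render(root)
```

The generated list is only a starting point. For a showcase you would still hand-pick which documents each page links to.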
Nirvanalearner

Author

Commented:
Hi Scott, that is exactly what I was thinking, but I could not articulate it.
Scott Fell
Developer & EE Moderator
Fellow 2018
Most Valuable Expert 2013
Commented:
I understand. The hardest part of getting an answer is coming up with the right question, especially when there is confusion to begin with. As you can see, it is easy to misinterpret and go down a different path.
Nirvanalearner

Author

Commented:
All the suggestions helped greatly. How about creating a site with SharePoint?
Nirvanalearner

Author

Commented:
Thanks a lot, experts
