• Status: Solved
  • Priority: Medium
  • Security: Public

Idea for the project

Hi,

We currently have an ASP app that requires users to browse through PDFs (10 MB on average) and enter the info into ASP pages.  The problem is that no matter how much bandwidth we add, it does not seem possible to handle the volume...

Is there a way to do this?  thx
Asked by: mcrmg
1 Solution
 
inthedark Commented:
PDF files are a bad idea for these reasons:

1) They are a proprietary format, and you have to line the pockets of Adobe in order to create them. Also, you have no control over the viewer.  For example, in the Linux viewer you cannot copy data from pages, which makes PDF a bad place to store data like code snippets.

2) Nobody downloads and reads the whole of a PDF file; you just need the snippet of information that solves your current problem.

3) When you click on the download link, the default IE/Acrobat Reader behavior is to download the PDF file and view it within the browser.  It is not so easy for the user to save what is being viewed. So instead of saving a local copy, the user will download the same PDF file again and again, killing your bandwidth.

4) The images (the lion's share of the file space) cannot be located on different servers.

Stage 1 - (Very little change required; the problem should improve within hours.) As a quick fix I would consider the following:

a) Change the location of your downloads. You can buy web space quite cheaply on several different servers; because you don't need any active components, the space should be very cheap. By moving your data to up-line servers, your own bandwidth will be unaffected by the downloads.

b) Make sure your most popular downloads are on many different servers (with different ISPs so each ISP does not have a clue what you are doing).

c) If your most popular downloads are limited to just a few files, make sure they are copied onto other servers so that the downloads can be rotated between servers.

d) Check your web-site has sensible document/image expiry so that the same items are not being unnecessarily re-requested.
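
To make points (c) and (d) a little more concrete, here is a rough sketch in Python (not ASP), just to show the idea. The mirror host names and the /downloads/ path are made up for illustration, and the simple round-robin rotation is only one possible way to spread the load.

```python
import time
from email.utils import formatdate
from itertools import cycle
from typing import Dict, Optional

# Hypothetical mirror hosts; substitute the servers you actually rent.
MIRRORS = [
    "https://mirror-a.example.com",
    "https://mirror-b.example.net",
    "https://mirror-c.example.org",
]

# Endless iterator that hands out the mirrors in turn.
_rotation = cycle(MIRRORS)


def next_download_url(filename: str) -> str:
    """Return the URL for `filename` on the next mirror in the rotation."""
    return f"{next(_rotation)}/downloads/{filename}"


def expiry_headers(max_age_seconds: int, now: Optional[float] = None) -> Dict[str, str]:
    """Build response headers that let a browser reuse its cached copy."""
    now = time.time() if now is None else now
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}",
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

Each call to next_download_url hands out the next mirror in turn, and the headers from expiry_headers would be sent along with the file so the browser does not ask for it again until the time is up.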

Stage 2 - Identify why each file is being downloaded.

For example, say I have a router, and all I need to know is how to do a factory reset on it and what the IP address, admin logon name and password will be after the factory reset.  If you go to any of the major router manufacturers you will find it hard to get this simple information.  In most cases you will need to download a huge PDF just to find the few words you need.

You need to find out why people need your information and how you can improve the way you deliver it to them.

Stage 3 - Convert the PDF documents into database snippets. In this way your site only needs to deliver the information that is actually wanted. Any images can also be loaded onto up line servers within the internet cloud.  In this way your primary work-horse servers will not be bogged down delivering images. Note the images can also be rotated amongst many different servers.
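
As a rough illustration of what "database snippets" could look like, here is a minimal Python sketch using an in-memory SQLite database. The table layout, the column names and the simple keyword search are all assumptions made for the example, not a description of any existing system.

```python
import sqlite3
from typing import List, Tuple


def create_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical snippets table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snippets ("
        " id INTEGER PRIMARY KEY,"
        " document TEXT NOT NULL,"   # which PDF the text came from
        " topic TEXT NOT NULL,"      # short heading for the snippet
        " body TEXT NOT NULL)"       # the text the user actually wants
    )
    return conn


def add_snippet(conn: sqlite3.Connection, document: str, topic: str, body: str) -> None:
    """Store one small piece of a document."""
    conn.execute(
        "INSERT INTO snippets (document, topic, body) VALUES (?, ?, ?)",
        (document, topic, body),
    )


def find_snippets(conn: sqlite3.Connection, keyword: str) -> List[Tuple[str, str, str]]:
    """Return (document, topic, body) rows whose topic or body mentions keyword."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT document, topic, body FROM snippets"
        " WHERE topic LIKE ? OR body LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return rows.fetchall()
```

With something like this, a page only has to deliver the few rows that match what the user is looking for instead of the whole PDF.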

Without knowing what type of data is being released any assumptions that are made may be unhelpful. But I hope that some of these suggestions may lead you to a solution :~)

 
mcrmg (Author) Commented:
Thanks for the reply.

Actually, we have some contractors who work from home for us. They need to open the PDF files that were assigned to them, find the information, and then enter it into the ASP app. The reason each PDF is so big is that we want to save them the time of looking for files, so we put all the images into the same file. This is part of the requirements, too.

We thought about moving the PDFs to those data centers; however, the PDFs could change daily, and it would take us forever to upload the new ones....

>> Stage 3 - Convert the PDF documents into database snippets. In this way your site only needs to deliver the information that is actually wanted. Any images can also be loaded onto up line servers within the internet cloud.  In this way your primary work-horse servers will not be bogged down delivering images. Note the images can also be rotated amongst many different servers.

Do you think this is the way I should go?

thx
 
inthedark Commented:
I don't know enough about what you do to say yes, this is what you should do. But these PDF files that keep changing: what do they contain, and how are they generated?
 
mcrmg (Author) Commented:
The files contain info from forms that were scanned into PDF. thx
 
inthedark Commented:
If you zip one of the PDFs, by what percentage does it shrink, if at all?
 
mcrmg (Author) Commented:
I tried 3, only 100k...
 
inthedark Commented:
So what percentage was that?
 
mcrmg (Author) Commented:
Oh, I am sorry, I meant they were only reduced by 100 KB, which is not much. thx
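
To put a number on that: assuming the files are about 10 MB each (the "10m" average from the question), a 100 KB saving works out to roughly 1%. Scanned images inside a PDF are usually stored already compressed, which would explain why zipping barely helps. Here is a small Python sketch of the arithmetic; the function names are made up, and zlib stands in for whatever zip tool was actually used.

```python
import zlib


def shrink_percent(original_size: int, compressed_size: int) -> float:
    """Percentage by which a file shrank (negative if it grew)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 100.0 * (original_size - compressed_size) / original_size


def deflate_shrink_percent(data: bytes) -> float:
    """Compress data with zlib at maximum level and report the shrinkage."""
    return shrink_percent(len(data), len(zlib.compress(data, 9)))


# Assumed sizes: a 10 MB PDF that zips down by 100 KB.
ten_mb = 10 * 1024 * 1024
saving = shrink_percent(ten_mb, ten_mb - 100 * 1024)
print(f"{saving:.2f}% smaller")
```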
 
inthedark Commented:
So this is a document storage system?  I did a project with document storage and found that using a combination of TIFF and a compression library gave very good results.  Some scanners create needlessly large images; by reducing the quality of the image you can save a lot of space.  What you could do is create 2 levels of quality: normal and high.

You can register a file type on each of the client systems, say with an extension of .CTF, and associate an exe with it that decompresses a downloaded image and then displays it as a TIFF (or whatever) file.  If the user cannot read the normal-quality image, the system could download the full-quality image, keeping a local copy in case it is reviewed again.
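
A rough Python sketch of what the client-side helper might do is below. It is only an illustration: zlib is used in place of the compression library mentioned here, the file naming is invented, and the download callback is a placeholder for however the files are really fetched.

```python
import zlib
from pathlib import Path
from typing import Callable


def get_image(
    name: str,
    quality: str,
    cache_dir: Path,
    download: Callable[[str, str], bytes],
) -> Path:
    """Return a local, decompressed copy of an image.

    `quality` is "normal" or "high" in this sketch. The compressed file is
    fetched through `download` only when no local copy exists yet, so an
    image that is reviewed twice is transferred once.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    image_path = cache_dir / f"{name}.{quality}.tif"
    if not image_path.exists():
        compressed = download(name, quality)  # hypothetical transfer step
        image_path.write_bytes(zlib.decompress(compressed))
    return image_path
```

The viewer would ask for the "normal" copy first and call the same function again with "high" only when the user cannot read the image.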

Compression library info
 http://www.ricazip.com/
 
mcrmg (Author) Commented:
I am thinking of having them download the files to their local PCs. (I will need to zip the files so they will be easy to download.)

I saw Adobe has a product called LiveCycle Policy Server, and we need some sort of security on those files, so I need to contact them. Another problem is how to send the files to the contractors. I would like to schedule the downloads instead of having everyone log on at once. Any ideas? thx
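
One possible way to schedule the downloads, sketched in Python purely as an illustration: give each contractor a separate start time so the transfers do not overlap. The user names, the start time and the slot length are invented for the example.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stagger_downloads(
    users: List[str], start: datetime, slot_minutes: int
) -> Dict[str, datetime]:
    """Assign each user a download start time, one slot after the previous user."""
    return {
        user: start + timedelta(minutes=index * slot_minutes)
        for index, user in enumerate(users)
    }


# Hypothetical contractors, starting at 6:00 with 20-minute slots.
schedule = stagger_downloads(
    ["contractor1", "contractor2", "contractor3"],
    datetime(2024, 1, 15, 6, 0),
    20,
)
for user, when in schedule.items():
    print(user, when.strftime("%H:%M"))
```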