Piyush Katariya

pdf data extract

How can I get the data out of an encrypted PDF form that fetches its information from a specific site?
ASKER CERTIFIED SOLUTION
David Favor
Piyush Katariya

ASKER

Hi David, should I share the PDF file so that my problem is easier to understand? Please share your email ID.
You can attach a copy + I'll provide output for you from pdftotext.

And, as mentioned above, this will only work if you provide a decrypted copy of the .pdf file.
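
For reference, a minimal sketch of how the pdftotext step could be scripted, assuming the Poppler pdftotext binary is on the PATH. The file names and helper names below are hypothetical, and the password option only helps if you already know the user password.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path, txt_path, user_password=None):
    """Build the argument list for Poppler's pdftotext (hypothetical helper)."""
    cmd = ["pdftotext", "-layout"]  # -layout keeps the original physical layout
    if user_password is not None:
        cmd += ["-upw", user_password]  # user password for an encrypted PDF
    cmd += [pdf_path, txt_path]
    return cmd


def extract_text(pdf_path, txt_path, user_password=None):
    """Run pdftotext; raises if the binary is missing or the conversion fails."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    subprocess.run(build_pdftotext_cmd(pdf_path, txt_path, user_password), check=True)
```

Usage would be something like `extract_text("form.pdf", "form.txt")`. Whether the pre-filled field values actually show up in the text output depends on how the form stores them.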
Hi David,

My query is as follows.

There is a site (www.mca.gov.in) where we file PDF forms.

The forms are available on that portal.

The forms work like this: you enter a CIN (Corporate Identification Number) in the PDF form and press the pre_fill button, and some information about that company is filled into the form.

I want to know whether we can get all of these details for every CIN.

For example, there are around 1 million companies, and all of the details are public domain information.

I would like to extract this information with the help of the PDF.

Is that possible?
Suggestion: Best to open a 2nd question related to this topic, as this is a different topic.

If I follow what you're asking, I'd just scrape the public domain data, rather than attempting to interact with PDF forms for this.
Hi David, can you help me set up a utility to extract the public domain data that is available free of cost on the portal www.mca.gov.in?
Best to contact the site for this information, as I find no mention of an API service or scraping guidelines.

Normally if you just scrape a site, they will notice this quickly + block all IPs involved, so best to ask the site administrator for a correct answer.

Many times, Government sites provide a CD/DVD data dump, so the site isn't slowed down by scrapes.
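
If the site does permit automated access, a scraper would at least need to honor robots.txt and pace its requests. Here is a minimal sketch of those two pieces using only the Python standard library. The `allowed_by_robots` and `Throttle` names, the user agent, and the example.com URLs are all hypothetical, and no actual fetching is shown.

```python
import time
import urllib.robotparser


def allowed_by_robots(robots_lines, user_agent, url):
    """Check a URL against robots.txt rules supplied as a list of lines."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        if self._last is not None:
            remaining = self._min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

The idea is to call `allowed_by_robots(...)` before requesting a URL and `throttle.wait()` before every request. A suitable interval is a guess unless the site publishes one, which is another reason to ask the administrator first.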
Hi David, it is a government site, but the data I am asking about is generally available in the public domain, and there is no way to contact the site administrator.