Piyush Katariya
asked on
PDF data extraction
How can I get the data out of an encrypted PDF form that pulls its information from a specific site?
ASKER CERTIFIED SOLUTION
You can attach a copy and I'll provide the pdftotext output for you.
And, as mentioned above, this will only work if you provide a decrypted copy of the .pdf file.
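For anyone who wants to run pdftotext themselves, here is a minimal sketch that calls it from Python. It assumes pdftotext (from the poppler-utils package) is installed and on the PATH, and that the PDF has already been decrypted, as noted above. The helper names `build_pdftotext_cmd` and `extract_text` are hypothetical. The `-layout` flag keeps the page's physical layout, and passing `-` as the output file makes pdftotext write the text to stdout instead of creating a .txt file.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path, layout=True):
    """Build the argument list for poppler's pdftotext CLI (hypothetical helper).

    Options must come before the input file; the trailing "-" sends the
    extracted text to stdout instead of a .txt file next to the input.
    """
    cmd = ["pdftotext"]
    if layout:
        # -layout keeps the physical page layout, which helps when
        # form fields are arranged in columns.
        cmd.append("-layout")
    cmd += [str(pdf_path), "-"]
    return cmd


def extract_text(pdf_path, layout=True):
    """Run pdftotext on an already-decrypted PDF and return its text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; install poppler-utils")
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, layout),
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if pdftotext fails, e.g. when the PDF needs a password
    )
    return result.stdout
```

Usage would be `print(extract_text("form.pdf"))`, where `form.pdf` is a placeholder for the decrypted file.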
ASKER
Hi David,
My query is as follows:
There is a site (www.mca.gov.in) on which we file PDF forms.
The forms are available on the portal www.mca.gov.in.
In these forms, you enter a CIN (Corporate Identification Number) and press the Pre-fill button, and some information about that company is filled into the form.
I want to know whether we can get all the details for every CIN.
For example, there are around 1 million companies, and all of these details are public domain information.
I would like to extract this information with the help of the PDF forms.
Is it possible?
Suggestion: it's best to open a second question for this, as it is a different topic.
If I follow what you're asking, I'd just scrape the public domain data, rather than attempting to interact with PDF forms for this.
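To illustrate the "scrape the public data" approach, here is a minimal sketch using only Python's standard `html.parser`. It assumes the company details are served as an HTML table of label/value rows; that layout, the labels, and the sample CIN are made up for illustration, since the actual pages on www.mca.gov.in are not described in this thread. The class and function names are hypothetical, and fetching the HTML is left out.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a cell; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)


def parse_company_table(html):
    """Turn a two-column label/value table into a dict (hypothetical page layout)."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}


# Made-up sample page; not real MCA markup or a real company.
SAMPLE = """
<table>
  <tr><th>CIN</th><td>U12345AB2000PTC000001</td></tr>
  <tr><th>Company Name</th><td>Example Pvt Ltd</td></tr>
</table>
"""
```

Calling `parse_company_table(SAMPLE)` gives `{"CIN": "U12345AB2000PTC000001", "Company Name": "Example Pvt Ltd"}`. Rows that do not have exactly two cells are skipped, so a real page would need the filter adjusted to its own markup.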
ASKER
Hi David, can you help me set up a utility to extract the public domain data that is available free of charge on the portal www.mca.gov.in?
It's best to contact the site about this, as I can find no mention of an API service or scraping guidelines.
Normally, if you just scrape a site, the operators will notice quickly and block all the IPs involved, so it's best to ask the site administrator for a definitive answer.
Government sites often provide a CD/DVD data dump so that the site isn't slowed down by scrapers.
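If automated access is permitted, a scraper should at least honour the site's robots.txt and pace its requests, which is what usually separates a tolerated crawler from one that gets blocked. Below is a minimal sketch using Python's standard `urllib.robotparser`. The user-agent string, the 5-second fallback delay, and all function names are assumptions for illustration. The `estimated_days` helper shows the scale involved: 1 million CINs at one request per second is 1,000,000 / 86,400 ≈ 11.6 days of continuous fetching, which is why a data dump is the better option when one exists.

```python
import time
import urllib.robotparser
from urllib.request import Request, urlopen

USER_AGENT = "example-research-bot/0.1"  # hypothetical name; identify your crawler honestly


def load_robots(lines):
    """Parse robots.txt content that has already been fetched, given as a list of lines."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(lines)
    return rp


def allowed(rp, url, agent=USER_AGENT):
    """True if robots.txt permits this agent to fetch the URL."""
    return rp.can_fetch(agent, url)


def delay_for(rp, agent=USER_AGENT, default=5.0):
    """Honour Crawl-delay if the site sets one, else fall back to an assumed default."""
    d = rp.crawl_delay(agent)
    return float(d) if d is not None else default


def polite_fetch(urls, rp, agent=USER_AGENT):
    """Fetch URLs one at a time, skipping disallowed ones and sleeping between requests."""
    wait = delay_for(rp, agent)
    for url in urls:
        if not allowed(rp, url, agent):
            continue
        req = Request(url, headers={"User-Agent": agent})
        with urlopen(req, timeout=30) as resp:
            yield url, resp.read()
        time.sleep(wait)


def estimated_days(n_requests, seconds_per_request):
    """Wall-clock days needed for n sequential requests at a fixed pace."""
    return n_requests * seconds_per_request / 86400
```

`polite_fetch` is a generator, so nothing is requested until it is iterated; the robots.txt lines would normally come from the site's own `/robots.txt`.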
ASKER
Hi David, it is a government site, but the data I am asking about is generally available in the public domain, and there is no way to contact the site administrator.
ASKER