25112
 asked on

what font behind pdf

is it possible to find out what font is on a pdf, so that it can read legibly on the PC?

in PDF, it is fine, but when you copy it, the characters are gibberish..

thanks.
Fonts Typography, PC, Windows 7, Miscellaneous, Adobe Acrobat


8/22/2022 - Mon
Dave Baldwin

Go to File -> Properties and click on the Fonts tab to see what fonts are being used.
Joe Winograd

This 5-minute EE video Micro Tutorial should help:
Xpdf - PDFfonts - Command Line Utility to List Fonts Used in a PDF File

Note that Step 10 underneath the video is the same as what Dave posted. Regards, Joe
☠ MASQ ☠

I'm guessing you don't actually have a font, just a graphical representation of one
See the explanation and suggested solutions here:

https://www.experts-exchange.com/questions/28564191/Cannot-convert-this-pdf-to-text.html
Joe Winograd

I'm glad MASQ remembered that thread! About a year after it, I published a 5-minute EE video Micro Tutorial on one of the free OCR tools mentioned in it (PDF-XChange Editor):
How to OCR pages in a PDF with free software

Regards, Joe
viki2000

Could you upload the pdf or at least 1 page of it here?
Then we will tell you.
Member_2_276102

...find out what font is on a pdf...
"On" a .PDF? What do you mean by "on"?

If there's a referenced font in the .PDF, open the .PDF in a text editor and search for {font}.
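The text-editor search suggested above can also be sketched in Python. This is a minimal sketch, not a full PDF parser: it assumes the font dictionaries are stored uncompressed, so that /BaseFont entries are visible in the raw bytes (PDFs that pack their objects into compressed object streams will show nothing this way). The function name find_base_fonts is hypothetical.

```python
import re

# /BaseFont is the PDF dictionary key that holds a font's PostScript name.
# The name ends at whitespace or at the next PDF delimiter character.
BASEFONT_RE = re.compile(rb"/BaseFont\s*/([^\s/<>\[\]()]+)")


def find_base_fonts(pdf_bytes: bytes) -> list[str]:
    """Return the distinct /BaseFont names found in raw PDF bytes, in order."""
    seen: list[str] = []
    for match in BASEFONT_RE.finditer(pdf_bytes):
        name = match.group(1).decode("latin-1")
        if name not in seen:
            seen.append(name)
    return seen
```

To try it on a real file, read the PDF in binary mode and pass the bytes in, for example find_base_fonts(open("10.pdf", "rb").read()). A subset font shows up with a six-letter prefix and a plus sign in front of its name.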
25112

ASKER
thanks for your patience.. I had to get permission to upload a page.. please see attached..
what is ideal is to have similar font on PC, so we can copy it and paste in actual font..
10.pdf
viki2000

You did not mention that it is not an English or Western/European font...
What language is that?
Is that Tamil?
Joe Winograd

Acrobat shows this for that file:

[screenshot: 10pdf]

PDFfonts shows this:
name                type      emb sub uni object ID
------------------- --------- --- --- --- ---------
SSBRRF+TT266t00     TrueType  yes yes no      11  0
SNDABN+Helvetica    Type 1C   yes yes no      13  0


Regards, Joe
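The "uni" column in the PDFfonts listing above is a useful clue. PDFfonts prints "no" there when a font has no explicit ToUnicode map, which is the table a viewer uses to turn glyph codes back into real characters when text is copied. A missing map does not always break copying, but for a subset font it is a likely cause of gibberish. Below is a minimal Python sketch that flags such fonts. It assumes the column layout shown in this thread (other pdffonts builds print extra columns, which this parser does not handle), and parse_pdffonts and FontRow are hypothetical names.

```python
from dataclasses import dataclass

# Sample output copied from the listing in this thread.
SAMPLE = """\
name                type      emb sub uni object ID
------------------- --------- --- --- --- ---------
SSBRRF+TT266t00     TrueType  yes yes no      11  0
SNDABN+Helvetica    Type 1C   yes yes no      13  0
"""


@dataclass
class FontRow:
    name: str
    font_type: str
    embedded: bool
    subset: bool
    has_to_unicode: bool


def parse_pdffonts(text: str) -> list[FontRow]:
    """Parse pdffonts output in the name/type/emb/sub/uni/object ID layout."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(tokens) < 7 or tokens[0] == "name" or tokens[0].startswith("---"):
            continue
        # Read flags from the right so a type with a space ("Type 1C") survives.
        emb, sub, uni = tokens[-5], tokens[-4], tokens[-3]
        rows.append(
            FontRow(
                name=tokens[0],
                font_type=" ".join(tokens[1:-5]),
                embedded=emb == "yes",
                subset=sub == "yes",
                has_to_unicode=uni == "yes",
            )
        )
    return rows


for font in parse_pdffonts(SAMPLE):
    if not font.has_to_unicode:
        print(f"{font.name}: no ToUnicode map, copied text may be gibberish")
```

For the file in this thread, both fonts are embedded subsets without a ToUnicode map, so both are flagged.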
25112

ASKER
yes, Tamil.

is what is required to download a font called 'TT266t00'?
viki2000

I am curious who is able to provide a Word version of the document.
If I understood correctly, you are not so much interested in knowing the font type as in being able to open the text in a Word document or copy/paste it into one.
Is that right?

Here is a list with Tamil fonts:
http://kandupidi.com/font_help.php
http://www.tn.nic.in/tamilsw/otf.htm
viki2000

I tried some tricks, but I am not sure I got it right because I do not speak Tamil.
It seems to be Scripture, a Gospel about Jesus.
Please check it and tell me if I am right.
10_001.jpg.docx
Joe Winograd

@25112
> is what is required to download a font called 'TT266t00'?

Yes, or some other Tamil font and do a font substitution. The good news is that it's text, not an image, so OCR is not needed.

@viki2000
> I am curious who is able to provide a Word version of the document.

Not here, because I'm unwilling to download and install a Tamil font. In my Word 2016, the font in your document shows as Vijaya when I copy/paste a word to a new doc, but as Arima Madurai when I enable editing on it. Can you explain what's going on with those fonts?
viki2000

No, I can't explain it right now.
The fonts come from the internet. They are symbols for those glyphs.

When I enable editing in Word I see Arima Madurai, but if I copy and paste in a new docx then I have Latha.

Here are more fonts:
http://indiatyping.com/index.php/download/hindi-fonts

Identifying Hindi fonts is not as easy as with Latin scripts, where you just right-click, choose Properties, and read the font type.
Here is some research/methods:
http://airccse.org/journal/ijci/papers/4315ijci02.pdf
http://esatjournals.net/ijret/2014v03/i03/IJRET20140303095.pdf
viki2000

So 25112, what do you say?
25112

ASKER
thanks viki..
i checked..
Control Panel\All Control Panel Items\Fonts\
and see Latha font there already..

i searched for the other one..
and found:
https://github.com/NDISCOVER/Arima-Font/blob/master/fonts/otf/Madurai/ArimaMadurai-Bold.otf
when i put it in
Control Panel\All Control Panel Items\Fonts\
and using Word2010 to open the pdf
it asks 'select the encoding that makes your document readable: Text Encoding.. WINDOWS, MSDOS, OTHERENCODING..'

can you guide where i have missed?
Member_2_276102

"Encoding" is very different from "font". The two are almost unrelated.

Word will show that message when you try to open a file that isn't in a supported format. PDF isn't a Word DOC file, so Word (2010) doesn't know what to do with it. PDFs are opened by Adobe Acrobat Reader, not by Word.

There are plug-ins for Word (2010) that allow importing of PDFs, or you might use Adobe (or similar product) to export a PDF to a Word DOC or DOCX file.

If you already have some reliable plug-in for opening PDFs with Word 2010, it's also possible that the document actually has an "encoding" problem. If it wasn't created on a similar Windows system, it's possible that it's not even an ASCII-/Unicode-encoded file. For example, it conceivably could be a mainframe EBCDIC-encoded file. If so, then "font" isn't necessarily a critical part of the problem.
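The distinction between encoding and font can be shown in a few lines of Python. An encoding maps bytes to characters; a font only decides how those characters are drawn. The sketch below uses Python's cp037 codec as a stand-in for EBCDIC (an assumption for illustration; mainframes use several EBCDIC code pages) and shows that the same bytes read as text or as garbage depending on the encoding chosen, with no font involved.

```python
text = "Hello"

# The same characters become different bytes under different encodings.
ebcdic_bytes = text.encode("cp037")  # one EBCDIC code page
ascii_bytes = text.encode("ascii")
print(ebcdic_bytes.hex(), ascii_bytes.hex())

# Decoding with the matching encoding recovers the text.
print(ebcdic_bytes.decode("cp037"))

# Decoding the same bytes with the wrong encoding yields unrelated characters.
print(repr(ebcdic_bytes.decode("latin-1")))
```

This is the choice Word is asking for in its "Text Encoding" dialog: it is treating the file as raw bytes and asking which byte-to-character table to apply.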
viki2000

@25112
My question was if you could read the text in Tamil, if you understand the language and if it is a Scripture, Gospel about Jesus.
Do you understand Tamil? Is the text a Scripture, Gospel about Jesus?
I can only tell you how I did to obtain the PDF file in Word format, from where you can easily copy and paste without garbage characters, nothing more.
25112

ASKER
>>If you already have some reliable plug-in for opening PDFs with Word 2010, it's also possible that the document actually has an "encoding" problem.
no plug-in, atm..

i don't need to use word2010 for this.. but viki's method has worked.. so would glean from it...
25112

ASKER
>>
if it is a Scripture, Gospel about Jesus.

yes to above (in tamil language- confirmed!)


>>I can only tell you how I did to obtain the PDF file in Word format

thanks. can you guide what steps to take to make this happen.
ASKER CERTIFIED SOLUTION
viki2000

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
25112

ASKER
thanks..
for Google Drive, and Google docs, all you need is a gmail account or more?
Joe Winograd

> for Google Drive, and Google docs, all you need is a gmail account or more?

No, you don't need a Gmail account. Any email account is fine. What you need is a Google Account, not Google Mail (Google Account works with a Google Mail account, but also works with any email account). You may create a Google Account (free!) here:
https://accounts.google.com/SignUp

Regards, Joe
25112

ASKER
thanks to viki for the unconventional easy solution!

thanks to all who assisted..
25112

ASKER
you had said:
>>It is not easy, rather ugly long, but I could not find a better method free:

what may be an alternative solution that would be simpler to use (for less tech savvy people in other developing countries.. to derive the same end result) ?
Joe Winograd

> thanks to all who assisted

You're welcome. Happy to help. Regards, Joe
viki2000

I do not know now what software will work.
I tried ABBYY FineReader, but it does not support Tamil.
I am thinking about how you can speed up or automate the task.
The main disadvantage of the proposed method is that it basically processes one page at a time.
If you have a PDF with one page, you get one picture.
If you have a PDF with many pages, you need a program to convert every page into a separate picture, one picture per page. I used Nitro PDF for that. Then you have to upload the pictures. You can put the original PDF and the resulting pictures in one dedicated folder; in the upload dialog, press Ctrl+A to select everything, then Ctrl+click the PDF to deselect it. The pictures are then uploaded automatically, one after another.
But then you still have to open each one in Google Docs and save it, one by one. That takes time, because it is no longer a batch operation. This seems to be the bottleneck.
Once you have them back on your PC as .docx files, you can merge them into a single document.
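The file-selection step above (everything in the folder except the PDF, in page order) can be sketched in Python. This only illustrates that one step, not the upload or the OCR itself. The function name page_images and file names such as page_2.jpg are hypothetical, and the sort assumes each image name contains its page number.

```python
import re
from pathlib import Path

# Image types treated as page pictures; the PDF itself is skipped.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def page_images(folder) -> list[Path]:
    """Return the image files in `folder`, skipping the PDF, in page order."""

    def page_number(path: Path) -> int:
        digits = re.findall(r"\d+", path.stem)
        # Use the last number in the name; files without one sort first.
        return int(digits[-1]) if digits else 0

    images = [
        p
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    ]
    return sorted(images, key=lambda p: (page_number(p), p.name))
```

Sorting on the number rather than on the raw file name matters once there are ten or more pages: a plain alphabetical sort would put page_10 before page_2, and the merged .docx would come out in the wrong order.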
25112

ASKER
thanks for your review with FineReader..

so at the moment we have only one sure (google) solution- for 1 page or 10, right? (for the language in question)
viki2000

I guess so.
If I find any other method/program I will let you know.
25112

ASKER
thank u indeed.
☠ MASQ ☠

So in summary the solution was to capture an image and then use OCR :)
25112

ASKER
good conclusion, MASQ.. but seems like the regular OCR we may have in common PCs was NOT up to par, and google seems to have enough tools to handle a lot..!!