Solved

what font behind pdf

Posted on 2016-09-27
32
99 Views
Last Modified: 2016-10-21
Is it possible to find out which font a PDF uses, so that the text can be read legibly on the PC?

In the PDF it looks fine, but when you copy the text, the characters are gibberish.

thanks.
0
Comment
Question by:25112
32 Comments
 
LVL 82

Expert Comment

by:Dave Baldwin
Go to File -> Properties and click on the Fonts tab to see what fonts are being used.
PDF fonts
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
This 5-minute EE video Micro Tutorial should help:
Xpdf - PDFfonts - Command Line Utility to List Fonts Used in a PDF File

Note that Step 10 underneath the video is the same as what Dave posted. Regards, Joe
0
 
LVL 62

Expert Comment

by:☠ MASQ ☠
I'm guessing you don't actually have a font, just a graphical representation of one
See the explanation and suggested solutions here:

https://www.experts-exchange.com/questions/28564191/Cannot-convert-this-pdf-to-text.html
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
I'm glad MASQ remembered that thread! About a year after it, I published a 5-minute EE video Micro Tutorial on one of the free OCR tools mentioned in it (PDF-XChange Editor):
How to OCR pages in a PDF with free software

Regards, Joe
0
 
LVL 20

Expert Comment

by:viki2000
Could you upload the pdf or at least 1 page of it here?
Then we will tell you.
0
 
LVL 27

Expert Comment

by:tliotta
...find out what font is on a pdf...
"On" a .PDF? What do you mean by "on"?

If there's a referenced font in the .PDF, open the .PDF in a text editor and search for {font}.
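For what it's worth, the text-editor search can be scripted. A minimal sketch in Python, assuming the font dictionaries are stored uncompressed (PDFs that pack their objects into compressed object streams will not show them this way). The function name `base_font_names` is made up for illustration:

```python
import re


def base_font_names(pdf_bytes: bytes) -> list[str]:
    """Collect the names that follow /BaseFont keys in raw PDF bytes.

    Assumption: the font dictionaries are not inside compressed
    object streams, so the keys are visible as plain bytes.
    """
    # A PDF name object starts with "/" and ends at whitespace or a delimiter.
    found = re.findall(rb"/BaseFont\s*/([^\s/<>\[\]()]+)", pdf_bytes)
    return sorted({name.decode("latin-1") for name in found})
```

A name that starts with six capital letters and a plus sign marks an embedded subset of a font, so the part after the plus sign is the name to look at.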
0
 
LVL 5

Author Comment

by:25112
Thanks for your patience. I had to get permission to upload a page; please see the attached file.
Ideally we would have a similar font on the PC, so we can copy the text and paste it in the actual font.
10.pdf
0
 
LVL 20

Expert Comment

by:viki2000
You did not mention that it is not an English or Western/European font...
What language is that?
Is that Tamil?
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
Acrobat shows this for that file:

10pdf

PDFfonts shows this:
name                type      emb sub uni object ID
------------------- --------- --- --- --- ---------
SSBRRF+TT266t00     TrueType  yes yes no      11  0
SNDABN+Helvetica    Type 1C   yes yes no      13  0
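
The `uni` column is the interesting one: it says whether the font carries a ToUnicode map, which is what a viewer relies on to turn glyph codes back into characters when you copy text. A rough Python sketch that parses a report like the one above and flags fonts without that map. The helper names are mine, and it assumes the fixed-width layout shown here, with a ruler line of dashes under each column:

```python
import re

REPORT = """\
name                type      emb sub uni object ID
------------------- --------- --- --- --- ---------
SSBRRF+TT266t00     TrueType  yes yes no      11  0
SNDABN+Helvetica    Type 1C   yes yes no      13  0
"""


def parse_pdffonts(report: str) -> list[dict]:
    """Split a fixed-width pdffonts report into one dict per font."""
    lines = [ln for ln in report.splitlines() if ln.strip()]
    header, ruler, rows = lines[0], lines[1], lines[2:]
    # Each run of dashes in the ruler marks the extent of one column.
    spans = [(m.start(), m.end()) for m in re.finditer(r"-+", ruler)]
    names = [header[start:end].strip() for start, end in spans]
    return [
        {name: row[start:end].strip() for name, (start, end) in zip(names, spans)}
        for row in rows
    ]


def fonts_without_unicode_map(report: str) -> list[str]:
    """Names of fonts whose 'uni' column says there is no ToUnicode map."""
    return [f["name"] for f in parse_pdffonts(report) if f["uni"] == "no"]
```

Both fonts in this file come back from `fonts_without_unicode_map(REPORT)`, which would be consistent with copied text turning into gibberish even though the page displays correctly.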


Regards, Joe
0
 
LVL 5

Author Comment

by:25112
yes, Tamil.

Is downloading a font called 'TT266t00' what is required?
0
 
LVL 20

Expert Comment

by:viki2000
I am curious who is able to provide a Word version of the document.
If I understood correctly, you are not so much interested in knowing the font type as in being able to open the document in Word or copy/paste its text into a Word document.
Is that right?

Here is a list with Tamil fonts:
http://kandupidi.com/font_help.php
http://www.tn.nic.in/tamilsw/otf.htm
0
 
LVL 20

Expert Comment

by:viki2000
I tried some tricks, but I am not sure I got it right because I do not speak Tamil.
It seems to be Scripture, a Gospel about Jesus.
Just check it and tell me if I am right.
10_001.jpg.docx
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
@25112
> is what is required to download a font called 'TT266t00'?

Yes, or some other Tamil font and do a font substitution. The good news is that it's text, not an image, so OCR is not needed.

@viki2000
> I am curious who is able to provide a Word version of the document.

Not here, because I'm unwilling to download and install a Tamil font. In my Word 2016, the font in your document shows as Vijaya when I copy/paste a word to a new doc, but as Arima Madura when I enable editing on it. Can you explain what's going on with those fonts?
0
 
LVL 20

Expert Comment

by:viki2000
No, I can't explain it now.
The fonts come from the internet. They are symbols for those glyphs.

When I enable editing in Word I see Arima Madurai, but if I copy and paste into a new docx I get Latha.

Here are more fonts:
http://indiatyping.com/index.php/download/hindi-fonts

Identifying the Hindi fonts is not as easy as with Latin scripts, where you right-click, choose Properties and read the font type.
Here is some research/methods:
http://airccse.org/journal/ijci/papers/4315ijci02.pdf
http://esatjournals.net/ijret/2014v03/i03/IJRET20140303095.pdf
0
 
LVL 20

Expert Comment

by:viki2000
So 25112, what do you say?
0
 
LVL 5

Author Comment

by:25112
Thanks, viki.
I checked
Control Panel\All Control Panel Items\Fonts\
and the Latha font is already there.

I searched for the other one and found:
https://github.com/NDISCOVER/Arima-Font/blob/master/fonts/otf/Madurai/ArimaMadurai-Bold.otf
When I put it in
Control Panel\All Control Panel Items\Fonts\
and use Word 2010 to open the PDF,
it asks 'Select the encoding that makes your document readable', with the Text Encoding options Windows, MS-DOS and Other encoding.

Can you tell me what I have missed?
0
 
LVL 27

Expert Comment

by:tliotta
"Encoding" is very different from "font". The two are almost unrelated.

Word will show that message when you try to open a file that isn't in a supported format. PDF isn't a Word DOC file, so Word (2010) doesn't know what to do with it. PDFs are opened by Adobe Acrobat Reader, not by Word.

There are plug-ins for Word (2010) that allow importing of PDFs, or you might use Adobe (or similar product) to export a PDF to a Word DOC or DOCX file.

If you already have some reliable plug-in for opening PDFs with Word 2010, it's also possible that the document actually has an "encoding" problem. If it wasn't created on a similar Windows system, it's possible that it's not even an ASCII-/Unicode-encoded file. For example, it conceivably could be a mainframe EBCDIC-encoded file. If so, then "font" isn't necessarily a critical part of the problem.
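
To illustrate the difference: an encoding decides which characters a sequence of bytes stands for, while a font only decides how those characters are drawn. A small Python sketch of what a wrong encoding choice does to Tamil text. The sample word and the choice of Latin-1 as the wrong encoding are my own, purely for illustration:

```python
# The word "Tamil" in Tamil script, written with escapes so the
# example does not depend on the editor's own encoding.
TEXT = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"


def misread(text: str, wrong_encoding: str = "latin-1") -> str:
    """Store text as UTF-8 bytes, then decode them with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled: str, wrong_encoding: str = "latin-1") -> str:
    """Undo misread() by reversing the wrong decode step."""
    return garbled.encode(wrong_encoding).decode("utf-8")


garbled = misread(TEXT)    # one junk character per original byte
restored = repair(garbled) # identical to TEXT again
```

No font change can turn `garbled` back into readable Tamil, because the characters themselves are wrong; only undoing the encoding mistake does.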
0
 
LVL 20

Expert Comment

by:viki2000
@25112
My question was whether you can read the Tamil text and whether it is Scripture, a Gospel about Jesus.
Do you understand Tamil?
I can only tell you how I obtained the PDF content in Word format, from which you can easily copy and paste without garbage characters; nothing more.
0
 
LVL 5

Author Comment

by:25112
>>If you already have some reliable plug-in for opening PDFs with Word 2010, it's also possible that the document actually has an "encoding" problem.
no plug-in, atm..

I don't need to use Word 2010 for this, but viki's method has worked, so I would like to learn from it.
0
 
LVL 5

Author Comment

by:25112
>>
if it is a Scripture, Gospel about Jesus.

yes to above (in tamil language- confirmed!)


>>I can only tell you how I did to obtain the PDF file in Word format

Thanks. Can you guide me through the steps to make this happen?
0
 
LVL 20

Accepted Solution

by:
viki2000 earned 500 total points
It is not easy, rather ugly and long, but I could not find a better free method:
- I took your original PDF file and converted it to a high-resolution image, 400-600 dpi.
- Then I uploaded the image to Google Drive.
- Then I right-clicked the image and opened it with Google Docs.
- Then I saved it as a Word .docx on my PC.

Basically I bypass the existing PDF file with its own Tamil symbols and encoding, and use Google's OCR engine to get clean, recognizable characters with a known encoding.
Try it yourself with another PDF and see if it works for you too.
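
To get a feel for what 400-600 dpi means for the image in the first step: PDF page sizes are measured in points (72 per inch), so the pixel size follows from simple arithmetic. A small Python sketch; the A4 page size used here is only an assumed example, not taken from the attached file:

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def render_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at the given resolution."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


# Assumed example: an A4 page is roughly 595 x 842 points.
A4_AT_400 = render_size(595, 842, 400)  # (3306, 4678)
A4_AT_600 = render_size(595, 842, 600)  # (4958, 7017)
```

So a single page at these settings is an image several thousand pixels on each side, which is worth knowing before uploading many of them.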
0
 
LVL 5

Author Comment

by:25112
thanks..
for Google Drive, and Google docs, all you need is a gmail account or more?
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
> for Google Drive, and Google docs, all you need is a gmail account or more?

No, you don't need a Gmail account. Any email account is fine. What you need is a Google Account, not Google Mail (Google Account works with a Google Mail account, but also works with any email account). You may create a Google Account (free!) here:
https://accounts.google.com/SignUp

Regards, Joe
0
 
LVL 5

Author Comment

by:25112
thanks to viki for the unconventional easy solution!

thanks to all who assisted..
0
 
LVL 5

Author Comment

by:25112
you had said:
>>It is not easy, rather ugly long, but I could not find a better method free:

What might be an alternative solution that would be simpler to use (for less tech-savvy people in other developing countries, to get the same end result)?
0
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
> thanks to all who assisted

You're welcome. Happy to help. Regards, Joe
0
 
LVL 20

Expert Comment

by:viki2000
I do not know yet what other software will work.
I tried ABBYY FineReader, but it does not support Tamil.
I am thinking about how you can accelerate/automate the tasks.
The main disadvantage of the proposed method is that it basically processes one page at a time.
If you have a PDF with one page, you get one picture.
If you have a PDF document with many pages, you need a program to convert all the pages into separate pictures, one picture per page. I used Nitro PDF for that. Then you have to upload the pictures. You may keep the original PDF and the resulting pictures in the same dedicated folder; at upload, press CTRL+A to select everything, then CTRL+click the PDF to deselect it. The pictures are then uploaded automatically, one after another.
But then comes the trouble of opening each one in Google Docs and saving it, one by one. That takes time; it is no longer a batch operation. This seems to be the bottleneck.
Once you have them back on your PC as .docx files, you can merge them all into a single one.
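
The "select everything except the PDF" step, and keeping the pages in the right order for the later merge, can be sketched in Python. The file names below are hypothetical, and the list of image extensions is my assumption about what the converter produces:

```python
import re
from pathlib import PurePath

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed converter output formats


def natural_key(name: str) -> list[tuple]:
    """Sort key that compares embedded numbers by value, so page2 < page10."""
    parts = re.split(r"(\d+)", name.lower())
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts if p]


def page_images(filenames: list[str]) -> list[str]:
    """Return only the image files, in page order, skipping the source PDF."""
    images = [f for f in filenames if PurePath(f).suffix.lower() in IMAGE_SUFFIXES]
    return sorted(images, key=natural_key)
```

For a folder listing such as `["book.pdf", "page10.jpg", "page2.jpg", "page1.jpg"]`, this returns the three pictures with page 2 ahead of page 10, which a plain alphabetical sort would get wrong.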
0
 
LVL 5

Author Comment

by:25112
thanks for your review with FineReader..

so at the moment we have only one sure (google) solution- for 1 page or 10, right? (for the language in question)
0
 
LVL 20

Expert Comment

by:viki2000
I guess so.
If I find any other method/program I will let you know.
0
 
LVL 5

Author Comment

by:25112
thank u indeed.
0
 
LVL 62

Expert Comment

by:☠ MASQ ☠
So in summary the solution was to capture an image and then use OCR :)
0
 
LVL 5

Author Comment

by:25112
good conclusion, MASQ.. but seems like the regular OCR we may have in common PCs was NOT up to par, and google seems to have enough tools to handle a lot..!!
0
