Solved

convert text/html to text/plain;charset=UTF8

Posted on 2003-10-22
Last Modified: 2007-12-19
Looking for a way to convert HTML files to text.

Does anyone know of software that does this? Everything I have found can only output ASCII.

I know a way to convert charsets with Java, so it might be enough for me to convert HTML to text while preserving the original charset.

Does anyone know?
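
For reference, the charset conversion mentioned in the question could look roughly like the sketch below. The class and method names (CharsetConverter, transcode, convertFile) are made up for illustration, and the sketch assumes the source charset of the input is already known; it does not try to detect it.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not taken from the thread: re-encodes text between charsets.
final class CharsetConverter {

    /** Decodes the bytes with the source charset and re-encodes them with the target charset. */
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    /** Reads a whole file, transcodes it, and writes the result to another file. */
    static void convertFile(Path in, Path out, Charset from, Charset to) throws IOException {
        Files.write(out, transcode(Files.readAllBytes(in), from, to));
    }
}
```

Characters that do not exist in the target charset (for example "ö" when the target is US-ASCII) are replaced with a substitute byte by getBytes, so the target should be something like UTF-8 if nothing is to be lost.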
Question by:mightyone
9 Comments
 
LVL 86

Expert Comment

by:CEHJ
ID: 9597266
There's no good easy way, but it's worth trying

line = line.replaceAll("<[^>]+>", "");

on each line of the file
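
A minimal sketch of how that replaceAll call might be applied to a whole file while keeping the text in UTF-8 end to end. The class name LineTagStripper and its methods are hypothetical, and the sketch assumes the input file really is UTF-8; any other charset would have to be passed to the read call instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper around the one-line regex from the comment above.
final class LineTagStripper {

    /** Removes everything that looks like a tag from a single line. */
    static String stripTags(String line) {
        return line.replaceAll("<[^>]+>", "");
    }

    /** Applies stripTags to every line and returns the results in order. */
    static List<String> stripAll(List<String> lines) {
        List<String> result = new ArrayList<>(lines.size());
        for (String line : lines) {
            result.add(stripTags(line));
        }
        return result;
    }

    /** Reads the input as UTF-8 (an assumption), strips tags, and writes UTF-8. */
    static void convert(Path htmlIn, Path textOut) throws IOException {
        List<String> lines = Files.readAllLines(htmlIn, StandardCharsets.UTF_8);
        Files.write(textOut, stripAll(lines), StandardCharsets.UTF_8);
    }
}
```

Because the text is decoded and encoded with an explicit charset, non-ASCII characters pass through unchanged. Working line by line does mean that a tag split across two lines is not matched.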
 
LVL 35

Expert Comment

by:TimYates
ID: 9597270
nice! :-)
 
LVL 7

Expert Comment

by:tomboshell
ID: 9597854
Ummm, CEHJ, won't you need the '/' character in that replace call as well?

As for the concerns about ASCII: "ASCII" is more or less used as a generic term here. The file could very well be UTF-8 or UTF-16, and if it is all written in English you won't notice the difference. Or, better said, it doesn't matter. It looks like you want the content without the markup, kept in the same character encoding. Take CEHJ's answer.

 
LVL 6

Author Comment

by:mightyone
ID: 9600816
Hmm, not quite what I need.

To tombo:
German, for example, has a few letters that are not in ASCII ("öäü"), and no non-Western script is in ASCII, so I will notice (I am German).

Just stripping the tags won't work either. I don't want font info as text or JavaScript as text; I just want the visible text.
So it is a bit more complicated, which is why I am looking for a library or tool.
I tested some tools, e.g. html2txt, but it only supports ASCII; same with several others.


Does anyone have any idea?
 
LVL 86

Expert Comment

by:CEHJ
ID: 9601363
>>ummmm, CEHJ wont you need the '/' character in that replace call also?

That's taken care of.

>>just stripping the tags wont work either, i need no font info as text

You won't get any font info. You're right though about the JavaScript - you'd have to have another regex for that.
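
One possible shape for that extra regex, as a rough sketch only: it works on the whole document as a single string rather than line by line, removes comments and script/style blocks first, and then strips the remaining tags. The class name HtmlTextExtractor and the patterns are assumptions for illustration. Regular expressions are not a full HTML parser, so unusual or malformed markup can still slip through.

```java
import java.util.regex.Pattern;

// Hypothetical helper: regex-based approximation of "visible text only".
final class HtmlTextExtractor {

    // DOTALL lets '.' match line breaks, so multi-line comments are covered.
    private static final Pattern COMMENTS =
            Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    // Matches <script>...</script> and <style>...</style> including their content.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("<(script|style)\\b[^>]*>.*?</\\1\\s*>",
                    Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // The original tag pattern; [^>] also matches line breaks.
    private static final Pattern TAGS = Pattern.compile("<[^>]+>");

    /** Returns the document with comments, script/style blocks and tags removed. */
    static String visibleText(String html) {
        String s = COMMENTS.matcher(html).replaceAll("");
        s = SCRIPT_OR_STYLE.matcher(s).replaceAll("");
        return TAGS.matcher(s).replaceAll("");
    }
}
```

The order matters: if the plain tag pattern ran first, the script and style tags would disappear but their content would be left behind as text.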
 
LVL 6

Author Comment

by:mightyone
ID: 9602631
E.g. write a short letter in Word, add some bold text, some larger text, some pictures, a table and a sound, and then save it as HTML.

Try stripping that and you'll find plenty of stuff no one wants (especially me...).


Still no further along. *sniff*
 
LVL 17

Accepted Solution

by:
paulop1975 earned 50 total points
ID: 9602785
Try this program.

http://www.convertzone.com/doc2txt/help.htm

Seems good enough for the job.
:)

Fui (portuguese for "gone")
c(^.^)o

pAul0|PIm3NTA
 

Expert Comment

by:gsergiu
ID: 11230789
line = line.replaceAll("<[^>]+>", "");

This is just the main idea of the solution.

In HTML you may also have comments that span more than one line:
<!--
this is a comment
-->

You can have tags that span several lines:

<img src="very long link" target="target"
title="image title"
/>

And JavaScript:
<script>
.... java script here
</script>


:))

If you want to implement this converter... have fun.
 

Expert Comment

by:gsergiu
ID: 11230802
If you read the input character by character, you can probably eliminate
< ........ > blocks,
but you cannot repair HTML errors (such as unclosed tags).
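
A small sketch of the character-by-character idea, assuming the whole document is available as one string. The class name CharTagFilter is made up for illustration. It keeps a single flag that records whether the current position is inside a < ... > block.

```java
// Hypothetical helper: drops everything between '<' and the next '>'.
final class CharTagFilter {

    static String dropTags(String html) {
        StringBuilder out = new StringBuilder(html.length());
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;            // start of a tag: stop copying
            } else if (c == '>' && inTag) {
                inTag = false;           // end of the tag: resume copying
            } else if (!inTag) {
                out.append(c);           // ordinary text (including a stray '>')
            }
        }
        return out.toString();
    }
}
```

Since line breaks are treated like any other character, tags that span several lines are handled. The limitation with unclosed tags shows up directly: after a '<' that is never closed, everything up to the end of the input is dropped.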

Question has a verified solution.
