Solved

convert text/html to text/plain;charset=UTF-8

Posted on 2003-10-22
Last Modified: 2007-12-19
I'm looking for a way to convert HTML files to text.

Does anyone know of software that does this? Everything I have found can only output ASCII.

I know a way to convert charsets with Java, so it might be enough for me to convert HTML to text while preserving the right charset.

Does anyone know?
Question by:mightyone
9 Comments
 
LVL 86

Expert Comment

by:CEHJ
ID: 9597266
There's no good easy way, but it's worth trying

line = line.replaceAll("<[^>]+>", "");

on each line of the file
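A minimal sketch of how that line-by-line approach might look while keeping non-ASCII letters intact. It assumes the input file is UTF-8 (the thread does not say which encoding the files use), and the class and method names are made up for illustration. Tags that span several lines are not handled by this version.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of any library.
class LineTagStripper {

    // Removes everything that looks like <...> within a single line.
    static String stripTags(String line) {
        return line.replaceAll("<[^>]+>", "");
    }

    // Reads and writes with an explicit charset (assumed UTF-8 here)
    // so that letters outside ASCII survive the round trip.
    static void convert(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(stripTags(line));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LineTagStripper.java input.html output.txt
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

If the source files use another encoding, the charset passed to the reader would need to change accordingly; the writer can stay on UTF-8.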
 
LVL 35

Expert Comment

by:TimYates
ID: 9597270
nice! :-)
 
LVL 7

Expert Comment

by:tomboshell
ID: 9597854
Ummm, CEHJ, won't you need the '/' character in that replace call as well?

As for the concerns about ASCII: "ASCII" is more or less used as a generic term, since the output could very well be UTF-8 or UTF-16, and if it is all written in English you won't notice the difference. Or better said, it doesn't matter. It looks like you want the content without the markup, kept in the same character encoding. Take CEHJ's answer.
LVL 6

Author Comment

by:mightyone
ID: 9600816
Hmm, not quite what I need.

To tombo:
German, for example, has a few letters that are not in ASCII ("öäü"), and no non-Western script is in ASCII, so I will notice (I am German).

Just stripping the tags won't work either. I don't want font info as text or JavaScript as text; I just want the visible text.
So it is a bit more complicated, which is why I am looking for a library or tool.
I tested some tools, e.g. html2txt, but it only supports ASCII, and the same goes for several others.


Does anyone have any idea?
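To illustrate the encoding point, here is a small sketch that checks whether a string fits into ASCII and encodes it as UTF-8 instead. The class and method names are hypothetical; the umlauts are written as Unicode escapes so the example does not depend on the encoding of the source file itself.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class CharsetCheck {

    // True only if every character of the text can be represented in US-ASCII.
    static boolean fitsInAscii(String text) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    // Encodes the text as UTF-8, which can represent any Unicode character.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String umlauts = "\u00f6\u00e4\u00fc"; // the letters ö, ä, ü
        System.out.println(fitsInAscii(umlauts));
        System.out.println(toUtf8(umlauts).length);
    }
}
```

A tool that only writes ASCII has no byte sequence for these letters, which is why the output charset matters for German text.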
 
LVL 86

Expert Comment

by:CEHJ
ID: 9601363
>>ummmm, CEHJ wont you need the '/' character in that replace call also?

That's taken care of.

>>just stripping the tags wont work either, i need no font info as text

You won't get any font info. You're right though about the JavaScript - you'd have to have another regex for that.
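A rough sketch of both points, with made-up class and field names. The character class `[^>]` matches '/' like any other character, so closing tags such as `</p>` are already covered. A second pattern removes script blocks together with their content before the tags are stripped. Treating style blocks the same way is an assumption added here, not something stated in the thread.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not a library API.
class ScriptAwareStripper {

    // (?is): case-insensitive, and '.' also matches line breaks, so blocks
    // spanning several lines are matched. \1 refers back to the tag name.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");

    // Matches opening and closing tags alike, because [^>] also matches '/'.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static String strip(String html) {
        // Drop script/style blocks first, otherwise their content would
        // survive as plain text once the surrounding tags are removed.
        String withoutBlocks = SCRIPT_OR_STYLE.matcher(html).replaceAll("");
        return TAG.matcher(withoutBlocks).replaceAll("");
    }
}
```

This only works if the whole document, or at least the whole script block, is in one string; applied line by line it would miss blocks that span lines.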
 
LVL 6

Author Comment

by:mightyone
ID: 9602631
For example, write a short letter in Word, add some bold text, some larger text, a few pictures, a table, and a sound, and then save it as HTML.

Try stripping that and you'll find plenty of stuff no one wants (especially me...).


Still no further along, *sniff*.
 
LVL 17

Accepted Solution

by:
paulop1975 earned 50 total points
ID: 9602785
Try this program.

http://www.convertzone.com/doc2txt/help.htm

Seems good enough for the job.
:)

Fui (portuguese for "gone")
c(^.^)o

pAul0|PIm3NTA
 

Expert Comment

by:gsergiu
ID: 11230789
line = line.replaceAll("<[^>]+>", "");

This is just the main idea of the solution.

In HTML you may also have comments that span more than one line:
<!--
this is a comment
-->

You can have tags split across several lines:

<img src="very long link" target="target"
title="image title"
/>

And JavaScript:
<script>
.... JavaScript here
</script>


:))

If you want to implement this converter .... have fun
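One possible way to cope with multi-line comments and multi-line tags is to run the patterns over the whole document as a single string instead of line by line. This is only a sketch with hypothetical names. It relies on the fact that a negated character class such as `[^>]` also matches line breaks, and it removes comments first because a comment may itself contain '>'. Script blocks would still need their own pattern.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class WholeDocumentStripper {

    // (?s) lets '.' match line breaks; .*? stops at the first "-->".
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");

    // [^>] matches line breaks too, so a tag split across lines is one match.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static String strip(String html) {
        String withoutComments = COMMENT.matcher(html).replaceAll("");
        return TAG.matcher(withoutComments).replaceAll("");
    }
}
```

The whole file has to be read into memory for this, which should be acceptable for typical HTML pages.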
 

Expert Comment

by:gsergiu
ID: 11230802
Probably, if you read char by char, you can eliminate the
< ........ > blocks,
but you cannot repair HTML errors such as unclosed tags.
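The char-by-char idea could be sketched like this, with a hypothetical class name. A single flag tracks whether the scanner is currently inside a `<...>` block. It also shows the limitation: if a '<' is never closed, everything after it is dropped.

```java
// Hypothetical helper class for illustration.
class CharScanner {

    static String stripTags(String html) {
        StringBuilder out = new StringBuilder(html.length());
        boolean insideTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                insideTag = true;          // start skipping
            } else if (c == '>' && insideTag) {
                insideTag = false;         // end of the skipped block
            } else if (!insideTag) {
                out.append(c);             // visible text, kept as-is
            }
        }
        return out.toString();
    }
}
```

Because it works on Java chars rather than bytes, the character encoding only matters when the input is read and the output is written.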