PHP and HTML format errors

Posted on 2010-01-03
Last Modified: 2012-05-08

I have a PHP script reading from a MySQL database where teachers have entered narrative comments for students in association with their report card.

I made the mistake, perhaps, of allowing the PHP scripts to accept rich text format. That way, teachers could copy and paste tables of grades, entries from essays in Microsoft Word, etc. They wanted the formatting preserved.

When viewed in the editor, it looks fine. In the MySQL database, it is piled up with tags.

When I parse it using a PHP script to output to the screen for parents, I run into bizarre formatting errors.

I get the following character where there should be blank spaces or apostrophes: ý

Here is an example:

The output on one line looks like: "ýýýýýýýýýýý Fredýs writing has become smoother since August"

The HTML source code after it has been parsed to the HTML site looks like:
<p class="MsoNormal"><span lang="EN-GB"><span style="mso-tab-count:1">ýýýýýýýýýýý </span>Fredýs
writing has become smoother since August..<span style="mso-spacerun: yes">ý </span>

The MySQL as it appears in phpMyAdmin looks like:
<p class="MsoNormal"><span lang="EN-GB"><span style="mso-tab-count:1">            </span>Freds
writing has become smoother since August, and she organizes her thoughts much
better now than then.<span style="mso-spacerun: yes">  </span>


It looks like a little box in IE and a question mark in a diamond in Firefox. Ironically, it looks perfect in Google Chrome. I don't want to tell all the parents that they need to download Google Chrome in order to make it look good, though. There would be angry parents.

Can I add something to the PHP code so that it parses all of the formatting for spaces and apostrophes without the weird characters?

Question by: jkeagle13

    Expert Comment

    I've seen this happen... and it was a pain figuring out what it was... but now I know!  Here was my response to the problem:

    This is happening because you're probably writing out your text in Microsoft Word, then copy/pasting it into the rich-text editor.

    When you copy text from Microsoft Word, Word adds a bunch of invisible extra junk.  To avoid any future similar issues, I have 3 suggestions (in order of efficiency).

      1. Type the comments directly into the textbox.
      2. Type the comments in Notepad (Start -> All Programs -> Accessories -> Notepad). Then copy/paste from there.
      3. You could also type the text in Word, then copy/paste to Notepad, then copy/paste to the textarea, but that would mean an extra step.

    If you're using TinyMCE, you can add a "Paste from Word" button to the toolbar.  After running a few tests, I've found that it's not as efficient as I'd hoped.  It will reduce the junk, but I can't guarantee that it will remove all of it.

    I would assume/hope that other rich-text editors have something similar.
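
    For comments that have already been pasted from Word, a server-side pass is one possible complement to the suggestions above. The sketch below assumes that the "invisible extra junk" includes non-breaking spaces and curly quotes, and that the stored text is UTF-8; the thread does not confirm either point. The function name `normalizeWordCharacters` and the character list are illustrative only, and characters stored as HTML entities (such as `&nbsp;`) would not be touched by it.

```php
<?php
declare(strict_types=1);

/**
 * Replace typographic characters that Word commonly inserts with plain
 * ASCII equivalents. Assumes $html is valid UTF-8 (an assumption, not
 * something the thread confirms).
 */
function normalizeWordCharacters(string $html): string
{
    // Hypothetical, non-exhaustive mapping chosen for illustration.
    $map = [
        "\u{00A0}" => ' ',   // non-breaking space
        "\u{2018}" => "'",   // left single quotation mark
        "\u{2019}" => "'",   // right single quotation mark (apostrophe)
        "\u{201C}" => '"',   // left double quotation mark
        "\u{201D}" => '"',   // right double quotation mark
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{2026}" => '...', // horizontal ellipsis
    ];

    // strtr() with an array replaces each key once and never rescans
    // text it has already replaced.
    return strtr($html, $map);
}

echo normalizeWordCharacters("Fred\u{2019}s\u{00A0}writing"), PHP_EOL;
// prints: Fred's writing
```

    This trades Word's typography for plain characters, so it only makes sense if losing the curly quotes is acceptable.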

    Author Comment


    Thanks for the advice. I am slowly learning!

    The problem is that I now have ~ 2000 records that have been copied and pasted from Word and tagged up in the MySQL. I need to have them in readable format by midnight for public access.

    I would think there should be some way to parse it without the formatting problems. The bizarre part is that Google Chrome reads it fine, no extra characters!

    Any ideas?

    Accepted Solution

    You're going to need some HTML Purifier magic!

    HTML Purifier is a free HTML-cleaner-upper PHP library.  You can write a little script that goes through your 2,000 records, passes the text through the purifier, and updates each record with the clean HTML.

    It's easy to use and there's a lot of documentation on their site.

    You should be able to pull it off before midnight!
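
    The script described above could be sketched roughly as follows. This is a minimal, untested outline, not a drop-in solution: the table name `comments`, the columns `id` and `body`, and both function names are hypothetical, and the `Core.Encoding` value is an assumption that should match however the data is actually stored. `HTMLPurifier_Config::createDefault()` and `HTMLPurifier::purify()` are the library's standard entry points; the library is assumed to be installed and autoloadable.

```php
<?php
declare(strict_types=1);

/**
 * Run every stored comment through a cleaning callback and return only
 * the rows whose HTML actually changed.
 *
 * @param array    $rows  record id => raw HTML (hypothetical shape)
 * @param callable $clean function(string): string
 * @return array          record id => cleaned HTML
 */
function cleanComments(array $rows, callable $clean): array
{
    $changed = [];
    foreach ($rows as $id => $html) {
        $cleaned = $clean($html);
        if ($cleaned !== $html) {
            $changed[$id] = $cleaned;
        }
    }
    return $changed;
}

/**
 * Purify every comment in a hypothetical `comments` table and write the
 * cleaned HTML back. Returns the number of records updated.
 */
function purifyAllComments(PDO $pdo): int
{
    $config = HTMLPurifier_Config::createDefault();
    $config->set('Core.Encoding', 'UTF-8'); // assumption: data is UTF-8
    $purifier = new HTMLPurifier($config);

    // FETCH_KEY_PAIR yields an array of id => body.
    $rows = $pdo->query('SELECT id, body FROM comments')
                ->fetchAll(PDO::FETCH_KEY_PAIR);

    $changed = cleanComments($rows, [$purifier, 'purify']);

    $update = $pdo->prepare('UPDATE comments SET body = :body WHERE id = :id');
    foreach ($changed as $id => $body) {
        $update->execute([':body' => $body, ':id' => $id]);
    }
    return count($changed);
}
```

    Because the update overwrites the original markup, it would be prudent to run this against a copy of the table first and spot-check a few records in IE and Firefox before touching the live data.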
