
PHP and removing whitespaces

dogsareit asked
I am extracting data from a PDF and so far it has been going fairly well. I am working on localhost (Windows 10, WampServer) with PHP 5.6.25.
I am using PdfParser, which I installed successfully with Composer. I am trying to explode a line of data on whitespace and it simply will not do it. I have done this before without a problem. I know from experience that what I see in I.E.'s source view may not be what is really there. The gaps appear to be whitespace, so I decided to count the whitespace characters in order to build a delimiter of the correct length for exploding, and the count comes back as zero. I have tried to replace the whitespace with another character (*) and that does not work either. I have even tried using chr(32) to represent the whitespace.
Therefore, even though it "looks" like whitespace, maybe it really isn't. But I have no clue as to what it is or how to resolve it.
Can someone guide/educate me about what to try next or how to resolve this?
I have other lines of data in this project that I will need to explode on whitespace.
Below is the code snippet. I have also attached a snapshot of what it looks like in the source pane of I.E.
	if ($strtype == 'account') {
		// Look for the account number between 'AccountNumber' and 'USD'
		$wrkBegChar = 'AccountNumber';
		$wrkEndChar = 'USD';
		$strBegPos = stripos($strSearchData, $wrkBegChar, 1);
		$strEndPos = stripos($strSearchData, $wrkEndChar, $strBegPos);
		$strLen = (($strEndPos + 6) - ($strBegPos + 17));
		$strWork = trim(substr($strSearchData, ($strBegPos + 17), $strLen));
		echo '<BR><BR>$strWork...' . $strWork . '<BR><BR>';

		// Count the number of whitespaces (ordinary spaces, chr(32))
		$wrkBlanks = '';
		$wrkCount = substr_count($strWork, ' ');
		echo "nbr of blanks.. " . $wrkCount;

		// Build a delimiter made of spaces to explode on
		for ($x = 0; $x <= $wrkCount; $x++) {
			$wrkBlanks = ($wrkBlanks . ' ');
		}

		$wrkArray = explode($wrkBlanks, $strWork);
		$wrkCnt = count($wrkArray);
		echo '<BR>Array Count ' . $wrkCnt;
		foreach ($wrkArray as $key => $value) {
			echo 'Line values   ' . $key . " has the value " . $value . '<BR><BR>';
		}
		//$stripData = trim($wrkArray[1]);
	}


Screenshot_1.png

Most Valuable Expert 2018
Distinguished Expert 2019
Commented:
You could try using a regex. There is a character class for whitespace that might do the trick: \s

$words= preg_split("@[\s+ ]@u", trim($strWork));



** Copy and paste that code rather than retyping it, because there's a multi-byte whitespace character in the pattern.
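
If the pasted multi-byte character gets lost along the way, the same idea can be written with explicit escapes instead. This is only a minimal sketch, assuming the invisible character is a UTF-8 non-breaking space (U+00A0, bytes C2 A0), which is a common culprit in PDF-extracted text but is not confirmed for this file. The helper name splitOnAnyWhitespace, the \p{Z} class, and the sample string are made up for illustration and are not part of the original suggestion.

```php
<?php
// Hypothetical helper: split on runs of ASCII whitespace (\s) or
// Unicode separator characters (\p{Z}, which includes U+00A0).
function splitOnAnyWhitespace($text)
{
    // The "u" modifier makes PCRE treat the pattern and subject as UTF-8.
    $parts = preg_split('/[\s\p{Z}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on failure, e.g. when $text is not valid UTF-8.
    return $parts === false ? array() : $parts;
}

// Made-up sample: the fields are separated by non-breaking spaces (C2 A0),
// so counting ordinary spaces finds none.
$sample = "12345678\xC2\xA0\xC2\xA0SAVINGS\xC2\xA0USD";
echo substr_count($sample, ' '), "\n";
print_r(splitOnAnyWhitespace($sample));
```

Writing the bytes as "\xC2\xA0" keeps the example valid on PHP 5.6 and makes the invisible character visible in the source, so nothing depends on the editor preserving it.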

The easiest way to check is to copy the string from your IE source pane and paste it into a proper text editor, something like Notepad++, then switch the Encoding to something other than UTF-8 (ANSI, for example). You'll probably see the characters change from spaces to weird-looking ones.
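
The same check can also be done from PHP itself by dumping the bytes of the string. A minimal sketch, where hexBytes is a hypothetical helper name and the sample string is made up; an ordinary space shows up as 20, while a UTF-8 non-breaking space (assumed here purely as an example) shows up as c2 a0.

```php
<?php
// Hypothetical helper: list each byte of a string as two hex digits.
function hexBytes($text)
{
    // bin2hex gives one long hex string; chunk_split adds a space after
    // every two digits, and trim drops the trailing one.
    return trim(chunk_split(bin2hex($text), 2, ' '));
}

// Made-up sample with a non-breaking space between the two fields.
echo hexBytes("AB\xC2\xA0CD"), "\n";
```

Running this against $strWork would show exactly which bytes sit between the fields, and those bytes can then be targeted directly.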
Most Valuable Expert 2018
Distinguished Expert 2019

Commented:
Another option you might want to try is to convert the string to ANSI before you strip out the characters. Be aware though that doing this could mean data loss as ANSI won't represent all characters.
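
As a rough illustration of that option and of the data loss it can cause, here is a minimal sketch. It assumes the mbstring extension is loaded, that the text is UTF-8, and that "ANSI" means Windows-1252; the helper name toAnsiAndNormalizeSpaces and the sample strings are made up.

```php
<?php
// Hypothetical helper: convert UTF-8 text to Windows-1252 ("ANSI") and
// turn non-breaking spaces into ordinary spaces.
function toAnsiAndNormalizeSpaces($utf8Text)
{
    // Characters with no Windows-1252 equivalent are replaced by
    // mbstring's substitute character, so information is lost here.
    $ansi = mb_convert_encoding($utf8Text, 'Windows-1252', 'UTF-8');

    // In Windows-1252 the non-breaking space is the single byte 0xA0.
    return str_replace("\xA0", ' ', $ansi);
}

// Made-up samples: a non-breaking space converts cleanly, but the Greek
// letter omega (U+03A9) has no Windows-1252 equivalent.
echo toAnsiAndNormalizeSpaces("A\xC2\xA0B"), "\n";
echo bin2hex(toAnsiAndNormalizeSpaces("\xCE\xA9")), "\n";
```

After the conversion an ordinary explode(' ', ...) works again, but the result is no longer UTF-8, which is why a regex on the original string is usually the safer route.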

Author

Commented:
Again, I appreciate your help and you sharing your knowledge!

Author

Commented:
Thank you once again! Works great! Just for laughs, I did convert the string to ANSI and that caused data loss, but it was just to see. Years ago, I used a similar pattern for removing characters that "looked" like whitespace but really weren't, but shoot, I couldn't remember how I structured it. I just remember creating it and how pleased I was with myself. Ha ha! LOL!