jbclelland
Finding potential defines in PHP source code.

We have PHP source code that contains all the usual stuff - for example mixes of HTML, Javascript, CSS etc.  The challenge is this:
We have thousands of files, not all of which contain the necessary defines for the different languages that our product needs to support.  For example, we have PHP strings that contain language that should really be in a definition so that it is configurable:
"This is a test string"
should be:
define('TEST_STRING', 'This is a test string');

We would like an accurate method of reading our PHP source code and applying the necessary filtering / regular expressions to it so that we can identify strings like the example above, or maybe even strings contained within JavaScript etc.  I know there are a number of ways of building textual output in a typical PHP application, and handling all of them is the main challenge in writing a script to find the strings.  I am sure this is something that a lot of application developers could benefit from, having left the language definitions of an application to the last minute rather than dealing with them at the start of the application's life!

Typical problems include dealing with the start and end of the PHP tags, i.e. <?php and <?=, so that the correct filtering can be used to detect English strings.

The end result of such a script would be to highlight all of the areas where English text is used, so that these areas could be visited directly and the programmers could make the necessary modifications to DEFINE the language, so that correct language files can be built and the language of the product can be changed.

Not being an expert with regular expressions to start with, I feel that this is a topic much better suited to a site like Experts Exchange, rather than struggling to find the solution ourselves.

Thanks and Good Luck
ddrudik

I think it's clear what you want to do, it's just not clear to me what a match would look like in your PHP source code.  Please cut-and-paste a section of code with examples of text that you want to match.
You mean something like this?

<html>
<span><p>Hello World</p></span>
</html>

--- to become: ---

<html>
<?php
  define('ENG_HELLO', 'Hello World');
?>
<span><p><?=ENG_HELLO?></p></span>
</html>
jbclelland

ASKER

Yes this is what I mean - although the defines will all be held in a different PHP file - specific for the language chosen for that user.  

I don't expect any solution to be an automatic process that replaces strings with defines. All I expect is a script to locate the lines of source code where a define could be put in place of a string contained in the source code, whether that string is held in the JavaScript, HTML or PHP sections of a script.

I will post some code examples in a few minutes.
For example, here is some example code before and after the changes have been made.  I need a script to help me locate the lines of source code (before the defines are added) in the thousands of files that we have, so that we can add the appropriate define in its place and then actually save the definition in a separate language file.

BEFORE:
function head_extra()
{
      ?>
      <script language="javascript" type="text/javascript">
      <!---
            calPopup.offsetY = -150;
            calPopup.offsetX = -50;
            
            alert('This is a simple alert that should be changed to a define!');
            
            function buildString(str)
            {
                  return str += ' should be defined';
            }
            
            var endString = buildString('This');
            
      // --->
      </script>
      <?php
}


$form = new ooform('stage_form', 'risk_assessments', 'POST');
$form->primary_key = 'risk_id';
$form->display_submit = false;
$form->display_tabs = false;
$form->add_hidden('direction','');
$form->add_foreign_key('site_id', $_SESSION['site_id']);
$form->width = "700px";

$obj =& $form->add_panel('stage1');
$obj->left_col_width = "200px";

$obj =& $form->add_element(FORM_FIELD_SET_OPEN, '1', 'Details');

$obj =& $form->add_element(FORM_TEXTAREA, 'risk_description', 'Description', 'stage1');
$obj->textarea_rows = 8;
$obj->colspan = 4;
$obj->width = '450px';



AFTER:


function head_extra()
{
      ?>
      <script language="javascript" type="text/javascript">
      <!---
            calPopup.offsetY = -150;
            calPopup.offsetX = -50;
            
            alert('<?=DEFINE_ALERT_MSG?>');
            
            function buildString(str)
            {
                  return str += ' <?=DEFINE_SHOULD_BE_DEFINED?>';
            }
            
            var endString = buildString('<?=DEFINE_THIS?>');
            
      // --->
      </script>
      <?php
}


$form = new ooform('stage_form', 'risk_assessments', 'POST');
$form->primary_key = 'risk_id';
$form->display_submit = false;
$form->display_tabs = false;
$form->add_hidden('direction','');
$form->add_foreign_key('site_id', $_SESSION['site_id']);
$form->width = "700px";

$obj =& $form->add_panel('stage1');
$obj->left_col_width = "200px";

$obj =& $form->add_element(FORM_FIELD_SET_OPEN, '1', DEFINE_DETAILS_FIELD_NAME);

$obj =& $form->add_element(FORM_TEXTAREA, 'risk_description', DEFINE_DESCRIPTION_TAB, 'stage1');
$obj->textarea_rows = 8;
$obj->colspan = 4;
$obj->width = '450px';
Notice that other strings are included in the script that could be picked up by the search function (regular expression) we are trying to create.  This is OK, as the developers will know after the strings are highlighted that these are not meant to be turned into a define. They may be something like a field name, which stays the same because the database does not change with the language.  However, other strings do need to be defined.  It will always be a manual process, but having a script to highlight the different strings (which may be contained in PHP, HTML or JavaScript code blocks), so that the developers can navigate directly to the correct line of code, will save hours and hours of trawling through code manually to try and find them.  Also, it is less likely that one will be missed.
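As a rough illustration of the "highlight the line" idea for the PHP sections only, here is a minimal sketch that uses PHP's built-in tokenizer (`token_get_all`) instead of a regular expression. The tokenizer reports each quoted literal together with its line number. The function name `find_php_string_literals` and the sample source are hypothetical, and the sketch assumes PHP 7.1 or later with the tokenizer extension enabled.

```php
<?php
// Hypothetical helper: list quoted string literals in PHP source with line numbers.
function find_php_string_literals(string $source): array
{
    $found = [];
    foreach (token_get_all($source) as $token) {
        // Single-character tokens such as ';' come back as plain strings; skip them.
        if (!is_array($token)) {
            continue;
        }
        [$id, $text, $line] = $token;
        // T_CONSTANT_ENCAPSED_STRING is a quoted literal with no interpolated variables.
        if ($id === T_CONSTANT_ENCAPSED_STRING) {
            $found[] = ['line' => $line, 'text' => $text];
        }
    }
    return $found;
}

// Sample input, taken from the form-building lines in the example above.
$sample = <<<'SRC'
<?php
$form->add_hidden('direction', '');
$obj =& $form->add_element(FORM_FIELD_SET_OPEN, '1', 'Details');
SRC;

foreach (find_php_string_literals($sample) as $hit) {
    echo $hit['line'], ': ', $hit['text'], "\n";
}
```

This will also report field names, empty strings and similar literals, which matches the expectation that a developer reviews each hit manually. Double-quoted strings that contain variables are split by the tokenizer into other token types (for example `T_ENCAPSED_AND_WHITESPACE`), so a fuller version would need to collect those as well.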
So you want to take JavaScript code, and list all the strings which appear in it? Do I understand correctly?
No.  It is not just JavaScript code.  It may be JavaScript code embedded in a PHP script, but I also want to find the strings in the PHP, and also any HTML that may be embedded in the PHP script.  Because a PHP script can break out of PHP and into HTML, you have the ability to write PHP, HTML and JavaScript in the same PHP file, and I need to find all the possible strings so that I can see if I need to define them or not in a separate language file.
So any immediate string values inside javascript code, any immediate string values in php code, and any clear text between html tags?
Yes, that is pretty much it.  One special case is the following PHP tag:  <?=
Inside this tag could be one line of code as in this example:

<?=($variable ? 'Variable set' : 'Variable Not set')?>

for example could be changed to (after finding the strings):

<?=($variable ? DEFINE_VARIABLE_SET: DEFINE_VARIABLE_NOT_SET)?>

However, there may be just a single constant inside it, as in this example:

<?=DEFINE_THIS_IS_A_DEFINE?>

I am sure there are a number of different cases where strings could be found.  I suppose this is really what this question is about - how to find them!
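For the `<?=` case specifically, a small sketch may help. PHP's tokenizer marks the short echo opening tag as `T_OPEN_TAG_WITH_ECHO`, so literals inside it can be picked out without any special regular expression. The helper name `literals_in_short_echo` is hypothetical. The sketch assumes PHP 7.1 or later; on versions before 5.4 the `<?=` tag was only recognised when `short_open_tag` was enabled.

```php
<?php
// Hypothetical helper: return the quoted literals found inside short echo tags.
function literals_in_short_echo(string $source): array
{
    $inside = false;
    $found = [];
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            continue; // punctuation such as '(' or '?' arrives as a plain string
        }
        if ($token[0] === T_OPEN_TAG_WITH_ECHO) {
            $inside = true;
        } elseif ($token[0] === T_CLOSE_TAG) {
            $inside = false;
        } elseif ($inside && $token[0] === T_CONSTANT_ENCAPSED_STRING) {
            $found[] = $token[1];
        }
    }
    return $found;
}

// The two cases from the post: literals still present, and already converted.
$withLiterals = "<?=(\$variable ? 'Variable set' : 'Variable Not set')?>";
$alreadyDone  = "<?=DEFINE_THIS_IS_A_DEFINE?>";

print_r(literals_in_short_echo($withLiterals));
print_r(literals_in_short_echo($alreadyDone));
```

The first call should list the two quoted literals, while the second should return an empty array, because a bare constant name is tokenized as `T_STRING` rather than as a quoted literal.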
Anyone had any progress on this?
Although this is a solution for localization - if you have not used it from the start like us, it is a lot of work to move over to this method.  It does not solve our problem - as we are trying to identify the areas where we have strings that need to be translated in the first place.
jbclelland, more parameters might be required in order to solve this.  For example, how do you propose that the last quoted string in this call be changed:
$obj =& $form->add_element(FORM_FIELD_SET_OPEN, '1', 'Details');
to:
$obj =& $form->add_element(FORM_FIELD_SET_OPEN, '1', DEFINE_DETAILS_FIELD_NAME);

It seems that your replacements require the regex to somehow "reason" beyond simply recognizing patterns.
All I require is a script that locates the possible defines for me - so, as above, the 'Details' word would be located as possibly needing to be changed to a define.  We would then manually go through the scripts and make the necessary decision to change the word to a define if appropriate.  

The key to this question is having a good enough regex to accurately search the source code for the matches.  This may have to be done through a number of steps of filters, although not knowing much about regex myself, I don't know if this can be done with one expression.

I hope this makes sense to you.  Having this functionality would save us so much time. Remember though, our source files are PHP, which have JavaScript, CSS, HTML and PHP inside them, either inside or outside of <?php tags, which means the text could be inside or outside of quotes.  Zend Studio or Macromedia accurately parse the source to colour-code it. Maybe this is a clue?
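Following the "number of steps of filters" idea, here is a hedged sketch for the parts outside the PHP tags. It lets the tokenizer isolate the inline HTML segments (`T_INLINE_HTML`), then applies two naive regular-expression passes: quoted literals inside script blocks, and visible text between tags. The names `JS_STRING_PATTERN` and `find_inline_candidates` are hypothetical, and the sketch assumes PHP 7.4 or later. It is not a real HTML or JavaScript parser, so expect false positives and misses.

```php
<?php
// Naive pattern for a single- or double-quoted literal with backslash escapes.
const JS_STRING_PATTERN = <<<'RE'
~'(?:\\.|[^'\\])*'|"(?:\\.|[^"\\])*"~s
RE;

// Hypothetical helper: report candidate strings found outside the PHP tags.
function find_inline_candidates(string $source): array
{
    $found = [];
    foreach (token_get_all($source) as $token) {
        if (!is_array($token) || $token[0] !== T_INLINE_HTML) {
            continue;
        }
        [, $html, $startLine] = $token;
        // Converts a byte offset within this segment into a file line number.
        $lineAt = fn (string $text, int $offset) => $startLine + substr_count($text, "\n", 0, $offset);

        // Step 1: quoted literals inside script blocks.
        preg_match_all('~<script\b[^>]*>(.*?)</script>~is', $html, $scripts, PREG_OFFSET_CAPTURE);
        foreach ($scripts[1] as [$js, $jsOffset]) {
            preg_match_all(JS_STRING_PATTERN, $js, $literals, PREG_OFFSET_CAPTURE);
            foreach ($literals[0] as [$literal, $offset]) {
                $found[] = ['line' => $lineAt($html, $jsOffset + $offset), 'kind' => 'js', 'text' => $literal];
            }
        }

        // Step 2: blank out script and style blocks (keeping their newlines),
        // then collect any non-blank text that sits between two tags.
        $plain = preg_replace_callback(
            '~<(script|style)\b[^>]*>.*?</\1>~is',
            fn (array $m) => str_repeat("\n", substr_count($m[0], "\n")),
            $html
        );
        preg_match_all('~>([^<>]*\S[^<>]*)<~', $plain, $texts, PREG_OFFSET_CAPTURE);
        foreach ($texts[1] as [$text, $offset]) {
            $lead = strlen($text) - strlen(ltrim($text)); // skip leading whitespace
            $found[] = ['line' => $lineAt($plain, $offset + $lead), 'kind' => 'html', 'text' => trim($text)];
        }
    }
    usort($found, fn ($a, $b) => $a['line'] <=> $b['line']);
    return $found;
}

$sample = "<p>Hello World</p>\n<script>\nalert('Hi there');\n</script>\n<?php echo 'skip'; ?>\n";
foreach (find_inline_candidates($sample) as $hit) {
    echo $hit['line'], ' [', $hit['kind'], '] ', $hit['text'], "\n";
}
```

Known gaps in this sketch: a script block that is interrupted by a PHP tag is split across several segments and will not be matched by step 1, apostrophes inside JavaScript comments can confuse the literal pattern, and text at the very start or end of a segment (not enclosed by two tags) is not reported.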
ASKER CERTIFIED SOLUTION
ddrudik

This solution is only available to members.