We have PHP source code that contains the usual mix of PHP, HTML, JavaScript, CSS and so on. The challenge is this:
We have thousands of files, not all of which contain the necessary defines for the different languages that our product needs to support. For example, we have PHP strings that contain language that should really be in a definition so that it is configurable:
"This is a test string"
should be:
define('TEST_STRING', 'This is a test string');
We would like an accurate method of reading our PHP source code and applying the necessary filtering / regular expressions to it so that we can identify strings like the example above, or maybe even strings contained within JavaScript etc. I know there are a number of ways of building textual output in a typical PHP application, and handling all of them is the bulk of the challenge in writing a script to find these strings. I am sure this is something that a lot of application developers could benefit from, having left the language definitions of an application to the last minute and not having dealt with them at the start of the application's life!
A typical problem is dealing with where PHP code starts and ends, i.e. the opening tags <?php and <?= and the closing tag ?>, so that the correct filtering can be applied to detect English strings.
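One possible way around the tag problem, offered only as a sketch, is to let PHP's own tokenizer (token_get_all) decide what is PHP and what is inline HTML instead of matching the tags with regular expressions. The function name findHardcodedText, the result layout and the "three or more letters" filter below are assumptions made for illustration, not an established tool.

```php
<?php
// Sketch only: findHardcodedText() is a hypothetical helper, not an existing API.
// It relies on PHP's built-in tokenizer extension (token_get_all).

/**
 * Return candidate user-facing text found in one PHP source string.
 * Each result is ['line' => int, 'kind' => string, 'text' => string].
 */
function findHardcodedText(string $source): array
{
    $kinds = [
        T_INLINE_HTML              => 'html',   // everything outside the PHP tags
        T_CONSTANT_ENCAPSED_STRING => 'string', // 'foo' or "foo" without variables
        T_ENCAPSED_AND_WHITESPACE  => 'string', // literal parts of "foo $bar" or a heredoc
    ];

    $found = [];
    foreach (token_get_all($source) as $token) {
        // Single-character tokens are plain strings; skip them and all non-text tokens.
        if (!is_array($token) || !isset($kinds[$token[0]])) {
            continue;
        }
        [$id, $text, $line] = $token;
        if ($id === T_CONSTANT_ENCAPSED_STRING) {
            $text = substr($text, 1, -1); // drop the surrounding quotes
        } elseif ($id === T_INLINE_HTML) {
            $text = strip_tags($text);    // keep only the text between HTML tags
        }
        $text = trim(preg_replace('/\s+/', ' ', $text));
        // Assumed heuristic: keep text containing a run of three or more letters.
        if (preg_match('/[A-Za-z]{3,}/', $text)) {
            $found[] = ['line' => $line, 'kind' => $kinds[$id], 'text' => $text];
        }
    }
    return $found;
}
```

Because the tokenizer tracks the opening and closing tags itself, the script never has to match them by hand. The heuristic will still flag things that are not user-facing language (SQL fragments, longer array keys, CSS or JavaScript inside inline HTML), so the output is a list of candidates to review rather than a definitive answer.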
The end result of such a script would be to highlight all of the areas where English text is used, so that the programmers could go straight to those areas and move the text into defines. Correct language files could then be built and the language of the product changed.
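To make that end result concrete, here is a rough sketch of what such a report could look like, assuming one "file:line: text" entry per hit is enough to visit each spot directly. The function name reportHardcodedStrings and the letter-run filter are hypothetical; for brevity it looks only at plain quoted literals in .php files.

```php
<?php
// Sketch only: reportHardcodedStrings() is a hypothetical helper.
// For brevity it flags only plain quoted literals (T_CONSTANT_ENCAPSED_STRING).

/** Return "file:line: text" entries for every .php file under $root. */
function reportHardcodedStrings(string $root): array
{
    $report = [];
    // Walk the directory tree; the default iteration mode yields files only.
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->getExtension() !== 'php') {
            continue; // assumption: only files ending in .php are scanned
        }
        $path = $file->getPathname();
        foreach (token_get_all(file_get_contents($path)) as $token) {
            if (!is_array($token) || $token[0] !== T_CONSTANT_ENCAPSED_STRING) {
                continue;
            }
            $text = substr($token[1], 1, -1); // drop the surrounding quotes
            // Assumed heuristic: keep text containing a run of three or more letters.
            if (preg_match('/[A-Za-z]{3,}/', $text)) {
                $report[] = sprintf('%s:%d: %s', $path, $token[2], $text);
            }
        }
    }
    return $report;
}
```

Calling reportHardcodedStrings('/path/to/project') and printing each entry on its own line gives a list that most editors can jump through by file name and line number.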
Not being an expert with regular expressions, I feel that this topic is much better suited to a site like Experts Exchange than to struggling to find the solution ourselves.
Thanks and Good Luck