The other day I was coding a script for my Twitter trend website, Tweetpop. I was coding the script because I want to add more to the site by retrieving lots of random tweets every minute and parsing through them to pull out how many characters are in them, how many are uppercase, lowercase, numbers or full stops, work out averages and so on. You get the point.

So when I started coding the script I decided to get it to parse through a 100-paragraph slab of Lorem Ipsum. In total there were over 80,000 characters, so this would really test my script compared with the few 140-character tweets I will be using it for later on. After making the first version of the script I ran it, and it took over five seconds to execute. Because I want to be running this script once a minute till pretty much the end of time, using up five seconds of processing power per minute generally isn't the wisest idea. So optimisation beckoned.

For the script, I put all the text I wanted to parse through in one variable, in this instance called $text. Then I used the PHP function ereg_replace to strip everything except a certain type of character out of the $text variable and save the result as a new variable. I then used strlen to count how many characters were in the new variable, and that's how I got my answer.
$text2 = ereg_replace("[^A-Za-z]", "", $text);
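
For anyone trying this on a current PHP install, here is a minimal sketch of the same first-version idea. Note that ereg_replace was removed in PHP 7, so this sketch swaps in preg_replace, which is not what the original script used. The sample text and the names $lettersOnly and $letterCount are made up for illustration.

```php
<?php
// Hypothetical sample text; the real test used about 80,000 characters of Lorem Ipsum.
$text = "Lorem ipsum DOLOR sit amet, 42 times.";

// Remove everything that is NOT a letter, so only letters are left.
// preg_replace stands in for ereg_replace, which no longer exists in PHP 7+.
$lettersOnly = preg_replace('/[^A-Za-z]/', '', $text);

// The length of what is left is the number of letters.
$letterCount = strlen($lettersOnly);

echo "Letters: ", $letterCount, "\n";
```

Every extra count (numbers, full stops and so on) needs another full regular expression pass over the whole of $text, which is where the time goes.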

Though this is a reasonably good way to do it, and the way most people online will tell you to, I found out it's not the quickest way, and if you wish to do more than one count on the same variable then it is horribly slow.

So I had a think about ways I could improve the script, and I decided to test str_replace to get rid of some of the characters in the variable after I had searched through them to find how many of them there are. I still used ereg_replace to search through and pull out all the letters, then counted how many there were. After this I then used str_replace($letters, "", $text);, with $letters holding the characters to strip (str_replace matches literal strings, not regular expression patterns). This went through and replaced all letters with “”, essentially deleting them from the variable. This meant that later in the script, when I was looking through the variable to find full stops, commas etc., it didn't have to look through all the letters again because they had been removed.
$text2 = ereg_replace("[^A-Za-z]", "", $text);

$text = str_replace($letters, "", $text);
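
Put together, the second version looks roughly like the sketch below. It assumes $letters is an array of every ASCII letter (the post doesn't show how it was built), and it again uses preg_replace in place of ereg_replace so it runs on PHP 7 and later.

```php
<?php
// Hypothetical sample text.
$text = "Hello, World. 123!";

// Assumption: $letters is an array containing every ASCII letter.
$letters = array_merge(range('a', 'z'), range('A', 'Z'));

// Step 1: count the letters the same way as the first version.
$letterCount = strlen(preg_replace('/[^A-Za-z]/', '', $text));

// Step 2: delete the letters from $text so later passes scan a shorter string.
$text = str_replace($letters, '', $text);

echo "Letters: ", $letterCount, "\n";
echo "Left to scan: ", strlen($text), " characters\n";
```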

By doing this, I optimised my script and reduced the time it took to run by more than half: the script now took 2.5 seconds to execute. But that wasn't exactly what I was looking for. I wanted to optimise it even more.

Using what I learnt from revision two of the script, I wanted to try and use str_replace more, as the more characters I remove from the variable, the fewer characters I have to parse through the next time. I came up with the idea that I didn't actually need to use ereg_replace any longer at all. If I count how many characters are in the variable, then use str_replace on the variable to remove the characters I want to count, all I have to do is count the number of characters a second time, and simple maths will tell me how many characters were removed ($number_characters=$amount_before-$amount_after).
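
The count-before, strip, count-after idea can be sketched as a small helper. The function name count_and_strip, the sample text and the character lists are all made up for this example and are not taken from the downloadable script.

```php
<?php
// Hypothetical helper: removes every character listed in $chars from $text
// (modified by reference) and returns how many characters were removed.
function count_and_strip(string &$text, array $chars): int
{
    $amount_before = strlen($text);
    $text = str_replace($chars, '', $text);
    $amount_after = strlen($text);

    return $amount_before - $amount_after;
}

$text = "Hello, World. It is 2010.";

// Each call shortens $text, so the next call has less to look through.
$uppercase = count_and_strip($text, range('A', 'Z'));
$lowercase = count_and_strip($text, range('a', 'z'));
$numbers   = count_and_strip($text, str_split('0123456789'));
$fullstops = count_and_strip($text, ['.']);

echo "Upper: $uppercase, lower: $lowercase, numbers: $numbers, full stops: $fullstops\n";
```

As a side note, str_replace also takes an optional fourth argument that receives the number of replacements made. Because every search string here is a single character, that would give the same figure without the two strlen calls.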

By using this method you are killing two birds with one stone: you are finding out how many characters there are in the variable while removing them, so that next time you look through it, it will be much quicker. After I did this I ran the script, and it sorted through all 80,000 characters in less than 0.01 of a second. That is a massive optimisation from the first version of the script, which took over five seconds to do the exact same job. The script ran over 500x faster than it did first off.
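
If you want to time something like this yourself, a simple way is to wrap the work in two microtime(true) calls. This is only a sketch: the repeated sentence is a stand-in for the Lorem Ipsum text, and the time you see will depend on your machine and PHP version.

```php
<?php
// Hypothetical stand-in for the 100 paragraphs of Lorem Ipsum (85,500 characters).
$text = str_repeat("Lorem ipsum dolor sit amet, consectetur adipiscing elit. ", 1500);

$start = microtime(true);

// Count lowercase letters with the count-before, strip, count-after method.
$amount_before = strlen($text);
$text = str_replace(range('a', 'z'), '', $text);
$lowercase = $amount_before - strlen($text);

$elapsed = microtime(true) - $start;

printf("%d lowercase letters counted in %.5f seconds\n", $lowercase, $elapsed);
```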

The current version of the script is by no means perfect, and there are probably still many things I can improve in it, but for the time being it is good enough for the job it will do. When I get some free time in the future I do intend to try and optimise it even more, as any improvements in script performance will be handy to know and could be used in the future. If any of you want to have a look at the final script you can download it here. If you have any improvements that could be made to it, please comment, as I would love to know and learn.

V1 of Script: Download
V2 of Script: Download
V3 of Script: Download