Check for Words in a PHP String

Consider the following code fragment:

$str = "Hello World";

if($str contains "Hello")
echo 'true'; //true

PHP has no `contains` operator, so what would be an efficient replacement for `$str contains "Hello"`?

Using `strpos`

The PHP function `strpos` returns the first position in the `$haystack` string at which the `$needle` string occurs. Here’s how it can be used:

if(strpos($str, "Hello") !== false)
echo "true";

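Note the use of the strict comparison operator `!==` (not identical). `strpos` returns the integer position of the first occurrence of the `$needle` string if it occurs, and `false` otherwise. Since string positions begin at 0, and 0 is loosely equal to `false` in PHP, a match at the very start of the string would be mistaken for a failed search under `!=` or a bare truthiness test. The `!==` operator compares type as well as value, so a position of 0 still counts as a match.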
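The difference between the loose and the strict test can be seen in a minimal sketch. The helper name `containsSubstring` is hypothetical and introduced only for this illustration.

```php
<?php
// Hypothetical helper for this sketch; it is not a built-in PHP function.
function containsSubstring(string $haystack, string $needle): bool
{
    // strpos() returns an int position or false; !== keeps position 0 as a match.
    return strpos($haystack, $needle) !== false;
}

$str = "Hello World";

// "Hello" sits at position 0, so a bare truthiness test misreports it.
$loose  = strpos($str, "Hello") ? 'found' : 'not found';
$strict = containsSubstring($str, "Hello") ? 'found' : 'not found';

echo $loose, "\n";  // not found
echo $strict, "\n"; // found
```

On PHP 8.0 or later, the built-in `str_contains($haystack, $needle)` returns a boolean directly and avoids this pitfall.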

For case-insensitive comparisons, the related function `stripos` can be used.
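As a short sketch of the difference, using the same `$str` as above:

```php
<?php
$str = "Hello World";

// strpos() is case-sensitive, so the lowercase needle is not found.
$caseSensitive = strpos($str, "hello") !== false;

// stripos() ignores case; it returns 0 here, hence the strict !== false check.
$caseInsensitive = stripos($str, "hello") !== false;

var_dump($caseSensitive);   // bool(false)
var_dump($caseInsensitive); // bool(true)
```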

Using Regular Expressions

PHP offers functions for pattern matching with regular expressions. Using the `preg_match` function:

if(preg_match("/Hello/", $str))
echo "true";

`preg_match` performs a regex search in the given string and returns 1 if the pattern matches, 0 if it does not, and `false` if an error occurs. Note, though, that `preg_match` is generally slower than `strpos` for a plain substring test, which can matter when a large number of searches are performed.
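Because a bare `if` treats 0 and `false` alike, the three return values can be told apart with strict comparisons. The following is one possible way to handle them, not the only one:

```php
<?php
$str = "Hello World";

$result = preg_match('/Hello/', $str);

if ($result === false) {
    // The pattern could not be executed; preg_last_error() gives the error code.
    echo "regex error: ", preg_last_error(), "\n";
} elseif ($result === 1) {
    echo "true\n";
} else {
    echo "false\n";
}
```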

The `preg_match` solution, like the `strpos` one, has the problem that it matches the *substring* and not the *word*. The match therefore also succeeds for strings such as:

  • “Hellozzzz”
  • “HiHelloHowDoYouDo”

To match whole words instead of substrings, the pattern needs to be refined:

$word = "Hello";
if(preg_match('/\b' . preg_quote($word, '/') . '\b/', $string)) // pass the '/' delimiter so preg_quote escapes it too
echo "true";

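The escape sequence `\b` matches a word boundary: a position between a word character (a letter, digit, or underscore) and a non-word character, or the start or end of the string. Because digits and underscores count as word characters, `Hello` is still not recognized as a separate word in cases such as: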

  • “Erm_Hello”
  • “Hello2u”
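The behaviour of the word-boundary pattern on the strings mentioned so far can be checked with a small sketch. The helper name `containsWord` and the last sample string are assumptions made for this example.

```php
<?php
// Hypothetical helper for this sketch; the name is not from a PHP library.
function containsWord(string $text, string $word): bool
{
    // preg_quote() escapes regex metacharacters in $word; '/' is the delimiter.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/';
    return preg_match($pattern, $text) === 1;
}

$samples = [
    "Hello World",        // true
    "Hellozzzz",          // false
    "HiHelloHowDoYouDo",  // false
    "Erm_Hello",          // false: "_" is a word character
    "Hello2u",            // false: "2" is a word character
    "Well, Hello!",       // true: punctuation forms a boundary
];

foreach ($samples as $sample) {
    echo $sample, ' => ', containsWord($sample, "Hello") ? 'true' : 'false', "\n";
}
```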

Using `strstr`

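`strstr` returns the portion of the string starting at the first occurrence of the needle, or `false` if the needle is not found, so it can serve as an indirect check for the presence of the word: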

if(strstr($string, 'Hello') !== false)
echo 'true';

For case-insensitive searches, `stristr` can be used. The above method is weaker since it searches for the substring, disregarding word boundaries.
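A brief sketch of what these functions return, and why the strict `!== false` test matters here as well (the sample strings are made up for illustration):

```php
<?php
$string = "Say Hello World";

// strstr() returns the tail of the string starting at the match, or false.
var_dump(strstr($string, 'Hello'));   // string(11) "Hello World"
var_dump(strstr($string, 'Goodbye')); // bool(false)

// The returned tail can itself be falsy: the string "0" is falsy in PHP.
$tail = strstr("Item 0", "0");
var_dump($tail);           // string(1) "0"
var_dump((bool) $tail);    // bool(false), so a bare if() would miss the match
var_dump($tail !== false); // bool(true)

// stristr() is the case-insensitive variant.
var_dump(stristr($string, 'hello') !== false); // bool(true)
```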

Using `substr_count`

`substr_count` returns an integer count of occurrences, so it avoids the 0 versus `false` ambiguity:

if(substr_count($string, 'Hello') > 0)
echo 'true';

Again, there is no regard for word boundaries; the function simply counts substrings. While neater, it can be slower than `strpos`, since it counts every occurrence instead of stopping at the first.
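A minimal sketch of the values it returns, with a sample string chosen for this example:

```php
<?php
$string = "Hello World, Hello PHP";

// substr_count() always returns an int: the number of non-overlapping matches.
$count = substr_count($string, 'Hello');
$none  = substr_count($string, 'Goodbye');

echo $count, "\n"; // 2
echo $none, "\n";  // 0

if ($count > 0) {
    echo "true\n";
}
```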

Manipulating strings using `explode`

The string to be searched is exploded on common word boundary characters, and the resultant arrays are then searched for membership of the *word*.

$word = 'Hello';
$spaceWords = explode(' ', $string);
$nonBreakingSpaceWords = explode(chr(160), $string); // &nbsp; as the single byte 0xA0 (e.g. ISO-8859-1); in UTF-8 it is "\xC2\xA0"
if(in_array($word, $spaceWords) || 
in_array($word, $nonBreakingSpaceWords))
echo 'true';
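One limitation of splitting on a single character is that punctuation stays attached to the tokens. The sketch below shows this, followed by a variant that swaps `explode` for `preg_split` on runs of whitespace or punctuation. That variant is an assumption of this example, not part of the approach above.

```php
<?php
$word = 'Hello';

// Plain explode on a space: works only when the word is surrounded by spaces.
$plain = in_array($word, explode(' ', "Hello World"), true);
$comma = in_array($word, explode(' ', "Hello, World"), true);

var_dump($plain); // bool(true)
var_dump($comma); // bool(false): the token is "Hello,"

// Variant: split on runs of whitespace or punctuation, dropping empty tokens.
$tokens = preg_split('/[\s[:punct:]]+/', "Hello, World!", -1, PREG_SPLIT_NO_EMPTY);

var_dump(in_array($word, $tokens, true)); // bool(true)
```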

Using `mb_strpos`

The function `mb_strpos`, provided by the mbstring extension, is similar to `strpos` but is multi-byte safe: it works in characters rather than bytes. The code has the same shape:

if(mb_strpos($string, 'Hello') !== false)
echo 'true';
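The difference shows up in the reported position. In the sketch below, which assumes the mbstring extension is loaded and uses a UTF-8 sample string invented for this example, both functions find the needle but disagree on the offset:

```php
<?php
// "Grüße, Hello" built with Unicode escapes so the bytes are UTF-8
// regardless of how this file is saved.
$string = "Gr\u{00FC}\u{00DF}e, Hello";

// strpos() counts bytes: "ü" and "ß" take two bytes each in UTF-8.
$bytePos = strpos($string, 'Hello');

// mb_strpos() counts characters in the given encoding.
$charPos = mb_strpos($string, 'Hello', 0, 'UTF-8');

var_dump($bytePos); // int(9)
var_dump($charPos); // int(7)
```

For a pure presence check on valid UTF-8 both tests give the same answer; the distinction matters once the returned position is used, for example with `mb_substr`.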

References

  1. strpos
  2. preg_match
  3. preg_quote
  4. strstr
  5. substr_count
  6. explode
  7. mb_strpos
  8. Understanding Unicode and Character Sets
