Ever wanted to handle search engine style queries from PHP‽
Today, anyone who has used a Web-based search engine, with any bit of sophistication, has come across quoted strings.
For example, people might search for...
- "halo 3"
- "video blogging" software
- "Internet TV"
- "Zend Avesta" "english translation"
Often people do searches like these... using quoted strings... to improve the results they are getting from the search engine. Quoting restricts the results to those that include each phrase in the quoted string(s) as a whole.
And while this way of getting queries from the user has become ubiquitous... AFAIK, PHP does NOT have a built-in function to tokenize such queries.
This article provides you with a function you can use to do just that....
Query to Tokens
Here's the code....
function querytotokens($q)
{
    //
    // Check parameters.
    //
    if ( !isset($q) || FALSE === $q || !is_string($q) ) {
        // Error.
        return FALSE;
    }

    //
    // Get the tokens from the query.
    //
    $x = trim($q);

    // SHORT CIRCUIT
    if ( '' === $x ) {
        /////////////// RETURN
        return array();
    }

    $chars  = str_split($x);
    $n      = count($chars);
    $mode   = 'normal';   // 'normal' outside quotes, 'quoting' inside them
    $token  = '';
    $tokens = array();

    for ($i = 0; $i < $n; $i++) {
        switch ($mode) {
            case 'normal':
                if ( '"' == $chars[$i] ) {
                    // An opening quote ends the current token and starts a phrase.
                    if ( '' != $token ) {
                        $tokens[] = $token;
                    }
                    $token = '';
                    $mode = 'quoting';
                } else if ( ' ' == $chars[$i] || "\t" == $chars[$i] || "\n" == $chars[$i] ) {
                    // Whitespace ends the current token.
                    if ( '' != $token ) {
                        $tokens[] = $token;
                    }
                    $token = '';
                } else {
                    $token .= $chars[$i];
                }
                break;

            case 'quoting':
                if ( '"' == $chars[$i] ) {
                    // A closing quote ends the phrase.
                    if ( '' != $token ) {
                        $tokens[] = $token;
                    }
                    $token = '';
                    $mode = 'normal';
                } else {
                    // Inside quotes, whitespace is part of the token.
                    $token .= $chars[$i];
                }
                break;
        } // switch
    } // for

    // Flush whatever is left (including an unterminated quoted phrase).
    if ( '' != $token ) {
        $tokens[] = $token;
    }

    //
    // Return.
    //
    return $tokens;
}
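For comparison, the same tokenizing rules can probably be expressed with a single regular expression. What follows is only a sketch and not part of the original function: the name querytotokens_regex() is made up for illustration, and it assumes the same separators (space, tab, newline), the same handling of an unterminated quote, and the same dropping of empty tokens as the loop above.

```php
<?php
// Hypothetical regex-based variant of querytotokens(); the name is illustrative only.
function querytotokens_regex($q)
{
    // Same contract as the loop version: a non-string is an error.
    if ( !is_string($q) ) {
        return FALSE;
    }

    // Trim first, as the loop version does.
    $x = trim($q);

    // Alternative 1: a quoted run, closed by a quote or by the end of the string.
    // Alternative 2: a run of characters that are not space, tab, newline or quote.
    $pattern = '/"([^"]*)(?:"|\z)|([^ \t\n"]+)/';

    if ( FALSE === preg_match_all($pattern, $x, $matches, PREG_SET_ORDER) ) {
        return FALSE;
    }

    $tokens = array();
    foreach ($matches as $m) {
        // Group 2 is absent from $m when the quoted alternative matched.
        $token = ( isset($m[2]) && '' !== $m[2] ) ? $m[2] : $m[1];

        // Drop empty tokens, such as the one produced by "".
        if ( '' !== $token ) {
            $tokens[] = $token;
        }
    }

    return $tokens;
}
```

The explicit loop is easier to extend (say, with escaped quotes), while the regex version is shorter. Which one suits you is a matter of taste.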
Examples
If you were to run the following code, which makes use of the querytotokens() function, you would get the output shown below it....
$q1 = 'apple';
$q2 = '"apple"';
$q3 = 'seedless grapes';
$q4 = '"seedless grapes"';
$q5 = '"once upon a time" "snow white"';
$q6 = '';
$q7 = ' ';
$q8 = 'aouei "m g ng z r" h d t c q "blfsn"';
$t1 = querytotokens($q1);
$t2 = querytotokens($q2);
$t3 = querytotokens($q3);
$t4 = querytotokens($q4);
$t5 = querytotokens($q5);
$t6 = querytotokens($q6);
$t7 = querytotokens($q7);
$t8 = querytotokens($q8);
That would give you...
$t1 == array
( 0 => 'apple'
);
$t2 == array
( 0 => 'apple'
);
$t3 == array
( 0 => 'seedless'
, 1 => 'grapes'
);
$t4 == array
( 0 => 'seedless grapes'
);
$t5 == array
( 0 => 'once upon a time'
, 1 => 'snow white'
);
$t6 == array();
$t7 == array();
$t8 == array
( 0 => 'aouei'
, 1 => 'm g ng z r'
, 2 => 'h'
, 3 => 'd'
, 4 => 't'
, 5 => 'c'
, 6 => 'q'
, 7 => 'blfsn'
);
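To see why keeping a quoted phrase as one token matters, here is a rough sketch of how token arrays like those above might be checked against a piece of text. The helper text_matches_tokens() and the sample sentence are assumptions made up for this illustration. A real search engine would do something far more involved.

```php
<?php
// Hypothetical helper: TRUE when every token occurs somewhere in $text.
function text_matches_tokens($tokens, $text)
{
    foreach ($tokens as $token) {
        // stripos() is a case-insensitive substring test, so a phrase token
        // only matches when its words appear together and in order.
        if ( FALSE === stripos($text, $token) ) {
            return FALSE;
        }
    }
    return TRUE;
}

$text = 'Grapes can be seedless, and grapes can be red.';

// Tokens as produced for: seedless grapes
var_dump(text_matches_tokens(array('seedless', 'grapes'), $text)); // bool(true)

// Tokens as produced for: "seedless grapes"
var_dump(text_matches_tokens(array('seedless grapes'), $text));    // bool(false)
```

Both words occur in the sample text, so the unquoted query matches. The quoted query does not, because the phrase never appears as a whole.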
-- Charles Iliya Krempeaux, B.Sc.