Jump to content

fuzzy search


sean1688
 Share

Recommended Posts

No and yes.

 

No, MySQL is not built for searching. There are two methods MySQL employs and CubeCart tries one first, then the other. Neither is 'fuzzy'-looking.

 

Yes, because CubeCart is (now) 100% human-readable and extensible, a plugin could be written to implement a search algorithm comparable to what Google can do, or the plugin can use Goggle itself to return it's results from having scanned your site's pages.

 

I've started to play around with Sphinx, a "man in the middle" database that scans your CubeCart database and creates a comprehensive collection of searchable stuff that is 'fuzzy'-looking. But this requires your own server.

Link to comment
Share on other sites

You can try switching the order of the two methods CubeCart uses to send queries to MySQL. In /classes/catalogue.class.php, find the searchCatalogue() function. (Line numbers have changed in CC600b6.)

 

Change the function argument list from $search_mode = 'fulltext' to $search_mode = 'like'.
 

There are some statements about 190 lines later than could be edited to have CubeCart switch back to fulltext if like didn't find anything. We won't worry about that for now.

Link to comment
Share on other sites

thanks for your reply. I tried to change the search mode per your suggestion, but nothing seems to have changed.

 

maybe I should rephrase the request. can we implement the search so that a match of subset will be presented?

 

in other words, the product code is HY3005D, but if we search for a subset of that, say 3005D, the result of HY3005D search should be presented instead of null?

Link to comment
Share on other sites

The 'like' method of searching uses the standard SQL expression:

RLIKE [[:<:]]3005D[[:>:]]

 

The [[:<:]] is a character class that means a word boundary and RLIKE is short for Regular Expression.

 

So, this is searching on whole words.

 

To get a simpler search (as you describe), make this next edit:

Was:
                        for ($i=0; $i<$noKeys; ++$i) {
                            $ucSearchTerm = strtoupper($searchArray[$i]);
                            if (($ucSearchTerm != 'AND') && ($ucSearchTerm != 'OR')) {
                                $regexp .= '[[:<:]]'.$searchArray[$i].'[[:>:]].*';
                            }
                        }
                        $regexp = substr($regexp, 0, strlen($regexp)-2);
                        $rlike = " AND (I.name RLIKE '".$regexp."' OR I.description RLIKE '".$regexp."' OR I.product_code RLIKE '".$regexp."')";
 
Now:
                        for ($i=0; $i<$noKeys; ++$i) {
                            $ucSearchTerm = strtoupper($searchArray[$i]);
                            if (($ucSearchTerm != 'AND') && ($ucSearchTerm != 'OR')) {
                                $searchArray[$i] = str_replace(array('%','_'), array('%','_'), $searchArray[$i]);
                                $regexp .= '%'.$searchArray[$i].'%';
                            }
                        }
                        // $regexp = substr($regexp, 0, strlen($regexp)-4);
                        $rlike = " AND (I.name LIKE '".$regexp."' OR I.description LIKE '".$regexp."' OR I.product_code LIKE '".$regexp."')";

I recall testing this many moons ago, but I cannot be certain I have it all correct.

 

In admin, Store Settings, Advanced, enable Debug mode and enter your IP address (www.whatis myip.com).

 

Perform a search. In the queries section of debug, look for the query that has your search terms in it. We may need to tweak the code.

Link to comment
Share on other sites

This code works better, but still not perfect:

                        $rlike = ' AND ( ';
                        for ($i=0; $i<$noKeys; ++$i) {
                            $ucSearchTerm = strtoupper($searchArray[$i]);
                            if ( ($ucSearchTerm != 'AND') && ($ucSearchTerm != 'OR') && (strlen($ucSearchTerm) > 0) ) {
                                $searchArray[$i] = str_replace(array('%','_'), array('%','_'), $searchArray[$i]);
                                $regexp = '%'.$searchArray[$i].'%';
                            } else { $rlike .= " ".$ucSearchTerm." "; continue; }
                            $rlike .= "(I.name LIKE '".$regexp."' OR I.description LIKE '".$regexp."' OR I.product_code LIKE '".$regexp."') AND ";
                        }
                        $rlike = substr($rlike, 0, strlen($rlike)-5); $rlike .= " )";

Please realize that there are several syntaxes for search patterns:

Every word is a term that is AND'd (<<== what the above code does)

Every word is a term that is OR'd

Words can be force AND'd or OR'd (<<== what the above code does)

AND and OR are the combining symbols, but others exist such as && and || and NOT

 

A few statements prior, any commas and spaces are used to separate the words and then removed.

 

Tesa31 or Tesa32, returns two products.

Tesa31, Test doesn't work.

 

More later.

Link to comment
Share on other sites

I like this one:

                        $rlike = ' AND ( ';
                        for ($i=0; $i<$noKeys; ++$i) {
                            $ucSearchTerm = strtoupper($searchArray[$i]);
                            if ( ($ucSearchTerm != 'AND') && ($ucSearchTerm != 'OR') && (strlen($ucSearchTerm) > 0) ) {
                                $searchArray[$i] = str_replace(array('%','_'), array('%','_'), $searchArray[$i]);
                                $regexp = '%'.$searchArray[$i].'%';
                                $rlike .= "(I.name LIKE '".$regexp."' OR I.description LIKE '".$regexp."' OR I.product_code LIKE '".$regexp."')";
                            }
                            else {
                              if(strlen($ucSearchTerm) > 0){ $rlike .= " ".$ucSearchTerm." "; }
                              else { $rlike .= " AND "; }
                              continue;
                            }
                        }
                        $rlike .= " )";
 
Link to comment
Share on other sites

  • 5 months later...

Yes it is possible because I have it working on one of my websites. Unfortunately I can't remember how I did it but I will try and work it out and let you know.

Did this on a v3 store and it works for part word and multiple word.

http://cubecart.expandingbrain.com/free-cubecart-3-mods/simple-search-improvement-cc3/prod_160.html

Haven't tried it on a v6 store but assume it is possible. 

Edited by ayz1
Update
Link to comment
Share on other sites

Yes it is possible because I have it working on one of my websites. Unfortunately I can't remember how I did it but I will try and work it out and let you know.

Did this on a v3 store and it works for part word and multiple word.

http://cubecart.expandingbrain.com/free-cubecart-3-mods/simple-search-improvement-cc3/prod_160.html

Haven't tried it on a v6 store but assume it is possible. 

Any idea which file I'd use?

Link to comment
Share on other sites

It's beyond me unfortunatley. However if it's possible in V3 then I assume it's possible on V6. Tought the link might give someone a clue how to do it. Tried the code from bsmither above but couldn't get it to work. Probably something I am doing wrong rather than a problem with the code. Will have another go at it.

Link to comment
Share on other sites

I like this one:

                        $rlike = ' AND ( ';
                        for ($i=0; $i<$noKeys; ++$i) {
                            $ucSearchTerm = strtoupper($searchArray[$i]);
                            if ( ($ucSearchTerm != 'AND') && ($ucSearchTerm != 'OR') && (strlen($ucSearchTerm) > 0) ) {
                                $searchArray[$i] = str_replace(array('%','_'), array('%','_'), $searchArray[$i]);
                                $regexp = '%'.$searchArray[$i].'%';
                                $rlike .= "(I.name LIKE '".$regexp."' OR I.description LIKE '".$regexp."' OR I.product_code LIKE '".$regexp."')";
                            }
                            else {
                              if(strlen($ucSearchTerm) > 0){ $rlike .= " ".$ucSearchTerm." "; }
                              else { $rlike .= " AND "; }
                              continue;
                            }
                        }
                        $rlike .= " )";

Tried this again and it worked. Seems to do exactly what is required. Thanks bsmither.

Link to comment
Share on other sites

I like this one:

                        $rlike = ' AND ( ';
                        for ($i=0; $i<$noKeys; ++$i) {
                            $ucSearchTerm = strtoupper($searchArray[$i]);
                            if ( ($ucSearchTerm != 'AND') && ($ucSearchTerm != 'OR') && (strlen($ucSearchTerm) > 0) ) {
                                $searchArray[$i] = str_replace(array('%','_'), array('%','_'), $searchArray[$i]);
                                $regexp = '%'.$searchArray[$i].'%';
                                $rlike .= "(I.name LIKE '".$regexp."' OR I.description LIKE '".$regexp."' OR I.product_code LIKE '".$regexp."')";
                            }
                            else {
                              if(strlen($ucSearchTerm) > 0){ $rlike .= " ".$ucSearchTerm." "; }
                              else { $rlike .= " AND "; }
                              continue;
                            }
                        }
                        $rlike .= " )";

Tried this again and it worked. Seems to do exactly what is required. Thanks bsmither.

I must be doing something wrong because that doesn't work for me :/

[Notice] /home/content/88/5118188/html/ltcube/classes/catalogue.class.php:1530 - Undefined variable: order
[Notice] /home/content/88/5118188/html/ltcube/classes/catalogue.class.php:1598 - Undefined variable: noKeys

I am getting these errors, not sure if that helps.

Link to comment
Share on other sites

Change the function argument list from $search_mode = 'fulltext' to $search_mode = 'like'.

Where bsmither said to make the main code change you should have this (replace what is already there)

 

$rlike = ' AND ( ';
                        for ($i=0; $i<$noKeys; ++$i) {
                            $ucSearchTerm = strtoupper($searchArray[$i]);
                            if ( ($ucSearchTerm != 'AND') && ($ucSearchTerm != 'OR') && (strlen($ucSearchTerm) > 0) ) {
                                $searchArray[$i] = str_replace(array('%','_'), array('%','_'), $searchArray[$i]);
                                $regexp = '%'.$searchArray[$i].'%';
                            } else { $rlike .= " ".$ucSearchTerm." "; continue; }
                            $rlike .= "(I.name LIKE '".$regexp."' OR I.description LIKE '".$regexp."' OR I.product_code LIKE '".$regexp."') AND ";
                        }
                        $rlike = substr($rlike, 0, strlen($rlike)-5); $rlike .= " )";

                    }

 

between the lines

                        $noKeys = count($searchArray);
                        $regexp = '';

 

AND

 

$q2 = "SELECT I.* FROM ".$GLOBALS['config']->get('config', 'dbprefix')."CubeCart_inventory AS I LEFT JOIN (SELECT product_id, MAX(price) as price

Link to comment
Share on other sites

With this as my code nothing shows up on the page- it's totally broken, but there aren't any errors in the debug.

 

$noKeys = count($searchArray);
                        $regexp = '';
        
    $rlike = ' AND ( ';
                        for ($i=0; $i<$noKeys; ++$i) {
                            $ucSearchTerm = strtoupper($searchArray[$i]);
                            if ( ($ucSearchTerm != 'AND') && ($ucSearchTerm != 'OR') && (strlen($ucSearchTerm) > 0) ) {
                                $searchArray[$i] = str_replace(array('%','_'), array('%','_'), $searchArray[$i]);
                                $regexp = '%'.$searchArray[$i].'%';
                            } else { $rlike .= " ".$ucSearchTerm." "; continue; }
                            $rlike .= "(I.name LIKE '".$regexp."' OR I.seo_meta_keywords LIKE '".$regexp."' OR I.product_code LIKE '".$regexp."') AND ";
                        }
                        $rlike = substr($rlike, 0, strlen($rlike)-5); $rlike .= " )";

                    }

$q2 = "SELECT I.* FROM ".$GLOBALS['config']->get('config', 'dbprefix')."CubeCart_inventory AS I LEFT JOIN (SELECT product_id, MAX(price) as price

Link to comment
Share on other sites

I've just verified the following works on CC606. There will be four edits:

Find near line 1590:
    $rlike = '';
    if (!empty($search_data['keywords'])) {
        $searchwords = preg_split( '/[\s,]+/', $GLOBALS['db']->sqlSafe($search_data['keywords']));
        foreach ($searchwords as $word) {
            $searchArray[] = $word;
        }

        $noKeys = count($searchArray);
        $regexp = '';
        for ($i=0; $i<$noKeys; ++$i) {
            $ucSearchTerm = strtoupper($searchArray[$i]);
            if (($ucSearchTerm != 'AND') && ($ucSearchTerm != 'OR')) {
                $regexp .= '[[:<:]]'.$searchArray[$i].'[[:>:]].*';
            }
        }
        $regexp = substr($regexp, 0, strlen($regexp)-2);
        $rlike = " AND (I.name RLIKE '".$regexp."' OR I.description RLIKE '".$regexp."' OR I.product_code RLIKE '".$regexp."')";
    }

Change to:
/*
    $rlike = '';
    if (!empty($search_data['keywords'])) {
        $searchwords = preg_split( '/[\s,]+/', $GLOBALS['db']->sqlSafe($search_data['keywords']));
        foreach ($searchwords as $word) {
            $searchArray[] = $word;
        }

        $noKeys = count($searchArray);
        $regexp = '';
        for ($i=0; $i<$noKeys; ++$i) {
            $ucSearchTerm = strtoupper($searchArray[$i]);
            if (($ucSearchTerm != 'AND') && ($ucSearchTerm != 'OR')) {
                $regexp .= '[[:<:]]'.$searchArray[$i].'[[:>:]].*';
            }
        }
        $regexp = substr($regexp, 0, strlen($regexp)-2);
        $rlike = " AND (I.name RLIKE '".$regexp."' OR I.description RLIKE '".$regexp."' OR I.product_code RLIKE '".$regexp."')";
    }
*/
    if (!empty($search_data['keywords'])) {
        $rlike = ' AND ( ';
        $searchwords = preg_split( '/[\s,]+/', $GLOBALS['db']->sqlSafe($search_data['keywords']));
        foreach ($searchwords as $word) {
            $searchArray[] = $word;
        }

        $noKeys = count($searchArray);
        for ($i=0; $i<$noKeys; ++$i) {
            $ucSearchTerm = strtoupper($searchArray[$i]);
            if ( ($ucSearchTerm != 'AND') && ($ucSearchTerm != 'OR') && (strlen($ucSearchTerm) > 0) ) {
                $searchArray[$i] = str_replace(array('%','_'), array('\%','\_'), $searchArray[$i]);
                $regexp = '%'.$searchArray[$i].'%';
                $rlike .= "(I.name LIKE '".$regexp."' OR I.description LIKE '".$regexp."' OR I.product_code LIKE '".$regexp."')";
            } else {
                if(strlen($ucSearchTerm) > 0){ $rlike .= " ".$ucSearchTerm." "; }
                else { $rlike .= " AND "; }
                continue;
            }
        }
        $rlike .= " )";
    }

Just after that:

Find:
    if (($search = $GLOBALS['db']->query($query)) !== false) {
        $count = $GLOBALS['db']->query($q2, false, 0);
        $this->_category_count  = (int)count($count);
        $this->_category_products = $search;
        return true;
    }

Change to:
    if (($search = $GLOBALS['db']->query($query)) !== false) {
        $count = $GLOBALS['db']->query($q2, false, 0);
        $this->_category_count  = (int)count($count);
        $this->_category_products = $search;
        return true;
/**/} elseif ($search_mode == 'like') {
/**/	return $this->searchCatalogue($original_search_data, 1, $per_page, 'fulltext');
    }

Just above the first edit:

Find:
	if (($search = $GLOBALS['db']->query($query)) !== false) {
		$q2 = sprintf("SELECT COUNT(I.product_id) as count, %2\$s AS Relevance FROM %1\$sCubeCart_inventory AS I LEFT JOIN (SELECT product_id, MAX(price) as price, MAX(sale_price) as sale_price FROM %1\$sCubeCart_pricing_group $group_id GROUP BY product_id) as G ON G.product_id = I.product_id $joinString WHERE I.product_id IN (SELECT product_id FROM `%1\$sCubeCart_category_index` as CI INNER JOIN %1\$sCubeCart_category as C where CI.cat_id = C.cat_id AND C.status = 1) AND I.status = 1 AND (%2\$s) >= %4\$s %3\$s GROUP BY I.product_id %5\$s", $GLOBALS['config']->get('config', 'dbprefix'), $match, $whereString, $match_val, $order_string);
		$count = $GLOBALS['db']->query($q2, false, 0);
		$this->_category_count  = (int)count($count);
		$this->_category_products = $search;
		return true;
	} elseif ($search_mode == 'fulltext') {
		return $this->searchCatalogue($original_search_data, 1, $per_page, 'like');
	}

Change to:
	if (($search = $GLOBALS['db']->query($query)) !== false) {
		$q2 = sprintf("SELECT COUNT(I.product_id) as count, %2\$s AS Relevance FROM %1\$sCubeCart_inventory AS I LEFT JOIN (SELECT product_id, MAX(price) as price, MAX(sale_price) as sale_price FROM %1\$sCubeCart_pricing_group $group_id GROUP BY product_id) as G ON G.product_id = I.product_id $joinString WHERE I.product_id IN (SELECT product_id FROM `%1\$sCubeCart_category_index` as CI INNER JOIN %1\$sCubeCart_category as C where CI.cat_id = C.cat_id AND C.status = 1) AND I.status = 1 AND (%2\$s) >= %4\$s %3\$s GROUP BY I.product_id %5\$s", $GLOBALS['config']->get('config', 'dbprefix'), $match, $whereString, $match_val, $order_string);
		$count = $GLOBALS['db']->query($q2, false, 0);
		$this->_category_count  = (int)count($count);
		$this->_category_products = $search;
		return true;
//	} elseif ($search_mode == 'fulltext') {
//		return $this->searchCatalogue($original_search_data, 1, $per_page, 'like');
	}

Finally, at the beginning of the function searchCatalogue:

Find:
public function searchCatalogue($search_data = null, $page = 1, $per_page = 10, $search_mode = 'fulltext') {

Change to:
public function searchCatalogue($search_data = null, $page = 1, $per_page = 10, $search_mode = 'like' /* 'fulltext' */) {

The first edit discards the use of word boundaries in a regex-style search pattern, and uses a simple character match -- an underscore (_) replaces any one character, a percent (%) replaces zero or more characters. This new search code uses only the percent at the beginning and end of the search terms. The terms to be searched may contain an underscore or percent, but will be searched on literally.

The next three edits switches the search function to first use a "Simple Pattern Match" in the SQL statements sent to the database, then use the "Full Text" search of the database if nothing comes up.

This part of the new code:

$rlike .= "(I.name LIKE '".$regexp."' OR I.description LIKE '".$regexp."' OR I.product_code LIKE '".$regexp."')";

can be expanded upon to include other product-based fields, including:
seo_meta_keywords
short_description

Edited by bsmither
Link to comment
Share on other sites

  • 5 months later...
On 26/08/2015 at 6:15 PM, ayz1 said:

Tried this again and it worked. Seems to do exactly what is required. Thanks bsmither.

Ive just applied the short version of this to V6.10 and it searches no problem. Unfortunately the Sort By doesn't seem to work. It changes the URL but doesn't refresh the page. Anyone got the search and the sort working together?

Link to comment
Share on other sites

If i do an initial search for two keywords red bus I get the desired results with the URL domainname.com/search.html?search[keywords]=red+bus&_a=category

If I change the Sort to by price descending the URL changes to

search.html?_a=category&search[keywords]=red bus&sort[price]=DESC

but the page doesn't reload.

So the initial search works but the sorting doesn't. I've used the following code edit.

On 26/08/2015 at 8:28 PM, ayz1 said:

Change the function argument list from $search_mode = 'fulltext' to $search_mode = 'like'.

Where bsmither said to make the main code change you should have this (replace what is already there)

 

$rlike = ' AND ( ';
                        for ($i=0; $i<$noKeys; ++$i) {
                            $ucSearchTerm = strtoupper($searchArray[$i]);
                            if ( ($ucSearchTerm != 'AND') && ($ucSearchTerm != 'OR') && (strlen($ucSearchTerm) > 0) ) {
                                $searchArray[$i] = str_replace(array('%','_'), array('%','_'), $searchArray[$i]);
                                $regexp = '%'.$searchArray[$i].'%';
                            } else { $rlike .= " ".$ucSearchTerm." "; continue; }
                            $rlike .= "(I.name LIKE '".$regexp."' OR I.description LIKE '".$regexp."' OR I.product_code LIKE '".$regexp."') AND ";
                        }
                        $rlike = substr($rlike, 0, strlen($rlike)-5); $rlike .= " )";

                    }

 

between the lines

                        $noKeys = count($searchArray);
                        $regexp = '';

 

AND

 

$q2 = "SELECT I.* FROM ".$GLOBALS['config']->get('config', 'dbprefix')."CubeCart_inventory AS I LEFT JOIN (SELECT product_id, MAX(price) as price

 

Link to comment
Share on other sites

7 hours ago, bsmither said:

"but the page doesn't reload."

Are you sure??? Maybe a new page came in but it looks identical to the page that was already there.

Sorry yes you are correct. The page is going through the reload process but it is laoding the page that was already there.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

 Share

×
×
  • Create New...