Dirty Butter, posted July 12, 2015:

Google sometimes complains about 404s for the non-SEO version of some of our product listings, for example:

    dirtybutter.com/plushcatalog/shop.php/animals/infantino/infantino-blue-green-orange-dog-string-vibrate-crinkle-rattle/p_941

Why does this happen? Is there a way to stop it? My SEO version does not have shop.php, categories, subcategories, or the item number, so where are they "getting" these from?

bsmither, posted July 12, 2015:

This is a CC4 SEO URL. Where do they get it? I have no idea, nor do I have any idea how or where to find out where they got it.

But I would poke around Webmaster Tools. Maybe they have an "Enter a search result URL and we will tell you where we found it" kind of tool.

Dirty Butter, posted July 12, 2015:

I see these off and on, and have for a long time. I wish there were a way to trace where Google found the links, but I've never found a way to backtrack. Since I've seen them for some time, I just assumed everyone had similar odd occurrences.

Dirty Butter, posted July 12, 2015:

There were 67 of these 404 crawl errors showing in Google Webmaster Tools for 7/10. I had about a hundred of this error in my error log, all posted at the same time on 7/10. Google gives no way to tell more specifically when the 404 was created.

    [10-Jul-2015 17:43:10 America/Chicago] PHP Warning: preg_match() [<a href='http://docs.php.net/manual/en/function.preg-match.php'>function.preg-match.php</a>]: Unknown modifier '/' in /home3/butter01/public_html/plushcatalog/classes/seo.class.php on line 1049

bsmither, posted July 12, 2015:

In my copy of CC606's seo.class.php file, the line numbers stop at 996.

Please reproduce here lines 1039-1059 from your copy of seo.class.php.

Dirty Butter, posted July 12, 2015 (edited July 12, 2015):

This is line 1029 to the end, with line 1039 marked:

    /**
     * Create sitemap link
     *
     * @param string $input
     * @param string $updated
     * @param string $type
     */
    private function _sitemap_link($input, $updated = false, $type = false) {
        $updated = (!$updated) ? time() : $updated;
        $store_url = (CC_SSL) ? $GLOBALS['config']->get('config', 'standard_url') : $GLOBALS['storeURL'];
        // ORIGINAL B4 BSMITHER ROBOTS SITEMAP HACK
        // if (!isset($input['url']) && !empty($type)) {
        //     $input['url'] = $store_url.'/'.$this->generatePath($input['id'], $type, '', false, true);
        // }
        // BSMITHER ROBOTS SITEMAP HACK
        if (!isset($input['url']) && !empty($type)) {
            $generated_path = $this->generatePath($input['id'], $type, '', false, true);
            if( !empty($this->_robot_disallows) ) {
                foreach( $this->_robot_disallows as $disallowed ) {
                    // (line 1039)
                    if( preg_match('/^'.$disallowed.'/i', $generated_path, $matches) && in_array($type, $this->_robot_disallow_types) ) {
                        return false;
                    }
                }
            }
            $input['url'] = $store_url.'/'.$generated_path;
        }
        // END BSMITHER ROBOTS SITEMAP HACK
        $this->_sitemap_xml->startElement('url');
        $this->_sitemap_xml->setElement('loc', htmlspecialchars($input['url']), false, false);
        $this->_sitemap_xml->setElement('lastmod', date('c', $updated), false, false);
        $this->_sitemap_xml->endElement();
    }
    }

bsmither, posted July 12, 2015:

    preg_match('/^'.$disallowed.'/i', $generated_path, $matches)

When PHP substitutes the current value of $disallowed into the string, that value may have a slash as one of its characters. So the result may be:

    preg_match('/^path/to/cat/i', $generated_path, $matches)

These inadvertent slashes interfere with the delimiters that mark the start and end of the pattern to match on.

We need to use different delimiters: characters that will probably never appear as part of a URL in the robots file. Try:

    preg_match('#^'.$disallowed.'#i', $generated_path, $matches)

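The delimiter problem described in the post above can be reproduced in isolation. The following is a minimal sketch, not CubeCart's code: the values of `$disallowed` and `$generated_path` are hypothetical stand-ins, and the `preg_quote()` variant is an alternative that was not suggested in the thread. It only fits if the disallow entries are meant to be literal path prefixes rather than regular expressions.

```php
<?php
// Hypothetical value standing in for one entry of $this->_robot_disallows.
$disallowed = 'path/to/cat';
// Hypothetical path standing in for the result of generatePath().
$generated_path = 'path/to/cat/some-product.html';

// Original form: the first slash inside $disallowed ends the pattern early,
// so PHP raises an "Unknown modifier" warning and preg_match() returns false.
// The @ only silences the warning for this demonstration.
$broken = @preg_match('/^' . $disallowed . '/i', $generated_path);

// Form suggested in the thread: with '#' as the delimiter, slashes in the
// path are ordinary characters.
$fixed = preg_match('#^' . $disallowed . '#i', $generated_path);

// Alternative (an assumption, not from the thread): keep '/' as the delimiter
// but escape it, and any other regex metacharacters, with preg_quote().
$quoted = preg_match('/^' . preg_quote($disallowed, '/') . '/i', $generated_path);

var_dump($broken, $fixed, $quoted);
```

A disallow entry that itself contains `#` would break the `#`-delimited form in the same way, which is the trade-off that escaping with `preg_quote()` avoids.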
Dirty Butter, posted July 12, 2015:

Thank you, Bsmither! I've made the edit. It will take a while to see what Google sees from now on; this part of Google is slow to reflect errors I fix.

Dirty Butter, posted August 6, 2015:

I'm still getting new errors of this type on Google. Could it be that I have something in the plushcatalog .htaccess that should instead be moved or duplicated to the dirtybutter .htaccess?

bsmither, posted August 6, 2015:

Using Google to search "plushcatalog/shop.php" (with quotes) gives several hits from other sites that have CC4-style links to your site, including plushmemories.com.

But the directives in .htaccess should find and correct for that:

    #### Rewrite rules for SEO functionality ####
    <IfModule mod_rewrite.c>
      RewriteEngine On
      RewriteBase /plushcatalog
      ### IMPORTANT! ###

      ######## START v4 SEO URL BACKWARD COMPATIBILITY ########
      RewriteCond %{QUERY_STRING} (.*)$
      RewriteCond %{REQUEST_FILENAME} !-f
      RewriteRule cat_([0-9]+)(\.[a-z]{3,4})?(.*)$ index.php?_a=category&cat_id=$1&%1 [NC]
      etc

Dirty Butter, posted August 6, 2015:

I don't actively manage plushmemories.com any more, as it got way too big and time-consuming. I moved it to Facebook, where it's working quite well. I found two of these links and fixed them in plushmemories, but one was in a comment, and the regex search couldn't find any more. Obviously there are more somewhere. I have over a hundred of these error links listed on Google from the last few months.

I don't want to take plushmemories offline. The images and descriptions, plus the link to the Facebook group, etc., are too helpful to many people.

bsmither, posted August 6, 2015:

The directives in .htaccess should find and correct for that.

Dirty Butter, posted August 6, 2015:

But they don't, so is this a hopeless cause?

bsmither, posted August 6, 2015:

We might want to find out why. See the Logging section of:

    http://httpd.apache.org/docs/current/mod/mod_rewrite.html

Dirty Butter, posted August 6, 2015:

Sorry. It might as well be Greek. I use WP at dirtybutter.com as the domain landing page. These shop.php URLs trigger the WP error page, not the plushcatalog error page. Does that give any clue as to what is wrong with the redirects?

bsmither, posted August 6, 2015:

I wonder if the web server takes note of the fact that a subdirectory named /plushcatalog/ actually exists, then looks at the .htaccess file in that folder, assuming the .htaccess file for CubeCart is there.

Have you tried putting CubeCart's .htaccess rewrite directives in the WP .htaccess file?

Dirty Butter, posted August 6, 2015:

I have the plushcatalog .htaccess merged into the domain one as best I could. That certainly doesn't mean I did it correctly.

    ## File Security
    <FilesMatch "\.(htaccess)$">
      Order Allow,Deny
      Deny from all
    </FilesMatch>

    #### Apache directory listing rules ####
    DirectoryIndex index.php index.htm index.html
    IndexIgnore *

    <IfModule mod_expires.c>
      # Enable expirations
      ExpiresActive On
      # Default directive
      ExpiresDefault "access plus 1 month"
      # My favicon
      ExpiresByType image/x-icon "access plus 1 year"
      # Images
      ExpiresByType image/gif "access plus 1 month"
      ExpiresByType image/png "access plus 1 month"
      ExpiresByType image/jpg "access plus 1 month"
      ExpiresByType image/jpeg "access plus 1 month"
    </IfModule>

    <IfModule mod_deflate.c>
      <IfModule mod_headers.c>
        Header append Vary User-Agent env=!dont-vary
      </IfModule>
      AddOutputFilterByType DEFLATE text/css text/x-component application/x-javascript application/javascript text/javascript text/x-js text/html text/richtext image/svg+xml text/plain text/xsd text/xsl text/xml image/x-icon application/json
      <IfModule mod_mime.c>
        # DEFLATE by extension
        AddOutputFilter DEFLATE js css htm html xml
      </IfModule>
    </IfModule>

    #### Rewrite rules for SEO functionality ####
    <IfModule mod_rewrite.c>
      RewriteEngine On
      RewriteCond %{HTTPS} off
      RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
      RewriteBase /

      ######## START v4 SEO URL BACKWARD COMPATIBILITY ########
      RewriteCond %{QUERY_STRING} (.*)$
      RewriteCond %{REQUEST_FILENAME} !-f
      RewriteRule cat_([0-9]+)(\.[a-z]{3,4})?(.*)$ index.php?_a=category&cat_id=$1&%1 [NC]

      RewriteCond %{QUERY_STRING} (.*)$
      RewriteCond %{REQUEST_FILENAME} !-f
      RewriteRule prod_([0-9]+)(\.[a-z]{3,4})?$ index.php?_a=product&product_id=$1&%1 [NC]

      RewriteCond %{QUERY_STRING} (.*)$
      RewriteCond %{REQUEST_FILENAME} !-f
      RewriteRule info_([0-9]+)(\.[a-z]{3,4})?$ index.php?_a=document&doc_id=$1&%1 [NC]

      RewriteCond %{QUERY_STRING} (.*)$
      RewriteCond %{REQUEST_FILENAME} !-f
      RewriteRule tell_([0-9]+)(\.[a-z]{3,4})?$ index.php?_a=product&product_id=$1&%1 [NC]

      RewriteCond %{QUERY_STRING} (.*)$
      RewriteCond %{REQUEST_FILENAME} !-f
      RewriteRule _saleItems(\.[a-z]+)?(\?.*)?$ index.php?_a=saleitems&%1 [NC,L]
      ######## END v4 SEO URL BACKWARD COMPATIBILITY ########

      RewriteCond %{REQUEST_FILENAME} !-f
      RewriteCond %{REQUEST_FILENAME} !-d
      RewriteCond %{REQUEST_URI} !=/favicon.ico
      RewriteRule ^(.*)\.html?$ index.php?seo_path=$1 [L,QSA]

      #301 Redirect Old File
      Redirect 301 https://dirtybutter.com/1969-chevrolet-caprice/photo-gallery-1969-chevrolet-caprice/ https://dirtybutter.com/1969-chevrolet-caprice/1969-chevrolet-caprice-photos/
      Options -Indexes
      ErrorDocument 404 /index.php?_a=404
      ErrorDocument 400 /400-bad-request.html
      ErrorDocument 401 /401-restricted-page.html
      ErrorDocument 500 /500-server-problem.html
      ErrorDocument 302 /302-moved-temporarily.html
      # Use PHPcur as default
      AddHandler application/x-httpd-phpcur .php
      <IfModule mod_suphp.c>
        suPHP_ConfigPath /opt/phpcur/lib
      </IfModule>
    </IfModule>

    RewriteCond %{HTTP_HOST} ^dirtybutter\.com$ [OR]
    RewriteCond %{HTTP_HOST} ^www\.dirtybutter\.com$
    RewriteRule ^plushcatalog\/$ "https\:\/\/dirtybutter\.com\/plushcatalog\/" [R=301,L]
    RewriteCond %{HTTP_HOST} ^www\.dirtybutter\.com$
    RewriteRule ^/?$ "https\:\/\/dirtybutter\.com\/" [R=301,L]

    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>
    # END WordPress

Dirty Butter, posted August 9, 2015:

I may have found the issue! I had a 404 plugin on the WP install. I deactivated it and tried one of the v4-style links from the Google Webmaster warnings, and it redirected as it should have!

Dirty Butter, posted August 13, 2015:

I don't know why, but my issue is BACK! Google is seeing the v4-form URLs again. UGH!

I have the WP .htaccess section LAST in my file. I would assume the server would deal with the plushcatalog redirects BEFORE it ever gets to the WP files. WP is in a wp folder, but I'm using the redirect that makes it show on the domain as home. Is THAT the issue?

Dirty Butter, posted August 14, 2015:

I've started from scratch again with this issue. This is what I did while debugging:

I changed my WP site so it was not on the domain root but on dirtybutter.com/wp, and put my old dirtybutter.com page up instead. This involved changing the .htaccess file back to the default HostGator one.

When I tested one of the v4 URLs, it still did not resolve to the v6 version, but showed the plushcatalog 404 page instead. So whatever the issue is, it is NOT happening because of my WP install on the domain root.

So I put the WP install back on the root, as it was originally. I DID change the root WP .htaccess section to the Multi-site version, and now the error message for the v4-version link goes to the plushcatalog 404 page. ONE PART FIXED.

But I AM still getting the error page, not the correct URL for the product. It is STILL not redirecting to the SEO-friendly URL.

bsmither, posted August 14, 2015:

If your 'troublesome' URLs are still of this syntax:

    ...vibrate-crinkle-rattle/p_941

then the .htaccess rules that deal with CC4 URLs are not going to work. The rewrite rule is:

    prod_([0-9]+)(\.[a-z]{3,4})?$ index.php?_a=product&product_id=$1&%1

I will have to say that the URLs you are getting are not in a standard CC4 SEO style. They may be in an SEO mod style for CC3 or early CC4.

You can try to catch that style of SEO URL by using this rewrite rule.

Instead of:

    RewriteRule prod_([0-9]+)(\.[a-z]{3,4})?$ index.php?_a=product&product_id=$1&%1 [NC]

Use:

    RewriteRule p(rod)?_([0-9]+)(\.[a-z]{3,4})?$ index.php?_a=product&product_id=$2&%1 [NC]

This now looks for: p; then rod, which may or may not be there; then the underscore; then the product ID number, which is one or more digits from 0 to 9; then a period and three or four letters, which may or may not be there. Because another set of parentheses was added, the product ID is now in the second set of parentheses ($2 instead of $1).

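The difference between the two patterns in the post above can be checked outside Apache. This is a rough sketch only: it runs the patterns through PHP's `preg_match()` (the `i` flag standing in for `[NC]`) rather than through mod_rewrite itself, and the second sample path is a hypothetical stock CC4-style URL added for comparison.

```php
<?php
// The old and new RewriteRule patterns, wrapped in '#' delimiters.
$old = '#prod_([0-9]+)(\.[a-z]{3,4})?$#i';
$new = '#p(rod)?_([0-9]+)(\.[a-z]{3,4})?$#i';

$paths = [
    // Path taken from the first post of this thread.
    'shop.php/animals/infantino/infantino-blue-green-orange-dog-string-vibrate-crinkle-rattle/p_941',
    // Hypothetical stock CC4-style URL, for comparison only.
    'some-product/prod_941.html',
];

$ids = [];
foreach ($paths as $path) {
    $old_hit = preg_match($old, $path);
    $new_hit = preg_match($new, $path, $m);
    // With the extra (rod)? group, the product ID is capture group 2.
    $ids[] = $new_hit ? $m[2] : null;
    echo $path, "\n  old rule matches: ", $old_hit, ", new rule matches: ", $new_hit, "\n";
}
```

Because the pattern is not anchored at the start, it matches any path that ends in `p_` followed by digits, so a URL whose last segment merely ends that way would also be rewritten to a product page.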
Dirty Butter, posted August 14, 2015:

THAT DID IT! I don't use category/subcategory in my URLs, and I've never noticed one of these odd URLs for the other URL types in the SEO .htaccess defaults. There may have been some, but they were buried among the hundreds of product URLs, and I just didn't see them.

I've been at the point of throwing my laptop over this (felt like it, but didn't), but as long as I was blaming it on the WP install I was never going to get it fixed. We'll see if any of the other URLs need to be tweaked, so I'll wait a while to change this to Resolved.

THANK YOU, THANK YOU, THANK YOU!

Well, I just marked a lot of URLs as fixed on Google, but there were two of a different type:

    https - dirtybutter.com/plushcatalog/extra/prodImages.php?productId=821

I don't have, and frankly don't remember ever having, a folder named extra. How should I word a Redirect to take care of these?

There were only two of those, first seen by Google just a couple of days ago, so I'm going to mark them as "fixed" and see if they show up again before I worry about them. As much as I've been fiddling with stuff, trying to find the issue, there's no telling how they were created.

bsmither, posted August 14, 2015:

    /plushcatalog/extra/prodImages.php?productId=821

I recognize this as CC3's method of showing the secondary/additional images that are associated with a product. This link on CC3's View Product page will toss up a new popup window with a gallery of sorts.

We can try to write a rewrite rule to capture this and deliver the product's page. I will need to study this because we are looking at the querystring (?productId=821) and not the URL, although we will need to trigger on the URL.

Add somewhere alongside the v4 SEO rules:

    RewriteCond %{QUERY_STRING} ^productId=([0-9]+)$
    RewriteRule /extra/prodImages\.php$ index.php?_a=product&product_id=%1 [NC]

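The rule pair above tests two different strings: the RewriteRule pattern sees only the path, while the RewriteCond sees the query string and supplies `%1`. The sketch below imitates that split in PHP; `rewrite_prod_images()` is a hypothetical helper written for illustration, not part of CubeCart or Apache. One point worth checking when placing the rule: in a per-directory .htaccess, Apache strips the directory prefix before matching, so the pattern's leading slash only has something to match when the rule lives in a parent directory's file. If the rule goes into the .htaccess inside /plushcatalog/, dropping the leading slash from the pattern may be needed.

```php
<?php
// Hypothetical helper: mimics the RewriteRule/RewriteCond pair by testing
// the path and the query string separately. Returns the rewritten target,
// or null when the rule would not fire.
function rewrite_prod_images(string $path, string $query): ?string
{
    // RewriteRule pattern with [NC]; it is tested against the path only.
    if (!preg_match('#/extra/prodImages\.php$#i', $path)) {
        return null;
    }
    // RewriteCond %{QUERY_STRING} ^productId=([0-9]+)$ ; %1 is this capture.
    if (!preg_match('#^productId=([0-9]+)$#', $query, $m)) {
        return null;
    }
    return 'index.php?_a=product&product_id=' . $m[1];
}

// Path as the site root's .htaccess would see it.
$from_root = rewrite_prod_images('plushcatalog/extra/prodImages.php', 'productId=821');

// Path as /plushcatalog/.htaccess would see it: nothing precedes "extra",
// so the leading slash in the pattern cannot match.
$from_subdir = rewrite_prod_images('extra/prodImages.php', 'productId=821');

var_dump($from_root, $from_subdir);
```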
Dirty Butter, posted August 14, 2015:

All the way back to v3? Someone else developed the store way back then. These HAVE to be old links somewhere in the Plush Memories comments. Hopefully this will take care of that issue!

Should I start a new thread for the page=all duplicate meta issue?