
URL Rewriting not really UTF8


fabien


Hello, I made the jump from version 3 to version 6 (not an upgrade, but a separate installation).
I'm pretty happy with the result; only one thing is annoying. As you can see in the attachment below, the URL rewriting works, but it does not handle language-specific characters (here, French).

If, for instance, you create a new product and name it, for the sake of the exercise, 'Salade Composée', then the URL will be this one:
http://www.forcemajeure.com/blog/livres/salade-compos-e.html
while it should be 
http://www.forcemajeure.com/blog/livres/salade-composée.html

I checked on the DB side, and the tables are properly set to UTF8. I then tried to look deeper into the site files, but there I'm rather lost.
Is this an already known problem?
 

Thanks for your help, 

Fabien

 

capture.jpg


According to "RFC 3986" the valid characters for the path component are:

a-z A-Z 0-9 . - _ ~ ! $ & ' ( ) * + , ; = : @

as well as the slash / to delineate the path segments.

Then the question mark (?) starts a new set of legal characters.

And the percent (%) flags the next two characters as a hex value (of decimal value 0-255) that represents an otherwise invalid character.
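
To make the percent mechanism concrete, here is a minimal Python sketch using the standard library's urllib.parse. The slug is the product name from the first post; the variable names are only illustrative, and this is not a claim about how CubeCart builds its URLs.

```python
from urllib.parse import quote, unquote

# Example slug taken from the first post (illustrative only).
slug = "salade-composée"

# quote() leaves letters, digits and "-._~" alone and replaces every
# other character with one %XX escape per UTF-8 byte.
encoded = quote(slug)
print(encoded)

# The e-acute is two bytes in UTF-8, which is why it becomes two escapes.
print("é".encode("utf-8").hex())

# unquote() reverses the process; hex digits may be upper or lower case.
decoded = unquote(encoded)
print(decoded == slug)
```

Whether the decoded form or the escaped form is shown to the user depends on the software displaying the URL.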

There has been some talk among the organizations that maintain the "rules" for how the Internet works to allow for UTF-8 encoded byte sequences in URLs, but it's not here yet.

For now, the world is limited to these approx. 79 characters.

I did make a suggestion that CubeCart make URLs that look like this (partial):

compos%c3%a9e.html

But that would require the rest of the computing world be ready to automatically render the percent characters as the UTF-8 byte sequence and 'recognize' it as such, thus displaying the e-acute. There is also the HTML Entity equivalent:

&eacute; (I think)

but I'm not sure every character in every (even moderately used) language has an HTML Entity equivalent.
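
On the question of coverage, a rough Python sketch can check whether a character has a named entity. It relies on the standard library's html.entities.codepoint2name table, which lists the HTML 4 named entities only; the helper name to_entity is hypothetical. Where no name exists, it falls back to a numeric character reference built from the code point.

```python
import html
from html.entities import codepoint2name


def to_entity(ch: str) -> str:
    """Return a named HTML 4 entity if one exists, else a numeric reference."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else f"&#{ord(ch)};"


# The e-acute has a named entity in the HTML 4 table.
print(to_entity("é"))

# The r-caron (Czech) has no HTML 4 name, so the numeric form is used.
print(to_entity("ř"))

# Both forms decode back to the original character.
print(html.unescape(to_entity("é")), html.unescape(to_entity("ř")))
```

Note that these references are HTML markup for page content; inside the URL itself the percent form is still the one that applies.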
