
A code page for import/export files


SergeiG


Hi!

I use Russian characters in the "product name" field, and they display without any problem everywhere.

However, when I export the catalog and open it in Excel, the Russian characters are displayed incorrectly. If I open the .csv file in UltraEdit and save it as a UTF-8 file, it then opens in Excel without any problems. But I don't know the original code page of the .csv file, so I can't use external converters (UltraEdit recognizes it automatically).

Could you please give me a clue which code page is used by default for the .csv file, and whether it is possible to change it inside CC615 to avoid the extra manual conversion?

Thank you.


By design, CSV files do not carry any kind of language or code page identifier. Non-ASCII characters encoded as UTF-8 appear as their full two-, three-, or four-byte sequences. It is up to the application displaying the data to figure out the encoding, either by inspection or by a default setting for loading the file.
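To make the multi-byte point concrete: the six-letter Russian word below takes twelve bytes in UTF-8, two per Cyrillic letter, and nothing in those bytes announces "this is UTF-8". A minimal PHP sketch (PHP because CubeCart is written in it), assuming the script file itself is saved as UTF-8:

```php
<?php
// Each Cyrillic letter (U+0400 block) encodes to two bytes in UTF-8.
$word = 'Привет';

// strlen() counts bytes, not characters.
$byteCount = strlen($word);

// Raw bytes as hex: 'П' (U+041F) becomes d0 9f, and so on.
$hex = bin2hex($word);

// Count characters without mbstring: preg_split() with the /u modifier
// splits the string into individual UTF-8 code points.
$charCount = count(preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY));

// prints "6 characters, 12 bytes: d09fd180d0b8d0b2d0b5d182"
printf("%d characters, %d bytes: %s\n", $charCount, $byteCount, $hex);
```

A program that assumes a single-byte code page instead would see twelve unrelated characters here, which is the garbling seen in Excel.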

UltraEdit figures out the encoding by looking for obvious UTF-8 byte sequences within the first 16K (maybe 4K, maybe 256K, I do not recall) of the file's contents. UltraEdit also honors the Byte Order Mark (BOM), if present, at the very start of the file (for UTF-8 the BOM is the three bytes EF BB BF). When UltraEdit saves the file, its settings may be such that it adds the BOM.

This, in turn, allows Excel to honor the BOM and display the contents appropriately. Excel does not do content inspection.
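The two approaches described above, honoring a BOM when there is one and otherwise inspecting the bytes, can be sketched in PHP as follows. This is a simplified heuristic, not UltraEdit's actual algorithm; guessEncoding() is a hypothetical name, and the 16K default sample size is just the first of the guesses mentioned above.

```php
<?php
/**
 * Heuristic encoding guess: check for a BOM first, then inspect a prefix.
 * Hypothetical helper for illustration only.
 */
function guessEncoding(string $bytes, int $sampleSize = 16384): string
{
    // 1. A byte order mark, if present, settles the question.
    //    (A UTF-32LE BOM also starts with FF FE; ignored in this sketch.)
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8 (with BOM)';
    }
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return 'UTF-16LE';
    }
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        return 'UTF-16BE';
    }

    // 2. No BOM: inspect only the first $sampleSize bytes.
    $sample = substr($bytes, 0, $sampleSize);

    // Pure ASCII is valid in almost every code page, so it proves nothing.
    if (!preg_match('/[\x80-\xFF]/', $sample)) {
        return 'ASCII (inconclusive)';
    }

    // If the sample was cut off, it may end mid-sequence; allow dropping up
    // to three trailing bytes before testing for valid UTF-8.
    $maxTrim = strlen($bytes) > $sampleSize ? 3 : 0;
    for ($trim = 0; $trim <= $maxTrim; $trim++) {
        $candidate = substr($sample, 0, strlen($sample) - $trim);
        // With the /u modifier, preg_match() returns false on invalid UTF-8.
        if ($candidate !== '' && preg_match('//u', $candidate) === 1) {
            return 'UTF-8 (no BOM)';
        }
    }

    // High bytes that are not valid UTF-8: likely a legacy single-byte
    // code page such as Windows-1251, which bytes alone cannot identify.
    return 'legacy 8-bit code page (unknown)';
}
```

The last case is the hard one: a Windows-1251 file and a Windows-1252 file cannot be told apart by their bytes alone, so a converter has to be told the source code page.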

The BOM was really intended to indicate byte order (big- or little-endian) for UTF-16 and UTF-32, and it may confuse other readers. But it will let Excel display the file correctly.

The CubeCart code that writes the CSV can be modified to output the BOM.
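For comparison, here is a generic sketch of writing a CSV with the BOM in front. It is not CubeCart's actual export code; buildCsvWithBom() is a hypothetical helper:

```php
<?php
/**
 * Hypothetical helper (not CubeCart code): build CSV text from rows and put
 * the UTF-8 BOM (EF BB BF) in front so Excel detects the encoding.
 */
function buildCsvWithBom(array $rows): string
{
    $stream = fopen('php://temp', 'r+');
    foreach ($rows as $row) {
        // fputcsv() quotes fields that contain the delimiter, quotes, spaces
        // or line breaks; the empty escape argument gives plain RFC 4180 quoting.
        fputcsv($stream, $row, ',', '"', '');
    }
    rewind($stream);
    $csv = stream_get_contents($stream);
    fclose($stream);

    // The BOM must be the very first bytes of the file, before any data.
    return "\xEF\xBB\xBF" . $csv;
}

$csv = buildCsvWithBom([
    ['product_name', 'price'],
    ['Чайник', '19.99'],
    ['Ручка, синяя', '1.50'],
]);
```

The BOM is added once to the finished text rather than per row, since only the first bytes of the file matter.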


Wow! Thank you for such a detailed answer! Where can I find the piece of CubeCart code to change so that it outputs the BOM?


A better solution would be to get an "External"-type extension. Here are a couple:
https://www.cubecart.com/extensions/product-feeds/data-pump-import/update/export
https://www.cubecart.com/extensions/product-feeds/storeya-export

But you can try this (I have not actually tried this myself):

In the administrative folder, /sources/products.export.inc.php, near line 145, find:

deliverFile(false, false, $output, $filename);

Change to:

deliverFile(false, false, chr(bindec('11101111')).chr(bindec('10111011')).chr(bindec('10111111').$output, $filename);

 


Great, thank you. I changed the code and it works. Just one small correction: you forgot one closing parenthesis.

The correct code is:

deliverFile(false, false, chr(bindec('11101111')).chr(bindec('10111011')).chr(bindec('10111111')).$output, $filename);
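The three chr(bindec(...)) calls in the corrected line simply produce the bytes EF BB BF, the UTF-8 BOM. A small sketch (variable names are illustrative) showing that an escaped string literal gives exactly the same bytes:

```php
<?php
// The expression from the corrected deliverFile() call...
$fromBinary = chr(bindec('11101111')) . chr(bindec('10111011')) . chr(bindec('10111111'));

// ...is the same three bytes as this hexadecimal escape sequence.
$utf8Bom = "\xEF\xBB\xBF";

var_dump($fromBinary === $utf8Bom); // bool(true)
var_dump(bin2hex($utf8Bom));        // string(6) "efbbbf"
```

So the edited line could equivalently read deliverFile(false, false, "\xEF\xBB\xBF".$output, $filename); which is a little harder to get the parentheses wrong in (not tested against CubeCart itself).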

Could you also advise me on the easiest way to create a correct .csv file for import?

If I save my .xls file as .csv in Excel, it is not saved as UTF-8 :-(

I found a temporary workaround: save the .xls as Unicode Text, replace all tabs with commas, and rename the .txt to .csv. But it really annoys me.

Thank you!
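The Unicode Text workaround above can be automated. Excel's "Unicode Text" output is UTF-16LE, tab-delimited, and normally starts with the BOM FF FE. A minimal sketch, assuming that input format and PHP's iconv extension; unicodeTextToCsv() is a hypothetical helper. Rather than replacing tabs with commas blindly (a product name containing a comma would then split into two columns), it parses each row and re-emits it with proper quoting:

```php
<?php
/**
 * Hypothetical helper: convert Excel "Unicode Text" (UTF-16LE, tab-delimited)
 * to UTF-8 CSV. Limitation: splits on line breaks, so quoted fields that
 * contain embedded line breaks are not supported.
 */
function unicodeTextToCsv(string $utf16): string
{
    // Drop the UTF-16LE byte order mark, if present.
    if (strncmp($utf16, "\xFF\xFE", 2) === 0) {
        $utf16 = substr($utf16, 2);
    }
    $utf8 = iconv('UTF-16LE', 'UTF-8', $utf16);
    if ($utf8 === false) {
        throw new RuntimeException('Input is not valid UTF-16LE');
    }

    $out = fopen('php://temp', 'r+');
    // Excel ends rows with CRLF; accept any common line ending.
    foreach (preg_split('/\r\n|\n|\r/', $utf8) as $line) {
        if ($line === '') {
            continue; // skip the empty piece after the final line break
        }
        // Parse the tab-delimited row (Excel quotes fields when needed)...
        $fields = str_getcsv($line, "\t", '"', '');
        // ...and write it back comma-delimited, quoting where necessary.
        fputcsv($out, $fields, ',', '"', '');
    }
    rewind($out);
    $csv = stream_get_contents($out);
    fclose($out);
    return $csv;
}
```

Used from the command line, something like file_put_contents('products.csv', unicodeTextToCsv(file_get_contents('products.txt'))); would replace the manual steps (file names are placeholders).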

 


Yes, I did miss one closing parenthesis.

For import? CubeCart and its database structure are already set up for UTF-8. There is no need to specify it.

On the other hand, depending on the version of Excel (if the versions differ in this respect), the only code page Excel offers may be the one the computer's regional settings use.
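Since the import side expects UTF-8, it may help to check a file before uploading it. A minimal sketch, assuming a local PHP command line is available; checkCsvForImport() is a hypothetical helper, not a CubeCart function. It strips a leading UTF-8 BOM, because readers that do not expect one (PHP's own fgetcsv(), for example) keep it glued to the first column heading; whether CubeCart's importer strips it is not covered in this thread.

```php
<?php
/**
 * Hypothetical pre-flight check for a CSV intended for import:
 * remove a leading UTF-8 BOM and confirm the rest is valid UTF-8.
 */
function checkCsvForImport(string $bytes): string
{
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        $bytes = substr($bytes, 3);
    }
    // With the /u modifier, preg_match() returns false for invalid UTF-8.
    if (preg_match('//u', $bytes) !== 1) {
        throw new InvalidArgumentException(
            'File is not valid UTF-8; re-save it as UTF-8 before importing.'
        );
    }
    return $bytes;
}
```

A file saved by Excel in a legacy code page such as Windows-1251 fails this check, which is an early warning that the Russian names would arrive garbled.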

But see whether your version lets you do this:

To save a tab-delimited, UTF-8 encoded text file in Excel:

    Choose File->Save as from the menu.
    In the 'Save as type' dropdown, select 'Text (Tab delimited) (*.txt)' (also try CSV).
    If your version has it, select 'Web Options' in the 'Tools...' dropdown at the bottom of the dialog box.
    Select the 'Encoding' tab.
    In the 'Save this document as:' dropdown, select 'Unicode (UTF-8)'.

Allow me to make a sincere suggestion: if you are able (that is, not prohibited by company policy), get and install LibreOffice. It's free. LibreOffice is much better at dealing with world-standard file formats, and is able to handle Excel formats.

 

You might also be interested in this:
https://jaimonmathew.wordpress.com/2011/08/23/excel_addin_to_work_with_unicode_csv/

 
