CSV S3 file parsing incompatible character encodings: ASCII-8BIT and UTF-8

Hi, I’m trying to parse a CSV file uploaded to S3 via CarrierWave and I’m getting an “incompatible character encodings: ASCII-8BIT and UTF-8” error.
In my Resque job I had to call CarrierWave::Storage::Fog::File#read on the Fog file I get back from S3, since it wouldn’t let me use the file as it was (calling #read gives me a string back).
But when I then call CSV.parse(@file.force_encoding('UTF-8'), :encoding => 'utf-8'), passing that string (@file in this case), I get the above error. Could someone give me a clue on how to resolve it?
Thanks.

Hi @alexbush, in the Ruby files that deal with the CSV, you can try adding a magic comment at the top, like so:

# encoding: utf-8

That should do the trick :slight_smile: Hope this helps!

Nope, that was the first thing I tried. The problem is not in the characters I type myself in a Ruby file but in the file I parse after downloading it from S3 storage via the CarrierWave/Fog gems.

I found a workaround which is… not that perfect…

temp_file.write file.read.encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => '?')

before I upload the file to S3. This way the stored file is at least UTF-8, so it doesn’t break when I later retrieve it from S3, but every character that can’t be converted gets replaced with '?'.
Is there any way around that, or better, a proper way to convert the file to UTF-8?
Oh, and I don’t know ahead of time what encoding the user will upload :frowning:
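
One possible approach when the upload's encoding is unknown is to try a short list of likely encodings and keep the first one in which the bytes are valid, falling back to '?' replacement only as a last resort. The following is a minimal sketch and not a definitive solution: the to_utf8 helper and the CANDIDATE_ENCODINGS list are hypothetical names invented for illustration, and the choice and order of candidates is an assumption that should be adapted to the files users actually upload. Single-byte encodings such as ISO-8859-1 accept any byte sequence, so this is a best guess rather than real detection.

```ruby
require 'csv'

# Hypothetical candidate list (an assumption): order matters, because
# ISO-8859-1 accepts every byte and therefore always "succeeds".
CANDIDATE_ENCODINGS = ['UTF-8', 'Windows-1252', 'ISO-8859-1'].freeze

# Hypothetical helper: returns a UTF-8 copy of raw, guessing the source encoding.
def to_utf8(raw, candidates = CANDIDATE_ENCODINGS)
  candidates.each do |name|
    # Retag a copy of the bytes without changing them.
    attempt = raw.dup.force_encoding(name)
    next unless attempt.valid_encoding?

    begin
      # Transcode to UTF-8; raises if a byte has no mapping in this encoding.
      return attempt.encode('UTF-8')
    rescue EncodingError
      next
    end
  end

  # Last resort: treat the data as binary and replace every non-ASCII byte.
  raw.dup.force_encoding('BINARY')
     .encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
end

# Example: Latin-1 bytes, tagged as binary the way a raw read might return them.
latin1_bytes = "caf\xE9,1\n".b
rows = CSV.parse(to_utf8(latin1_bytes))
```

Here the bytes are not valid UTF-8, so the sketch falls through to Windows-1252 and transcodes from there, which keeps the accented character instead of turning it into '?'.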

Maybe you can look around or ask in the carrierwave github issues if this is a known bug?

This isn’t a CarrierWave issue; I get the same problem if I parse the file right away without uploading it to S3. It’s an encoding issue.
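
For reference, this class of error can be reproduced with no S3, CarrierWave, or CSV involved. The sketch below uses made-up sample strings and assumes the failure comes from combining a binary-tagged string with a UTF-8 string when both contain non-ASCII bytes. The exact wording of the error message varies between Ruby versions, so the sketch only inspects the exception class.

```ruby
# Bytes tagged ASCII-8BIT (binary), as a raw read might return them.
raw = "caf\xC3\xA9".b
# A UTF-8 string that also contains a non-ASCII character.
utf8 = "na\u00EFve"

# Combining the two raises, because Ruby cannot pick a common encoding.
error = begin
  raw + utf8
  nil
rescue Encoding::CompatibilityError => e
  e
end
puts error.class

# Retagging a copy as UTF-8 works only if the bytes really are valid UTF-8.
retagged = raw.dup.force_encoding('UTF-8')
puts retagged.valid_encoding?

joined = retagged + utf8
puts joined.encoding
```

If valid_encoding? returns false after retagging, the bytes are in some other encoding, and force_encoding('UTF-8') alone only relabels them without converting anything.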