Fixes unicode support on gtfs by fserb · Pull Request #15 · bmander/gtfs

fserb · 2010-12-07T01:24:36Z

Hey,
I was trying to use GTFS to parse some UTF-8 data, and it was failing with a weird UnicodeEncodeError.
I traced this down to two factors:

1. unmapped_entities.py was converting string attributes with str(), which tries to encode any unicode value as 'ascii'.
2. csv.reader doesn't handle unicode input very well.
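
To illustrate the first factor, here is a minimal sketch of the failure. In Python 2, calling str() on a unicode object implicitly encodes it with the ASCII codec; the explicit .encode("ascii") below reproduces that step so the snippet behaves the same on Python 2 and 3. The helper name to_ascii_bytes and the sample stop name are hypothetical and are not taken from the gtfs code.

```python
def to_ascii_bytes(value):
    # Hypothetical helper: mimics what Python 2's str(unicode_value) does
    # implicitly, i.e. encode the text with the ASCII codec.
    return value.encode("ascii")


ascii_name = u"Main St"
utf8_name = u"S\u00e3o Paulo"  # contains a non-ASCII character

# Pure-ASCII text converts without trouble.
ascii_result = to_ascii_bytes(ascii_name)

# Non-ASCII text raises UnicodeEncodeError, the error seen when parsing the feed.
try:
    to_ascii_bytes(utf8_name)
    failed = False
except UnicodeEncodeError:
    failed = True
```

Keeping such values as unicode instead of coercing them to a byte string avoids the implicit ASCII encode altogether.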

My first commit changes the test data so that one entry in Stops contains UTF-8 characters, which breaks the tests.

My second commit fixes both issues and makes the tests pass again:
To fix 1, I've made a special case for str in unmapped_entities.py so that it converts to unicode() instead of str().
To fix 2, I've created a unicode_csv_reader function that wraps csv.reader and codecs.iterdecode. The steps here are a bit annoying: iterdecode() the input from UTF-8, encode it back to UTF-8 so csv.reader is fine with it, then take the output of csv.reader and decode it from UTF-8 again, so the final values are unicode.
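
The round trip described for fix 2 can be sketched roughly as follows. This is not the code from the commit, only an illustration under a few assumptions: the input is an iterable of UTF-8 byte lines, and the encoding parameter and the PY2 flag are additions made so the sketch runs on both Python 2 and 3. On Python 3, csv.reader accepts text directly, so the encode/decode steps are only needed on the Python 2 branch.

```python
import codecs
import csv
import sys

PY2 = sys.version_info[0] == 2


def unicode_csv_reader(byte_lines, encoding="utf-8", **kwargs):
    """Yield CSV rows as lists of unicode strings (illustrative sketch)."""
    # Step 1: decode the raw byte lines to unicode text.
    text_lines = codecs.iterdecode(byte_lines, encoding)
    if PY2:
        # Step 2: Python 2's csv.reader wants byte strings, so encode back to UTF-8.
        utf8_lines = (line.encode("utf-8") for line in text_lines)
        for row in csv.reader(utf8_lines, **kwargs):
            # Step 3: decode every field so callers get unicode again.
            yield [cell.decode("utf-8") for cell in row]
    else:
        # Python 3's csv.reader works on text, so no round trip is needed.
        for row in csv.reader(text_lines, **kwargs):
            yield row


# Usage with two in-memory lines standing in for a stops.txt file.
sample = [b"stop_id,stop_name\n", u"S1,S\u00e3o Paulo\n".encode("utf-8")]
rows = list(unicode_csv_reader(sample))
```

Decoding first and re-encoding to UTF-8 means the source file could be in another encoding while csv.reader always sees UTF-8 bytes on Python 2.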

Thanks for your attention,
[]s
F.

Lawouach · 2013-12-27T17:22:33Z

Any chance this is fixed at some point?

fserb added 2 commits December 7, 2010 02:10

New test data with unicode strings

1572df1

fixed unicode support

e039074

fserb wants to merge 2 commits into bmander:master from fserb:master
