Conversation
|
That's concerning. Did this happen to you? Is there an example somewhere? This is complicated a bit because people can supply their own model files. We'd need to not break compatibility. |
|
Yep, I can't figure out what's behind it though. It occasionally happens on some CentOS 8 hosts where everything else looks fine. It boils down to this call sometimes throwing a value error and sometimes not: That's a good call with compatibility, this change should really come with gzip support too. |
| # Build a cost dictionary, assuming Zipf's law and cost = -math.log(probability). | ||
| with gzip.open(word_file) as f: | ||
| words = f.read().decode().split() | ||
| if check_magic(word_file, FileTypeMagicBytesRe.BZIP_FILE): |
There was a problem hiding this comment.
since we have the filename, why not just check the extension?
There was a problem hiding this comment.
Just in case there's any discrepancy between the file extension and its type. Although that really depends on what's the expected behavior if it's fed bad inputs.
Should it:
- Accept the extension and just throw out errors if it fails to process?
- Ignore the extension and just process it best it can?
- Ignore the extension, but pop out a warning if the extension doesn't match?
Gzip occasionally has issues with the underlying zlib library. It's a bit difficult to reproduce, but errors look like:
Swapping out the gzip file with bzip should be a harmless option to prevent those
bzip generation steps:
tests: