Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix handling of invalid UTF-8 byte sequences in Ruby 1.9+.
This pull request is a proposal to fix github-linguist#830, and closes github-linguist#829. Basically: - explicitly convert text to UTF-8, replacing invalid characters prior to spitting into lines and parsing. See changes in [blob_helper.rb](https://github.com/pullreq/linguist/blob/master/lib/linguist/blob_helper.rb). - Adds the test case (from LLVM's [lit](http://llvm.org/docs/CommandGuide/lit.html) testsuite) as [samples/Python/invalid-encoding.py](https://github.com/pullreq/linguist/blob/samples/Python/invalid-encoding.py). Tested with Ruby 1.8.7p358 and 2.0.0p353 on Darwin.
- Loading branch information