taspinar committed May 6, 2018
2 parents fb531dd + 9a3b3bf commit b6dc916
Showing 1 changed file with 4 additions and 5 deletions.
9 changes: 4 additions & 5 deletions README.rst
@@ -171,21 +171,20 @@ contents of the output file will look like:
---------------------------

In order to correctly handle all possible characters in the tweets
(think of chinese or arabic characters), the output is saved as utf-8
(think of Japanese or Arabic characters), the output is saved as utf-8
encoded bytes. That is why you could see text like
"":raw-latex:`\u3`0b1:raw-latex:`\u3`0f3:raw-latex:`\u3`055:raw-latex:`\u3`07e:raw-latex:`\u3`0fe
..." in the output file.
"\u30b1 \u30f3 \u3055 \u307e \u30fe ..." in the output file.

What you should do is open the file with the proper encoding:

.. figure:: https://user-images.githubusercontent.com/4409108/30702318-f05bc196-9eec-11e7-8234-a07aabec294f.PNG
:alt: Example of output with chinese characters

Example of output with chinese characters
Example of output with Japanese characters
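
As a rough illustration of the encoding note above, the sketch below assumes the output file is JSON written by Python's ``json`` module, which escapes non-ASCII characters as ``\uXXXX`` by default. The file name ``tweets.json`` and the ``text`` field are hypothetical stand-ins, not taken from the project.

```python
import json
import os
import tempfile

# Hypothetical sample record; the "text" field name is an assumption.
sample = [{"text": "\u30b1\u30f3\u3055\u307e"}]

# Hypothetical output path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "tweets.json")

# json.dump escapes non-ASCII characters by default (ensure_ascii=True),
# so the file on disk contains literal \uXXXX sequences.
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f)

# Reading the file as plain text shows the escaped form.
with open(path, encoding="utf-8") as f:
    raw = f.read()

# Parsing the file as JSON turns the escapes back into real characters.
with open(path, encoding="utf-8") as f:
    tweets = json.load(f)

decoded_text = tweets[0]["text"]
print(raw)
print(decoded_text)
```

Under these assumptions, the escaped sequences are only a storage detail: any JSON parser, or a viewer set to UTF-8 that understands JSON escapes, should show the original characters.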

TO DO
=====

- Twitterscraper can not retrieve retweets.
- Add caching potentially? Would be nice to be able to resume scraping
if something goes wrong and have half of the data of a request cached
or so.