weibospider API for original articles or more
- this is a simple spider to "m.weibo.cn" which is much easier than "weibo.com" without login.
- for now, it can spider original articles of a user. for more functions, you can modify it according to your purpose.
- only you need to do is to find out weibo API entrance.like this "https://m.weibo.cn/api/container/getIndex?&type=uid&value=54147469&containerid=16035414746957&page=0"(you cant open it for i've modifie this address)
- this project is based on python 2.7 in windows system, Scrapy frame. but python3.x is ok i guess.
- download this project, and put it into your IDE. (e.g.Pycharm)
- firstly, you need modify spider.py. the "start urls"(who do you want to spider?), "page"(how many pages you want to spider) and "parse_item"(what item you want to spider?).
- second, modify items.py according to formerly modified "parse_item".
- finally, modify pipeline.py, to tell where you want to put your items. i write items into json file. you can input these into Mysql or Mongodb whatever.
- actually, this project is just an idea to find an easier way to spider weibo.com. unfortunately, The API entrance is always not easy to disclose.
- if you can, more items are approachable such as user information, news, hot incidents and so on.
- if you like this project, please star it, thanks.