This repository has been archived by the owner on Jun 12, 2019. It is now read-only.

Scrapy is not scraping json items #9

Open
data1111 opened this issue Aug 6, 2017 · 5 comments

Comments

@data1111

data1111 commented Aug 6, 2017

Hi there,

First of all, thanks for developing this code.

I'm having trouble with Scrapy and the JSON items. I got it to scrape the pages I wanted, but when I open the CSV file it only contains the URLs, not the other items. What do you suggest?

Cheers

@Pant76

Pant76 commented Aug 24, 2017

Hi, same problem here!

@ikedaandre

It appears that Airbnb no longer sends a JSON payload with the necessary information. To make the scraper work now, you will have to update the locators so they extract the information from the HTML (using XPath or CSS selectors). You will also need Splash, because some of the elements are not loaded when the page is requested by Scrapy alone.

@griffadamus

I'm having the same issue. I have no idea how to implement the update that ikedaandre suggests. Any help would be awesome.

@evagian

evagian commented Jan 24, 2018

Me too. This is the output:

instant_book,satisfaction_guest,rating_checkin,bed_type,person_capacity,accuracy_rating,rating_communication,room_type,hosting_id,url,amenities,rev_count,cancel_policy,rating_cleanliness,nightly_price,host_id,response_rate,price,response_time
,,,,,,,,,https://www.airbnb.com/rooms/993348?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/661755?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/2107937?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/659712?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/17428493?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/15064259?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/10983314?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/3455118?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/526402?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/2610077?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/283638?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/5283277?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/2670085?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/14349663?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/7027819?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/12783254?location=greece,,,,,,,,,
,,,,,,,,,https://www.airbnb.com/rooms/1192594?location=greece,,,,,,,,,
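One way to confirm that the crawl itself works and only the field extraction fails is to list the rows of the exported CSV that contain nothing but a URL, as in the output above. This is a minimal standard-library sketch; `rows_missing_data` is a hypothetical helper name, and it assumes the export has a `url` column.

```python
import csv
import io


def rows_missing_data(csv_text, key_field="url"):
    """Return the key_field value of every row whose other columns are all empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    empty = []
    for row in reader:
        # DictReader fills short rows with None and puts overflow under the key None
        others = [v for k, v in row.items() if k != key_field and k is not None]
        if not any((v or "").strip() for v in others):
            empty.append(row[key_field])
    return empty
```

If every row is returned, the spider is reaching the listing pages but none of its locators match anything, which is consistent with the page no longer carrying the expected JSON.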

@ikedaandre

What I would recommend is using Splash + Scrapy (if you search for Splash with Scrapy, there is enough documentation on how to set it up properly). Once Splash and Scrapy are set up, use CSS selectors to get the data from the pages, since there is no longer a convenient .json to pull the data from.
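For reference, the project settings that the scrapy-splash README asks you to add look roughly like this. The `SPLASH_URL` value is an assumption: it presumes a Splash instance running locally on its default port 8050 (for example started with `docker run -p 8050:8050 scrapinghub/splash`).

```python
# settings.py (fragment): scrapy-splash wiring as listed in the scrapy-splash README.
# ASSUMPTION: Splash is running locally on its default port 8050.
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    # Priority changed per the README so Splash responses can be post-processed
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Make duplicate filtering and the HTTP cache aware of Splash arguments
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

With these settings in place, the spider would yield `scrapy_splash.SplashRequest(url, self.parse, args={"wait": 0.5})` instead of a plain `scrapy.Request`, so the page is rendered before the CSS selectors run; the `wait` value here is only a starting guess to tune.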

Hopefully these links help with the setup:

https://github.com/scrapy-plugins/scrapy-splash

https://blog.scrapinghub.com/2015/03/02/handling-javascript-in-scrapy-with-splash/

Cheers


5 participants