
A crawler system built for crawling room/apartment listings


otchoo/room_crawler


Room crawler

The core function of Room Crawler System is to crawl room information from 2 sites, “muabannhadat.vn” and “nhadat24h.net”, and it can be extended to support more sites.

The system helps people collect large amounts of data about rooms for sale or for rent listed on these websites.
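Because the system is meant to be extended to more sites, one way to organize per-site crawlers is a registry keyed by domain. The sketch below illustrates that idea in plain Ruby under stated assumptions: `SiteCrawler`, `crawler_for`, the `Room` fields, and the raw listing keys are all hypothetical, and each `parse` receives an already-extracted Hash rather than an HTML page (the project itself uses Mechanize and Nokogiri for fetching and parsing).

```ruby
# Hypothetical normalized room record shared by all site crawlers.
Room = Struct.new(:provider, :code, :city, :district, :address, :area, :price, keyword_init: true)

# Hypothetical base class: each supported site subclasses it and registers its domain.
class SiteCrawler
  @registry = {}

  class << self
    attr_reader :registry

    # Records the calling subclass under +domain+ in the shared registry.
    def register(domain)
      SiteCrawler.registry[domain] = self
    end

    # Looks up the crawler class for a domain; unknown sites raise ArgumentError.
    def crawler_for(domain)
      SiteCrawler.registry.fetch(domain) { raise ArgumentError, "unsupported site: #{domain}" }
    end
  end

  # Subclasses convert one raw listing into a Room.
  def parse(_raw)
    raise NotImplementedError, "#{self.class} must implement #parse"
  end
end

# One subclass per site; the raw key names are made up for illustration.
class MuabannhadatCrawler < SiteCrawler
  register "muabannhadat.vn"

  def parse(raw)
    Room.new(provider: "muabannhadat.vn", code: raw[:id].to_s,
             city: raw[:city], district: raw[:district], address: raw[:address],
             area: raw[:area].to_f, price: raw[:price].to_i)
  end
end

class Nhadat24hCrawler < SiteCrawler
  register "nhadat24h.net"

  def parse(raw)
    location = raw[:location]
    Room.new(provider: "nhadat24h.net", code: raw[:listing_code].to_s,
             city: location[:city], district: location[:district], address: location[:street],
             area: raw[:size_m2].to_f, price: raw[:price_vnd].to_i)
  end
end
```

With this layout, adding a third site would mean one new subclass that calls `register` and implements `parse`, without touching the existing crawlers.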

Technologies used in the project:

  • Ruby on Rails, HTML5/CSS3, Bootstrap
  • Mechanize and Nokogiri gems
  • MongoDB
  • GitHub for version control

Getting started

Set up the development environment:

Usage

Set up local environment variables

Set up the account for basic authentication:

#config/local_env.yml

BASIC_AUTHEN_USERNAME: username
BASIC_AUTHEN_PASSWORD: password 
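A common Rails idiom loads config/local_env.yml into `ENV` at boot (for example from `config/application.rb`); whether this project does it exactly that way is an assumption. The sketch below shows that loading step plus a hand-rolled check of an HTTP Basic `Authorization` header against the two variables. `load_local_env`, `secure_equal?`, and `basic_auth_ok?` are hypothetical names; in a Rails controller the built-in `http_basic_authenticate_with` would normally handle the header check.

```ruby
require "yaml"

# Hypothetical helper: copies each key/value pair from a YAML file such as
# config/local_env.yml into ENV. A missing file is skipped, so real
# environment variables can be used instead (e.g. in production).
def load_local_env(path)
  return unless File.exist?(path)

  (YAML.safe_load(File.read(path)) || {}).each do |key, value|
    ENV[key.to_s] = value.to_s
  end
end

# Hypothetical helper: compares two strings without stopping at the first
# differing byte. Lengths are checked up front, so only equal-length inputs
# reach the byte-by-byte comparison.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize

  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Hypothetical helper: validates an "Authorization: Basic <base64(user:pass)>"
# header value against BASIC_AUTHEN_USERNAME / BASIC_AUTHEN_PASSWORD.
def basic_auth_ok?(header)
  expected_user = ENV.fetch("BASIC_AUTHEN_USERNAME", "")
  expected_pass = ENV.fetch("BASIC_AUTHEN_PASSWORD", "")
  return false if expected_user.empty? || expected_pass.empty? # refuse if unconfigured

  scheme, encoded = header.to_s.split(" ", 2)
  return false unless scheme == "Basic" && encoded

  user, pass = encoded.unpack1("m").split(":", 2) # "m" = Base64 decode
  return false unless user && pass

  # Non-short-circuit & so both comparisons always run.
  secure_equal?(user, expected_user) & secure_equal?(pass, expected_pass)
end
```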

Install the gems

bundle install

Bundler connects to https://rubygems.org (and any other sources declared in the Gemfile), then resolves and installs the gems that satisfy the version requirements specified there.

Crawling

Open a terminal and run the commands below to crawl rooms from the external sites:

#crawl rooms from muabannhadat.vn
rake crawler:rooms:crawl_from_muabannhadat

#crawl rooms from nhadat24h.net
rake crawler:rooms:crawl_from_nhadat24h

Check log/crawling_development.log for the crawling log.
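The two commands follow Rake's `namespace:task` naming. A minimal standalone sketch of how such tasks could be declared is below; only the task names and the log path come from this README. `crawl_site` and `crawling_logger` are hypothetical helpers, and `crawl_site` is a placeholder for the real Mechanize/Nokogiri crawl. In the Rails app the tasks would more likely live under `lib/tasks/` and depend on `:environment` to load the models, which is omitted here so the sketch runs on its own.

```ruby
require "rake"
require "logger"
require "fileutils"

# Hypothetical placeholder for the real per-site crawl (fetch + parse + save).
def crawl_site(domain, logger)
  logger.info("start crawling #{domain}")
  rooms = [] # real code would fetch listing pages and parse rooms here
  logger.info("finished #{domain}: #{rooms.size} rooms")
  rooms
end

# Hypothetical helper: one shared logger writing to the README's log path.
def crawling_logger
  @crawling_logger ||= begin
    FileUtils.mkdir_p("log")
    Logger.new("log/crawling_development.log")
  end
end

namespace :crawler do
  namespace :rooms do
    desc "Crawl rooms from muabannhadat.vn"
    task :crawl_from_muabannhadat do
      crawl_site("muabannhadat.vn", crawling_logger)
    end

    desc "Crawl rooms from nhadat24h.net"
    task :crawl_from_nhadat24h do
      crawl_site("nhadat24h.net", crawling_logger)
    end
  end
end
```

Saved as a `Rakefile`, `rake crawler:rooms:crawl_from_muabannhadat` would run the first task and append its messages to log/crawling_development.log.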

Start server

Start the web server. Since Rails 5.0, Puma is the default web server:

rails server

View

Room list view: http://localhost:3000/rooms

Room detail view: http://localhost:3000/rooms/:room_id

Searching

Search rooms by any of these 5 conditions:

  • provider site
  • code
  • city, district, or address
  • area
  • price
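The five search conditions can be combined as optional filters. The sketch below filters an in-memory array in plain Ruby; the `Room` fields, the `search_rooms` name, and the use of ranges for area and price are assumptions, and the real app presumably runs an equivalent query against MongoDB instead.

```ruby
# Hypothetical room record; fields mirror the five search conditions.
Room = Struct.new(:provider, :code, :city, :district, :address, :area, :price, keyword_init: true)

# Hypothetical search helper: every condition is optional (nil = not used).
# +location+ is a case-insensitive substring matched against city, district,
# or address; +area+ and +price+ are Ranges such as 20..40.
def search_rooms(rooms, provider: nil, code: nil, location: nil, area: nil, price: nil)
  needle = location&.downcase
  rooms.select do |room|
    (provider.nil? || room.provider == provider) &&
      (code.nil? || room.code == code) &&
      (needle.nil? || [room.city, room.district, room.address].compact.any? { |f| f.downcase.include?(needle) }) &&
      (area.nil? || area.cover?(room.area)) &&
      (price.nil? || price.cover?(room.price))
  end
end
```

Passing `nil` for a condition skips it, so any subset of the five conditions works, matching "any of 5 conditions" above.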
