Distributed web crawler/scraper with:
- CMS detection
- Plugin-based architecture
- Version and plugin detection
- Maintains historic versions
- Light subdomain brute-forcer (common subdomains, e.g. blog., store.)
- Subdomain scraper (1 level deep)
- Light subdirectory brute-forcer (common directories, e.g. /blog, /wp)
- Web interface for results and monitoring
- Static website to view results
- Updated with daily statistics
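
As a rough illustration of how plugin-based CMS and version detection might be organised, the sketch below keeps a registry of detector functions and runs each one against a fetched page. The names `DETECTORS`, `register`, `detect_wordpress` and `detect_cms` are hypothetical and not taken from this project, and the WordPress checks are simple heuristics rather than the project's actual detection logic.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical registry: plugin name -> detector function.
# A detector returns a version string, "unknown", or None if the CMS is absent.
DETECTORS: Dict[str, Callable[[str], Optional[str]]] = {}


def register(name: str):
    """Decorator that adds a detector function to the registry."""
    def wrap(func):
        DETECTORS[name] = func
        return func
    return wrap


@register("wordpress")
def detect_wordpress(html: str) -> Optional[str]:
    # Heuristic 1: a generator meta tag, which may carry the version.
    match = re.search(
        r'<meta\s+name="generator"\s+content="WordPress\s*([\d.]*)"',
        html,
        re.I,
    )
    if match:
        return match.group(1) or "unknown"
    # Heuristic 2: asset paths typical of WordPress, version not known.
    if "/wp-content/" in html:
        return "unknown"
    return None


def detect_cms(html: str) -> Dict[str, str]:
    """Run every registered plugin and collect the ones that matched."""
    results = {}
    for name, detector in DETECTORS.items():
        version = detector(html)
        if version is not None:
            results[name] = version
    return results
```

Under this layout, supporting another CMS would only require a new function decorated with `@register(...)`; the crawler loop itself stays unchanged.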
- Prototype Django application using Celery
- Initial architecture for plugin-based detection
- Initial backend configuration
- Third-party CI integration
- Second iteration
- Move GitHub site to Jekyll
- Container (Docker) build
- Third iteration
- Subdomain discovery
- Subdirectory discovery
- Limit crawling of subnets (e.g. 5 min wait per /24)
- Bug fixes
- Test cases
- Distributed architecture
- MongoDB cluster
- Network hardening
- Better Jekyll website
- Statistics, graphs, text search
- Elasticsearch integration
- TBD
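
The roadmap item on limiting crawling of subnets (e.g. a 5 minute wait per /24) could be approached along the following lines. This is a minimal in-memory sketch: the class name `SubnetThrottle` and its methods are hypothetical, it handles IPv4 only, and a distributed deployment would need to keep the timestamps in a shared store rather than in a local dictionary.

```python
import ipaddress
import time
from typing import Dict, Optional


class SubnetThrottle:
    """Allow at most one crawl per subnet within a waiting period."""

    def __init__(self, wait_seconds: float = 300.0, prefix: int = 24):
        self.wait_seconds = wait_seconds  # 300 s = the 5 minute example
        self.prefix = prefix              # 24 = group hosts by /24
        self._last_seen: Dict[ipaddress.IPv4Network, float] = {}

    def subnet_of(self, ip: str) -> ipaddress.IPv4Network:
        # strict=False masks the host bits, so 192.0.2.17 maps to 192.0.2.0/24.
        return ipaddress.ip_network(f"{ip}/{self.prefix}", strict=False)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Return True and record the visit if the subnet is not cooling down."""
        now = time.monotonic() if now is None else now
        net = self.subnet_of(ip)
        last = self._last_seen.get(net)
        if last is not None and now - last < self.wait_seconds:
            return False
        self._last_seen[net] = now
        return True
```

A worker could call `allow(ip)` before fetching a host and requeue the job when it returns `False`.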
- Start the RabbitMQ server
$ rabbitmq-server
- Start the worker process
$ DJANGO_SETTINGS_MODULE='cmspyder.settings.dev' celery -A cmspyder worker --concurrency=50 --pool=eventlet -Q cmspyder_detect_cms_queue,cmspyder_discover_domains_queue
- Start the master process
$ python manage.py runserver 0.0.0.0:8080 --settings=cmspyder.settings.dev
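
For context, the two queue names passed to `-Q` in the worker step would typically be tied to tasks through a routing table in the Django settings. The fragment below is only a sketch under several assumptions: the task paths are invented for illustration, and the setting name `CELERY_TASK_ROUTES` assumes Celery 4 or later loaded with the `CELERY` settings namespace (older releases used `CELERY_ROUTES`).

```python
# Hypothetical excerpt from a settings module such as cmspyder/settings/dev.py.
CELERY_TASK_ROUTES = {
    # Task paths below are assumed names, not taken from the project.
    "cmspyder.tasks.detect_cms": {"queue": "cmspyder_detect_cms_queue"},
    "cmspyder.tasks.discover_domains": {
        "queue": "cmspyder_discover_domains_queue"
    },
}

# Comma-separated queue list in the form the worker's -Q option expects
# (no spaces between names).
WORKER_QUEUES = ",".join(
    sorted({route["queue"] for route in CELERY_TASK_ROUTES.values()})
)
```

Note that the value given to `-Q` must not contain spaces, otherwise the shell passes the second queue name as a separate argument.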