forked from vezaynk/Sitemap-Generator-Crawler
sitemap.config.php
<?php
/*
Sitemap Generator by Slava Knyazev. Further acknowledgements in the README.md file.
Website: https://www.knyz.org/
I also live on GitHub: https://github.com/knyzorg
Contact me: [email protected]
*/
// Make sure to use the latest revision by downloading from GitHub: https://github.com/knyzorg/Sitemap-Generator-Crawler
/* Usage
Usage is pretty straightforward:
- Configure the crawler by editing this file.
- Select the file to which the sitemap will be saved
- Select the URL to crawl
- Configure blacklists; wildcards are accepted (example: http://example.com/private/* and *.jpg)
- Generate the sitemap
- Either send a GET request to this script or run it from the command line (refer to the README file)
- Submit to Google
- Set up a cron job to execute this script every so often
It is recommended you don't remove the above for future reference.
*/
// Default site to crawl
$site = "https://www.knyz.org/";
// Default sitemap filename
$file = "sitemap.xml";
// File permissions for the generated sitemap (octal)
$permissions = 0644;
// Depth of the crawl, 0 is unlimited
$max_depth = 0;
// Show changefreq
$enable_frequency = false;
// Show priority
$enable_priority = false;
// Default values for changefreq and priority
$freq = "daily";
$priority = "1";
// Add lastmod based on server response. Unreliable and disabled by default.
$enable_modified = false;
// Disable this for a misconfigured but otherwise tolerable SSL server.
$curl_validate_certificate = false;
// Pages matching these patterns will be excluded from the crawl and the sitemap.
// Use it to exclude non-HTML files to increase performance and save bandwidth.
$blacklist = array(
"*.jpg",
"*/secrets/*",
"https://www.knyz.org/supersecret"
);
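The crawler's matching code is not part of this file, but shell-style wildcard patterns like the ones above can be tested with PHP's built-in `fnmatch()` (a sketch of the idea; the crawler may implement matching differently, and the URLs below are illustrative examples only):

```php
<?php
// Sketch only: check a URL against blacklist patterns using fnmatch(),
// which supports the same *-style wildcards shown in $blacklist above.
// (Assumption: the crawler's real matching logic may differ.)
function is_blacklisted(string $url, array $blacklist): bool
{
    foreach ($blacklist as $pattern) {
        // With default flags, '*' in fnmatch() also matches '/' in URLs.
        if (fnmatch($pattern, $url)) {
            return true;
        }
    }
    return false;
}

$blacklist = ["*.jpg", "*/secrets/*", "https://www.knyz.org/supersecret"];

var_dump(is_blacklisted("https://www.knyz.org/photo.jpg", $blacklist)); // true
var_dump(is_blacklisted("https://www.knyz.org/about", $blacklist));     // false
```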
// Enable this to ignore GET arguments in URLs. Leave it disabled if your site requires GET arguments to function.
$ignore_arguments = false;
// Not yet implemented. See issue #19 for more information.
$index_img = false;
// Index PDFs
$index_pdf = true;
// Set the user agent for crawler
$crawler_user_agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
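The crawler's request code lives elsewhere in the project; the sketch below only illustrates how settings such as `$crawler_user_agent` and `$curl_validate_certificate` are typically applied to a cURL handle (the option choices here are assumptions, not the crawler's actual code):

```php
<?php
// Sketch only: applying the configured user agent and SSL-validation
// toggle to a cURL handle. Not the crawler's actual request code.
$crawler_user_agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
$curl_validate_certificate = false;

$ch = curl_init("https://www.knyz.org/");
curl_setopt($ch, CURLOPT_USERAGENT, $crawler_user_agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// When validation is off, accept misconfigured SSL hosts.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, $curl_validate_certificate);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, $curl_validate_certificate ? 2 : 0);
// $html = curl_exec($ch);  // perform the request
curl_close($ch);
```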
// Header of the sitemap.xml
$xmlheader = '';
// Optionally configure debug options
$debug = array(
"add" => true,
"reject" => false,
"warn" => false
);
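These flags presumably gate which crawl events are logged; the crawler's real logging code is not part of this file. A hypothetical helper (names and behavior are assumptions for illustration) shows the idea:

```php
<?php
// Hypothetical helper (not from the crawler itself): emit a message
// only when its event type is enabled in the $debug array, and report
// whether anything was printed.
function log_event(array $debug, string $type, string $message): bool
{
    if (empty($debug[$type])) {
        return false; // this event type is silenced
    }
    echo "[$type] $message\n";
    return true;
}

$debug = ["add" => true, "reject" => false, "warn" => false];

log_event($debug, "add", "https://www.knyz.org/ added to sitemap");   // printed
log_event($debug, "reject", "https://www.knyz.org/pic.jpg rejected"); // silenced
```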
// Modify only if the configuration version is broken
$version_config = 2;