Merge branch 'help'
osvik committed Oct 21, 2017
2 parents f2a15c2 + 4965ef5 commit 07f50c3
Showing 3 changed files with 91 additions and 1 deletion.
16 changes: 15 additions & 1 deletion README.md
@@ -8,7 +8,7 @@ This **command line** script complements other command line tools like ack, grep

## Download and install

### Install the last version
### Install the latest version

Go to the [releases page](https://github.com/greenpeace/check-my-pages/releases) and download the latest version for your operating system: Windows, Mac, or Linux (64-bit).

@@ -22,6 +22,20 @@ go get github.com/greenpeace/check-my-pages
go install github.com/greenpeace/check-my-pages
```

### Get help

If you downloaded and installed the latest version from the [releases page](https://github.com/greenpeace/check-my-pages/releases) do:

```
./check-my-pages --help
```

If you installed from source do:

```
check-my-pages --help
```

## File with list of urls

The URLs file, `urls.csv` by default, must contain all the URLs you want to check. You can use a text file with one URL per line, or a CSV file with the URLs in the first column and no headers.
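As a sketch, a minimal `urls.csv` could look like this (the URLs below are placeholders, not real pages to scan):

```
https://www.example.org/page-1
https://www.example.org/page-2
https://www.example.org/images/logo.png
```

A plain text file with the same lines works identically, since only the first column is read.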
6 changes: 6 additions & 0 deletions check-my-pages.go
@@ -13,6 +13,7 @@ import (

func main() {

isHelp := flag.Bool("help", false, "Help")
urlsFileName := flag.String("urls", "urls.csv", "Name of the csv file with the urls in the first column")
isHTTP := flag.Bool("http", false, "Http response codes")
isRedirects := flag.Bool("redirects", false, "Redirects response codes")
@@ -36,6 +37,11 @@ func main() {
c := colly.NewCollector()
// c.AllowedDomains = []string{"localhost", "greenpeace.es", "archivo.greenpeace.es"}

if *isHelp == true {
help()
os.Exit(0)
}

if *isHTTP == true {

httpResponses, httpErr := os.OpenFile("httpResponses.csv", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0600)
70 changes: 70 additions & 0 deletions help.go
@@ -0,0 +1,70 @@
package main

import (
"fmt"
)

func help() {
fmt.Println(`
-------------------
CHECK-MY-PAGES HELP
-------------------
check-my-pages is a scraping script. It checks each URL in a list and creates report files about what was tested. Each file reports on a specific issue and includes the scanned URL together with the result.
EXAMPLES:
./check-my-pages -urls=urls.csv -http -analytics -canonical -redirects -linkpattern -cssjspattern -mediapattern
./check-my-pages -urls=urls.csv -fileinfo -miliseconds=100
CHECKS:
-http : Gets the http response code. If it's 200 it should be OK.
-analytics : Gets the first Google Analytics account.
-canonical : Gets the canonical URL for the url.
-redirects : Gets info about redirects and final URLs.
-linkpattern : Gets links that match the regular expression pattern.
-cssjspattern : Gets CSS and JS URLs that match the regular expression pattern.
-mediapattern : Gets URLs from images, videos, audios, iframes and objects that match the regular expression pattern.
-fileinfo : Special check, more suitable for non-HTML pages (for example images). It needs to be used alone, as in the example above, without other checks.
OPTIONS:
-urls=urls.csv : Sets the file with the urls to scan. Normally a text file with one URL per line or a csv without headers with the urls on the first column.
-pattern='https?://(\w|-)+.greenpeace.org/espana/.+' : Changes the url search pattern to the regular expression. To be used with *pattern checks.
-miliseconds=100 : Sets a delay of 100 milliseconds between requests.
FILES WITH THE REPORTS:
- httpResponses.csv : Stores the http response codes for the URL. 200 means everything is OK.
- analytics.csv : Reports google analytics tracking ID.
- canonicals.csv : Reports the canonical URL for every URL.
- redirects.csv : Reports the requested URL and the final URL. This will be useful to test the redirects in the main site.
- linkpattern.csv : Reports on links that include a regular expression pattern. Useful to track links to specific dead sites. The default pattern can be set by the -pattern option.
- cssjspattern.csv : Reports CSS and JS URLs that include a regular expression pattern. Useful to detect dead CSS and JS URLs in large sites. The pattern can also be defined with the -pattern option (described above).
- mediapattern.csv : Reports media links. Images, videos, audios, iframes and objects. Also use -pattern to define the urls pattern.
`)
}
