Updates Webrecorder tools #69

Open: wants to merge 2 commits into base: main
22 changes: 13 additions & 9 deletions README.md
@@ -1726,11 +1726,11 @@ Google lens is not too user friendly for investigations. But this tool will help
| Link | Description |
| --- | --- |
| [Quick Cache and Archive search](https://quickcacheandarchivesearch.onrender.com/) | Quickly search for old versions of a website across different search engines and archives (21 sources) |
| [Trove](http://trove.nla.gov.au/search/category/websites) | australian web archive |
| [Trove](http://trove.nla.gov.au/search/category/websites) | Australian web archive |
| [Vandal](https://chrome.google.com/webstore/detail/vandal/knoccgahmcfhngbjhdbcodajdioedgdo/related) | extension that makes working with [http://archive.org](http://archive.org) faster, more comfortable, and more efficient. |
| [TheOldNet.com](https://theoldnet.com/) | |
| [Carbon Dating The Web](http://carbondate.cs.odu.edu/) | |
| [Arquivo.pt](https://arquivo.pt/) | |
| [Arquivo.pt](https://arquivo.pt/) | Portuguese web archive |
| [Archive.md](https://archive.md/) | |
| [Webarchive.loc.gov](http://webarchive.loc.gov/) | |
| [Swap.stanford.edu](https://swap.stanford.edu/) | |
@@ -1739,6 +1739,7 @@ Google lens is not too user friendly for investigations. But this tool will help
| [web.archive.bibalex.org](http://web.archive.bibalex.org/) | |
| [Archive.vn](https://archive.vn/) | |
| [UKWA](https://www.webarchive.org.uk/) | archive of more than half a billion saved English-language web pages (data from 2013) |
| [Ghostarchive](https://ghostarchive.org/) | Free web archive that uses ReplayWeb.page for viewing archived sites |

### [](#tools-for-working-with-web-archives)Tools for working with web archives

@@ -1760,16 +1761,19 @@ Google lens is not too user friendly for investigations. But this tool will help



### [](#warc)Tools for working with WARC (WebARChive) files
### [](#warc)Tools for working with WARC (WebARChive) and WACZ (Web Archive Collection Zipped) files

| Link | Description |
| --- | --- |
| [Warcat](https://github.com/chfoo/warcat) | My favorite (because it's the easiest) tool for working with Warc files. It allows you to see the list of files in the archive (command "list") and unpack it (command "extract"). |
| [Replayweb](https://github.com/webrecorder/replayweb.page) | If the warc file is small, you can view its contents with this extreme simple online tool. Also it's possible to deploy ReplayWeb on your own server |
| [Metawarc](https://github.com/datacoon/metawarc) | Allows you to quickly analyze the structure of the warc file and collect metadata from all the files in the archive |
| [Webrecorder tools](https://webrecorder.net/tools) | Archiving various interesting sites is a noble and useful activity for society. To make it easier for posterity to analyze your web archives, save them in Warc format with an online tool|
| [GRAB SITE](https://github.com/ArchiveTeam/grab-site) | If you need to make a Warc archive out of a huge site with a lot of different content, then it is better to use this #python script with dozens of different settings that will optimize the process as much as possible.|
| [har2warc](https://github.com/webrecorder/har2warc) | Convert HTTP Archive (HAR) -> Web Archive (WARC) format|
| [Warcat](https://github.com/chfoo/warcat) | My favorite (because it's the easiest) tool for working with WARC files. It allows you to see the list of files in the archive (command "list") and unpack it (command "extract"). |
| [Browsertrix](https://webrecorder.net/browsertrix/) | Browser-based crawling service that saves websites as WACZ files (containing WARCs). Hosted as SaaS by Webrecorder but can alternatively be self-deployed on your own infrastructure |
| [ArchiveWeb.page](https://webrecorder.net/archivewebpage/) | Create WARC and WACZ files interactively as you navigate sites in your web browser. Good for saving high-fidelity |
| [ReplayWeb.page](https://webrecorder.net/replaywebpage/) | If the WARC file is small, you can view its contents with this extremely simple online tool / desktop app. WACZ files of any size will load much faster due to their built-in index. Also it's possible to deploy ReplayWeb.page on your own server |
| [Metawarc](https://github.com/datacoon/metawarc) | Allows you to quickly analyze the structure of the WARC file and collect metadata from all the files in the archive |
| [warcit](https://github.com/webrecorder/warcit) | Command line utility to convert a local directory containing website files into a WARC file |
| [unwarcit](https://github.com/emmadickson/unwarcit) | Command line utility to convert a WARC or WACZ file to a local directory containing website files |
| [GRAB SITE](https://github.com/ArchiveTeam/grab-site) | If you need to make a WARC archive out of a huge site with a lot of different content, then it is better to use this #python script with dozens of different settings that will optimize the process as much as possible.|
| [har2warc](https://github.com/webrecorder/har2warc) | Convert HTTP Archive (HAR) → Web Archive (WARC) format|
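
Since a WACZ file is a ZIP container holding WARCs, its contents can be inspected with ordinary ZIP tooling before reaching for a dedicated viewer. The following is a minimal sketch, not part of any of the tools listed above: the helper names `list_warc_members` and `build_demo_wacz` are hypothetical, and the `archive/` and `indexes/` directory names are an assumption about the typical WACZ layout. The demo archive contains placeholder data only.

```python
import io
import zipfile


def list_warc_members(wacz_file):
    """Return the names of WARC files stored inside a WACZ (ZIP) archive."""
    with zipfile.ZipFile(wacz_file) as archive:
        return [
            name
            for name in archive.namelist()
            if name.endswith((".warc", ".warc.gz"))
        ]


def build_demo_wacz():
    """Build a tiny in-memory ZIP that mimics a WACZ layout (assumed, placeholder data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("datapackage.json", "{}")      # collection metadata
        archive.writestr("archive/data.warc.gz", b"")   # where WARC data is assumed to live
        archive.writestr("indexes/index.cdx", "")       # the built-in index
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Pass a path to a real .wacz file instead of the demo buffer to inspect it.
    print(list_warc_members(build_demo_wacz()))
```

To inspect a real file, pass its path (for example `list_warc_members("my-crawl.wacz")`, where the filename is a placeholder) instead of the in-memory demo.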


[](#archives-of-documentsnewspapers)Archives of documents/newspapers