
{"id":1396,"date":"2011-05-26T14:20:41","date_gmt":"2011-05-26T17:20:41","guid":{"rendered":"http:\/\/www.talsoft.com.ar\/?page_id=1396"},"modified":"2014-11-21T10:56:04","modified_gmt":"2014-11-21T13:56:04","slug":"web-crawler-security-tool","status":"publish","type":"page","link":"https:\/\/www.talsoft.com.ar\/site\/es\/research\/tools\/web-crawler-security-tool\/","title":{"rendered":"Web Crawler Security Tool"},"content":{"rendered":"<p>The Web Crawler Security Tool is a Python-based crawler that automatically crawls a web site. It is designed to help with penetration testing tasks. Its main task is to find and list all the links (pages and files) in a web site.<\/p>\n<p>The crawler has been completely rewritten in v1.0, bringing many improvements: better data visualization, an interactive option to download files, faster crawling, export of the list of found files to a separate file (useful to crawl a site once, then download the files and\u00a0analyse\u00a0them with FOCA), an output log in Common Log Format (CLF), basic authentication support, and more!<\/p>\n<p>Many of the old features have been reimplemented, and the most interesting one is the crawler&#8217;s ability to search for directory indexing.<\/p>\n<p>The current stable version is 1.0.<\/p>\n<p>The main features:<\/p>\n<ul>\n<li>Crawls http and https web sites (even web sites not using common ports).<\/li>\n<li>Uses regular expressions to find &#8216;href&#8217;, &#8216;src&#8217; and &#8216;content&#8217; links.<\/li>\n<li>Identifies relative links.<\/li>\n<li>Identifies non-html files and shows them.<\/li>\n<li>Does not crawl non-html files.<\/li>\n<li>Identifies directory indexing.<\/li>\n<li>Crawls directories with indexing\u00a0(not yet implemented in v1.0)<\/li>\n<li>Uses CTRL-C to stop the current crawler stage and continue working. 
Very useful stuff&#8230;<\/li>\n<li>Identifies all kinds of files by reading the Content-Type header field of the response.<\/li>\n<li>Exports (-e option) a list of all file URLs found during crawling to a separate file.<\/li>\n<li>Selects the types of files to download (-d option). Ex.: png,pdf,jpeg,gif or png,jpeg.<\/li>\n<li>Selects interactively which types of files to download (-i option).<\/li>\n<li>Saves the\u00a0downloaded\u00a0files into a directory. It only creates the output directory if there is at least one file to download.<\/li>\n<li>Generates an output log in CLF (Common Log Format) of all the requests made during crawling.<\/li>\n<li>(beta) Logs in with basic authentication. Feedback is welcome!<\/li>\n<li>Tries to detect if the website uses a CMS (like WordPress, Joomla, etc.)\u00a0(not yet implemented in v1.0)<\/li>\n<li>Looks for &#8216;.bk&#8217; or &#8216;.bak&#8217; files of php, asp, aspx, jsp pages.\u00a0(not yet implemented in v1.0)<\/li>\n<li>Identifies and counts the unique web pages crawled.\u00a0(not yet implemented in v1.0)<\/li>\n<li>Identifies and counts the unique web pages crawled that contain parameters in the URL.\u00a0(not yet implemented in v1.0)<\/li>\n<li>It works on Windows, but does not save results yet.<\/li>\n<\/ul>\n<p>Note: This crawler can be used with the Domain Analyzer Security Tool. (See\u00a0<a href=\"http:\/\/sites.google.com\/site\/mateslaboratory\/home\/proyectos-1\/domain-analizer\">Domain Analyzer<\/a>)<\/p>\n<h3>Installation<\/h3>\n<p>Just copy the Python file to the \/usr\/bin directory. The crawler does not need to run as root.<\/p>\n<h3>Download<\/h3>\n<p>Download it from <a href=\"http:\/\/sourceforge.net\/projects\/webcrawler-py\/\" rel=\"nofollow\">SourceForge<\/a>.<\/p>\n<h3>Bugs<\/h3>\n<p>Please report bugs by sending us an email. You can find our addresses in the Python file.<\/p>\n<h3>Questions<\/h3>\n<p>If you have any questions, please send us an email! 
You can find our addresses in the Python file.<\/p>\n<h3>Screenshots<\/h3>\n<div id=\"attachment_1720\" style=\"width: 366px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-1720\" class=\"size-medium wp-image-1720\" title=\"Crawler v1.0 - running\" src=\"http:\/\/www.talsoft.com.ar\/site\/wp-content\/uploads\/2011\/05\/crawler_v1.1_running-356x300.png\" alt=\"\" width=\"356\" height=\"300\" srcset=\"https:\/\/www.talsoft.com.ar\/site\/wp-content\/uploads\/2011\/05\/crawler_v1.1_running-356x300.png 356w, https:\/\/www.talsoft.com.ar\/site\/wp-content\/uploads\/2011\/05\/crawler_v1.1_running-297x250.png 297w, https:\/\/www.talsoft.com.ar\/site\/wp-content\/uploads\/2011\/05\/crawler_v1.1_running.png 734w\" sizes=\"auto, (max-width: 356px) 100vw, 356px\" \/><p id=\"caption-attachment-1720\" class=\"wp-caption-text\">Crawler v1.0 &#8211; running<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>The Web Crawler Security Tool is a Python-based crawler that automatically crawls a web site. It is designed to help with penetration testing tasks. Its main task is to find and list all the links (pages and files) in a web site. 
The crawler has been completely rewritten in [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"parent":1392,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"class_list":["post-1396","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.talsoft.com.ar\/site\/wp-json\/wp\/v2\/pages\/1396","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.talsoft.com.ar\/site\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.talsoft.com.ar\/site\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.talsoft.com.ar\/site\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.talsoft.com.ar\/site\/wp-json\/wp\/v2\/comments?post=1396"}],"version-history":[{"count":13,"href":"https:\/\/www.talsoft.com.ar\/site\/wp-json\/wp\/v2\/pages\/1396\/revisions"}],"predecessor-version":[{"id":2558,"href":"https:\/\/www.talsoft.com.ar\/site\/wp-json\/wp\/v2\/pages\/1396\/revisions\/2558"}],"up":[{"embeddable":true,"href":"https:\/\/www.talsoft.com.ar\/site\/wp-json\/wp\/v2\/pages\/1392"}],"wp:attachment":[{"href":"https:\/\/www.talsoft.com.ar\/site\/wp-json\/wp\/v2\/media?parent=1396"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}