运行 Katana

有关 Katana 所有可用的标志和选项，请务必查看使用说明页面。在本页中，我们将分享使用特定标志和目标运行 Katana 的示例，以及每个示例预期的输出结果。

如果您有疑问，请通过帮助联系我们。

Katana 需要一个 URL 或端点来爬取，并且接受单个或多个输入。可以使用 -u 选项提供 URL，多个值可以通过逗号分隔输入，同样，文件输入可以使用 -list 选项，此外还支持管道输入（stdin）。

Katana 的输入方式

可以使用 -u 选项提供 URL，多个值可以通过逗号分隔输入，同样，文件输入可以使用 -list 选项，此外还支持管道输入（stdin）。

URL Input

katana -u https://tesla.com

多个 URL 输入（逗号分隔）

katana -u https://tesla.com,https://google.com

文件列表输入

$ cat url_list.txt

https://tesla.com
https://google.com

katana -list url_list.txt

STDIN (piped) 输入

echo https://tesla.com | katana

cat domains | httpx | katana

运行 Katana 示例 -

katana -u https://youtube.com

   __        __                
  / /_____ _/ /____ ____  ___ _
 /  '_/ _  / __/ _  / _ \/ _  /
/_/\_\\_,_/\__/\_,_/_//_/\_,_/ v0.0.1                     

      projectdiscovery.io

[WRN] Use with caution. You are responsible for your actions.
[WRN] Developers assume no liability and are not responsible for any misuse or damage.
https://www.youtube.com/
https://www.youtube.com/about/
https://www.youtube.com/about/press/
https://www.youtube.com/about/copyright/
https://www.youtube.com/t/contact_us/
https://www.youtube.com/creators/
https://www.youtube.com/ads/
https://www.youtube.com/t/terms
https://www.youtube.com/t/privacy
https://www.youtube.com/about/policies/
https://www.youtube.com/howyoutubeworks?utm_campaign=ytgen&utm_source=ythp&utm_medium=LeftNav&utm_content=txt&u=https%3A%2F%2Fwww.youtube.com%2Fhowyoutubeworks%3Futm_source%3Dythp%26utm_medium%3DLeftNav%26utm_campaign%3Dytgen
https://www.youtube.com/new
https://m.youtube.com/
https://www.youtube.com/s/desktop/4965577f/jsbin/desktop_polymer.vflset/desktop_polymer.js
https://www.youtube.com/s/desktop/4965577f/cssbin/www-main-desktop-home-page-skeleton.css
https://www.youtube.com/s/desktop/4965577f/cssbin/www-onepick.css
https://www.youtube.com/s/_/ytmainappweb/_/ss/k=ytmainappweb.kevlar_base.0Zo5FUcPkCg.L.B1.O/am=gAE/d=0/rs=AGKMywG5nh5Qp-BGPbOaI1evhF5BVGRZGA
https://www.youtube.com/opensearch?locale=en_GB
https://www.youtube.com/manifest.webmanifest
https://www.youtube.com/s/desktop/4965577f/cssbin/www-main-desktop-watch-page-skeleton.css
https://www.youtube.com/s/desktop/4965577f/jsbin/web-animations-next-lite.min.vflset/web-animations-next-lite.min.js
https://www.youtube.com/s/desktop/4965577f/jsbin/custom-elements-es5-adapter.vflset/custom-elements-es5-adapter.js
https://www.youtube.com/s/desktop/4965577f/jsbin/webcomponents-sd.vflset/webcomponents-sd.js
https://www.youtube.com/s/desktop/4965577f/jsbin/intersection-observer.min.vflset/intersection-observer.min.js
https://www.youtube.com/s/desktop/4965577f/jsbin/scheduler.vflset/scheduler.js
https://www.youtube.com/s/desktop/4965577f/jsbin/www-i18n-constants-en_GB.vflset/www-i18n-constants.js
https://www.youtube.com/s/desktop/4965577f/jsbin/www-tampering.vflset/www-tampering.js
https://www.youtube.com/s/desktop/4965577f/jsbin/spf.vflset/spf.js
https://www.youtube.com/s/desktop/4965577f/jsbin/network.vflset/network.js
https://www.youtube.com/howyoutubeworks/
https://www.youtube.com/trends/
https://www.youtube.com/jobs/
https://www.youtube.com/kids/

爬取模式

标准模式

标准爬取模式在底层使用标准的 Go HTTP 库来处理 HTTP 请求/响应。由于没有浏览器开销,这种模式速度更快。但是它只会按原样分析 HTTP 响应体,不会执行任何 JavaScript 或 DOM 渲染,这可能会遗漏一些在复杂 Web 应用中依赖于浏览器特定事件的后期 DOM 渲染端点或异步端点调用。

无头模式

无头模式通过在浏览器上下文中直接处理 HTTP 请求/响应来实现。这提供了两个优势:

HTTP 指纹(TLS 和用户代理)可以将客户端完全标识为合法浏览器
覆盖范围更广,因为端点发现不仅分析标准原始响应(如前一种模式),还会分析启用了 JavaScript 的浏览器渲染响应。

无头爬取是可选的,可以使用 -headless 选项启用。以下是其他无头模式的命令行选项 -

katana -h headless

Flags:
无头模式:
   -hl, -headless                    启用无头混合爬取模式(实验性)
   -sc, -system-chrome               使用本地安装的Chrome浏览器代替katana内置浏览器
   -sb, -show-browser                在无头模式下显示浏览器界面
   -ho, -headless-options string[]   使用额外选项启动无头Chrome浏览器
   -nos, -no-sandbox                 以no-sandbox模式启动无头Chrome浏览器
   -cdd, -chrome-data-dir string     Chrome浏览器数据存储路径
   -scp, -system-chrome-path string  指定用于无头爬取的Chrome浏览器路径
   -noi, -no-incognito               不使用隐身模式启动无头Chrome浏览器
   -cwu, -chrome-ws-url string       使用在其他位置启动的带调试器的Chrome浏览器实例(指定WebSocket URL)
   -xhr, -xhr-extraction             提取XHR请求

`-no-sandbox`

使用 no-sandbox 选项运行无头 Chrome 浏览器,这在以 root 用户运行时很有用。

katana -u https://tesla.com -headless -no-sandbox

`-no-incognito`

在无头模式下运行Chrome浏览器时不使用隐身模式,这在使用本地浏览器时很有用。

katana -u https://tesla.com -headless -no-incognito

`-headless-options`

在无头模式下爬取时,可以使用 -headless-options 指定额外的 Chrome 选项,例如:

katana -u https://tesla.com -headless -system-chrome -headless-options --disable-gpu,proxy-server=http://127.0.0.1:8080

范围控制

如果不限定范围,爬取可能会无休止地进行,因此 katana 提供了多种方式来定义爬取范围。

`-field-scope`

最方便的范围定义选项,使用预定义的字段名称,默认使用 rdn 作为字段范围。

rdn - 爬取根域名及其所有子域名范围 (例如 *example.com) (默认)
fqdn - 爬取指定的(子)域名范围 (例如 www.example.com 或 api.example.com)
dn - 爬取域名关键词范围 (例如 example)

katana -u https://tesla.com -fs dn

`-crawl-scope`

对于高级范围控制,可以使用支持正则表达式的 -cs 选项。

katana -u https://tesla.com -cs login

对于多个范围规则,可以传入包含多行字符串/正则表达式的文件。

$ cat in_scope.txt

login/
admin/
app/
wordpress/

katana -u https://tesla.com -cs in_scope.txt

`-crawl-out-scope`

要定义不爬取的内容,可以使用 -cos 选项,该选项也支持正则表达式输入。

katana -u https://tesla.com -cos logout

对于多个排除范围规则,可以传入包含多行字符串/正则表达式的文件。

$ cat out_of_scope.txt

/logout
/log_out

katana -u https://tesla.com -cos out_of_scope.txt

`-no-scope`

Katana 默认的作用域是 *.domain，要禁用此默认作用域可以使用 -ns 选项，这样也可以爬取整个互联网。

katana -u https://tesla.com -ns

`-display-out-scope`

默认情况下,当使用范围选项时,它也会应用于显示为输出的链接,因此外部 URL 默认被排除。要覆盖此行为,可以使用 -do 选项来显示目标范围内 URL/端点中存在的所有外部 URL。

katana -u https://tesla.com -do

以下是所有用于范围控制的 CLI 选项 -

katana -h scope

Flags:
SCOPE:
   -cs, -crawl-scope string[]       要爬取的URL正则表达式范围
   -cos, -crawl-out-scope string[]  要排除的URL正则表达式范围
   -fs, -field-scope string         预定义的范围字段(dn,rdn,fqdn)(默认为"rdn")
   -ns, -no-scope                   禁用基于主机的默认范围
   -do, -display-out-scope          显示范围内爬取的外部端点

`-depth`

用于定义爬取 URL 的 depth(深度)的选项,深度越大,爬取的端点数量越多,爬取时间也越长。

katana -u https://tesla.com -d 5

`-js-crawl`

用于启用 JavaScript 文件解析和爬取在 JavaScript 文件中发现的端点的选项,默认禁用。

katana -u https://tesla.com -jc

`-crawl-duration`

用于预定义爬取持续时间的选项,默认禁用。

katana -u https://tesla.com -ct 2

`-known-files`

启用爬取 robots.txt 和 sitemap.xml 文件的选项,默认禁用。

katana -u https://tesla.com -kf robotstxt,sitemapxml

`-automatic-form-fill`

启用自动填充表单字段的选项,支持已知/未知字段。已知字段的值可以通过更新配置文件 $HOME/.config/katana/form-config.yaml 来自定义。自动填充表单是一个实验性功能。

katana -u https://tesla.com -aff

认证爬取

认证爬取需要在 HTTP 请求中包含自定义 headers 或 cookies 来访问受保护的资源。这些 headers 提供认证或授权信息,使你能够爬取需要认证的内容/端点。你可以直接在命令行中指定 headers,也可以通过文件的方式提供给 katana 来执行认证爬取。

注意: 用户需要手动执行认证并将会话 cookie/header 导出到文件中以供 katana 使用。

`-headers`

用于向请求添加自定义 header 或 cookie 的选项。

HTTP 规范中的 headers 语法

以下是向请求添加 cookie 的示例:

katana -u https://tesla.com -H 'Cookie: usrsess=AmljNrESo'

也可以通过文件的方式提供 headers 或 cookies。例如:

$ cat cookie.txt

Cookie: PHPSESSIONID=XXXXXXXXX
X-API-KEY: XXXXX
TOKEN=XX

katana -u https://tesla.com -H cookie.txt

在需要时还有更多配置选项,以下是所有与配置相关的命令行选项 -

katana -h config

Flags:
配置选项:
   -r, -resolvers string[]       自定义解析器列表(文件或逗号分隔)
   -d, -depth int                最大爬取深度(默认为3)
   -jc, -js-crawl                启用JavaScript文件中的端点解析/爬取
   -ct, -crawl-duration int      爬取目标的最大持续时间
   -kf, -known-files string      启用已知文件的爬取(all,robotstxt,sitemapxml)
   -mrs, -max-response-size int  要读取的最大响应大小(默认为9223372036854775807)
   -timeout int                  请求等待时间(秒)(默认为10)
   -aff, -automatic-form-fill    启用自动表单填充(实验性功能)
   -fx, -form-extraction         启用表单、输入框、文本区域和选择元素的提取
   -retry int                    重试请求的次数(默认为1)
   -proxy string                 使用的http/socks5代理
   -H, -headers string[]         请求中包含的自定义header/cookie
   -config string                katana配置文件的路径
   -fc, -form-config string      自定义表单配置文件的路径
   -flc, -field-config string    自定义字段配置文件的路径
   -s, -strategy string          访问策略(深度优先,广度优先)(默认为"depth-first")

连接到活动浏览器会话

Katana 还可以连接到用户已登录和认证的活动浏览器会话,并用于爬取。唯一的要求是启动浏览器时需要启用远程调试。以下是启用远程调试启动 Chrome 浏览器并与 katana 一起使用的示例 - 步骤 1) 首先找到 Chrome 浏览器可执行文件的路径

Operating System	Chromium Executable Location	Google Chrome Executable Location
Windows (64-bit)	`C:\Program Files (x86)\Google\Chromium\Application\chrome.exe`	`C:\Program Files (x86)\Google\Chrome\Application\chrome.exe`
Windows (32-bit)	`C:\Program Files\Google\Chromium\Application\chrome.exe`	`C:\Program Files\Google\Chrome\Application\chrome.exe`
macOS	`/Applications/Chromium.app/Contents/MacOS/Chromium`	`/Applications/Google Chrome.app/Contents/MacOS/Google Chrome`
Linux	`/usr/bin/chromium`	`/usr/bin/google-chrome`

步骤 2) 启用远程调试启动 Chrome 浏览器,它将返回 websocket url。例如,在 MacOS 上,您可以使用以下命令启用远程调试启动 Chrome -

$ /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

DevTools listening on ws://127.0.0.1:9222/devtools/browser/c5316c9c-19d6-42dc-847a-41d1aeebf7d6

现在登录到您想要爬取的网站并保持浏览器打开状态。

步骤 3) 现在使用 websocket url 与 katana 连接到活动的浏览器会话并爬取网站

katana -headless -u https://tesla.com -cwu ws://127.0.0.1:9222/devtools/browser/c5316c9c-19d6-42dc-847a-41d1aeebf7d6 -no-incognito

注意: 您可以使用 -cdd 选项指定自定义的 Chrome 数据目录来存储浏览器数据和 cookie,但如果 cookie 设置为仅 Session 或在一定时间后过期,则不会保存会话数据。

过滤器

`-field`

Katana 内置了一些字段,可用于过滤输出以获取所需信息,可以使用 -f 选项来指定任何可用的字段。

   -f, -field string  在输出中显示的字段 (url,path,fqdn,rdn,rurl,qurl,qpath,file,key,value,kv,dir,udir)

以下是每个字段及其使用时的预期输出示例 -

FIELD	DESCRIPTION	EXAMPLE
`url`	URL Endpoint	`https://admin.projectdiscovery.io/admin/login?user=admin&password=admin`
`qurl`	URL including query param	`https://admin.projectdiscovery.io/admin/login.php?user=admin&password=admin`
`qpath`	Path including query param	`/login?user=admin&password=admin`
`path`	URL Path	`https://admin.projectdiscovery.io/admin/login`
`fqdn`	Fully Qualified Domain name	`admin.projectdiscovery.io`
`rdn`	Root Domain name	`projectdiscovery.io`
`rurl`	Root URL	`https://admin.projectdiscovery.io`
`ufile`	URL with File	`https://admin.projectdiscovery.io/login.js`
`file`	Filename in URL	`login.php`
`key`	Parameter keys in URL	`user,password`
`value`	Parameter values in URL	`admin,admin`
`kv`	Keys=Values in URL	`user=admin&password=admin`
`dir`	URL Directory name	`/admin/`
`udir`	URL with Directory	`https://admin.projectdiscovery.io/admin/`

以下是使用字段选项仅显示所有包含查询参数的 URL 的示例 -

katana -u https://tesla.com -f qurl -silent

https://shop.tesla.com/en_au?redirect=no
https://shop.tesla.com/en_nz?redirect=no
https://shop.tesla.com/product/men_s-raven-lightweight-zip-up-bomber-jacket?sku=1740250-00-A
https://shop.tesla.com/product/tesla-shop-gift-card?sku=1767247-00-A
https://shop.tesla.com/product/men_s-chill-crew-neck-sweatshirt?sku=1740176-00-A
https://www.tesla.com/about?redirect=no
https://www.tesla.com/about/legal?redirect=no
https://www.tesla.com/findus/list?redirect=no

自定义字段

您可以使用正则表达式规则创建自定义字段来从页面响应中提取和存储特定信息。这些自定义字段使用 YAML 配置文件定义,默认从 $HOME/.config/katana/field-config.yaml 位置加载。或者,您可以使用 -flc 选项从其他位置加载自定义字段配置文件。以下是自定义字段的示例。

- name: email
  type: regex
  regex:
  - '([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)'
  - '([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)'

- name: phone
  type: regex
  regex:
  - '\d{3}-\d{8}|\d{4}-\d{7}'

定义自定义字段时,支持以下属性:

name (必需)

name 属性的值用作 -field 命令行选项的值。

type (必需)

自定义属性的类型,目前支持的选项 - regex

part (可选)

从响应中提取信息的部分。默认值是 response,包括响应头和响应体。其他可能的值是 header 和 body。

group (可选)

可以使用此属性选择正则表达式中的特定匹配组,例如: group: 1

使用自定义字段运行 katana:

katana -u https://tesla.com -f email,phone

`-store-field`

为了补充在运行时过滤输出的 field 选项,还有一个 -sf, -store-fields 选项,它的工作方式与 field 选项完全相同,只是它不进行过滤,而是将所有信息按目标 URL 排序存储在 katana_field 目录下的磁盘上。

katana -u https://tesla.com -sf key,fqdn,qurl -silent

$ ls katana_field/

https_www.tesla.com_fqdn.txt
https_www.tesla.com_key.txt
https_www.tesla.com_qurl.txt

-store-field 选项可用于收集信息以构建针对性的词表，其用途包括但不限于：

识别最常用的参数
发现经常使用的路径
查找常用文件
识别相关或未知的子域名

Katana 过滤器

`-extension-match`

可以使用 -em 选项轻松匹配特定扩展名,该选项可以确保只显示包含指定扩展名的输出。

katana -u https://tesla.com -silent -em js,jsp,json

`-extension-filter`

可以使用 -ef 选项轻松过滤特定扩展名,该选项可以确保移除所有包含指定扩展名的 URL。

katana -u https://tesla.com -silent -ef css,txt,md

`-match-regex`

-match-regex 或 -mr 标志允许使用正则表达式过滤输出的 URL。使用此标志时，只有与指定正则表达式匹配的 URL 才会在输出中显示。

katana -u https://tesla.com -mr 'https://shop\.tesla\.com/*' -silent

`-filter-regex`

-filter-regex 或 -fr 标志允许使用正则表达式过滤输出的 URL。使用此标志时,将会跳过与指定正则表达式匹配的 URL。

katana -u https://tesla.com -fr 'https://www\.tesla\.com/*' -silent

高级过滤

Katana 支持基于 DSL 表达式的高级匹配和过滤功能:

匹配状态码为 200 的端点:

katana -u https://www.hackerone.com -mdc 'status_code == 200'

匹配包含 “default” 且状态码不是 403 的端点:

katana -u https://www.hackerone.com -mdc 'contains(endpoint, "default") && status_code != 403'

匹配使用 PHP 技术的端点:

katana -u https://www.hackerone.com -mdc 'contains(to_lower(technologies), "php")'

过滤掉运行在 Cloudflare 上的端点:

katana -u https://www.hackerone.com -fdc 'contains(to_lower(technologies), "cloudflare")'

DSL 函数可以应用于 jsonl 输出中的任意字段。有关可用 DSL 函数的更多信息,请访问 dsl 项目。以下是其他过滤选项 -

katana -h filter

Flags:
FILTER:
   -mr, -match-regex string[]       使用正则表达式匹配输出URL(支持命令行和文件输入)
   -fr, -filter-regex string[]      使用正则表达式过滤输出URL(支持命令行和文件输入)
   -f, -field string                指定输出显示的字段(url,path,fqdn,rdn,rurl,qurl,qpath,file,ufile,key,value,kv,dir,udir)
   -sf, -store-field string         指定每个主机输出存储的字段(url,path,fqdn,rdn,rurl,qurl,qpath,file,ufile,key,value,kv,dir,udir)
   -em, -extension-match string[]   匹配指定扩展名的输出(例如: -em php,html,js)
   -ef, -extension-filter string[]  过滤指定扩展名的输出(例如: -ef png,css)
   -mdc, -match-condition string    使用DSL条件表达式匹配响应
   -fdc, -filter-condition string   使用DSL条件表达式过滤响应

速率限制

如果不遵守目标网站的限制进行爬取,很容易被封禁/拉黑。Katana 提供了多个选项来调整爬取速度的快慢。

`-delay`

用于在每个新请求之间引入延迟时间(以秒为单位)的选项,默认情况下禁用。

katana -u https://tesla.com -delay 20

`-concurrency`

用于控制同时抓取每个目标的URL数量的选项。

katana -u https://tesla.com -c 20

`-parallelism`

用于定义从输入列表中同时处理目标的数量的选项。

katana -u https://tesla.com -p 20

`-rate-limit`

用于定义每秒最大请求数的选项。

katana -u https://tesla.com -rl 100

`-rate-limit-minute`

用于定义每分钟最大请求数的选项。

katana -u https://tesla.com -rlm 500

以下是所有用于速率限制控制的长/短 CLI 选项 -

katana -h rate-limit

Flags:
RATE-LIMIT:
   -c, -concurrency int          并发抓取器数量 (默认值 10)
   -p, -parallelism int          并发处理输入的数量 (默认值 10)
   -rd, -delay int               每个请求之间的延迟时间(秒)
   -rl, -rate-limit int          每秒最大请求数 (默认值 150)
   -rlm, -rate-limit-minute int  每分钟最大请求数

输出

Katana 支持纯文本格式和 JSON 格式的文件输出。JSON 格式包含了额外的信息,如 source、tag 和 attribute 名称,以关联发现的端点。

`-output`

默认情况下,katana 以纯文本格式输出爬取的端点。可以使用 -output 选项将结果写入文件。

katana -u https://example.com -no-scope -output example_endpoints.txt

`-jsonl`

katana -u https://example.com -jsonl | jq .

{
  "timestamp": "2023-03-20T16:23:58.027559+05:30",
  "request": {
    "method": "GET",
    "endpoint": "https://example.com",
    "raw": "GET / HTTP/1.1\r\nHost: example.com\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36\r\nAccept-Encoding: gzip\r\n\r\n"
  },
  "response": {
    "status_code": 200,
    "headers": {
      "accept_ranges": "bytes",
      "expires": "Mon, 27 Mar 2023 10:53:58 GMT",
      "last_modified": "Thu, 17 Oct 2019 07:18:26 GMT",
      "content_type": "text/html; charset=UTF-8",
      "server": "ECS (dcb/7EA3)",
      "vary": "Accept-Encoding",
      "etag": "\"3147526947\"",
      "cache_control": "max-age=604800",
      "x_cache": "HIT",
      "date": "Mon, 20 Mar 2023 10:53:58 GMT",
      "age": "331239"
    },
    "body": "<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset=\"utf-8\" />\n    <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n    <style type=\"text/css\">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n",
    "technologies": [
      "Azure",
      "Amazon ECS",
      "Amazon Web Services",
      "Docker",
      "Azure CDN"
    ],
    "raw": "HTTP/1.1 200 OK\r\nContent-Length: 1256\r\nAccept-Ranges: bytes\r\nAge: 331239\r\nCache-Control: max-age=604800\r\nContent-Type: text/html; charset=UTF-8\r\nDate: Mon, 20 Mar 2023 10:53:58 GMT\r\nEtag: \"3147526947\"\r\nExpires: Mon, 27 Mar 2023 10:53:58 GMT\r\nLast-Modified: Thu, 17 Oct 2019 07:18:26 GMT\r\nServer: ECS (dcb/7EA3)\r\nVary: Accept-Encoding\r\nX-Cache: HIT\r\n\r\n<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset=\"utf-8\" />\n    <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n    <style type=\"text/css\">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n"
  }
}

`-store-response`

-store-response 选项允许将所有爬取的端点请求和响应写入文本文件。使用此选项时，包含请求和响应的文本文件将被写入 katana_response 目录。如果您想指定自定义目录，可以使用 -store-response-dir 选项。

katana -u https://example.com -no-scope -store-response

$ cat katana_response/index.txt

katana_response/example.com/327c3fda87ce286848a574982ddd0b7c7487f816.txt https://example.com (200 OK)
katana_response/www.iana.org/bfc096e6dd93b993ca8918bf4c08fdc707a70723.txt http://www.iana.org/domains/reserved (200 OK)

注意： -store-response 选项在 -headless 模式下不受支持。 以下是与输出相关的其他命令行选项 -

katana -h output

输出:
   -o, -output string                将输出写入文件
   -sr, -store-response              存储 HTTP 请求/响应
   -srd, -store-response-dir string  将 HTTP 请求/响应存储到自定义目录
   -j, -json                         以 JSONL(ines) 格式输出
   -nc, -no-color                    禁用输出内容着色（ANSI 转义码）
   -silent                           仅显示输出内容
   -v, -verbose                      显示详细输出
   -version                          显示项目版本

Katana 作为库使用

katana 可以作为库使用，方法是创建 Option 结构体的实例，并填充与通过命令行界面指定的相同选项。使用这些选项，您可以创建 crawlerOptions 以及标准或混合模式的 crawler。应调用 crawler.Crawl 方法来爬取输入内容。

package main

import (
	"math"

	"github.com/projectdiscovery/gologger"
	"github.com/projectdiscovery/katana/pkg/engine/standard"
	"github.com/projectdiscovery/katana/pkg/output"
	"github.com/projectdiscovery/katana/pkg/types"
)

func main() {
	options := &types.Options{
		MaxDepth:     3,             // Maximum depth to crawl
		FieldScope:   "rdn",         // Crawling Scope Field
		BodyReadSize: math.MaxInt,   // Maximum response size to read
		Timeout:      10,            // Timeout is the time to wait for request in seconds
		Concurrency:  10,            // Concurrency is the number of concurrent crawling goroutines
		Parallelism:  10,            // Parallelism is the number of urls processing goroutines
		Delay:        0,             // Delay is the delay between each crawl requests in seconds
		RateLimit:    150,           // Maximum requests to send per second
		Strategy:     "depth-first", // Visit strategy (depth-first, breadth-first)
		OnResult: func(result output.Result) { // Callback function to execute for result
			gologger.Info().Msg(result.Request.URL)
		},
	}
	crawlerOptions, err := types.NewCrawlerOptions(options)
	if err != nil {
		gologger.Fatal().Msg(err.Error())
	}
	defer crawlerOptions.Close()
	crawler, err := standard.New(crawlerOptions)
	if err != nil {
		gologger.Fatal().Msg(err.Error())
	}
	defer crawler.Close()
	var input = "https://www.hackerone.com"
	err = crawler.Crawl(input)
	if err != nil {
		gologger.Warning().Msgf("Could not crawl %s: %s", input, err.Error())
	}
}

工具

​运行 Katana

​Katana 的输入方式

​URL Input

​多个 URL 输入（逗号分隔）

​文件列表输入

​STDIN (piped) 输入

​爬取模式

​标准模式

​无头模式

​-no-sandbox

​-no-incognito

​-headless-options

​范围控制

​-field-scope

​-crawl-scope

​-crawl-out-scope

​-no-scope

​-display-out-scope

​-depth

​-js-crawl

​-crawl-duration

​-known-files

​-automatic-form-fill

​认证爬取

​-headers

​连接到活动浏览器会话

​过滤器

​-field

​自定义字段

​使用自定义字段运行 katana:

​-store-field

​Katana 过滤器

​-extension-match

​-extension-filter

​-match-regex

​-filter-regex

​高级过滤

​速率限制

​-delay

​-concurrency

​-parallelism

​-rate-limit

​-rate-limit-minute

​输出

​-output

​-jsonl

​-store-response

​Katana 作为库使用

运行 Katana

Katana 的输入方式

URL Input

多个 URL 输入（逗号分隔）

文件列表输入

STDIN (piped) 输入

爬取模式

标准模式

无头模式

`-no-sandbox`

`-no-incognito`

`-headless-options`

范围控制

`-field-scope`

`-crawl-scope`

`-crawl-out-scope`

`-no-scope`

`-display-out-scope`

`-depth`

`-js-crawl`

`-crawl-duration`

`-known-files`

`-automatic-form-fill`

认证爬取

`-headers`

连接到活动浏览器会话

过滤器

`-field`

自定义字段

使用自定义字段运行 katana:

`-store-field`

Katana 过滤器

`-extension-match`

`-extension-filter`

`-match-regex`

`-filter-regex`

高级过滤

速率限制

`-delay`

`-concurrency`

`-parallelism`

`-rate-limit`

`-rate-limit-minute`

输出

`-output`

`-jsonl`

`-store-response`

Katana 作为库使用