attribute - SKILL.md Agent Skill

name: attribute
version: 1.0
description: Attribute leaked environment data to victim companies by analyzing ownership signals. Use when analyzing breach data, supply chain attack artifacts, or exfiltrated environment snapshots to identify which organization was compromised. Trigger for questions about victim identification, leaked credential attribution, or breach victimology.
author: ramimac
argument-hint: [target-directory]

Leaked Data Attribution

Identify the victim organization from leaked environment data by analyzing ownership signals: platform metadata, private infrastructure, corporate domains, credentials, and unique identifiers.

Core Principle: Organizational vs Individual Signals

Attribution requires distinguishing between:

Organizational signals - Directly identify an organization:

Platform organization names (GitHub org, Azure DevOps collection, GitLab namespace)
Corporate infrastructure (private registries, self-hosted tools)
Enterprise accounts (SSO redirects, tenant IDs)

Individual signals - Identify a person who may work for an organization:

Personal tokens and credentials
Email addresses
Usernames and profile fields

The key question: Does this individual represent the organization, or are they incidental to the data?

Analysis Workflow

Step 1: Extract and Catalog All Signals

Before attributing, systematically extract everything that could identify ownership.

Domains to extract:

Email domains (from any email address found)
URL domains (from any URL - registries, APIs, webhooks, configs)
Hostname patterns (machine names, DNS suffixes)
Proxy/network configuration domains

Identifiers to extract:

Platform org/user names (GitHub, GitLab, Azure DevOps, etc.)
Tenant/account IDs (Azure tenant, AWS account, GCP project)
Workspace/team IDs (Slack team ID, Atlassian org)
Repository paths and namespaces

High-entropy strings to note:

API keys and tokens (can be validated/enriched via APIs)
Webhook URLs (contain embedded IDs)
JWT tokens (contain issuer, claims)
Connection strings (contain hostnames, accounts)

Unique patterns to flag:

Custom environment variable prefixes (e.g., ACME_*)
Asset naming conventions in hostnames
Internal tool names or project codes
Scoped package names (@company/package)
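
The extraction pass above could be sketched roughly as follows. This is a minimal illustration that assumes the leaked data is available as plain text; the function name `extract_signals` and all of the regular expressions are hypothetical choices for this example, not a complete parser.

```python
import re
from urllib.parse import urlsplit

# Illustrative patterns only; real data will need broader, tested expressions.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
# "@scope/package" that is not preceded by a word character (avoids emails).
SCOPE_RE = re.compile(r"(?<![\w.])@([a-z0-9][a-z0-9._-]*)/[a-z0-9][a-z0-9._-]*")
# First underscore-delimited token of an environment variable assignment.
ENV_PREFIX_RE = re.compile(r"^([A-Z][A-Z0-9]+)_[A-Z0-9_]+=", re.MULTILINE)


def extract_signals(text: str) -> dict[str, set[str]]:
    """Collect candidate ownership signals from a blob of leaked text."""
    url_hosts = set()
    for url in URL_RE.findall(text):
        host = urlsplit(url).hostname
        if host:
            url_hosts.add(host.lower())
    return {
        "email_domains": {d.lower() for d in EMAIL_RE.findall(text)},
        "url_hosts": url_hosts,
        "package_scopes": set(SCOPE_RE.findall(text)),
        # Includes generic prefixes (GIT, CI, ...) that need manual triage.
        "env_prefixes": set(ENV_PREFIX_RE.findall(text)),
    }
```

The output is only a catalog. Generic prefixes and public domains still have to be filtered before any of these values are treated as evidence.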

Step 2: Analyze Organizational Platform Signals

Examine signals that directly identify an organization through platform metadata.

Platform organizations:

GitHub repository owner → Verify it's an Organization (not User) via API
Azure DevOps collection URI → Extract org from path
GitLab project namespace → Root namespace = org
CircleCI, Bitbucket, Buildkite, Drone, Travis → Org/workspace fields

Enterprise indicators:

Self-hosted platform domains (e.g., gitlab.acme.com = Acme owns it)
Enterprise SSO/SAML redirects (GitHub Enterprise Cloud detection)
Tenant IDs that resolve to organization names

What to look for:

Org names in platform-specific environment variables
Repository URLs with org in path
OIDC token claims containing owner/org fields

Confidence: HIGH when the owner or namespace is verified as an organization account
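
As a rough sketch of reading org names out of platform-specific environment variables, the snippet below maps a handful of CI variables to an owner value. The variable names are written from memory of each platform's documentation and should be checked against current docs; the function name `platform_orgs` is hypothetical.

```python
from urllib.parse import urlsplit

# CI variables that carry an owner/org value directly.
# Assumption: names match each platform's current documentation.
ORG_ENV_VARS = {
    "GITHUB_REPOSITORY_OWNER": "github",
    "CI_PROJECT_ROOT_NAMESPACE": "gitlab",
    "CIRCLE_PROJECT_USERNAME": "circleci",
    "BITBUCKET_WORKSPACE": "bitbucket",
    "BUILDKITE_ORGANIZATION_SLUG": "buildkite",
    "DRONE_REPO_NAMESPACE": "drone",
}


def platform_orgs(env: dict[str, str]) -> dict[str, str]:
    """Return {platform: org candidate} found in a leaked environment."""
    orgs = {}
    for var, platform in ORG_ENV_VARS.items():
        if env.get(var):
            orgs[platform] = env[var]

    # Travis exposes an "owner/repo" slug rather than a bare owner.
    slug = env.get("TRAVIS_REPO_SLUG", "")
    if "/" in slug:
        orgs["travis"] = slug.split("/", 1)[0]

    # Azure DevOps: org is the first path segment (dev.azure.com/{org}/)
    # or the subdomain of the older {org}.visualstudio.com form.
    uri = env.get("SYSTEM_COLLECTIONURI", "")
    if uri:
        parts = urlsplit(uri)
        host = (parts.hostname or "").lower()
        path = parts.path.strip("/")
        if host == "dev.azure.com" and path:
            orgs["azure_devops"] = path.split("/")[0]
        elif host.endswith(".visualstudio.com"):
            orgs["azure_devops"] = host.split(".")[0]
    return orgs
```

Every value returned here is still only a candidate: an owner field can name a personal account, so the organization check described above still applies.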

Step 3: Analyze Infrastructure Signals

Private or self-hosted infrastructure indicates organizational ownership.

Domains indicating ownership:

Private registry domains (npm.acme.com, artifactory.acme.io)
Self-hosted tool domains (vault.acme.com, sentry.acme.internal)
Internal Git server domains

Identifiers in infrastructure URLs:

Org names embedded in paths (pkgs.dev.azure.com/{org}/...)
Tenant subdomains ({company}.jfrog.io)
Account-specific endpoints

Package scopes:

Scoped npm packages (@acme/package) pointing to private registries
Private PyPI indexes
Go private module patterns

Confidence: HIGH when the infrastructure sits on an identifiable corporate domain
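
The URL and package-scope patterns above might be pulled out along these lines. This is a sketch under stated assumptions: only the two hosted patterns named above are handled, the `.npmrc` scope syntax (`@scope:registry=URL`) is the only config format parsed, and both function names are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlsplit


def org_from_infra_url(url: str) -> Optional[str]:
    """Extract an org/tenant name embedded in a hosted-infrastructure URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    segments = [s for s in parts.path.split("/") if s]
    # pkgs.dev.azure.com/{org}/...
    if host == "pkgs.dev.azure.com" and segments:
        return segments[0]
    # {company}.jfrog.io
    if host.endswith(".jfrog.io"):
        return host[: -len(".jfrog.io")].split(".")[-1]
    return None


# Matches .npmrc lines such as "@acme:registry=https://npm.acme.com/".
NPMRC_SCOPE_RE = re.compile(r"^(@[^:\s]+):registry\s*=\s*(\S+)", re.MULTILINE)


def scoped_registries(npmrc_text: str) -> dict[str, Optional[str]]:
    """Map each npm scope to the hostname of its configured registry."""
    return {
        scope: urlsplit(url).hostname
        for scope, url in NPMRC_SCOPE_RE.findall(npmrc_text)
    }
```

A scope whose registry host is not the public npm registry is the interesting case, since both the scope and the host then point at whoever operates that registry.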

Step 4: Analyze Domain Signals

Extract and analyze all domains found in the data.

High-value domain sources:

Email addresses → Corporate email = strong signal
Git commit metadata → Author/committer email domains
Proxy bypass lists (no_proxy) → Often contain internal domains
URL configurations → API endpoints, webhooks, service URLs

Domain extraction from:

Full URLs → Parse out hostname
Email addresses → Extract domain portion
Hostnames → Extract DNS suffix patterns
Configuration values → Look for embedded URLs/domains

Filtering - skip these:

| Category | Examples |
|---|---|
| Localhost | `localhost`, `127.0.0.1`, `.local` |
| Internal suffixes | `.internal`, `.corp`, `.lan`, `.intranet` |
| Placeholders | `localdomain.com`, `example.com` |
| Cloud providers | `amazonaws.com`, `azure.com`, `googleapis.com` |
| Public platforms | `github.com`, `gitlab.com`, `npmjs.org` |
| Personal email | `gmail.com`, `yahoo.com`, `outlook.com` |

Confidence: MEDIUM for corporate domains; needs corroboration
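
The skip list above can be expressed as a small filter. This is a minimal sketch that only covers the examples listed in the table; the names `SKIP_DOMAINS` and `is_attributable` are hypothetical, and a real list would be considerably longer.

```python
# Entries taken from the filtering table; each one is matched either
# exactly or as a parent of the candidate (e.g. "s3.amazonaws.com").
SKIP_DOMAINS = (
    "localhost", "127.0.0.1", "local",
    "internal", "corp", "lan", "intranet",
    "localdomain.com", "example.com",
    "amazonaws.com", "azure.com", "googleapis.com",
    "github.com", "gitlab.com", "npmjs.org",
    "gmail.com", "yahoo.com", "outlook.com",
)


def is_attributable(domain: str) -> bool:
    """Return False for domains that say nothing about the victim."""
    d = domain.strip().lower().rstrip(".")
    for skipped in SKIP_DOMAINS:
        # Compare on a label boundary so "notgithub.com" is not dropped.
        if d == skipped or d.endswith("." + skipped):
            return False
    return True
```

Note that suffix filtering also drops hosts such as `sentry.acme.internal`, where the label before the suffix may still be worth reviewing separately as an infrastructure signal.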

Step 5: Analyze High-Entropy Strings and Credentials

Tokens and secrets can be validated or enriched to reveal organizational context.

API tokens - validate and enrich:

GitHub PATs → Call /user to get profile with company field
Slack tokens → Call auth.test to get workspace name
npm tokens → Profile lookup reveals org memberships
Cloud credentials → Often return account metadata on validation

Webhook URLs - extract embedded IDs:

Slack webhooks → Team ID in path (/services/{TEAM_ID}/...)
Other webhooks → May contain account/workspace identifiers
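
For the Slack case, the team ID can be read straight from the path. A minimal sketch, assuming the standard `hooks.slack.com/services/...` incoming-webhook layout; the function name `slack_team_id` is hypothetical.

```python
import re
from typing import Optional

# /services/{TEAM_ID}/{BOT_OR_APP_ID}/{SECRET}
SLACK_WEBHOOK_RE = re.compile(
    r"https://hooks\.slack\.com/services/(T[A-Z0-9]+)/(B[A-Z0-9]+)/[A-Za-z0-9]+"
)


def slack_team_id(text: str) -> Optional[str]:
    """Return the team ID from the first Slack webhook URL found in text."""
    match = SLACK_WEBHOOK_RE.search(text)
    return match.group(1) if match else None
```

The team ID is opaque on its own and still has to be resolved to a workspace name before it supports an attribution.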

JWT tokens - decode and examine:

Issuer domain (iss claim) → May indicate organization
Subject/audience claims → May contain org identifiers
Custom claims → Platform-specific org information
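
Reading these claims does not require the signing key. The sketch below decodes the payload segment only and performs no signature verification, so the result is unvalidated; the function name `jwt_claims` is hypothetical.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments")
    payload = parts[1]
    # JWTs strip base64url padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

As one example, GitHub Actions OIDC tokens carry a `repository_owner` claim alongside `iss`, so the decoded dictionary can be checked for owner/org fields of that kind.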

Connection strings - parse components:

Database hostnames → May be corporate infrastructure
Account names → May embed organization
Server URLs → Domain analysis
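
The components above can be separated with standard parsing. This sketch assumes two common shapes, URL-style (`scheme://user:pass@host/db`) and semicolon-delimited `Key=Value` pairs; the key names it looks up (`Server`, `Host`, `AccountName`, `User Id`) and the function name are illustrative assumptions rather than a full grammar.

```python
import re
from urllib.parse import urlsplit

URL_STYLE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def parse_connection_string(conn: str) -> dict[str, str]:
    """Return the host and account name embedded in a connection string."""
    if URL_STYLE_RE.match(conn):
        parts = urlsplit(conn)
        return {"host": parts.hostname or "", "account": parts.username or ""}

    # Key=Value;Key=Value form. Split on the first "=" only, because
    # values such as base64 keys may themselves contain "=".
    fields = {}
    for pair in conn.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            fields[key.strip().lower()] = value.strip()
    return {
        "host": fields.get("server", fields.get("host", "")),
        "account": fields.get("accountname", fields.get("user id", "")),
    }
```

The returned host feeds directly into the domain analysis, while the account name is a candidate for org-name mapping.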

Confidence: Varies - validated tokens = HIGH; unvalidated = LOW

Step 6: Evaluate Individual Signals

When signals are tied to individuals (not organizations), extra validation is needed.

Individual signal types:

Personal tokens and API keys
Email addresses (could be personal or corporate)
Usernames and profile fields
Personal tool configurations

The problem:

Credentials may not belong to the victim
Profile fields are self-reported and may be stale
Individual context may not reflect organizational affiliation

Validation approaches:

Does individual identity match other context in the data?
Do multiple individual signals point to the same organization?
Is there corroboration from organizational signals?

Confidence: LOW unless corroborated by organizational signals

Step 7: Cross-Reference and Corroborate

Combine signals to build confidence.

Cross-referencing:

Does the email domain match the platform org?
Does the private registry domain match other infrastructure?
Do multiple independent signals point to the same company?

Alias resolution:

Map cryptic org names to company names (e.g., acme-dev → Acme Corp)
Look for company prefixes/suffixes in org names
Cross-reference org names with email domains
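
One way to approach alias resolution is to strip common affixes from the org name and compare what remains against a domain label. The affix list and both function names below are hypothetical, and the domain handling is deliberately naive: it takes the second-to-last label, which is wrong for multi-part suffixes such as `co.uk`.

```python
import re

# Tokens that usually describe an environment or legal form, not a company.
GENERIC_AFFIXES = {
    "dev", "prod", "staging", "test", "ci", "eng",
    "internal", "inc", "corp", "hq", "org",
}


def org_name_candidates(org: str) -> list[str]:
    """Split an org name into tokens and drop generic affixes."""
    tokens = [t for t in re.split(r"[-_.\s]+", org.lower()) if t]
    core = [t for t in tokens if t not in GENERIC_AFFIXES]
    # If everything was generic, fall back to the raw tokens.
    return core or tokens


def matches_domain(org: str, domain: str) -> bool:
    """Check whether an org name shares its core token with a domain."""
    labels = domain.lower().rstrip(".").split(".")
    label = labels[-2] if len(labels) >= 2 else labels[0]
    return label in org_name_candidates(org)
```

A match here corroborates an alias but does not confirm it; confirmed mappings are what belong in an alias dictionary.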

Confidence boosting:

| Condition | Confidence Impact |
|---|---|
| Multiple corroborating signals | → HIGH |
| Enterprise/Fortune 500 match | → HIGH |
| API-verified organization | → HIGH |
| Single organizational signal | → MEDIUM |
| Single weak/individual signal | → LOW |
| Contradictory signals | → Manual review |
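
The table can be read as a small decision procedure. The sketch below is one possible interpretation, not a definitive scoring model: it assumes "multiple corroborating signals" requires at least one organizational signal (consistent with Step 6), it omits the Enterprise/Fortune 500 row because that needs an external company list, and the `Signal` type and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    company: str            # company the signal points to
    kind: str               # "organizational" or "individual"
    verified: bool = False  # e.g. confirmed as an org via API


def attribution_confidence(signals: list[Signal]) -> str:
    """Map a set of signals to a confidence label per the table above."""
    if not signals:
        return "NONE"
    companies = {s.company.strip().lower() for s in signals}
    if len(companies) > 1:
        return "MANUAL REVIEW"  # contradictory signals
    organizational = [s for s in signals if s.kind == "organizational"]
    if any(s.verified for s in organizational):
        return "HIGH"           # API-verified organization
    if organizational and len(signals) >= 2:
        return "HIGH"           # organizational signal plus corroboration
    if organizational:
        return "MEDIUM"         # single organizational signal
    return "LOW"                # individual signals only
```

For instance, an unverified GitHub org plus a matching commit email domain scores HIGH, while the email domain alone scores LOW.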

Step 8: Resolve Ambiguity

Contradictory signals:

Prioritize organizational signals over individual signals
Prioritize infrastructure domains over email domains
Consider client/vendor relationships (one org using another's tools)
Flag for manual review if unresolved

Personal accounts mistaken for organizations:

If API lookup reveals personal account → Do not attribute as company
Check user's company profile field (LOW confidence)
Look for other organizational signals
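
The personal-account check might look like the following. It assumes the input is the parsed JSON from GitHub's public user lookup, whose `type` field is `"Organization"` or `"User"`; the function name and the shape of the returned dictionary are hypothetical.

```python
def classify_github_account(payload: dict) -> dict:
    """Decide whether a GitHub account lookup supports company attribution."""
    if payload.get("type") == "Organization":
        return {
            "attribute_as_company": True,
            "name": payload.get("login"),
            "confidence": "HIGH",
        }
    # Personal account: the self-reported company field is a weak hint only.
    company = (payload.get("company") or "").strip().lstrip("@")
    return {
        "attribute_as_company": False,
        "name": company or None,
        "confidence": "LOW" if company else "NONE",
    }
```

A LOW result is a prompt to look for organizational signals elsewhere in the data, not an attribution in itself.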

No clear signals:

Document what signals exist
Note confidence as LOW or NONE
Identify enrichment opportunities

Signal Reliability Reference

HIGH Confidence

Verified platform organization (API confirmed)
Self-hosted infrastructure on corporate domain
Enterprise SSO/SAML configurations
Multiple corroborating organizational signals
Validated credentials returning org metadata

MEDIUM Confidence

Corporate email domains
Private registry domains (without full verification)
Unverified organization names
Workspace/team names from collaboration tools

LOW Confidence

Hostname patterns without domain context
Cloud resource naming conventions
JWT token issuer domains (unvalidated)
Individual profile fields without corroboration
Single uncorroborated signal

Signals to AVOID

Collection/exfiltration paths - These are the attacker's infrastructure, not the victim's
Package author metadata - Identifies the package creator, not the consumer
Uncorroborated individual signals - May not represent the victim

Enrichment Tactics

Credential Validation

Validate tokens to extract organizational metadata:

GitHub → /user endpoint for company field
Slack → auth.test for workspace info
npm → Profile lookup for org memberships
Cloud platforms → Metadata APIs for account info
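
For the GitHub and Slack cases, the requests could be built as sketched below with the standard library. The endpoints are GitHub's authenticated-user endpoint and Slack's `auth.test` method; the helper names and the `User-Agent` value are hypothetical, and nothing is sent until `fetch_json` is called.

```python
import json
import urllib.request


def github_user_request(token: str) -> urllib.request.Request:
    """Build a GET /user request; the response includes a company field."""
    return urllib.request.Request(
        "https://api.github.com/user",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "User-Agent": "attribution-sketch",  # placeholder client name
        },
    )


def slack_auth_test_request(token: str) -> urllib.request.Request:
    """Build an auth.test request; the response includes team and team_id."""
    return urllib.request.Request(
        "https://slack.com/api/auth.test",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_json(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a prepared request and parse the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it easy to log exactly which credentials were validated and when.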

ID Resolution

Resolve opaque identifiers to names:

Slack team IDs → API or browser lookup
Azure tenant IDs → Organization name resolution
AWS account IDs → Limited resolution without access to the account
Platform org IDs → API lookups

Org Name Mapping

Map cryptic names to companies:

Look for company prefixes/suffixes
Cross-reference with domain signals
Build alias dictionary from confirmed mappings

Domain Intelligence

Enrich domains with context:

WHOIS/DNS lookups for domain ownership
Certificate transparency for related domains
Known corporate domain databases

Quick Checklist

When analyzing leaked environment data:

Extracted all domains (URLs, emails, hostnames)
Extracted all platform org/user identifiers
Extracted all tenant/account/workspace IDs
Noted high-entropy strings (tokens, keys, webhooks)
Identified unique patterns (custom prefixes, naming conventions)
Filtered out localhost, internal-suffix, placeholder, cloud provider, public platform, and personal email domains
Validated credentials where possible
Cross-referenced signals for corroboration
Resolved ambiguous org names
Assigned confidence level to attribution