The 2026 Press Kit: One Page, Two Readers, Structured for Both
A 2026 press kit is no longer a download bundle — it's a single structured page that serves a reporter on deadline and an AI citation crawler at the same time. Here's the anatomy.

A 2026 press kit serves two readers at the same time: a reporter on deadline and an AI crawler building a citation graph. The download-bundle ZIP that worked for a decade is now a liability, because AI engines like Perplexity, ChatGPT search, and Google AI Overviews parse newsroom pages directly — and a kit that reads well to humans but lacks structured data is invisible to citation engines. The fix is a single page that exposes the same facts twice: as legible HTML for journalists, and as schema.org JSON-LD for crawlers, with stable URLs for every asset, bio, and release.
Why the press kit became a dual-audience interface
Reporters now arrive at brand pages through fewer direct pitches and more discovery — much of it AI-mediated. When a journalist asks Perplexity "who founded [brand]" or ChatGPT "what's [brand]'s funding history," the answer is generated from whatever those crawlers can parse on the open web. If your newsroom hands them a hero image, three paragraphs of marketing copy, and a contact email, they have nothing structured to cite. Meanwhile a competitor with proper Organization markup and a PressRelease feed gets quoted verbatim.
The unified kit replaces the old press kit ZIP with a live, machine-readable hub. Same content, two reading layers, one URL.
What reporters still need in 2026
The human side hasn't changed much. A journalist on deadline wants:
- One-click asset access — logos and product shots in print and web resolutions, no login wall, no "request media kit" form.
- Plain-language executive bios with verifiable past roles, not marketing prose.
- A named PR contact with timezone, working hours, and a response SLA.
- A recent releases archive in reverse-chronological order, with embargo state visible per item.
- A fact sheet — funding, headcount, locations, sector — with a "last verified" date so the journalist can date-stamp the paragraph they're writing.
If the human flow takes more than 60 seconds, you've already lost the story.
What AI crawlers need that humans don't see
This is the layer most newsrooms still skip. Crawlers parse markup, not vibes. The minimum viable structured stack for a 2026 newsroom:
- schema.org `Organization` on the brand entity, with `foundingDate`, `founders`, and `sameAs` links to authoritative profiles.
- schema.org `NewsArticle` or `PressRelease` on each release. Google's structured data documentation specifies that NewsArticle markup requires `datePublished`, `dateModified`, `headline`, and `author` fields to be eligible for rich results.
- Stable canonical URLs for every asset, bio, and release — not query-string variants that fragment the citation graph.
- A robots.txt that explicitly grants the major AI crawlers access to the newsroom path. OpenAI publishes specific user-agent strings (GPTBot, OAI-SearchBot, ChatGPT-User) and crawler IP ranges that operators can selectively allow. Perplexity documents PerplexityBot and Perplexity-User as separate crawlers and confirms it respects robots.txt for both.
- JSON-LD identifiers — `sameAs` to Wikidata, Crunchbase, and LinkedIn. Wikidata provides stable QID identifiers that disambiguate your brand from same-name competitors across knowledge graphs.
A robots.txt fragment that scopes the named AI crawlers to the newsroom paths. The `Disallow: /` baseline matters: without it, the `Allow` lines are no-ops, because everything is already allowed by default. Longest-match precedence lets the `Allow` rules carve the newsroom back out of the baseline:

```
User-agent: GPTBot
User-agent: PerplexityBot
User-agent: ClaudeBot
Disallow: /
Allow: /newsroom/
Allow: /press/

User-agent: Google-Extended
Disallow: /
Allow: /newsroom/
```
Pair this with an llms.txt at the root, which offers a structured content map specifically for LLM consumption and complements robots.txt.
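A minimal llms.txt sketch following the shape of the community proposal (an H1 site name, a blockquote summary, then H2 sections of annotated links); every URL here is a placeholder:

```
# Example Corp
> One-sentence description of what Example Corp does.

## Newsroom
- [Press kit](https://example.com/newsroom/): fact sheet, bios, brand assets
- [Releases](https://example.com/newsroom/releases/): reverse-chronological archive

## Contact
- [Media contact](https://example.com/newsroom/contact/): named PR contact and response SLA
```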
Anatomy of a 2026 press kit page
Sections that should appear on a single newsroom URL:
Hero block
Brand sentence, founding year, headquarters, sector — rendered in human-readable HTML and again in JSON-LD:
```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Corp",
  "foundingDate": "2018-03-12",
  "founders": [{"@type": "Person", "name": "Jane Doe"}],
  "sameAs": [
    "https://www.wikidata.org/wiki/Q12345678",
    "https://www.linkedin.com/company/example-corp",
    "https://www.crunchbase.com/organization/example-corp"
  ]
}
```
Releases feed
Latest ten releases with embargo-status indicators (under-embargo, lifted, on-the-record). schema.org defines NewsArticle and PressRelease as distinct types, with PressRelease explicitly designed for corporate-issued news content — use PressRelease for first-party announcements so crawlers know what they're reading.
Assets folder
Logos, product shots, headshots — organized by type. Add SHA-256 hashes for journalists verifying authenticity. This matters more in 2026 than it did in 2020.
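A minimal sketch of how those hashes can be generated at publish time, assuming nothing beyond the Python standard library; the chunked read keeps memory flat even for large video assets:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream a file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 64 KiB chunks so large assets never load fully into memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()
```

Publish the digest next to the download link so a journalist can run `sha256sum` on the downloaded file and compare.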
Leadership
Person markup for each executive, with photos carrying IPTC metadata. Link to verifiable past roles, not "10x leader" phrases.
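A sketch of that Person markup, reusing the hypothetical executive from the hero example; names and URLs are placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Doe",
  "jobTitle": "Chief Executive Officer",
  "worksFor": {"@type": "Organization", "name": "Example Corp"},
  "sameAs": ["https://www.linkedin.com/in/janedoe"]
}
```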
Fact sheet
Numerics in HTML tables, with a "last verified" date per row. Tables parse cleanly for both humans and crawlers — better than infographics, which are opaque to both.
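One way a fact-sheet row can carry its own verification date (figures are illustrative):

```html
<table>
  <thead>
    <tr><th>Metric</th><th>Value</th><th>Last verified</th></tr>
  </thead>
  <tbody>
    <tr><td>Total funding</td><td>$42M</td><td>2026-04-01</td></tr>
    <tr><td>Headcount</td><td>180</td><td>2026-04-01</td></tr>
  </tbody>
</table>
```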
Newsroom-as-API endpoint
The same content served as JSON for partners and crawlers, mirroring the HTML. If you have a release composer like Prfect's, the structured data is baked in at compose time, not bolted on later. For outlets consuming your feed, see the media partner program.
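A sketch of one item in that JSON mirror; the field names are assumptions, chosen to match the JSON-LD served on the HTML side:

```json
{
  "releases": [
    {
      "url": "https://example.com/newsroom/series-b",
      "headline": "Example Corp Series B",
      "datePublished": "2026-05-14T14:00:00Z",
      "releaseStatus": "on-the-record"
    }
  ]
}
```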
Embargo and freshness signals as machine-readable fields
Embargoes are where most newsrooms leak — a release goes up before the lift time, gets crawled, and the embargo collapses. Move embargo state into JSON-LD. Note that `embargoUntil` and `releaseStatus` are not schema.org vocabulary; they are extension properties that help only where crawler partners have agreed to honor them:

```json
{
  "@type": "PressRelease",
  "headline": "Example Corp Series B",
  "datePublished": "2026-05-14T14:00:00Z",
  "embargoUntil": "2026-05-14T14:00:00Z",
  "releaseStatus": "under-embargo"
}
```
Crawlers honoring the field skip pre-embargo content. dateModified must reflect actual content edits, not template redeploys — otherwise crawlers deprioritize the source as noisy. A signed feed (RSS or JSON Feed with cryptographic signatures) lets newsrooms verify the kit hasn't been tampered with. When you pull a release, set tombstone metadata (releaseStatus: withdrawn) instead of returning a 404, so the citation chain stays auditable.
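A minimal signing sketch using a shared-secret HMAC from the Python standard library. The secret and field names are assumptions; a production feed would more likely use an asymmetric scheme such as Ed25519 so consumers can verify without holding the signing key:

```python
import hashlib
import hmac
import json

# Hypothetical shared secret, distributed to feed consumers out of band.
SECRET = b"rotate-me-regularly"

def sign_feed(feed: dict, secret: bytes = SECRET) -> dict:
    """Attach an HMAC-SHA256 signature over the canonical JSON body."""
    body = json.dumps(feed, sort_keys=True, separators=(",", ":")).encode()
    sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return {**feed, "signature": sig}

def verify_feed(signed: dict, secret: bytes = SECRET) -> bool:
    """Recompute the signature over everything except the signature field."""
    body = {k: v for k, v in signed.items() if k != "signature"}
    expected = sign_feed(body, secret)["signature"]
    return hmac.compare_digest(expected, signed["signature"])
```

The same mechanism covers tombstones: a withdrawn release stays in the signed feed with `releaseStatus: withdrawn`, so consumers can prove the withdrawal came from you.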
How to test the kit against both audiences
Four tests, run quarterly:
- Reporter test: can a journalist on deadline find the CEO's bio, a 300dpi logo, and the latest release in under 60 seconds, without an account?
- Crawler test: does Google's Rich Results Test parse the homepage as `Organization` and each release as `NewsArticle` without warnings?
- Citation test: ask Perplexity, ChatGPT, and Google AI Overviews "who founded [brand]" and "when was [brand] founded." Verify the answer matches your structured data. If the engine says something different, your markup is missing or contradicted by another source.
- Asset test: does each downloadable file have a stable URL, a content hash, and a `Last-Modified` header that reflects reality?
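As a rough local pre-check ahead of the Rich Results Test, a short script can flag releases whose JSON-LD omits the date, headline, or author fields named earlier; this checks only key presence, not value validity:

```python
import json

# The fields the crawler checklist calls out for NewsArticle eligibility.
REQUIRED = {"datePublished", "dateModified", "headline", "author"}

def missing_fields(jsonld: str) -> set:
    """Return the required fields absent from a JSON-LD release blob."""
    data = json.loads(jsonld)
    return REQUIRED - data.keys()
```

Running it over `'{"@type": "NewsArticle", "headline": "Series B"}'` reports the three absent fields, which is exactly the gap the Rich Results Test would warn about.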
If any of the four fails, you have a citation gap. Newsrooms that expose Organization JSON-LD and a clean PressRelease feed get cited cleanly; ones that ship a 200MB ZIP and a marketing PDF do not.
The 2026 press kit isn't a folder. It's an interface — one URL, two reading layers, and a structured contract with the engines that now decide who gets quoted.
Defne
Content Editor, Prfect