PDF Structure and Tagging

What are PDF tags?

PDF tags are structural elements that define the logical organization and meaning of content within a PDF document. Similar to HTML tags, PDF tags create a hierarchical structure that assistive technologies can understand and navigate. Tags identify different types of content such as headings, paragraphs, lists, tables, and images.

Without proper tagging, a PDF appears to assistive technologies as an unstructured sequence of text and images, making it difficult or impossible for users with disabilities to understand the content's organization and meaning.

PDF tag structure

PDF tags are organized in a tree-like hierarchy, starting with a root element and branching into increasingly specific content elements. This structure mirrors the logical organization of the document:

Document root: The top-level container for all tagged content
Grouping elements: Containers such as Part, Sect, and Div that organize the document into major sections
Content elements: Specific content types like headings, paragraphs, and lists
Inline elements: Text formatting and inline objects within content

This hierarchical structure allows assistive technologies to provide meaningful navigation options, such as jumping between headings or skipping to specific content sections.
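
The hierarchy described above can be sketched as a small in-memory tree. This is a minimal illustration, not how a PDF stores its tags internally: the TagNode class, the walk and headings helpers, and the sample document are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class TagNode:
    """One node in a simplified, in-memory model of a PDF tag tree."""
    tag: str                      # tag name, e.g. "Document", "H1", "P"
    text: str = ""                # text carried by this node, if any
    children: List["TagNode"] = field(default_factory=list)


def walk(node: TagNode) -> Iterator[TagNode]:
    """Yield nodes depth-first, which matches the logical reading order."""
    yield node
    for child in node.children:
        yield from walk(child)


def headings(root: TagNode) -> List[TagNode]:
    """Collect H1-H6 nodes, roughly what a 'jump to heading' command relies on."""
    heading_tags = {f"H{level}" for level in range(1, 7)}
    return [n for n in walk(root) if n.tag in heading_tags]


# Hypothetical sample document used only for illustration.
doc = TagNode("Document", children=[
    TagNode("H1", "Annual Report"),
    TagNode("Sect", children=[
        TagNode("H2", "Summary"),
        TagNode("P", "Revenue grew this year."),
    ]),
    TagNode("Sect", children=[
        TagNode("H2", "Details"),
        TagNode("L", children=[
            TagNode("LI", "First item"),
            TagNode("LI", "Second item"),
        ]),
    ]),
])

for h in headings(doc):
    print(h.tag, h.text)
```

Because the headings are found by walking the tree rather than by scanning for large or bold text, navigation depends entirely on the tags being present and correctly nested.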

Common PDF tags

PDF documents use a standardized set of tags to identify different content types:

Structural tags

Document: Root element containing all content
Part: Large divisions of a document
Sect: Generic container for sections
Div: Generic block-level container

Heading tags

H1-H6: Hierarchical headings from most important (H1) to least important (H6)

Content tags

P: Paragraph text
L: List container
LI: List item
Table: Table container
TR: Table row
TH: Table header cell
TD: Table data cell

Inline tags

Span: Generic inline container
Link: Hyperlink
Figure: Image or graphic (may appear inline or as a block-level element)
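
Several of the tags listed above only make sense inside a particular container: list items belong in a list, and table cells belong in a table row. The sketch below checks that nesting on a toy tree of (tag, children) tuples. The rule table is a deliberate simplification that covers only the tags named in this section, and nesting_errors is a hypothetical helper, not part of any PDF library.

```python
# Simplified assumption: each key must appear directly under one of the
# listed parents. Real tagged PDFs allow additional containers.
REQUIRED_PARENT = {
    "LI": {"L"},
    "TR": {"Table"},
    "TH": {"TR"},
    "TD": {"TR"},
}


def nesting_errors(node, parent_tag=None, path=""):
    """Return a list of messages for tags found under the wrong parent.

    `node` is a (tag, children) tuple, where children is a list of nodes.
    """
    tag, children = node
    here = f"{path}/{tag}"
    errors = []
    allowed = REQUIRED_PARENT.get(tag)
    if allowed is not None and parent_tag not in allowed:
        expected = " or ".join(sorted(allowed))
        errors.append(f"{here}: {tag} must be a child of {expected}, found under {parent_tag}")
    for child in children:
        errors.extend(nesting_errors(child, tag, here))
    return errors


# Hypothetical trees used only for illustration.
good = ("Document", [
    ("Table", [("TR", [("TH", []), ("TD", [])])]),
    ("L", [("LI", []), ("LI", [])]),
])
bad = ("Document", [
    ("P", [("LI", [])]),        # list item outside a list
    ("Table", [("TD", [])]),    # cell directly under the table, no row
])

print(nesting_errors(good))
for message in nesting_errors(bad):
    print(message)
```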

Creating tagged PDFs

There are several methods for creating tagged PDFs:

From source applications

Microsoft Word: Use proper heading styles and structure, then export to PDF with accessibility options enabled
Adobe InDesign: Apply paragraph and character styles, then export with tagged PDF options
HTML to PDF: Well-structured HTML with semantic markup translates to good PDF tags
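
To illustrate why semantic HTML converts well, the sketch below maps HTML elements to the PDF tags they most plausibly become and prints the resulting outline. The HTML_TO_PDF_TAG mapping is an assumption made for this example; actual converters choose their own mappings and may differ. Only Python's standard html.parser module is used.

```python
from html.parser import HTMLParser

# Assumed mapping for illustration; real HTML-to-PDF tools may map differently.
HTML_TO_PDF_TAG = {
    "h1": "H1", "h2": "H2", "h3": "H3", "h4": "H4", "h5": "H5", "h6": "H6",
    "p": "P", "ul": "L", "ol": "L", "li": "LI",
    "table": "Table", "tr": "TR", "th": "TH", "td": "TD",
    "a": "Link", "img": "Figure", "span": "Span", "div": "Div",
}


class TagOutline(HTMLParser):
    """Record an indented outline of the PDF tags implied by the HTML."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        pdf_tag = HTML_TO_PDF_TAG.get(tag)
        if pdf_tag is None:
            return  # elements without a mapping are ignored in this sketch
        self.lines.append("  " * self.depth + pdf_tag)
        if tag != "img":  # img is a void element and never contains children
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in HTML_TO_PDF_TAG and tag != "img":
            self.depth = max(0, self.depth - 1)


def outline(html):
    """Return the implied PDF tag outline as a list of indented strings."""
    parser = TagOutline()
    parser.feed(html)
    parser.close()
    return parser.lines


sample = (
    "<h1>Title</h1>"
    "<p>Intro with <a href='#'>a link</a>.</p>"
    "<ul><li>One</li><li>Two</li></ul>"
)
print("\n".join(outline(sample)))
```

If the same page were built from unstyled div elements only, the outline would contain nothing but Div entries, which is the situation that leaves assistive technologies without headings or lists to navigate.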

Using Adobe Acrobat

Auto-tagging: Acrobat can attempt to automatically identify and tag content structure
Manual tagging: Use the Tags panel to manually create and organize tag structure
Tag editing: Modify existing tags to improve accessibility

Best practice: Create tags from well-structured source documents rather than relying solely on auto-tagging, which may miss important structural relationships.

Checking tag structure

After tagging, verify that the tag structure matches the document's logical organization:

Using Adobe Acrobat

Tags panel: View and navigate the tag tree structure
Content panel: See how content is organized within tags
Accessibility Checker: Automatically identify tag-related accessibility issues

Using screen readers

Navigation commands: Test heading navigation and other structural features
Content reading: Verify that content reads in logical order
Element lists: Check that headings, links, and other elements are properly identified

Regular testing with assistive technologies helps ensure that tag structure provides the intended user experience.
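
Some structural problems can also be caught by simple automated rules before manual testing. As one hedged example, the sketch below flags heading sequences that do not start at H1 or that skip a level. It operates on a plain list of tag names in reading order; heading_issues is a hypothetical helper, and this is not a description of how Acrobat's Accessibility Checker works internally.

```python
import re


def heading_issues(tags_in_reading_order):
    """Report headings that skip a level or do not start at H1.

    The input is a flat list of tag names such as ["H1", "P", "H2"].
    """
    issues = []
    previous_level = 0  # 0 means no heading has been seen yet
    for index, tag in enumerate(tags_in_reading_order):
        match = re.fullmatch(r"H([1-6])", tag)
        if not match:
            continue  # not a heading tag
        level = int(match.group(1))
        if previous_level == 0:
            if level != 1:
                issues.append(f"item {index}: first heading is {tag}, expected H1")
        elif level > previous_level + 1:
            issues.append(f"item {index}: {tag} follows H{previous_level}, skipping a level")
        previous_level = level
    return issues


print(heading_issues(["H1", "P", "H2", "H3", "H2"]))
print(heading_issues(["H1", "P", "H3"]))
```

A check like this complements screen reader testing rather than replacing it: it can confirm that heading levels are consistent, but not that each heading describes the content that follows it.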