IMPORTXML Cheat Sheet: Essential Formulas for SEO and Beyond

Welcome to the ultimate IMPORTXML cheat sheet! This page covers essential formulas to help you pull data from any webpage straight into Google Sheets, from meta tags to headings and beyond.

Key Sections:

  1. Basic IMPORTXML Formula
    • Formula: =IMPORTXML("URL", "XPath")
    • Description: Pulls specific data from a webpage.
  2. Title Tag
    • Formula: =IMPORTXML("https://example.com", "//title")
    • Use: Extracts the title tag from a webpage.
  3. Meta Description
    • Formula: =IMPORTXML("https://example.com", "//meta[@name='description']/@content")
    • Use: Retrieves the meta description.
  4. H1 Tag
    • Formula: =IMPORTXML("https://example.com", "//h1")
    • Use: Extracts the H1 tag for SEO analysis.
  5. Multiple Data Points (Combined Query)
    • Formula Example:excelCopy code=ARRAYFORMULA(IMPORTXML("https://example.com", {"//title", "//meta[@name='description']/@content", "//h1"}))
    • Use: Fetches multiple elements (title, description, H1).
  6. Competitor Keywords (Example)
    • Formula Example: Varies depending on keyword location.
    • Description: Pulls visible keyword phrases from competitor pages.
  7. Troubleshooting IMPORTXML
    • Common Errors: #N/A, slow loads, XPath syntax issues.
    • Tips: Check URL, simplify XPath, reduce data size.
Data to ExtractFormulaDescription
Title Tag=IMPORTXML("https://example.com", "//title")Extracts the title tag of a webpage.
Meta Description=IMPORTXML("https://example.com", "//meta[@name='description']/@content")Pulls the meta description for SEO audits.
H1 Tag=IMPORTXML("https://example.com", "//h1")Retrieves the main H1 heading from a page.
All H2 Tags=IMPORTXML("https://example.com", "//h2")Pulls all H2 headings from a page.
Canonical URL=IMPORTXML("https://example.com", "//link[@rel='canonical']/@href")Extracts the canonical URL specified on a page.
Image Alt Text=IMPORTXML("https://example.com", "//img/@alt")Retrieves all image alt attributes.
Structured Data (JSON-LD)=IMPORTXML("https://example.com", "//script[@type='application/ld+json']")Pulls JSON-LD structured data from a page.
Open Graph Title=IMPORTXML("https://example.com", "//meta[@property='og:title']/@content")Fetches the Open Graph title for social sharing.
Open Graph Description=IMPORTXML("https://example.com", "//meta[@property='og:description']/@content")Fetches the Open Graph description.
Twitter Card Title=IMPORTXML("https://example.com", "//meta[@name='twitter:title']/@content")Extracts the Twitter Card title.
Twitter Card Description=IMPORTXML("https://example.com", "//meta[@name='twitter:description']/@content")Extracts the Twitter Card description.
All Links (URLs)=IMPORTXML("https://example.com", "//a/@href")Retrieves all hyperlinks on a page.
First Paragraph Text=IMPORTXML("https://example.com", "(//p)[1]")Pulls the first paragraph from the page body.
Published Date (Article)=IMPORTXML("https://example.com", "//meta[@property='article:published_time']/@content")Fetches the publication date of an article.
Author Name (Article)=IMPORTXML("https://example.com", "//meta[@name='author']/@content")Extracts the author’s name.
Breadcrumbs=IMPORTXML("https://example.com", "//nav[@aria-label='breadcrumb']//a")Pulls breadcrumb links (if structured as nav).
Product Price (eCommerce)=IMPORTXML("https://example.com", "//meta[@itemprop='price']/@content")Extracts product price on eCommerce pages.
Product Availability=IMPORTXML("https://example.com", "//meta[@itemprop='availability']/@content")Fetches availability status of products.
Review Rating=IMPORTXML("https://example.com", "//meta[@itemprop='ratingValue']/@content")Pulls the rating value from product reviews.
Number of Reviews=IMPORTXML("https://example.com", "//meta[@itemprop='reviewCount']/@content")Fetches the number of reviews for a product.
Video Embed URL=IMPORTXML("https://example.com", "//iframe[@class='video-embed']/@src")Extracts the URL for embedded videos.
Favicon URL=IMPORTXML("https://example.com", "//link[@rel='icon']/@href")Retrieves the favicon URL of a site.
Robots Meta Tag=IMPORTXML("https://example.com", "//meta[@name='robots']/@content")Fetches robots instructions (e.g., noindex).
All Paragraphs (for Content)=IMPORTXML("https://example.com", "//p")Pulls all paragraph text from the body content.

Leave a Reply

Your email address will not be published. Required fields are marked *