I recently came across a fascinating video that mapped all of Wikipedia into a data-filled graph. The visualisation was not only stunning but also incredibly insightful, showing the complex web of links between articles. This inspired me to apply a similar technique to websites, aiming to create something equally useful for SEO professionals.

Section 1: Inspiration from Wikipedia
The original video showcased an impressive project that used data from Wikipedia dumps to create a comprehensive graph. By applying sophisticated algorithms like the Distributed Recursive Layout and Leiden community detection, the creators revealed the intricate link structure of Wikipedia. Watching this, I realised the potential for SEOs to gain similar insights into their own websites.

Section 2: Why Map Your Website?
Visualising a website’s internal link structure offers several benefits: it shows at a glance how pages link to one another, reveals the thematic clusters hidden within that structure, and highlights areas of the site that are only weakly connected.

Section 3: The Process
Here’s a step-by-step guide to how I mapped my website:

  1. Data Collection: First, we scrape the website to collect all internal links. Using tools like BeautifulSoup and Requests, we extract the URLs and titles of each page.
  2. Graph Creation: Next, we represent the website as a graph. Each page is a node, and each link is an edge. This can be done using libraries like NetworkX.
  3. Community Detection: Using community detection algorithms, we identify clusters of related pages. This helps in understanding the thematic groupings within the site (a short sketch of steps 2 and 3 follows this list).
  4. Visualisation: Finally, we visualise the graph. This step transforms the data into a visual format that is easy to interpret and analyse.
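
Before the full script, here is a minimal sketch of steps 2 and 3 using a handful of made-up URLs, just to show how pages become nodes, links become edges, and community detection groups related pages together:

import networkx as nx
from networkx.algorithms import community

# Toy example: five pages (nodes) connected by internal links (edges).
# The URLs are invented purely for illustration.
g = nx.Graph()
g.add_edges_from([
    ('/blog/seo-basics', '/blog/keyword-research'),
    ('/blog/seo-basics', '/blog/internal-linking'),
    ('/shop/widgets', '/shop/gadgets'),
    ('/blog/internal-linking', '/shop/widgets'),
])

# Greedy modularity community detection groups densely interlinked pages
clusters = community.greedy_modularity_communities(g)
for i, cluster in enumerate(clusters):
    print(f"Community {i}: {sorted(cluster)}")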

Section 4: Practical Benefits for SEOs
SEOs can leverage these visualisations in several ways: to audit internal linking between related topic clusters, to spot isolated or poorly linked pages, and to check that the site’s structure reflects its intended content themes.

Section 5: Getting Started
To replicate this process, you need some basic knowledge of Python and access to the necessary libraries. For detailed code and step-by-step instructions, subscribe to access our premium content.
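
If you want to run the script yourself, the libraries it imports are all available from PyPI; assuming a standard Python setup, they can be installed with:

pip install pandas requests beautifulsoup4 networkx matplotlib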

Python Script


import pandas as pd
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse
import networkx as nx
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from networkx.algorithms import community

# Step 1: Data Collection
def collect_data(base_url):
    to_visit = [base_url]
    visited = set()
    pages = []

    while to_visit:
        url = to_visit.pop(0)
        if url in visited:
            continue
        visited.add(url)

        # Fetch the page; skip anything that fails or is not HTML
        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        if 'text/html' not in response.headers.get('Content-Type', ''):
            continue
        soup = BeautifulSoup(response.text, 'html.parser')

        # Extract page title
        title = soup.title.string.strip() if soup.title and soup.title.string else 'No Title'

        # Extract internal links from the page body
        body = soup.find('body')
        links = set()
        if body:
            for a_tag in body.find_all('a', href=True):
                href = a_tag['href']
                if href.startswith('#'):
                    continue
                # Resolve relative links against the current page, then keep
                # only http(s) links that stay on the same domain
                full_url = urljoin(url, href)
                parsed = urlparse(full_url)
                if parsed.scheme in ('http', 'https') and parsed.netloc == urlparse(base_url).netloc:
                    links.add(full_url)

        pages.append({'url': url, 'title': title, 'links': links})
        to_visit.extend(links - visited)

    return pages

# Replace this with the homepage of the site you want to map
# (crawling a large site can take a while)
base_url = 'https://www.python.org/'
data = collect_data(base_url)

# Step 2: Graph Creation
def create_graph(data):
    g = nx.Graph()  # Undirected graph: pages are nodes, internal links are edges

    # Add a node for every crawled page
    crawled_urls = set()
    for page in data:
        if page['url'] not in crawled_urls:
            crawled_urls.add(page['url'])
            g.add_node(page['url'], title=page['title'])

    # Add an edge for every internal link that points to another crawled page
    for page in data:
        for link in page['links']:
            if link in crawled_urls:
                g.add_edge(page['url'], link)

    return g

graph = create_graph(data)

# Step 3: Community Detection
def detect_related_topics(graph):
    # Greedy modularity maximisation groups densely interlinked pages together
    return list(community.greedy_modularity_communities(graph))

related_topics = detect_related_topics(graph)

# Collect community membership for each page and export it to CSV
community_info = []
for i, cluster in enumerate(related_topics):
    for vertex in cluster:
        community_info.append({
            'Community': i,
            'Page Title': graph.nodes[vertex]['title'],
            'URL': vertex,
            'Degree': graph.degree(vertex)
        })

# Create a DataFrame for the community information
df_community_info = pd.DataFrame(community_info)
df_community_info.to_csv('community_info.csv', index=False)
print("Community information saved to community_info.csv")

# Step 4: Visualisation
def visualize_graph(graph, related_topics):
    pos = nx.spring_layout(graph)
    fig, ax = plt.subplots(figsize=(20, 20))

    # Plot nodes
    colors = cm.rainbow([i / len(related_topics) for i in range(len(related_topics))])
    for i, cluster in enumerate(related_topics):
        color = colors[i]
        for node in cluster:
            x, y = pos[node]
            ax.plot(x, y, 'o', color=color, markersize=5)
            ax.text(x, y, graph.nodes[node]['title'], fontsize=8, color=color)

    # Plot edges
    for edge in graph.edges():
        x1, y1 = pos[edge[0]]
        x2, y2 = pos[edge[1]]
        ax.plot([x1, x2], [y1, y2], 'k-', alpha=0.5)

    plt.show()

visualize_graph(graph, related_topics)
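
Once the script has run, the exported community_info.csv can be explored further with pandas. As a suggested follow-up (not part of the script above), this shows how many pages ended up in each community and which pages have the most internal links:

import pandas as pd

df = pd.read_csv('community_info.csv')

# Number of pages in each community, largest first
print(df.groupby('Community').size().sort_values(ascending=False))

# The ten most-linked pages across the site
print(df.sort_values('Degree', ascending=False).head(10)[['Page Title', 'URL', 'Degree']])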



To access the full code and detailed instructions, subscribe to our premium content. For just £5 a month, you’ll get:

  • The complete Python script to map your website.
  • Access to over 40 other SEO scripts and tools.
  • Regular updates with new resources and templates.

Conclusion
Mapping your website using a graph approach can reveal hidden insights and opportunities for SEO improvement. If you’re interested in diving deeper, subscribe now to get access to the full code and additional resources.

