Knowledge Base

📝 Context Summary

This article describes a methodology for building an AI-powered SEO analysis workflow. The approach uses Python scripts to fetch data from Google Search Console, GA4, and Google Ads into local JSON files, then uses an AI coding tool (Claude Code, Cursor, or similar) to cross-reference the data conversationally. It also covers AI visibility tracking tools for monitoring GEO citation performance. The pattern replaces manual CSV exports and spreadsheet analysis with a fetch-store-query workflow that surfaces insights in seconds.

Building a Cross-Source SEO Analysis Workflow with AI

Most SEO analysis happens in silos. Google Search Console in one tab, GA4 in another, Google Ads in a third, AI visibility data somewhere else. The valuable insights — like “which keywords are we paying for that we already rank for?” — require cross-referencing across these sources, which traditionally means exporting CSVs and spending an afternoon with VLOOKUPs.

This article describes a fetch-store-query pattern that replaces that manual process. Python scripts pull data from Google APIs into local JSON files, and an AI coding tool (Claude Code, Cursor, or any LLM-integrated development environment) answers cross-source questions conversationally.

The total setup takes about an hour. After that, cross-source analysis that used to take hours happens in seconds.


The Fetch-Store-Query Pattern

The architecture is straightforward:

  1. Fetch — Python scripts authenticate against Google APIs and pull data (queries, traffic, search terms, spend)
  2. Store — Data lands in structured JSON files in a local project directory
  3. Query — An AI coding tool reads all the JSON files and answers cross-source questions conversationally

A typical project layout:

seo-analysis/
├── config.json              # Property IDs and client context
├── fetchers/
│   ├── fetch_gsc.py         # Google Search Console
│   ├── fetch_ga4.py         # Google Analytics 4
│   ├── fetch_ads.py         # Google Ads search terms
│   └── fetch_ai_visibility.py  # AI citation data
├── data/
│   ├── gsc/                 # Query + page performance
│   ├── ga4/                 # Traffic by channel, top pages
│   ├── ads/                 # Search terms, spend, conversions
│   └── ai-visibility/       # AI citation data
└── reports/                 # Generated analysis
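The config.json ties the fetchers together. A minimal sketch — every value below is a hypothetical placeholder, and the exact keys depend on how you write your fetchers:

```json
{
  "client_name": "Example Client",
  "gsc_property": "sc-domain:example.com",
  "ga4_property_id": "123456789",
  "ads_customer_id": "123-456-7890",
  "lookback_days": 90
}
```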

This pattern is effectively the same architecture behind any data pipeline: extract, load, analyze. The difference is that the “analyze” step is conversational rather than built on dashboards or SQL queries.


Data Source 1: Google Search Console

The Google Search Console API provides query-level and page-level organic search performance data: impressions, clicks, CTR, and average position.

Authentication

GSC uses a Google Cloud service account with read-only access:

  1. Create a project in Google Cloud Console
  2. Enable the Search Console API
  3. Create a service account under IAM & Admin > Service Accounts
  4. Download the JSON key file
  5. Add the service account email (which follows the format name@project-id.iam.gserviceaccount.com) as a user in the GSC property with read access
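A fetcher built on those steps might look like the sketch below. The request-body builder is a plain function; the site URL, key-file path, and output path are hypothetical placeholders, and the live call requires the key file from step 4 plus `pip install google-api-python-client google-auth`:

```python
import json
from datetime import date, timedelta


def build_gsc_request(days=90, dimensions=("query",), row_limit=1000):
    """Build the body for a Search Console searchanalytics.query call."""
    end = date.today()
    start = end - timedelta(days=days)
    return {
        "startDate": start.isoformat(),
        "endDate": end.isoformat(),
        "dimensions": list(dimensions),
        "rowLimit": row_limit,
    }


def fetch_gsc(site_url, key_file, out_path, days=90):
    """Pull query + page rows from GSC and store them as local JSON."""
    # Imported lazily so build_gsc_request stays testable offline
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        key_file, scopes=["https://www.googleapis.com/auth/webmasters.readonly"]
    )
    gsc = build("searchconsole", "v1", credentials=creds)
    body = build_gsc_request(days=days, dimensions=("query", "page"))
    rows = gsc.searchanalytics().query(siteUrl=site_url, body=body).execute().get("rows", [])
    with open(out_path, "w") as f:
        json.dump(rows, f, indent=2)
    return rows
```

A call like `fetch_gsc("sc-domain:example.com", "service-account-key.json", "data/gsc/queries.json")` would then populate the data directory shown earlier.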

What You Get

  • Top queries by impressions, clicks, CTR, and position
  • Page-level performance data
  • 16 months of historical data available
  • Up to 25,000 rows per API request

Key Insight

GSC data is the foundation for almost every cross-source analysis. Start here. Pulling the top 1,000 queries over 90 days gives you enough to run gap analyses, identify content opportunities, and prioritize improvements.


Data Source 2: Google Analytics 4

GA4 provides on-site behavior data: sessions, users, bounce rate, engagement metrics, traffic by channel.

Authentication

GA4 uses the same service account as GSC. Enable the Google Analytics Data API in your Cloud project and add the service account email as a Viewer in the GA4 property.
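The GA4 fetcher can reuse the same lazy-import pattern as the GSC one; the part worth sketching is the runReport request body in the REST form the Data API accepts. The dimension and metric names below are real API names, but the defaults are assumptions about what you want to pull:

```python
def build_ga4_report_body(days=90,
                          dimensions=("sessionDefaultChannelGroup",),
                          metrics=("sessions", "engagementRate", "bounceRate")):
    """Build a GA4 Data API runReport request body (REST form).

    The Data API accepts relative dates such as "90daysAgo" and "today".
    """
    return {
        "dateRanges": [{"startDate": f"{days}daysAgo", "endDate": "today"}],
        "dimensions": [{"name": d} for d in dimensions],
        "metrics": [{"name": m} for m in metrics],
    }
```

Swapping the dimension to "landingPage" gives the per-page behavior data used in the cross-source questions later in this article.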

What You Get

  • Traffic by channel (organic, paid, direct, referral, social)
  • Top pages by sessions, engagement, and bounce rate
  • User behavior metrics per landing page
  • Conversion data by source

Key Insight

GA4 alone doesn’t tell you much about SEO. But combined with GSC, it answers questions like: “Which pages rank well in GSC but have high bounce rates in GA4?” — which points to content quality issues that pure ranking data misses.
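That rank-versus-bounce question reduces to a join over the two local JSON files. A minimal sketch, assuming the fetchers store per-page average position and per-page bounce rate — the field names here are illustrative, not a fixed schema:

```python
def rank_vs_bounce(gsc_rows, ga4_bounce, max_position=10.0, min_bounce=0.6):
    """Flag pages that rank in the organic top 10 but bounce heavily on-site.

    gsc_rows:   list of {"page": url, "position": float} from the GSC fetcher
    ga4_bounce: {url: bounce_rate} from the GA4 fetcher
    """
    flagged = [
        {"page": r["page"], "position": r["position"],
         "bounce_rate": ga4_bounce[r["page"]]}
        for r in gsc_rows
        if r["page"] in ga4_bounce
        and r["position"] <= max_position
        and ga4_bounce[r["page"]] >= min_bounce
    ]
    # Worst bounce rate first: those are the likeliest content-quality problems
    return sorted(flagged, key=lambda r: r["bounce_rate"], reverse=True)
```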


Data Source 3: Google Ads

Google Ads provides paid search data: search terms, impressions, clicks, cost, and conversions.

Authentication

Google Ads requires a separate OAuth 2.0 setup (not the service account):

  • A developer token from the Google Ads API Center
  • OAuth 2.0 credentials from Google Cloud
  • A one-time browser authentication for the refresh token
  • Developer token approval typically takes 24-48 hours

If you’re using a Manager Account (MCC), one developer token covers all sub-accounts. If API access isn’t available yet, exporting 90 days of search terms as CSVs from the Google Ads UI works just as well.

What You Get

  • Search terms with impressions, clicks, cost, and conversions
  • Match type, campaign, and ad group data
  • Cost-per-click and conversion rate by keyword

Key Insight

The real value of Ads data is in what it reveals when compared against GSC. Keywords you’re paying for that you already rank for organically represent wasted spend. Keywords you’re paying for with zero organic visibility represent content gaps.


The Paid-Organic Gap Analysis

The single most valuable cross-source analysis you can run:

The question: “Compare GSC query data against Google Ads search terms. Find keywords where we’re paying for clicks but already have strong organic positions. Also find keywords where we’re spending on ads with zero organic visibility.”

What this surfaces:

Finding                              Implication
High organic rank + active ads       Potential wasted ad spend — consider reducing bids
Ad spend + zero organic presence     Content gaps — create organic content for these topics
Strong organic + no ads              Paid amplification candidates
Low organic CTR + high impressions   Title tag and meta description optimization opportunities

This analysis takes approximately 90 seconds with an AI tool reading the JSON files. The equivalent manual process — downloading CSVs, cross-referencing in spreadsheets, categorizing overlaps — takes several hours.
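Under the hood, the categorization amounts to a set join over the two JSON files. A minimal sketch, assuming the fetchers store average organic position per query and cost per paid search term (illustrative shapes, not a fixed schema):

```python
def paid_organic_gaps(organic_positions, paid_costs, strong_position=5.0):
    """Split paid search terms into likely wasted spend vs. organic content gaps.

    organic_positions: {query: avg_position} from GSC
    paid_costs:        {search_term: cost} from Google Ads
    """
    wasted, gaps = [], []
    for term, cost in paid_costs.items():
        position = organic_positions.get(term)
        if position is not None and position <= strong_position:
            # Paying for clicks on a term that already ranks strongly
            wasted.append({"term": term, "position": position, "cost": cost})
        elif position is None:
            # Paying for a term with no organic presence at all
            gaps.append({"term": term, "cost": cost})
    # Sort by spend so the biggest opportunities surface first
    by_cost = lambda r: -r["cost"]
    return {"possible_wasted_spend": sorted(wasted, key=by_cost),
            "content_gaps": sorted(gaps, key=by_cost)}
```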

Other High-Value Cross-Source Questions

  • GSC + GA4: “Which pages get the most impressions but have high bounce rates?” (content quality issues)
  • GSC topic clusters: “Group queries by topic cluster and show which clusters have the most impressions but lowest average position.” (content investment priorities)
  • GA4 + GSC: “Which pages rank well organically but have low engagement metrics?” (user experience gaps)
  • All three: “Show me a complete view of our presence for [topic]: organic rankings, paid spend, on-site engagement.”

Data Source 4: AI Visibility Tracking

Traditional SERP positions are no longer the complete picture. Google’s AI Overviews, Bing Copilot, ChatGPT, and Perplexity all generate answers that may or may not cite your content. Tracking whether your content appears in these AI-generated responses is increasingly important — especially for GEO strategy.

Available Tools and APIs

  • Bing Webmaster Tools (first-party, free): Copilot citations, grounding queries, page-level data. The most reliable AI citation data available — first-party, not estimated.
  • DataForSEO AI Overview API (SERP API, ~$0.01/query, $50 minimum): Google AI Overview content and cited URLs. Also has an LLM Mentions API for brand tracking across platforms.
  • SerpApi (SERP API, from $75/mo for 5,000 searches): full Google SERP including AI Overviews. Good documentation and a Python client.
  • SearchAPI.io (SERP API, from $40/mo): Google SERP plus a separate Google AI Mode API for AI-generated answers with citations.
  • Bright Data SERP API (SERP API, ~$1.80/1,000 requests): Google SERP with AI Overview capture. Also has an MCP server for agent integration.
  • DIY: direct LLM API calls (custom, under $20/mo): send consistent prompts to the OpenAI, Anthropic, and Perplexity APIs and parse for brand mentions. Perplexity’s Sonar API includes web citations, and citation tokens are free.
A practical starting combination:

  1. Bing Webmaster Tools (free, first-party) — the most reliable AI citation data available
  2. One SERP API for Google AI Overview data (DataForSEO is the most accessible)
  3. Direct LLM API monitoring for brand mention tracking across platforms
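For the DIY option, the parsing step can be as simple as whole-word matching over each model’s answer text. The brand list and prompt handling are up to you; this sketch only shows the counting:

```python
import re


def count_brand_mentions(answer_text, brands):
    """Case-insensitive whole-word mention counts for each brand in an LLM answer."""
    return {
        brand: len(re.findall(r"\b" + re.escape(brand) + r"\b",
                              answer_text, flags=re.IGNORECASE))
        for brand in brands
    }
```

Run the same prompts on a fixed schedule and store these counts next to the other JSON files, and mention trends become queryable alongside GSC and Ads data.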

Current Limitations

AI visibility tracking is still maturing. Google does not publish AI Overview or AI Mode citation data through any official API — every third-party tool is approximating. Treat AI visibility data as directionally useful (a wind sock, not GPS). Bing’s Copilot data is the most reliable because it’s first-party, but it only covers the Microsoft ecosystem.


Practical Workflow

Initial Setup (~1 hour per site)

  1. Create a Google Cloud project and service account
  2. Enable GSC and GA4 APIs
  3. Add service account to each property
  4. Set up Google Ads OAuth (if applicable)
  5. Create the fetcher scripts and config file

Monthly Data Pull (~5 minutes)

Run the fetcher scripts to pull fresh data:

python3 fetchers/fetch_gsc.py
python3 fetchers/fetch_ga4.py
python3 fetchers/fetch_ads.py

Analysis (as needed)

Open the project directory in your AI coding tool and ask questions. The data is local JSON — the tool reads it directly and responds conversationally. No dashboards to maintain, no exports to refresh.

What This Doesn’t Replace

  • Historical trend analysis — platforms like SEMrush and Ahrefs maintain longer time series
  • Automated alerts — this is ad-hoc analysis, not monitoring
  • Client-facing dashboards — you still need Looker Studio or similar for ongoing reporting
  • Strategic judgment — the tool finds patterns across data sources faster than a human can manually. It doesn’t know what to do about those patterns. That requires understanding the business, the competitive landscape, and the goals.

Connection to GEO Strategy

This workflow directly supports Generative Engine Optimization (GEO) by making AI citation data queryable alongside traditional SEO metrics. When you can ask “which of our pages get cited in AI Overviews but have declining organic CTR?” in a single query, you can make faster, better-informed decisions about content optimization for both traditional and AI search.

The AI visibility tools listed above are also relevant for monitoring the performance of any content optimized for GEO — tracking whether increased fact density, semantic structure, and entity authority are actually resulting in more AI citations.

See also: Generative Engine Optimization (GEO) | AI Search Visibility Metrics

Key Concepts: Cross-Source Analysis | Fetch-Store-Query Pattern | Paid-Organic Gap Analysis | AI Visibility Tracking | GEO Citation Monitoring | Google Search Console API | GA4 API | Google Ads API

About the Author: Adam

Adam Bernard is a digital marketing strategist and SEO specialist building AI-powered business intelligence systems. He's the creator of the Strategic Intelligence Engine (SIE), a multi-agent framework that transforms business knowledge into autonomous, AI-driven competitive advantages.

Let’s Connect

Ready to Build Your Own Intelligence Engine?

If you’re ready to move from theory to implementation and build a Knowledge Core for your own business, I can help you design the engine to power it. Let’s discuss how these principles can be applied to your unique challenges and goals.