Knowledge Base
📝 Context Summary
Building a Cross-Source SEO Analysis Workflow with AI
Most SEO analysis happens in silos. Google Search Console in one tab, GA4 in another, Google Ads in a third, AI visibility data somewhere else. The valuable insights — like “which keywords are we paying for that we already rank for?” — require cross-referencing across these sources, which traditionally means exporting CSVs and spending an afternoon with VLOOKUPs.
This article describes a fetch-store-query pattern that replaces that manual process. Python scripts pull data from Google APIs into local JSON files, and an AI coding tool (Claude Code, Cursor, or any LLM-integrated development environment) answers cross-source questions conversationally.
The total setup takes about an hour. After that, cross-source analysis that used to take hours happens in seconds.
The Fetch-Store-Query Pattern
The architecture is straightforward:
- Fetch — Python scripts authenticate against Google APIs and pull data (queries, traffic, search terms, spend)
- Store — Data lands in structured JSON files in a local project directory
- Query — An AI coding tool reads all the JSON files and answers cross-source questions conversationally
```
seo-analysis/
├── config.json                  # Property IDs and client context
├── fetchers/
│   ├── fetch_gsc.py             # Google Search Console
│   ├── fetch_ga4.py             # Google Analytics 4
│   ├── fetch_ads.py             # Google Ads search terms
│   └── fetch_ai_visibility.py   # AI citation data
├── data/
│   ├── gsc/                     # Query + page performance
│   ├── ga4/                     # Traffic by channel, top pages
│   ├── ads/                     # Search terms, spend, conversions
│   └── ai-visibility/           # AI citation data
└── reports/                     # Generated analysis
```
This pattern is effectively the extract-load-analyze architecture behind any data pipeline. The difference is that the "analyze" step is conversational rather than built on dashboards or SQL.
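A minimal config.json might look like the following. The field names here are illustrative, not a fixed schema — use whatever keys your fetcher scripts actually read:

```json
{
  "client_name": "Example Co",
  "gsc_property": "sc-domain:example.com",
  "ga4_property_id": "123456789",
  "ads_customer_id": "123-456-7890",
  "date_range_days": 90,
  "notes": "B2B SaaS; primary market US; brand terms: Example, ExampleCo"
}
```

Keeping client context (brand terms, market, business model) in the config file gives the AI tool useful framing when it reads the directory.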
Data Source 1: Google Search Console
The Google Search Console API provides query-level and page-level organic search performance data: impressions, clicks, CTR, and average position.
Authentication
GSC uses a Google Cloud service account with read-only access:
- Create a project in Google Cloud Console
- Enable the Search Console API
- Create a service account under IAM & Admin > Service Accounts
- Download the JSON key file
- Add the service account email (e.g., [email protected]) as a user in the GSC property with read access
What You Get
- Top queries by impressions, clicks, CTR, and position
- Page-level performance data
- 16 months of historical data available
- Up to 25,000 rows per API request
Key Insight
GSC data is the foundation for almost every cross-source analysis. Start here. The top 1,000 queries over 90 days give you enough to run gap analyses, identify content opportunities, and prioritize improvements.
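As a sketch, the data-shaping half of a fetch_gsc.py script can be a pure function over the API's JSON response. The commented lines show what the live call looks like, assuming google-api-python-client and a service-account key file — adjust the siteUrl, dates, and key path to your own property:

```python
import json

# Live fetch (sketch; requires google-api-python-client and google-auth):
#   from google.oauth2 import service_account
#   from googleapiclient.discovery import build
#   creds = service_account.Credentials.from_service_account_file(
#       "key.json",
#       scopes=["https://www.googleapis.com/auth/webmasters.readonly"])
#   gsc = build("searchconsole", "v1", credentials=creds)
#   response = gsc.searchanalytics().query(
#       siteUrl="sc-domain:example.com",
#       body={"startDate": "2024-01-01", "endDate": "2024-03-31",
#             "dimensions": ["query"], "rowLimit": 1000}).execute()

def flatten_gsc_rows(response, dimensions):
    """Flatten a searchanalytics.query response into plain dicts,
    one per row, keyed by the requested dimensions."""
    records = []
    for row in response.get("rows", []):
        record = dict(zip(dimensions, row.get("keys", [])))
        record.update({
            "clicks": row.get("clicks", 0),
            "impressions": row.get("impressions", 0),
            "ctr": row.get("ctr", 0.0),
            "position": row.get("position", 0.0),
        })
        records.append(record)
    return records

# Writing the flattened records as JSON is what makes them
# readable by the AI tool later:
#   with open("data/gsc/queries.json", "w") as f:
#       json.dump(flatten_gsc_rows(response, ["query"]), f, indent=2)
```

Separating the flattening from the API call also makes the script easy to test without credentials.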
Data Source 2: Google Analytics 4
GA4 provides on-site behavior data: sessions, users, bounce rate, engagement metrics, traffic by channel.
Authentication
GA4 uses the same service account as GSC. Enable the Google Analytics Data API in your Cloud project and add the service account email as a Viewer in the GA4 property.
What You Get
- Traffic by channel (organic, paid, direct, referral, social)
- Top pages by sessions, engagement, and bounce rate
- User behavior metrics per landing page
- Conversion data by source
Key Insight
GA4 alone doesn’t tell you much about SEO. But combined with GSC, it answers questions like: “Which pages rank well in GSC but have high bounce rates in GA4?” — which points to content quality issues that pure ranking data misses.
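The "ranks well but bounces high" question reduces to a join on page URL once both sources are in local JSON. A minimal sketch, assuming each GSC record carries "page" and "position" and each GA4 record carries "page" and "bounce_rate" (field names depend on how your fetchers write the files), with illustrative thresholds:

```python
def well_ranked_high_bounce(gsc_pages, ga4_pages,
                            max_position=10.0, min_bounce=0.6):
    """Cross-reference GSC page performance with GA4 engagement.
    Returns pages with a strong average position (<= max_position)
    but a high bounce rate (>= min_bounce)."""
    bounce_by_page = {p["page"]: p["bounce_rate"] for p in ga4_pages}
    flagged = []
    for p in gsc_pages:
        bounce = bounce_by_page.get(p["page"])
        if (bounce is not None
                and p["position"] <= max_position
                and bounce >= min_bounce):
            # Merge the GA4 metric into the GSC record for reporting
            flagged.append({**p, "bounce_rate": bounce})
    return flagged
```

This is exactly the kind of join the AI tool performs when you ask the question conversationally; the code version is useful when you want the result reproducibly in a report.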
Data Source 3: Google Ads
Google Ads provides paid search data: search terms, impressions, clicks, cost, and conversions.
Authentication
Google Ads requires a separate OAuth 2.0 setup (the service account won't work here):
- A developer token from the Google Ads API Center
- OAuth 2.0 credentials from Google Cloud
- A one-time browser authentication to obtain a refresh token
Note that developer token approval typically takes 24-48 hours.
If you’re using a Manager Account (MCC), one developer token covers all sub-accounts. If API access isn’t available yet, exporting 90 days of search terms as CSVs from the Google Ads UI works just as well.
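For the CSV fallback, a small parser turns the UI export into the same JSON shape the API fetcher would produce. The column names below are assumptions — check them against the headers in your actual export:

```python
import csv
import io

def ads_csv_to_records(csv_text):
    """Parse a Google Ads search-terms CSV export into JSON-ready
    dicts. Column names ("Search term", "Clicks", "Cost") are
    assumptions; match them to your export's actual headers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        records.append({
            "search_term": row["Search term"],
            "clicks": int(row["Clicks"]),
            "cost": float(row["Cost"]),
        })
    return records
```

Normalizing the CSV into data/ads/ means the downstream analysis doesn't care whether the data came from the API or the UI.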
What You Get
- Search terms with impressions, clicks, cost, and conversions
- Match type, campaign, and ad group data
- Cost-per-click and conversion rate by keyword
Key Insight
The real value of Ads data is in what it reveals when compared against GSC. Keywords you’re paying for that you already rank for organically represent wasted spend. Keywords you’re paying for with zero organic visibility represent content gaps.
The Paid-Organic Gap Analysis
The single most valuable cross-source analysis you can run:
The question: “Compare GSC query data against Google Ads search terms. Find keywords where we’re paying for clicks but already have strong organic positions. Also find keywords where we’re spending on ads with zero organic visibility.”
What this surfaces:
| Finding | Implication |
|---|---|
| High organic rank + active ads | Potential wasted ad spend — consider reducing bids |
| Ad spend + zero organic presence | Content gaps — create organic content for these topics |
| Strong organic + no ads | Paid amplification candidates |
| Low organic CTR + high impressions | Title tag and meta description optimization opportunities |
This analysis takes approximately 90 seconds with an AI tool reading the JSON files. The equivalent manual process — downloading CSVs, cross-referencing in spreadsheets, categorizing overlaps — takes several hours.
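The categorization logic behind the table is simple set arithmetic over the two JSON files. A sketch, assuming GSC records carry "query" and "position" and Ads records carry "search_term", with an illustrative threshold for what counts as a strong organic position:

```python
def paid_organic_gap(gsc_queries, ads_terms, strong_position=5.0):
    """Categorize the overlap between organic queries (GSC) and
    paid search terms (Google Ads)."""
    organic = {q["query"].lower(): q["position"] for q in gsc_queries}
    wasted_spend, content_gaps = [], []
    for term in ads_terms:
        pos = organic.get(term["search_term"].lower())
        if pos is not None and pos <= strong_position:
            # Paying for clicks on a query with strong organic rank
            wasted_spend.append({**term, "organic_position": pos})
        elif pos is None:
            # Paying for a query with zero organic visibility
            content_gaps.append(term)
    paid = {t["search_term"].lower() for t in ads_terms}
    # Strong organic positions with no paid coverage at all
    amplification = [q for q in gsc_queries
                     if q["position"] <= strong_position
                     and q["query"].lower() not in paid]
    return {"wasted_spend": wasted_spend,
            "content_gaps": content_gaps,
            "amplification": amplification}
```

Real keyword matching is fuzzier than exact lowercase equality (plurals, word order, close variants), which is one reason the conversational version of this analysis often catches overlaps a strict join misses.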
Other High-Value Cross-Source Questions
- GSC + GA4: “Which pages get the most impressions but have high bounce rates?” (content quality issues)
- GSC topic clusters: “Group queries by topic cluster and show which clusters have the most impressions but lowest average position.” (content investment priorities)
- GA4 + GSC: “Which pages rank well organically but have low engagement metrics?” (user experience gaps)
- All three: “Show me a complete view of our presence for [topic]: organic rankings, paid spend, on-site engagement.”
Data Source 4: AI Visibility Tracking
Traditional SERP positions are no longer the complete picture. Google’s AI Overviews, Bing Copilot, ChatGPT, and Perplexity all generate answers that may or may not cite your content. Tracking whether your content appears in these AI-generated responses is increasingly important — especially for GEO strategy.
Available Tools and APIs
| Tool | Type | Cost | What It Tracks |
|---|---|---|---|
| Bing Webmaster Tools | First-party | Free | Copilot citations, grounding queries, page-level data. The most reliable AI citation data available — first-party, not estimated. |
| DataForSEO AI Overview API | SERP API | ~$0.01/query, $50 min | Google AI Overview content and cited URLs. Also has an LLM Mentions API for brand tracking across platforms. |
| SerpApi | SERP API | From $75/mo (5,000 searches) | Full Google SERP including AI Overviews. Good documentation and Python client. |
| SearchAPI.io | SERP API | From $40/mo | Google SERP + separate Google AI Mode API for AI-generated answers with citations. |
| Bright Data SERP API | SERP API | ~$1.80/1,000 requests | Google SERP with AI Overview capture. Also has an MCP server for agent integration. |
| DIY: Direct LLM API calls | Custom | Under $20/mo | Send consistent prompts to OpenAI, Anthropic, and Perplexity APIs, parse for brand mentions. Perplexity’s Sonar API includes web citations and citation tokens are free. |
Recommended Starting Approach
- Bing Webmaster Tools (free, first-party) — the most reliable AI citation data available
- One SERP API for Google AI Overview data (DataForSEO is the most accessible)
- Direct LLM API monitoring for brand mention tracking across platforms
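For the DIY brand-mention approach, the parsing step can be as simple as whole-word matching over each model's answer text. A minimal sketch (the API calls that produce the answer text are out of scope here; this is only the counting half):

```python
import re

def count_brand_mentions(answer_text, brands):
    """Count case-insensitive whole-word mentions of each brand in
    an LLM answer. Crude, but a cheap and consistent visibility
    signal when run on the same prompts over time."""
    counts = {}
    for brand in brands:
        pattern = r"\b" + re.escape(brand) + r"\b"
        counts[brand] = len(
            re.findall(pattern, answer_text, flags=re.IGNORECASE))
    return counts
```

Storing these counts per prompt, per platform, per month in data/ai-visibility/ gives the AI tool a time series to reason over alongside the GSC and Ads data.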
Current Limitations
AI visibility tracking is still maturing. Google does not publish AI Overview or AI Mode citation data through any official API — every third-party tool is approximating. Treat AI visibility data as directionally useful (a wind sock, not GPS). Bing’s Copilot data is the most reliable because it’s first-party, but it only covers the Microsoft ecosystem.
Practical Workflow
Initial Setup (~1 hour per site)
- Create a Google Cloud project and service account
- Enable GSC and GA4 APIs
- Add service account to each property
- Set up Google Ads OAuth (if applicable)
- Create the fetcher scripts and config file
Monthly Data Pull (~5 minutes)
Run the fetcher scripts to pull fresh data:
```
python3 fetchers/fetch_gsc.py
python3 fetchers/fetch_ga4.py
python3 fetchers/fetch_ads.py
```
Analysis (as needed)
Open the project directory in your AI coding tool and ask questions. The data is local JSON — the tool reads it directly and responds conversationally. No dashboards to maintain, no exports to refresh.
What This Doesn’t Replace
- Historical trend analysis — platforms like SEMrush and Ahrefs maintain longer time series
- Automated alerts — this is ad-hoc analysis, not monitoring
- Client-facing dashboards — you still need Looker Studio or similar for ongoing reporting
- Strategic judgment — the tool finds patterns across data sources faster than a human can manually. It doesn’t know what to do about those patterns. That requires understanding the business, the competitive landscape, and the goals.
Connection to GEO Strategy
This workflow directly supports Generative Engine Optimization (GEO) by making AI citation data queryable alongside traditional SEO metrics. When you can ask “which of our pages get cited in AI Overviews but have declining organic CTR?” in a single query, you can make faster, better-informed decisions about content optimization for both traditional and AI search.
The AI visibility tools listed above are also relevant for monitoring the performance of any content optimized for GEO — tracking whether increased fact density, semantic structure, and entity authority are actually resulting in more AI citations.
See also: Generative Engine Optimization (GEO) | AI Search Visibility Metrics