gsc — Google Search Console data
gsc exposes Google Search Console data at the grain of
(page, query, day, search_type). One row = one search query that
showed one page of your site to someone on one day, on one search
surface (web / image / video).
Use this material when the question is about organic search:
which queries bring traffic, which pages rank for what, where CTR
is surprisingly low, where position is climbing or falling. If the
question is not about search, do not use gsc — its grain is wide
and it joins expensively.
Sample rows
Hand-crafted sample. Only page_id is shown as an ID (GSC has no
per-impression or per-session identifiers). The full schema is in
ai/materials.yaml.
| page_id | query | search_type | clicks | impressions | ctr | position |
|---|---|---|---|---|---|---|
| 1042 | best manual coffee grinder | web | 142 | 3,210 | 4.42% | 4.8 |
| 1042 | hario slim review | web | 98 | 410 | 23.90% | 2.1 |
| 1042 | coffee grinder under 50 | web | 6 | 1,880 | 0.32% | 9.4 |
| 1088 | hario slim | web | 312 | 720 | 43.33% | 1.3 |
| 1088 | hario slim vs porlex | web | 18 | 150 | 12.00% | 3.7 |
| 1201 | v60 brewing ratio | web | 480 | 2,100 | 22.86% | 2.4 |
| 1201 | how to use v60 | web | 210 | 9,800 | 2.14% | 6.9 |
Things the sample is trying to show at a glance:
page_id 1042ranks well for a narrow query (hario slim review, CTR 23.9%, position 2.1) but poorly for a broad one (coffee grinder under 50, CTR 0.32%, position 9.4). Classic long-tail story: the niche terms convert, the broad terms leak.page_id 1088is the product page. The branded query (hario slim) has CTR 43% at position 1.3 — expected. But the comparison query (hario slim vs porlex) is only position 3.7 with CTR 12%. Comparison intent is not being captured well; that is a content gap worth flagging.page_id 1201ranks well forv60 brewing ratio(position 2.4, CTR 22.9%) but poorly forhow to use v60(position 6.9, CTR 2.1%). Two queries, same page, very different performance — a strong hint that the page title/H1 does not match the broader query's intent.
Columns the sample glosses over
query_id— an integer foreign key into the keyword table. Preferkeywordfor reading;query_idis the key you would use for efficient filtering.position_x100— the raw stored position (×100). Thepositioncolumn shown in the sample is the decoded float.position_weighted— virtual column for impression-weighted position calculations.url,title,page_type,page_fetch_status— master-resolved columns (viapage_id) so you do not need to JOINallpvjust to get the URL of a page.
See ai/materials.yaml for the full list.
⚠️ JOIN caution
gsc has an N:M relationship with allpv on page_id:
One page can have dozens or hundreds of GSC rows (many queries, many days) for each page view. Joining
gsctoallpvwithout a filter can explode row counts (e.g. 100 page views × 150 keyword-days = 15,000 rows).
Always filter gsc before joining. Typical filters are on
keyword, on search_type, or on a specific date range tighter
than the overall query's time window.
Good first questions for gsc
- "Which search queries are driving the most clicks to my blog?"
- "Which pages have high impressions but low CTR? (i.e. title / snippet needs work)"
- "Which queries are we ranking for that we did not target?"
- "What is our average position for branded vs. non-branded queries?"
- "Where is position dropping month over month?"
Where to go next
allpv— page-view-level data to JOIN with./queryreference.