Technical SEO on a FastAPI blog: what this site actually does

Published: 2026-06-05

Most SEO guides are written for WordPress or SaaS platforms where you install a plugin and click "save". When you run a custom FastAPI app, none of that applies. You own the HTML, which means you have to wire up every meta tag, every structured-data block, and every protocol integration yourself. This post documents what the SEO stack on this blog looks like: what's implemented, where it lives in the code, and what breaks when you get it wrong.

The problem

The blog runs on weblog.antonnovikov.com, built with FastAPI and Jinja2 templates. Posts are Markdown files rendered server-side. There are two languages — English and Russian — each with its own URL tree. The goals were straightforward:

Every post indexed in Google and Yandex within hours of publishing
Bilingual alternates recognized correctly (no duplicate-content penalties)
Rich snippets in search results (article type, dates, breadcrumbs)
Feeds working for RSS readers

None of these come for free. You have to implement them.

How it works

The SEO stack is layered across three places: the Jinja2 base template (templates/base.html), the blog post template (templates/blog/index.html), and the router (app/routers/blog.py). Each layer handles a different concern.

Meta tags

Every page gets a <meta name="description"> populated from the post's description field in index.json. For individual post pages the robots directive is expanded:

```html
<meta name="robots" content="index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1">
```

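max-snippet:-1 removes the length limit on text snippets, so Google can show as much of the description or page text as it chooses instead of cutting it short. max-image-preview:large allows large image previews, which Google Discover relies on, and max-video-preview:-1 lifts the limit on video preview length. These are opt-in: if you don't set them, Google falls back to its default snippet and preview sizes.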

The description cap in the template guide is 160 characters, which matches the common snippet cutoff for desktop results. Keeping descriptions under that limit isn't a Google requirement, but it avoids truncation in SERPs.
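As a rough illustration of that 160-character guideline, the helper below trims a description at a word boundary. The function name and the trailing ellipsis are assumptions for this sketch; the post describes the cap as an authoring guideline, not as code that runs in the app.

```python
def clamp_description(text: str, limit: int = 160) -> str:
    """Hypothetical helper: keep a meta description within `limit` characters."""
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    if len(text) <= limit:
        return text
    # Reserve one character for the ellipsis, then cut back to the last full word.
    cut = text[: limit - 1].rsplit(" ", 1)[0]
    return cut.rstrip(" ,;:.") + "…"
```

Running something like this in a pre-publish check is one way to catch overlong descriptions before they reach index.json.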

Canonical URL

Every page has an explicit <link rel="canonical">:

```html
<link rel="canonical" href="https://weblog.antonnovikov.com/2026-06-05-seo-fastapi-blog">
```

The canonical is computed in the template from the og_url context variable, which is assembled in _handle_blog_post:

```python
"og_url": f"{url_prefix}/{slug}",
```

Where url_prefix is either https://weblog.antonnovikov.com or https://weblog.antonnovikov.com/ru depending on language. This is the minimal correct approach — the canonical should always be the definitive URL without tracking parameters, session tokens, or alternate forms.
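A minimal sketch of that rule, assuming the two prefixes quoted above. The helper names canonical_url and strip_noise are hypothetical and are not taken from app/routers/blog.py.

```python
from urllib.parse import urlsplit, urlunsplit

BASE_URL = "https://weblog.antonnovikov.com"  # host used throughout this post


def canonical_url(slug: str, lang: str = "en") -> str:
    """Hypothetical helper: build the canonical URL for a post in the given language."""
    url_prefix = f"{BASE_URL}/ru" if lang == "ru" else BASE_URL
    return f"{url_prefix}/{slug.strip('/')}"


def strip_noise(url: str) -> str:
    """Drop the query string and fragment so tracking parameters never reach the canonical."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

Building the canonical from the slug, rather than from the incoming request URL, is what keeps query parameters out of it in the first place; strip_noise only matters if a request URL is ever used as input.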

Open Graph and Twitter Card

Post pages get the full Open Graph article set:

```html
<meta property="og:type"              content="article">
<meta property="og:title"             content="...">
<meta property="og:description"       content="...">
<meta property="og:url"               content="...">
<meta property="og:site_name"         content="weblog.antonnovikov.com">
<meta property="og:locale"            content="en_US">
<meta property="og:locale:alternate"  content="ru_RU">
<meta property="og:image"             content="...">
<meta property="og:image:width"       content="1200">
<meta property="og:image:height"      content="630">
<meta property="og:image:alt"         content="...">
<meta property="article:published_time" content="2026-06-05T00:00:00Z">
<meta property="article:modified_time"  content="...">
<meta property="article:author"       content="https://antonnovikov.com/">
<meta property="article:section"      content="DevOps">
```

The og:locale:alternate is rarely documented but matters for bilingual sites — it tells scrapers that an alternate locale version exists. The article:modified_time is populated from the file's filesystem mtime, not a hardcoded date. This means editing a post file automatically updates the modified timestamp without any manual work.

Twitter Card is summary_large_image with reading-time data:

```html
<meta name="twitter:card"    content="summary_large_image">
<meta name="twitter:label2"  content="Reading time">
<meta name="twitter:data2"   content="5 min">
```

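The label/data pairs are not part of the documented card markup, and support varies by client: some link unfurlers, Slack among them, render them as extra fields under the preview, while others ignore them. Reading time is passed from the template context as og_read_time (default 5 min; can be set per-post in index.json as read_time).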

JSON-LD structured data

This is the most verbose part. Each post page emits two JSON-LD blocks: BlogPosting and BreadcrumbList.

The BlogPosting schema:

```json
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Post title",
  "description": "Post description",
  "url": "https://weblog.antonnovikov.com/2026-06-05-seo-fastapi-blog",
  "mainEntityOfPage": "https://weblog.antonnovikov.com/2026-06-05-seo-fastapi-blog",
  "datePublished": "2026-06-05T00:00:00Z",
  "dateModified": "...",
  "author": { "@type": "Person", "name": "Anton Novikov", "url": "https://antonnovikov.com/" },
  "publisher": { "@type": "Person", "name": "Anton Novikov", "url": "https://antonnovikov.com/" },
  "image": { "@type": "ImageObject", "url": "...", "width": 1200, "height": 630 },
  "keywords": "kubernetes, k0s, helm",
  "inLanguage": "en",
  "isPartOf": { "@type": "Blog", "name": "Anton Novikov — weblog", "url": "https://weblog.antonnovikov.com/" },
  "wordCount": 1240,
  "timeRequired": "PT6M"
}
```

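wordCount is computed from the raw Markdown source at render time (a simple .split() count, so Markdown syntax tokens are counted as words too) and injected into the template context. timeRequired uses ISO 8601 duration format: PT6M means 6 minutes. Both are valid schema.org properties and show up among the parsed properties in the Rich Results Test, but Google's Article structured-data documentation does not list them among the properties it uses, so treat them as descriptive metadata rather than something that changes the snippet.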
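The two values can be derived along these lines. This is a sketch, not the site's code: the function names are hypothetical, and the reading speed of 200 words per minute is an assumption, since the post does not say which rate it uses.

```python
import math


def word_count(markdown_source: str) -> int:
    """Count whitespace-separated tokens; Markdown syntax like '#' counts as a word."""
    return len(markdown_source.split())


def time_required(words: int, words_per_minute: int = 200) -> str:
    """Return an ISO 8601 duration such as 'PT6M' (assumed rate: 200 wpm, at least 1 minute)."""
    minutes = max(1, math.ceil(words / words_per_minute))
    return f"PT{minutes}M"
```

Rounding up and clamping to one minute avoids emitting PT0M for very short posts.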

The BreadcrumbList block provides the breadcrumb trail visible in search results:

```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "weblog", "item": "https://weblog.antonnovikov.com/" },
    { "@type": "ListItem", "position": 2, "name": "Post title", "item": "https://weblog.antonnovikov.com/2026-06-05-seo-fastapi-blog" }
  ]
}
```
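One way to produce a block like this is to build it as a dict and serialize it, rather than hand-writing JSON in the template. The sketch below assumes a hypothetical breadcrumb_jsonld helper; how the site's templates actually emit the block is not shown in this post.

```python
import json


def breadcrumb_jsonld(title: str, url: str,
                      home: str = "https://weblog.antonnovikov.com/") -> str:
    """Hypothetical helper: serialize a two-level BreadcrumbList for a post page."""
    data = {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": 1, "name": "weblog", "item": home},
            {"@type": "ListItem", "position": 2, "name": title, "item": url},
        ],
    }
    # Escape "<" so a title containing "</script>" cannot close the script tag early.
    return json.dumps(data, ensure_ascii=False).replace("<", "\\u003c")
```

Serializing with json.dumps also takes care of quotes and backslashes in post titles, which is where hand-built JSON-LD tends to break.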

The index page and tag/series listing pages get WebSite, Blog, and CollectionPage schemas instead.

hreflang for bilingual content

Every post that exists in both languages gets hreflang link tags in <head>:

```html
<link rel="alternate" hreflang="en" href="https://weblog.antonnovikov.com/2026-06-05-seo-fastapi-blog">
<link rel="alternate" hreflang="ru" href="https://weblog.antonnovikov.com/ru/2026-06-05-seo-fastapi-blog">
<link rel="alternate" hreflang="x-default" href="https://weblog.antonnovikov.com/2026-06-05-seo-fastapi-blog">
```

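The same pairs are also emitted in the sitemap. Google treats HTML tags, HTTP headers, and sitemap entries as equivalent ways to declare hreflang, and one of them is enough, so publishing both is a redundancy choice that only works if the two sets stay identical. The sitemap generation (generate_sitemap_xml()) cross-references EN and RU slug sets to find posts that exist in both languages: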

```python
en_slugs = {p.get("slug") for p in en_posts}
ru_slugs = {p.get("slug") for p in ru_posts}
both_slugs = en_slugs & ru_slugs
```

For bilingual posts, <xhtml:link> elements are appended inside the <url> block:

```xml
<url>
  <loc>https://weblog.antonnovikov.com/2026-06-05-seo-fastapi-blog</loc>
  <lastmod>2026-06-05</lastmod>
  <changefreq>never</changefreq>
  <priority>0.8</priority>
  <xhtml:link rel="alternate" hreflang="en"        href="https://weblog.antonnovikov.com/2026-06-05-seo-fastapi-blog"/>
  <xhtml:link rel="alternate" hreflang="ru"        href="https://weblog.antonnovikov.com/ru/2026-06-05-seo-fastapi-blog"/>
  <xhtml:link rel="alternate" hreflang="x-default" href="https://weblog.antonnovikov.com/2026-06-05-seo-fastapi-blog"/>
</url>
```

The changefreq for posts is never — published posts don't change content, only metadata. The lastmod comes from the post's date field in index.json.
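The slug intersection and the per-URL alternates can be put together roughly as follows. This is a sketch using xml.etree.ElementTree with hypothetical helper names (url_entry, build_urlset); it leaves out changefreq and priority for brevity and is not the body of generate_sitemap_xml().

```python
import xml.etree.ElementTree as ET

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
BASE = "https://weblog.antonnovikov.com"

# Serialize the sitemap namespace as the default and xhtml with its usual prefix.
ET.register_namespace("", SM_NS)
ET.register_namespace("xhtml", XHTML_NS)


def url_entry(slug: str, lastmod: str, lang: str, bilingual: bool) -> ET.Element:
    """Build one <url> element; alternates are added only for bilingual posts."""
    en_url, ru_url = f"{BASE}/{slug}", f"{BASE}/ru/{slug}"
    url = ET.Element(f"{{{SM_NS}}}url")
    ET.SubElement(url, f"{{{SM_NS}}}loc").text = ru_url if lang == "ru" else en_url
    ET.SubElement(url, f"{{{SM_NS}}}lastmod").text = lastmod
    if bilingual:
        for hreflang, href in (("en", en_url), ("ru", ru_url), ("x-default", en_url)):
            ET.SubElement(url, f"{{{XHTML_NS}}}link",
                          rel="alternate", hreflang=hreflang, href=href)
    return url


def build_urlset(en_posts: list[dict], ru_posts: list[dict]) -> str:
    """Serialize all posts; each post dict is assumed to carry 'slug' and 'date'."""
    both = {p["slug"] for p in en_posts} & {p["slug"] for p in ru_posts}
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    for lang, posts in (("en", en_posts), ("ru", ru_posts)):
        for p in posts:
            urlset.append(url_entry(p["slug"], p["date"], lang, p["slug"] in both))
    return ET.tostring(urlset, encoding="unicode")
```

Note that both language versions of a bilingual post carry the same three alternates, including a link to themselves, which is what hreflang expects.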

Series navigation: rel=prev / rel=next

Posts that belong to a series get rel=prev and rel=next link tags pointing to adjacent posts in the series (ordered by date ascending):

```html
<link rel="prev" href="https://weblog.antonnovikov.com/2026-06-03-fastapi-s3-yandex-cloud">
```

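Google announced in 2019 that it no longer uses rel=prev/next as an indexing signal. They remain valid HTML link relations, though, and may still be read by other search engines such as Yandex and by user agents that follow sequential navigation.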
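Finding the adjacent posts comes down to sorting the series by date and looking one step either side. A minimal sketch, assuming each post dict has slug and an ISO-formatted date; series_neighbors is a hypothetical name, not the function behind og_series_prev and og_series_next.

```python
from typing import Optional


def series_neighbors(posts: list[dict], slug: str) -> tuple[Optional[str], Optional[str]]:
    """Return (prev_slug, next_slug) for `slug` within one series, ordered by date ascending."""
    ordered = sorted(posts, key=lambda p: p["date"])  # ISO dates sort correctly as strings
    slugs = [p["slug"] for p in ordered]
    if slug not in slugs:
        return None, None
    i = slugs.index(slug)
    prev_slug = slugs[i - 1] if i > 0 else None
    next_slug = slugs[i + 1] if i + 1 < len(slugs) else None
    return prev_slug, next_slug
```

The first post in a series gets no rel=prev and the last gets no rel=next, so the template should emit each tag only when the corresponding value is not None.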

Atom feed

There are two Atom 1.0 feeds: /feed.xml (EN) and /ru/feed.xml (RU), each containing the last 20 posts with full HTML content:

```xml
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Anton Novikov — weblog</title>
  <link href="https://weblog.antonnovikov.com/"/>
  <link rel="self" href="https://weblog.antonnovikov.com/feed.xml"/>
  <entry>
    <id>https://weblog.antonnovikov.com/2026-06-05-seo-fastapi-blog</id>
    <link href="https://weblog.antonnovikov.com/2026-06-05-seo-fastapi-blog"/>
    <title>Technical SEO on a FastAPI blog</title>
    <updated>2026-06-05T00:00:00Z</updated>
    <summary>...</summary>
    <content type="html">...rendered HTML...</content>
    <category term="seo"/>
  </entry>
</feed>
```

Full content in the feed (as opposed to summary-only) improves usability for RSS readers and is indexed by some feed aggregators. The feed is cached in memory for 1 hour (_FEED_TTL = 3600) using a TTLCache.

The feed link is declared in every page's <head>:

```html
<link rel="alternate" type="application/atom+xml"
      title="Anton Novikov — weblog"
      href="https://weblog.antonnovikov.com/feed.xml">
```

IndexNow

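When a new post is published via the admin API, an IndexNow ping is sent to two endpoints: the shared api.indexnow.org endpoint, which Bing and the other participating engines consume, and Yandex's own endpoint. Google does not take part in IndexNow, so it still discovers new posts through the sitemap.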

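```python
import logging

import httpx

log = logging.getLogger(__name__)


async def _indexnow_ping(urls: list[str]) -> None:
    payload = {
        "host": "weblog.antonnovikov.com",
        "key": "0c887d6665c1de09334e138e0c31962c",
        "keyLocation": "https://weblog.antonnovikov.com/0c887d6665c1de09334e138e0c31962c.txt",
        "urlList": urls[:10000],  # IndexNow accepts at most 10,000 URLs per request
    }
    endpoints = [
        "https://api.indexnow.org/indexnow",
        "https://yandex.com/indexnow",
    ]
    async with httpx.AsyncClient(timeout=10.0) as client:
        for ep in endpoints:
            try:
                # Any further keyword arguments are elided in the original snippet.
                r = await client.post(ep, json=payload)
                if r.status_code >= 400:
                    log.warning("IndexNow %s → %s", ep, r.status_code)
            except httpx.HTTPError as exc:
                log.warning("IndexNow %s failed: %s", ep, exc)
```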

The key verification file lives at /0c887d6665c1de09334e138e0c31962c.txt in the project root and is served statically. Without it, search engines reject the IndexNow submission (they fetch the file to verify ownership). The ping is fire-and-forget — failures are logged as warnings but don't affect the API response.
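The fire-and-forget part can be sketched with plain asyncio. The names schedule_ping and _safe_ping are hypothetical, and the post does not say whether the app uses asyncio.create_task, FastAPI background tasks, or something else, so treat this as one possible shape rather than the site's implementation.

```python
import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger("indexnow")

# Keep references to running tasks so they are not garbage-collected mid-flight.
_background: set[asyncio.Task] = set()


async def _safe_ping(ping: Callable[[list[str]], Awaitable[None]], urls: list[str]) -> None:
    """Run the ping and downgrade any failure to a warning."""
    try:
        await ping(urls)
    except Exception as exc:  # deliberately broad: a ping must never break publishing
        log.warning("IndexNow ping failed: %s", exc)


def schedule_ping(ping: Callable[[list[str]], Awaitable[None]], urls: list[str]) -> asyncio.Task:
    """Start the ping in the background and return immediately (needs a running event loop)."""
    task = asyncio.create_task(_safe_ping(ping, urls))
    _background.add(task)
    task.add_done_callback(_background.discard)
    return task
```

The publish handler would call schedule_ping(_indexnow_ping, urls) and return its response without awaiting the task.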

Sitemap endpoint

The sitemap is generated dynamically at /weblog/sitemap.xml via generate_sitemap_xml(), which reads from the in-memory index caches and builds the full XML. It includes:

Static pages (antonnovikov.com, cv.antonnovikov.com, weblog.antonnovikov.com)
Every EN and RU post with lastmod, changefreq=never, priority=0.8
Tag listing pages for all known tags with changefreq=weekly, priority=0.5

There's a separate static sitemap.xml at the repo root for the main antonnovikov.com domain (the personal homepage), which is a simple static file not generated by the app.

robots.txt

```text
User-agent: *
Allow: /

Disallow: /weblog/api/
Disallow: /weblog/stats
Disallow: /*.json$
Allow: /weblog/index.json
Allow: /weblog/index-ru.json

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
```

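AI crawlers and fetchers (GPTBot, ChatGPT-User, CCBot, anthropic-ai) are blocked entirely; the excerpt above shows only the first two groups. The API endpoints are disallowed because they have no indexing value and would show up as odd URLs in results. The index.json files stay crawlable despite the /*.json$ pattern because their Allow rules are longer: crawlers that follow RFC 9309, Google included, apply the most specific matching rule regardless of the order in which rules appear.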
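To see why the index.json files remain crawlable, here is a simplified model of longest-match precedence applied to the `User-agent: *` group. It is a sketch of the matching rule only (wildcard `*`, end anchor `$`, longest pattern wins, Allow wins ties), not a full robots.txt parser, and the function names are made up for this example.

```python
import re

# Rules from the "User-agent: *" group, in file order.
RULES = [
    ("allow", "/"),
    ("disallow", "/weblog/api/"),
    ("disallow", "/weblog/stats"),
    ("disallow", "/*.json$"),
    ("allow", "/weblog/index.json"),
    ("allow", "/weblog/index-ru.json"),
]


def _to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern: '*' matches anything, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + regex + ("$" if anchored else ""))


def is_allowed(path: str, rules=RULES) -> bool:
    """Pick the longest matching pattern; on equal length, Allow beats Disallow."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if _to_regex(pattern).match(path):
            n = len(pattern)
            if n > best_len or (n == best_len and kind == "allow"):
                best_len, allowed = n, kind == "allow"
    return allowed
```

For /weblog/index.json both /*.json$ and the explicit Allow rule match, and the Allow rule wins because its pattern is longer. Note that Python's own urllib.robotparser does not implement wildcards, so it is not a reliable way to test rules like these.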

Wiring it into FastAPI

The template context for a post page is built entirely in _handle_blog_post():

```python
ctx: dict = {
    "og_title":       meta.get("title", slug),
    "og_description": meta.get("description", ""),
    "og_url":         f"{url_prefix}/{slug}",
    "og_date":        meta.get("date", ""),
    "og_modified":    og_modified,      # from file mtime
    "og_tags":        meta.get("tags", []),
    "og_read_time":   meta.get("read_time", 5),
    "og_word_count":  word_count or 0,  # counted from raw markdown
    "og_slug":        slug,
    "og_image":       meta.get("cover_image", ""),
    "og_body":        body_html,
    "og_series_prev": og_series_prev,
    "og_series_next": og_series_next,
}
```

The meta dict comes from _slug_index_en / _slug_index_ru, which are plain Python dicts rebuilt from index.json whenever the index cache is refreshed (every 5 minutes). No database query, no filesystem stat per request — just a dict lookup.
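The slug index itself can be as simple as a dict comprehension. The sketch below assumes index.json is a JSON array of post objects that each carry a slug field; the real file layout is not shown in this post, and build_slug_index is a hypothetical name.

```python
import json


def build_slug_index(index_json: str) -> dict[str, dict]:
    """Map slug -> post metadata (assumes a JSON array of objects with a 'slug' key)."""
    posts = json.loads(index_json)
    return {p["slug"]: p for p in posts if "slug" in p}
```

A request handler then only needs `meta = index.get(slug)` and can return a 404 when the result is None.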

og_modified is assembled from the file's mtime:

```python
import email.utils

mtime = md_path.stat().st_mtime  # md_path: path object for the post's Markdown file
og_modified = email.utils.formatdate(mtime, usegmt=False)[:16].rstrip()
```

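This gives an RFC 2822-style date like "Fri, 05 Jun 2026" that gets injected into dateModified in the JSON-LD block. That is not ISO 8601, which is the format both schema.org and the Open Graph article:modified_time property expect, so validators may flag or ignore the value. An ISO 8601 timestamp is the safer choice here.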
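If an ISO 8601 value is wanted instead, the same mtime can be formatted with datetime. This is a suggested alternative, not what the site currently does; iso_modified is a hypothetical name.

```python
from datetime import datetime, timezone


def iso_modified(mtime: float) -> str:
    """Format a POSIX timestamp as an ISO 8601 UTC string, e.g. 2026-06-05T00:00:00Z."""
    return datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

The output matches the shape already used for datePublished and article:published_time elsewhere on the page, so both dates end up in one format.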

What can go wrong

IndexNow key file missing. If you redeploy without the 0c887d6665c1de09334e138e0c31962c.txt file in place, every IndexNow ping will return 403 Forbidden. The logs will show IndexNow https://api.indexnow.org/indexnow → 403. Verify with:

```bash
curl -I https://weblog.antonnovikov.com/0c887d6665c1de09334e138e0c31962c.txt
```

Should return 200 OK. The file just needs to contain the key string.

hreflang only in <head> but not in sitemap, or vice versa. Google accepts either location on its own, but since this site declares hreflang in both, the two have to agree; contradictory annotations give crawlers no reliable signal. If the post exists in EN but not RU, the hreflang blocks are omitted entirely (the both_slugs intersection handles this). If you add a RU post without the matching EN post, it won't get hreflang either, and it appears in the sitemap as a standalone URL. That's correct behavior, not a bug.

Canonical URL with trailing slash mismatch. The canonical for the index page is https://weblog.antonnovikov.com/ (with trailing slash). The post pages have no trailing slash. Mixing these up makes the two forms look like duplicates of each other, which Search Console reports under statuses such as "Duplicate, Google chose different canonical than user". The template always uses og_url exactly as passed from the router, so if the router produces the wrong form, the canonical will be wrong too. Check with curl -s https://weblog.antonnovikov.com/some-slug | grep canonical (not curl -I, which returns headers only) and verify that the <link rel="canonical"> in the response HTML matches the URL you requested.
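The same check can be scripted. The sketch below parses a page's HTML with the standard library and compares the canonical against the URL that was requested; fetching the page is left out so the example stays self-contained, and the function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class _CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> encountered."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "canonical" and self.href is None:
            self.href = a.get("href")


def canonical_of(html: str) -> Optional[str]:
    finder = _CanonicalFinder()
    finder.feed(html)
    return finder.href


def canonical_matches(requested_url: str, html: str) -> bool:
    """True only on an exact match, so a trailing-slash difference counts as a mismatch."""
    return canonical_of(html) == requested_url
```

Running this over every URL in the sitemap after a deploy would catch a router change that starts emitting the wrong form.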

Summary

Meta description and canonical come from index.json metadata and the post slug; post pages add expanded robots directives, and the modified timestamp comes from file mtime
Open Graph article tags and Twitter Card are generated per-post; the index page gets website/blog types instead
JSON-LD emits BlogPosting + BreadcrumbList for posts, WebSite + Blog + CollectionPage for index and listing pages
hreflang is set in both <head> links and sitemap; only posts that exist in both languages get the alternates
Atom feeds carry full HTML content, cached 1 hour in memory
IndexNow fires on publish to the shared api.indexnow.org endpoint (Bing and other participants) and to Yandex; requires the key file at a known URL
robots.txt blocks AI training crawlers and API endpoints, allows index JSON files explicitly