CT No.62: When 404s and 301s swarm

The tools and techniques you'll need to combat website zombies

Oct 22, 2020

The weather gods gifted Minneapolis several inches of snow on Tuesday, and I’m not happy about it! But if I don’t feel like going outside, I can cuddle up and watch scary movies, which I’m happy to do for the foreseeable future.

In this week’s issue:

Killing 404 and 301 zombie links on your website
A review of the best zombie killing tool, Screaming Frog
Content tech links of the week


Subscriptions are free forever for the first 1,000 subscribers, and we’re close… very close… with more Content Technologist features in the works.

The undead links of websites past: How zombies get in the way of web crawlers

On the internet there are zombies everywhere. Zombies are different from bots; they represent what once was but is no longer, the remnants of our digital past, the bits that were once human.

Zombies live at suck.com and televisionwithoutpity.com and wherever Alex Balk’s byline used to appear.

Snowfall, the interactive content from the New York Times, is a bit of a zombie these days. The video thumbnails load, but the video doesn’t play.

As research for this newsletter (so committed), I watched Night of the Living Dead for the first time and totally loved it.

Social network zombies are abundant: not only Friendster and MySpace but also the network hoppers who stop using their Facebook/Insta/Twitter for a year, then get a new outlook on life and make a different Facebook profile, quit after a few months, then repeat until they have 17 zombie social media profiles and none of them works. Digital zombies are part of our offline identity.

I care for a zombie, a defunct local news magazine website that hasn’t been updated since 2012. I should probably retire the whole thing, but I’m proud of the work our team did, so it sits, benign and probably begging to be hacked with malware. (I don’t know if upgrading to https is worth the archival value. Probably not, but I can’t bear to let it go.)

If you run a website, you likely have some zombie-fied parts too — the links that go nowhere, or that redirect to new links that differ from the original content. Those are zombies.

But with the right know-how and the right set of tools, we can fight zombies. Zombie fights are perfect for days when you need or want to get something done but have minimal motivation to do anything that takes much thought.

I prefer Jarmusch’s vampire film, but the zombie one is pretty great too.

Because the way to fight zombies is to kill them off one by one, repeatedly, the same way they showed up in the first place.

How to fight zombie links (aka 404s and 301s)

First, we should understand what we’re dealing with. Two main types of zombies afflict the common website: 404 errors and 301 redirects.

The 404 errors are the most dangerous. Of course it’s bad for user experience if someone clicks on a link that doesn’t work; that’s a no-brainer. But most often 404 errors are hiding deep within the architecture of your website, and your users don’t find them.

What can find those 404 errors deep in your crypts and labyrinthine hallways to nowhere? Search engines.

The one on the left is DuckDuckGo.

A primer for the unfamiliar: Search engines scan all the text and metadata on your website, following every link to its ultimate end—the process is called a crawl. The bot in charge? A spider.* The spider will follow every link it can until you tell it not to, or it runs out of so-called crawl budget.**

Most websites don’t need to plan for crawl budget; if you have under 5,000 pages and everything’s working, search spiders are gonna crawl right through that buddy in no time.
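If it helps to see the crawl as code, here's a minimal sketch of the idea. It is a toy, not how any real search engine works: the `SITE` dictionary, its URLs, and the `crawl` function are all made up for illustration, and "budget" here is simply a cap on how many URLs get fetched.

```python
from collections import deque

# Hypothetical toy site: each URL maps to (status code, links found on that page).
SITE = {
    "/": (200, ["/about", "/blog", "/old-page"]),
    "/about": (200, ["/"]),
    "/blog": (200, ["/blog/post-1", "/gone"]),
    "/blog/post-1": (200, ["/"]),
    "/old-page": (301, ["/about"]),  # a redirect "links" to its target
}


def crawl(site, start, budget):
    """Breadth-first crawl that stops once `budget` URLs have been fetched."""
    queue = deque([start])
    seen = {start}
    report = {}  # URL -> status code, in the order fetched
    while queue and len(report) < budget:
        url = queue.popleft()
        # A URL that is linked to but missing from the site is a broken link.
        status, links = site.get(url, (404, []))
        report[url] = status
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return report


report = crawl(SITE, "/", budget=100)
for url, status in report.items():
    print(status, url)
```

With a generous budget the toy spider reaches every page, including the dead `/gone` link. Drop the budget to 3 and it stops after the homepage, `/about`, and `/blog`, never seeing the rest.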

But if the spider finds a dead link, or worse, a hallway full of 404 errors, it’s gonna call “done” way sooner. Spiders and zombies don’t mix. The spider sees one or two zombies and anticipates many more, which, if you’ve seen the movies, is a solid zombie prevention strategy.

There are always more where that came from.

If you want a search engine spider to see all the glorious content you have created and report back to Google, get your zombie 404s out of the way. They are undead, broken and going nowhere.

*It’s my 7th year deep in SEO and I still think the name is cute.

**I generally prefer to link to news/non-content marketing sources, but in this case, the “news” or specialist blog has ads all over it and loads like garbage. I very much prefer the clean, comprehensive version that explains the concept. This is one example of “how to get a link from a specialist even if your website doesn’t rank as high and you’re not an ‘official’ source.”

Combatting the not-dead-yet set: 301 redirects

Developers, UX folks, SEOs, users, everyone knows broken links are bad, so we build safeguards to protect from 404 broken link zombies. We cover the doors and walls with whatever plywood we have available. We mitigate the zombie impact.

When URLs are changed, the best way to communicate that change to a search engine is to implement a 301 redirect. The 301 HTTP status code is the digital equivalent of post office mail forwarding: we don’t live here anymore, so please deliver mail to our new home.

Implementing 301 redirects is a perfect fix when you’re changing a large number of URLs or even domains. The 301 redirect indicates permanence: the old content is lost and gone forever and replaced with new content.

Redirect maps are central in any website redesign process, and good content managers have redirect processes in place to eliminate 404 errors whenever a URL is changed on their site. Take a piece of content down and throw up a new 301 redirect to guide both users and spiders from the old, busted URL to the new hotness. Make sure the 301 is in place in case any readers have bookmarked or linked to the content from external sites.
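A redirect map is, at heart, a lookup table from old address to new address. Here's a rough sketch of that idea, with invented paths and a hypothetical `respond` function standing in for whatever your server or CMS actually does. The last few lines are a sanity check worth running on any real map: does every redirect land on a page that exists?

```python
# Hypothetical redirect map: old path -> new path.
REDIRECTS = {
    "/old-about-us": "/about",
    "/2012/zombie-guide": "/guides/zombies",
    "/2012/vampire-guide": "/guides/vampires",  # oops: this page was never built
}

# Hypothetical set of pages that exist on the new site.
LIVE_PAGES = {"/about", "/guides/zombies"}


def respond(path):
    """Return (status, location) roughly the way a server might answer."""
    if path in REDIRECTS:
        return 301, REDIRECTS[path]  # moved permanently: go here instead
    if path in LIVE_PAGES:
        return 200, None
    return 404, None


# Redirects whose destination is not a live page just move the zombie.
dangling = [old for old, new in REDIRECTS.items() if new not in LIVE_PAGES]

print(respond("/old-about-us"))
print(dangling)
```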

It works… for a bit.

Just as your best friend and sidekick will likely become zombiefied before the end of the film, 301 redirects can come back to bite you. One 301 redirect is fine. However, two or more 301 redirects in a row indicate that you may have a zombie problem.

The mail forwarding analogy works best, even though it has nothing to do with zombies: At least in the U.S., the post office will implement forwarding from one old address to a new address once, for a few months. The USPS will not follow you around from home to home, trying to find you at your new address. The system can’t handle that kind of change.

Search crawlers behave similarly. Search spiders will crawl through one 301 redirect, maybe two, and say, “ok, great, the new content is at a new address and I will let everyone know about it.”

When there are multiple 301 redirects in a row, especially within the same domain, that’s a signal to a spider that hey! There may be zombies. There may be 404 errors, more 301 redirects, who knows. The spider just knows that the content is harder to reach and may turn away in favor of clearer, zombie-free roads.
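To make "redirects in a row" concrete, here's a small sketch that walks a redirect map and counts the hops. The `trace_chain` function and the sample paths are hypothetical, and the 10-hop cap is an arbitrary safety limit I picked, not a number any search engine publishes.

```python
def trace_chain(url, redirects, max_hops=10):
    """Follow `url` through a redirect map and return every URL visited."""
    chain = [url]
    while chain[-1] in redirects and len(chain) <= max_hops:
        nxt = redirects[chain[-1]]
        looped = nxt in chain  # we have been here before: a redirect loop
        chain.append(nxt)
        if looped:
            break
    return chain


# Hypothetical map where one old URL was redirected twice over the years.
redirects = {
    "/2015/zombies": "/2018/zombies",
    "/2018/zombies": "/guides/zombies",
}

chain = trace_chain("/2015/zombies", redirects)
hops = len(chain) - 1
print(" -> ".join(chain), f"({hops} hops)")
```

Anything that comes back with more than one hop is a chain worth collapsing.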

Search engine spiders will absolutely avoid the train to Busan.

Technically, the links aren’t broken and if a user follows the link, they’ll arrive at the new content eventually. But they definitely appear shadier than working links — we’ve all been redirected to an unfamiliar website before. Where are you taking me? we ask the redirected. Will I find the content I seek? Or will there be malware?***

For search engines, redirects take up time and energy and crawl budgets. Like all automations, computers are doing some heavy lifting on your behalf. It’s in everyone’s best interest to make the path to content as unobstructed and zombie-free as possible.

***Most URL shorteners, and QR codes that point to shortened links, work through a redirect, often a 301. Make sure that’s the only redirect before you get to the content. I’ve learned from experience (and maybe this is only my generation, who downloaded boatloads of viruses onto our college computers while we were pirating music and movies?) that unless I absolutely trust you as a source, I’m not clicking on your random URL shortener if you don’t tell me exactly where I’m going.

301 redirects and website redesigns

If you’re redesigning a website and changing URLs, you are already going to have a slew of 301 redirects in place. That’s why it’s so crucial to ensure that there are no other internal redirects upon launch: Redirect chains are bad for users and for search crawlers, and they add up.

How does this happen with new websites? Usually a developer enters a URL in building a page that’s not the final URL. Either they didn’t have the final list of URLs or, even more often, someone flubbed a typo and all the automated links in the site’s most important directory say “contnet” instead of “content.”

The error is discovered during the QA/proofing process and yeah: it’s a huge pain to change all the internal links to every page whose primary directory is /contnet-management/ instead of /content-management/.

What often happens: to ensure the launch timeline stays on track, developers implement a quick redirect so every link on the site that pointed to the typo now redirects to the fancy new page. But it’s about as sustainable as punching a zombie in the arm.

Implementing internal 301s before a website is even launched is the equivalent of a zombie movie with the first scene set in a graveyard. The undead are on their way and they’re going to show up sooner rather than later.

The URL typo that never ends…

Yes, it’s a massive pain to change all of your internal links before launch. I totally get it. Hopefully you’re working with a CMS that lets you write a script to rewrite internal links en masse. With many newer CMSes, especially the no-code ones that are gaining popularity, you may have to rewrite them all individually.
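If your CMS does let you script it, the rewrite itself can be short. This is only a sketch: the `fix_links` function is hypothetical, it assumes links are written as double-quoted `href` attributes, and how you pull page HTML out of your CMS and save it back is a separate problem that depends entirely on your platform.

```python
import re

# Assumption: links appear as href="..." with double quotes.
HREF = re.compile(r'href="([^"]*)"')


def fix_links(html, wrong, right):
    """Replace the misspelled directory inside href values only."""

    def repl(match):
        return 'href="' + match.group(1).replace(wrong, right) + '"'

    return HREF.sub(repl, html)


page = '<p>Read our <a href="/contnet-management/guide">contnet guide</a>.</p>'
fixed = fix_links(page, "/contnet-management/", "/content-management/")
print(fixed)
```

Only the link destination changes. The visible text still says "contnet," so the proofreader keeps their job.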

But leaving them in means you’re starting with a mess, and search spiders don’t like messes.

Redesigns already mean that you’re changing URLs on most content. Before spiders even get to your website they’ve already encountered one redirect. Once they find all of these others that say “contnet” but redirect to “content,” they’ve established that your site has two redirects in a row. A redirect chain, the name for two or more 301s in a row, is a sign you may be kinda unhealthy. Maybe not trustworthy. Maybe decaying before your time. Maybe full of zombies.
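One way to clean this up on the redirect side is to flatten the map so every old URL points straight at its final home. A minimal sketch, assuming your redirects can be exported as simple old-to-new pairs and that the typo made it into the map; the `flatten` function and the paths are made up for illustration.

```python
def flatten(redirects):
    """Point every old URL directly at its final destination."""
    flat = {}
    for old, target in redirects.items():
        seen = {old}
        # Keep following until the target is no longer itself redirected.
        # The `seen` set stops us from spinning forever on circular redirects.
        while target in redirects and target not in seen:
            seen.add(target)
            target = redirects[target]
        flat[old] = target
    return flat


# Hypothetical map: the redesign redirect lands on the typo,
# and the typo patch redirects again to the real page.
redirects = {
    "/old-site/cms-guide": "/contnet-management/guide",
    "/contnet-management/guide": "/content-management/guide",
}

flat = flatten(redirects)
for old, new in flat.items():
    print(old, "->", new)
```

After flattening, both old addresses reach the real page in a single hop. You still want to fix the internal links themselves so nobody on your own site points at the typo.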

If you want to be found, you have to ensure that spiders don’t think your site is filled with zombies.

How to find and fix your on-site 404 and 301 errors in one fell swoop

Obviously you can’t fix any errors outside of your own website. But I assure you, even the best of us gather broken links (404s) and redirects (301s) over the course of a year. They’re negligible errors in a one-off, but they add up and become a zombie problem.

To fix it, you need the right tools: a web crawler like Screaming Frog (freemium, reviewed below) or any other SEO/content audit tool should do the trick.

You can also access the Coverage report in Google Search Console, although it’s not quite as comprehensive and easy to use as a crawler. It’s like coming prepared to a zombie fight with a lighter, rather than a torch.

You can also crawl websites that are not live yet — and I suggest you crawl several times before you launch to ensure the healthiest website possible.

Here’s how to fix the errors on your own site.

Use a website crawler to scan your website just like a search engine. I use Screaming Frog, a freemium crawler reviewed here today. However, pretty much any website audit tool embedded into SEO or content auditing software can find your broken links.
Most web crawl tools present their crawl results in a spreadsheet. If you can, choose only HTML pages. Sort by status code.
Dive into the 404 errors first, as they are the most deadly: which pages on your site are linking to them? Good crawl tools will highlight all the internal pages that link to the 404 error. In Screaming Frog, they’re called Inlinks, but every tool is slightly different.
From there: hop into your CMS and open the healthy page that contains the unhealthy zombie links. Isolate the link you want to change, fix that link’s destination, save and republish.
Next, tackle the 301 redirects. Make sure you change the link to the page where the content is actually living. Copy and paste the URL from the link destination itself; do not type it in manually or you are asking for trouble.****
Finally, if your crawler highlights external (offsite) links, and most do, fix all the offsite links that end in 404s and 301s. Find better content worth linking, or eliminate the link entirely.
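If you'd rather triage a crawl export with a script than a spreadsheet, the sort-and-group steps above can be sketched like this. The column names (`Source`, `Destination`, `Status Code`) and the sample rows are assumptions for illustration; check the headers in your own tool's export before borrowing any of it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export of internal links: one row per link found in the crawl.
EXPORT = """\
Source,Destination,Status Code
/blog/post-1,/gone,404
/,/old-page,301
/about,/gone,404
/,/about,200
"""


def zombie_links(export_text):
    """Group linking pages by zombie destination, with 404s listed first."""
    by_target = defaultdict(list)
    for row in csv.DictReader(io.StringIO(export_text)):
        status = int(row["Status Code"])
        if status in (301, 404):
            by_target[(status, row["Destination"])].append(row["Source"])
    # Higher status code first, so 404s sort ahead of 301s.
    return sorted(by_target.items(), key=lambda item: -item[0][0])


todo = zombie_links(EXPORT)
for (status, destination), sources in todo:
    print(status, destination, "linked from", ", ".join(sources))
```

Each line of output is one zombie and the pages you need to open in your CMS to fix it.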

Here’s a little video I made, demonstrating the process with Screaming Frog. It’s quite easy, but it’s tedious.

Once you’re done cleaning your zombies, your users and search traffic will thank you.

****There are broken links on my website because I do not follow my own advice here.

I wish there were better tools for cleaning up social media zombie accounts as well, but tbh they’re all a little shady. While they’re finding dead followers in Instagram or Twitter, I’m always wondering what other data they’re gathering and sending. If you have social zombie-killing tools you’d recommend, reach out.

The gold standard in web crawlers: Screaming Frog review

When recommending tech for clients, my number one rule is: Don’t change what’s already working. Keep what’s already doing its job well.

Software companies and SEO agencies are always trying to make a better, sexier web crawler than Screaming Frog, which, compared to all the pretty new tools out there, looks more like a DOS screen every year. But good old SF does its job, quickly, with few frills. It’s utilitarian, it’s worth every penny of the low license fee, and I know how to use it.

Screaming Frog is free for basic crawls up to 500 pages, and for most small websites, that’s all you need. An annual license for the unlimited full-featured version is under $200 at the current exchange rate.

Screaming Frog at a glance

In addition to identifying HTTP status codes like 404s and 301s, Screaming Frog can illuminate a number of technical SEO issues, including:

Duplicate content identification
Missing titles, meta descriptions, and header tags
Overly long or incorrectly implemented page and meta content
Canonicalization
Site structure
Structured data/schema markup
Google snippet demo visualization

The tool generates high-quality sitemaps, links with Google Analytics and Search Console for individual page analysis, crawls JavaScript websites that are harder to read than standard HTML, and generally supplies everything you need to audit a website’s health.

You can find oodles of resources to help you with Screaming Frog analysis. I like SF’s tutorials and this guide from Seer Interactive.

It’s mostly a professional-grade SEO or content auditing tool, but it’s worth learning if you want to get into the nitty gritty of website maintenance.

Content tech links of the week

Did something happen with Google this week? J/k. Here’s Matt Stoller, the monopoly expert, and his concise summary and links.
Google is indexing page passages now. Something like this is actually happening on my website, and I’ll explore it in the future. For now, here is Search Engine Land.
Ten years after many gambits to change the local news ecosystem, the future of local news is still strange. Here’s a couple of reasons to hope in Mark Stenberg’s Medialyte. And here’s why you should be wary, from the NYT.
Branded dives into how brands may fund misinformation even if they’ve already added a problematic website to their blocklist.
The Algorithmic Justice League is hosting a Drag vs. AI facial recognition event/virtual Halloween party, and it looks mega fun.
An odd and Halloweeny study of how our private playlist behaviors may shape public lives.

Fer funsies

My Halloween Spotify playlist, which is meant to be shuffled. All my favorite menacing pop and rock songs, some movie themes, yes Rocky Horror, no Disney, yes Werewolf Bar Mitzvah.
Some friends created a very fun Parenting x Scary Movies podcast, and even though I’m not a parent, it’s very much fun.
