Gwern Branwen
2016-12-27 15:55:47 UTC
When a website becomes relatively large and old, it's considered good
practice to set up logging or Google Webmaster Tools, look for
traffic to broken URLs on your site (misspelled URLs, files or pages
which have been moved/renamed, etc.), and set up redirects to the right
pages. This saves time for all those visitors, and it also effectively
increases your traffic, because many of those visitors would've given
up, unable to find the right page; search engines also apparently
give a little bonus to 'clean' sites.
(My interest in this is prompted by recently looking at an
institution's Google Analytics & Webmaster Tools and noting that
perhaps 1-5% of their site traffic is to broken URLs, and that a decent
fraction of their search engine traffic is driven by visitors being
unable to find a particular page. I have a fair number of broken links
on gwern.net myself, which I have tolerated mostly because I didn't
know how to fix them within the static site approach.)
The usual way to do this is to look at the broken URLs, and set up a
mapping from broken to correct in your Apache rewrite rules or the
equivalent in your web server, so any visitor loading a broken URL
gets a 301 redirect. This doesn't require any additional changes to
your website and works well. Something along the lines of:
Redirect 301 /old-page.html http://www.mydomain.com/new-page.html
But most Hakyll users will be using Amazon S3 or GitHub or another
static file host precisely to avoid things like running their own
Apache. How can we fix broken links?
Amazon S3 has one supported method, but last I checked, it's weird -
it requires a (possibly empty) file to exist at the broken
URL/filename, and the redirect target has to be set as per-object
metadata on it (the x-amz-website-redirect-location key). So it
clutters up your Hakyll directory, requires additional manual
intervention for each URL, and could break at any time if a sync tool
changes the metadata (perhaps by accidentally resetting it).
The most common non-server-based method seems to be HTML redirection:
writing a mostly-empty HTML file at the broken URL with a special META
tag, like this:
<meta http-equiv="refresh" content="0; url=http://www.mydomain.com/new-page.html">
Possibly augmented with a 'canonical' link:
<link rel="canonical" href="http://www.mydomain.com/new-page.html">
(There are some other forms of HTML redirection using JS and iframes,
but they are worse. The META tag method doesn't require JS and is
widely used and understood by all search engines.)
This also requires cluttering up your source directory with lots of
repetitive HTML files... unless you generate them.
It would be straightforward to write a function `createRedirects`
which takes a target directory (e.g. "_site") and a `Map broken working`
of URLs, and for each broken/working pair writes to
`target/relativeLink(broken)` an HTML template like
<html><head><meta http-equiv="refresh" content="0; url=$working">
<link rel="canonical" href="$working"></head>
<body><p>The page has moved to: <a href="$working">this page</a></p></body></html>
Then this function could be called somewhere in `main`, like
`createRedirects "_site" linkMap`, and it'd generate any number of
redirects. So the user only needs to set up the mapping inside their
`hakyll.hs`, without cluttering the source directory or needing to use
a web server - the clutter only exists inside the
compiled site / host.
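Something like this minimal sketch is what I have in mind - a plain IO
action run after the normal build has written "_site"; `redirectHtml`
and the example mapping are just illustrative names, not anything
Hakyll currently provides:

import qualified Data.Map.Strict as M
import           System.Directory (createDirectoryIfMissing)
import           System.FilePath  ((</>), dropDrive, dropFileName)

-- For each (broken, working) pair, write a tiny redirect page into the
-- target directory at the broken URL's relative path.
createRedirects :: FilePath -> M.Map FilePath String -> IO ()
createRedirects target linkMap = mapM_ write (M.toList linkMap)
  where
    write (broken, working) = do
      let out = target </> dropDrive broken   -- e.g. "_site/old-page.html"
      createDirectoryIfMissing True (dropFileName out)
      writeFile out (redirectHtml working)

-- The mostly-empty HTML page: instant META refresh plus a canonical link.
redirectHtml :: String -> String
redirectHtml working = concat
  [ "<html><head><meta http-equiv=\"refresh\" content=\"0; url=", working
  , "\"><link rel=\"canonical\" href=\"", working, "\"></head>"
  , "<body><p>The page has moved to: <a href=\"", working
  , "\">this page</a></p></body></html>" ]

and then in `main`, after the usual Hakyll rules have run:

createRedirects "_site" (M.fromList
  [ ("/old-page.html", "http://www.mydomain.com/new-page.html") ])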
This would be general enough that it'd be worth adding to Hakyll, I think.
How do other Hakyll users solve this problem? Has anyone done
something similar to my `createRedirects` suggestion?
--
gwern
https://www.gwern.net