[hakyll] Tags and incremental builds
Daniel Gnoutcheff
2016-12-04 17:10:30 UTC
Hello all,

I'm attempting to Hakyll-ify my employer's website [1], which contains
(among other things) a blog with tags. (Lots of tags.) I've
successfully used buildTags and tagsRules to generate per-tag archive
indexes, and it all looks more or less as I want.

However, I've noticed that every time I add/edit/delete a post, an
incremental build ('./site build') regenerates _all_ of the tag indexes,
including those for tags that are not (or were not) on the post I was
editing. Is this expected behavior? This is with hakyll v4.5.4.0 as
packaged in Debian 8.x/jessie.

I ask because this gets a bit annoying when your blog's author(s) have
managed to cram 110+ unique tags into 35 or so blogposts. ;) Many of
the tags are admittedly rather useless, but I'd like to avoid changing
historical content.

Anyway, thanks for the software, as always. :)


[1] https://softwarefreedom.org/
You received this message because you are subscribed to the Google Groups "hakyll" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hakyll+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Daniel Gnoutcheff
2016-12-09 22:41:37 UTC
Daniel Gnoutcheff wrote:
> However, I've noticed that every time I add/edit/delete a post, an
> incremental build ('./site build') regenerates _all_ of the tag indexes,
> including those for tags that are not (or were not) on the post I was
> editing. Is this expected behavior?

To answer my own question: yes. tagsRules uses rulesExtraDependencies
to mark each tag index page as dependent upon all posts. If we don't do
this, things *almost* work, but then tag pages aren't properly
regenerated when someone adds an existing post to a tag.
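
To make the trade-off concrete, here is a toy model of the two dependency
choices. It is only a sketch: the names (staleTags, narrowDeps, broadDeps)
and the file names are made up for illustration, and it is a simplification
rather than Hakyll's actual dependency analysis. It assumes a tag page is
rebuilt when any post recorded as its dependency during the previous build
has changed.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type Post = String
type Tag  = String

-- Dependencies recorded for each tag page during the previous build.
type Deps = Map.Map Tag (Set.Set Post)

-- A tag page counts as stale when any post it depends on has changed.
staleTags :: Set.Set Post -> Deps -> [Tag]
staleTags changed deps =
    [ tag
    | (tag, posts) <- Map.toList deps
    , not (Set.null (Set.intersection changed posts))
    ]

allPosts :: Set.Set Post
allPosts = Set.fromList ["a.md", "b.md"]

-- Narrow dependencies: each tag page depends only on the posts it listed
-- in the previous build ("haskell" listed a.md, "debian" listed b.md).
narrowDeps :: Deps
narrowDeps = Map.fromList
    [ ("haskell", Set.fromList ["a.md"])
    , ("debian",  Set.fromList ["b.md"])
    ]

-- Broad dependencies: every tag page depends on every post.
broadDeps :: Deps
broadDeps = Map.map (const allPosts) narrowDeps

-- Scenario: b.md is edited so that it gains the "haskell" tag.
changed :: Set.Set Post
changed = Set.fromList ["b.md"]
```

In GHCi, `staleTags changed narrowDeps` gives `["debian"]`, so the "haskell"
page is missed even though it should now list b.md. With broadDeps the
result is `["debian","haskell"]`: correct, but every tag page is rebuilt.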

Anyway, in a (possibly misguided) quest for tag pages that got built
only when needed, I came up with the abomination quoted below.

My strategy was this: for each tag, I cache the list of posts, along
with a version number that I increment each time the list changes. I
then embed the number in the Identifier path of the tag index page I
create. When someone adds a post to the tag, the Identifier changes,
tricking the dependency manager into rebuilding the page.
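
Stripped of the Hakyll plumbing, the version-bumping step can be sketched
as a pure function over maps. The names bump and identifierFor, the tag
names, and the "tags/..." paths are placeholders chosen for this example
only.

```haskell
import qualified Data.Map as Map

-- tag -> (version, sorted list of posts carrying that tag)
type Versioned = Map.Map String (Int, [String])

-- Combine the cached state with the freshly computed tag map.
bump :: Versioned -> Map.Map String [String] -> Versioned
bump old new = Map.mergeWithKey
    -- Tag present in both: bump the version only if its post list changed.
    (\_ (gen, oldIds) newIds ->
        Just (if oldIds == newIds then gen else gen + 1, newIds))
    (const Map.empty)              -- tags that disappeared are dropped
    (Map.map (\ids -> (0, ids)))   -- brand-new tags start at version 0
    old
    new

-- The version becomes the last component of the page's Identifier path.
identifierFor :: FilePath -> Int -> String
identifierFor idxPath gen = "_virtual/" ++ idxPath ++ "/" ++ show gen

cachedState :: Versioned
cachedState = Map.fromList
    [ ("debian",  (0, ["b.md"]))
    , ("haskell", (2, ["a.md"]))
    ]

currentTags :: Map.Map String [String]
currentTags = Map.fromList
    [ ("debian",  ["b.md"])
    , ("haskell", ["a.md", "b.md"])
    , ("nix",     ["c.md"])
    ]
```

Here `bump cachedState currentTags` keeps "debian" at version 0, moves
"haskell" from 2 to 3, and adds "nix" at 0, so only the "haskell" page gets
a new identifier (`identifierFor "tags/haskell.html" 3`).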

At first I tried using a hash of the post list, which promised to avoid
the need for a cache, but that became problematic when an edit caused
the contents of a tag's post list to cycle back to what it was sometime
in the past.
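
A small illustration of the cycling problem, under an assumption the text
above does not spell out: that an identifier which reappears collides with
whatever was recorded under it in an earlier build. The fingerprint
function is a stand-in for a real hash, and all names and paths are
invented for the example.

```haskell
import Data.List (nub)

-- Stand-in for a hash: equal post lists give equal fingerprints.
fingerprint :: [String] -> String
fingerprint = concatMap (++ ";")

-- One tag's post list over three builds: A, then B, then A again.
history :: [[String]]
history = [["a.md"], ["a.md", "b.md"], ["a.md"]]

-- Identifiers derived from the content of the post list.
hashIds :: [String]
hashIds = map (\posts -> "_virtual/tags/x.html/" ++ fingerprint posts) history

-- A counter that goes up whenever the list differs from the previous build
-- (intended for a non-empty history).
versions :: [[String]] -> [Int]
versions lists = scanl step 0 (zip lists (drop 1 lists))
  where
    step v (prev, cur) = if prev == cur then v else v + 1

-- Identifiers derived from the counter.
versionIds :: [String]
versionIds = map (\v -> "_virtual/tags/x.html/" ++ show v) (versions history)

-- True when some identifier occurs more than once.
hasRepeat :: [String] -> Bool
hasRepeat ids = length (nub ids) /= length ids
```

`hasRepeat hashIds` is True because the first and third builds share an
identifier, while `hasRepeat versionIds` is False: the counter only moves
forward (0, 1, 2), at the price of having to be cached between builds.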

I trust that the mess below looks properly ridiculous to anyone in the
know. Having convinced myself that this can be done, I think I also
convinced myself that I shouldn't do this. :)

module MadTag where

import Hakyll
import Control.Applicative ((<$>))
import Control.Monad (forM_)
import Data.Binary (Binary, encodeFile, decodeFile)
import qualified Data.Map as Map
import Data.Typeable (Typeable)
import Data.List (intercalate, sort)
import System.Directory (doesFileExist, renameFile)

createTagIndexes :: (Binary t, Eq t, Ord t, Binary a, Typeable a, Writable a)
                 => Pattern
                 -> (Identifier -> Rules [t])
                 -> (t -> FilePath)
                 -> (t -> Pattern -> Compiler (Item a))
                 -> Rules ()
createTagIndexes pattern getTags' idxPathFor idxCompiler = do
    -- Pair every matching entry with its tags.
    entryAssocs <- getMatches pattern >>= mapM (\i -> fmap ((,) i) (getTags' i))
    -- Load the versioned tag map written by the previous build, if any.
    oldTags <- preprocess $
        maybe Map.empty Map.fromList <$> maybeDecodeFile "_cache/_hack_tagmap"
    let tagMap = Map.map sort . Map.fromListWith (++) $
            [(tag, [entry]) | (entry, tags) <- entryAssocs, tag <- tags]
        -- Bump a tag's version only when its list of entries has changed.
        versionedTagMap = Map.mergeWithKey
            (\_ (oldGen, oldIds) newIds ->
                if oldIds == newIds
                    then Just (oldGen, oldIds)
                    else Just (oldGen + 1, newIds))
            (const Map.empty)
            (Map.map $ \ids -> (0 :: Int, ids))
            oldTags
            tagMap
    -- Write to a temporary file first, then rename it into place.
    preprocess $ do
        encodeFile "_cache/_hack_tagmap.new" (Map.toList versionedTagMap)
        renameFile "_cache/_hack_tagmap.new" "_cache/_hack_tagmap"
    forM_ (Map.toList versionedTagMap) $ \(tag, (version, entries)) ->
        let idxPath = idxPathFor tag
            -- The version is part of the Identifier, so a changed list
            -- yields a new Identifier and forces a rebuild.
            idxIdent = fromFilePath $ intercalate "/"
                [ "_virtual"
                , idxPath
                , show version
                ]
        in create [idxIdent] $ do
            route $ constRoute idxPath
            compile $ idxCompiler tag (fromList entries)

maybeDecodeFile :: (Binary a)
                => FilePath
                -> IO (Maybe a)
maybeDecodeFile path = do
    exists <- doesFileExist path
    if not exists
        then return Nothing
        else Just <$> decodeFile path

Jasper Van der Jeugt
2016-12-10 14:42:40 UTC
Yeah, since any post can be changed to include any tag, there is not
much one can do unfortunately.

There is a `cached` [1] combinator which is used to speed up e.g.
`pandocCompiler`, but it invalidates the cache if the underlying
resource has changed, so this is not what you are looking for.

However, I believe that it is possible (and not too hard) to add
another `cached` combinator which takes a checksum and checks whether
that checksum has changed since the last run. If it has not, the
combinator re-uses the item that's already on disk. This combinator
would then be relatively easy to integrate where you create the
individual tag pages.
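
As a rough sketch of that idea outside of Hakyll: the helper below is
hypothetical (cachedByChecksum is not a Hakyll function, and the
".checksum" side file is an assumption made for this example). It works on
plain files and IO actions to show the compare-then-reuse logic; a real
combinator would presumably live in the Compiler monad and use Hakyll's
own store.

```haskell
import Control.Exception (evaluate)
import System.Directory (doesFileExist)

-- Read a file completely before returning, so the handle is closed and
-- the same path can safely be rewritten afterwards.
readFileStrict :: FilePath -> IO String
readFileStrict path = do
    s <- readFile path
    _ <- evaluate (length s)
    return s

-- Hypothetical helper: run 'build' only if 'checksum' differs from the
-- one stored next to 'outPath' by the previous run.
cachedByChecksum :: FilePath -> String -> IO String -> IO String
cachedByChecksum outPath checksum build = do
    let sumPath = outPath ++ ".checksum"
    haveOut <- doesFileExist outPath
    haveSum <- doesFileExist sumPath
    oldSum  <- if haveSum
                   then Just <$> readFileStrict sumPath
                   else return Nothing
    if haveOut && oldSum == Just checksum
        then readFileStrict outPath       -- unchanged: reuse what is on disk
        else do
            result <- build               -- changed or missing: rebuild
            writeFile outPath result
            writeFile sumPath checksum
            return result
```

For a tag page, the checksum could be derived from the tag's sorted list of
post identifiers, so the page would be rebuilt only when that list changes.
Note that this does not notice edits to a post's title or date unless those
are folded into the checksum as well.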

[1]: https://github.com/jaspervdj/hakyll/blob/hakyll-

