[hakyll] Tags and incremental builds
Daniel Gnoutcheff
2016-12-04 17:10:30 UTC
Hello all,

I'm attempting to Hakyll-ify my employer's website [1], which contains
(among other things) a blog with tags. (Lots of tags.) I've
successfully used buildTags and tagsRules to generate per-tag archive
indexes, and it all looks more or less as I want.

However, I've noticed that every time I add/edit/delete a post, an
incremental build ('./site build') regenerates _all_ of the tag indexes,
including those for tags that are not (or were not) on the post I was
editing. Is this expected behavior? This is with hakyll v4.5.4.0 as
packaged in Debian 8.x/jessie.

I ask because this gets a bit annoying when your blog's author(s) have
managed to cram 110+ unique tags into 35 or so blogposts. ;) Many of
the tags are admittedly rather useless, but I'd like to avoid changing
historical content.

Anyway, thanks for the software, as always. :)

Later,
Daniel

[1] https://softwarefreedom.org/
--
You received this message because you are subscribed to the Google Groups "hakyll" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hakyll+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Daniel Gnoutcheff
2016-12-09 22:41:37 UTC
Post by Daniel Gnoutcheff
> However, I've noticed that every time I add/edit/delete a post, an
> incremental build ('./site build') regenerates _all_ of the tag indexes,
> including those for tags that are not (or were not) on the post I was
> editing. Is this expected behavior?

To answer my own question: yes. tagsRules uses rulesExtraDependencies
to mark each tag index page as dependent upon all posts. If we don't do
this, things *almost* work, but then tag pages aren't properly
regenerated when someone adds an existing post to a tag.
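
For reference, the dependency in question is declared roughly as shown here. This is a paraphrase from memory of Hakyll's `tagsRules`, renamed `tagsRulesSketch` (a made-up name) so it does not clash with the real function; details may differ between Hakyll versions, so treat it as a sketch rather than the library's exact source.

```haskell
import Control.Monad (forM_)
import Hakyll

-- Paraphrase of Hakyll's tagsRules; tagsRulesSketch is a hypothetical name.
tagsRulesSketch :: Tags -> (String -> Pattern -> Rules ()) -> Rules ()
tagsRulesSketch tags rules =
    forM_ (tagsMap tags) $ \(tag, identifiers) ->
        -- tagsDependency covers every post matched by buildTags, so each
        -- tag page counts as stale whenever any of those posts changes.
        rulesExtraDependencies [tagsDependency tags] $
            create [tagsMakeId tags tag] $
                rules tag (fromList identifiers)
```

Because the extra dependency is the same for every tag, editing one post invalidates all tag pages at once.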


Anyway, in a (possibly misguided) quest for tag pages that got built
only when needed, I came up with the abomination quoted below.

My strategy was this: for each tag, I cache the list of posts, along
with a version number that I increment each time the list changes. I
then embed the number in the Identifier path of the tag index page I
create. When someone adds a post to the tag, the Identifier changes,
tricking the dependency manager into rebuilding the page.
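
The versioning step can be tried out in isolation. The following is a toy model only: tags and posts are plain strings, and `bumpVersions`, `identifierPath` and the `_virtual/tags/` prefix are illustrative names and paths, not part of Hakyll.

```haskell
import qualified Data.Map as Map

-- Toy stand-ins: tags and posts are plain strings here.
type Versioned = Map.Map String (Int, [String])

-- Keep the old version when a tag's post list is unchanged, bump it when
-- the list differs, start new tags at 0, and drop tags that disappeared.
bumpVersions :: Versioned -> Map.Map String [String] -> Versioned
bumpVersions old new =
    Map.mergeWithKey step (const Map.empty) (Map.map ((,) 0)) old new
  where
    step _ (gen, oldPosts) newPosts
        | oldPosts == newPosts = Just (gen, oldPosts)
        | otherwise            = Just (gen + 1, newPosts)

-- The version becomes the last path component of the page's identifier.
identifierPath :: String -> Int -> FilePath
identifierPath tag gen = "_virtual/tags/" ++ tag ++ ".html/" ++ show gen
```

If the tag "haskell" was at version 2 and gains a post, it moves to version 3, so its identifier changes from `_virtual/tags/haskell.html/2` to `_virtual/tags/haskell.html/3` while the identifiers of untouched tags stay the same.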

At first I tried using a hash of the post list, which promised to avoid
the need for a cache, but that became problematic when an edit caused
the contents of a tag's post list to cycle back to what it was sometime
in the past.
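
The difference between the two naming schemes can be shown with a toy model. The message does not spell out the exact failure, so this sketch only demonstrates that a hash-derived name repeats when the list returns to an earlier state, whereas a counter never does. `toyHash`, `hashNames` and `counterNames` are hypothetical helpers, and the checksum is deliberately simplistic.

```haskell
import Data.Char (ord)
import Data.List (foldl')

-- Toy checksum over a post list; a real hash would repeat in the same way
-- whenever the list itself repeats.
toyHash :: [String] -> Int
toyHash = foldl' (\acc c -> (acc * 31 + ord c) `mod` 1000003) 7 . unwords

-- Names derived from successive states of one tag's post list.
hashNames :: [[String]] -> [Int]
hashNames = map toyHash

-- Counter-based names: bump whenever the list differs from the previous state.
counterNames :: [[String]] -> [Int]
counterNames []       = []
counterNames (s : ss) = scanl bump 0 (zip (s : ss) ss)
  where
    bump gen (prev, cur) = if prev == cur then gen else gen + 1
```

For the states `[["a.md"], ["a.md","b.md"], ["a.md"]]`, the first and third hash names are equal, while the counter names are 0, 1 and 2.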

I trust that the mess below looks properly ridiculous to anyone in the
know. Having convinced myself that this can be done, I think I also
convinced myself that I shouldn't do this. :)

module MadTag
    ( createTagIndexes
    ) where

import Hakyll
import Control.Applicative ((<$>))
import Control.Monad (forM_)
import Data.Binary (Binary, encodeFile, decodeFile)
import qualified Data.Map as Map
import Data.Typeable (Typeable)
import Data.List (intercalate, sort)
import System.Directory (doesFileExist, renameFile)

createTagIndexes :: (Binary t, Eq t, Ord t, Binary a, Typeable a, Writable a)
                 => Pattern
                 -> (Identifier -> Rules [t])
                 -> (t -> FilePath)
                 -> (t -> Pattern -> Compiler (Item a))
                 -> Rules ()
createTagIndexes pattern getTags' idxPathFor idxCompiler = do
    -- Pair every matching post with its list of tags.
    entryAsocs <- getMatches pattern >>= mapM (\i -> fmap ((,) i) (getTags' i))
    -- Read the map from the same path it is written to below.
    oldTags <- preprocess $
        maybe Map.empty Map.fromList <$> maybeDecodeFile "_cache/_hack_tagmap"
    let
        -- tag -> sorted list of posts carrying that tag
        tagMap = Map.map sort . Map.fromListWith (++) $
            [(tag, [entry]) | (entry, tags) <- entryAsocs, tag <- tags]
        -- Keep the version if the post list is unchanged, bump it otherwise;
        -- vanished tags are dropped and new tags start at version 0.
        versionedTagMap = Map.mergeWithKey
            (\_ (oldGen, oldIds) newIds ->
                if oldIds == newIds
                    then Just (oldGen, oldIds)
                    else Just (oldGen+1, newIds) )
            (const Map.empty)
            (Map.map $ \ids -> (0 :: Int, ids))
            oldTags
            tagMap
    -- Write to a temporary file, then rename it over the old cache.
    preprocess $ do
        encodeFile "_cache/_hack_tagmap.new" (Map.toList versionedTagMap)
        renameFile "_cache/_hack_tagmap.new" "_cache/_hack_tagmap"
    forM_ (Map.toList versionedTagMap) $ \ (tag, (version, entries)) ->
        let
            idxPath = idxPathFor tag
            -- The version is part of the identifier, so a bump yields a new item.
            idxIdent = fromFilePath $ intercalate "/"
                [ "_virtual"
                , idxPath
                , show version
                ]
        in create [idxIdent] $ do
            route $ constRoute idxPath
            compile $ idxCompiler tag (fromList entries)

-- Decode a file if it exists; Nothing on the first run.
maybeDecodeFile :: (Binary a)
                => FilePath
                -> IO (Maybe a)
maybeDecodeFile path = do
    exists <- doesFileExist path
    if not exists
        then return Nothing
        else Just <$> decodeFile path

Later,
Daniel
Jasper Van der Jeugt
2016-12-10 14:42:40 UTC
Yeah, since any post can be changed to include any tag, there is not
much one can do unfortunately.

There is a `cached` [1] combinator which is used to speed up e.g.
`pandocCompiler`, but it invalidates the cache if the underlying
resource has changed, so this is not what you are looking for.

However, I believe that it is possible (and not too hard) to add
another `cached` combinator which takes a checksum and checks whether
it has changed since the last run. If it has not, the combinator
re-uses the item that's already on disk. It would then be relatively
easy to integrate into the place where you create the individual tag
pages.
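
The idea can be modelled without Hakyll at all. The sketch below is a pure toy, not the proposed combinator: `Cache` and `cachedByChecksum` are hypothetical names, the cache is an in-memory map rather than Hakyll's on-disk store, and a real version would have to run inside the `Compiler` monad.

```haskell
import qualified Data.Map as Map

-- Toy cache: item name -> (checksum of its inputs, stored result).
type Cache a = Map.Map String (Int, a)

-- Reuse the stored result when the checksum matches the previous run;
-- otherwise take the freshly built value and remember it.
-- The Bool reports whether a rebuild happened.
cachedByChecksum :: String -> Int -> a -> Cache a -> (a, Bool, Cache a)
cachedByChecksum name checksum build cache =
    case Map.lookup name cache of
        Just (old, result) | old == checksum -> (result, False, cache)
        _ -> (build, True, Map.insert name (checksum, build) cache)
```

For a tag page, the checksum would be computed from the tag's post list, so only tags whose list changed are rebuilt. Thanks to laziness, the `build` argument is never evaluated on a cache hit.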

[1]: https://github.com/jaspervdj/hakyll/blob/hakyll-4.9.2.0/src/Hakyll/Core/Compiler.hs#L143-L167

Peace,
Jasper
