“You aren’t gonna need it” (acronym: YAGNI) is a principle of extreme programming (XP) that states a programmer should not add functionality until deemed necessary. XP co-founder Ron Jeffries has written: “Always implement things when you actually need them, never when you just foresee that you need them.”
I reckon the way through these questions is to look at the data. In the context of semweb stuff, roughly speaking, the richer the data the greater the potential for serendipity. With caveats: thin data using well-known vocabs is more likely to be reused than fat data using obscure ones, and there's likely a point of diminishing returns. If you've got to do loads of coding to generate or pull out something fairly insignificant, your time is probably better spent elsewhere.
What brought this to mind was working on this NewsMonitor feed aggregator thing: how much to pull out of the (RSS 1.0, RSS 2.0, Atom…) feeds, given that there's a huge amount of variety in the wild? For simple display all that's really needed is: entry URL, title, content, date. But do you, e.g., smush atom:published and atom:updated into a single date field? I'm using SAX-based parsers, so it's been fairly easy to pull out a fair bit of detail (albeit with lots of testing/tweaking because of that variety in the wild). I've wound up with DateStamp, Person and Link objects in my model alongside the obvious Feed and Entry. But there was one point where I thought I might have gone too far, given there's a deadline: pulling links out of content (as dcterms:references). Still, links are totally webby artifacts, and something in the back of my mind decided this would be worthwhile…
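To give a flavour of the link-pulling step (the actual NewsMonitor code isn't shown here, and the function names below are mine, not its API), a minimal sketch in Python using the stdlib HTML parser might look like this. Each href collected from an entry's content is a candidate dcterms:references value:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> element found in a chunk of entry content."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(content):
    # Hypothetical helper: feed one entry's (X)HTML content through the parser
    parser = LinkExtractor()
    parser.feed(content)
    return parser.links
```

Real feed content is messier than this, of course; a tolerant parser matters precisely because of the variety in the wild.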
As it happens, that hunch should pay off sooner rather than later. While building the basic aggregator functionality, I'd forgotten a major chunk of the planned functionality: the ability to discover new feeds related to a given topic. For that, links in content are a must-have.
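The usual route from a harvested page link to a new feed is autodiscovery: check the page's head for &lt;link rel="alternate"&gt; elements advertising an RSS or Atom feed. A sketch of that check (operating on a page already fetched as a string; `discover_feeds` is an illustrative name, not anything from NewsMonitor):

```python
from html.parser import HTMLParser

# Media types conventionally used to advertise feeds via <link rel="alternate">
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedFinder(HTMLParser):
    """Collects hrefs of <link rel="alternate"> elements whose type is a feed type."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        d = dict(attrs)
        rels = (d.get("rel") or "").lower().split()
        if "alternate" in rels and (d.get("type") or "").lower() in FEED_TYPES and d.get("href"):
            self.feeds.append(d["href"])

def discover_feeds(html):
    # Hypothetical helper: return feed URLs advertised in the page's markup
    # (hrefs may be relative, so they'd still need resolving against the page URL)
    finder = FeedFinder()
    finder.feed(html)
    return finder.feeds
```

HTMLParser handles self-closing `<link …/>` tags by routing them through handle_starttag, so both HTML and XHTML pages are covered by the one method.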