April 22, 2003

Blogmatcher redux, RSS Search, Microsoft, HTML Tidy and Sebastian

A couple of days ago I mentioned Ryo and his pet hermit crab: BlogMatcher. It appears that the crab has moved into a roomier conch shell, as Ryo stated:

I completely rewrote the searching/matching code in C (instead of PHP), and results pop up in less than a second now (in fact, often < 0.5 seconds). I've also added pagination for easier navigation.

I've refined the search algorithm a bit, partially inspired by your comments. I added some very basic statistical analysis to the scoring algorithm so that common URLs are scored down while uncommon URLs are marked up (at least in theory). It's still a little skewed toward your blog, but I think the results have improved somewhat. Also, the index has grown to 2900+ blogs (and I'm sure it'll continue to grow).
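Ryo doesn't spell out his formula, but "common URLs are scored down while uncommon URLs are marked up" sounds a lot like inverse document frequency (IDF) weighting. Here's a rough sketch of that idea, assuming each blog is reduced to a set of outbound link URLs and that two blogs are scored by summing log(N / df) over the links they share (N = number of blogs, df = number of blogs linking to that URL). This is a guess at the general technique, not Ryo's actual C code, and the function names and sample data are made up for illustration.

```python
import math
from collections import Counter


def idf_weights(blogs: dict[str, set[str]]) -> dict[str, float]:
    """Weight each URL by log(N / df): links shared by many blogs count for less."""
    n = len(blogs)
    # df = how many blogs link to each URL (each blog counted once per URL)
    df = Counter(url for links in blogs.values() for url in links)
    return {url: math.log(n / count) for url, count in df.items()}


def match_scores(target: str, blogs: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank other blogs by the summed IDF weight of the links they share with target."""
    weights = idf_weights(blogs)
    mine = blogs[target]
    scores = []
    for name, links in blogs.items():
        if name == target:
            continue
        shared = mine & links
        score = sum(weights[url] for url in shared)
        if score > 0:  # sharing only ubiquitous links is not a match
            scores.append((name, score))
    # Highest score first; ties broken alphabetically for a stable order
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores


# Hypothetical sample data: every blog links to weblogs.com, so it earns weight 0
blogs = {
    "tim": {"weblogs.com", "feedster.com", "geourl.org"},
    "ryo": {"weblogs.com", "geourl.org"},
    "gnome": {"weblogs.com", "feedster.com"},
    "other": {"weblogs.com"},
}

print([(name, round(score, 3)) for name, score in match_scores("tim", blogs)])
# [('gnome', 0.693), ('ryo', 0.693)]
```

Since every blog links to weblogs.com, that link contributes log(4/4) = 0, and "other" drops out entirely; the rarer shared links are what push "gnome" and "ryo" up the list.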

I gave it another whirl and was impressed with the results -- best of all there was no bitter aftertaste: Pure, Clean & Fresh.

Ryo was good enough to put together an FAQ for your reading pleasure as well. The only major change I would like to see is some sort of static page that people can click to see their search results automatically (as opposed to typing it in every time).

In addition to BlogMatcher scanning Weblogs.com, no fewer than three other services do as well: Organica, Metaweblog and Penthouse (notice how everything semi-important with me always degenerates into something to do with sex).

RSS Search - as the name suggests, it is a search engine that scours RSS feeds. Just like Feedster, typing in my name reveals that Tim Robbins is the Big Man on Campus. Again, he's a great actor, but in the blogosphere he'd best step down. Submit your blog here and hope for the best.

While looking for some dirt on the Opteron, I stumbled upon this article at ExtremeTech discussing the research and development projects over at Microsoft. Yes, despite my disdain for their business model, they still work on some neat-o projects. One that I was particularly interested in was the Stuff I've Seen project. This software basically tracks and tags every bit of digital information you bump into throughout the day -- I'd like to see this installed on John Poindexter's computer.

Be sure to also visit Microsoft's Research Project page. The TerraServer is the same one I used to come up with my GeoURL.

HTML Tidy - This handy piece of software fixes the annoying noncompliant markup that people like myself create. In addition to the downloadable open-source version of the application, you can also use this browser-based online version.

I'll end this with a remembrance for Sebastian, the pet hermit crab my younger sister gave me 7 years ago and I forgot to feed... Let us never mention his name again:

[image: sebastian.gif]

Posted by Tim at April 22, 2003 11:34 PM
Comments

You do know talking about crabs and sex in the same story is just wrong, right? :)

Posted by: gnome-girl at April 23, 2003 12:05 AM

Hehe, maybe this is why I'm not married.

Posted by: Tim at April 23, 2003 12:23 AM

I think it was quite humorous. Marriage, hopefully not too far off. =) I am such an anxious little female. *Didi*

Posted by: Didi at April 23, 2003 12:26 PM

Thanks for the straight dope. That's such a much-needed tool. Such a wha? My grammar is going to hell. Once again the blogosphere points me to MingTV as a soulmate. But the guy looks scary!

Posted by: blogal villager at April 23, 2003 12:30 PM

>BlogMatcher
The service that BlogMatcher provides is quite useful, and not just for people who are new to the blogosphere. I mentioned in Ryo's comments that there are some similar tools ( http://www.lazyweb.org/archives/004181.htm ) and gave him some suggestions ( http://blogs.iloha.net/ryochiji/entries/112.shtml#comments ), but it doesn't look like he's implemented them yet. Probably because he can't read my now-tagless HTML.


>scanning Weblogs.com
Many, many tools do this. Daypop, Blogdex, Technorati, RootBlog, etc. http://www.brainoff.com/geoblog/ is a neat new tool that uses weblogs.com

Btw, the new Blogger has a changes.xml file.

>RSS Search
A couple of these are out now (see http://www.faganfinder.com/blogs/ ). Daypop searches RSS already, but some improvements are coming in that direction ( http://www.danchan.com/weblog/daypop/62076 ). BlogDigger's got neat new stuff coming out integrating topics. I include Feeder on the previous link, but it has been down for a while.

Also, Waypath's reworking seems to be up now. Must go take a look...

Stuff I've Seen (didn't follow your link but read about it elsewhere) seems like it could be extremely useful.

I send my respect to Sebastian.

Posted by: Michael Fagan at April 23, 2003 02:28 PM

What does it mean if you're number 2 on my chart with a score of 20??? Some hairyball dude was number 1 (I am a little scared of that one) :)

Posted by: gnome-girl at April 23, 2003 04:52 PM

lol -- the hairyball guy is the admin at Blogalization.org.

You can always bribe Ryo with some risqué furniture porn -- upholstery never looked better!

Posted by: Tim Swanson at April 23, 2003 07:34 PM

Tim>You can always bribe Ryo with some risque furniture porn -- upholstery never looked better!

I'm hoping that's an inside joke ;-P I'm a difficult person to bribe... unless you have bandwidth or spare CPU cycles.

Michael>Probably because he can't read my now-tagless HTML

I'm listening ;-) Right now I'm working on keyword search (instead of just links), but I'm still trying to figure out how to sift through 200MB (and growing) worth of blogs.

Posted by: Ryo Chijiiwa at April 24, 2003 04:27 AM