Filtering at the library – how it’s going

Update: Skagirlie posted a comment – she says “Yeah, I like that my library blog is blocked at work just because the URL has “girlie” in it. Bess apparently doesn’t like to look at meta tags, or it would know that it’s a library blog, not a “girlie” site.”

That’s what I’m talkin’ about. Her site was marked “Porn” by Bess – even though IT’S NOT. It’s a way cool library techie blog. Thankfully, I just checked Secure Computing’s (they own Bess) URL Checker, and Skagirlie’s blog is now categorized as a “Message/Bulletin Board.” Not technically correct, but much better!

Bess has an option to review websites that are incorrectly categorized. I know I reported this one (I imagine Skagirlie did, too :-). It’s comforting to know that Secure Computing DOES check those requests, and correct categories when warranted!


My library filters, and has since July 1, 2004. We use Secure Computing’s Bess product for our filtering software, and we filter at a pretty low level: we filter only the Pornography category, and we allow all the exceptions possible (the exceptions are Education, For Kids, History, Medical, Moderated, and Text/Spoken only). We have also just started filtering the Gambling category (more on that below). You can find the complete list of the categories that Bess uses on their website.

How the Process Works
I currently have the utter joy of checking out websites when a library customer submits a Site Review Request. Here’s what happens:

  • The library customer goes to a website that is filtered, and gets the “you’ve been blocked” warning page.
  • He/she has the option to send a “site review request” to the library, so a staff member can review the website to see if it really should be filtered.
  • I get the “site review request,” and then check each website to see if it should stay filtered or if it can be unblocked (all hopefully according to Missouri law and CIPA).
  • And then I unblock the site if it “passes go.”

Although I’m currently “in charge” of this process, it will soon transfer over to various public services departments. I’ve been doing it to make sure everything works and to set up procedures for the whole filtering process.

Gathering Some Statistics
But since I have to do this, I thought I’d have some fun with it (now, now – I know what you’re thinking…). Once we install the full version of this software, we’ll supposedly be able to get statistics (I’m guessing it’ll report things like what website categories patrons have been browsing, and how many library customers and/or websites have been filtered). But until then, I have kept my own stats on the Site Review Requests: I have kept each filtering request since July 2004 (don’t worry – no names are attached to the requests so privacy is preserved), and have dumped each request into categorized folders so I can sift some statistics out of this heap.

These statistics show how many of the websites that the filtering software categorized as Pornography are really porn sites, but only for websites that a library customer has asked the library to review. Make sense? Not sure how scientific these stats are, but they are rather interesting….
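For the curious, here’s roughly how the sifting could be done in a few lines of Python. This is just a sketch, not what I actually ran: the folder labels and the sample list below are made up for illustration, and `tally_shares` is a hypothetical helper name.

```python
from collections import Counter


def tally_shares(labels):
    """Return the rounded percentage share of each label in the list."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


# Made-up sample: one label per reviewed request (NOT the real data).
sample_labels = ["porn", "broken", "porn", "redirect", "music"]
sample_shares = tally_shares(sample_labels)

for label, pct in sorted(sample_shares.items()):
    print(f"{label}: {pct}%")
```

Each reviewed request gets exactly one label (the folder it was dumped into), so the shares should add up to roughly 100%, give or take rounding.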

My Findings
So – that adds up to 7 months’ worth of statistics. That totals 855 requests (July 2004 through January 2005) for the library to review a website that was lumped into the Pornography category. How accurate do you think it was? Hmm? Can you guess? Try this percent on for size: 42%. Yep, that’s right. 42%! Out of 855 requests to review a website categorized as pornography, only 42% of those websites have REALLY been porn sites. Dang!

So what have the other 58% been? (I’m rounding the percentages, fyi):

  • 9% – broken sites (they either don’t exist, were turned off, or the server was down when it was accessed)
  • 9% – dating and/or social networking types of sites (all those Russian dating websites)
  • 3% – music sites, especially hip-hop artist pages
  • 20% – redirect pages, marketing forms, and domain name placeholder pages
  • 17% – the rest of the “non porn” sites
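If you want to double-check my math, here’s a quick sketch in Python. The percentages and the 855 total come straight from the numbers above; the dictionary keys are just my shorthand, and the per-category counts are only approximations, since I rounded the percentages.

```python
TOTAL_REQUESTS = 855  # site review requests, July 2004 through January 2005

# Rounded percentage of reviewed sites in each bucket (from the list above).
shares = {
    "really porn": 42,
    "broken sites": 9,
    "dating / social networking": 9,
    "music sites": 3,
    "redirects, marketing forms, placeholders": 20,
    "other non-porn": 17,
}

# Everything that was NOT really porn, as a percentage of all requests.
not_porn = sum(pct for name, pct in shares.items() if name != "really porn")

# Approximate request counts, back-calculated from the rounded percentages.
approx_counts = {
    name: round(TOTAL_REQUESTS * pct / 100) for name, pct in shares.items()
}

print(f"All buckets together: {sum(shares.values())}%")
print(f"Not really porn: {not_porn}%")
for name, count in approx_counts.items():
    print(f"{name}: about {count} requests")
```

The back-calculated counts are estimates only. The real per-folder counts could be off by a few requests in either direction because of the rounding.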

Actually, I’m going to draw two conclusions:

1. The official “I have an MLS degree” conclusion: The filtering software isn’t doing a good job of filtering by category, because 58% of the websites I reviewed had been lumped into the Pornography category when they didn’t belong there… and I was just looking at ONE category. There are a lot more categories, and I’m guessing the statistics would be similar in those categories, too. So that’s bad!

2. The non-MLS, “it is helping staff and customers” conclusion: The filtering software, while it’s being a bit overzealous, IS categorizing a lot of sites correctly. Our public services staff aren’t having to play “web police” as much since we installed the filter. And a large share of the incorrectly categorized pages (29% of all the reviewed sites, which is half of the misses) aren’t useful websites – they’re either broken sites, sites that no longer exist, redirected marketing scam pages or domain placeholder pages – all pages that most likely weren’t what the library customer had in mind in the first place. So both of those are good.

Plus, we’re also able to use the filter to enforce a library policy. Per our library’s policy, people can’t gamble in the library. So we decided to turn on the Gambling category, too. We just did this, so I don’t know what it’s going to filter. But still, I think it’s pretty cool that we can use the software that we were required to buy (we wanted the E-rate money) for other non-CIPA stuff.

I’d love to hear how other libraries are dealing with filtering! Feel free to comment on this blog or email me!