Do Filters Work?

I just read Andy Woodworth’s post about filters, and was reminded about something. A couple days ago, I visited my church’s website while in the library. We filter both public and staff computers … and guess what I found (see the image above)? My church’s website was blocked, because 8e6 (our filtering provider) thinks it’s a porn site. Wow – my church is apparently much wilder than I thought!

  • OK – first off, my church isn’t really all that wild. Probably much the opposite!
  • Second – it’s most likely filtered because of overblocking. Some web filters block whole webhosting services because of content. For example, if the webhoster hosts 20 “naughty” sites and 2 “nice” sites, all 22 sites will be labeled “naughty” (until someone tells the filtering company they’re wrong – then they usually correct the problem).
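Here's a rough sketch of how that kind of overblocking could happen. I don't know how 8e6 actually decides what to block, so the site names, the `threshold` value, and both functions are made up for illustration; the only thing taken from the example above is the 20 "naughty" and 2 "nice" sites on one host.

```python
# Hypothetical data: one shared web host with 20 flagged sites and 2 harmless ones.
sites = {f"naughty{i}.example": True for i in range(20)}
sites.update({"church.example": False, "knitting.example": False})


def blocked_per_site(site):
    """Block only the sites that were individually flagged."""
    return sites[site]


def blocked_per_host(site, threshold=0.5):
    """Block every site on the host once enough of its sites are flagged.

    The threshold is an assumption, not a documented vendor setting.
    """
    flagged_share = sum(sites.values()) / len(sites)
    return flagged_share >= threshold


# Sites that host-level blocking catches even though they were never flagged.
overblocked = [s for s in sites if blocked_per_host(s) and not blocked_per_site(s)]
print(overblocked)
```

With host-level blocking, all 22 sites are blocked, and the two harmless ones stay blocked until someone reports them.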

Do filters work?
Honestly, yes and no. Yeah, sure – most of the “usual sites” can be blocked (but not all – filters don’t catch everything). And no – the example above is a great example of a filter in action, unfortunately.

Another complaint
I’m also going to complain about the Safe Library Project website, and the ALA page they quote (from the Office of Intellectual Freedom), because both sites seem to be putting a bit of spin on their ideas to prove their points. Plus, there are some glaring problems on each page. Here’s what I mean:

Let’s start with Safe Library Project:

  • Just being picky here – guys, please get a proofreader! Your About page is labeled “Abou” – which would be forgivable if it weren’t for some other errors on the “Abou” page that could have easily been caught by proofing your content. Errors like these:
    • “Most all pornography commercial websites is hardcore” I think you meant “are” …
    • “the overwhelming amount of Internet porn is be soft-core” I think you meant “is” …
    • “This in not accurate” You are correct – not accurate at all!
  • Enough grammar cop stuff. How about this? “Most all pornography commercial websites is hardcore and therefore can be charged by prosecutors as obscene.” - ok. Can you prove that, with citations?
  • “The seemingly endless number of free porn sites depicting actual or simulated sex and other lascivious depictions are also hardcore and can be charged as obscene.” Again, ok … “seemingly endless” … proof? With citations? “can be charged as obscene” … again – proof?
  • “Does ALA really think the American public is so uninformed…” The information you quote wasn’t really meant for the “American public.” It was meant for libraries creating public PC and Internet Access policies.
  • “The ALA site also strongly suggests that Internet filters are inadequate” – well, yeah – there’s a reason for that. See my example above.
I have no issue with their viewpoint (though I don’t agree). Viewpoints differ, and you have to have two sides for a debate. But if you make broad statements like they do, you should back them up with facts. Or you’re just blowing smoke.

And now for ALA. Go to the page Safe Library Project quotes (you have to copy/paste the link text, since for some odd reason they didn’t actually make it a link). I think some improvements are in order here, too. For example:
  • The paragraph Safe Library Project quotes is an odd one, to me anyway. For example … “In the millions of Web sites available on the Internet” – way more than “millions” now.
  • “there are some—often loosely called “pornography” – Loosely? What? Where did that statement come from?
  • “A very small fraction of those sexually explicit materials is actual obscenity or child pornography” - ok. That’s also a pretty broad statement. Can you prove that, with citations?
  • This info hasn’t been updated for 10-11 years. A LOT has changed on the web in 11 years. Maybe time for a rewrite?
  • The “Related Files” link at the bottom of the page is a broken link. That makes ALA look a bit shabby IMHO.
So – phooey on the spin. Do you filter? Does it work? Do people complain? Is it as bad as the Safe Library Project people think? I don’t think so – what about you?

Fun with Filtering

For all you filtering fans (or anti-fans), check out this article: From Bess to Worse, at Slashdot. They claim that 30% of sites blocked by Bess are obvious errors. Wow. I checked this at my last library, and came up with 42% – that’s pretty bad (and pretty much matches what the article writer came up with).

Filtering might be a “have-to” in your neck of the woods – but you can work with your filtering vendor to get those errors down, or find a better filtering solution.

Any good filter suggestions?

Filtering Day at my Library

 

Wow – it’s been quite the filtering day here at the library. I had a question directed to me about filtering… I ended up talking to a Missouri Probation and Parole Officer about what we filter and how websites that are filtered behave.
We discussed risque images, pop-up windows, and just what happens when a library customer clicks on a link. Sometimes, pop-up windows galore start appearing. If our filter is working, the webpages featured in the pop-up windows are filtered, too. Each website that pops up has to be included in our filter’s database. If it is in our “bad websites” database, then the customer would see our “access denied” page instead of the actual webpage. But what happens if one of the pages that appears isn’t filtered? Then you get some “access denied” pop-up pages, and some actual pop-up pages. That could look confusing. And if our filtering server crashes (it does that sometimes), everything appears.
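To make that a little more concrete, here's a minimal sketch of the lookup as I described it. This is not Bess's actual code or API; the `BLOCKED` set, the host names, and the `page_shown` function are all hypothetical stand-ins.

```python
# Hypothetical "bad websites" database.
BLOCKED = {"popup-ads.example", "risque.example"}


def page_shown(host, filter_up=True):
    """Return what the customer sees for one requested page."""
    if not filter_up:
        # Filtering server has crashed: nothing gets checked, so everything appears.
        return "actual page"
    if host in BLOCKED:
        return "access denied"
    # Not in the database, so it is not filtered.
    return "actual page"


# One click that spawns three pop-up windows, one of them not in the database.
popups = ["popup-ads.example", "unlisted-popup.example", "risque.example"]
results = {host: page_shown(host) for host in popups}
results_down = {host: page_shown(host, filter_up=False) for host in popups}
print(results)
print(results_down)
```

The first run gives the confusing mix of "access denied" pages and real pop-ups; the second shows what happens when the filtering server is down.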
So now library staff are discussing things like: should we train our security officers and our public services staff on what should be filtered and what should not be filtered? More specifically – we’re following Missouri’s definition of “Explicit sexual material” – for all practical purposes, it’s pretty specific. But there are LOTS of images that would A. fall outside of that definition; and would B. still pull the triggers of some library and security staff. How do you train for that? I’m seeing a sitcom situation: someone standing in front of a training room, displaying large images on a screen… “is porn” <click> “is not porn” <click> “is porn” <click> “is not porn” <click> etc.
Then there’s the whole “checking to see if this site should be filtered or not is part of my job” thing. Is that harassment? Do you ask for volunteers? Do you warn others when you start a “check the unfiltering requests” session so as to not offend passers-by to your cubicle? And on and on and on…
And the CIPA people said this would be easy.
 

Filtering at the library – how it’s going

Update: Skagirlie posted a comment – she says “Yeah, I like that my library blog is blocked at work just because the URL has “girlie” in it. Bess apparently doesn’t like to look at meta tags, or it would know that it’s a library blog, not a “girlie” site.”

That’s what I’m talkin’ about. Her site was marked “Porn” by Bess – even though IT’S NOT. It’s a way cool library techie blog. Thankfully, I just checked Secure Computing’s (they own Bess) URL Checker, and Skagirlie’s blog is now categorized as a “Message/Bulletin Board.” Not technically correct, but much better!

Bess has an option to review websites that are incorrectly categorized. I know I reported this one (I imagine Skagirlie did, too :-). It’s comforting to know that Secure Computing DOES check those requests, and correct categories when warranted!

*************************

My library filters – we have since July 1, 2004. We use Secure Computing’s Bess product for our filtering software, and we filter at a pretty low level – we’ve been filtering the category of Pornography, and we’re allowing all the exceptions possible (the exceptions are Education, For Kids, History, Medical, Moderated, and Text/Spoken only). We have also just started filtering the Gambling category (more on that below). You can find the complete list of the categories that Bess uses on their website.

How the Process Works
I currently have the utter joy of checking websites out when a library customer submits a Site Review Request. Here’s what happens:

  • the library customer goes to a website that is filtered, and gets the “you’ve been blocked” warning page.
  • He/she has the option to send a “site review request” to the library, so a staff member can review the website to see if it really should be filtered.
  • I get the “site review request,” and then check each website to see if it should be filtered or if it can be unblocked (all hopefully according to Missouri and CIPA laws).
  • And then I unblock the site if it “passes go.”
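The steps above can be sketched as a tiny workflow. This is only a model of the process, not how Bess implements it; `SiteFilter`, its method names, and the example sites are all invented, and the `should_stay_blocked` function stands in for the human judgment call.

```python
from dataclasses import dataclass, field


@dataclass
class SiteFilter:
    blocked: set                                   # sites currently filtered
    requests: list = field(default_factory=list)   # pending site review requests

    def visit(self, site):
        """Step 1: a filtered site gets the warning page."""
        return "you've been blocked" if site in self.blocked else "page loads"

    def request_review(self, site):
        """Step 2: the customer asks staff to review a blocked site."""
        if site in self.blocked:
            self.requests.append(site)

    def review_all(self, should_stay_blocked):
        """Steps 3 and 4: staff check each request and unblock what passes."""
        for site in self.requests:
            if not should_stay_blocked(site):
                self.blocked.discard(site)
        self.requests.clear()


f = SiteFilter(blocked={"church.example", "adult.example"})
f.request_review("church.example")
f.request_review("adult.example")
f.review_all(lambda site: site == "adult.example")  # stand-in for the staff decision
print(f.visit("church.example"))
print(f.visit("adult.example"))
```

Only the site that "passes go" comes off the blocked list; the other one keeps showing the warning page.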

Although I’m currently “in charge” of this process, it will soon transfer over to various public services departments. I’ve been doing it to make sure everything works and to set up procedures for the whole filtering process.

Gathering Some Statistics
But since I have to do this, I thought I’d have some fun with it (now, now – I know what you’re thinking…). Once we install the full version of this software, we’ll supposedly be able to get statistics (I’m guessing it’ll report things like what website categories patrons have been browsing, and how many library customers and/or websites have been filtered). But until then, I have kept my own stats on the Site Review Requests: I have kept each filtering request since July 2004 (don’t worry – no names are attached to the requests so privacy is preserved), and have dumped each request into categorized folders so I can sift some statistics out of this heap.

These statistics show how many websites, categorized as Pornography by the filtering software, are really porn sites. But only for websites that a library customer has asked the library to review. Make sense? Not sure how scientific these stats are, but they are rather interesting….

My Findings
So – that adds up to 7 months’ worth of statistics, totaling 855 requests (through January 2005) for the library to review a website that was lumped into a certain category. How accurate do you think it was? Hmm? Can you guess? Try this percent on for size: 42%. Yep, that’s right. 42%! Out of 855 requests to review a website categorized as pornography, only 42% of those websites have REALLY been porn sites. Dang!

So what have the other 58% been? (I’m rounding the percentages, fyi):

  • 9% – broken sites (they either don’t exist, were turned off, or the server was down when it was accessed)
  • 9% – dating and/or social networking types of sites (all those russian dating websites)
  • 3% – music sites. Especially hip-hop artist pages
  • 20% – redirect pages, marketing forms, and domain name placeholder pages
  • 17% – the rest of the “non porn” sites
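For anyone who wants to check the arithmetic, here's a quick sketch. The percentages and the 855 total come from the lists above; the dictionary keys are my own shorthand, and since the percentages are rounded, the per-category counts are only approximations, not my actual tallies.

```python
total_requests = 855
really_porn_pct = 42

# Rounded percentages for the requests that turned out not to be porn.
not_porn_pct = {
    "broken sites": 9,
    "dating / social networking": 9,
    "music": 3,
    "redirects, marketing forms, placeholders": 20,
    "other non-porn": 17,
}

# The breakdown should account for everything that is not in the 42%.
not_porn_total = sum(not_porn_pct.values())
assert really_porn_pct + not_porn_total == 100

# Approximate request counts implied by the rounded percentages.
approx_counts = {
    name: round(total_requests * pct / 100) for name, pct in not_porn_pct.items()
}
approx_not_porn = round(total_requests * not_porn_total / 100)

print(not_porn_total, approx_not_porn)
print(approx_counts)
```

So the five buckets do add up to the 58%, which works out to roughly 496 of the 855 requests.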

Conclusions?
Actually, I’m going to make two conclusions:

1. The official “I have an MLS degree” conclusion: The filtering software isn’t doing a good job of filtering by category, because 58% of the websites I reviewed had been lumped into a category they didn’t belong in… and I was just looking at ONE category. There are a lot more categories, and I’m guessing the statistics would be similar in those categories, too. So that’s bad!

2. Non MLS, “it is helping staff and customers” conclusion: The filtering software, while it’s being a bit “over zealous,” IS categorizing a lot of sites correctly. Our public services staff aren’t having to play “web police” as much since we installed the filter. And a large share of the incorrectly categorized pages (29% of all requests, which is half of the 58%) aren’t useful websites – they’re either broken sites, sites that no longer exist, redirected marketing scam pages or domain placeholder pages – all pages that most likely weren’t what the library customer had in mind in the first place. So both of those are good.

Plus, we’re also able to use the filter to enforce a library policy. Per our library’s policy, people can’t gamble in the library. So we decided to turn on the Gambling category, too. We just did this, so I don’t know what it’s going to filter. But still, I think it’s pretty cool that we can use the software that we were required to buy (we wanted the E-rate money) for other non-CIPA stuff.

I’d love to hear how other libraries are dealing with filtering! Feel free to comment on this blog or email me!