No, this isn’t a new (and patently disturbing) take on the old children’s tag game. Nor is it the prop list for a soon-to-be-released (and equally disturbing) construction-site-themed porn flick. Thankfully.
Rather, it’s what I said to myself earlier this evening, when I noticed that a few comments here (quite a few comments, as it turns out) were being inadvertently flagged as spam, and not being published. Or responded to. Or seen, at all, by me or others. And as much as I’d like to blame the two pipes that directly caused the problem, it’s really the douche who put them there who’s to blame.
I’ll give you three guesses who that douche is. And ‘Massengill’ and ‘John Edwards’ don’t count.
I’ll tell you the details. It involves a bit of a computer programming lesson, but don’t worry — evidently, I don’t know what the hell I’m doing, so how complicated could the explanation be, really? It’ll be like having a monkey expound on the symbolism in Beowulf. You’ll be fine.
So, here’s the thing about comments. I’ve never moderated comments here, and I doubt I ever will. If and when people are kind enough to leave comments — or annoyed or confused enough, or legally obligated to inform me of pending legal actions, perhaps — then I’m more than happy to have them on the site. Tickled pink and puffy, even.
For every legitimate comment I receive, there are hundreds — maybe thousands — of ‘spamments’ incoming. There’s a constant stream, flying just under the shiny surface of the template and these words, of Viagra this and casino that and mesothelioma the other, and porn, porn, porn, always with the grandma midget donkey veggie porn.
What it is these bulkmailing bastards are trying to accomplish, I don’t know. A page rank boost for their sweaty sexpot sites? An honest attempt to sell me and my three readers genuine Canadian knockoff drugs and the latest in black market near-silicone breast implants? Just freaking cheesing me off? No idea.
Luckily, they’re not accomplishing any of these things, thanks to the fancy spam filter tucked away in my blog software. Every once in a while, some new spamspewing shuckmonkey will slip a ‘comment’ or twelve past the goalie, but all I have to do is open up the settings, update a spam filter pattern to match, and delete the offending chicanery. And since that’s the only stupid job I personally have in this little arrangement, that’s precisely the bit that I screwed all to hell for a few months, and never realized.
I’ve added a lot of spam filters manually over the past five years or so — an awful, howling testament to the relentless misapplied creativity displayed by these spamchucking weaselbots. Maybe if their mothers had only held them as small children, or a teacher had shown even an ounce of interest in their aspirations, they could have channeled their energies for good. But no. Evil it is. And so, evil I must repel from my site.
One of my most often-pinged filters looks something like this:
/\/(forums|files|download|catalog|…)\//i
(The ‘…’ above is simply a placeholder for another three dozen or so words, each of which has shown up in one or more spamments and been added lovingly to this list, which sends any of their subsequent slimy spammy brethren to the junk comment purgatory pit.
I like to imagine brimstone in there. And rivers of fire. Also, unspeakable acts performed with pineapples, bowling balls and rabid wolverines. Spammers and their spawn are not on my Christmas card list.)
In case you’re not familiar with the syntax above, that jumble of words and punctuation is what’s known in programmer’s parlance as a ‘regular expression’. Why it’s called that, when it’s clearly quite irregular, and not especially expressive, is beyond me. That’s just what it’s called. But what does it mean?
Well. This is where the programming bits come in. Feel free to avert your eyes for a few paragraphs, if you’re allergic to these sorts of computery type things.
The slashes at the very beginning and almost at the very end enclose the actual code; like buns on a Big Mac, they’re just the wrapper for the meaty bits inside. And that little ‘i’ dangling off the end stands for ‘(case) insensitive’, so the program knows to ignore upper-casiness and lower-casitude within the pattern. No problem so far.
The first and last bits of the pattern are also slashes: forward slashes ( / ), to be exact. But since the pattern itself is enclosed by those same sorts of slashes, the ones inside have to be noted specially, lest the first one be mistaken for the ‘end of pattern’ slash, and the rest of the code go for naught. In this particular coding system, forward slashes are specially noted, or ‘escaped’, by putting another character in front of them. Which happens to be a backward slash ( \ ), making a funny little V-looking thing ( \/ ).
(I wish I were making this up. But this is really the kind of shit we coders are expected to remember on a daily basis. It’s a wonder I have enough brain cells left to remember my fricking name.)
Forget the funky bits for a moment. The last part of the code is a bunch of words, enclosed in parentheses and separated by a funny straight-up-and-down sort of slash. We call that a ‘pipe’. And if you’ve read the title and ever coded a regular expression before, you probably already see where this is headed. For your sake — and everyone else’s, at this point — I’ll keep the rest of the explanation brief:
The parentheses serve to group whatever’s in between them, and the pipes act as ‘or’s. So in the pattern above, that middle bit is saying, ‘anything with the word ‘forums’ OR ‘files’ OR ‘download’ OR ‘catalog’ in it’. The funny V-looking slash-slash bits, and the ‘i’ at the end expand that, so that any comment is counted as spam that has:
‘a forward slash, then any of the words in parentheses, regardless of case, then another forward slash’
This is all well and good, because all those words I put in the parentheses are used in spamjacker URLs all the time, and always between two slashes. If you know about directories on web sites, then you may recognize that I’m attempting to filter out directories — like ‘/forums/’ or ‘/catalog/’ — that are likely to exist on some commercial site with pills or porn or pork rinds to sell, but not so likely on a site that a legit commenter would type in. Like a blog, or a MySpace page.
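For the curious, here’s a minimal sketch of what that filter is meant to do, written in Python rather than whatever my blog software actually runs. The four-word list, the `looks_like_spam` name and the example.com addresses are all stand-ins I’ve assumed for illustration; the real list is much longer.

```python
import re

# Hypothetical, shortened word list; the real filter has dozens more entries.
SPAM_DIRS = ["forums", "files", "download", "catalog"]

# Python has no /.../i delimiters, so the inner slashes need no escaping here,
# and re.IGNORECASE plays the role of the trailing 'i'.
spam_filter = re.compile("/(" + "|".join(SPAM_DIRS) + ")/", re.IGNORECASE)


def looks_like_spam(comment: str) -> bool:
    """Return True if the comment contains /word/ for any listed word."""
    return spam_filter.search(comment) is not None


print(looks_like_spam("Cheap pills at http://example.com/Catalog/pills.html"))  # True
print(looks_like_spam("Nice post! See http://example.org/blog"))  # False
```

The first comment trips the filter because of ‘/Catalog/’, upper-case C and all. The second sails through, since nothing on the list shows up between two slashes.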
This pattern has weeded out many, many hundreds of spamments for me over the years, and for that, I’ve been extremely grateful. Until a few hours ago, when I looked a little closer, and noticed that what the code actually said was:
/\/(forums|files||download|catalog|…)\//i
That’s two pipes in a row there, in the middle. Which means that in addition to all the words in the list I worked so hard on, at some point I also introduced an ‘OR nothing at all’ in between those two slashes in the pattern. And a slash, then nothing, then another slash is, last I checked, ‘slash-slash’.
As in ‘colon-slash-slash’. As in, part of ‘http://’, which has been the beginning of every single freaking link on the web for the past twenty years.
So basically, if someone left a comment, and it pointed back to their site — any site at all, spam shack, blog, vatican.com, charlieisthegreatestever.org, anything at all — it got flagged as spam. At least since I picked up writing again in November, and probably for many, many moons before that.
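If you’d like to watch the mistake happen, here’s a rough Python sketch. Both patterns are my shortened reconstructions, not the literal filter, and exactly where the doubled pipe sat in the real list is an assumption; anywhere inside the parentheses has the same effect.

```python
import re

# Assumed reconstruction of the broken pattern: note the '||' in the word list,
# which adds an empty alternative ("OR nothing at all").
broken = re.compile(r"/(forums|files||download|catalog)/", re.IGNORECASE)

# The same pattern with the stray pipe removed.
fixed = re.compile(r"/(forums|files|download|catalog)/", re.IGNORECASE)

legit = "Great post! Visit me at http://example.org"

print(broken.search(legit).group(0))  # '//'
print(fixed.search(legit))            # None
```

The broken version latches onto the ‘//’ in ‘http://’, because ‘nothing’ between two slashes now counts as a match. The fixed version finds nothing to complain about.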
There’s one bright spot in my blogging day, and that’s logging in to see that some kind soul has been moved enough to leave a comment on one of my posts. And for months, I’ve been shooting myself squarely in the ass in that regard, without ever knowing it. I could only wonder why the comments seemed to have dwindled upon my return, blame everything on the crappy economy, and cry myself to sleep on my keyboard at night.
Which sounds sort of tragic, I suppose. Except that since realizing my mistake, I’ve now looked through no fewer than twelve thousand comments relegated to the junk folder, searching for the few spare notes left by real people. It’s a lot like searching for diamond dust in a turd factory. I’ve seen more spam in the last three hours than any one person should be subjected to in a lifetime. Frankly, I prefer the crying and wondering bit. At least a nice hot shower can wash away the shame from that.
So, long story marginally shorter, I managed to ‘rescue’ a couple dozen comments — in other words, roughly four hundred percent of the ones I’d seen in the past two months, not counting the ones I’ve written myself. And I trashed the steaming pile of actual spam comments, to start with a fresh slate. If only I could swap my eyes for a new set, too, I’d be just peachy.
Of course, I flushed a couple of thousand comments before I looked closely enough, so I may have missed a missive or two. For that, if you sent something that you thought I’d ignored, I apologize. And I’m now going to tackle the task of responding to each of the new-to-me comments that I’ve found, and hope that my adjustment to the filter — second pipe, yoink! — fixes the problem going forward.
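One way I could have caught this months ago is to run every filter against a handful of comments I know are legit, and yell about any pattern that flags them. Here’s a hedged Python sketch of that idea; the filter list, the sample comments and the `overeager_filters` helper are all hypothetical, and my blog software offers nothing of the sort that I know of.

```python
import re

# Hypothetical filter list and known-good samples, for illustration only.
FILTERS = [
    r"/(forums|files|download|catalog)/",
    r"/(forums|files||download|catalog)/",  # the double-pipe mistake
]
KNOWN_GOOD = [
    "Loved this one. http://example.org",
    "Found you through a friend's blog: http://example.net/blog",
]


def overeager_filters(filters, good_comments):
    """Return the patterns that flag at least one known-good comment."""
    bad = []
    for pattern in filters:
        compiled = re.compile(pattern, re.IGNORECASE)
        if any(compiled.search(comment) for comment in good_comments):
            bad.append(pattern)
    return bad


for pattern in overeager_filters(FILTERS, KNOWN_GOOD):
    print("Flags legit comments:", pattern)
```

Run against those samples, only the double-pipe pattern gets called out. A check like this is only as good as the sample comments, of course, but even two would have been enough to catch me.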
Still. If you left a comment in the recent past, please feel free to submit it again. Or if you don’t recall what moved you at the time, drop a different line. If you haven’t commented, well, now’s the time. Somebody’s got to test the system, and heaven knows I can’t do it alone. Look at how solidly I mucked it up last time. Either way, really, it’s safest just to drop a quick comment, and see what happens. I might still have gremlins lurking in my filters. Only you can help me ferret them out. Don’t leave this douche crying on his keyboard.