[Jenkins-infra] Fighting with wiki spam

Kohsuke Kawaguchi kk at kohsuke.org
Sun Mar 8 19:43:01 UTC 2015

It all makes sense. The main challenge is whether Confluence exposes those
actions via API.

(I started auto-deleting pages that are determined to be Indonesian with
99% confidence.)
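
The auto-delete rule mentioned here can be sketched as a simple threshold check. This is only an illustration, not the bot's actual code: the function names and the input format (a mapping from language code to detector probability) are assumptions, and the language detection library itself is left out.

```python
# Hypothetical sketch: names and input format are illustrative only.
SPAM_LANGUAGE = "id"      # ISO 639-1 code for Indonesian
DELETE_THRESHOLD = 0.99   # the 99% confidence level


def should_auto_delete(language_probs, spam_language=SPAM_LANGUAGE,
                       threshold=DELETE_THRESHOLD):
    """Return True when the detector's score for the spam language meets the threshold.

    language_probs maps a language code to the probability reported by
    whatever language detector is in use (assumed, not specified here).
    """
    return language_probs.get(spam_language, 0.0) >= threshold


def most_likely_language(language_probs):
    """Return the highest-scoring language code, or None when there are no scores."""
    if not language_probs:
        return None
    return max(language_probs, key=language_probs.get)
```

Pages that fall below the threshold would simply be reported to the list for a human to judge, rather than deleted.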

2015-03-08 11:31 GMT-07:00 Larry Shatzer, Jr. <larrys at gmail.com>:

> I wonder if you can have the bot also do two other things when it deletes
> the page. First, purge the trash for that page (not all the trash for the
> space, just the page), since that also deletes any attachments on the
> page. That is another step I've been doing when manually cleaning up
> spam. Second, if possible, invalidate their session. This works great
> if their login is also deleted at the same time, since it slows them
> down: they have to either log back in or try to create a new account.
> I've seen accounts that I've deleted still create pages until the
> sync with LDAP happens and really removes their account from Confluence.
> On Fri, Mar 6, 2015 at 11:14 PM, Kohsuke Kawaguchi <kk at kohsuke.org> wrote:
>> I started playing with this idea.
>> I set up a mailing list
>> <https://groups.google.com/forum/#!forum/jenkinsci-spambot>, fed wiki
>> notifications into it, and got a bot running. Right now, the bot tries to
>> determine whether the new page is in Japanese, English, or
>> Indonesian, and just replies with that info to the list.
>> I'm going to keep it like that for a few days to make sure it's detecting
>> accurately, then I can implement the auto page removal.
>> I haven't yet implemented the page removal by reply. That'll come later.
>> 2015-03-02 12:59 GMT-08:00 Larry Shatzer, Jr. <larrys at gmail.com>:
>>> I like the idea of spreading the load around, and possibly automating it
>>> via email (or irc) to fight spam.
>>> -- Larry
>>> On Mon, Mar 2, 2015 at 1:40 PM, Kohsuke Kawaguchi <kk at kohsuke.org>
>>> wrote:
>>>> This is just an idea.
>>>> I was thinking about how we can cope more effectively with Wiki spam,
>>>> and spread that workload.
>>>> What if we establish a mailing list based workflow? We'll create a
>>>> mailing list that spam fighters will join, and this list receives the
>>>> notifications from Confluence about new pages.
>>>> We'll have a bot monitor this list as well, and if it sees us replying
>>>> to a notification email with some keyword, say "BURN IN HELL", it'll go
>>>> delete that page. I think this simplifies the workflow for us humans quite
>>>> a bit, and it'll make it easier for multiple people to collaborate on this
>>>> task. The invitation-only ML would serve as a kind of authentication
>>>> mechanism, to prevent the bot from going nuts.
>>>> The bot could evolve to do more actions, such as removing the user from
>>>> LDAP and perhaps feeding that information back to stopforumspam.
>>>> I've also experimented with a language detection library, and it seems
>>>> to work well. So our bot could automatically delete any new page that is
>>>> judged Indonesian at a 99%+ confidence level, and it could auto-reply to
>>>> that list saying it deleted the page.
>>>> The accumulated archive will serve as a nice record of action to
>>>> analyze later.
>>>> Is something like this useful?
>>>> --
>>>> Kohsuke Kawaguchi
>> --
>> Kohsuke Kawaguchi

Kohsuke Kawaguchi
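
The reply-by-keyword workflow proposed in the quoted thread could look roughly like the sketch below. It is an assumption-laden illustration, not the bot's implementation: the function names are hypothetical, the keyword is the example from the proposal, and the actual page deletion (which depends on what Confluence exposes) is left out. It only decides whether a message on the list is a reply that asks for deletion.

```python
import email
from email import policy

KEYWORD = "BURN IN HELL"  # the example keyword from the proposal


def unquoted_lines(text):
    """Yield the lines the replier wrote, skipping quoted ('>') lines."""
    for line in text.splitlines():
        if not line.lstrip().startswith(">"):
            yield line


def deletion_requested(raw_message, keyword=KEYWORD):
    """Return True if raw_message is a reply whose unquoted text contains the keyword."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    # Only replies to a notification count; a fresh post is ignored.
    if msg["In-Reply-To"] is None:
        return False
    body = msg.get_body(preferencelist=("plain",))
    if body is None:
        return False
    text = body.get_content()
    return any(keyword in line for line in unquoted_lines(text))
```

Skipping quoted lines keeps the bot from reacting when someone merely quotes an earlier reply that contained the keyword.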