Wikipedia talk:Suspected copyright violations/Archive 1


keeping the page manageable

I don't think the current ways we're managing the page are really working... i.e. removing items outright and explaining why in the edit summary (too hard to track changes), or listing our conclusions on the page (makes the list long and unwieldy). Therefore I think we should create sections, and move "incoming" items there. So if something is a false positive (i.e. not really copied from somewhere), we move it to the section for those. Other sections I suppose could be "Listed as copyvio/deleted", "Copyvio URL is a Wikipedia Mirror", and "Needs further review". Perhaps we could remove items after 24-48 hours. Thoughts? --W.marsh 17:56, 18 July 2006 (UTC)

Thanks for the suggestion! I feel that copyvios are fairly cut-and-dried; thus, I'm not sure that all of those categories are necessary. It may just use more time that could be better spent elsewhere, and I think the benefits that we could get from such a categorization scheme are minimal. I'm not sure, though; what do you think about the above concerns? Cheers, -- Where 20:21, 18 July 2006 (UTC)
Alright... my main thoughts were A) it would let us review each other's work more easily than digging through the file history and B) it would help you see any false positives and Wikipedia mirrors that turned up. But if that's not useful... I personally find it much easier to just remove items from the list when they've been handled, so I'll do that for now. --W.marsh 20:37, 18 July 2006 (UTC)

Richard R. Nacy

Using a different source, I have re-created the Richard R. Nacy article, which had previously been flagged by WhereBot as a copyright problem. --TommyBoy 05:58, 24 July 2006 (UTC)

Bioguide

The stuff from infoplease looks to be from the congressional bioguide. I think those are all public domain, but need the {{bioguide}} template on them. Also, if you want to find a good place to run this bot, try [[Category:Wikify_from_July_2006]]. --Rayc 03:34, 30 July 2006 (UTC)

Red cross

What is happening with the Red Cross page? There have been several reports (like Spencer Township, Central Coast and United States Naval Observatory Flagstaff Station) which refer to www.redcross.org; however, all of those reports were false positives. Maybe you should filter redcross links out? -- ReyBrujo 17:53, 30 July 2006 (UTC)

Thanks for pointing that out! I think that bug should be fixed now. -- Where 20:53, 30 July 2006 (UTC)

A couple of comments

First, could it be possible to fix the summaries, to include the article name in there? And second, this was a strange report, maybe some kind of bug? -- ReyBrujo 04:03, 22 September 2006 (UTC)

How is this page supposed to be used?

I'm confused. What is this page about? Am I supposed to add suspect pages to some list somewhere? Is there a category/macro to tag suspect pages with? Anyway, Joseph Todaro looks odd to me - a great big block of unformatted text by a user with no other contributions. Can't find anything on Google, but that doesn't mean a lot. I've asked the user, but I'm not counting on getting an answer. Regards, Ben Aveling 13:14, 30 September 2006 (UTC)

This is a page where a bot places suspected copyright violations. After an editor has checked it, the entry is removed from this page. That's why this page is currently empty. If you suspect a copyright violation but you don't know for sure, then you can put this template {{Cv-unsure}} on the talk page of the article in question. See the template for instructions on how to use it. Garion96 (talk) 15:40, 30 September 2006 (UTC)
Thanks for that. I'll copy it to the top of the talk page, trust you don't mind? Regards, Ben Aveling 02:22, 3 October 2006 (UTC)
Hmm... that {{cv-unsure}} is quite orphaned, shouldn't it belong to a category and be included at Wikipedia:Template messages/Maintenance#Copyright violations? -- ReyBrujo 02:57, 3 October 2006 (UTC)

Bugs

  • Check this diff, the bot reported the article but did not report the URL. -- ReyBrujo 22:50, 15 October 2006 (UTC)
  • Another bug, the wikilink is bolded. -- ReyBrujo 01:31, 22 October 2006 (UTC)
  • Similarly, the bot did not report the URL here. Article deleted at 16:35, 23 October 2006, bot report at 16:36, 23 October 2006; could it be a race condition? -- ReyBrujo 19:42, 23 October 2006 (UTC)
  • Another with no URL reported. -- ReyBrujo 12:12, 24 October 2006 (UTC)

Quick idea: add user to the report

Could it be possible to add the user in the report? Sometimes I tag an article with a speedy tag, and while trying to check the history to put a {{nothanks-sd}} tag into the creator's talk page, the article has already been deleted. Thanks. -- ReyBrujo 20:13, 23 October 2006 (UTC)

That would be helpful... though it would be even MORE helpful to list out <nowiki>{{subst:Nothanks-web|pg=The Connecting Communities Cymru Project|url=http://aehindnt.swansea.ac.uk/ccc/en/aboutus.asp}}</nowiki> so we could quickly cut-n-paste it and put it on the user's page. It would make this process go a fair bit faster. --Interiot 20:18, 23 October 2006 (UTC)
I definitely think that what Interiot suggests would be helpful... though it might bloat the page a bit. Composing those nothanks tags is a bit tedious. Is there any way the bot could include a link to a pre-loaded version of the uploader's talk page, even? That would really make it painless to notify. --W.marsh 01:21, 24 October 2006 (UTC)
Well, I was going to suggest that, but I guessed we were not that lazy :-) I wonder if Where ever checks this page anymore. Anyways, there is no problem with bloating, the page doesn't usually have more than 4 or 5 reports. It could even use a table format if necessary. -- ReyBrujo 01:55, 24 October 2006 (UTC)
Not really lazy... Finding the creator's username to click on is easy, the link is in the history either prior to or after deletion. It's the copying-and-pasting of the various bits, especially if you haven't memorized the template yet, that I think does consume some extra time. Though we're not really processing that many copyvios, so it's not that big of a deal (though this system generally is a really good thing, so if there's any way to expand it, it would be nice to do that). --Interiot 02:40, 24 October 2006 (UTC)
Advantage of not being an admin I guess. I simply copy & paste it from the {{db-copyvio}} tag. :) Garion96 (talk) 14:28, 24 October 2006 (UTC)
For what it's worth, I've got a small bit of javascript you can add to your monobook.js, that automatically displays the filled-out {{subst:nothanks-web|...}} bit on Special:Undelete pages when the page was deleted as a copyvio, so it will work on other copyvios as well. See User:Interiot/js/nothanksweb gen.js. --Interiot 00:56, 28 November 2006 (UTC)
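As an illustration of the cut-and-paste notice discussed in this thread, here is a minimal sketch that fills in the {{subst:Nothanks-web|pg=...|url=...}} wikitext from a page title and a source URL. The function name nothanks_web and the rejection of "|" and "}}" are assumptions made for this example; this is not the code behind any of the bots or the JavaScript mentioned above.

```python
def nothanks_web(page: str, url: str) -> str:
    """Return a filled-out nothanks-web notice for a user talk page.

    Hypothetical helper: only the template name and its pg/url
    parameters come from the discussion above.
    """
    for value in (page, url):
        # A bare "|" or "}}" inside a value would be read by the wikitext
        # parser as a parameter separator or the end of the template call.
        if "|" in value or "}}" in value:
            raise ValueError(f"value needs escaping before use: {value!r}")
    return "{{subst:Nothanks-web|pg=" + page + "|url=" + url + "}}"


if __name__ == "__main__":
    print(nothanks_web(
        "The Connecting Communities Cymru Project",
        "http://aehindnt.swansea.ac.uk/ccc/en/aboutus.asp",
    ))
```

A report line could then carry this string after the article link, so that the reviewer only has to paste it onto the creator's talk page.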

About nothanks templates

There is something I don't like about those templates, and I will be stating it here so that I don't need to post it on every nothanks template page. It says something like "If you object, please go to Talk:Page and post the correct information about copyright there". The main problem there is that administrators delete the article, and these users create the talk page to state why the article should stay. In the last week, I may have tagged 5 or 6 of these talk pages as {{db-talk}} because they are orphaned, while explaining to the users in their talk pages why their text was not accepted. Since most are speedy deleted, shouldn't the template be modified to prevent these orphaned talk pages from being created? -- ReyBrujo 03:30, 24 October 2006 (UTC)

I guess that's a leftover from the 14 day listing on WP:Copyright problems. Perhaps a separate page on Wikipedia:Copyright problems would be more practical for that. Garion96 (talk) 14:31, 24 October 2006 (UTC)

List with detected copyvios

What would people think about creating a subpage with a list of wikilinks with all the reported copyvios? This way, if any comes back, we would be able to check to see if it has been correctly rewritten, and if so, add a note to the wikilink stating the article is now valid. This would help us catch old copyvios when they come back as simple rewordings, or when the bot is down. -- ReyBrujo 18:43, 10 December 2006 (UTC)

  • That might be a good idea, but if the bot is up it should find repeat copyvios and report them normally. Sometimes I like to go through my own deletion log back a few weeks or months, and see what's been re-created. --W.marsh 17:30, 17 December 2006 (UTC)

General advice

Nice idea! What about creating a custom message to put above WP:SCV too? Something like:

Some formatting required, though. We could create it at Wikipedia:Suspected copyright violations/Advice. Also, maybe we can create a template at Wikipedia:Suspected copyright violations/Check, where users checking the copyvios could just call it to get a summary to add here. For example, suppose the report is:

  • Parish Players -- http://www.aqqe76.dsl.pipex.com/Pages/Opening_page/Opening_page.html. Reported at 01:03, 6 February 2007 (UTC)

Then we could create a template so that users could append it below:

*: {{Wikipedia:Suspected copyright violations/Check|copyvio=yes|userwarned=yes}} -> Common copyvio
*: {{Wikipedia:Suspected copyright violations/Check|copyvio=no|userwarned=no}}   -> False positive
*: {{Wikipedia:Suspected copyright violations/Check|badmove=yes|userwarned=yes}} -> Positive due cut/paste move
*: {{Wikipedia:Suspected copyright violations/Check|copyvio=yes|url=http://...}} -> Copyvio but from another source

This way we could have standard messages. Not necessary, but it would be nice and may help automate this in the future (writing a JavaScript method similar to the ones for AIV reporting). However, I still think we need to make the general advice visible in the article, not only here. -- ReyBrujo 02:59, 6 February 2007 (UTC)
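To make the proposed parameter scheme concrete, the following is a rough sketch of how the four example calls above could map to standard messages. It is written in Python rather than wikitext purely for illustration. The function name check_summary, the exact message wording and the "(user warned)" suffix are assumptions; only the parameter names copyvio, userwarned, badmove and url come from the proposal.

```python
from typing import Optional


def check_summary(copyvio: str = "no", userwarned: str = "no",
                  badmove: str = "no", url: Optional[str] = None) -> str:
    """Return a standard review note for one report (hypothetical wording)."""
    if badmove == "yes":
        text = "Positive due to cut/paste move"
    elif copyvio == "yes" and url:
        # The reviewer found the text at a different address than the bot did.
        text = f"Copyvio, but from another source: {url}"
    elif copyvio == "yes":
        text = "Common copyvio"
    else:
        text = "False positive"
    if userwarned == "yes":
        text += " (user warned)"
    return text


if __name__ == "__main__":
    print(check_summary(copyvio="yes", userwarned="yes"))
    print(check_summary(copyvio="no", userwarned="no"))
```

A real template would express the same branching with parser functions; the sketch only shows that four short outcomes cover the cases listed.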

  • Well at a glance I would oppose standard messages. It just seems simpler to type out the problem and not worry about some relatively complex system. I'd prefer to keep the advice on this page too... as we still just have a handful of savvy editors here, they'll see this page and I like the simple look of WP:SCV (and perhaps the bot does too). But the graphic you're using is nice, if you want to add it in that format to the top of the page. --W.marsh 03:33, 6 February 2007 (UTC)

Non admins deleting items on the list?

Well I was doing that since the start and I've just seen it was not the usual practice. Sorry about that then :)... On the other hand, it often happens that out of the 10 articles on the list, 8 or 9 are already speedy tagged. That makes it harder to find the remaining articles. Maybe we could do 2 sections, one with tagged but not deleted articles and one filled up by the bot? -- lucasbfr talk 23:06, 7 February 2007 (UTC)

I totally missed it too. I always have removed the articles from the list and put them in my watchlist to see if the article gets deleted/rewritten or if the editor ignores my message and just removes the speedy tag. It does clutter up the page if you have to wait till there is a red link. If it's necessary I guess we should mention to editors that they write "done" behind every article they tag. Garion96 (talk) 23:17, 7 February 2007 (UTC)
Hey, don't feel bad. That bit was just added in the past day or so. I think it makes sense, which is why I clarified it on the project page. I think it also makes sense for us to indicate which articles we've tagged as copyvios, perhaps by adding "tagged" at the end of the entry. Or they could just make us WikiGnomes admins and then not have to worry about it.  ;-) But seriously folks, what do some admins feel about this? -- Butseriouslyfolks 00:54, 8 February 2007 (UTC)
Well, I added that a few days ago and it's not at all a requirement, just advice. Especially with the big CSD backlogs, the tag can just get removed and lost in the shuffle. If it stays here it will be more likely to get handled correctly, as admins check this page. Non-admins should still remove redlinks, no question about it. If non-admins don't like the advice, go ahead and remove articles when you tag them... though I do encourage you to try to check back and make sure the article was handled correctly. Anyone who contributes here for a few months should get nominated for adminship, so hopefully you'll all be admins soon... there's just some new blood lately (which is a great thing). --W.marsh 02:21, 8 February 2007 (UTC)
Oh, I didn't realize the text on the front page had been changed. I'm indifferent, it could be changed back to what it had been forever. --W.marsh 02:34, 8 February 2007 (UTC)
I did not change it, I swear! :-) -- ReyBrujo 02:41, 8 February 2007 (UTC)
Twas me. Just wanted to make sure everybody on the page was on the same page.  ;-) -- Butseriouslyfolks 03:11, 8 February 2007 (UTC)

This section can be deleted after a couple of days.

  • -- www.awesome-rope.info/sterling-rope.html. Reported at 01:25, 8 February 2007 (UTC)
    False positive? Article text is a bit brochure-ish, but I can't find that it was copied from somewhere. --W.marsh 02:28, 8 February 2007 (UTC)
    Yes, the first version of the article was virtually empty, only an infobox. The article apparently was deleted twice due to copyvio, and the user recreated it correctly. -- ReyBrujo 02:45, 8 February 2007 (UTC)
I believe it is safe to assume that, if two users have agreed that it is too generic or a false positive, and you don't disagree with those opinions, the section can be deleted without having to copy it here. You can make it that at least one of the users agreeing must be an admin, but may not be necessary. -- ReyBrujo 04:21, 8 February 2007 (UTC)
I agree. I just didn't want to delete it before W got a look at it. I suppose I could have just left it on the main page, but it was disturbing me there.  :-) -- Butseriouslyfolks 04:28, 8 February 2007 (UTC)
Traditionally we've left stuff on the main page to discuss it if needed, and that's what I'd prefer we do... it's just the easiest way. --W.marsh 20:27, 10 February 2007 (UTC)

Wherebot Down?

Either the mighty Wherebot is down or else the world has suddenly woken up to copyright issues, as Wherebot has not reported any SCVs in nearly a day. I'm getting bored. I just left a message for Where to alert him to the situation. Although I forgot to ask him whether Wherebot will pick up where it left off or start with new posts after it is back online. (Done.) --Butseriouslyfolks 04:49, 22 February 2007 (UTC)

How strange. For some reason, the Wherebot process had terminated. I tried to make it so Wherebot should restart if this happens again and started it up again. -- Where 14:06, 24 February 2007 (UTC)

In the meantime

Per Where, User:Wherebot works off of a news feed, so any copyvios posted while Wherebot was down will not get flagged. Where is working on a process to review prior postings for copyvios, but this may not be implemented for a while. I've been poking about the recent changes and caught a few copyvios manually, and I'm going to continue to do so. If anybody is inclined to organize this effort, please do. --Butseriouslyfolks 21:27, 24 February 2007 (UTC)

Wherebot's down

Per Where's note on my talk page, Wherebot will be down for a couple more days. --Butseriouslyfolks 00:14, 12 March 2007 (UTC)

Bot down again. e-mailing Where. -- lucasbfr talk 16:37, 21 March 2007 (UTC)
Back up. --Butseriouslyfolks 19:17, 21 March 2007 (UTC)

Down again, emailing Where. --Butseriouslyfolks 00:25, 25 March 2007 (UTC)

  • Emailed twice but never heard back from Where. Are there any other copyvio checking scripts or utilities available? --Butseriouslyfolks 19:12, 30 March 2007 (UTC)

Backup bot

I don't know if it is feasible, but if someone has a computer that is always connected, maybe we should contact Where to arrange a backup bot in case the first one fails? I could do it, but I can't access my computer remotely so I can't start it when I am not at home. -- lucasbfr talk 07:26, 3 April 2007 (UTC)

'Twould be nice, as Wherebot is down again. --Butseriouslyfolks 17:31, 18 April 2007 (UTC)
There's a piece of code released under GPL. Unfortunately I can't bring a backup bot at the moment (I'm moving). Anyone got a UNIX system that could run a temporary bot? -- lucasbfr talk 11:08, 19 April 2007 (UTC)

Fixed bot downtime problem for good (I hope)

Apparently, the downtime problems with Wherebot were caused by the toolserver (where Wherebot is hosted) rebooting. Thus, I made the server check hourly to see if Wherebot is up and restart it if need be. Hopefully this will fix the problem :). -- Where 05:16, 22 April 2007 (UTC)

Three cheers for Where and his bot!! --Butseriouslyfolks 05:27, 22 April 2007 (UTC)
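The hourly check described in this thread can be sketched roughly as follows. This is a generic watchdog pattern, not Where's actual script: the name ensure_running and the two callables is_alive and start are assumptions, standing in for whatever the toolserver setup really used to detect and launch the bot process.

```python
from typing import Callable


def ensure_running(is_alive: Callable[[], bool],
                   start: Callable[[], None]) -> bool:
    """Start the bot if it is not running.

    Returns True when a restart was triggered, False when the bot
    was already up. Both callables are supplied by the caller.
    """
    if is_alive():
        return False
    start()
    return True


if __name__ == "__main__":
    # Stand-in state for demonstration only; a real check would look
    # at a PID file or the process table instead.
    state = {"up": False}
    restarted = ensure_running(lambda: state["up"],
                               lambda: state.update(up=True))
    print("restarted" if restarted else "already running")
```

Run once an hour from a scheduler such as cron, a check like this bounds the downtime after a server reboot to roughly the scheduling interval.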

Backlog

With the re-keyed passages from Clicketyclack at Special:Contributions/Robertterwilliger, do we need the {{backlog}} box? --Knowpedia 05:18, 23 April 2007 (UTC)

Good point, but I think it's ok. Most of them have already been deleted. We're down to 5 there and just 3 or so more. Let's let it ride . . . --Butseriouslyfolks 06:43, 23 April 2007 (UTC)
Hip, hip, hooray!  :-P --Iamunknown 05:56, 12 June 2007 (UTC)

copyvio vs. advertising

I know we tend to tag stuff as copyvios, calling a spade a spade and all. But this apparently causes problems for OTRS and people in general sometimes, as when people are spamming us with copyvios, they eventually clue in and say "I release the rights" and then it just gets deleted as spam. It saves a lot of time if we just tag/delete it as spam (G11 blatant advertising) in the first place. If you think about it, spamming some copy from your official website is a very basic form of blatant advertising, since that text was written just to promote the company, and therefore meets G11 to a T. This is not a requirement or anything you'll get in trouble for not doing, but I've been trying it for the past month or so and the results have been positive. You should mention in edit summaries though that the text is also a copyvio; that kind of clears up the application of G11. But if you find it takes you too much more time to do properly, just stick with copyvio tagging as usual. --W.marsh 14:28, 8 May 2007 (UTC)

I try to apply both tags where it is clearly spam because I think it's fairer to let the contributor know that simply removing the spam elements or rewording the copyvios will not make the article acceptable. Both must be done. --Butseriouslyfolks 16:37, 8 May 2007 (UTC)

Reid Bryson

Reid Bryson looks to be copyvio of http://ccr.meteor.wisc.edu/bryson/bryson.html but the page creator claims that It is extremely sparse and summarized. I doubt if this violates any copyright. William M. Connolley 09:34, 16 May 2007 (UTC)

If my understanding of copyright is correct, fair use assesses the right to copy a small percentage of a text (small not being defined, and varying on the importance of the copied parts). The original "interesting part" of the source is roughly 3k chars, the disputed part of the article is roughly 1000 chars (although some of it has been rewritten). It is not a blatant copyvio, but I believe it still qualifies as one. -- lucasbfr talk 10:42, 16 May 2007 (UTC)
I have removed or rewritten all of the copyvio text. --Butseriouslyfolks 03:21, 18 May 2007 (UTC)


External Link is copyvio'ing Wikipedia

Not sure if this is the right way to bring this up -- if it isn't, please let me know -- now that I've prefaced myself with some CYA, it seems that Disney's_Adventures_of_the_Gummi_Bears is being used by one of its own external links as an unattributed source (site in question is "http://cortneywilliams.com/gummibears" , aka the 3rd external link in the list). Is there a simple way to deal with this or bring it up to the Powers That Be? Again, sorry if this is the wrong forum for discussion on this issue, and thanks in advance! JasonDUIUC 14:48, 27 May 2007 (UTC)

Well, there are thousands of sites like that. Usually one would contact the site to tell them that they need to attribute the information they take from Wikipedia. However, Wikipedia itself really only cares for sites that take a lot of articles without attributing, like most of the ones listed at WP:MIRROR. -- ReyBrujo 16:45, 27 May 2007 (UTC)

Wherebot is on the blink

It's reporting a copyvio with no source every 3-4 minutes, and these do not appear to be copyvios. I have notified Where. --Butseriouslyfolks 05:36, 24 June 2007 (UTC)

Wherebot appears to be back to normal. --Butseriouslyfolks 02:54, 25 June 2007 (UTC)
Hurray.  :) --Iamunknown 02:56, 25 June 2007 (UTC)

I should have mentioned a few days ago that I've been seeing newly created articles from WP:AFC pop up as copyvios with a dead source link. I assume they're being triggered by some sort of mirror when the blurb from AFC is posted into the mainspace for expansion. A lot of these are being helpfully created by User:Delldot, who does check them for copyvios, so if it appears to be a copyvio and you see that username, please take another look. -- But|seriously|folks  16:36, 5 July 2007 (UTC)

Happy birthday Wherebot

Too many candles, but Wikipedia lacks good free images of balloons.

Wherebot and thus this page are now 1 year old! By my rough estimation, based on wherebot's 6100+ edits to this page, we've deleted around 4,000-5,000 copyvios through this page. Recently I've been going through 6-month old pages that had no formatting when created, and essentially none are copyvios as detectable by the Google test - so in my (biased) estimation Wherebot has been a total success. A year ago, as was shown, there were a whole lot of copyvios sitting around in articles. I think we've slowed and maybe even stopped them from being introduced at article creation.

Nice work, Wherebot, Where, and everyone who's worked on this page over the past year. --W.marsh 03:25, 15 July 2007 (UTC)

  • Everyone clap. Good work! --Haemo 03:32, 15 July 2007 (UTC)
All hail the master and his creation!! -- But|seriously|folks  05:47, 15 July 2007 (UTC)
I found some balloons. Now it's a real party! ;) Woohoo! It's excellent how many copyvios we've detected. Good job everyone, and thanks Where and of course Wherebot! Happy b-day! :D --Iamunknown 06:02, 15 July 2007 (UTC)

Kilbot

So, uh, how am I supposed to handle these things? It's confusing! --Haemo 02:05, 27 July 2007 (UTC)

  • I've talked to the operator about it... he's going to look into making it act like Wherebot I think. The new bot seems to catch a lot Wherebot doesn't. There'll be a few kinks though it seems. --W.marsh 02:25, 27 July 2007 (UTC)

Wherebot malfunction

Wherebot's back to doing what it did in late June, reporting frequent false positives with no source. I left a note and email for Where. -- But|seriously|folks  05:30, 28 July 2007 (UTC)

A third bot soon!

See Wikipedia:Bots/Requests for approval/CorenSearchBot and comment if you wish! It will be set up to report to the main WP:SCV page possibly by tomorrow. It tags articles for speedy deletion itself, so should prefix the article name with "(tagged by CSBot)".

One unfortunate thing: we may get duplicates.

I should've let you guys know earlier, but Coren, the bot operator, has been very helpful and willing thus far, and I am sure will be willing to help us out. Cheers, Iamunknown 05:57, 6 August 2007 (UTC)

Looks great, thanks for the notice! -- But|seriously|folks  07:31, 6 August 2007 (UTC)
Note: Just to make sure there aren't any misunderstandings, CSBot doesn't tag for speedy deletion, it tags with its own notices— human intervention is still required (but the tag does prod many editors into fixing the articles themselves).

 Question: would people prefer the bot to log directly on Wikipedia:Suspected copyright violations instead of a subpage?

  • Pro: simpler, cleaner, and prevents duplicates.
  • Con: might confuse Wherebot (hard to tell in advance) — Coren (talk) 00:50, 7 August 2007 (UTC)
Directly to WP:SCV. We occasionally get manual user reports here as well. If it affects Wherebot, you can always make it log the other way. IMHO. -- But|seriously|folks  01:28, 7 August 2007 (UTC)
Wherebot just appends to the page, so it shouldn't get confused if other bots edit it. By the way, Coren, great job on the bot :). Since Wherebot only searches the first few words of the articles, perhaps it should be disabled once CSB matures. Please let me know what you think. Happy editing, -- Where 04:38, 7 August 2007 (UTC)

I think all bots having Wherebot-style reporting is best... including reporting directly to WP:SCV. Over the past year we've figured out how to handle it like that in a way that produces little overhead, so just 1-3 of us can be active on any given day and it's still not backlogged. But it's up to all the active people here and the bot operators. --W.marsh 04:43, 7 August 2007 (UTC)

Has anyone spotted any duplicates yet? I haven't. -- But|seriously|folks  05:07, 7 August 2007 (UTC)

Oh, the bots shouldn't add copyvio tags to articles... the whole point of this page is that a simple non-intrusive report can be made then copyright specialists respond as appropriate. There are inherently going to be a lot of false positives a bot just can't weed out. --W.marsh 13:44, 7 August 2007 (UTC)

I disagree. To date, CSBot got only two "real" false positives (that is, articles that really had different contents), and its matching will get better as I tweak it with feedback from previous matches. The tags are not destructive, and in many cases (at least a dozen to date) they have prodded the creating editor (sometimes just a passing new page patroller) to fix the copyvio themselves before intervention by someone from here— and that's taking into account that CSBot finds about 4-5 times as many articles as Wherebot does.
It obviously will never be able to distinguish legitimately very similar articles from the copies, or recopied PD material (though tag recognition could help with that), but needing a human editor to confirm that the article is okay and remove the tag is a Good Thing. — Coren (talk) 14:40, 7 August 2007 (UTC)
A human editor from here will be by before too long if it's just listed here. I think people get annoyed pretty easily when a bot tags something for deletion incorrectly... and it could also lead to them getting dealt with by admins not that familiar with text copyright issues. You're never going to make a bot that doesn't make some errors. --W.marsh 14:46, 7 August 2007 (UTC)
Oh, it's not tagged for deletion— that would be insanely dangerous. Have you seen the templates it uses? {{csb-pageincludes}} and such. — Coren (talk) 16:43, 7 August 2007 (UTC)
Well, that's better. But you might just be signing yourself up for a lot of grief... Where has gotten angry comments just because Wherebot reported stuff here that didn't turn out to be copyvios. It might be a daily thing for you if your bot is actually tagging articles directly. Anyway, I guess it's different than if it was tagging them for deletion directly, sorry for the misunderstanding. --W.marsh 17:09, 7 August 2007 (UTC)
I'm actually cautiously optimistic about the grief. To date, response to the tags has been rather positive; good faith editors are actually pleased to know they made a boo-boo and want to make things right. And at least a couple have learned about what is permissible or not from the event. In my book, that alone is worth some grief-soaking.  :-) — Coren (talk) 17:37, 7 August 2007 (UTC)

 Question: I notice I'm getting a higher (20-30%) rate of false positives on very short articles. I think it might be a good idea to ignore articles that are very short (like, less than 50 words or so). Makes sense? -- Coren (talk) 18:13, 7 August 2007 (UTC)

To clarify: very short articles ("They are the best band around!") are expected to have a much higher rate of false positives; the chance of one or two short statements being found randomly in some other page is very high, after all. What I'm asking is: is there a point below which I shouldn't bother? After all, if the article is only 30 words, it's unlikely to be a significant copy of some web site even if it is verbatim. -- Coren (talk) 18:16, 7 August 2007 (UTC)
It seems to be doing okay so far. Even short copy and pastes are still often blatant advertising, which is useful to find and deal with. Maybe on short articles it could require a longer phrase to match, I dunno. --W.marsh 19:39, 7 August 2007 (UTC)
Hm. When I return home tonight, I'll see if I can tweak the matching algorithm to bias score against smaller articles so that the shorter it is, the more exact the match required to flag. I'm a bit leery about messing with the matching code this late in trial, but I guess we can just keep an eye on it and see if it breaks. — Coren (talk) 19:52, 7 August 2007 (UTC)
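The idea of biasing the score against shorter articles can be illustrated with a small sketch. It is not CSBot's matching code, which is not shown on this page; the linear ramp, the names required_match_fraction and should_flag, and the default values of 200 words and 0.5 are all assumptions chosen for the example.

```python
def required_match_fraction(word_count: int, full_length: int = 200,
                            base: float = 0.5) -> float:
    """Fraction of an article that must match a source before flagging.

    The requirement rises linearly from `base` (at `full_length` words
    or more) towards 1.0 as the article gets shorter. All numbers here
    are illustrative assumptions.
    """
    if word_count <= 0:
        return 1.0
    if word_count >= full_length:
        return base
    return 1.0 - (1.0 - base) * (word_count / full_length)


def should_flag(matched_words: int, word_count: int) -> bool:
    """Flag only when the matched share meets the length-adjusted bar."""
    if word_count <= 0:
        return False
    return matched_words / word_count >= required_match_fraction(word_count)


if __name__ == "__main__":
    for matched, total in [(30, 30), (20, 30), (120, 200)]:
        print(matched, total, should_flag(matched, total))
```

With these assumed defaults, a 30-word article is flagged only on a near-verbatim match, while a 200-word article is flagged once half of it matches, which is the behaviour asked for in the thread: the shorter the article, the more exact the match required.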

CorenSearchBot trial, part deux

Well, CSBot finally works without human attention grumble captchas, and is now on. I'll keep an eye on it, but for the remaining day of trial or so it should run without pause. — Coren (talk) 23:29, 7 August 2007 (UTC)

 Question: one of the things CSBot does is locate and tag (very mildly) obvious cut-and-paste copies of Wikipedia pages (loses history, and all). I now also log those (like this). What is the consensus here about both the detection/tagging and reporting of those? -- Coren (talk) 00:23, 8 August 2007 (UTC)

  • Detecting duplicates is useful... although those are even harder to deal with than your average copyvio. Wherebot does this in a roundabout way, doing it directly seems great. Duplicate articles and copy and paste moves are bad and need to be fixed. --W.marsh 00:20, 9 August 2007 (UTC)

So, the trial is now over. I expect feedback on the approval page from SCV regulars will be valuable, so don't hesitate to go and chime in. Especially if you have reservations about what CSBot does or how it does it— having those fixed before we let it loose is a Good Thing. — Coren (talk) 00:46, 9 August 2007 (UTC)

Turns out approval was faster than I expected. CSBot will return in about 1 hour. — Coren (talk) 01:22, 9 August 2007 (UTC)

Good deal; CSBot is pretty dandy. --Haemo 01:36, 9 August 2007 (UTC)

Dumbbot question

For some reason, Dumbbot is adding items listed by Coren's bot to Wikipedia:Copyright problems/2007 August 9/Articles. Any idea what's up here? At a glance it might be unnecessary. --W.marsh 13:33, 9 August 2007 (UTC)

Maybe because of the category the templates put the article in? I didn't know there was a bot that scraped those. — Coren (talk) 14:55, 9 August 2007 (UTC)

Bot

What are everyone's thoughts on a bot that automatically removes pages once they have been deleted? It would work in a similar way to the HBC AIV helperbots. Ryan Postlethwaite 23:03, 10 August 2007 (UTC)

  • I would totally be apt for a bot to take charge of this process that adds some unnecessary Wikipedia space edits to user contribution history.¤~Persian Poet Gal (talk) 23:12, 10 August 2007 (UTC)
  • You mean remove redlinks on the page? I'd be all in favor of it! — Coren (talk) 23:13, 10 August 2007 (UTC)
Yup, that was the idea :-) I'll have a snoop around to see who could create one. Ryan Postlethwaite 23:14, 10 August 2007 (UTC)
If only such things could create themselves :P...thank goodness for bot creators.¤~Persian Poet Gal (talk) 23:16, 10 August 2007 (UTC)
  • User:HBC AIV helperbot has a similar function (as well as merging duplicates and turning on/off the backlog template). Perhaps it could be beaten into this function? Alternatively, if you can't find anyone, I could add this as a function of CSBot, I suppose, or make a distinct bot for the task. — Coren (talk) 23:30, 10 August 2007 (UTC)
    Silly me. You even used helperbot specifically as exemplar.  :-) — Coren (talk) 23:31, 10 August 2007 (UTC)

I would suggest a somewhat long grace period or run cycle, 15-30 minutes. This would let us see if there was some kind of a trend going on, like someone continually recreating an article, or a reporting bot malfunction (Wherebot sometimes just randomly reports redlinks). --W.marsh 03:14, 11 August 2007 (UTC)

I was thinking the same. -- But|seriously|folks  03:49, 11 August 2007 (UTC)

So, what's happening about this bot? Anyone picking it up? — Coren (talk) 21:08, 11 August 2007 (UTC)

Hmm. Can this function be built into your existing bot? I'd write it, except I'm unfortunately kind of wary right now about picking up any new BOTREQs, understandably, and I'm rather busy. — madman bum and angel 23:13, 11 August 2007 (UTC)
I've found someone to create a bot that could do this task and I will be putting in a bot request tomorrow, anyone with bot experience interested in running the bot? Ryan Postlethwaite 22:14, 13 August 2007 (UTC)
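
A rough illustration of the grace period suggested in this thread, assuming the bot already knows when each listed page was deleted. The function name, the input mapping and the 15-minute default are hypothetical; none of the bots named here is known to work this way.

```python
# Hypothetical default: the low end of the 15-30 minute window suggested above.
GRACE_SECONDS = 15 * 60

def entries_to_remove(deleted_at, now, grace=GRACE_SECONDS):
    """Return listed titles whose pages have been deleted for at least `grace` seconds.

    deleted_at maps each listed title to its deletion time in seconds,
    or to None if the page still exists (assumed input format).
    """
    return sorted(
        title
        for title, stamp in deleted_at.items()
        if stamp is not None and now - stamp >= grace
    )
```

Entries deleted more recently than the grace period stay on the list, so a pattern such as repeated recreation or a misfiring reporter remains visible for a while.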

Apologies (of sort)

Please accept my apologies. After it's been running for a couple of days, it looks like CSBot gave all of you a lot more trouble and work!

In case that wasn't immediately obvious, that comment was tongue-in-cheek and meant more as an exclamation of pleasure at CSBot working as hoped than an apology. — Coren (talk) 21:03, 11 August 2007 (UTC)

Wherebot broken?

I've removed the large number of posts by Wherebot that had no url included. Forced recheck manually by CSBot, and no violations found. Bug? — Coren (talk) 05:15, 12 August 2007 (UTC)

Bug. It does that occasionally but always straightens itself out within 24 hours. I think this is the 3rd or 4th time it's happened. -- But|seriously|folks  05:49, 12 August 2007 (UTC)
Per WP:BOT if the bot is repeatedly malfunctioning and the botmaster is gone shouldn't the thing be blocked? --JayHenry 06:36, 12 August 2007 (UTC)
Yes. — madman bum and angel 06:38, 12 August 2007 (UTC)
Yes, per BOT policy it should be blocked ... but don't block it. It is harmless, and we just have to remove the false positives. --Iamunknown 06:55, 12 August 2007 (UTC)
Indeed; from the talk page, it seemed this was a bigger problem than it really is. It is disruptive, but it's easy to spot and ignore. — madman bum and angel 07:15, 12 August 2007 (UTC)
I'd prefer we just leave it unless it becomes really untenable. --Haemo 07:39, 12 August 2007 (UTC)
Me too. It only happens like once a month. -- But|seriously|folks  08:05, 12 August 2007 (UTC)
How about this:
<unindent> It might be nice to leave some sort of statement at User:Wherebot to the effect that there's no ghost in the machine. Last night when it was on its tagging spree I left a comment at User talk:Wherebot, and it took me a bit of poking around to realize that User:Where is evidently no longer active... and I'm not the only person whose comments are languishing. Even better would be if someone else got approval to run the bot in his absence. --JayHenry 17:33, 12 August 2007 (UTC)
Where may not be actively editing, but if you send him an email, he'll reply within a day or so. We had a problem a few months back with Wherebot shutting down and not restarting, and he happily fixed it. -- But|seriously|folks  19:44, 12 August 2007 (UTC)
He (Where) does indeed, in my experience, respond quickly to e-mail. --Iamunknown 04:12, 13 August 2007 (UTC)

bot owner request

Could the bots possibly ignore articles tagged with {{DANFS}}? These are invariably okay to use on Wikipedia, but might get deleted by accident if reported, or offend people who upload these articles. --W.marsh 18:41, 13 August 2007 (UTC)

CSBot does, or at least is supposed to, ignore them now. Were any flagged in the last couple of days? — Coren (talk) 21:35, 13 August 2007 (UTC)
Several were today, but they seem to have been reported by Hermesbot. --W.marsh 23:35, 13 August 2007 (UTC)
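
As a sketch of the kind of check being asked for here, a bot could look for an attribution template in the wikitext before reporting. The skip list, the regular expression and the function name are assumptions for illustration, not how CSBot or HermesBot actually decide.

```python
import re

# Hypothetical skip list; {{DANFS}} and {{bioguide}} are the templates named on this page.
SKIP_TEMPLATES = {"danfs", "bioguide"}

# Captures the template name at the start of a {{...}} call.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")

def has_attribution_template(wikitext, skip=SKIP_TEMPLATES):
    """Return True if the wikitext uses any template on the skip list."""
    for match in TEMPLATE_RE.finditer(wikitext):
        name = match.group(1).strip().lower().replace("_", " ")
        if name in skip:
            return True
    return False
```

A plain mention of "DANFS" in running text does not count; only an actual template call does.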

Editcountitis and other diseases

Hey, CSBot now has 400 tagged articles since deleted. That's at least 400 fewer copyvios (not counting the rewritten articles). Yeay! — Coren (talk) 21:37, 13 August 2007 (UTC)

  • I fear we're slowing the crawl to 2 million articles... --W.marsh 23:35, 13 August 2007 (UTC)
Yeah, copyrights suck. -- But|seriously|folks  02:52, 14 August 2007 (UTC)

Manual second opinion

Hey there. One can now add a wikilink to User:CorenSearchBot/manual#Unprocessed requests and get a second opinion of a page from CSBot. — Coren (talk) 03:26, 14 August 2007 (UTC)

HermesBot

Is it just me or is HermesBot a little oversensitive?

Wikihermit: I suppose its sensitivity is tunable? Would you mind trying fiddling its sensitivity down for a day or so to see the difference? — Coren (talk) 04:20, 16 August 2007 (UTC)

Yeah. It seriously needs to be toned down; it's disrupting the process here. It does get it some of the time, but the noise is not worth it. — madman bum and angel 05:19, 16 August 2007 (UTC)
I'll second that. I'd rather have a few copyvios go through than leave our little crew of WP:SCV patrollers discouraged over a massive list that we will never be able to keep up with. -- But|seriously|folks  06:50, 16 August 2007 (UTC)
Agree with the above; he's way too sensitive. I'd also suggest blanket "ignores" for any and all naval ship articles. --Haemo 06:53, 16 August 2007 (UTC)
It would also be super if we could have the bots check to ensure they aren't reposting things but that's probably impossible. --Haemo 07:24, 16 August 2007 (UTC)
Not at all. — madman bum and angel 07:29, 16 August 2007 (UTC)

Would anyone mind if I ran HermesBot-reported articles through CSBot and then nuked them? This backlog is getting ridiculous, and it'll be even more so once we get back into peak hours. — madman bum and angel 07:30, 16 August 2007 (UTC)

Hey, well, we're down to a one day backlog O_o ---Haemo 07:37, 16 August 2007 (UTC)
Indeed; it looks better organized. Usually we only have a couple hours of backlog, though (4 to 5 articles). In other news, I just looked at an article where the user created the page then added {{copyvio}}. That's helpful! :D — madman bum and angel 07:38, 16 August 2007 (UTC)
You'd be surprised how often that happens. I basically found this page because I used to do this, but by monitoring the "newpages" log. This is much more efficient. --Haemo 07:40, 16 August 2007 (UTC)

I dunno, I think our goal should be to catch all copyvios, not just some of them. The original goal of this project was to make it reasonably unlikely that any new articles on Wikipedia would contain unfree, unattributed text. HermesBot certainly reported the most of any of the bots, but I hadn't noticed a higher rate of false positives. I think we just need to get better at handling the volume of reports generated... it is a lot of work, but I think it's important. The days of one active editor being all it took to keep this list from being backlogged are gone, I'm afraid. --W.marsh 13:04, 16 August 2007 (UTC)

There is a risk, however, that as the number of possible matches increases the humans' capacity to analyze them properly diminishes. I don't have numbers to back this up, but HermesBot leaves me with the feeling of a much higher rate of "marginal" matches, while it admittedly catches some that both the others miss. That's why I was wondering if reducing its sensitivity somewhat would leave us with a better pick of articles to check, but still catch the ones CSBot and Wherebot miss. — Coren (talk) 22:03, 16 August 2007 (UTC)
In fairness, I just reviewed 5 HB reports and only 1 was a false positive. And HB is now removing redlinks, which rocks! -- But|seriously|folks  03:27, 17 August 2007 (UTC)
Yep; he definitely seems to be improving. --Haemo 02:29, 18 August 2007 (UTC)
Yes, and YEAH FOR AUTOMATIC REDLINK ZAPPING!. Just had to say it loudly.  :-) — Coren (talk) 04:59, 18 August 2007 (UTC)
Someone needs to tell me if this is going on next time :-). I never really check this page :-p. ~ Wikihermit 21:30, 23 August 2007 (UTC)

Current CSBot stats and proposition

CSBot now has 925 reports to WP:SCV, and 798 deleted mainspace edits. Working hard to slow progress to two million articles... for great justice!

I've had a thought, and before I bring that to the BAG, I was wondering what other people thought:

What if, while it's waiting for new articles, CSBot went fishing in older articles to see what it can ferret out? I'd exclude any webpage which has a link back to Wikipedia (that will automatically exclude legitimate mirrors), and report copies by/from a single site just the once (so that illegitimate mirrors can be inventoried)?

That would also help ferret out copyvios that fell through the cracks, or that nobody ever noticed.

Good idea? Horrible idea? — Coren (talk) 23:09, 21 August 2007 (UTC)
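
The two filters proposed above (skip pages that link back to Wikipedia, report each site only once) could be sketched as follows. The tuple layout, the function name and the simple substring test for a backlink are assumptions; a real check would need to parse the page's links.

```python
from urllib.parse import urlparse

def filter_matches(matches, seen_sites=None):
    """Return (title, url) pairs worth reporting.

    matches: iterable of (article_title, url, page_html) tuples (assumed format).
    seen_sites: optional set of hosts already reported in earlier runs.
    """
    seen = set() if seen_sites is None else seen_sites
    reports = []
    for title, url, html in matches:
        # A page linking back to Wikipedia is treated as a legitimate mirror.
        if "wikipedia.org" in html.lower():
            continue
        site = urlparse(url).netloc.lower()
        # Report copies from a single site just once.
        if site in seen:
            continue
        seen.add(site)
        reports.append((title, url))
    return reports
```

Passing the same `seen_sites` set across runs would keep the one-report-per-site rule over time.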

I like the idea, but I would recommend doing it to a separate page. Otherwise it could flood this page with mirrors who are not compliant, etc. Matt/TheFearow (Talk) (Contribs) (Bot) 23:17, 21 August 2007 (UTC)

A note: I wouldn't have CSBot tag those articles, since the false positive rate is likely to be higher and the probability that the external site is the one that ripped off Wikipedia increases with article age. — Coren (talk) 23:26, 21 August 2007 (UTC)

What might be more productive is monitoring large (1,000+ byte) additions to existing articles. Often unwatched articles tend to be repositories for dumped text from random websites... and this can go undetected for long periods of time on occasion. This has been proposed but never tried before, as far as I know. If you could get a list of pages to check easily (from one of the vandal monitoring services? I dunno), it might be worth a shot. The problem is inadvertently detecting reverts. --W.marsh 23:32, 21 August 2007 (UTC)
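
One possible way to pick out large additions while skipping reverts is to remember a hash of every earlier revision and ignore any revision whose text has been seen before. This is only a sketch under that assumption; the function name and the list-of-texts input are hypothetical, and the 1,000-byte threshold is the figure from the comment above.

```python
import hashlib

MIN_ADDED_BYTES = 1000  # threshold suggested in the comment above

def large_additions(revisions, min_added=MIN_ADDED_BYTES):
    """Return indexes of revisions that add at least min_added bytes of new text.

    revisions: one article's revision texts, oldest first (assumed input).
    A revision whose exact text already appeared earlier is treated as a
    revert and skipped. Index 0 (page creation) is left to the new-page bots.
    """
    seen_hashes = set()
    flagged = []
    prev_size = 0
    for index, text in enumerate(revisions):
        data = text.encode("utf-8")
        digest = hashlib.sha1(data).hexdigest()
        grew_by = len(data) - prev_size
        if index > 0 and grew_by >= min_added and digest not in seen_hashes:
            flagged.append(index)
        seen_hashes.add(digest)
        prev_size = len(data)
    return flagged
```

Exact-match hashing only catches clean reverts; a revert followed by a small tweak in the same edit would still be flagged.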

It's a good idea, but the number of non-compliant mirrors would be very high indeed. I'd suggest putting them on a separate page as well — perhaps also recommending excluding things in quotes? --Haemo 23:34, 21 August 2007 (UTC)

As for going through existing articles... we've done that before and it's excruciating work. It can be done, but it's very hard: going through about 200 the right way took several copyright experts the better part of a week's work. The whole idea behind WP:SCV is to nip copyvios in the bud before they get entrenched into articles... it's so much easier that way. I'm not saying the proposal's a bad idea... maybe at a leisurely pace we could go through them. But it wouldn't be anything quick. --W.marsh 23:36, 21 August 2007 (UTC)

Something that might help (although that'd require a distinct bot) is that the matching algorithm I use allows me to present a sorta-parallel diff-looking result instead of a cardinal representing the amount of work needed to do the actual alignment (which is what I use right now). This would give a good tool to compare both texts side-by-side and see why they were deemed to match. That sucks for a bot, but a human could use this.
Well, actually, the algorithm doesn't give me the diff-looking thing. It gives me the data that can be used to generate the diff-looking thing.  :-) Nothing a bit of code won't cure. — Coren (talk) 23:43, 21 August 2007 (UTC)
Once articles have been around for awhile, they are not only propagated by mirrors but also copied into blogs, newspaper articles and all manner of websites. As W.marsh indicated, in order to see which came first, you have to go back into the history and figure out whether it was pasted in all at once or developed over time, and you often have to use archive.org to try to date the other website. It takes a lot of time, and most of the leads will probably be false positives. On the other hand, recent large additions are a fertile ground for detecting copyvios, and I think that would be a great place to focus attention. I also like the idea of having a side-by-side comparison to pinpoint what the bot sees as a copyvio. -- But|seriously|folks  04:50, 22 August 2007 (UTC)
I'd agree, monitoring large diffs would be great. I often go fishing in the recent changes for large diffs and I catch some copyvios, from time to time. -- lucasbfr talk 08:52, 5 September 2007 (UTC)
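
For the side-by-side comparison discussed in this thread, here is a minimal sketch of how shared passages could be listed for a human reviewer. It uses Python's standard difflib.SequenceMatcher as a stand-in; it is not the alignment algorithm CSBot uses, and the function name and the five-word minimum are assumptions.

```python
from difflib import SequenceMatcher

def shared_passages(article, source, min_words=5):
    """List word runs that appear in both texts.

    Returns (article_word_index, source_word_index, passage) tuples for
    every common run of at least min_words words.
    """
    a = article.split()
    b = source.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    passages = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            words = a[block.a:block.a + block.size]
            passages.append((block.a, block.b, " ".join(words)))
    return passages
```

Each tuple gives the position of the run in both texts, which is the data a side-by-side display would need.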

Track listings

Would it be reasonable to exclude bullet lists from comparison? I'd think that'd get rid of those track listing false positives, but I'm afraid more troublesome lists might slip by...

Or can someone else think of a reasonable way to not hit every track listing that gets posted? (The question on whether every single *beep* CD out there should have its own WP article in the first place is left as an exercise to the reader). — Coren (talk) 17:20, 4 September 2007 (UTC)

Good question. Most of the lists I've seen that are copyvios also have other copyrighted text that comes with them. My humble guess would be, maybe you could write a routine to ignore bullet lists that are copyvios of amazon.com. The Evil Spartan 17:35, 4 September 2007 (UTC)
Amazon.com and cdbaby.com. I think that would be a great start. Other types of bullet lists do get copied in with the bad kind of copyvios, and I don't think the noise from the CD tracklistings overshadows the signal (yet). -- But|seriously|folks  17:42, 4 September 2007 (UTC)
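
The suggestion in this thread, to ignore list items only when the matching site is a known track-listing source, might look something like this. The site set holds just the two sites named above; the function name and the line-prefix test for list items are assumptions.

```python
from urllib.parse import urlparse

# The two sites named in this thread; any wider list would be an assumption.
TRACKLIST_SITES = {"amazon.com", "cdbaby.com"}

def text_to_compare(wikitext, source_url, sites=TRACKLIST_SITES):
    """Drop list lines before comparison when the source is a track-listing site.

    Lines starting with "*" (bullets) or "#" (numbered items) are removed
    only for the listed sites; other sources are compared in full.
    """
    host = urlparse(source_url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host not in sites:
        return wikitext
    kept = [
        line
        for line in wikitext.splitlines()
        if not line.lstrip().startswith(("*", "#"))
    ]
    return "\n".join(kept)
```

Because lists are kept for every other source, a copied list that arrives with other copyrighted text from elsewhere would still be compared in full.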

ClueBot II now removes redlinks

ClueBot II now removes redlinks every 10 minutes.  :) -- Cobi(t|c|b|cn) 19:29, 5 September 2007 (UTC)

Alptabot

From what I've seen so far, the new bot is doing a very nice job! -- But|seriously|folks  01:46, 16 September 2007 (UTC)

Thanks! I bugged Wikihermit for the source to user:HermesBot since he left. Alpta 03:03, 16 September 2007 (UTC)
I understand he's still around here somewhere . . . -- But|seriously|folks  08:10, 20 September 2007 (UTC)
Yep; that's definitely him. He got renamed (check the contributions), and now he's doing everything he used to do. Interesting, as is the username... — madman bum and angel 15:36, 20 September 2007 (UTC)

Transclusion

I switched the instructions at the top of the page to a transclusion format to make it a little easier for us to edit the list and a little harder for us to accidentally bork the instructions. I think I did it right. If anything else can be transcluded, assume I just didn't know how and go ahead and do it. -- But|seriously|folks  07:18, 26 September 2007 (UTC)

COBot

Welcome back to COBot, whose flurry a few days ago turned up many fresh copyvios not detected by the other bots. Now we just need a few more hands here to keep up with the new heavier workflow . . . -- But|seriously|folks  01:48, 10 October 2007 (UTC)

CSBot instructions

Hey there, I've added a list of frequently given instructions about how to fix copyvios to the top of CSBot's talk page. Could you guys take a look and see if I messed up or forgot anything? — Coren (talk) 18:44, 13 October 2007 (UTC)

I tweaked it a tiny bit. I 'spect that will save us a lot of work down the line. Great idea! -- But|seriously|folks  20:10, 13 October 2007 (UTC)

I will stop writing the article as there is no point if you are going to delete it. So go ahead. --Mattisse 19:18, 19 October 2007 (UTC)

However, it is not copyvio and you are discouraging people from writing needed articles. This bot is very unpleasant. It is the second time this week it has done this. I do not copyvio but from now on I will instantly cease an article when I get a bot notice. Unfortunately I continued to write Kansas v. Hendricks after the bot notice but I will not do that again. --Mattisse 20:05, 19 October 2007 (UTC)

Furthermore, the bot was slapped on even though I had citations, so it is a very stupid bot. --Mattisse 20:26, 19 October 2007 (UTC)
The bot is a bot. It detects text that appears elsewhere. It will exclude certain sources, but only if its programmer tells it to. The realm of sources that may properly be copied to Wikipedia is quite large, and it is impossible to program them all out. Still, 90% of what the bot flags is an improper copyvio. Don't take it personally if your article is among the 10% of false positives. It will be reviewed by a human before being deleted. -- But|seriously|folks  21:16, 19 October 2007 (UTC)
I take it personally because it happens to me several times a week now. I never create an article unless the citations are already in place. Never has an article of mine been deleted for copyvio (nor has that ever even been suggested) and I have written hundreds. In fact, I have been criticized for being too citation prone. This nasty bot never used to grab articles until recently. It is a harassing bot and it definitely affects my willingness to contribute positive material. Every article that bot has grabbed has ended up being a DYK so that bot is off base and decreasing quality of life around here. --Mattisse 21:28, 19 October 2007 (UTC)
And please, be my guest and delete the article. I'm not going to waste any more time on it. --Mattisse 21:31, 19 October 2007 (UTC)
What is worse? Having lots of copyvio's on wikipedia, or your article being falsely tagged, after which the tag simply will be removed? Garion96 (talk) 21:41, 19 October 2007 (UTC)
There are already lots of copyvio's on wikipedia. Right now. I come across articles that have not been touched since 2005 that are straight copyvio. Why doesn't your bot go after those? I am tired of your bot. It has decreased my quality of life. As I said, from now on I will cease work on any article your bot tags. --Mattisse 21:48, 19 October 2007 (UTC)
The existence of such articles should be an indication to you of how serious the problem is and why the bot was created and is needed. It is definitely not a reason to stop. -- JLaTondre 21:50, 19 October 2007 (UTC)
(e/c)Several times a week? Per your talk history, CorenSearchBot has only left you two messages. In both cases, the article wasn't deleted. I am sorry you are offended, but copyright violations are a serious problem with new articles. We get way too many of them. The bot can only identify text that matches elsewhere. It cannot be certain what the copyright status of other webpages is. That's why the bot doesn't say it is a copyright violation; only that it matches other text. Each case is reviewed by a human and the tags on non-violations are simply removed. There should be no stigma associated with false positives. -- JLaTondre 21:50, 19 October 2007 (UTC)
It just started this week. Within a few minutes of creating an article, while I'm still receiving an error message from Wikipedia and I am not sure the whole article is saved, I get a message flash and the screen fills up with the delete message on the article. Really conducive to creating articles when that happens.
But I figured it out. There are many sites on the web that quote text from the Supreme Court rulings and other case law. For some reason, the stupid bot has decided one is a copyvio and all the rest are not. The bot hasn't noticed that most articles on court cases don't even have citations. I'm sure it is much more important to harass people within minutes of creating an article than anything else. Typical wikipedia thinking. Heck with this article. Wikipedia can have it. I'm deserting it. --Mattisse 23:52, 19 October 2007 (UTC)

remove http://supreme.justia.com/us/521/346/case.html & substitute another that User:CorenSearchBot doesn't think is copyvio but says the same thing

For some reason the User:CorenSearchBot thinks this is copyvio, so just use one of the other many sites that have case law quotations instead. The bot is stupid. --Mattisse 23:42, 19 October 2007 (UTC)

Going by my article history - you tagged me for copyvio for a site I did not even use! What's the deal? - Incompetent Bot!

I removed a source that I did not have to. I looked at your site and you said I had copyvio http://web.utk.edu/~scheb/Hendricks.htm but if you look at the article history that was not true. You tagged me for http://supreme.justia.com/us/521/346/case.html. That is contrary to what your bot's site reports. I don't think your bot knows what the heck it is doing. --Mattisse 00:30, 20 October 2007 (UTC)

Your bot accused me of copyvio of a site that itself is a copyvio !

Bot is screwed up and doesn't know the difference. Raising havoc in wikipedia editor's life for no reason. This bot should be stopped.--Mattisse 00:36, 20 October 2007 (UTC)

This bot tagged me as copyvio the same minute I created the article for the second time this week

[[1]] Look at the article history. And not only that, you were wrong.

  • [2] Here you tag me one minute after I create the article on October 10.

I want to know where I can complain about this. --Mattisse 00:50, 20 October 2007 (UTC)

I complained to ANI as tagging me twice this week within one minute of creating an article is wrong

I complained to ANI as I think this treatment is wrong and you seem to think it is O.K. And today your bot not only tagged me the same minute I created the article but the bot was wrong. The site you accused me of copyvio was actually a copyvio of the reference citation I gave in the article that you tagged. I have no hope that this will stop as this is the way of wikipedia. And you and your friends think this is an O.K. way to treat editors. But I do not. --Mattisse 01:18, 20 October 2007 (UTC)

I left him a polite explanatory note on his talk page. In time, he will learn to love the bot like the rest of us. -- But|seriously|folks  04:05, 20 October 2007 (UTC)
No. I will not create articles on Supreme Court decisions any more, as those are the only articles of mine the bot has tagged. It must have a fetish for that. And I object to the bot protecting a student's copyvio site of U.S. Supreme Court quotations in the U.K. when I had referenced by a footnote citation a legitimate source of the Supreme Court quotations. The bot is screwed up and I now will be skeptical of that bot's tagging. I will disregard it on other people's articles and abandon articles of mine, should it do it again. But as it is only Supreme Court decisions that it tags, I just won't do any more of those. --Mattisse 14:04, 20 October 2007 (UTC)
For the record, you didn't put quotes around the part that was copied from the UK site, so it was plagiarism. Just adding a footnote when you took it word for word is not acceptable. The bot is a computer program that uses Yahoo to determine whether a page is a copyright violation. It cannot pass the Turing test, it is not capable of judgment, it cannot determine which of multiple sites containing plagiarism is the original. (Blaming the bot for that is grasping at straws, in my view.) As is stated abundantly clearly in the template it leaves, it's entirely possible for it to be wrong, and if so, just remove the template, explain on the article talk page, and continue editing the page. It's not harassment, and I'm sorry you felt that it was. I have been helping out on this page when I can for a little bit now and 99% of the time, it's not wrong, and it's incredibly helpful to root out the hundreds of pages we get a day that are copy and pastes of copyrighted text. I'm sorry that you were part of the <1%, but I don't think that it should have been viewed as any of the things that you say it was. —bbatsell ¿? 16:00, 20 October 2007 (UTC)
In all fairness, CSBot has a false positive rate closer to 10%— there are a lot of copies of copies of ultimately permissible text out there. But that false positive rate is, as far as I'm concerned, incredible given the difficulty of the problem set. — Coren (talk) 20:45, 20 October 2007 (UTC)

Can CorenSearchBot please only list articles on this page?

Seriously. Several established editors have complained about becoming overly stressed and wasting time dealing with the articles. On the discussions on this talk page, at the administrators' noticeboard, and on users' talk pages, administrators seem rather dismissive of editors' concerns that it is stressful. Well, I am here to testify that it is stressful. And I am strongly recommending for the second time that we listen to editors' concerns and restrict CorenSearchBot only to this page. Please. --Iamunknown 14:18, 20 October 2007 (UTC)

As opposed to placing a template? That would be very bad. Because I can tell you from experience working on this project page that a significant percentage of editors fix their own copyvio problems immediately once they see the template, so that we can merely take articles off this list rather than deleting them. Yes, it causes confusion occasionally, but I can tell you from personal experience that most editors understand what it is doing and accept it. This Matisse person is being extremely oversensitive and we should not succumb to his efforts to have us throw the baby out with the bathwater. -- But|seriously|folks  14:31, 20 October 2007 (UTC)
Butseriouslyfolks, several other editors have complained (one I can think of is User:Aude). This isn't just an over-sensitive editor, and my suggestion isn't based upon that. Getting a message telling you that you've created an article that infringes upon someone's copyright is stressful. We have editors here who have legitimate concerns. We should listen to those concerns. Tagging copyvios and notifying editors is not an end. It is a means to an end: that of creating a free encyclopedia which is legitimately re-distributable under the GFDL license. When the means by which we are doing this are overly stressful and multiple established editors complain, it is seriously time to rethink our means. --Iamunknown 14:38, 20 October 2007 (UTC)
Actually, Iamunknown, I think User:Aude is a very bad example. I don't know if CSBot has caught her(?) in a bad day, but she was being unreasonable. She was copying contents from her own website without licensing it properly but refused my suggestions to whitelist the site to avoid further tags; demanding instead that CSBot presume that "experienced editors" (however that is defined) that copy material from another site be presumed to own the copyright. You might want to read the conversation on her talk page which did continue after your intervention. — Coren (talk) 20:38, 20 October 2007 (UTC)
How do you know that a "significant percentage"[weasel words] fix their article because of the template? How do you know that they don't fix articles in spite of the template? This is Original Research on your part. When an article is less than one minute old, I believe the chances are that the editor is going to add to it anyway. How do you know that the bot was right to begin with and the so-called fixing was unrelated to the bot's inaccurate complaints or, as in my case, the editor did not spend significant time in trying to fix something that did not need fixing? Editing on wikipedia is stressful enough without the "under one minute of existence" tagging policy. "Personal experience" is OR. Most editors are intimidated into not complaining, as certainly I am most of the time, so you have no reliable data on this issue. --Mattisse 14:55, 20 October 2007 (UTC)
First of all, that's not what the message says. It doesn't assume the editor is doing anything wrong or accuse the editor of copyright infringement. It says "The CorenSearchBot has performed a web search with the contents of this page, and it appears to include a substantial copy of: <url>." The very next thing it says is, essentially, if this is a mistake, just leave a note on the article's talk page. Which is tremendously innocuous.
Second of all, of course my personal experience is OR. I'm not writing an article here, so WP:NOR does not apply. I can't understand why you would not WP:AGF and respect the opinion, formed by experience, of an editor who has examined as many items reported to this page as anybody. (I have about 1700 edits to WP:SCV.) But you can verify this as easily as anybody else by going through the history of WP:SCV and looking for articles that were removed by humans, as opposed to redlinks deleted by bots. Then look at their histories to see how the author reacted to the CSB tag. For example:
  • http://en.wikipedia.org/w/index.php?title=Raj_Bhavan_%28Srinagar%29&action=history
  • http://en.wikipedia.org/w/index.php?title=Frances_Pinter&action=history
  • http://en.wikipedia.org/w/index.php?title=Guido_Maus&action=history
  • http://en.wikipedia.org/w/index.php?title=Postgraduate_Medical_Journal&action=history
  • http://en.wikipedia.org/w/index.php?title=Mishary_Rashid_Al-Afasy&action=history
  • http://en.wikipedia.org/w/index.php?title=Hava_Kohav_Beller&action=history
And that's just going back a few days, and only items processed by myself and Coren.
Respectfully, Matisse, the offense you are taking to the CSB tag is short-sighted. If it doesn't apply to your article, then delete the tag and continue writing, as many others do. As seen above, it's obviously helping some editors realize that their copied content should be rewritten, and having them do it themselves protects us from copyvio issues and saves the rest of us a lot of time and effort.
If you have suggested revisions to the template language, let's discuss that. But this bot and its processes are helping much more than they are hurting. -- But|seriously|folks  16:46, 20 October 2007 (UTC)
Oh, I'm just going to disregard it and desert the page if that happens again, plus no more Supreme Court decision articles. I already have too much to do and too much stress on Wikipedia. I am implementing my "End the Pain Now" principle of deserting trouble on Wikipedia and I am certainly not going to bother with the steps involved with the bot you are talking about above in the first minute of creating an article. I'm just going to fool around on Wikipedia a while to relieve stress and to remember not to take wikipedia seriously. But I am curious why the nasty bot has to do its thing within one minute of the article being created? Why not opt for outright torture? --Mattisse 17:06, 20 October 2007 (UTC)
After the first minute, articles are already propagated to mirrors and there will be a lot more false positives. Look, do what you want. Wikipedia will survive your desertion of articles, if that's really what you'd prefer to do. -- But|seriously|folks  17:36, 20 October 2007 (UTC)
Yes, wikipedia made it plain a long time ago that I am of little importance here. I know the technical types run this place and people like me are the bottom of the totem pole. It is not very rewarding writing articles. It is probably the least valued contribution and I do it for myself, not wikipedia as wikipedia could care less about me and my articles. I'm surprised you have even bothered to answer me at all. That is a first. So I thank you for that. Perhaps I've been spending too much time writing articles and need to fool around with other wikipedia things. Seriously, I do thank you for answering me. Regards, --Mattisse 17:54, 20 October 2007 (UTC)
I can't speak for the Wikiworld, but I certainly value article writers. (That should be obvious, as I have asked you at least three times to keep writing.) I find it to be difficult and time consuming work, which is why I end up spending more of my time doing Wikignome tasks. -- But|seriously|folks  17:58, 20 October 2007 (UTC)
Butseriouslyfolks, you are very very very different and I certainly do appreciate you and thank you for that. It is very kind of you to say what you have and it does help to know that there are people like you here that appreciate a hard working editor and do not immediately throw me into a category of an imbecile with condescending recommendations that I read up on what a copyvio is. I have probably more than 25,000 edits in mainspace articles and never have I been accused of copyvio nor has any of my articles until now. I realize that everyone is automatically treated like they are in kindergarten here, so it is wonderfully consoling to know there are a few people like you around that see me as a human being. There is no reward and much punishment on wikipedia for editing so thank you again so much. --Mattisse 21:21, 20 October 2007 (UTC)
Ironically, I could say the same thing about adminship. Thanks for the kind words -- they are few and far between. -- But|seriously|folks  21:34, 20 October 2007 (UTC)
I'm going to simply copy BSF, Iamunknown, and state that my own personal experience shows (and, indeed, a short perusal of the SCV history confirms it) that a great number of potential copyright problems are averted by the author fixing the problem when the tag is placed. Whether there is a causal relationship is less obvious, but many edit summaries confirm it, and so do messages of thanks and mea culpa on my talk page (I regularly get "oops, I got lazy" messages), so it is reasonable to presume a large proportion of those would not have been fixed without the tag.

I understand some users might be annoyed, but I've read through my talk page history, and of the nearly 2500 tags CSBot has placed in its "life", fewer than 50 caused some sort of complaint, and of those 50 at least 40 were spammers/COI editors giving me their spiels about how they "are the webmaster of the original source" or how "a press release is obviously meant to be copied". When someone points out a source to me that's causing problems, or a PD attribution tag CSBot didn't know about, I've always been responsive (and, indeed, those editors have always been happy and collaborative).

In a word, I don't believe there is a problem that needs to be fixed, and I think most other editors would also agree with me. I have been known to have only partial infallibility, however, so if you really see a problem I would genuinely want to hear details and see what I could do to help fix it. — Coren (talk) 20:10, 20 October 2007 (UTC)

Tweak the algorithm instead?

I think there's a different question that should be asked: why did the bot actually tag the first version of this article, and can the creator do anything to improve the algorithm? A human eye can easily tell that this wasn't a copyvio. I think we should ask Coren to see if he can tweak his algorithm for short articles like this. -- JLaTondre 17:41, 20 October 2007 (UTC)

>> procedures for the civil commitment of persons who, due to "mental abnormality" or "personality disorder" are likely to engage in "predatory acts of sexual violence." <<
Word for word taken from the cited source without quotes. That's 75% of the article. The first source that Yahoo found was actually a copyvio of the source Mattisse "cited", but it's hardly clear that "this wasn't a copyvio". —bbatsell ¿? 17:46, 20 October 2007 (UTC)
Hmm, you are correct. I did a search, but I must have screwed up because I didn't see that text. Ignore the question. -- JLaTondre 18:12, 20 October 2007 (UTC)
75% of an article 25 seconds old. If that is your casual disregard, then forget it. "mental abnormality", "personality disorder", and "predatory acts of sexual violence" are quotes essential to the article and are still there. Civil procedures have to be in the article too. And of course "Kansas v. Hendricks 521 U.S. 346 (1997) is a case in which U.S. Supreme Court upheld the constitutionality of Kansas' Sexually Violent Predator Act" is copyvio too, right? Please delete the article. --Mattisse 18:26, 20 October 2007 (UTC)
Enough, already. You have already made your displeasure with CSBot known; multiple times. You're making a mountain out of a molehill. The fact is, the article as was entered did have a majority of its text found trivially via a search engine, and it pointed that out. Whether that is actually a copyright violation is a matter of judgment for human reviewers who are quite smart enough to understand fair use, common sources and discern whether there is a problem or not.

As has been pointed out to you multiple times here, on my talk page, and on AIV, you are overreacting to a harmless warning template placed by an automated process which is greatly beneficial on the whole. In the less than three months it has been active, CSBot has prevented over 1000 copyright violations from being added to Wikipedia, not counting the hundreds more that were avoided after the editor rewrote the article because it had been tagged.

Everyone has been civil and polite trying to point out what you can do about the warnings, giving hints on how to avoid getting them in the first place, and even suggestions to use an external editor to avoid the (at most once per article) rare edit conflict you might get because of the swift tagging. At this point, I think you are beginning to try everybody's patience (you certainly are trying mine) and there is nothing you will accomplish by repeated complaining.

I, and everybody else, would be sorry to have you stop contributing because of so trivial a matter, but if having a 27 character tag placed at the top of an article you wrote is actually sufficient to cause you a vast amount of stress, then perhaps you should rest some time? — Coren (talk) 19:54, 20 October 2007 (UTC)

Wherebot blocked

I have temporarily blocked Wherebot for twenty-four hours as it is malfunctioning. I'm aware that it does this from time to time, hence the short block; however, a contributor is creating articles en masse and Wherebot is flagging each one of them (for the record, the articles are non-violations), disrupting Wikipedia:Suspected copyright violations with blank reports by doing so. Where has been notified. — madman bum and angel 15:41, 8 November 2007 (UTC)

Uh oh!

We have a problem:

<?xml version="1.0" encoding="UTF-8"?>
<Error xmlns="urn:yahoo:api">
                The following errors were detected:
                <Message>limit exceeded</Message>
</Error>
<!-- ws03.search.re2.yahoo.com uncompressed/chunked Wed Nov 14 09:07:01 PST 2007 -->

Does anyone know how to ask Yahoo for a bigger limit? — Coren (talk) 17:08, 14 November 2007 (UTC)

I've unblocked Wherebot since we're otherwise undefended. The first 5 reports do not contain source links, but some of them may be copyvios. Let it catch a few more different types of articles before shutting it down, pls. If I get distracted and it needs to be stopped, hit me up on my talk pls. Thanks! -- But|seriously|folks  19:13, 14 November 2007 (UTC)
I've restarted CSBot on my own server, at least temporarily— the Yahoo limitation appears to be per IP and there are other tools on the toolserver that seem to use Yahoo (hence hitting the limit). — Coren (talk) 20:44, 14 November 2007 (UTC)
I have checked some of Wherebot's recent reports, and enough of them are not obvious copyvios to make me doubt that it is working properly, so I'm reblocking for now. -- But|seriously|folks  21:25, 14 November 2007 (UTC)
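
For reference, a minimal sketch of how a bot could recognise the "limit exceeded" response quoted above and back off instead of posting blank reports. This is not CSBot's or Wherebot's actual code; the function name `rate_limit_hit` is hypothetical, and the only details taken from this thread are the `urn:yahoo:api` namespace and the `<Error>`/`<Message>` element names in the quoted response.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the error response quoted in this thread.
YAHOO_NS = "urn:yahoo:api"


def rate_limit_hit(response_text: str) -> bool:
    """Return True if the response is an <Error> document whose
    <Message> says the query limit was exceeded (hypothetical helper)."""
    try:
        # Parse from bytes so the XML encoding declaration is honoured.
        root = ET.fromstring(response_text.encode("utf-8"))
    except ET.ParseError:
        # Not XML at all; treat it as "not a rate-limit error".
        return False
    if root.tag != f"{{{YAHOO_NS}}}Error":
        return False
    messages = [m.text or "" for m in root.iter(f"{{{YAHOO_NS}}}Message")]
    return any("limit exceeded" in m.lower() for m in messages)
```

A caller would check this before treating an empty result as "no matches found", and pause or switch hosts when it returns True, since the limit appears to be per IP.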

Bah. API change.

Looks like they changed the API yesterday. Small changes required; CSBot should now again be able to find new pages.  :-) — Coren (talk) 15:05, 17 November 2007 (UTC)

